How to Use Robots noindex?
Google recently announced that it will no longer support the “noindex” directive in the robots.txt file as of September 1, 2019. What does this mean? Does the robots noindex matter to you and what changes should you make to your robots.txt file? And what pages should you noindex anyway (if any at all)?
There aren’t easy answers to these questions—noindex is a deceptively simple concept and, if used incorrectly, can wreak havoc on your site’s SEO performance.
Skip Ahead to:
- What does all of this mean?
- Does this change matter to me?
- Should I noindex anyways?
- Alternative Options for noindex
What does robots noindex no longer being supported mean for my site?
Removing support for noindex in the robots.txt file is part of Google’s efforts to formalize the Robots Exclusion Protocol specifications. Those specifications have existed informally for the last two decades and have helped webmasters tell robots how to crawl through a website. Because the specifications were informal, a wide array of different directives ended up in robots.txt files. One of those informal directives was “Noindex”. That’s right: the robots Noindex was never officially supported, but despite its unofficial status, the directive worked more often than not.
Prior to this change, if you placed the noindex directive in your website’s robots.txt file, you were telling Google to not include the pages specified in Google’s search index. If the page isn’t included in the index, that page won’t appear in search results for people to find. The great thing about the robots noindex directive was that it could noindex a large volume of pages at once. For example, if you wanted to noindex blog tag pages contained in the /blog-tag/ directory, you could specify “Noindex: /blog-tag” in the robots.txt and all blog-tag pages would be removed from Google’s index.
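As a rough illustration of why this directive was so convenient, the sketch below treats a Noindex rule as a path prefix, the way Disallow rules are commonly matched. That matching behavior is an assumption made for illustration, since the directive was never formally specified, and the function name and example.com URLs are hypothetical.

```python
from urllib.parse import urlparse


def covered_by_noindex(url, noindex_rules):
    """Return True if the URL's path starts with any Noindex rule path.

    Assumption: the unofficial Noindex rule matched by path prefix.
    """
    path = urlparse(url).path or "/"
    return any(rule and path.startswith(rule) for rule in noindex_rules)


# Rule paths as they would appear after "Noindex:" in robots.txt.
rules = ["/blog-tag"]

# Hypothetical URLs on a hypothetical site.
urls = [
    "https://www.example.com/blog-tag/seo",
    "https://www.example.com/blog-tag/analytics",
    "https://www.example.com/blog/how-to-use-noindex",
]

for url in urls:
    print(url, covered_by_noindex(url, rules))
```

Under that assumption, one short rule covers every page in the directory, which is also why a rule that was slightly too broad could remove more pages than intended.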
As a result, it is disappointing this directive won’t be supported moving forward. It is understandable that Google wants to remove it though. Because of its ability to noindex so many pages, it was very possible that people could accidentally noindex more pages than intended. This is a common mistake we’ve seen at Elementive with our clients.
Alternatives to Robots.txt Noindex
Disappointing or not, the robots noindex directive is no longer a tool we have available to help shape how our website appears in search results. Without the noindex directive in the robots.txt file, we still have other ways to tell Google not to index a page.
Option #1: Meta Noindex
The most common alternative is the meta noindex tag, which is placed in the HTML <head> of a page and looks like this:
<meta name="robots" content="noindex" />
If this tag were placed on a page, that tag would tell Googlebot (and other search robots) to not include this page in the index and, by extension, not include the page in search results. This directive must be placed on each individual page; no bulk or multi-page options are available. However, within a content management system, like WordPress, you can specify this tag on a template to affect more pages at once—for example, all blog tag pages could use the same template with this noindex meta tag included.
Of course, in some content management systems, specifying noindex tags can be difficult. For example, one of our clients has a proprietary content manager that shares the <head> across all pages of the site. That means if a meta noindex tag were added, it would be added to every page on the website—clearly not something this company would want to do. In this system, there are exceptions made to allow page-specific title and description tags, but everything else in the <head> is the same sitewide. Adding a noindex would require carving out page-level exceptions similar to what they’ve done for title and description tags. This is a larger undertaking for the developers and an investment the company hasn’t previously deemed necessary. Besides, even if the investment was made to allow for the page-level noindex, if the coding ever broke and the exceptions didn’t work correctly, this company could accidentally add noindex to every page on their website.
Hopefully your website’s content manager isn’t as convoluted as that example, but this example underscores why you need to understand the website’s underlying technology before recommending a change like adding a meta noindex tag. Once you understand the underlying technology, you can determine if you actually need to add the noindex page-by-page or if opportunities exist to apply the noindex in bulk.
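One way to picture a template-level noindex, along with a safeguard against accidentally applying it sitewide, is the sketch below. It is only an illustration: the template names, `robots_meta_for`, `render_page`, and `has_meta_noindex` are hypothetical and do not come from any particular content management system.

```python
from html.parser import HTMLParser

# Hypothetical template names; a real CMS would define its own.
NOINDEX_TEMPLATES = {"blog-tag", "blog-category"}


def robots_meta_for(template):
    """Return the meta tag to place in <head> for this template, or ''."""
    if template in NOINDEX_TEMPLATES:
        return '<meta name="robots" content="noindex" />'
    return ""


def render_page(template, title):
    """Build a minimal page so the check below has something to inspect."""
    head = f"<title>{title}</title>{robots_meta_for(template)}"
    return f"<html><head>{head}</head><body></body></html>"


class _RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def has_meta_noindex(html):
    """Return True if the page carries a meta robots noindex directive."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Check that only the intended templates end up with the tag.
for template in ("blog-tag", "blog-post"):
    page = render_page(template, "Example page")
    print(template, has_meta_noindex(page))
```

Running a check like `has_meta_noindex` against a sample of rendered pages after each template change is one way to catch a noindex that has leaked onto pages where it does not belong.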
Option #2: X-Robots
Another option for specifying noindex is the X-Robots-Tag HTTP header (X-Robots for short). In concept, X-Robots behaves almost exactly the same as the meta noindex. However, unlike the meta noindex, X-Robots isn’t specified in the HTML but in the HTTP response headers. As a result, it isn’t visible in the page’s code and can only be seen when you inspect the HTTP headers. If specified, the HTTP response would include a header like this:
X-Robots-Tag: noindex
Typically, you would use this header for non-HTML content, such as images or PDFs, but there is no reason you couldn’t use X-Robots on an HTML page. Depending on the website’s technical structure and content management system, specifying the noindex directive this way can actually prove simpler than adding a meta tag to the page’s code.
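A minimal sketch of both sides of this header follows: deciding which responses should carry it, and checking whether a set of response headers asks for noindex. The extension list and both function names are hypothetical; how headers are actually attached depends on your server or framework.

```python
import os

# Hypothetical choice of file types to keep out of the index.
NOINDEX_EXTENSIONS = {".pdf", ".png", ".jpg"}


def extra_headers_for(path):
    """Return the extra response headers for a requested file path."""
    extension = os.path.splitext(path)[1].lower()
    if extension in NOINDEX_EXTENSIONS:
        return {"X-Robots-Tag": "noindex"}
    return {}


def header_says_noindex(headers):
    """Return True if the response headers include an X-Robots-Tag noindex."""
    for name, value in headers.items():
        if name.lower() != "x-robots-tag":
            continue
        for token in value.split(","):
            # A directive may be prefixed with a user agent,
            # for example "googlebot: noindex".
            directive = token.split(":")[-1].strip().lower()
            if directive == "noindex":
                return True
    return False


for path in ("/files/report.pdf", "/about.html"):
    headers = extra_headers_for(path)
    print(path, headers, header_says_noindex(headers))
```

Because the decision is made per response rather than per page template, a rule like this can cover a whole file type at once.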
Does the robots.txt noindex change matter to you?
This specific change to the robots noindex only matters directly to you if you are currently relying on this tool as a means of keeping pages out of Google’s index. The easiest way to check is by loading your robots.txt file. To do so, add “robots.txt” to the end of your URL. For example, https://www.matthewedgar.net/robots.txt. Once you are viewing it, see if you spot any references to “Noindex” within the file. If you do, you need to change how you specify noindex to either the meta robots tag or X-Robots. (Alternatively, you can remove the page or disallow the page; see the sections below for details or contact me to discuss what options are appropriate for your website.)
More generally, though, this change (and Google’s Robots Exclusion Protocol) matters to anybody who wants to control what people find from their website in search results. One of the most fundamental aspects of search engine optimization is guiding a robot through your website. You want to make sure the robot is finding all the pages it is supposed to find, not finding any pages it ought not to find, and that the robot understands what to include in search results.
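If you would rather not scan the file by eye, the check can be sketched in a few lines. The function name and the sample robots.txt content below are hypothetical; in practice you would pass in the text of your own file.

```python
def find_noindex_rules(robots_txt):
    """Return the paths listed on any Noindex lines in robots.txt text."""
    rules = []
    for raw_line in robots_txt.splitlines():
        # Drop comments, then surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        # Field names are matched without regard to case.
        if field.strip().lower() == "noindex":
            rules.append(value.strip())
    return rules


# Hypothetical robots.txt content for illustration.
sample = """\
User-agent: *
Disallow: /admin/
Noindex: /blog-tag   # unofficial directive
noindex: /landing/
"""

print(find_noindex_rules(sample))
```

Any path this returns is one that needs a replacement, such as a meta robots tag or X-Robots header.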
That leads to the question—what pages should a robot find or not find and what pages should a robot include or not include in search results?
Should I Noindex?
Generally, it should be left up to robots to decide what should or should not be indexed. However, there are times you’d want to decide on behalf of the search robot and take more control over what pages appear in search results. There are two main questions to consider regarding noindex commands. If you currently have pages noindexed, it is good to regularly use these questions to reconsider the reasons for those pages being noindexed.
Question #1: Search Result Page Clicks
Noindex is a tool that prevents pages from appearing in search results. Accordingly, the question to consider before using this tool is what pages aren’t a good representation of your website in search results? That is, after people conduct a search and begin reviewing search results, searchers are looking to click on the best pages on the best websites. Best, in this case, means the pages and websites that seem to most closely match that searcher’s interests and intentions. Go through your website and compare the pages you have to the search terms those pages rank for. What pages aren’t a good representation of your website for searchers, and what pages are unlikely to entice a click given the search results they rank in (or could rank in)?
Question #2: Quality Entry Point
It isn’t just about how the pages appear on search results and if the page can entice a searcher to click to your website. It is also about ensuring the page is a good entry point from a search result. After somebody clicks a search result to come to your website, you want that searcher to be fully satisfied by the page. Forget any possible benefits people staying on your website could have for SEO—instead, think about all the benefits people staying on your website can have for your business. What pages are the best entry point from search results and have the best opportunity to satisfy visitors?
Noindex: A Tool of Last Resort
The biggest mistake here is to be too aggressive and noindex all or almost all the pages on your website for fear the pages won’t entice clicks or fear the pages won’t satisfy visitors. The better course of action is to start by allowing almost everything into Google’s index, with a few minor exceptions (see below). As you watch how your website performs in organic search, pay attention to what pages don’t seem to be working for searchers. What pages have low click through rates from search result pages? What pages have high bounce rates, indicating a lack of visitor interest or mismatched visitor intentions?
As you identify these pages, ask yourself if there are other things you can do to improve click through rates or reduce the bounce rate. For example, can you revise the title or description of the page to enhance the search result listings and encourage more clicks? Or, can you alter the content, design, layout, or structure of the page itself to encourage more people to stay on the page?
If none of those changes are working, then perhaps noindex is the correct answer. But, noindex should almost always be the last answer to consider. Once the page is removed from the search results, you lose any opportunity for traffic. So, be careful with this tool.
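The review described in this section can be sketched as a simple filter over page statistics. Everything specific here is an assumption for illustration: the `PageStats` fields, the sample numbers, and the thresholds (a click through rate under 1% or a bounce rate over 80%) are hypothetical, and the right cutoffs depend on your own website.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    url: str
    impressions: int  # times the page appeared in search results
    clicks: int       # clicks from search results
    sessions: int     # visits that started on this page
    bounces: int      # visits that left without viewing another page


def click_through_rate(page):
    return page.clicks / page.impressions if page.impressions else 0.0


def bounce_rate(page):
    return page.bounces / page.sessions if page.sessions else 0.0


def pages_to_review(pages, max_ctr=0.01, min_bounce=0.80):
    """Return URLs with a low click through rate or a high bounce rate."""
    return [
        page.url
        for page in pages
        if click_through_rate(page) < max_ctr or bounce_rate(page) > min_bounce
    ]


# Hypothetical numbers for illustration only.
pages = [
    PageStats("/blog-tag/seo", impressions=1000, clicks=5, sessions=5, bounces=1),
    PageStats("/landing/offer", impressions=1000, clicks=50, sessions=50, bounces=45),
    PageStats("/blog/how-to-use-noindex", impressions=1000, clicks=50, sessions=50, bounces=10),
]

print(pages_to_review(pages))
```

The output is a list of pages to improve first, not a list of pages to noindex. Noindex only becomes the answer if revising titles, descriptions, or content doesn’t help.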
Pages To Almost Always Noindex
There are a few standard types of pages that should almost always be noindexed before evaluating performance in search results, assuming this type of content does appear on your website.
- Category or tag pages on a blog—it might be better for people to find a relevant blog post or the main blog page from the search result.
- Landing pages—a landing page for an email or advertisement campaign shouldn’t be indexed in a search result since you don’t want organic search traffic coming to this page too.
- Duplicate content—if two pages share the same content, one of the pages can be set to noindex to prevent the pages from competing against each other to appear in the same search results.
Other Options Than Noindex
Along with noindex, there are other ways to handle content you don’t want to appear in search results. Let’s discuss the most common two. If you are evaluating what to noindex or find yourself revisiting noindex because of Google’s change to the noindex in the robots.txt file, these are viable options to consider.
Noindex vs. Disallow
You also have a Disallow directive that can be specified in the robots.txt file. Some have suggested swapping a robots Noindex for a Disallow directive. However, that likely isn’t the right answer.
To understand why, we need to look at the difference between noindex and disallow. Where noindex keeps a page out of the search results, the disallow command tells robots they should not crawl a particular page. However, because a disallow doesn’t govern indexing, a disallowed page can still appear in search results if the robot is able to find out about the page in some other way (such as from links on the web). If the page does appear in search results, the result won’t include any information from the website because the robots were disallowed from crawling that page.
This is why you likely won’t want to change a page that is currently noindexed to be disallowed. After all, if you have decided that a page is unlikely to satisfy searchers and encourage clicks, an information-less search result is not likely to better satisfy searchers.
A disallow and noindex both offer ways of controlling search robots. Disallow, though, is about dictating where a robot crawls while a noindex is about what appears in search results. These are different tools and not interchangeable.
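To make the distinction concrete, the sketch below uses Python’s standard `urllib.robotparser` module to answer the only question a Disallow rule governs: may this robot crawl this URL? The robots.txt content, the example.com URLs, and the `may_crawl` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
robots_txt = """\
User-agent: *
Disallow: /blog-tag/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def may_crawl(url, agent="Googlebot"):
    """Return True if the robots.txt rules allow this agent to crawl the URL."""
    return parser.can_fetch(agent, url)


for url in (
    "https://www.example.com/blog-tag/seo",
    "https://www.example.com/blog/how-to-use-noindex",
):
    print(url, may_crawl(url))
```

Note that the result only describes crawling. Nothing in it says whether the disallowed URL will appear in search results, which is exactly why Disallow is not a substitute for noindex.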
Finally, you can also consider removing pages that you have noindexed. If you have decided to noindex a page because you don’t think the page will satisfy the interests and intentions of the people who are visiting your website, then why do you have the page on your website anyway? If the page on your website is incapable of enticing people to click to it from search results, is the page even worth having on your website?
Of course, there are some types of pages you’d want to noindex but otherwise keep on your site, such as landing pages for ads. The general line of thinking here is that every page on your website needs to deliver some degree of value. If the page can’t deliver value, then instead of noindexing the page to exclude it from search results, the page should be removed from the website altogether.
Before removing the page, though, you need to determine if the page does offer something of value. Sure, the page may not deliver value to people clicking on organic search results. But, perhaps other types of visitors do find value in this page. When evaluating a page’s value, don’t solely focus on the value the page can deliver to an SEO campaign. For example, a tag page on a blog is generally not something you’d want to include in search results. But a blog tag page can be a helpful way to get people engaging on your website. In which case, it would make sense to keep the noindex tag while also keeping the page.
Noindex With Caution
If you currently have a noindex command specified, in the robots.txt file or elsewhere, Google’s change to the Robots Exclusion Protocol standards is a good reason to reconsider whether you need that noindex. And if you don’t have any noindex commands specified, chances are there is some content on your website that does need to be noindexed.
If you want help evaluating where you do or don’t need a noindex, or where other options to control pages might be more beneficial, please contact me today.
You can also learn more about robot control methods in my book, Tech SEO Guide. Now available for only $9.99!