Noindex vs. Nofollow vs. Disallow
Portions of the following are adapted from my book, Tech SEO Guide, now available on Amazon.
The difference between noindex, nofollow, and disallow commands is a common source of confusion. All three are powerful tools for improving a website’s organic search performance, but each is appropriate only in specific situations. Unfortunately, they are often applied incorrectly, which can significantly harm a website’s search performance.
Two Search Robot Operations
To understand what noindex, nofollow, and disallow commands do, let’s take a step back and consider what search engine robots do. Search engines send robots to crawl through and understand websites. These robots are complex, but they perform two basic operations.
- Crawling: Once a robot discovers a website, it crawls through all the pages and files on the website it can find. Limits can be placed on which files and pages a robot can see, and other changes can be made to ensure a robot finds everything it should.
- Indexing: After a crawl, robots decide what information on a particular page can and should be shown within search results. As part of this, search engine robots also decide which search results (if any) a website’s pages should appear in and where in those results each page should rank.
Disallow vs. Noindex vs. Nofollow
Disallow: Controlling Crawling
The first method of controlling a search robot is a disallow command specified in the website’s robots.txt file. When it is specified, a search robot that honors the command will not crawl the disallowed page, file, or directory. However, disallowed files may still be indexed and appear in search results, for example when other pages link to them.
For example, you could add the following to the robots.txt file to discourage search robots from crawling anything located in /a-secret-directory:
User-agent: *
Disallow: /a-secret-directory/
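To illustrate how a robot that honors robots.txt applies a disallow rule, here is a minimal Python sketch using the standard library's urllib.robotparser. It assumes the robots.txt content shown for /a-secret-directory. The user-agent name ExampleBot and the example.com URLs are hypothetical. Python's parser uses simple prefix matching, so it does not reproduce every detail of how Google reads robots.txt (wildcards, for example).

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: block every robot from /a-secret-directory/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /a-secret-directory/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant crawler checks each URL before requesting it.
# "ExampleBot" and the example.com URLs are hypothetical.
for url in ("https://example.com/a-secret-directory/report.html",
            "https://example.com/blog/post.html"):
    allowed = parser.can_fetch("ExampleBot", url)
    print(url, "-> crawl allowed" if allowed else "-> crawl blocked")
```

The check only controls whether the robot fetches the URL. It says nothing about whether the URL can still show up in search results.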
Noindex: Controlling Indexing
The “noindex” command can be specified on a page within the meta robots tag. For example:
<meta name="robots" content="noindex" />
When the meta noindex tag is included on a page, search robots are allowed to crawl the page but are discouraged from indexing the page.
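To show how a robot could detect the meta noindex tag after crawling a page, here is a minimal sketch using Python's built-in html.parser. The class and function names (MetaRobotsParser, robots_directives) are hypothetical. It reads only the generic name="robots" tag and assumes directives are comma-separated, as in "noindex, nofollow".

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # content is a comma-separated list such as "noindex, nofollow".
            for token in attributes.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def robots_directives(html: str) -> set[str]:
    parser = MetaRobotsParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="robots" content="noindex" /></head><body>Hi</body></html>'
# The page was fetched (crawled), but it asks not to be indexed.
print("noindex" in robots_directives(page))  # True
```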
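A “Noindex” command can also be placed in the robots.txt file, using the same line syntax as Disallow (for example, Noindex: /a-secret-directory/).
The robots.txt noindex command is not part of the robots.txt standard and is not supported by Google or other major search engines. Although it occasionally worked in the past, Google stopped honoring it entirely in September 2019. Its results are inconsistent at best, so use the meta noindex tag instead.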
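To illustrate why a Noindex line in robots.txt is unreliable, this sketch shows that Python's standard robots.txt parser, which implements the conventional crawl rules, skips the line entirely. This shows how a generic parser behaves and should not be read as a description of Google's internals. The robots.txt content, user-agent, and URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that relies on the unofficial Noindex line.
ROBOTS_TXT = """\
User-agent: *
Noindex: /a-secret-directory/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The parser recognizes only standard lines (User-agent, Allow, Disallow, and a
# few extensions such as Crawl-delay). It silently skips Noindex, so this file
# places no restrictions at all.
print(parser.can_fetch("ExampleBot", "https://example.com/a-secret-directory/page.html"))  # True
```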
Nofollow: Controlling Crawling & Telling Google to Ignore the Link
Finally, we have the rel="nofollow" command. This is an attribute added to a specific <a> tag:
<a href="/no-robots-here" rel="nofollow">Link</a>
Nofollow can also be specified in the meta robots tag. When placed in the <head> of a page, it instructs search engine robots not to follow any links on the page:
<meta name="robots" content="nofollow" />
Nofollow serves two purposes. First, it tells search engine robots not to crawl any links flagged with nofollow. Used in an <a> tag, it covers that single link. Used in the meta robots tag, it covers every link on the page. Second, it tells search robots not to count the link when determining where to rank a page. Among many other factors, a page’s rank depends on how many links point to it.
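The two levels of nofollow can be modeled with a short Python sketch that sorts a page's links into ones a robot may follow and ones it should skip. This is a model of the behavior described in this section, not of any particular search engine. Google, for instance, has treated nofollow as a hint rather than a strict rule since 2019. The names LinkAuditParser and followable_links are hypothetical.

```python
from html.parser import HTMLParser


class LinkAuditParser(HTMLParser):
    """Records each <a href> link and whether the page carries a meta nofollow."""

    def __init__(self) -> None:
        super().__init__()
        self.page_nofollow = False
        self.links: list[tuple[str, bool]] = []  # (href, link-level nofollow?)

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() == "robots":
            tokens = {t.strip().lower() for t in attributes.get("content", "").split(",")}
            if "nofollow" in tokens:
                self.page_nofollow = True
        elif tag == "a" and "href" in attributes:
            # rel may hold several space-separated tokens, e.g. "nofollow noopener".
            rel_tokens = attributes.get("rel", "").lower().split()
            self.links.append((attributes["href"], "nofollow" in rel_tokens))


def followable_links(html: str) -> list[str]:
    """Return the hrefs a robot honoring nofollow would follow."""
    parser = LinkAuditParser()
    parser.feed(html)
    parser.close()
    if parser.page_nofollow:
        return []  # a page-level nofollow covers every link on the page
    return [href for href, nofollow in parser.links if not nofollow]


page = ('<a href="/about">About</a> '
        '<a href="/no-robots-here" rel="nofollow">Link</a>')
print(followable_links(page))  # ['/about']
```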
Using Noindex and Disallow
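It is important to be clear on how the Disallow and Noindex commands work together. There are three ways these commands can be combined to affect crawling and indexing:
- Scenario 1: Noindex only (the page is not disallowed).
- Scenario 2: Disallow only (the page has no noindex).
- Scenario 3: Both Disallow and Noindex.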
In Scenario 1, the page with a noindex setting will not be included in search results. However, a robot may still crawl the page, meaning it can access the page’s content and follow the links on it.
In Scenario 2, the page will not be crawled but may still be indexed and appear in search results. Because the robot did not crawl the page, it knows nothing about the page’s content. Any information shown about the page in search results is gathered from other sources, such as links pointing to it.
Scenario 3 will operate exactly like Scenario 2 if the noindex was specified within the meta robots tag. This is because when a Disallow is specified, a robot will not crawl the page. If the robot doesn’t crawl the page, it will never see the meta tag telling it not to index the page. If a page needs to be both set to noindex and disallowed, set the noindex first. Add the disallow only after the page has been removed from the search index.
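The three scenarios can be summarized with a toy Python model of the crawl-then-index sequence. It is a simplification written for illustration, not how any real search engine is implemented. The names Outcome and simulate are hypothetical. The key step is that a meta noindex only takes effect if the page is actually crawled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    crawled: bool
    noindex_seen: bool
    may_appear_in_results: bool


def simulate(disallowed: bool, meta_noindex: bool) -> Outcome:
    """Toy model: robots.txt is checked first, then the page's meta tags."""
    crawled = not disallowed                 # a Disallow stops the crawl
    noindex_seen = crawled and meta_noindex  # the meta tag is read only if the page is fetched
    # Without a visible noindex, the URL can still be indexed,
    # possibly from external links alone when the page was never crawled.
    return Outcome(crawled, noindex_seen, not noindex_seen)


for name, disallowed, noindex in [("Scenario 1", False, True),
                                  ("Scenario 2", True, False),
                                  ("Scenario 3", True, True)]:
    print(name, simulate(disallowed, noindex))
```

Because Scenario 3 produces the same outcome as Scenario 2 in this model, the noindex must be processed before the disallow is added.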
When to Use Nofollow?
Generally, robots should be allowed to follow all links on a page. Being too aggressive in specifying which links to follow or nofollow can make it look as if the website is trying to manipulate a robot’s perception of it. This practice is known as PageRank sculpting (or page sculpting): nofollow commands are used to sculpt how signals pass from one page to another. At best, these attempts to manipulate a robot no longer work. At worst, using rel nofollow to manipulate robots can lead to a penalty.
Rel nofollow should be used in specific instances where signals from one page should not be passed to the page being linked to. The prime example is a link placed in exchange for payment. For example, if a blog post includes links to ads, those links should have a rel nofollow attribute.
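To apply the paid-link rule across a site's content, one option is an automated check that flags paid links missing nofollow. Here is a minimal Python sketch. It assumes the site owner keeps a list of advertiser or sponsor domains. The domains, the sample post, and the names PaidLinkChecker and unmarked_paid_links are all hypothetical. Consistent with the advice above, it checks only for the nofollow token.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class PaidLinkChecker(HTMLParser):
    """Flags links to known paid domains that lack rel="nofollow"."""

    def __init__(self, paid_domains: set[str]) -> None:
        super().__init__()
        self.paid_domains = paid_domains
        self.violations: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = {name: (value or "") for name, value in attrs}
        href = attributes.get("href", "")
        host = urlparse(href).hostname or ""  # relative links have no hostname
        rel_tokens = attributes.get("rel", "").lower().split()
        if host in self.paid_domains and "nofollow" not in rel_tokens:
            self.violations.append(href)


def unmarked_paid_links(html: str, paid_domains: set[str]) -> list[str]:
    checker = PaidLinkChecker(paid_domains)
    checker.feed(html)
    checker.close()
    return checker.violations


post = ('<a href="https://advertiser.example/offer">Great deal</a> '
        '<a href="https://sponsor.example/" rel="nofollow">Sponsor</a> '
        '<a href="/about">About us</a>')
print(unmarked_paid_links(post, {"advertiser.example", "sponsor.example"}))
# ['https://advertiser.example/offer']
```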
- Disallow tells a robot not to crawl a page, file, or directory. Noindex tells a robot not to index the page. Nofollow tells a robot not to follow a specific link or all links on a page.
- Use Disallow, Noindex, and Nofollow sparingly and only after carefully considering all implications. If you need help, let’s talk before you implement.
- Disallow, Noindex, and Nofollow are optional: robots don’t have to follow any of these commands. If an area of a website should not be publicly accessible, or if you want to ensure part of your website doesn’t end up in a Google search result, require a login instead.
For more information on noindex, nofollow, disallow, and other technical SEO subjects, please refer to the Tech SEO Guide in paperback or Kindle on Amazon.
If you have other questions about where to use noindex, nofollow, or disallow commands, or if you need help implementing these on your website, please contact me.