
Noindex vs. Nofollow vs. Disallow

By Matthew Edgar · Last Updated: March 22, 2024

There are three main tools you can use to control how search engine robots crawl and index your website: noindex, nofollow and disallow commands. If used appropriately, these tools can improve SEO performance. Applied incorrectly, these tools can significantly harm a website’s search performance. In this article, let’s review when you should and shouldn’t use noindex, nofollow and disallow.


Crawling and Indexing

Before a website’s pages can rank in search results, search engine robots need to find all of those pages on the website and then process all of those pages to decide what to rank in search results. The process of finding all of the pages is referred to as crawling and processing those pages is referred to as indexing.

Crawling begins with robots discovering all of the URLs to pages contained on a website. Robots primarily discover these URLs from links contained on the website (internal links) or links to these URLs from external websites (backlinks). Once a robot discovers a URL to a page, it fetches content from that page (title, page text, images, and so on) along with other data about the page (like the date the page was last updated). Limits can be placed on which files and pages a robot can crawl.
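The link-discovery step of crawling can be sketched in Python using the standard library's html.parser. This is an illustration only, with hypothetical page content; real crawlers are far more sophisticated.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, mimicking how a
    crawler discovers new URLs while processing a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical page content
page = """
<html><body>
  <a href="/about-us">About</a>
  <a href="/blog/post-1">Post</a>
</body></html>
"""
collector = LinkCollector()
collector.feed(page)
print(collector.links)  # discovered URLs, queued for future crawling
```

Each discovered URL would then be fetched in turn, which is how a crawl spreads across a website from its internal links.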

Indexing happens after a crawl. This is where robots take all the information gathered during that crawl and begin evaluating all of the pages found. Robots will decide if the content is helpful and authoritative during indexing. Also, robots will determine what topics the page is related to and how the page compares to other related pages that also discuss those topics. As part of this, search engine robots will decide what search results a website’s pages should be included in (if any) and where the page should rank within those results.

Disallow vs. Noindex vs. Nofollow

Disallow: Controlling Crawling

A disallow command is specified on a robots.txt file and controls crawling.

The “robots.txt” file is a plain text file placed in the root directory of a website. It provides instructions for robots telling them which directories you would prefer they not crawl.

Amazon.com’s robots.txt file

If a disallow is specified on the robots.txt, a search robot that respects robots.txt files will not crawl the page, file, or directory that has been disallowed. For example, Amazon.com’s robots.txt file contains this disallow statement:

Disallow: /exec/obidos/account-access-login

When crawling Amazon’s website, Googlebot will not crawl any files contained in the /exec/obidos/account-access-login directory, even if links are found pointing to pages in that directory.

A disallow can also be specified for a specific robot. For example, this robots.txt file entry instructs Googlebot to avoid the /my-secret-page/ directory. However, Bingbot could still crawl this directory.

user-agent: googlebot
Disallow: /my-secret-page/
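Python's standard library includes urllib.robotparser, which applies robots.txt rules the same way a well-behaved crawler would. This sketch feeds it the hypothetical rules above and checks what each bot may fetch:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules matching the example above
rules = [
    "User-agent: googlebot",
    "Disallow: /my-secret-page/",
]

parser = RobotFileParser()
parser.parse(rules)

# Googlebot is blocked from the disallowed directory...
print(parser.can_fetch("googlebot", "/my-secret-page/anything"))  # False
# ...but Bingbot, with no rule of its own, is still allowed.
print(parser.can_fetch("bingbot", "/my-secret-page/anything"))    # True
```

This is also a handy way to sanity-check your own robots.txt rules before deploying them.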

The most important thing to remember is that robots.txt files only control crawling. A disallow does not control indexing.

As a result, any disallowed file may still be indexed and appear in search results. For example, Google and Bing may find a link to a disallowed page on your website or elsewhere on the web. They cannot crawl the page to see its contents, but they know the page exists and may still show it in search results. When this happens, you will see a search result similar to this:

Disallowed files may still rank in search results

Generally, it is best to only disallow files that will cause problems if crawled. For example, Googlebot may crawl video files hosted on your website too aggressively and that could cause your server to crash. In that case, a disallow would make sense to keep Google from crawling those video files and causing issues with your website’s server.

If there are pages Google should never crawl, you should use a stronger method to deter crawls. For example, you should password-protect your development or staging environments. Google cannot enter a password, so it cannot crawl those parts of the website.

One set of files you should never disallow is JavaScript, CSS, and image files. These files control how the page looks, and Google relies on them to understand and evaluate a page.

Meta Robots Nofollow: Controlling Crawling

There are two different nofollow statements. The nofollow command that controls crawling is the meta robots nofollow. This nofollow is applied at a page level by specifying it in a meta robots tag in the page’s <head>.

<html>
<head>
...
<meta name="robots" content="nofollow" />
</head>
<body>
...
<a href="/some-link">This link will not be crawled.</a>
</body>
</html>

This can also be specified for Googlebot or Bingbot specifically. This nofollow meta tag would only restrict Bingbot from following links on the page during a crawl.

<meta name="bingbot" content="nofollow" />

When placed in the <head> of a web page, the meta nofollow instructs a search engine robot to not crawl any links on the page. Robots will still be able to fetch content and information from the page itself.

The important thing to remember is that the meta robots nofollow only applies to links on that page. Let’s say the /about-us page on a website contains a meta robots nofollow and links to the page /staff/bio/jane-doe. If /staff/bio/jane-doe is only linked to from the /about-us page, then robots will never crawl to that page. However, if my blog posts also link to /staff/bio/jane-doe, robots would still crawl to the /staff/bio/jane-doe page.

If you do not want robots to crawl to the page at all, then the robots.txt disallow is the better method of controlling crawling.
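A crawler's handling of the meta robots nofollow can be sketched with a small parser that checks the page's meta tags before deciding whether to follow its links. This is an illustrative sketch with hypothetical page content, not a real crawler:

```python
from html.parser import HTMLParser

class MetaRobotsChecker(HTMLParser):
    """Checks a page for a meta robots nofollow, which tells a
    crawler not to follow any links found on this page."""
    def __init__(self):
        super().__init__()
        self.follow_links = True

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if attrs.get("name", "").lower() == "robots" and \
               "nofollow" in (attrs.get("content") or "").lower():
                self.follow_links = False

# Hypothetical page with a meta robots nofollow in its <head>
page = ('<html><head><meta name="robots" content="nofollow" /></head>'
        '<body><a href="/staff/bio/jane-doe">Bio</a></body></html>')
checker = MetaRobotsChecker()
checker.feed(page)
print(checker.follow_links)  # False: links on this page are not followed
```

Note that the crawler still fetched and read the page itself; only its outgoing links are skipped.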

Rel Nofollow & Rel Qualifiers: Explaining the Nature of the Link

The other nofollow is the rel=”nofollow” command. This nofollow qualifies, or explains, the nature of the link to robots. It does not influence how robots crawl the link itself.

Traditionally, rel=”nofollow” was used to specify any links that were sponsored or had a monetary relationship, including affiliate links. Google has since introduced other types of link qualifiers: rel=”sponsored” and rel=”ugc”.

  • rel=”sponsored” is for any paid link
  • rel=”ugc” is for any link contained within user-generated content
  • rel=”nofollow” is for any other link you’d rather Google’s bots not associate with your website

These rel commands are specified at the link level with a “rel” attribute added to a specific <a> tag. For example, this link is qualified with a nofollow, so the linked /no-robots-here page wouldn’t be associated with your website:

<a href="https://www.someotherwebsite.com/no-robots-here" rel="nofollow">Link</a>

Link qualifiers can be combined to ensure the clearest signal is sent to robots crawling the website.

<a href="https://www.someotherwebsite.com/no-robots-here" rel="nofollow sponsored">Link</a>

Noindex: Controlling Indexing

The “noindex” command can be specified on a page within the meta robots tag. When the meta noindex tag is included on a page, search robots are allowed to crawl the page and can crawl links contained on the page but are discouraged from indexing the page. With a noindex specified, the page won’t be included in search result rankings.

Here is an example meta robots noindex tag that would be included in the page’s <head>:

<meta name="robots" content="noindex" />

A couple of notes:

  • You previously could specify a noindex on the robots.txt file. However, this is no longer supported by Google (and likely was never officially supported). With that lack of support, the only way to specify a noindex is at the page level.
  • If you can’t add a meta tag to the page’s <head>, you can also use an X-Robots-Tag in the HTTP header. This can be helpful for preventing non-HTML content, such as PDFs or some images, from being included in search results.
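For example, on an Apache server (assuming mod_headers is enabled), a hypothetical configuration for keeping PDF files out of search results via the X-Robots-Tag header might look like this:

```apache
# Hypothetical Apache config (requires mod_headers): send an
# X-Robots-Tag header so PDF files are kept out of search results.
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>
```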

Using Noindex and Disallow Together

One of the trickiest technical SEO concepts to understand is how the Disallow and Noindex commands work together. There are three ways these commands can be combined to affect indexing and crawling.

             Disallow   Noindex
Scenario 1                 X
Scenario 2      X
Scenario 3      X          X

In Scenario 1, the page with a noindex setting will not be included in a search result. However, a robot may still crawl the page, meaning the robots can access content on the page and follow links on the page.

In Scenario 2, the page will not be crawled but may be indexed and appear in search results. Because the robot did not crawl the page, the robot has no information or content from the page.

Scenario 3 will operate exactly like Scenario 2. This is because when a Disallow is specified in the robots.txt file, a robot will not crawl the page. If the robot doesn’t crawl the page, it will not see the meta tag indicating not to index a page.
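The interaction between the two commands can be summed up in a small decision function. This is an illustration of the logic described above, not an actual crawler:

```python
def robot_outcome(disallowed: bool, noindex_tag: bool) -> str:
    """Sketch of how disallow and noindex combine: a disallow
    blocks the crawl, so a noindex tag on the page is never seen."""
    if disallowed:
        # Scenarios 2 and 3: no crawl, so the meta tag is
        # invisible; the URL may still be indexed from links.
        return "not crawled, may still be indexed"
    if noindex_tag:
        # Scenario 1: crawled, but kept out of search results.
        return "crawled, not indexed"
    return "crawled and indexed"

print(robot_outcome(disallowed=False, noindex_tag=True))  # Scenario 1
print(robot_outcome(disallowed=True, noindex_tag=False))  # Scenario 2
print(robot_outcome(disallowed=True, noindex_tag=True))   # Scenario 3
```

Running the three scenarios shows Scenario 3 producing the same outcome as Scenario 2: the disallow makes the noindex irrelevant.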

When To Use Noindex Tag

Generally, it should be left up to robots to decide what should or should not be indexed and appear in search results. If a page on the website shouldn’t appear in search results, it often shouldn’t even be included on the website.

For example, there can be low-quality, or thin, content pages on your website. While the best answer is usually to remove these pages from the website, in some cases, you need to keep these lower-quality pages because they help certain visitors. Even if the page helps certain visitors, its low-quality nature means you do not want people to find the page in search results because it would be a bad entry point for your website. A noindex allows you to keep the lower-quality pages out of search results while still keeping the page on your website for visitors.

Another common example is landing pages for ad campaigns. Ad landing pages are only intended for traffic coming from a specific ad and you would not want organic search traffic arriving on that page. Using a meta robots noindex on these pages would prevent search engines from ranking the landing page.

When To Use Meta Robots Nofollow

Generally, robots should be told they can follow all links on a page. Being too aggressive in specifying which links to follow or nofollow can begin to look as if the website is attempting to manipulate a robot’s perception of a website.

This is a practice known as page sculpting, where nofollow commands are used to sculpt how signals from one page are passed to another. At best, these attempts to manipulate a robot no longer work. At worst, attempts to manipulate robots with rel nofollow can lead to a penalty.

As a result, there are very few use cases for meta robots nofollow on websites. More often than not, meta robots nofollow is something to look for in SEO audits as a sign of over-optimization on the website.

When To Use Rel Qualifiers On Links

Rel=”nofollow”, rel=”sponsored”, or rel=”ugc” should be used for specific instances where you need to clearly signal the nature of the link. The prime example is links on a page where a payment was made in exchange for the link. For example, if an article includes links to ads, those links should have a rel=”sponsored” attribute. Or, if a blog post allows comments, any links provided in those comments should have a rel=”ugc” attribute.

Disallow, Noindex, or Nofollow Are Optional

Disallow, noindex, and nofollow are optional: robots don’t have to honor any of these commands. The word command is a bit of an overstatement. At best, these are recommendations, and robots can ignore any or all of them. Often, robots ignoring a noindex or disallow is a sign of a bigger problem with the website’s structure. In these situations, you want to research how robots are crawling the website and why they are choosing to ignore the commands you’ve provided.

Because these commands are optional, you should not rely on them for any critical aspects of your website. Restricting certain parts of a website with password protection or other means of authentication will do a better job of blocking robots from crawling or indexing a page.

Summarizing Robot Commands

The most important thing to remember is there are two operations: crawling and indexing. We can control or influence both of these using different commands.

To sum up, those commands are:

  • Disallow tells a robot not to crawl a page, file, or directory. It should be used to limit problematic crawling.
  • Noindex tells a robot not to index the page. It should be used to keep pages out of search results or to help with low-quality content issues.
  • Meta nofollow tells a robot not to follow a specific link or all links on a page. This should typically not be used.
  • Rel=”nofollow” (or rel=”sponsored” or rel=”ugc”) qualifies (or explains) the nature of the link. It should be used for links added by users or links with a monetary relationship (like affiliate links).

Use disallow, noindex, meta robots nofollow, and rel qualifiers sparingly and only after carefully considering all implications about how their use will affect your website’s SEO performance. As you use these commands, make sure you aren’t blocking robots from seeing important parts of your website that robots need to see to understand the page—such as JavaScript, CSS, or image files. When in doubt, you should allow robots to crawl and index the page.

Testing Robot Commands

If you decide to use disallow, noindex, or nofollow, you want to test to make sure robots understand the commands correctly. While you can use crawl tools to help with this, a simpler method for testing is within Google Search Console.

Testing Robots.txt

In Google Search Console, you can check your current robots.txt file to see what, if any, pages are currently listed as pages you do not want Google to access. This isn’t currently available within the navigation in Google Search Console but is available as a legacy tool.

On this page, you will see your website’s current robots.txt file. Below the robots.txt file, you can enter URLs from your website and test to see if Google would be prevented from crawling this page due to the robots.txt file. In this example, the wp-admin directory is blocked from crawling but robots are allowed to crawl all other URLs.

Google Search Console – robots.txt Tester

Testing Crawlability and Indexability

The other method of testing whether robots can crawl or index a page is the URL Inspection tool in Google Search Console. In Google Search Console, enter a URL in the bar at the top of the screen.

URL tester in GSC

After the results load, within the coverage report, you can see if crawling and indexing are allowed. In this example, both are allowed, which is the intended response. If, however, I had specified a noindex or disallow for this page, the crawl allowed or indexing allowed answers would be a “no”.

Get Help

If you have questions about robot commands, contact me to talk through questions. For more information on noindex, nofollow, disallow, and other technical SEO subjects, please check out my book, Tech SEO Guide.
