Google Search Console: Page Indexing Report
Last Updated: October 04, 2022
After Google crawls a page on your website, Google’s robots need to evaluate that page and decide if that page should appear in search results. This process of evaluation is called indexing. During the indexing process, Google decides how to categorize every page and every file found on a website. This categorization is then used to decide which pages to rank, or not rank, in search results.
Plenty of problems can occur within the indexing process that can negatively affect your website’s SEO performance. The best tool available to understand how your website is being indexed by Google is Google Search Console’s Page Indexing report. Let’s review how to access and use this report.
- Accessing the Report
- Why Pages Aren’t Indexed
- Reviewing Indexed Pages
- Improve Page Appearance
- Validate Fix
- Final Thoughts
Accessing the Report: Not Indexed or Indexed
To access this report, open your website in Google Search Console and then, in the sidebar, click on Pages. The Pages link is located about a third of the way down the sidebar under “Index”. All the pages Google has found on the website will be listed on the Pages report. Pages are grouped into two broad categories: Not Indexed and Indexed.
Important Disclaimer: Every website has pages that are categorized as Not Indexed. Having Not Indexed pages isn’t in and of itself a problem. Rather, you need to understand why the pages aren’t indexed to determine if there is a problem or not.
In the example shown below, Google has found 262 pages they have decided to not index and 119 pages they have decided to index. The graph shows how these numbers have trended over the last three months—in this example, the website’s page counts have remained steady.
Even though we’ve only looked at the high-level numbers, there are a few questions worth asking right away that can suggest whether issues exist on the website.
One of the first questions to ask when reviewing the high-level numbers is whether the count of indexed pages matches the number of indexable pages present on the website. The numbers will never match exactly because Google will not crawl every page on your website, but you want to make sure the numbers are close. Using the pictured website as an example, Google has indexed a total of 119 pages, while the website has 106 pages in total. Once you include the handful of images or non-HTML files Google is likely indexing from this website, the two numbers are close enough and likely indicate Google is correctly indexing pages on the website. In other cases, the number of indexed pages is a great deal lower or higher than the total number of pages on the website. Either way, the indexed pages can be reviewed in more detail in the indexed pages report, which is discussed below.
Another question raised by these high-level numbers is whether the ratio of indexed to not indexed pages indicates any issues. In the example screenshot above, the indexed:not-indexed ratio is 119:262. That means 31% of the pages Google has found for this website (119 of 381) are indexed, while 69% will not be included in the index. This is above average. Most websites I’ve seen in this report have only about 17.5% of their pages indexed, though anything from 5-25% is typical. However, some websites have as few as 1% of pages indexed. A lower percentage of indexed pages doesn’t necessarily mean there is a problem, but it could indicate the website has lots of errors, duplicate content, thin content, or similar issues.
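Both checks come down to simple arithmetic. As a minimal sketch, assuming you have copied the Indexed and Not Indexed totals from the report and counted the site's indexable pages yourself (for example, from a crawl or your XML sitemap), the following Python computes the share of indexed pages and the gap between indexed pages and your own page count. The function name `indexing_summary` is hypothetical.

```python
def indexing_summary(indexed: int, not_indexed: int, site_pages: int) -> dict:
    """Summarize the Page Indexing report's two headline totals.

    indexed / not_indexed: the totals shown on the Pages report.
    site_pages: your own count of indexable pages (an assumption you supply).
    """
    total = indexed + not_indexed
    return {
        "total_found": total,
        "indexed_pct": round(100 * indexed / total, 1),
        "not_indexed_pct": round(100 * not_indexed / total, 1),
        # Positive: Google indexed more URLs than you counted (images, PDFs,
        # parameter URLs, etc.). Negative: some of your pages are missing.
        "indexed_minus_site_pages": indexed - site_pages,
    }


summary = indexing_summary(indexed=119, not_indexed=262, site_pages=106)
print(summary)
```

With the example website's numbers, this reports 381 pages found, 31.2% indexed, 68.8% not indexed, and 13 more indexed URLs than HTML pages on the site.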
Why Pages Aren’t Indexed
Scrolling down below the graph, Google lists the various reasons pages have not been included in the index in the “Why pages aren’t indexed” table. This is the most important part of this report and can provide deep insight into why a website may not be ranking or getting traffic from organic search results.
In the screenshot below, the website’s pages have not been included in the index for ten different reasons. The most common reason this website’s pages are not indexed is that the pages redirect. Clicking on each of these reasons will list the pages not indexed for that reason.
Not Indexed Reasons
There are sixteen distinct reasons Google will list in this report for why pages are not indexed. Google provides a full list of those reasons with details about each in their support documentation. It is important to remember that some of the reasons pages are not indexed are not an issue—those errors simply represent a valid state for certain pages on the website and those pages are correctly not indexed. Other reasons, though, will negatively affect a website’s SEO performance and do indicate problems are present on the website. More details about the different reasons and related problems are discussed below.
It is also important to remember that the reasons your website’s pages are not indexed will be specific to your website. The table below recaps the reasons pages were not indexed for about fifty websites. You can see that only some of the reasons were present on all websites. Also, some of these reasons are far more common than others: 26.64% of pages were not indexed because they were a page with a redirect, but only 0.00001% of pages weren’t indexed because they were blocked due to an authorization error.
| Reason | Percent of Total Issues | Websites With Issue |
|---|---|---|
| Page with redirect | 26.64% | 100% |
| Alternate page with proper canonical tag | 22.59% | 98% |
| Excluded by noindex tag | 17.26% | 93% |
| Crawled – currently not indexed | 16.39% | 100% |
| Duplicate, Google chose different canonical than user | 7.45% | 84% |
| Blocked by robots.txt | 4.70% | 73% |
| Discovered – currently not indexed | 3.13% | 93% |
| Not found (404) | 1.18% | 100% |
| Duplicate without user-selected canonical | 0.57% | 89% |
| Server error (5xx) | 0.002% | 76% |
| Blocked due to other 4xx issue | 0.002% | 58% |
| Blocked due to access forbidden (403) | 0.001% | 69% |
| Blocked due to unauthorized request (401) | 0.00001% | 20% |
Not Indexed Because of Redirects
“Page with redirect” is one of the most common reasons pages are not indexed. This is typically a non-issue and represents a valid state for pages on a website. For example, pages might redirect elsewhere because the website’s structure has changed and redirects have been added. Seeing the pages listed in the “Page with redirect” report indicates Google has detected that redirect. However, it is a good idea to check which pages are listed in this category to confirm those pages should redirect. If the page shouldn’t redirect, then the redirect needs to be removed—but if the page should redirect, then no action needs to be taken.
Pages not indexed due to “Redirect error” do represent problems that need to be corrected. The most common redirect-related problems causing Google to not index a page are redirect loops or excessively long redirect chains. These types of redirect problems can be trickier to resolve. However, this problem typically won’t cause too much disruption to the website’s SEO performance unless many pages are affected throughout the website.
Learn more about redirect problems and how to address those issues.
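To confirm which listed pages redirect, and whether a “Redirect error” comes from a loop or a long chain, you can trace each redirect one hop at a time. The sketch below is one way to do that with Python's standard library. `head_location` and `trace_redirects` are hypothetical helper names, sending a HEAD request is an assumption (some servers answer HEAD differently than GET), and the 10-hop limit is an illustrative default rather than a statement about how Googlebot behaves.

```python
from __future__ import annotations

import http.client
from typing import Callable, Optional
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def head_location(url: str) -> tuple[int, Optional[str]]:
    """Send one HEAD request WITHOUT following redirects; return (status, Location)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path, headers={"User-Agent": "redirect-check"})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(
    url: str,
    fetch: Callable[[str], tuple[int, Optional[str]]] = head_location,
    max_hops: int = 10,
) -> dict:
    """Follow redirects hop by hop, flagging loops and overly long chains."""
    chain = [url]
    while True:
        status, location = fetch(chain[-1])
        if status not in REDIRECT_CODES or not location:
            # A non-redirect response ends the chain.
            return {"chain": chain, "final_status": status,
                    "loop": False, "too_long": False}
        next_url = urljoin(chain[-1], location)  # Location may be relative
        loop = next_url in chain
        chain.append(next_url)
        if loop:
            return {"chain": chain, "final_status": status,
                    "loop": True, "too_long": False}
        if len(chain) - 1 > max_hops:
            return {"chain": chain, "final_status": status,
                    "loop": False, "too_long": True}
```

For example, `trace_redirects("https://www.example.com/old-page")` returns the URLs visited, the last status code, and whether a loop or an overly long chain was found. Passing a different `fetch` function lets you test the logic without network access.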
Excluded by Noindex or Blocked by robots.txt
Pages not indexed for these reasons usually do not indicate problems that need to be addressed. For example, “Excluded by noindex tag” usually indicates pages that have been intentionally set to a noindex status to purposefully exclude the page from search results. Seeing the page not indexed for this reason is Google’s acknowledgment that they have seen the noindex and are respecting that tag. In other cases, the noindex tag might have been accidentally set on pages during a website update. The same is true of “Blocked by robots.txt”, where a disallow has most often been intentionally set but can sometimes be accidentally released to a production environment.
So long as the noindex and disallow statements are intentionally set, no action needs to be taken. Any accidental usage of noindex or disallow can cause massive problems for the website’s SEO performance, including having valid pages removed from search results. Any problems with noindex or disallow need to be addressed immediately.
Learn more about noindex and robots.txt disallows and how to address related problems.
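To spot an accidental noindex or disallow before it causes damage, you can check a URL against the robots.txt rules and look for noindex in both the robots meta tag and the X-Robots-Tag header. This is a minimal sketch using Python's standard library, under a few assumptions: `is_allowed` and `has_noindex` are hypothetical helper names, `urllib.robotparser` uses simple prefix matching and does not implement Google's `*` and `$` wildcard rules, and the header check ignores user-agent-specific forms such as `googlebot: noindex`. Treat the results as a first pass and confirm with the URL Inspection tool.

```python
from __future__ import annotations

import urllib.robotparser
from html.parser import HTMLParser


def is_allowed(robots_lines: list[str], url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt lines allow user_agent to fetch url."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


class _RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            content = attributes.get("content", "")
            self.directives.extend(d.strip().lower() for d in content.split(","))


def has_noindex(html: str, x_robots_tag: str = "") -> bool:
    """True if noindex appears in a robots meta tag or in the X-Robots-Tag value."""
    meta = _RobotsMetaParser()
    meta.feed(html)
    header = [d.strip().lower() for d in x_robots_tag.split(",")]
    return "noindex" in meta.directives or "noindex" in header
```

Running both checks on each page you expect to rank makes an accidental disallow or noindex easy to catch after a website update.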
Not Indexed Because of 404s, Soft 404s and Server Errors
Pages that are not indexed because the URL returns a 404 may be valid because the page was intentionally removed from the website. In that case, no action needs to be taken. If the 404 isn’t valid, then action needs to be taken to fix the problem, for example, by redirecting the page listed as a 404 somewhere else. Related to this, a Soft 404 is a page that tells visitors it was not found but returns a success status code (such as 200) instead of a 404, meaning the error has not been properly configured and needs to be fixed. Learn more about fixing 404 errors and Soft 404s.
In many cases, pages not indexed because of the “Other 4xx issue” can be improperly configured 404 errors. Check the page’s response status code to determine what status code is actually returned by the page. Then review the page itself to see what status code should be returned given the nature of the page’s content. If the page’s content indicates a not found error, then the status code returned should be a 404 but if this is just a normal page on the website, then the page should return a status code of 200. In other cases, the “Other 4xx issue” can be a version of an authentication issue.
Along with 404 errors, pages can be kept out of the index due to server errors, which return a 5xx response code. This error could be due to a server outage. If it is, then you’ll already know about the server error well before seeing it listed in this report, given that this report lags several days behind. The more troublesome situation is when this error is unexpected because there was no known server outage. That could indicate Google was unable to load the website, and the more persistent and widespread the issue, the bigger the concern. You can use the URL Inspection tool to check if Google can successfully load the page. If Google cannot load the page successfully, action should be taken immediately to fix this problem.
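Checking what status code a URL actually returns is the first step for 404, Soft 404, “Other 4xx issue”, and server error entries. The Python sketch below assumes a plain GET request with `urllib.request`, which follows redirects, so the status reported is for the final URL. The helper names are hypothetical, and the soft 404 phrase list is an illustrative heuristic, not how Google detects Soft 404s. The retry helper reflects the point that a persistent 5xx is more worrying than a one-off.

```python
from __future__ import annotations

import time
import urllib.error
import urllib.request
from typing import Callable

# Illustrative phrases only; adjust to the wording your own error page uses.
NOT_FOUND_PHRASES = ("page not found", "no longer available", "nothing was found")


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url (urllib follows redirects)."""
    request = urllib.request.Request(url, headers={"User-Agent": "status-check"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx responses arrive as HTTPError


def classify(status: int, body_text: str = "") -> str:
    """Map a status code (and optionally page text) to a broad not-indexed reason."""
    if 500 <= status <= 599:
        return "server error (5xx)"
    if status == 404:
        return "not found (404)"
    if status in (401, 403):
        return f"authorization ({status})"
    if 400 <= status <= 499:
        return "other 4xx"
    if status == 200 and any(p in body_text.lower() for p in NOT_FOUND_PHRASES):
        return "possible soft 404"
    return "ok"


def persistent_server_error(url: str, attempts: int = 3, delay: float = 5.0,
                            fetch: Callable[[str], int] = fetch_status) -> bool:
    """True only if every attempt returns a 5xx, suggesting more than a blip."""
    for attempt in range(attempts):
        if not 500 <= fetch(url) <= 599:
            return False
        if attempt < attempts - 1:
            time.sleep(delay)
    return True
```

For a page listed under “Other 4xx issue”, compare `classify(fetch_status(url))` with what the page's content suggests it should return: a 404 for a not-found page, or a 200 for a normal page.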
Not Indexed Because of Duplication
Three of the reasons for a page not being indexed relate to duplicate content: Alternate page with proper canonical tag; Duplicate, Google chose different canonical than user; Duplicate without user-selected canonical. While each is distinct, all three indicate that duplicate content is likely present on the website. Learn more about working with canonical tags and resolving duplication.
Pages not indexed with the reason “Alternate page with proper canonical tag” typically do not require any action. These pages represent alternative URL patterns Google found but has chosen to ignore, instead respecting the canonical tag as stated on the website. For example, one of the pages not indexed on my website due to this reason is https://www.matthewedgar.net/breadcrumbs-and-breadcrumb-schema/?r=q. This page contains a canonical to https://www.matthewedgar.net/breadcrumbs-and-breadcrumb-schema/ (without the parameter). Because this is a tracking parameter and because that parameter does not alter the page’s content, Google is handling this correctly. As you review pages not indexed because of this reason, you may occasionally find URLs with incorrect canonical tags stated on the website, which would need to be fixed to avoid any negative impacts on the website’s SEO performance.
The bigger duplicate issues are present for pages not indexed due to the “Duplicate, Google chose different canonical than user” reason. With these pages, Google has actively chosen to ignore the canonical tag stated on the website. This can happen if canonical tags are not used correctly on the website to control duplication. For example, let’s say https://www.site.com/blue-widgets contains essentially the same content as https://www.site.com/blue-products-sold and that both pages contain a self-referencing canonical. With these example pages, Google would likely detect the duplication and select one page as the canonical version. If you have pages not indexed because of this reason, you need to step through each page to determine if Google has selected the correct canonical URL. If not, you need to update the canonical tags on your website—in this example, that would mean setting the canonical on https://www.site.com/blue-widgets to https://www.site.com/blue-products-sold (or vice versa).
A problem is almost always indicated with the final duplication category: Duplicate without user-selected canonical. In these cases, Google has no guidance from the website whatsoever about which page to index and is forced to make a decision on behalf of the website. Even if Google is making the correct choice about which page to index, canonical tags should be added to the website declaring a canonical version for each set of duplicated pages.
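When working through the three duplication reasons, it helps to check what canonical each affected URL actually declares. This Python sketch, using only the standard library, parses `<link rel="canonical">` tags from a page's HTML and reports whether the canonical is missing, conflicting, self-referencing, or pointing elsewhere. The helper names are hypothetical, and it only reads the HTML you pass in; it does not see canonicals sent in an HTTP `Link` header or added by JavaScript.

```python
from __future__ import annotations

from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class _CanonicalParser(HTMLParser):
    """Collect href values from <link rel="canonical"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = {name: (value or "") for name, value in attrs}
        rels = attributes.get("rel", "").lower().split()
        if "canonical" in rels and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def canonical_status(page_url: str, html: str) -> tuple[str, Optional[str]]:
    """Classify the canonical declared in html for the page at page_url."""
    parser = _CanonicalParser()
    parser.feed(html)
    if not parser.canonicals:
        return "missing canonical", None
    if len(set(parser.canonicals)) > 1:
        return "conflicting canonicals", None
    target = urljoin(page_url, parser.canonicals[0])  # resolve relative hrefs
    if target == page_url:
        return "self-referencing", target
    return "points elsewhere", target
```

Running it on both https://www.site.com/blue-widgets and https://www.site.com/blue-products-sold from the earlier example would report two self-referencing canonicals on near-identical pages, which is the situation where Google may choose its own canonical.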
Discovered or Crawled but Not Indexed
Two of the most confusing categories to find in the Not Indexed report are “Crawled – currently not indexed” and “Discovered – currently not indexed”. For any pages listed in these categories, it is important to double-check the page is truly not in the index. The simplest way to do this is to search for the page’s URL on Google using a site: query. For example, search the text in between the quotes on Google to see if this page is indexed: “site:https://www.matthewedgar.net/google-search-console-page-indexing”.
Pages that are discovered but not indexed have been detected by Google, but Google has so far decided to take no further action with them. Google may have seen a link referencing the page but, for some reason, chose not to crawl it.
There can be false positives for pages listed under this reason, so the first step should be to review the website’s log files to confirm the page has not been crawled. If the page is listed in the log file and has been crawled by Googlebot, then it could be a matter of waiting a few days for the page to update in the Page Indexing report. This often happens with new pages added to a website.
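A quick way to check log files for Googlebot requests is to filter access-log lines by path and user agent. This sketch assumes your server writes the common Apache/Nginx “combined” log format; the regular expression and the `googlebot_hits` name are assumptions to adapt to your own logs. User-agent strings can be spoofed, so Google recommends verifying important hits with a reverse DNS lookup, which this sketch does not do.

```python
from __future__ import annotations

import re
from typing import Iterable

# Combined log format: ip ident user [time] "METHOD path PROTOCOL" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines: Iterable[str], path: str) -> list[tuple[str, int]]:
    """Return (timestamp, status) for each request to path from a Googlebot user agent."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines in other formats
        requested_path = match.group("path").split("?", 1)[0]  # ignore query strings
        if requested_path == path and "googlebot" in match.group("agent").lower():
            hits.append((match.group("time"), int(match.group("status"))))
    return hits
```

If this returns no hits for a page in the “Discovered” category, the page likely has not been crawled yet; if it returns recent hits, it may just be a matter of waiting for the report to update.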
If the page has not been crawled and continues to not be crawled, then this could indicate there are problems with how robots crawl the page. As a next step, test if the page can be crawled, either by using the URL Inspection tool in Google Search Console or with a headless browser. If any crawl-related issues are found, addressing those should resolve this issue.
If no crawl issues are present, and the page remains in the “Discovered” category for several days or even weeks, it can make sense to make bigger changes to the page. For example, add more internal links referencing the page, add external links to the page, update the content on the page to encourage crawls, or, more drastically, change the page’s URL.
It is less obvious why pages are crawled but not indexed. In most cases, these pages are not indexed due to a quality issue, such as thin content or a low volume of internal links. In other cases, we’ve seen pages end up in this category due to a rendering issue. With a rendering issue present, Google won’t see most or all of the content on the page and, therefore, will see no reason to index the page.
Not Indexed Because of 403 and 401
A 403 status indicates the server understood the request but is forbidding access to a given file, typically because the requester does not have permission to access it. A 401 status is similar but indicates the server is refusing access because the requesting user, Googlebot in this case, did not provide valid credentials. Note that pages not indexed because of “Other 4xx issue” can be improperly configured 401 or 403 pages.
A page returning a 401 or 403 does not typically represent a problem unless the pages returning these status codes happen to be valid pages on the website. Instead, these are often pages that require a login or other form of authentication—such as an admin page on the website or a link to a staging website.
While there is rarely something to fix, seeing pages not indexed for these reasons does raise the question of how Google found these pages. Any page that requires authentication should likely not be linked to from anywhere on your website or from external websites. As a result, it can be useful to go through these pages to see if any links exist exposing them to Google and, if such links do exist, determine whether those links should be removed.
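To see whether any of your own pages link to URLs listed under the 401 or 403 reasons, you can parse each page's links and compare them with the URLs exported from those reports. In this Python sketch, `links_to_protected` is a hypothetical helper and `protected_urls` is assumed to be the set of URLs you exported from Search Console. Matching is exact after resolving relative links and dropping `#fragments`, so trailing-slash or letter-case differences will not match.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class _LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_protected(page_url: str, html: str, protected_urls: set[str]) -> list[str]:
    """Return the protected URLs (401/403) that the page at page_url links to."""
    parser = _LinkParser()
    parser.feed(html)
    found = set()
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if absolute in protected_urls:
            found.add(absolute)
    return sorted(found)
```

Any URL this returns is a link that exposes an authentication-protected page to Google and is a candidate for removal.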
Reviewing Indexed Pages
Along with reviewing pages that are not indexed, the Page Indexing report also lists the pages that are indexed. Immediately under the graph on the main Page Indexing report, there is a button that says “View data about indexed pages.” Click this button to see which pages are indexed.
On the detail page, you can see an example of which pages have been indexed and the last date each page was crawled. To confirm there are no issues present, review the pages listed in this example data and check if any pages are indexed that shouldn’t be. For example, there might be pages listed as indexed even though those pages previously contained a noindex tag—seeing the pages listed here might indicate that the noindex tag was removed accidentally during a recent website update.
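One way to check the indexed list for pages that shouldn't be there is to export the example data and compare it with the URLs you intend to keep out of search results (for example, pages you have set to noindex). The Python sketch below assumes the export has been saved as CSV with the URLs in a column named `URL`; that column name and the `indexed_but_excluded` helper are assumptions to adjust to your actual file.

```python
from __future__ import annotations

import csv
import io
from typing import Iterable


def indexed_but_excluded(export_csv_text: str, excluded_urls: Iterable[str],
                         url_column: str = "URL") -> list[str]:
    """Return URLs that appear in the indexed export but should not be indexed."""
    reader = csv.DictReader(io.StringIO(export_csv_text))
    indexed = {row[url_column].strip() for row in reader if row.get(url_column)}
    excluded = {url.strip() for url in excluded_urls}
    return sorted(indexed & excluded)
```

Any URL this returns is worth opening to confirm whether its noindex tag was removed, for example during a recent website update.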
Improve Page Appearance
Below the “Why pages aren’t indexed” table on the Page Indexing report, you may also see a table labeled “Improve page appearance”. This table will only show if Google has detected non-critical issues that could affect indexing on the website. If you do not see this table for your website, then none of these issues have been detected by Google.
There are only two warnings that Google will currently show in this report:
Indexed, though blocked by robots.txt. Pages with this categorization may rank in search results but have no information about the page included. If you want these pages to rank in search results, then it makes sense to remove the block on the robots.txt file so that robots can crawl and rank the page.
Alternatively, you may review the pages blocked and decide the pages should not rank at all in search results. In that case, the follow-up action is to remove the disallow from the robots.txt and noindex the page instead. Removing the disallow will allow robots to crawl the page so they can detect the noindex tag. Once robots have detected the noindex tag, robots will move the page to a not indexed status and the page will no longer be indexed or ranked in search results.
Learn more about disallow and noindex statements.
Validate Fix
As you make changes to fix the issues listed in the “Why pages aren’t indexed” or “Improve page appearance” tables, you can validate the fix to alert Google that a change has been made. Google’s robots will then reevaluate the affected pages and update the Page Indexing report accordingly.
To validate the fix, begin by clicking on one of the items presented on the table. At the top of the page, you will see a button to “Validate Fix”. Here is an example of this on a “Soft 404” detail page.
It is very important to only click the “Validate Fix” button once the issue has been fixed. Once you click the button, the page will update indicating the validation has started. You will receive an email notice that this validation has begun.
Once the validation process is complete, you will receive an additional email notice indicating if the validation was successful. If the validation was not successful, then additional work is needed to fix the problem. A failure could also mean new issues were detected.
Final Thoughts
It is important to regularly review the Page Indexing report in Google Search Console to see which pages are not indexed and understand why those pages are not indexed. For more active websites, this report should be reviewed weekly. For less active websites, this report should be reviewed monthly. Given this report only shows the last three months of activity, though, every website should review this report at least quarterly. If you need help reviewing this report or knowing what actions to take given what the report says, please contact me.