Google Search Console: Page Indexing Report
By Matthew Edgar · Last Updated: August 30, 2023
After Google crawls a page on your website, Google’s robots need to evaluate that page and decide if that page should appear in search results. This process of evaluation is called indexing. During the indexing process, Google decides how to categorize every page and every file found on a website. This categorization is then used to decide which pages to rank, or not rank, in search results.
Plenty of problems can occur within the indexing process that can negatively affect your website’s SEO performance. The best tool available to understand how your website is being indexed by Google is Google Search Console’s Page Indexing report. Let’s review how to access and use this report.
- Accessing the Report
- Why Pages Aren’t Indexed
- Not Indexed Reasons: What Each Category Means and How to Fix
- Discovered – Not Indexed and Crawled – Not Indexed
- Redirects: “Page with Redirect” and “Redirect Error”
- Robot controls: Excluded by noindex tag or Blocked by robots.txt
- Errors: Not Found (404), Soft 404, Other 4xx Issue and Server Error (5xx)
- Duplicate: Google chose different canonical than user, Duplicate without user-selected canonical, Alternate page with proper canonical tag
- Authorization: Blocked due to access forbidden (403) and Blocked due to unauthorized request (401)
- Reviewing Indexed Pages
- Improve Page Appearance
- Validate Fix
- Final Thoughts
Accessing the Report: Not Indexed or Indexed
To access this report, open your website in Google Search Console and then, in the sidebar, click on Pages. The Pages link is located about a third of the way down the sidebar under “Indexing”. All the pages Google has found on the website will be listed on the Pages report. Pages are grouped into two broad categories: Not Indexed and Indexed.
In this example, Google has found 262 pages they have decided to not index and 119 pages they have decided to index.
Important Disclaimer: Every website has pages that are categorized as Not Indexed. Having Not Indexed pages isn’t in and of itself a problem. Rather, you need to understand why the pages aren’t indexed to determine if there is a problem or not.
Even with the high-level numbers presented on the first page of the Page Indexing report, there are a few immediate questions to ask that can begin suggesting that issues might exist on the website.
- Question #1: Does the count of indexed pages match the number of indexable pages present on the website? The numbers will never match exactly because Google will not crawl every page on your website. However, you want to make sure the numbers are close. Using the website pictured above as an example, Google has indexed a total of 119 pages. This website has 106 pages in total. Once you include the handful of images or non-HTML files Google could likely be indexing from this website, this number is close enough and likely indicates Google is correctly indexing pages on the website. In other cases, the number of indexed pages is a great deal lower or higher than the total number of pages on the website. Either way, this can be reviewed in more detail in the indexed pages report, which is discussed below.
- Question #2: Is the ratio of indexed to not indexed pages concerning? In the example screenshot above, we have an indexed:not-indexed ratio of 119:262. That means 31% of the pages Google has found for this website are indexed while 69% of the pages on this website will not be included in the index. This is above average. On average, most websites I’ve seen in this report only have about 17.5% of their pages indexed, though anything from 5-25% is typical. However, some websites have as few as 1% of pages indexed. A lower percentage of indexed pages doesn’t necessarily mean there is a problem, but it could indicate the website has lots of errors, duplicate content, thin content, or other similar issues present.
- Question #3: How have indexed and not indexed pages trended over time? The graph shows how indexed and not indexed pages have trended over the last three months. In the example shown, the website’s page counts have remained steady for both indexed and not indexed pages. However, after major changes are made to a website, you can often see these categories change. For example, a website redevelopment that redirects a lot of URLs would result in not indexed pages increasing. For websites that actively add new content, the count of indexed pages should increase over time.
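The ratio in Question #2 is simple arithmetic, and it can be handy to compute it the same way for every website you review. Here is a minimal Python sketch using the counts from the example above; the function name `indexed_share` is just an illustrative choice.

```python
def indexed_share(indexed: int, not_indexed: int) -> float:
    """Return the percentage of pages Google has found that are indexed."""
    total = indexed + not_indexed
    if total == 0:
        raise ValueError("no pages reported")
    return 100 * indexed / total


# Counts taken from the example website discussed above.
share = indexed_share(119, 262)
print(f"Indexed: {share:.0f}%, not indexed: {100 - share:.0f}%")
# prints: Indexed: 31%, not indexed: 69%
```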
Why Pages Aren’t Indexed
Scrolling down below the graph, Google lists the various reasons pages have not been included in the index in the “Why pages aren’t indexed” table. This is the most important part of this report and can provide deep insight into why a website may not be ranking or getting traffic from organic search results.
In the screenshot below, the website’s pages have not been included in the index for ten different reasons. The most common reason this website’s pages are not indexed is because of pages that redirect. Clicking on each of these reasons will list the pages not indexed for this reason.
Not Indexed Reasons
There are sixteen distinct reasons Google will list in this report for why pages are not indexed. Google provides a full list of those reasons with details about each in their support documentation. It is important to remember that some of the reasons pages are not indexed are not an issue. Those reasons simply represent a valid state for certain pages on the website and those pages are correctly not indexed. Other reasons, though, do indicate problems are present on the website and those problems might negatively affect a website’s SEO performance. More details about the different reasons and related problems are discussed below.
It is also important to remember the reasons your website’s pages are not indexed will be specific to your website. The table below recaps the reasons pages were not indexed for about fifty websites. You can see that only some of the reasons were present on all websites. As well, some of these reasons are far more common than others—26.64% of pages were not indexed because they were a page with a redirect but only .00001% of pages weren’t indexed because they were blocked due to an authorization error.
|Reason|Percent of Total Issues|Websites With Issue|
|Page with redirect
|Alternate page with proper canonical tag
|Excluded by noindex tag
|Crawled – currently not indexed
|Duplicate, Google chose different canonical than user
|Blocked by robots.txt
|Discovered – currently not indexed
|Not found (404)
|Duplicate without user-selected canonical
|Server error (5xx)
|Blocked due to other 4xx issue
|Blocked due to access forbidden (403)
|Blocked due to unauthorized request (401)
|Blocked by page removal tool
Not Indexed Reasons: What Each Category Means and How to Fix
Discovered – Not Indexed and Crawled – Not Indexed
Two of the most confusing categories to find in the Not Indexed report are “Crawled – currently not indexed” and “Discovered – currently not indexed”. For any pages listed in these categories, it is important to double-check that the page is truly not in the index. The simplest way to do this is to search for the page’s URL on Google using a site: query. For example, search the text in between the quotes on Google to see if this page is indexed: “site:https://www.matthewedgar.net/google-search-console-page-indexing”.
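If you have many URLs to spot-check, the site: query can be built programmatically. The sketch below only constructs the search URL to open in a browser; it does not query Google. The `https://www.google.com/search?q=` prefix and the function name are assumptions made for illustration.

```python
from urllib.parse import quote_plus


def site_query_url(page_url: str) -> str:
    """Build a Google search URL for a site: query on a single page."""
    # Assumed search URL pattern; the query itself is "site:" plus the page URL.
    return "https://www.google.com/search?q=" + quote_plus(f"site:{page_url}")


print(site_query_url("https://www.matthewedgar.net/google-search-console-page-indexing"))
```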
“Discovered – currently not indexed” pages have been detected by Google but Google has not crawled or indexed that page. Google may have seen a link referencing the page but for some reason did not choose to crawl the page.
There can be false positives for pages listed under this reason, so the first step should be to review the website’s log files to confirm the page has not been crawled. If the page is listed in the log file and has been crawled by Googlebot, then it could be a matter of waiting a few days for the page to update in the Page Indexing report. This often happens with new pages added to a website.
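As a rough illustration of the log file check, here is a Python sketch that looks for Googlebot requests to a given path. It assumes the server writes logs in the common "combined" format, and the sample lines and the `/new-page/` path are made up. Because a user agent string can be faked, treat a match as a hint rather than proof that the request came from Google.

```python
import re

# Pulls the request path, status code, and user agent out of a
# combined-format access log line (an assumed log format).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(log_lines, path):
    """Return the status codes of requests to `path` whose user agent mentions Googlebot."""
    hits = []
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and match["path"] == path and "Googlebot" in match["agent"]:
            hits.append(int(match["status"]))
    return hits


# Hypothetical log lines for illustration only.
sample = [
    '66.249.66.1 - - [30/Aug/2023:10:00:00 +0000] "GET /new-page/ HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [30/Aug/2023:10:01:00 +0000] "GET /new-page/ HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows NT 10.0)"',
]
print(googlebot_hits(sample, "/new-page/"))
```

An empty list for a page means no matching Googlebot request was found in the lines you supplied, which is consistent with the page not having been crawled yet.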
If the page has not been crawled and continues to not be crawled, then this could indicate there are problems with how robots crawl the page. As a next step, test if the page can be crawled either by using the URL Inspection tool in Google Search Console or a headless browser. If any crawl-related issues are found, addressing those should resolve this issue.
If no crawl issues are present, and the page remains in the “Discovered” category for several days or even weeks, it can make sense to make bigger changes to the page. For example, add more internal links referencing the page, add external links to the page, update the content on the page to encourage crawls, or, more drastically, change the page’s URL.
“Crawled – currently not indexed” means that Google has found the URL and crawled the URL but has not moved the page into the index. This is one of the hardest categories to diagnose or fix. In most cases, these pages are not indexed due to a quality issue, such as thin content or a low volume of internal links. In other cases, we’ve seen pages end up categorized for this reason due to a rendering issue. With a rendering issue present, Google won’t see all or most of the content on the page, and, therefore, Google will see no reason to index the page.
Redirects: “Page with Redirect” and “Redirect Error”
“Page with redirect” is one of the most common reasons pages are not indexed. This is typically a non-issue and represents a valid state for pages on a website. For example, pages might redirect elsewhere because URLs have changed and redirects have been added. Seeing the pages listed in the “Page with redirect” report indicates Google has detected that redirect and processed it correctly. However, you should check which pages are listed in this category to confirm those pages should redirect. If the page shouldn’t redirect, then the redirect needs to be removed. If the page should redirect, then no action needs to be taken.
Pages not indexed due to “Redirect error” do represent problems that need to be corrected. The most common redirect-related problems causing Google to not index a page are redirect loops (where a URL redirects back to itself) or excessively long redirect chains (where a redirect hops through multiple URLs before reaching the endpoint). These types of redirect problems can be trickier to resolve. However, this problem typically won’t cause too much disruption to the website’s SEO performance unless the redirect errors are on critical pages or many pages are affected throughout the website.
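To make the two redirect problems concrete, here is a small Python sketch that walks a redirect map and reports a loop or an overly long chain. The map is a plain dictionary of hypothetical paths rather than live HTTP requests, and the `max_hops` limit of 5 is an arbitrary assumption, not a threshold Google publishes in this report.

```python
def trace_redirects(start, redirects, max_hops=5):
    """Follow `start` through a redirect map and classify the result.

    Returns a (status, path) tuple where status is "ok", "loop",
    or "chain too long" and path lists every URL visited.
    """
    seen = [start]
    current = start
    while current in redirects:
        current = redirects[current]
        if current in seen:
            # The chain has come back to a URL it already visited.
            return "loop", seen + [current]
        seen.append(current)
        if len(seen) - 1 > max_hops:
            return "chain too long", seen
    return "ok", seen


# Hypothetical redirect map for illustration only.
redirects = {"/old": "/new", "/a": "/b", "/b": "/a"}
print(trace_redirects("/old", redirects))
print(trace_redirects("/a", redirects))
```

In practice the map would come from a crawl of the website or from the server's redirect rules.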
Learn more about redirect problems and how to fix redirect errors.
Robot Controls: Excluded by Noindex or Blocked by robots.txt
“Excluded by noindex tag” usually indicates pages that have been intentionally set to a noindex status to purposefully exclude the page from search results. Seeing the page not indexed for this reason is Google’s acknowledgment that their robots have seen the noindex and are respecting that tag.
However, the noindex tag might have been accidentally set on pages. This can often happen after a website redesign or redevelopment project where the noindex directive was mistakenly moved from a staging environment to a live environment. Given this, it is important to monitor what URLs are marked “noindex” to confirm no mistakes have been made.
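One way to monitor for an accidental noindex is to scan a page's HTML and response headers for the directive. The Python sketch below is a simplified check under a few assumptions: it only looks at robots and googlebot meta tags plus an `X-Robots-Tag` header value passed in as a string, and it ignores less common forms such as the `none` directive or user-agent-prefixed header values.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in ("robots", "googlebot"):
            content = attr.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html, x_robots_tag=""):
    """Return True if the HTML or the X-Robots-Tag header value contains noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header = [d.strip().lower() for d in x_robots_tag.split(",")]
    return "noindex" in parser.directives or "noindex" in header


print(has_noindex('<head><meta name="robots" content="noindex, follow"></head>'))
```

Running a check like this against a list of important URLs after each release can catch a staging noindex before Google does.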
“Blocked by robots.txt” means a disallow statement in the website’s robots.txt file is preventing Google from crawling the page. Here again, this is often an acknowledgment that Google has seen an intentional directive and is respecting it. However, the robots.txt may have been accidentally updated or the staging website’s robots.txt file might have accidentally been released to a production environment. So, be sure to monitor this category and confirm URLs blocked by the robots.txt file are accurate.
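To confirm which URLs a robots.txt file blocks, you can test a list of URLs against its rules. This Python sketch uses the standard library's `urllib.robotparser`; the rules and URLs are hypothetical. Python's parser does not replicate every detail of how Google matches rules (wildcards, for example), so treat the result as a first pass rather than a definitive answer.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt content and URLs for illustration only.
robots_txt = """\
User-agent: *
Disallow: /admin/
"""
urls = [
    "https://www.site.com/admin/login",
    "https://www.site.com/blue-widgets",
]
print(blocked_urls(robots_txt, urls))
```

If a staging file containing `Disallow: /` were released to production, every URL in the list would come back as blocked.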
So long as the noindex and disallow statements are intentionally set, no action needs to be taken. Any accidental usage of noindex or disallow can cause massive problems for the website’s SEO performance, including having valid pages removed from search results. Any problems with noindex or disallow need to be addressed immediately.
Errors: Not Found (404), Soft 404, Other 4xx Issue and Server Error (5xx)
Pages that are not indexed because the URL returns a “Not Found (404)” may not be a problem. A 404 response code represents a valid state for a page when that page is removed from the website. If the not found error is valid, no action needs to be taken.
Sometimes, however, a page may have been accidentally removed from the website. If that is the case, the 404 isn’t valid and action needs to be taken to fix the problem—for example, restore the page or redirect the page listed as a 404 somewhere else. Learn more about fixing 404 errors.
Pages that are not indexed because of a “Soft 404” do indicate a problem and action needs to be taken. Soft 404s show an error message to visitors but do not signal the error to robots because the error has not been properly configured, typically because the page returns a 200 status code instead of a 404. Learn more about what soft 404s are and how to fix a soft 404.
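A soft 404 usually pairs a success status code with content that reads like an error page. The Python sketch below is a crude heuristic built on that idea. The phrase list is an assumption chosen for illustration, and this is not how Google detects soft 404s, so expect both false positives and misses.

```python
# Hypothetical phrases that often appear on error pages.
ERROR_PHRASES = ("page not found", "no longer available", "nothing was found")


def looks_like_soft_404(status_code, body_text):
    """Flag pages that return 200 but whose text reads like an error page."""
    if status_code != 200:
        # A real error status is already signaled to robots.
        return False
    text = body_text.lower()
    return any(phrase in text for phrase in ERROR_PHRASES)


print(looks_like_soft_404(200, "Sorry, Page Not Found"))
print(looks_like_soft_404(404, "Sorry, Page Not Found"))
```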
“Blocked due to other 4xx issue” often includes URLs that are improperly configured 404 errors. For every URL not indexed for this reason, start by checking the page’s response status code to determine what status code is returned by the page. Then review the page itself to see what status code should be returned given the nature of the page’s content. If the page’s content indicates a not found error, then the status code returned should be a 404 but if this is just a normal page on the website, then the page should return a status code of 200. In other cases, the “Other 4xx issue” can be a version of an authentication issue.
Server errors (5xx) are often not a valid state for the page. Often, this happens when there is a server outage. During an outage, Google would be unable to crawl the page successfully. Chances are you’ll already know about the server error well before seeing it listed in this report, given this report is several days behind in updating. The more troublesome situation is when this error is not expected because there was no known server outage. That could indicate Google was unable to load the website, and the more persistent and widespread the issue, the bigger the concern. You can use the URL Inspection tool to check if Google can successfully load the page or you can review log files to monitor Google’s crawl activity. If Google cannot load the page successfully, action should be taken immediately to fix this problem.
Duplicate: Google chose different canonical than user, Duplicate without user-selected canonical, Alternate page with proper canonical tag
Three of the reasons for a page not being indexed relate to duplicate content: Alternate page with proper canonical tag; Duplicate, Google chose different canonical than user; Duplicate without user-selected canonical. While each is distinct, these issues indicate problems with duplicate content likely being present on the website. Learn more about working with canonical tags and resolving duplication.
“Duplicate, Google chose different canonical than user” is the main category to focus on as it often represents a bigger duplicate content problem. With these pages, Google is ignoring the canonical tag stated on the website. This can happen if canonical tags are not used correctly on the website to control duplication. For example, let’s say https://www.site.com/blue-widgets contains essentially the same content as https://www.site.com/blue-products-sold and that both pages contain a self-referencing canonical. With these example pages, Google would likely detect the duplication and select one page as the canonical version. If you have pages not indexed because of this reason, you need to step through each page to determine if Google has correctly detected duplication on your website. If you disagree with Google, work to differentiate both pages to make it clearer why both pages should be indexed. Or, if the duplicate pages are not needed, you could delete one of the duplicate pages.
“Duplicate without user-selected canonical” almost always indicates a problem exists on the website. In these cases, Google has no guidance from the website whatsoever about which pages to index because no canonical tags are present on these pages. If Google detects duplication, Google’s robots will be forced to make a decision on behalf of the website about which duplicate version of the page to use. Even if Google is making the correct choice about which page to index, canonical tags should be added to the website declaring a canonical version for each set of duplicated pages. You want to do what you can to avoid Google making decisions about how to index or not index pages on your website.
“Alternate page with proper canonical tag” typically does not require any action. These pages represent alternative URL patterns Google found but chose to ignore, instead respecting the canonical tag as stated on the website. For example, one of the pages not indexed on my website due to this reason is https://www.matthewedgar.net/breadcrumbs-and-breadcrumb-schema/?r=q. This page contains a canonical to https://www.matthewedgar.net/breadcrumbs-and-breadcrumb-schema/ (without the parameter). Because this is a tracking parameter and because that parameter does not alter the page’s content, Google is handling this URL correctly. As you review pages not indexed because of this reason, you may occasionally find URLs with incorrect canonical tags stated on the website, which would need to be fixed to avoid any negative impacts on the website’s SEO performance.
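When stepping through pages in any of the three duplicate categories, it helps to see what canonical each page actually declares. Here is a minimal Python sketch that extracts the canonical from a page's HTML and compares it to the page's own URL. The comparison is an exact string match, which is a simplifying assumption; relative canonicals, trailing slashes, and canonicals sent in HTTP headers are not handled.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attr.get("href")


def canonical_status(page_url, html):
    """Classify a page's canonical as missing, self-referencing, or pointing elsewhere."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.canonical is None:
        return "missing"
    if parser.canonical == page_url:
        return "self-referencing"
    return "points elsewhere"


# HTML snippet written for illustration, using the URLs from the example above.
html = '<link rel="canonical" href="https://www.matthewedgar.net/breadcrumbs-and-breadcrumb-schema/">'
print(canonical_status("https://www.matthewedgar.net/breadcrumbs-and-breadcrumb-schema/?r=q", html))
```

A result of "missing" lines up with the “Duplicate without user-selected canonical” situation, while "points elsewhere" on a parameter URL is what you would expect for an alternate page with a proper canonical tag.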
Authorization: Blocked due to access forbidden (403) and Blocked due to unauthorized request (401)
“Blocked due to access forbidden (403)” and “Blocked due to unauthorized request (401)” indicate Google wasn’t allowed to crawl a particular URL. A 403 status code indicates the server understood the request but is refusing access to a given file, typically because the requester does not have permission to view it. A 401 status is similar but indicates the server is refusing access to a given file because the requesting user, Googlebot in this case, did not provide valid credentials. Note that pages not indexed because of “Other 4xx issues” can be improperly configured 401 or 403 pages.
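When you check response status codes yourself, it can help to translate each code into the report label you would expect to see. The Python sketch below is a rough mapping based only on the labels used in this report. How Google assigns a particular code to a category may differ (a 410, for instance, may not be handled the way this mapping suggests), so use it as a starting point for comparison.

```python
def expected_not_indexed_reason(status_code):
    """Map an HTTP status code to the report label it most plausibly falls under."""
    if status_code == 401:
        return "Blocked due to unauthorized request (401)"
    if status_code == 403:
        return "Blocked due to access forbidden (403)"
    if status_code == 404:
        return "Not found (404)"
    if 400 <= status_code < 500:
        return "Blocked due to other 4xx issue"
    if 500 <= status_code < 600:
        return "Server error (5xx)"
    # Other codes (200s, redirects) are not error-based reasons.
    return None


for code in (200, 401, 403, 404, 503):
    print(code, expected_not_indexed_reason(code))
```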
A page returning a 401 or 403 status code does not typically represent a problem unless the pages returning these status codes happen to be valid pages on the website. Instead, these are often pages that require a login or other form of authentication—such as an admin page on the website or a link to a staging website. In other words, you want Google to see the 401 or 403 status because you do not want Google crawling or indexing these pages.
While there is rarely something to fix, seeing pages not indexed for these reasons does lead to the question of how Google found these pages. Any page that requires authentication should likely not be linked to from anywhere on your website or linked to from external websites. As a result, it can be useful to go through these pages to see if any links exist exposing these pages to Google and, if any links do exist, determine if those links should be removed.
Reviewing Indexed Pages
Along with reviewing pages that are not indexed, the Page Indexing report also lists the pages that are indexed. Immediately under the graph on the main Page Indexing report, there is a button that says “View data about indexed pages.” Click this button to see which pages are indexed.
On the detail page, you can see a sample of the pages that have been indexed and the last date each page was crawled. To confirm there are no issues present, review the pages listed in this sample data and check if any pages are indexed that shouldn’t be. For example, there might be pages listed as indexed even though those pages previously contained a noindex tag. Seeing those pages listed here might indicate that the noindex tag was removed accidentally during a recent website update.
Improve Page Appearance
Below the “Why pages aren’t indexed” table on the Page Indexing report, you may also see a table labeled “Improve page appearance”. This table will only show if Google has detected non-critical issues that could affect indexing on the website. If you do not see this table for your website, then none of these issues have been detected by Google.
There are only two warnings that Google will currently show in this report:
Indexed, though blocked by robots.txt. Pages with this categorization may rank in search results but have no information about the page included. If you want these pages to rank in search results, then it makes sense to remove the block on the robots.txt file so that robots can crawl and rank the page.
Alternatively, you may review the pages blocked and decide the pages should not rank at all in search results. In that case, the follow-up action is to remove the disallow from the robots.txt and noindex the page instead. Removing the disallow will allow robots to crawl the page so they can detect the noindex tag. Once robots have detected the noindex tag, robots will move the page to a not indexed status and the page will no longer be indexed or ranked in search results.
Learn more about disallow and noindex statements.
Validate Fix
As you make changes to fix the issues listed in the “Why pages aren’t indexed” or “Improve page appearance” tables, you can validate the fix to alert Google that a change has been made. Google’s robots will then reevaluate the affected pages and update the Page Indexing report accordingly.
To validate the fix, begin by clicking on one of the items presented on the table. At the top of the page, you will see a button to “Validate Fix”. Here is an example of this on a “Soft 404” detail page.
It is very important to only click the “Validate Fix” button once the issue has been fixed. Once you click the button, the page will update indicating the validation has started. You will receive an email notice that this validation has begun.
Once the validation process is complete, you will receive an additional email notice indicating if the validation was successful. If the validation was not successful, then additional work is needed to fix the problem. A failure could also mean new issues were detected.
Final Thoughts
It is important to regularly review the Page Indexing report in Google Search Console to see which pages are not indexed and understand why those pages are not indexed. For more active websites, this report should be reviewed weekly. For less active websites, this report should be reviewed monthly. Given this report only shows the last three months of activity, though, every website should review this report at least quarterly. If you need help reviewing this report or knowing what actions to take given what the report says, please contact me.