Is Google Finding Every Page On Your Website?
Last Updated: December 02, 2022
How many pages are on your website? How many of those pages should rank in search results? Is Google finding all of the pages that should rank? If Google found the page, is Google ranking that page correctly?
While these questions are especially important for larger websites with hundreds, thousands, hundreds of thousands, or millions of pages, smaller websites need to know the answers too. If Google is not finding every page that should rank on a website, the website will struggle to perform in organic search.
Along with identifying opportunities to improve performance, answering these questions can help you identify other problems on the website, including thin content, duplication, and orphaned pages. You may also identify pages that Google is ranking that shouldn’t be.
Determining how many pages exist on the website, and how Google is currently handling those pages, requires reviewing three metrics:
- The number of pages on the website.
- The number of pages Google has crawled.
- The number of pages Google has indexed.
The first metric to collect is the total number of pages that exist on the website. Essentially, this is the number of pages that could be visited by humans or crawled by Googlebot: the website's total pool of potential pages.
For smaller websites that aren’t contained within a content management system, this number is usually quite simple to pull. You can FTP into your website or access the list of pages within your hosting control panel. Elementive’s website is a good example: in the screenshot below, you can see the full list of pages within our control panel. Elementive has a total of eight potential pages that could be visited or crawled (we’ll ignore the 404 and 500 error pages since we’d rather humans and bots didn’t see those).
Finding this number is more complicated on larger websites, especially if many pages are dynamically generated. For example, consider MatthewEdgar.net, which is managed through WordPress. Under Posts or Pages, you can see how many Published posts or pages exist. You can do the same for any other post or page types that are unique to your particular WordPress setup. Add up all the Published pages for each type, and you have the total number of pages for this website.
For even bigger websites, checking your website’s content management system isn’t always a feasible option. Perhaps the content is spread out across many different systems. Perhaps faceted navigation in categories can create thousands of different pages depending on the options selected. In these cases, the next best solution is to do a full and complete crawl of the entire website. A crawl tool, like Botify, JetOctopus, Screaming Frog, or similar will crawl the entire website and indicate how many total pages were found. A crawl tool may not report on every page contained on the website—perhaps some pages aren’t linked to anywhere or the links aren’t constructed in a way the crawl tools can access. However, using the crawl tool will give you a reasonable idea of the total number of pages that exist on your website that could be crawled by Google or could be accessed by visitors.
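At its core, what these crawl tools do can be sketched in a few lines: start from the homepage, extract the links on each page, follow every same-site URL not yet seen, and count as you go. Below is a toy Python sketch of that breadth-first approach. The `fetch` function and the three-page `example.com` site are hypothetical stand-ins so the counting logic can run without network access; real crawl tools handle far more (redirects, robots.txt, JavaScript rendering, and so on).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def count_pages(start_url, fetch):
    """Breadth-first crawl from start_url; fetch(url) returns page HTML.
    Only URLs on the same host are followed. Returns the set of pages found."""
    host = urlparse(start_url).netloc
    seen, queue = {start_url}, deque([start_url])
    while queue:
        url = queue.popleft()
        parser = LinkExtractor()
        parser.feed(fetch(url))
        for href in parser.links:
            absolute = urljoin(url, href).split("#")[0]
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return seen

# Hypothetical three-page site, represented in memory for illustration.
site = {
    "https://example.com/": '<a href="/about">About</a> <a href="/contact">Contact</a>',
    "https://example.com/about": '<a href="/">Home</a>',
    "https://example.com/contact": '<a href="https://other.com/">External</a>',
}
pages = count_pages("https://example.com/", site.get)
print(len(pages))  # 3
```

Note how the external link to `other.com` is ignored: only same-host URLs count toward the page total, which mirrors how crawl tools scope their reports to your website.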
Next, we need to know how many pages Google has crawled on the website. The best way to do this is to analyze the website’s log file. The log file shows how many pages Google (or other bots) has crawled during a given time range. For example, on a specific day, Google crawled around 2,000 pages on the example website shown in the screenshot below. You could also look at the specific list of pages crawled to see how many unique files Google crawled over a wider time range.
The log file is the most accurate view of Google’s crawling activity because it is data fully within your control and this data is not sampled. Along with log files (or instead of the log file if you cannot access the website’s log file), you can get an idea of how many pages Google crawled by reviewing the Crawl Stats Report in Google Search Console.
The big thing to note is that the Crawl Stats report shows how many pages Google crawled on any particular day, not the total number of unique pages crawled, which you can obtain from a log file. Still, the more pages that exist on a website, the more crawl activity you would expect: a website with around 3,000 pages should see considerably more Googlebot activity than a website with only a few hundred pages.
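If you can access the raw log file, counting the unique pages Googlebot requested is straightforward to script. The sketch below assumes the common combined log format and matches on the Googlebot user-agent string; the sample log lines are invented for illustration. (In production you would also want to verify requests really came from Google, for example via reverse DNS lookup, since user agents can be spoofed.)

```python
import re

# Combined Log Format: host ident user [time] "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_PATTERN = re.compile(r'"(?:GET|HEAD|POST) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')

def googlebot_pages(log_lines):
    """Return the set of unique paths requested by Googlebot.
    Matching on the user-agent string alone is approximate: production
    tooling should also verify the requesting IP belongs to Google."""
    pages = set()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match and "Googlebot" in match.group(2):
            pages.add(match.group(1))
    return pages

# Invented sample log lines for illustration.
sample = [
    '66.249.66.1 - - [02/Dec/2022:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [02/Dec/2022:10:00:05 +0000] "GET /about HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [02/Dec/2022:10:00:07 +0000] "GET /about HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    '66.249.66.1 - - [02/Dec/2022:10:01:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
print(len(googlebot_pages(sample)))  # 2
```

In this sample, Googlebot requested the homepage twice and /about once, so the count of unique pages crawled is two; the human visitor's request is excluded.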
Now that we know how many pages are contained on the website and how many pages Google is crawling, we next want to review the number of pages Google has included in its index.
The easiest place to locate this is within Google Search Console’s Page Indexing Report. In Google Search Console, click on “Pages” in the sidebar. The number of indexed pages is listed at the top of the report. In the example shown below, there are 111 indexed pages.
The number of indexed pages is not the same as the number of ranking pages, because not all indexed pages will rank in search results or receive traffic from organic search. So, it is also important to measure how many unique pages received traffic from organic search results. In GA4, this can be obtained in the Acquisition report by adding a secondary dimension of “Landing page + query string” and filtering the table to “Organic Search”. In the example below, 87 pages received traffic from organic search.
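If you export that GA4 table to CSV, the unique-page count can be double-checked with a short script. The column names below (“Landing page + query string”, “Session default channel group”) are assumptions based on typical GA4 exports and may differ in yours, which is why they are parameters; the export rows are invented for illustration.

```python
import csv, io

def organic_landing_pages(csv_text, page_col="Landing page + query string",
                          channel_col="Session default channel group"):
    """Count unique landing pages that received Organic Search sessions
    in a GA4 table export. Column names vary by export, so both are
    parameters rather than hard-coded."""
    pages = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get(channel_col) == "Organic Search":
            pages.add(row[page_col])
    return len(pages)

# Hypothetical GA4 export for illustration.
export = """Landing page + query string,Session default channel group,Sessions
/,Organic Search,120
/blog/seo-checklist,Organic Search,45
/,Direct,30
/contact,Referral,12
"""
print(organic_landing_pages(export))  # 2
```

Counting distinct landing pages, rather than summing sessions, is the point here: the homepage appears twice in the export, but only its Organic Search row counts, and it counts once.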
Bringing the Metrics Together
Throughout this post, I’ve been using examples from different websites. To close this out, let’s look at a single website as an example and discuss what these numbers mean and what the potential problems or opportunities might be. The example metrics are:
- Total Unique Pages On Website: 3,800
- Unique Pages Crawled by Googlebot: 2,300
- Indexed Pages (Google Search Console): 4,200
- Total Unique Pages Receiving Organic Entrances: 2,600
These metrics would suggest Google is probably not finding all the pages contained on the website. There are 3,800 pages on this website that Google could find and yet, Google is only crawling 2,300 of those and sending traffic to 2,600 of those pages. Around 1,200 to 1,500 pages are somehow being missed by Google. The next step would be pulling a complete list of pages in each category to see what specific pages are being missed.
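Once you have the actual URL lists behind each metric, simple set arithmetic surfaces the specific pages in each gap. A minimal sketch, assuming each list has been loaded into a Python set of URL paths (the example paths are invented):

```python
def page_gaps(site_pages, crawled_pages, organic_pages):
    """Compare the page lists behind the three metrics and report the gaps.
    Each argument is a set of URL paths from the corresponding data source."""
    return {
        # On the site but never requested by Googlebot.
        "never_crawled": site_pages - crawled_pages,
        # Crawled but not receiving organic traffic.
        "crawled_not_ranking": crawled_pages - organic_pages,
        # Known to Google but missing from the site inventory.
        "unknown_to_site_inventory": (crawled_pages | organic_pages) - site_pages,
    }

# Tiny illustrative example; real lists would have thousands of entries.
site = {"/", "/about", "/pricing", "/blog/a", "/blog/b"}
crawled = {"/", "/about", "/pricing", "/old-page"}
organic = {"/", "/pricing"}
gaps = page_gaps(site, crawled, organic)
print(sorted(gaps["never_crawled"]))             # the two /blog/ pages
print(sorted(gaps["unknown_to_site_inventory"])) # ['/old-page']
```

The last bucket is the interesting one for the scenario above: pages Google knows about that your inventory missed, which are candidates for better internal linking, an XML sitemap entry, or removal if they are low quality.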
It is also worth noting that Google has indexed far more pages on this website than actually exist. That probably suggests low-quality pages are being created, possibly by a mistake within the content management system or by a third-party script. Alternatively, these might be valuable pages that were not surfaced in the crawl or in the content management system when determining the total number of unique pages. In that case, those pages might need better support from internal links or references in the XML sitemap. The next step would be determining what those pages are and deciding how to handle them.
If you need help obtaining or reviewing these metrics or need help ensuring Google is finding every page on your website, please contact me.