Is Google Finding Every Page On Your Website?
September 21, 2020
How many pages are on your website? Is Google finding them all? Of the pages Google has found, is it indexing and sending traffic to each one? And while we're at it, how many pages are human visitors finding?
These questions seem like a problem that would only affect larger websites. When you have hundreds, thousands, hundreds of thousands, or millions of pages, of course you need to worry about Google finding and indexing everything. But while answering these questions is certainly critical on larger websites, it is just as important for smaller websites to know the answers.
By answering these questions, you may find that some of the pages on your website are hidden from view and aren't linked to from anywhere on the site (so-called orphan pages), making it impossible for Google's bots or human visitors to find them. You may also find that while Google is finding certain pages, it is choosing to keep those pages out of its index.
So, how do we answer these questions? There are four metrics we need:
- The number of pages that are actually on our site.
- The number of pages Google has crawled.
- The number of pages Google has indexed.
- The number of unique pages that human visitors have accessed within a given time range.
The first number we need is how many pages exist on our website. Essentially, this is the number of pages that could be visited by humans or crawled by Google's bots—our total pool of potential pages. Once we have this metric, we'll compare it against how many pages bots and visitors are actually accessing to see whether humans or Google are struggling to find all the pages contained on our website.
For smaller websites that aren't contained within a content management system, this number is usually quite simple to pull. You can FTP into your website or access the list of pages within your hosting control panel. Elementive's website is a good example; in the example below, you can see the full list of pages within our control panel. Elementive has a total of eight potential pages that could be visited (we'll ignore the 404 and 500 error pages, as we'd rather humans and bots didn't see those).
This gets a bit more complicated on a larger website or a website managed through a content management system. Take MatthewEdgar.net, for example, which is managed through WordPress. Under Posts or Pages, you can see how many Published posts or pages exist. You can do the same for other post or page types within WordPress that are unique to your particular setup. Add up all the Published pages for each type, and you have your total number of pages on your website.
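If you'd rather not click through every post type in the admin, the counts can also be pulled programmatically. Here is a minimal sketch using WordPress's REST API, which reports the total in an `X-WP-Total` response header; the site URL is a placeholder, and this assumes your site exposes the default `/wp-json/wp/v2/` endpoints.

```python
import urllib.request

SITE = "https://example.com"  # placeholder: replace with your site's URL

def count_url(post_type):
    # Build the REST API URL; per_page=1 keeps the response tiny, since
    # we only need the X-WP-Total header, not the items themselves.
    return f"{SITE}/wp-json/wp/v2/{post_type}?per_page=1"

def published_count(post_type):
    # WordPress reports the full total of published items for this
    # post type in the X-WP-Total response header.
    with urllib.request.urlopen(count_url(post_type)) as resp:
        return int(resp.headers["X-WP-Total"])

def total_published(post_types=("posts", "pages")):
    # Sum the counts across every post type you care about; add any
    # custom post types your site registers to this tuple.
    return sum(published_count(t) for t in post_types)
```

If your site uses custom post types, add their REST slugs to the `post_types` tuple so the total matches what the admin screens show.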
For even bigger websites, checking your website's content management system isn't a feasible option. Perhaps the content is spread out across many different systems. Perhaps your platform allows for the creation of dynamic pages, such as faceted navigation in categories or search-based pages. In these cases, the next best solution is to run a full crawl of the entire website. Using a crawl tool, like Screaming Frog or similar, let the bot explore every nook and cranny to see how many total pages it can find. Note that this may not be every page on your website—perhaps some pages aren't linked to anywhere, or the links are constructed in a way bots can't access—but it will at least give you a reasonable idea of the total pages that exist on your website.
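A dedicated tool like Screaming Frog is the practical choice, but to make the idea concrete, here is a minimal sketch of what such a crawler does: fetch a page, collect its same-site links, and repeat until no new pages turn up. This is illustrative only—it ignores robots.txt, JavaScript-rendered links, and politeness delays that a real crawler must handle.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import urlopen

class LinkParser(HTMLParser):
    # Collects href values from every <a> tag in a page.
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def internal_links(base_url, html):
    # Resolve relative URLs, drop #fragments, and keep only links
    # pointing at the same host as the page being crawled.
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(base_url).netloc
    found = set()
    for href in parser.links:
        url, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(url).netloc == host:
            found.add(url)
    return found

def crawl(start_url, limit=500):
    # Breadth-first crawl from the start page; `limit` is a safety cap
    # so a runaway faceted-navigation site doesn't crawl forever.
    seen, queue = {start_url}, [start_url]
    while queue and len(seen) < limit:
        url = queue.pop(0)
        try:
            with urlopen(url) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # skip pages that error out
        for link in internal_links(url, html) - seen:
            seen.add(link)
            queue.append(link)
    return seen  # the set of unique pages discovered
```

The size of the returned set is your "total pages that exist" estimate, with the same caveat as any crawl: orphan pages with no inbound links will never appear in it.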
Next up, we need to know how many pages Google crawled. The best way to do this is to analyze your website's log file. Within the log file, you can see how many pages Google crawled during a given time range. For example, on one specific day, Google crawled around 2,000 pages on this website. You could also look at the specific list of pages to see how many unique files Google crawled over a wider time range.
The log file is going to be your most accurate view of Google's activity because it is data fully within your control. Along with your log file (or instead of it, if you cannot access your log file), you can get an idea of how many pages Google crawled by reviewing the Crawl Activity report in Google Search Console.
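Pulling Googlebot's unique pages out of a log file is straightforward with a short script. This sketch assumes the common "combined" log format used by Apache and Nginx; one caveat worth a comment is that anything can claim to be Googlebot in its user-agent string, so a thorough audit would also verify the requesting IPs via reverse DNS.

```python
import re
from collections import Counter

# Matches the request path and the user-agent string in a "combined"
# format access log line (the default for Apache and Nginx).
LINE = re.compile(
    r'"(?:GET|POST|HEAD) (\S+) [^"]*" \d+ \S+ "[^"]*" "([^"]*)"'
)

def googlebot_paths(log_lines):
    # Return a Counter mapping each path Googlebot requested to its
    # hit count; len() of the result is the unique-pages-crawled metric.
    # Note: this trusts the user-agent string; spoofed bots would need
    # to be filtered out by verifying IPs via reverse DNS.
    hits = Counter()
    for line in log_lines:
        match = LINE.search(line)
        if match and "Googlebot" in match.group(2):
            hits[match.group(1)] += 1
    return hits
```

Running `googlebot_paths(open("access.log"))` over a month of logs gives both the unique page count and, via the per-path counts, which pages Google revisits most often.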
The big thing to note is that this Crawl Activity report shows how many pages Google crawled on any particular day, but not the total number of unique pages crawled, which you can obtain from a log file. So, if you have around 300 pages on your website, you'd expect to see crawl activity like that shown above. But if you had around 3,000 pages on your website, you'd expect a greater amount of crawl activity than what is shown above.
Now that we know how many pages we've got and how many Google is finding during its crawl, we want to review how many pages Google has included in its index.
One of the best places to find this is within Google Search Console’s Coverage reports. In Google Search Console, click on “Coverage” in the sidebar. The Coverage report contains a wealth of information but, for this post, we’re only going to focus on the top-level numbers.
If you add up all pages found within the Error, Valid With Warning, Valid, and Excluded categories, that represents the total number of pages Google found while crawling your website (which is another way to obtain metric #2). However, adding “Valid” + “Valid With Warnings” tells us (approximately) how many pages are indexed in Google’s search results—in this case, 41,000 + 116, or 41,116 pages.
Of course, the Coverage report can sometimes be inaccurate. So, we also want to measure this metric by looking at how many unique pages Google sent traffic to. In Google Analytics, you can use the landing pages report and segment that report by “organic traffic”. In this case, we have 32 unique landing pages that Google has sent traffic to from organic search.
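If you export that segmented landing pages report to CSV, counting the unique pages takes only a few lines. This is a sketch, not an official Google Analytics integration; the `"Landing Page"` column name is an assumption and should be matched to whatever header your export actually uses.

```python
import csv

def unique_landing_pages(rows, column="Landing Page"):
    # `rows` is any iterable of CSV text lines, such as an open file
    # from a Google Analytics export. The column name is an assumption;
    # adjust it to match the header in your export.
    return len({row[column] for row in csv.DictReader(rows)})
```

Calling `unique_landing_pages(open("organic-landing-pages.csv"))` yields the count to compare against your indexed-pages total from the Coverage report.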
Finally, we have how many unique pages human visitors accessed. Ideally, this should be the easiest of the metrics to pull. In Google Analytics, you can use the All Pages report to find the full list of pages that visitors have accessed within a certain date range. In this example, visitors accessed 38 unique pages on this website.
Bringing the Metrics Together
Throughout this post, I’ve been using examples from different websites. To close this out, let’s look at a single website as an example and discuss what these numbers mean and what the potential problems or opportunities might be. The (rounded) metrics are:
- Total Unique Pages On Website: 3,800
- Unique Pages Crawled by Googlebot: 2,300
- Valid + Valid With Warning Pages: 4,200
- Total Unique Pages Receiving Organic Entrances: 2,600
- Unique Pages Human Accessed: 7,400
These metrics would suggest Google is probably not finding all the pages contained on the website. There are 3,800 pages on this website that Google could find, and yet Google is only crawling 2,300 of those and sending traffic to 2,600 of those pages. Around 1,200 to 1,500 pages are somehow being missed by Google. The next step would be pulling a complete list of pages in each category to see which specific pages are being missed.
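Once you have the actual URL lists behind each metric, finding the specific missed pages is simple set arithmetic. The tiny sample sets below are illustrative only; in practice each set would come from the sources above (your CMS or crawl, the log file, and the landing pages export).

```python
# Hypothetical sample data standing in for the real exports.
site_pages = {"/", "/about/", "/services/", "/old-page/"}   # from CMS or crawl
crawled = {"/", "/about/", "/services/"}                    # from the log file
organic_landings = {"/", "/about/", "/?junk=1"}             # from Google Analytics

# Pages on the site that Googlebot never requested.
never_crawled = site_pages - crawled

# Pages that exist but receive no organic entrances.
no_organic_traffic = site_pages - organic_landings

# Pages receiving traffic that aren't in the CMS at all: likely the
# "junk pages" scenario, where extra URLs are being generated somewhere.
unknown_pages = organic_landings - site_pages
```

Here `never_crawled` contains `/old-page/` (a candidate orphan page) and `unknown_pages` contains `/?junk=1` (a URL the CMS never created), which are exactly the two problem categories in the example metrics above.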
Worth noting is that Google has indexed far more pages on this website than actually exist. As well, humans are accessing nearly double the number of pages that exist. That probably suggests there are some junk pages getting created here, possibly from a mistake within the content management system or through a third-party script run amok. The next step would be to find those extra, unwanted pages and make sure they are turned off.
If you need help obtaining or reviewing these metrics or need help ensuring Google is finding every page on your website, please contact me.