How To Manage Parameters
April 25, 2022
Google Search Console’s parameter tool provided an effective way of controlling how Googlebot crawled URLs with parameters. However, the tool was removed in April 2022. Even at its best, it only worked for Googlebot: it offered no control over how other robots crawled parameters or how visitors interacted with them.
In this article, let’s review the best ways of managing parameters on a website, including ways of replacing Google Search Console’s parameter tool. This breaks down into three main steps:
- Find Parameters
- Determine How to Handle Parameters
- Control and Manage Parameters
Part #1: Find Every Parameter on Your Website
To manage parameters on your website, you’ll need to find every parameter on your website—or at least most parameters. This includes parameters generated by your content management system, parameters generated by third-party tools used on your website, and tracking parameters used within advertisements and promotions.
Ideally, you could ask the marketers, developers, and partners at third parties for a list of each parameter used. This isn’t a bad step and can provide a starting point. In reality, though, marketers and developers rarely keep a complete record of each parameter used and, in other cases, can’t keep a complete record since many parameters are dynamically generated. There are three places to look for URLs containing parameters: analytics, log files, and a site crawl.
Web analytics tools will tell you which pages with parameters have been visited on the website. In GA4, this can be found in the built-in Pages & Screens report, located under Engagement. Once on this page, change the dimension to “Page path + query string and screen class”. Then search for any URL containing a question mark, since a question mark starts the query string that contains the parameters. This will show each URL with its parameters attached. The report can then be exported to Excel or Google Sheets, where parameters can be extracted from the URLs (more details on extraction below).
Similarly, we can find URLs with parameters in log files. Using http Log Viewer as an example, we can query for any requests that include parameters. We could also filter for only requests that include “googlebot” in the user agent to find only the requests from Google’s crawlers that include parameters. As with the list found in analytics, this list of requested pages can be exported to Excel or Google Sheets, where parameters can be extracted from the URLs (more details on extraction below).
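If you would rather script this than use a log viewer, the same filter can be sketched in Python. This is a minimal sketch that assumes the server writes the common "combined" access log format; the sample lines, the LOG_LINE pattern, and the parameter_requests function are illustrative names, not part of any particular tool.

```python
import re

# Assumption: "combined" access log format. Captures the request target
# and the user agent from each line.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def parameter_requests(lines, agent_contains=None):
    """Return request targets that include a query string.

    If agent_contains is given, keep only requests whose user agent
    contains that text (case-insensitive), e.g. "googlebot".
    """
    targets = []
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        target = match.group("target")
        agent = match.group("agent")
        if "?" not in target:
            continue  # no query string, so no parameters
        if agent_contains and agent_contains.lower() not in agent.lower():
            continue
        targets.append(target)
    return targets

# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    '66.249.66.1 - - [25/Apr/2022:10:00:00 +0000] "GET /shoes?sort=alpha HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [25/Apr/2022:10:00:05 +0000] "GET /shoes?filter=blue HTTP/1.1" 200 4890 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '66.249.66.1 - - [25/Apr/2022:10:00:09 +0000] "GET /about HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

all_requests = parameter_requests(SAMPLE_LOG)
googlebot_requests = parameter_requests(SAMPLE_LOG, agent_contains="googlebot")
```

Matching on the user agent text alone will also catch robots that merely claim to be Googlebot, so treat the filtered list as a starting point rather than a verified record of Google's crawling.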
Finally, you can use a crawl tool to locate parameters contained within internal links on the website. We can see an example of this in Jet Octopus. After running a crawl, view all links found in the crawl. Then, filter for any links that have a link destination containing a question mark. To avoid finding external links that happen to contain parameters, you can also add a condition that link destinations must include your website’s address. Here again, this list can be exported to Excel or Google Sheets, where parameters can be extracted from the URLs (more details on extraction below).
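The two crawl filters described above (a question mark in the link destination, plus your own website's address) can also be applied to an exported list of link destinations. The sketch below assumes the export has already been reduced to a plain list of URLs; internal_parameter_links is a hypothetical helper name.

```python
from urllib.parse import urlsplit

def internal_parameter_links(link_destinations, site_host):
    """Keep link destinations on site_host that carry a query string."""
    matches = []
    for destination in link_destinations:
        parts = urlsplit(destination)
        # parts.query is empty when there is no "?" or nothing follows it.
        if parts.query and parts.hostname == site_host:
            matches.append(destination)
    return matches

links = [
    "https://www.site.com/test?sort=alpha",
    "https://www.site.com/test",
    "https://other.example/page?ref=1",
]
internal = internal_parameter_links(links, "www.site.com")
```

Comparing the parsed hostname, instead of checking whether the URL text contains your address, avoids matching external URLs that mention your domain inside a parameter value.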
Extract Parameters from URLs
Once you have the exported lists, the next step is getting a complete list of the parameters contained within those URLs. That sounds easier than it often is. There are a few different tools that can help with this. One is SEOToolsForExcel. This is a paid tool, but it has a function, UrlProperty, that can extract query strings. For example, if the URL is in A1, you could run UrlProperty(A1,"query") to get the query string. Then, using the Dump and StringSplit functions from SEOTools, you can split the query string by the ampersand (&) and then again by the equal sign to get each individual parameter and parameter value.
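If you don't have a spreadsheet add-on available, the same split (query string, then ampersand, then equal sign) can be done with Python's standard library. This is a minimal sketch; extract_parameters is an illustrative name, and the URLs reuse the example parameters from this article.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

def extract_parameters(urls):
    """Count how often each parameter name appears and collect its values."""
    names = Counter()
    values = {}
    for url in urls:
        query = urlsplit(url).query
        # parse_qsl splits on "&" and then on "=".
        # keep_blank_values=True keeps parameters such as "?debug=".
        for name, value in parse_qsl(query, keep_blank_values=True):
            names[name] += 1
            values.setdefault(name, set()).add(value)
    return names, values

urls = [
    "https://www.site.com/test?sort=alpha&utm_source=news",
    "https://www.site.com/test?sort=low-price",
    "https://www.site.com/about",
]
names, values = extract_parameters(urls)
```

The counts give a rough sense of which parameters matter most, and the collected values help when deciding later whether a parameter meaningfully changes the page.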
Part #2: Determine How to Handle Parameters
Once you have identified the list of parameters, the next step is determining how to handle those parameters for search robots crawling the website. There are a few key questions to address:
- Does the parameter meaningfully change the page’s content?
- Should URLs with this parameter rank in search results?
- Should Googlebot crawl URLs with this parameter?
Does the parameter meaningfully change the page’s content?
Many parameters can change the page’s content, but the question is if that change is meaningful to visitors. The more meaningful the change to the content, the more that parameter is something Googlebot and visitors need to see on the website.
Take as an example a parameter that sorts the page differently, such as ?sort=alpha or ?sort=low-price, which sorts a list of products alphabetically or from lowest to highest price. That sort of functionality is helpful for people who are already visiting the page. However, re-sorting the list doesn’t meaningfully alter the page, because it is the same list of products. Somebody searching on Google may want to find the list of products but won’t be so interested in finding that list sorted one way or another. In this example, the parameter is likely not meaningfully changing the content and, therefore, isn’t a parameter Google needs to find or rank.
As another example, there might also be a parameter that filters that list of products. By adding ?filter=blue to the URL, the list of products will only show blue products instead of showing all products. This is a more meaningful change to the page’s content, both for visitors who are on the page and for people searching on Google. People searching on Google would likely be interested in seeing blue products instead of all products. As a result, this parameter is a more meaningful change to the content and would be a parameter Google should find and rank.
With other parameters, though, no changes are made to the website’s content at all. Tracking parameters, like UTM parameters, can be added to URLs to identify a particular traffic source and pass that information along to an analytics program. Because these parameters do not alter the website’s content, they are likely not something Google needs to find or rank from the website.
Should URLs with this parameter rank in search results?
If the parameter alters the page’s content in a meaningful way, the answer is usually clear: let Google rank this page. The page with the parameter included would typically make for a good entry point to your website from organic search results.
For other parameters, though, you do not want Google ranking the URL containing that parameter. Tracking parameters are a prime example. Let’s say a page containing a tracking parameter associated with an ad campaign ranks in organic search results. If somebody clicks on that ranking, your analytics tool will incorrectly report the source as the ad campaign instead of organic search.
In other cases, it wouldn’t be awful if the URL containing the parameter ranked in search results, but it wouldn’t be ideal. Sorting parameters are an example. They don’t meaningfully change the page’s content, but it wouldn’t exactly be a problem if Google ranked the page sorted one way or another. The problem would be if Google tried to rank multiple versions of the sorted page. Ranking the default sort, the alphabetical sort, and the price sort would create duplication problems that could worsen the overall organic performance. In these cases, you don’t need to block the parameter from being indexed and ranked, but you should control which version is ranked.
Should Googlebot crawl URLs with this parameter?
As with ranking, if the parameter meaningfully changes a page’s content, Google needs to be able to crawl that page on the website. However, for parameters that do not meaningfully change the content or do not change the content at all, the next question to consider is whether Google should crawl URLs that contain the parameter. In most cases, it is better to let Googlebot crawl the page instead of blocking crawling. This gives Google more insight into the website and more ability to find internal links to other pages. By crawling a list with a sorting parameter, Googlebot might be able to find internal links to the different items in that list. This could improve the performance of the pages for those individual items. For these types of parameters, it is generally best to allow crawling.
However, by crawling some parameters, Googlebot might create unnecessary problems for the website. Think back to the tracking parameters as an example: Googlebot crawling these parameters could log hits of those parameters in your analytics tool, inflating data and skewing reports. In other cases, especially on large websites, Googlebot might crawl parameters needlessly, wasting crawl time and, potentially, shifting focus away from the content you’d rather Google crawl. In those cases, it makes sense to block Googlebot (and other robots) from crawling the page.
Part #3: Control and Manage Parameters
After considering each of these questions, there are three different methods for controlling and managing parameters.
- Allow Crawling & Ranking. If the parameter meaningfully alters the content, Googlebot should crawl, index, and rank the page’s URL with the included parameters.
- Block Ranking. If the parameter does not meaningfully alter the content, Googlebot should not rank the page’s URL with the parameter in search results.
- Block Crawling. If there is a problem with Googlebot crawling URLs containing a parameter, Googlebot should be blocked from crawling any URLs containing that parameter.
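The three methods above can be summarized as a small decision rule applied to each parameter from your list. The sketch below is one possible reading of the questions from Part #2, not a definitive rule set; ParameterReview and choose_handling are hypothetical names, and the answers for each parameter are judgment calls you supply.

```python
from dataclasses import dataclass

@dataclass
class ParameterReview:
    name: str
    changes_content_meaningfully: bool  # e.g. a filter such as ?filter=blue
    crawling_causes_problems: bool      # e.g. inflated analytics, wasted crawl time

def choose_handling(review):
    """Map the review answers to one of the three handling methods."""
    if review.changes_content_meaningfully:
        return "allow crawling and ranking"
    if review.crawling_causes_problems:
        return "block crawling"
    # Content is unchanged or only trivially changed, and crawling is harmless.
    return "block ranking"

reviews = [
    ParameterReview("filter", True, False),
    ParameterReview("sort", False, False),
    ParameterReview("utm_source", False, True),
]
plan = {review.name: choose_handling(review) for review in reviews}
```

Keeping the decisions in one table like this also gives marketers and developers the record of parameters that is usually missing.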
Allow Crawling & Ranking
Pages with parameters that you want Google to crawl and rank in search results should be treated like any other page on the website you want to be crawled or ranked. You want to do everything you can to encourage rankings. For example, can the title, h1, or other text on the page be altered when that parameter is applied? If so, that can help to optimize the URL containing that parameter and increase the chances of that page ranking. As another way to optimize the page, make sure internal links point to the page with the parameters included in the URL and, potentially, list the URL with those parameters in the XML sitemap. Relatedly, you could also build external backlinks to the URL containing that parameter.
Block Ranking
There are two primary ways of preventing URLs containing specific parameters from ranking in search results: a canonical tag or a noindex tag.
On a page with a parameter, you can define a canonical tag that points to the version of the page without that parameter. For example, https://www.site.com/test?sort=alpha might declare https://www.site.com/test as its canonical. This indicates to Google that you consider the two pages to be duplicates, or near duplicates, of each other. In most cases, Google will index the canonical version of the URL. However, internal links should support the canonical choice as much as possible, because Google can ignore the defined canonical if internal link signals suggest that a different URL is the more appropriate canonical. In the previous example, if /test is selected as the canonical, internal links to /test?sort=alpha should be minimized.
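To illustrate how a canonical URL could be generated in a page template, here is a minimal Python sketch that drops the parameters you have decided should not rank and keeps the rest. The NON_CANONICAL_PARAMETERS set and both function names are assumptions made for this example; your own list would come from the review in Part #2.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: parameters that should not appear in the canonical URL.
NON_CANONICAL_PARAMETERS = {"sort", "utm_source", "utm_medium", "utm_campaign"}

def canonical_url(url, drop=NON_CANONICAL_PARAMETERS):
    """Return the URL without the dropped parameters and without a fragment."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in drop
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def canonical_tag(url):
    """Build the link element to place in the page's head."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}">'

sorted_page = canonical_url("https://www.site.com/test?sort=alpha")
filtered_page = canonical_url("https://www.site.com/test?filter=blue&sort=alpha")
tag = canonical_tag("https://www.site.com/test?sort=alpha")
```

Note that the filter parameter survives in the canonical because it meaningfully changes the content, while the sort parameter is removed.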
If the canonical is not sufficient, a meta robots noindex tag can be used instead. This noindex tag would be applied on any page containing a specific parameter. It instructs Google to keep the page with that parameter appended out of the index and not include it in rankings.
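As a rough sketch of how that rule might be applied in a template, the function below returns a noindex tag only when the requested URL carries one of the listed parameters. NOINDEX_PARAMETERS and robots_meta_tag are hypothetical names, and the parameter list is only an example.

```python
from urllib.parse import urlsplit, parse_qsl

# Assumption: parameters whose URLs should stay out of the index.
NOINDEX_PARAMETERS = {"sort", "sessionid"}

def robots_meta_tag(url, noindex_parameters=NOINDEX_PARAMETERS):
    """Return a noindex meta tag if the URL has a listed parameter, else ""."""
    query = urlsplit(url).query
    names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
    if names & noindex_parameters:
        return '<meta name="robots" content="noindex">'
    return ""

with_sort = robots_meta_tag("https://www.site.com/test?sort=alpha")
with_filter = robots_meta_tag("https://www.site.com/test?filter=blue")
```

The tag belongs in the head of the page, and it only works if robots are allowed to crawl the page and see it.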
Block Crawling
The easiest way to prevent Googlebot, or other robots, from crawling a page containing a specific parameter is to use the disallow command in the robots.txt file. This instructs bots to ignore any disallowed page when crawling the website. Internal links should only minimally use the disallowed parameter paths. This will further discourage robots from finding and crawling the pages. Here is an example of a disallow statement blocking crawls of the “test” parameter:
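Rules of this kind usually rely on the * wildcard, which Googlebot and many other robots support. The Python sketch below holds one possible pair of rules in a string and checks which URLs they would block. The rules are an illustration written for this example, and the matcher is deliberately simplified: it only handles the * and $ pattern characters and ignores user-agent groups and Allow rules, so it is not a full robots.txt parser.

```python
import re
from urllib.parse import urlsplit

# Illustrative rules: block the "test" parameter whether it is the first
# parameter (after "?") or a later one (after "&").
ROBOTS_TXT = """\
User-agent: *
Disallow: /*?test=
Disallow: /*&test=
"""

def disallow_patterns(robots_txt):
    """Collect the non-empty Disallow values (user-agent groups are ignored)."""
    patterns = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            patterns.append(value.strip())
    return patterns

def pattern_to_regex(pattern):
    """Translate * (any characters) and a trailing $ (end of URL) to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(url, robots_txt=ROBOTS_TXT):
    """Check the URL's path and query string against every Disallow pattern."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return any(
        pattern_to_regex(pattern).match(target)
        for pattern in disallow_patterns(robots_txt)
    )

first_param = is_disallowed("https://www.site.com/page?test=1")
later_param = is_disallowed("https://www.site.com/page?sort=alpha&test=1")
other_param = is_disallowed("https://www.site.com/page?contest=1")
```

Using two rules instead of a single broad pattern such as /*test= keeps unrelated parameters like "contest" crawlable.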
Following robots.txt is optional for robots, but Googlebot and most other major robots will generally follow these instructions. If the parameter should never be crawled by robots, such as a parameter that exposes a development environment, stronger measures need to be taken. For example, only certain IP addresses or only people who are logged in should be allowed to access pages containing that parameter. As well, internal links should never use parameters that robots are not allowed to crawl; the best answer is to ensure robots cannot find the URL containing that parameter.
The problem with preventing crawling, whether with a disallow or with stronger measures, is that it only prevents crawling. It has no influence over whether the page with the parameter appears in search results. If robots find the URL with that parameter in some other way, the page may still end up in search results. This is because the noindex directive, which instructs Googlebot to keep the page out of rankings, is included on the page itself; if Googlebot can’t crawl the page, it can’t see the noindex. You can end up with pages that are indexed even though Googlebot hasn’t crawled them. In those cases, Google will state that “No information is available for this page” in the search result.
This is part of why it is generally best to allow robots to crawl pages with parameters. If robots can crawl, you can more easily control indexing of the page. If you must prevent crawling, the other way of keeping the pages out of the index is with Google Search Console’s removal tool. You can state the parameter as part of the prefix. For example, removing URLs with the prefix https://www.test.com/some-url/?test= would remove indexed URLs on that page that begin with the test parameter.
If you have questions about managing parameters on your website, please contact me today. For more help with technical SEO, you can also check out my book, Tech SEO Guide 2.0, which provides a reference to managing parameters and many other aspects of the technical side of SEO.