Managing URL Parameters & Query Strings: SEO Best Practices
By Matthew Edgar · Last Updated: April 21, 2023
URL parameters, also known as query strings, are part of a webpage’s URL. Parameters can provide additional information about the specific content being requested. For example, a parameter may be added to a URL to indicate the page’s content is sorted or filtered.
Parameters follow a question mark (?) in the URL and are composed of a key-value pair separated by an equal sign (=). Multiple parameters can be added by separating them with an ampersand (&). For example, in the URL https://example.com/page?param1=value1&param2=value2, param1 and param2 are URL parameters with values value1 and value2 respectively. The entire text after the question mark (param1=value1&param2=value2) is the query string.
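This anatomy is easy to see programmatically. As a quick illustration, Python's standard urllib.parse module splits a URL into its query string and parses that string into key-value pairs:

```python
from urllib.parse import urlsplit, parse_qs

url = "https://example.com/page?param1=value1&param2=value2"

# Everything after the question mark is the query string
query = urlsplit(url).query

# parse_qs breaks the query string into its individual parameters
params = parse_qs(query)

print(query)   # param1=value1&param2=value2
print(params)  # {'param1': ['value1'], 'param2': ['value2']}
```

Note that parse_qs maps each key to a list of values, since the same parameter key can legally appear more than once in a query string.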
SEO Challenges Presented by URL Parameters
URL parameters can present challenges for search engine robots when crawling and indexing a website. These challenges can make it harder for websites to rank in search results or can result in the wrong URL ranking.
Duplicate content: URL parameters can create multiple URLs that display the same or very similar content, which may lead search engines to view them as duplicate content. For example, mysite.com/product-list and mysite.com/product-list?sort=price would contain the same content just sorted in a slightly different order. This type of duplication can result in search engines not ranking any pages on the website or ranking the wrong page in search results.
Crawl budget: Search engines can only crawl so many pages on a website at any given time. When there are multiple URLs with similar content due to URL parameters, search engines may waste crawl budget on these instead of discovering new and more valuable content.
Internal link signals: If search engines see multiple versions of the same content with different URL parameters within internal links, it can lead to confusion about which URL should be considered the canonical URL that should rank in search results. For example, even if the canonical URL is mysite.com/product-list, robots might ignore that canonical if all internal links point to mysite.com/product-list?sort=price.
Manage Parameters Effectively to Avoid SEO Problems
In this article, let’s review the best ways of managing parameters on a website, including ways of replacing Google Search Console’s parameter tool. This breaks down into the following steps:
- Step #1: Find Parameters
- Step #2: Extract Parameters from URLs
- Free Tool: Extract Parameters from URLs
- Step #3: Evaluate Parameters to Identify Problems
- Step #4: Control and Manage Parameters
Wondering about Google’s parameter management tool? Google Search Console previously offered a parameter tool that provided an effective way of controlling how Googlebot crawled and indexed website parameters. This tool was removed as of April 2022. Plus, even at its best, the tool only worked for Googlebot and didn’t allow for controlling how parameters worked for other robots crawling through the website or determining how visitors would interact with parameters.
Step #1: Find Every Parameter on Your Website
To manage parameters on your website, you’ll need to find every parameter on your website—or at least most parameters. This includes parameters generated by your content management system, parameters generated by third-party tools used on your website, and tracking parameters used within advertisements and promotions.
Ideally, you could ask the marketers, developers, and partners at third parties for a list of each parameter used. This isn’t a bad step and can provide a starting point. In reality, though, marketers and developers rarely keep a complete record of each parameter used and, in other cases, can’t keep a complete record since many parameters are dynamically generated. There are three places to look for URLs containing parameters: analytics, log files, and site crawl tools.
Web Analytics Tools: Finding Parameters in GA4
Web analytics tools will tell you which pages with parameters have been visited on the website. In GA4, this can be found in the built-in Pages & Screens report. To find this, go to Reports, then expand Engagement, and then click “Pages and screens”.
Once on the report, change the dimension to "Page path + query string and screen class". Then search for any URL containing a question mark. This will show the URLs with parameters. In the upper right corner, click the share icon and then click "Download File". The exported file can then be opened in Excel or Google Sheets. Once exported, you can extract the parameters from the URLs (more details on extraction in Step #2).
Log Files: Find Parameters Google Has Crawled
Similarly, we can find URLs with parameters in log files. Log files track every file requested on a website. This provides a great resource to find information about how robots are crawling through the website. Learn more about using log files for SEO analysis.
Using http Log Viewer as an example, we can query for any requests that include parameters as shown in the example below. We could also filter for only requests that include "googlebot" as a user agent to find only the requests from Google's crawlers that include parameters. As with the list found in analytics, this list of requested pages can be exported to Excel or Google Sheets where parameters can be extracted from the URL (more details on extraction in Step #2).
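If you prefer to script this instead of using a log viewer, the same filtering can be done on a raw access log. Here is a minimal sketch in Python, assuming the common combined log format (the exact field layout of your logs may differ, so adjust the parsing accordingly):

```python
def is_parameterized_googlebot_hit(log_line: str) -> bool:
    """Return True when a combined-format access log line records a
    Googlebot request for a URL containing a query string."""
    try:
        # The request is the first quoted field: 'GET /page?sort=price HTTP/1.1'
        request = log_line.split('"')[1]
        path = request.split(" ")[1]
    except IndexError:
        return False
    return "Googlebot" in log_line and "?" in path

# Usage: print every matching line from an access log
# with open("access.log") as f:
#     for line in f:
#         if is_parameterized_googlebot_hit(line):
#             print(line.rstrip())
```

For production use you would also want to verify that requests claiming to be Googlebot really come from Google (the user agent string alone can be spoofed), but this is enough to build the list of parameterized URLs Google has requested.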
Finding URL Parameters in Crawl Tools
Finally, you can use a crawl tool to locate parameters that are included within internal links on the website. We can see an example of this in Jet Octopus. After running a crawl, view all links found on the crawl. Then, filter for any links that have a link destination containing a question mark. To avoid finding external links that happen to contain parameters, you can also add a condition where link destinations must include your website’s address. Here again, this list can then be exported to Excel or Google Sheets where parameters can be extracted from the URL (more details on extraction in Step #2).
Step #2: Extract Parameters from URLs
Once you have the exported lists, the next step is getting a complete list of parameters contained within those URLs. Begin by combining all of the URLs listed in the exported files together. That way, you have a complete list of all URLs with parameters that are used on your website.
Now, you want to know what those parameters are and what types of values those parameters include. This will be essential to help you know if those parameters are causing problems and how best to address each.
To help you extract parameters, I have a free bulk parameter extraction tool you can use. Using this tool, simply paste in a list of URLs with parameters and then click the “Get Bulk Parameters” button. The page will refresh and you will get a list of all the parameters used along with their respective values.
Other Methods to Extract Parameters
Another tool to extract parameters is SEOToolsForExcel. This is a paid tool but has a function, UrlProperty, that can extract query strings. For example, if the URL is in cell A1, you could run UrlProperty(A1,"query") to get the query string. Then, using the Dump and StringSplit functions from SEOTools, you can split the query string by the ampersand (&) and then again by the equal sign (=) to get each individual parameter and parameter value.
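If you'd rather script the extraction, a short Python sketch can tally every parameter key and the values it takes across your combined URL list. The URLs below are placeholders; substitute the list from your own exports:

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

# Replace with the combined list of URLs from your analytics,
# log file, and crawl exports
urls = [
    "https://mysite.com/product-list?sort=price",
    "https://mysite.com/product-list?sort=alpha&filter=blue",
    "https://mysite.com/page?utm_source=newsletter",
]

param_counts = Counter()  # how many URLs use each parameter key
param_values = {}         # every value seen for each parameter key

for url in urls:
    for key, value in parse_qsl(urlsplit(url).query):
        param_counts[key] += 1
        param_values.setdefault(key, set()).add(value)

for key, count in param_counts.most_common():
    print(f"{key}: {count} URL(s), values: {sorted(param_values[key])}")
```

The output gives you exactly what Step #3 needs: each distinct parameter, how widely it is used, and the range of values it carries.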
Step #3: Evaluate Parameters to Identify Problems
Once you have extracted the parameters from the URLs on your website, the next step is determining if those parameters are presenting problems. If those parameters do present problems, you want to determine how to best handle those parameters for SEO. There are a few key questions to address within your evaluation:
- Does the parameter meaningfully change the page’s content?
- Should URLs with this parameter rank in search results?
- Should Googlebot crawl URLs with this parameter?
Does the parameter meaningfully change the page’s content?
Many parameters can change the page’s content, but the question is if that change is meaningful to visitors. The more meaningful the change to the content, the more that parameter is something Googlebot and visitors need to see on the website.
Take as an example a parameter that sorts the page differently, such as ?sort=alpha or ?sort=low-price which sorts a list of products alphabetically or by lowest to highest price. That sort of functionality is helpful for people who are already visiting the page. However, resorting the list doesn’t meaningfully alter the page—it is the same list of products. Somebody searching on Google may want to find the list of products but won’t be so interested in finding that list sorted one way or another. In this example, the parameter is likely not meaningfully changing the content and, therefore, isn’t a parameter Google needs to find or rank.
As another example, there might also be a parameter that filters that list of products. By adding ?filter=blue to the URL, the list of products will only show blue products instead of showing all products. This is a more meaningful change to the page’s content, both for visitors who are on the page and for people searching on Google. People searching on Google would likely be interested in seeing blue products instead of all products. As a result, this parameter is a more meaningful change to the content and would be a parameter Google should find and rank.
With other parameters, though, no changes are made to the website’s content at all. Tracking parameters, like UTM parameters, can be added to URLs that identify a particular traffic source and pass that information along to an analytics program. Because these parameters do not alter the website’s content, they are likely not something Google needs to find or rank from the website.
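One way to operationalize this check is to flag URLs whose parameters are tracking-only. The sketch below compares a URL's parameter keys against a list of common tracking keys; the list is illustrative, not exhaustive, so extend it with whatever tracking parameters your own campaigns use:

```python
from urllib.parse import urlsplit, parse_qsl

# Common tracking parameter keys (illustrative; extend for your campaigns)
TRACKING_KEYS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term",
    "utm_content", "gclid", "fbclid",
}

def tracking_only(url: str) -> bool:
    """True when the URL has parameters and all of them are tracking keys,
    meaning the parameters do not alter the page's content at all."""
    keys = {key for key, _ in parse_qsl(urlsplit(url).query)}
    return bool(keys) and keys <= TRACKING_KEYS

print(tracking_only("https://mysite.com/page?utm_source=ad"))       # True
print(tracking_only("https://mysite.com/product-list?filter=blue")) # False
```

Parameters that fail this check still need the manual evaluation described above, since a script cannot judge whether a sort or filter change is meaningful to visitors.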
Should URLs with this parameter rank in search results?
There are some parameters that should absolutely not rank in search results. Tracking parameters are a prime example. Let’s say a page containing a tracking parameter associated with an ad campaign ranks in organic search results. If somebody clicks on that ranking, your analytics tool will report the incorrect source as the ad campaign instead of as an organic search.
If the parameter alters the page’s content in a meaningful way, it is usually best to let Google rank that page. If the parameter changes the content in a meaningful way, then the page with the parameter included would typically make for a good entry point to your website from organic search results.
For parameters that don’t meaningfully change the content, you would want to prevent the page from ranking in search results because these types of pages could be considered duplicates of valid pages you do want ranking. Sorting parameters are an example. How the page is sorted doesn’t meaningfully change the page’s content. Allowing Google to rank the page sorted in any direction—ranking the default sort, the alphabetical sort, and the price sort— would create duplication problems. All of those URL parameters could end up ranking in search results, competing against each other for the same rankings. That could drive down your organic search traffic, benefiting your competitors. In these cases, you need to control which parameter is ranked.
Should Googlebot crawl URLs with this parameter?
If the parameter meaningfully changes a page’s content, Google needs to be able to crawl that page on the website. However, for parameters that do not meaningfully change the content or parameters that do not change the content at all, the next question to consider is if Google should crawl URLs that contain the parameter.
In most cases, it is better to let Googlebot crawl the page instead of blocking crawling. This gives Google more insight into the website and more ability to find internal links to other pages. By crawling a list with a sorting parameter, Googlebot might be able to find internal links to the different items in that list. This could improve the performance of the pages for those individual items.
However, by crawling some parameters, Googlebot might create unnecessary problems for the website. Think back to the tracking parameters as an example—Googlebot crawling these parameters could log hits of those parameters in your analytics tool, inflating data and skewing reports. In other cases, especially on large websites, Googlebot might crawl parameters needlessly wasting crawl time and, potentially, shifting focus away from the content you’d rather Google crawl. In those cases, it makes sense to block Googlebot (and other robots) from crawling the page.
Step #4: Control and Manage Parameters
After evaluating your website’s parameters and considering each of the questions presented in Step #3, there are three different methods for controlling and managing parameters.
- Allow Crawling & Ranking. If the parameter meaningfully alters the content, Googlebot should crawl, index, and rank the page’s URL with the included parameters.
- Prevent Ranking. If the parameter does not alter the content, Googlebot should not rank the page’s URL with the parameter in search results.
- Block Crawling. If there is a problem with Googlebot crawling URLs containing a parameter, Googlebot should be blocked from crawling any URLs containing that parameter.
Allow Crawling & Ranking
Pages with parameters that you want Google to crawl and rank in search results should be treated like any other page on the website you want to be crawled or ranked. You want to do everything you can to encourage rankings by making this page as meaningful and distinct as possible. That includes changing the title, h1, or other text on the page when that parameter is applied to optimize the URL containing that parameter and increase the chances of that page ranking. As well, the page with the parameter should contain a self-referencing canonical.
As an example, consider an e-commerce website that has the main product list available at the URL mysite.com/product-list and a filtered product list available at a URL with parameters, mysite.com/product-list?size=large. Both pages could contain meaningful information and both pages could rank for distinct search terms. However, if the title, h1, and other text on the filtered product list page are all identical to the title, h1, and text on the main product list page, then these pages will compete for the same rankings. Along with differentiating the title, h1, and other text on the filtered page, the filtered page should contain a self-referencing canonical tag (meaning, the canonical should state mysite.com/product-list?size=large as the canonical version of the URL).
As another way to optimize the page, you want to make sure internal links point to the page with the parameters included in the URL and, potentially, list the URL with those parameters in the XML sitemap. For similar reasons, you could also build external backlinks to the URL containing that parameter.
Prevent Ranking
If the parameter does not meaningfully change the page's content, you want to keep this page from ranking in search results. There are two primary ways of preventing URLs containing specific parameters from ranking in search results: a canonical tag or a noindex directive.
On a page with a parameter, you can define a canonical tag and have the canonical point to a version of the page without that parameter. For example, https://www.site.com/test?sort=alpha might canonical to https://www.site.com/test instead. This would indicate to Google that you consider the page with the parameter to be a duplicate, or a near duplicate, of the page without the parameter.
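Concretely, the parameterized page would carry a canonical tag in its head pointing at the clean URL. A minimal sketch for the example above:

```html
<!-- In the <head> of https://www.site.com/test?sort=alpha -->
<link rel="canonical" href="https://www.site.com/test" />
```

The clean page at https://www.site.com/test would carry the same tag pointing at itself (a self-referencing canonical), so both versions agree on which URL should rank.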
In most cases, Google will index the canonical version of the URL that does not include the parameter. However, it is important that internal links support the canonical choices as much as possible to reinforce what the canonical tag is communicating to Google. In the previous example, if /test were selected as the canonical, internal links to /test?sort=alpha should be minimized.
As an alternative, a meta robots noindex tag can be used on any page containing a specific parameter. This is a stronger signal than a canonical and instructs Google to keep the page with that parameter appended out of the index and out of rankings. Typically this method should only be considered if Google is not respecting the canonical tags.
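The noindex directive goes in the page's head (it can also be sent as an X-Robots-Tag HTTP header):

```html
<!-- In the <head> of the parameterized page you want kept out of rankings -->
<meta name="robots" content="noindex" />
```

Remember that Googlebot must be able to crawl the page to see this tag, which is relevant to the crawl-blocking discussion below.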
Block Crawling
The easiest way to prevent Googlebot, or other robots, from crawling a page containing a specific parameter is to use the disallow directive in the robots.txt file. This instructs bots to ignore any disallowed page when crawling the website. Internal links should only minimally use the disallowed parameter paths to further discourage robots from finding and crawling the pages. Here is an example of a disallow statement blocking crawls of the "test" parameter:
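A minimal robots.txt sketch; the * wildcard is supported by Googlebot and most major crawlers, and the two patterns cover the parameter whether it appears first or later in the query string:

```
User-agent: *
Disallow: /*?test=
Disallow: /*&test=
```

Swap "test" for the actual parameter key you want blocked, and test the patterns against sample URLs before deploying, since an overly broad wildcard can block pages you want crawled.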
The disallow directive is optional for robots to follow. If URLs with the parameter should never be crawled by any robot, stronger measures need to be taken. For example, consider a parameter that exposes a development environment. In these cases, you want to carefully control who sees the development environment and who doesn't. A disallow wouldn't be a strong enough signal to ensure robots don't crawl the development environment pages. A better option is to restrict access to certain IP addresses or require people to log in before accessing the page with that parameter. Learn more about handling development environments for SEO.
The problem with preventing crawling, whether with the disallow or with stronger measures, is that it only prevents crawling. It does not have any influence over whether the page with the parameter appears in search results. If robots find the URL with that parameter in some other way, the page may still end up in search results. This is because the noindex directive, which instructs Googlebot to keep the page out of rankings, is included on the page itself; if Googlebot can't crawl the page, it can't see the noindex. You can end up with pages that are indexed even though Googlebot hasn't crawled them. Google will state that "No information is available for this page" in the ranking.
This is part of why it is generally best to allow robots to crawl pages with parameters. If robots can crawl, you can more easily control the indexing of the page. If you must prevent crawling, the other way of keeping the pages out of rankings is with Google Search Console's removal tool. You can state the parameter as part of the prefix. For example, removing URLs with the prefix https://www.test.com/some-url/?test= would fix the indexed example above.
Parameters present a number of challenges for SEO. URLs with parameters can lead to duplication, which makes it harder for all pages on the website to rank. Parameters can also cause robots to crawl unnecessary pages, missing out on the important pages on the website.
It is important to review your website to see what parameters are used and what changes you need to make to handle parameters effectively. The process can be broken down into a few main steps: finding, extracting, evaluating, and controlling parameters. To find parameters, check your analytics, log files, and conduct a site crawl. When evaluating parameters, consider if they meaningfully change the page’s content, if they should rank in search results, and if Googlebot should crawl URLs with the parameter. Finally, you want to carefully determine which parameters robots should be allowed to crawl or rank, and which parameters should be blocked from ranking or crawling. By following these steps, you can effectively manage your website’s parameters to improve SEO performance.
If you have questions about managing parameters on your website, please contact me today. For more help with technical SEO, you can also check out my book, Tech SEO Guide, which provides a reference to managing parameters and many other aspects of the technical side of SEO.