Duplicate Content & Canonical URLs
What It Is
Duplicate content is a page, or a substantial section of a page, that appears in exactly the same form, or a very similar form, at multiple URLs on your website. It can happen for a variety of reasons. One of the more common causes is a programmatic choice, where the website’s underlying platform allows the same page to be returned at multiple URLs. Another cause is a poor or messy information architecture, where the same content is placed in multiple sections of the website. In other cases, duplicate content results from mismanagement of the website (e.g., two different people unknowingly create the same page). The first step to fixing duplicate content is understanding the cause; a technical problem will be resolved differently than an information architecture or management problem.
Duplicate Content Example
Let’s take an ecommerce site as an example. On ecommerce sites, it is common to have filtering and sorting. Let’s say these three URLs exist and list the same products, albeit in a slightly different order.
/product-list.html
/product-list.html?sort=color
/product-list.html?sort=price
These three pages are duplicates. While they may not be exactly the same, they do serve a very similar purpose to each other. The second and third example URLs contain a sort parameter (“?sort=color” or “?sort=price”), which creates only a slight difference between these pages (in the way the products listed are sorted). But these pages would still have the same products, the same images, the same text, and, likely, the same title and description tags.
With that much similarity, these three URLs would be considered duplicate versions of the same page. That duplication may confuse Google as its crawlers try to decide which page to show in search results. In many cases, the page Google chooses to show may not be the page you would prefer people find. In this example, you may prefer people find the first URL instead of the sorted versions of the page. Google may also take action against a website when duplication appears intended to manipulate search results, although ordinary duplication like sorted listings is more typically handled by showing only one version.
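To spot this kind of duplication in a list of your own URLs, you could group addresses that differ only by a presentation parameter. The following is a minimal sketch, not a definitive audit tool. It assumes "sort" is the only parameter that changes presentation without changing content, and the names PRESENTATION_PARAMS, strip_presentation_params, and group_likely_duplicates are hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these query parameters only change presentation, not content.
PRESENTATION_PARAMS = {"sort"}


def strip_presentation_params(url):
    """Return the URL with presentation-only query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in PRESENTATION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_likely_duplicates(urls):
    """Group URLs that collapse to the same address once sorting is ignored."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_presentation_params(url)].append(url)
    # Only groups with more than one member are potential duplicates.
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Passing the three example URLs to group_likely_duplicates would place them in a single group keyed by /product-list.html. A matching address is only a hint; you would still want to compare the pages themselves before treating them as duplicates.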
Resolving Duplicate Content
There are many ways to resolve duplicate content. One option is to add redirects to consolidate multiple versions of a page into just one page. For example, you could redirect the URL /product-list.html?sort=price to /product-list.html. You can also remove the duplicated pages altogether (see our article about avoiding errors when removing pages). Or you can rewrite the duplicated pages so that each one is distinct.
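The redirect option can be sketched as a simple lookup from each duplicate URL to the one page that should remain. This is only an illustration under assumptions: the REDIRECTS map and the resolve_redirect function are hypothetical, and how you wire the result into your server depends on your platform. A 301 status is used because it signals a permanent move.

```python
# Hypothetical redirect map: duplicate URL -> the single page that should remain.
REDIRECTS = {
    "/product-list.html?sort=price": "/product-list.html",
    "/product-list.html?sort=color": "/product-list.html",
}


def resolve_redirect(request_target):
    """Return (status, location) for a consolidated URL, or None to serve the page normally."""
    target = REDIRECTS.get(request_target)
    if target is None:
        return None
    # 301 tells browsers and search engines that the move is permanent.
    return 301, target
```

Here resolve_redirect("/product-list.html?sort=price") returns the status 301 along with /product-list.html, while a request for /product-list.html itself returns None and is served as usual.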
Those methods assume you do not want to keep the duplicated version of the page. In the example above, however, you would likely want to keep the three pages on your website. Providing a way to see the same products sorted in slightly different ways may actually be helpful to the people visiting your website. If you do wish to keep the duplicated pages, you need to define a canonical URL. A canonical URL is the official or preferred version of a URL.
Defining A Canonical URL
When you have duplicate versions of a page, as in the example above, a canonical tag (more officially called a canonical link element) communicates to search engines which URL you prefer.
In the above example, you might consider the first URL (/product-list.html) to be the official or preferred version. This URL does not have a sort parameter, which makes the URL look nicer, and this page might list the products in the order you’d prefer most people see. However, if sorting by color is the most popular choice for your visitors, you might prefer the canonical URL be /product-list.html?sort=color instead. Alternatively, you may find that the third URL (sorted by price) gets the most attention from other websites or on social networks, and therefore the third version might make more sense as the canonical URL.
Canonical URL Example
After you select the canonical version of the URL, the canonical tag needs to be added to each potentially duplicated page. In the example above, any duplicated URLs would contain a canonical tag referencing the canonical URL you selected.
The canonical URL can be defined in two ways. The most common is to use a <link> element in the <head> of your web page. Here is an example of the canonical code with the URL in the href attribute.
<link rel="canonical" href="http://www.domain.com/canonical-url" />
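If your pages are generated by a template or script, the same tag can be produced in code so that every duplicated page receives an identical reference. Here is a minimal sketch; the canonical_link_tag function is a hypothetical helper, not part of any particular platform. It escapes the URL so that characters such as & remain valid inside the href attribute.

```python
from html import escape


def canonical_link_tag(canonical_url):
    """Build a canonical <link> element for the <head> of a page."""
    # Escape &, <, > and quotes so the URL is safe inside the attribute value.
    return '<link rel="canonical" href="{}" />'.format(escape(canonical_url, quote=True))


# Every duplicated page would carry the same tag pointing at the chosen URL.
tag = canonical_link_tag("http://www.domain.com/product-list.html")
```

In this sketch, the unsorted page and both sorted versions would all output the value of tag in their <head>.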
Another alternative is to add a Link header to your HTTP response headers. This is useful for non-HTML files such as PDFs (but typically requires technical support to add to your website).
Link: <http://www.domain.com/canonical-url>; rel="canonical"
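For files served by application code, the header value could be assembled as in the sketch below. The canonical_link_header function is a hypothetical helper, and how the returned pair is attached to a response depends on your server or framework, which is not shown here.

```python
def canonical_link_header(canonical_url):
    """Return a (name, value) pair for a canonical Link response header."""
    # Reject characters that would break the <...> wrapper or split the header.
    if any(ch in canonical_url for ch in "<>\r\n"):
        raise ValueError("URL contains characters not allowed in a Link header")
    return "Link", '<{}>; rel="canonical"'.format(canonical_url)


name, value = canonical_link_header("http://www.domain.com/canonical-url")
```

The resulting name and value correspond to the header line shown above and could be sent alongside a PDF or other non-HTML response.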
Supporting The Canonical Elsewhere
You should not rely on the canonical tag as the only means of communicating your URL preferences to search engines. Links throughout the rest of your website should link to the canonical (or official) version of the page as well. This avoids sending mixed signals to the search engines.
In the above example, if you define http://www.domain.com/product-list.html?sort=color as the canonical URL, but the majority of the links on your website reference http://www.domain.com/product-list.html, this would send conflicting signals about which version of the URL is really the definitive, authoritative, and canonical version. Instead, you would want the majority of links to reference /product-list.html?sort=color.
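One way to check for mixed signals is to count how many links on a page point at the canonical URL versus another version of the same page. The following is a rough sketch using Python's standard HTML parser. The LinkCollector class and count_link_targets function are hypothetical names, and the sketch assumes a "variant" is any link with the same host and path as the canonical URL but a different query string.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def count_link_targets(html, page_url, canonical_url):
    """Count links to the canonical URL versus variants of the same page."""
    collector = LinkCollector()
    collector.feed(html)
    canonical = urlsplit(canonical_url)
    counts = Counter()
    for href in collector.hrefs:
        # Resolve relative links against the page they appear on.
        parts = urlsplit(urljoin(page_url, href))
        if (parts.netloc, parts.path) != (canonical.netloc, canonical.path):
            continue  # a link to some other page entirely
        counts["canonical" if parts.query == canonical.query else "variant"] += 1
    return counts
```

If the "variant" count is higher than the "canonical" count across your site, your internal links are pointing away from the URL your canonical tag names, and the links are worth updating.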