Avoid SEO Problems with Dev & Staging Sites
August 13, 2022
When developing a new website or building enhancements for an existing website, it is critical to have a place to test those changes before releasing them to the public-facing live, or production, website. This testing environment is often referred to as a dev site (development site) or a staging site. Some companies have a complex configuration with multiple testing sites. On the simpler side, other companies might not have a separate testing site and instead set up a testing area within their production website. For simplicity, in this article, I’ll lump all of these areas under the phrase “dev site” because the same considerations, complications, and solutions largely apply to each type of configuration.
SEO Problems with Dev Sites
Dev Site Duplication
The main problem with a dev site, from an SEO perspective, is duplication. By design, the dev site will contain some or all of the same content as the production website, because duplicating most of the content allows for more effective testing of any proposed changes. However, if Google’s bots are able to find and crawl the dev site, they’ll see the dev site as duplicating the content contained on the production website. If Google detects a duplicate website (dev site or otherwise), your traffic can plummet.
There are a few different ways Google responds when bots have detected duplicate content:
- One possible response is that they will simply ignore the duplicate content. Google has seen the dev site and knows it duplicates the production website but hasn’t deemed it enough of a problem to do anything about it. This is the best-case scenario but not the most common response when Google detects a dev site that duplicates the large majority of a website.
- A more common response is that Google starts ranking the dev site in search results, often in place of the production website. The upside is that you haven’t completely lost traffic or rankings. Instead, the traffic and rankings are going to the dev site rather than the production website (though you probably don’t want traffic arriving on the dev site).
- Less commonly, Google responds to duplication on a dev site by no longer ranking the production website, without ranking the dev site in its place. In this case, all traffic and rankings are simply lost. This can be seen in the graph below, where Google detected a dev site that contained duplicate content along with a lot of content it determined was spammy. The spammy content was simply placeholder content used in the staging environment. It was months before the problem was fixed, and during that time this site lost significant amounts of traffic.
Additional SEO Problems from Dev Sites
Along with losing traffic or rankings due to duplication, dev sites can create a few other problems that directly or indirectly can affect your website’s SEO performance:
- Overloading crawl budget. Typically, the volume of hits from Googlebot (crawl budget) won’t present a concern on a production website because production websites have reliable and robust hosting that can handle more traffic. However, dev sites typically don’t have very robust hosting and Googlebot crawling a dev site can cause that dev site to crash. Googlebot will then see server errors which can cause a variety of indirect problems for how Googlebot sees your website. Not to mention, your dev site might be offline at the time you need it for development work.
- Exposing private information. Dev sites often contain information you are not yet ready to share publicly, like new products or services you are about to release. Dev sites also are a work in progress so new features or tools will likely be buggy on a dev site while they are being developed. In other cases, dev sites contain placeholder information because accurate information is not yet ready. You do not want Googlebot—or customers—seeing information about products that haven’t been released, placeholder information, buggy in-progress features, or other incomplete content contained on dev sites.
- Generating backlinks. Along with Googlebot, numerous other bots are crawling the web looking for websites. In some cases, those other bots may come across the dev site and then list it in some type of directory. Those listings create links that can lead Googlebot to the dev site, causing the duplication problems discussed above.
Solving Dev Site Problems
Ideally, you can configure a dev site correctly from the start to avoid SEO problems. However, reality isn’t always ideal and you’ll sometimes end up with dev sites being found by Google and creating problems. Before we discuss the best way to configure a dev site, let’s review how you address a few common scenarios where Google has indexed, crawled, or begun ranking any pages from the dev site.
Scenario #1: Dev Site Indexed, Ranking, and Getting Traffic
This is the worst-case scenario. Not only has Google found and crawled the dev site, it is also ranking the dev site in search results, and presumably ranking it well if traffic is arriving. How big a problem this is depends on how much traffic is coming to the dev site, which terms the dev site ranks for, and whether the production website has lost any traffic.
In more severe circumstances, where there is significant traffic coming to the dev site or if the dev site has captured key rankings, the solution typically involves redirecting the dev site to the production website. A redirect will prevent Google from seeing the dev site while also providing the clearest signal to Google that the duplicate content has been resolved. This should lead Google to stop ranking the dev site and rank the production website that has been redirected to instead. For example, if the dev site is now ranking for branded terms or terms related to key products or services, then a redirect should help transfer those rankings back to the production website instead.
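As a rough illustration of what that redirect needs to do, the sketch below maps each dev URL to its production equivalent while keeping the path and query string, so a ranking dev page points at the matching production page rather than the homepage. The hostnames `dev.example.com` and `www.example.com` and the function name are assumptions made up for this example, and the actual redirect (typically a permanent 301) would be configured on your web server rather than in Python.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Hypothetical hostnames, used only for illustration.
DEV_HOST = "dev.example.com"
PRODUCTION_HOST = "www.example.com"


def production_redirect_target(dev_url: str) -> Optional[str]:
    """Return the production URL a dev URL should redirect to.

    Returns None when the URL is not on the dev host, so nothing
    outside the dev site is redirected by mistake.
    """
    parts = urlsplit(dev_url)
    if parts.hostname != DEV_HOST:
        return None
    # Keep the path and query so each dev page maps to its production twin.
    return urlunsplit(("https", PRODUCTION_HOST, parts.path or "/", parts.query, ""))


if __name__ == "__main__":
    print(production_redirect_target("http://dev.example.com/products/widget?color=blue"))
```

Mapping page to page, instead of sending everything to the homepage, is a design choice that keeps the signal to search engines as specific as possible.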
By redirecting your dev site to the production website, you have essentially shut down your dev site. As a result, the second step will be creating a new dev site elsewhere. See below for steps on how to create a new dev site.
Along with shutting down and redirecting the current dev site, you also need to determine how Google found the dev site. Is there a link on your website that led bots (and, possibly, people) to the dev site or is there an external link? For any links within your control, update any links referencing the dev site to reference the production website instead.
If the traffic coming to the dev site isn’t very important, then see the recommendations below and treat the problem as if the dev site is indexed but not ranking.
Scenario #2: Dev Site Indexed but Not Ranking or Getting Traffic
If Google has already begun indexing the dev site, it is likely only a matter of time until pages from the dev site begin ranking in search results. So, you need to move quickly to start shutting down the dev site so it doesn’t end up ranking and creating a bigger problem. The same applies if the dev site is getting only small amounts of traffic or ranking for unimportant terms; if the dev site is getting some traffic or earning some rankings now, it may start to get more if immediate action isn’t taken.
The best first step is to restrict access to the current dev site (see the section below for details). It is also helpful to submit a removal request to prevent the dev site from ranking in search results (or to remove it from rankings that currently exist). Finally, as in the previous scenario, you want to determine how Google’s bots found the dev site and remove any links that reference the dev site.
Scenario #3: Dev Site Crawled/Discovered but Not Indexed or Ranking
If Google has found your dev site but hasn’t begun indexing or ranking pages from the dev site, you are in a less critical situation. It is still important to make changes to prevent future problems, but you have a bit more time to work to resolve the problem.
The best course of action is to restrict access to the dev site to prevent Google from crawling it any more than it already has. See below for instructions on how to restrict access. Typically, you do not need to submit removal requests or redirect anything in this particular scenario. However, it is important to find the links that led Googlebot to the dev site and remove those.
Preventing SEO Issues When Establishing a New Dev Site
To avoid SEO problems, the primary goal is to restrict access to the dev site. Along with restricting Googlebot’s access, you also want to restrict access to the general public and other bots too. If Google can’t see the dev site, then you should be able to avoid the SEO problems related to dev sites and avoid disrupting SEO performance of your production website. Let’s discuss a few methods you should and shouldn’t use to restrict access.
Method #1: Disallow in Robots.txt (Not Recommended)
The disallow statement in the robots.txt file asks Googlebot not to crawl the indicated pages or website. That can help ease crawl budget concerns and prevent Googlebot from seeing the duplication contained on the dev site.
However, the dev site can still rank in search results even if Googlebot is prevented from crawling, because Google can index a URL it discovers through links without ever crawling the page. That can still cause issues for the production website. As well, the disallow statement doesn’t prevent the general public or bots other than Googlebot from accessing the dev site.
Given that, the disallow on the robots.txt file is typically not the best approach to restrict access to the dev site.
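To make the limitation concrete, here is a minimal sketch using Python’s standard `urllib.robotparser` to show how a well-behaved crawler would read a blanket disallow rule. The robots.txt content and the `dev.example.com` URL are assumptions for illustration. Note what the check does and doesn’t tell you: it only reports what a compliant bot is asked to do, and nothing in it blocks a request.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a dev site: ask every crawler to stay out.
DEV_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether a compliant crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://dev.example.com/pricing"  # hypothetical dev URL
    print(is_crawl_allowed(DEV_ROBOTS_TXT, "Googlebot", url))
```

The rule is a request rather than a lock: a person, or a bot that ignores robots.txt, can still load every page.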
Method #2: Noindex in Meta Robots (Not Recommended)
Another option some take to restrict access to the dev site is to add a meta robots noindex tag to all the pages on the dev site. Using a meta robots noindex will typically prevent Googlebot from indexing and ranking the pages contained on the dev site.
However, Googlebot can still crawl the dev site and may still negatively evaluate the production website if any duplicate content is found. As well, the noindex doesn’t prevent other bots or the general public from accessing or linking to the dev site.
As a result, the meta robots noindex is not an appropriate solution for the dev site.
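If a dev site already relies on noindex, or you are auditing one, a quick check like the sketch below can confirm whether a page actually carries the tag. It uses Python’s standard `html.parser`; the class and function names and the sample markup are assumptions for this example, and it only inspects the meta tag, not an `X-Robots-Tag` HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Record whether a page contains a meta robots tag with noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        directives = [d.strip().lower() for d in attributes.get("content", "").split(",")]
        if "noindex" in directives:
            self.noindex = True


def has_noindex(html: str) -> bool:
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


if __name__ == "__main__":
    sample = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(has_noindex(sample))
```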
Method #3: Password Protect Dev Site (Recommended)
The best solution is to only allow authenticated users to access the dev site by requiring a password. On Apache, a password restriction can be added in the .htaccess file, and there are similar methods to add password protection on Windows. A password restriction prevents Googlebot, other bots, and the general public from accessing the dev site so that only authenticated users can visit it; Googlebot can’t crawl a page it can’t access. Plus, Googlebot won’t index or rank a page that requires authentication, which prevents the issues related to duplication discussed above.
The downside is that a password restriction can sometimes make it harder to test the dev site. Third-party scripts may not execute properly behind a password restriction. Testing tools might not be able to get through the password requirement. API calls may fail because of the restrictions, and more. While password protection is the most reliable way to avoid SEO problems, another method can be less disruptive for development while offering the same SEO protection.
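The Apache approach above lives in server configuration. Purely to illustrate the same idea in code, the sketch below wraps a Python WSGI application in HTTP Basic authentication, which might apply if a dev site happens to run on a Python app server. This is not the .htaccess method, and the credentials and function names are placeholders invented for the example.

```python
import base64
import hmac

# Hypothetical credentials for illustration; load real ones from secure configuration.
DEV_USERNAME = "devuser"
DEV_PASSWORD = "change-me"


def require_basic_auth(app):
    """Wrap a WSGI app so every request must carry valid Basic credentials."""
    expected = base64.b64encode(f"{DEV_USERNAME}:{DEV_PASSWORD}".encode("utf-8"))

    def protected_app(environ, start_response):
        header = environ.get("HTTP_AUTHORIZATION", "")
        scheme, _, token = header.partition(" ")
        supplied = token.strip().encode("utf-8")
        # compare_digest avoids leaking information through timing differences.
        if scheme.lower() == "basic" and hmac.compare_digest(supplied, expected):
            return app(environ, start_response)
        # No valid credentials: answer 401 so bots and visitors never see the content.
        start_response("401 Unauthorized", [
            ("WWW-Authenticate", 'Basic realm="Dev site"'),
            ("Content-Type", "text/plain; charset=utf-8"),
        ])
        return [b"Authentication required."]

    return protected_app
```

Because the check wraps the whole application, every URL on the dev site answers with 401 until credentials are supplied, which is the behavior that keeps crawlers out.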
Method #4: IP Block (Recommended, Alternative)
Instead of authenticating based on a password, you can restrict access based on IP address. This method has all the same upsides as password protection but can be less disruptive during development. On Apache, in the .htaccess file, you can set “deny from all” to block everybody from accessing the dev site. Then, you can allow specific IP addresses for staff, developers, designers, third parties, testing tools, etc. that do need to access the dev site. (Apache 2.4 and later replaces this older syntax with “Require” directives, such as “Require ip”.) There are similar methods for Windows servers.
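The deny-by-default logic can be sketched in a few lines of Python using the standard `ipaddress` module. This is an illustration of the rule, not the Apache configuration itself; the networks listed are reserved documentation ranges standing in for real office or tool addresses, and the names are assumptions for this example.

```python
from ipaddress import ip_address, ip_network

# Hypothetical allow list, using address ranges reserved for documentation.
ALLOWED_NETWORKS = [
    ip_network("203.0.113.0/24"),   # e.g. an office network
    ip_network("198.51.100.7/32"),  # e.g. a single testing tool
]


def is_ip_allowed(client_ip: str) -> bool:
    """Deny by default; allow only addresses inside an allowed network."""
    try:
        address = ip_address(client_ip)
    except ValueError:
        # Anything that is not a valid IP address is refused.
        return False
    return any(address in network for network in ALLOWED_NETWORKS)


if __name__ == "__main__":
    for candidate in ("203.0.113.25", "198.51.100.8"):
        print(candidate, is_ip_allowed(candidate))
```

Keeping the list short and reviewing it when staff or vendors change helps prevent the allow list from quietly growing into open access.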
Don’t Forget: Remove Links to the Dev Site
This sounds obvious, but you want to make sure no links to the dev site are exposed publicly. It is surprisingly easy for links to the dev site to sneak out onto the production website or external websites. A mistake in the code might cause the XML sitemap to use the dev site URL. Internal links added during development might reference dev URLs. Images might be hosted on a dev URL instead of the production website.
After releasing any major changes from the dev site, crawl the production website to identify any links using the dev site’s URL. Change those links immediately to prevent Googlebot, or others, from accessing your dev site. Also, be sure to monitor external backlinks to the dev site domain and, if possible, ask the website responsible for those links to update any links referencing the dev site. If there are too many websites linking to the dev site, you may need to shut down your existing dev site and move it elsewhere that cannot be found.
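A dedicated crawler will do this at scale, but the core check is simple enough to sketch. Assuming you already have a production page’s HTML in hand, the example below lists every `href` or `src` that points at the dev host. The hostname `dev.example.com` and the names used are hypothetical, and the sketch ignores other places URLs can hide, such as XML sitemaps, inline CSS, or JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

DEV_HOST = "dev.example.com"  # hypothetical dev hostname


class DevLinkFinder(HTMLParser):
    """Collect (tag, url) pairs whose href or src points at the dev host."""

    URL_ATTRIBUTES = {"href", "src"}

    def __init__(self, dev_host):
        super().__init__()
        self.dev_host = dev_host
        self.dev_links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in self.URL_ATTRIBUTES or not value:
                continue
            try:
                host = urlsplit(value).hostname
            except ValueError:
                continue  # skip malformed URLs instead of stopping the scan
            if host == self.dev_host:
                self.dev_links.append((tag, value))


def find_dev_links(html, dev_host=DEV_HOST):
    finder = DevLinkFinder(dev_host)
    finder.feed(html)
    return finder.dev_links


if __name__ == "__main__":
    page = (
        '<a href="https://dev.example.com/pricing">Pricing</a>'
        '<img src="https://www.example.com/logo.png">'
        '<a href="/about">About</a>'
    )
    print(find_dev_links(page))
```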
Dev sites can create a lot of problems for your website’s SEO performance. However, if Googlebot ends up crawling, indexing, or ranking the dev site, it is also a problem that is generally easy enough to resolve. The main step is to restrict access to the dev site—password or IP-based authentication methods are the best way to keep bots and people off the dev site. Remember to avoid exposing any links to the dev site to prevent people or bots from even knowing where the dev site exists.
If you are currently facing problems related to configuring a dev site or have any questions about preventing SEO problems related to dev sites, please contact me.