Search engine spiders can be very forgiving with a lot of duplicate content issues. I’ve found that, given enough time, the engines learn when two websites or web pages are complete duplicates of each other. Once they figure that out, they basically understand that a link to one is a link to the other. One version will ultimately be dropped from the index in favor of the other.
There are two basic problems with this. First, it all takes time. Until the search engines figure out which dupes should be “merged” you’re essentially splitting link flow. Two inbound links, one to each version, produce only half the power of two links both pointing to a single version.
The second problem is that you leave it to the search engines to decide which pages or site should be dropped from the index. When you let the search engines decide, you lose essential control.
I’ll probably say this in every post I make about duplicate content, so forgive me if you’ve heard it before, but the less you make the search engines think, the better. When it comes to duplicate content issues, they want to be told what to think. And you can do that by not presenting two versions of the same page.
One issue we’ve come across, especially with e-commerce sites, is when products can be accessed via both secure and non-secure URLs.
This issue is typically caused by poorly implemented site navigation and linking. What happens is that the shopper adds a product to the shopping cart. At that point they enter into the secure pages. But when the shopper continues shopping, instead of proceeding to checkout, they navigate back into the site keeping the https: in the browser URL.
There are a couple of fixes for this. The first is to not allow your visitors to enter the secure areas of the site until they are ready to check out.
There is no reason to go secure just because a product was added to the cart. The place to go secure is when they hit the checkout button. But, and this is important, if they leave the checkout process to continue shopping, they need to be placed back onto non-secure pages.
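To make that rule concrete, here is a minimal sketch of the decision a site could make on each request. Everything specific in it is an assumption for illustration: the `/checkout` path prefix, the function names, and the example.com URLs are hypothetical, and it assumes URLs without explicit port numbers. Your cart software will have its own way of issuing the redirect.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumption: every checkout page lives under this path prefix.
SECURE_PREFIXES = ("/checkout",)


def expected_scheme(path: str) -> str:
    """Return the scheme a path should be served on."""
    return "https" if path.startswith(SECURE_PREFIXES) else "http"


def redirect_target(url: str) -> Optional[str]:
    """Return the URL to redirect to, or None if the scheme is already right."""
    parts = urlsplit(url)
    wanted = expected_scheme(parts.path)
    if parts.scheme == wanted:
        return None
    # Swap only the scheme; host, path, and query string stay the same.
    return urlunsplit(parts._replace(scheme=wanted))
```

With this in place, a shopper who wanders back to a product page while still on https: would be sent to the http: version, and only the checkout pages would stay secure. A permanent (301) redirect is the usual choice here, since it also tells the spiders which version to keep.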
This leads us to our second fix: Use absolute URLs in all site navigation and shopping cart pages.
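Quick refresher: an absolute link uses the full URL in the link, including the protocol (http: or https:) and the domain name.
A relative link uses only the path from the current location to the destination, so the protocol and domain are inherited from the page the visitor is already on.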
When you use relative links, a shopper who is already on a secure (https:) URL will stay on secure URLs. When you use absolute links, you force the visitor to go to http: instead of https:
When shoppers can reach both secure and non-secure versions of a page that should only be non-secure, the search engines likely can as well. This creates an almost complete duplicate of your site: one secure version and one non-secure version. Using absolute links ensures that your navigation never sends visitors or spiders to a regular page in secure mode, which prevents the duplication.
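You can see the difference by resolving both kinds of link the way a browser would. This is a small sketch using Python's standard `urljoin`; the example.com URLs and the variable names are made up for illustration.

```python
from urllib.parse import urljoin

# Hypothetical page the shopper is on after adding an item to the cart.
current_page = "https://www.example.com/cart/view"

# The same "continue shopping" link written two ways.
relative_href = "/products/widget"
absolute_href = "http://www.example.com/products/widget"

# A relative link inherits the protocol and domain of the current page.
resolved_relative = urljoin(current_page, relative_href)

# An absolute link carries its own protocol, so the current page is ignored.
resolved_absolute = urljoin(current_page, absolute_href)

print(resolved_relative)  # https://www.example.com/products/widget
print(resolved_absolute)  # http://www.example.com/products/widget
```

The relative link leaves the shopper on https:, which is exactly how the secure duplicate of a product page gets visited and, eventually, spidered.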