This series is pulled from a presentation given at SMX East. Part I of this series covered the problems duplicate content creates. This post covers the causes of duplicate content, and Part III will look at the solutions you need to implement to fix your duplicate content problems.
Quick recap of Part I: Duplicate content causes problems. Duh!
Now that we got that out of the way, let’s take a look at the causes of duplicate content and bad URLs so we can then learn how to fix this mess and get on with better search engine rankings!
There is No Single Cause of Duplicate Content. Don’t collect them all!
I started off this series discussing the problems duplicate content causes. I did that deliberately because, if it’s not understood that there is a problem, the causes and solutions really won’t be of great importance to the reader. But, now that we know a problem exists, we have to identify the causes so we can then fix them.
If you’re gonna kill off your duplicate content, you first have to know what causes it. Any problem, until it has been recognized and analyzed, cannot be properly corrected.
The image above shows an example of several URLs that can all lead to the same content. In fact, these would all be considered the home page of the website. While it’s really only one page, there are four different URLs that can be used to access this page.
The search engines can pretty easily figure out that these URLs are really only one page. But still, they tell us how to “fix” the problem by giving us a canonical URL tag that we can implement just in case.
Eventually, the engines do figure this out on their own (even without the canonical tag) but not always as quickly as we would like, and not before we already start splitting link value on the site. Any site looking to get some strong improvements quickly shouldn’t wait around for the search engines to figure the site out. Be bold! Be proactive! Recognize the problem, and fix it!
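To make the home page example concrete, here is a minimal Python sketch of how four URL variants can collapse into one page. The example.com URLs, the preferred host, and the list of default document names are all assumptions for illustration, not the actual URLs from the presentation.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical default-document names; adjust to whatever your server uses.
DEFAULT_DOCS = {"index.html", "index.htm", "index.php", "default.aspx"}


def normalize_home_url(url, preferred_host="www.example.com"):
    """Map common home page variants onto a single preferred URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Treat the bare domain and the www host as the same site.
    if host == preferred_host or "www." + host == preferred_host:
        host = preferred_host
    path = parts.path or "/"
    # Collapse a trailing default document into its directory.
    head, _, tail = path.rpartition("/")
    if tail.lower() in DEFAULT_DOCS:
        path = head + "/"
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


# Four assumed variants that all serve the same home page.
variants = [
    "http://example.com",
    "http://www.example.com",
    "http://example.com/index.html",
    "http://www.example.com/index.html",
]
canonical = {normalize_home_url(u) for u in variants}
print(canonical)
```

Four URLs go in and one comes out. A search engine has to do that same bit of guesswork for every site it crawls, which is why it pays to not leave it up to them.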
Poor product categorization
Product categorization can create a lot of duplicate content problems if not implemented correctly. With a lot of systems, every category a single product fits into creates a separate URL through which that product can be accessed. If your product fits neatly into three categories, you now have three duplicate pages. If ten categories, ten duplicate pages. You can see the problem here.
While I’ll save the solution for the next part in the series, product categorization can be a bit tricky. We want our products to be accessible. This isn’t shopping in a store where the product can only be in one place. The beauty of it being online is that one product can be found in multiple “aisles” at the same time. The tricky part is, by doing this improperly, you may be making your products easier for visitors to find on your site, but more difficult for the search engines to find.
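If you want to see how many “aisles” your own products are sitting in, a rough sketch like the one below can help. It assumes a URL pattern where the last path segment is the product slug and everything before it is the category. That pattern, the example.com URLs, and the `group_by_product` helper are hypothetical, so adapt them to however your cart builds its URLs.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_product(urls):
    """Group URLs by their final path segment, assumed to be the product slug."""
    groups = defaultdict(list)
    for url in urls:
        slug = urlsplit(url).path.rstrip("/").rsplit("/", 1)[-1]
        groups[slug].append(url)
    return dict(groups)


# Assumed crawl output: one product filed under three categories.
urls = [
    "http://www.example.com/kitchen/blue-widget",
    "http://www.example.com/gifts/blue-widget",
    "http://www.example.com/sale/blue-widget",
    "http://www.example.com/kitchen/red-gadget",
]
groups = group_by_product(urls)

# Any product reachable through more than one URL is a duplicate candidate.
duplicated = {slug: found for slug, found in groups.items() if len(found) > 1}
for slug, found in duplicated.items():
    print(slug, "is reachable at", len(found), "URLs")
```

Run against a real crawl export, a list like this shows which products are splitting their value across category paths.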
Secure page issues
Most sites don’t need to secure any of their pages until the visitor moves into the shopping cart area. Once there, visitors can feel safe knowing their personal information isn’t going to be accessible to prying eyes. But, once in the secure area of the site, there will often be links back out to the main site. Sometimes, these links maintain the secure “https” in the URL.
You wouldn’t think this would be a problem, right? Who cares if the pages they continue to visit stay “secure” or not? The problem is that security doesn’t mean locked down. It just means your information is protected. But, regular site pages generally don’t need to be protected. Once a visitor accesses a secure page, that URL now has the opportunity to find its way into the search engine index.
Can we say: duplicate content problem? (I knew you could.)
This opens up what can be a Pandora’s Box of secure, yet duplicate, content that makes its way into the engine database and begins to steal value from your non-secure pages. Not good.
There are good links and bad links. And I’m not talking about the type of sites being linked to. I’m talking about the code used to link to pages.
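One way to spot this on your own site is to look for pages that show up under both “http” and “https”. The sketch below is only an illustration: the URL list is made up, and `scheme_duplicates` is a hypothetical helper, not part of any SEO tool.

```python
from urllib.parse import urlsplit


def scheme_duplicates(urls):
    """Return host/path keys that appear under both http and https."""
    seen = {}
    for url in urls:
        parts = urlsplit(url)
        key = parts.netloc.lower() + (parts.path or "/")
        if parts.query:
            key += "?" + parts.query
        seen.setdefault(key, set()).add(parts.scheme.lower())
    # Keep only the pages that were found with both schemes.
    return sorted(k for k, schemes in seen.items() if {"http", "https"} <= schemes)


# Assumed crawl output: the cart is secure only, but "about" leaked into https.
crawled = [
    "http://www.example.com/about",
    "https://www.example.com/about",
    "https://www.example.com/cart",
]
print(scheme_duplicates(crawled))
```

Anything this returns is a regular page that has picked up a secure twin.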
I’d tell you all about good link code, but that would spoil Part III. Gotta keep the suspense somehow! But, for now, the image above shows two kinds of bad link code you need to be aware of.
Session IDs are a mess. To give you the gist, every user is assigned an ID number that is appended to the URL of whatever page they are visiting. Yeah, you heard that right. Every user. That means that a unique URL is created for each page for every visitor that lands on your site. Got 10K visitors this month that only visited one page? You now have 10K URLs out there that could be indexed by the search engines, all of them duplicates.
Told you. Mess.
Session IDs Create Duplicate Page Farms
Just to give you a bit of a visual on what session IDs do, the image above depicts a single page linking to other pages on your site. It’s all well and good with the first session, because you only have a single URL for each of those pages. But, when you get into sessions 2 and 3 and 10,000, all the same pages are now duplicate pages. A 10-page site now has 100,000 URLs that the search engines are indexing, most of which carry the same content over and over again.
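The math is easy to check for yourself. This sketch builds the URLs for a 10-page site across 10,000 sessions and then strips the session ID back off. The `sessionid` parameter name and the example.com page names are assumptions; real systems use all kinds of names for this.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SESSION_PARAM = "sessionid"  # hypothetical parameter name


def strip_session_id(url):
    """Drop the session parameter and keep the rest of the URL untouched."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() != SESSION_PARAM
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )


# A 10-page site visited in 10,000 sessions.
pages = [f"/page-{n}.html" for n in range(1, 11)]
sessions = range(1, 10_001)
crawlable = {
    f"http://www.example.com{page}?sessionid={session}"
    for page in pages
    for session in sessions
}
unique_pages = {strip_session_id(url) for url in crawlable}

print(len(crawlable), len(unique_pages))  # 100000 10
```

Ten real pages, one hundred thousand URLs. Every one of those extra URLs is a copy of a page the engines already have.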
Any time a URL is forced to change based on the visitor, you’re going to have a problem. Not only do you get some duplicate content problems, but your site becomes a duplicate content farm, pumping out more and more duplicate content with every visitor that comes to your site. Each visitor essentially devalues your site in the eyes of the search engine.