Yeah, yeah, the search engines are getting smarter about duplicate content… blah, blah, blah. It’s no longer the problem it once was… yada, yada, yada. Google will get it all sorted out for you.
I don’t care how smart the search engines are, it’s no excuse for laziness. Sure, a maid may clean up your living room for you, but that’s no excuse to ask them to wipe your…, er, mouth, too.
The intelligence of the search engines is your fallback. Your backup girl. The friend you call when all your other friends are out of town or on dates.
In marriage terms, you may have a brilliantly smart spouse, but that doesn’t give you an excuse to be a dumb-ass. In fact, brushing up on your IQ points might actually score you some points where it counts. Search engines are no different.
If the collective intelligence of the search engines failed overnight, where would you be? Because that’s really the point, isn’t it? You can’t get by on someone else’s ability to make you look smart. Sooner or later someone smarter than you will come along and, well, there will be no saving grace for you. You’ll be shamed and put up for display for what you really are. It’ll be like showing up to the first day of school, naked. (Or am I the only one that has that recurring dream?)
Ooooookay, let’s move forward, shall we?
So, despite all the intelligence Google can muster, it’s still a good idea to fix your duplicate content problems. Duplicate content plays a role in which pages the search engines spider and index, as well as how many they index. Both are critical to getting visitors to your site and presenting them with the best pages for their query.
Here are six easy ways to eliminate your pesky duplicate content problems:
Remove old files from your server
This isn’t something that most people think about as a duplicate content solution, but it’s a pretty big one. Over the years a typical site goes through designs, re-designs, development, and re-development. Developers will often work with their new files on their own server then upload them to the client’s server. Old files either get overwritten, or if the new developer changed file names or moved files around, the new files just get added to the pile while the old site files stay in place.
Now, if the new site is perfect, with no links pointing to the old file names, eventually the search engines will figure out which pages are the new, relevant pages for the site. But, as long as those old pages are in place, there is potential for duplicate content problems.
If those old pages are already in the search engine index, the engines continue to spider those pages and possibly keep them in the index. If a new version of that page was created with a different file name, you now have duplicate content.
Why would the engines do this? Perhaps there is a stray link on the site that points to one or more of these old pages. Maybe there are external links pointing to these pages that keep the engines returning.
By deleting all old files, you’ll be able to find lingering internal links and implement redirects to get visitors to the proper pages.
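If you want a head start on spotting the leftovers, here is a minimal sketch in Python. It assumes you have a copy of the site in a local folder and a list of the paths your current pages actually link to. The function name `find_orphan_files` and its inputs are my own inventions for illustration, and a file showing up in the result is only a candidate for removal, not proof that it is safe to delete.

```python
from pathlib import Path

# Hypothetical helper: list page files that nothing on the current site links to.
def find_orphan_files(site_root, linked_paths,
                      extensions=(".html", ".htm", ".php", ".asp")):
    """Return page files under site_root that are not in linked_paths."""
    root = Path(site_root)
    # Compare paths without the leading slash, e.g. "about.html".
    linked = {p.lstrip("/") for p in linked_paths}
    orphans = []
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in extensions:
            relative = path.relative_to(root).as_posix()
            if relative not in linked:
                orphans.append(relative)
    return sorted(orphans)
```

Review the list by hand before deleting anything, and set up redirects for any old page that still has external links pointing to it.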
Fix broken links
Broken link checks are great for finding broken links, but unless your duplicate pages have been removed, such a check won’t pick up on the duplicate content problem. Only after you remove those old files will you be able to identify and fix site links that go to these dead pages.
I recently worked on a client site that had hundreds of broken links after its re-development. With dozens of old files on the server, we had to first figure out which files were the correct files and then remove the incorrect dupes. Each round of broken link checks gave us more links to fix and more files to remove.
I literally spent 5 hours fixing broken links on a site because of old files mixed with new files, and links pointing to old files that were still on the server. I have no doubt they’ll see a lift in traffic and conversions from these fixes.
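To give a feel for what a broken link check does under the hood, here is a rough sketch in Python using only the standard library. It assumes you already know which paths exist on the cleaned-up site and that your internal links start with a slash. The names `LinkCollector` and `broken_internal_links` are hypothetical, and a real link checker does far more than this.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def broken_internal_links(html, existing_paths):
    """Return site-relative links in html that are not in existing_paths."""
    collector = LinkCollector()
    collector.feed(html)
    return [
        link for link in collector.links
        # Only look at links like "/page.html"; skip "//host/..." and full URLs.
        if link.startswith("/") and not link.startswith("//")
        and link not in existing_paths
    ]
```

Run it once per page after the old files are gone, and every path it returns is a link that needs fixing or a redirect.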
Link to the www. version of your site
When working on sites, I often see a mixture of links pointing to www.site.com/page and site.com/page. While this gets the visitor to the same page with the same content, it creates a unique URL that the search engines can index.
Eventually the engines get this snafu figured out, but why wait? Get it fixed now so it will never be an issue. Go through all your site links and direct them to the www. version of your site.
While I suggest this fix strongly, I also recommend preventing the non-www. URLs from being displayed. You can do this with your .htaccess file if your server supports it. You can also use Google Webmaster Tools to set your preferences to always use the www. version.
If I were me–and I am–I’d do all three of these options, just to stay on the safe side.
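For the first of those options, going through your site links, a small script can do the rewriting for you. Here is a minimal sketch in Python that shows the same mapping a redirect would apply. It assumes the site lives on one bare domain with no subdomains, ports, or logins in the URL, and `to_www` is a made-up name for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical helper: point a URL at the www. version of its host.
def to_www(url):
    """Return url with a www. host; URLs that already have one are unchanged."""
    parts = urlsplit(url)
    host = parts.netloc
    if host and not host.lower().startswith("www."):
        # Assumes a plain host such as "site.com" (no port, no subdomain).
        parts = parts._replace(netloc="www." + host)
    return urlunsplit(parts)
```

So `to_www("http://site.com/page")` gives `http://www.site.com/page`, while a link that already uses the www. version comes back untouched.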
Use absolute links
Absolute links are links that contain the full URL of the page being linked to. A relative link only works on internal site links and uses the least amount of information needed to get the visitor to the destination.
Relative links work just fine for navigating the visitors to the correct pages. But, if you don’t have the www. redirect fixes above implemented, relative links can sometimes pose duplicate content problems.
If someone comes to your site using the non-www. version (site.com) all relative links will, by nature, not include the www. This opens the door for the search engines to spider all your duplicate non-www. URLs.
Using absolute links that include the www. in the URL prevents this from happening. It doesn’t matter what URL the visitor used to get to your site, all remaining URLs will point to the correct www. version.
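If your templates are full of relative links, converting them is mostly mechanical. Here is a small sketch in Python using the standard library's `urljoin`. The `BASE` value and the `make_absolute` name are assumptions for illustration; swap in your own preferred www. address.

```python
from urllib.parse import urljoin

# Assumed preferred home URL; replace with your own www. address.
BASE = "http://www.site.com/"

def make_absolute(href, page_url=BASE):
    """Resolve href against the www. URL of the page it appears on."""
    return urljoin(page_url, href)
```

For a page at `http://www.site.com/products/list.html`, a link written as `../about.html` resolves to `http://www.site.com/about.html`. Links that are already absolute pass through unchanged.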
Use canonical tags
E-commerce sites have a special set of problems. You want to make your products available to your visitors through multiple navigation paths, but that often creates duplicate product pages based on the trail used to reach them. In this situation, my first solution is to create a master URL for each product and ensure that, regardless of the navigation path, that URL is the one displayed when the visitor reaches the page.
But, short of that, you have the option of using the canonical tag:
<link rel="canonical" href="http://www.example.com/proper-page.html" />
That little bad boy tells the search engines which page should be considered the “correct” version. So if you have multiple, duplicate product pages, you can add this canonical tag pointing to the proper page and the search engines will, in theory, not index the duplicate pages.
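Once the tags are in place, it is worth checking that each duplicate page really points where you think it does. Here is a minimal sketch in Python that pulls the canonical URL out of a page's HTML. `CanonicalFinder` and `find_canonical` are hypothetical names, and the sketch assumes the `rel` attribute holds just the word canonical.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_canonical = (attrs.get("rel") or "").lower() == "canonical"
        if tag == "link" and is_canonical and self.canonical is None:
            self.canonical = attrs.get("href")

def find_canonical(html):
    """Return the canonical URL declared in html, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

Feed it the source of each duplicate product page. If it returns `None`, or anything other than your master URL, that page still needs attention.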
Never link to /index.html
This is especially true of your home page, but can apply throughout your site. On your home page you have two options:
www.site.com
www.site.com/index.html (or .asp, .php, etc.)
Make sure all links going to your home page link to www.site.com and not the other.
Same with subfolders. The folder URL and its /index.html version will both take you to the same page. Pick the one you want to use, and stick with it in all your internal site links.
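If you settle on the shorter version, cleaning up existing links can be scripted. Here is a rough sketch in Python, assuming your index files are named index.html, index.htm, index.php, or index.asp and that the links carry no query strings. The `strip_index` name and the `INDEX_FILE` pattern are made up for illustration.

```python
import re

# Assumed index file names; adjust the extensions to match your server.
INDEX_FILE = re.compile(r"/index\.(html?|php|asp)$", re.IGNORECASE)

def strip_index(url):
    """Replace a trailing /index.* file name with a bare trailing slash."""
    return INDEX_FILE.sub("/", url)
```

With this, `http://www.site.com/index.html` becomes `http://www.site.com/`, and a regular page such as `http://www.site.com/about.html` is left alone.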
Implementing ALL of these fixes may seem like duplicate content fix overkill, but most of them are so easy there is no reason not to. It takes a bit of time, but the certainty of eliminating all duplicate content problems is well worth it.
Google’s pretty smart, but you’re smarter. You know it makes mistakes, but let those be mistakes in analyzing someone else’s site, not yours.