I don’t know about you but I’m glad to be back to work after such a long weekend. I don’t do well over long weekends. Mostly because I’m extremely lazy and I end up sleeping about 12 hours every night. And that’s not including the morning, mid morning, noon, post-noon and early evening naps.
So I’m glad to be back to work. Back to a regular schedule of long work hours, few sleeping hours and a few minutes on the pool table a day. While they are not quite as awesome as my “real” family, if you have to be stuck with any group of people for nine to twelve hours a day, you could do worse than the Pole Position Marketing team. They’re good peeps.
Lucky for you we have another installment of Q&A. For those of you new to this game here’s how it works. You post your questions in the comments below and then in the next installment I’ll answer them. Simple really. But there is one catch… you actually have to post your questions. I’ll make no attempt whatsoever to read your mind. I’m a bit of a clean freak and I know how dirty it can get in there!
So on to today’s question…
A question about 404s.

A site has been moved, twice really, and is now a WordPress blog.

30% of the HTTP status codes are 404s, about 2,500 hits in November.

Some are for pages that existed on the old site that do not exist on the new blog in any way. Some are for “old” directory structures that have since changed on the new blog and were never on the original site. We simplified the directory structure. domain.com/blog/resources/page.html is now domain.com/resources/page.html, for example.

The old structure only existed for a few weeks as the site was being built in WP. Others are page coding mistakes that are being fixed. Others are related to feeds, archives and so on. Google Webmaster Tools shows that they have indexed all but a handful of the new, correct URLs; a couple of old ones are still in their index. The vast majority of 404s are bots, robots, search engines and so on. Very few are people. I am using the WP plugin that emails me for every 404.
Should I care about and try to fix every single 404? Will the search engines eventually stop looking for old file/path names? Since they see my current content, does not finding old content affect me in the search engines in any way that I should care about? What would you recommend be done about the 404s?
Many thanks for the opportunity to ask the questions.
First, you win the award for longest question in history. I feel like I’m running for president and this is the part where I say that if I were president there would never be any 404 errors and all bad links would be healed. Forever. And nothing like this would happen again, just so long as we all hold hands and “care”.
But I digress.
Can you tell me the name of the plugin you’re using? That sounds like something I might want here.
Ok, on to business. The occasional 404 error happens, but if 30% of your requests are coming up with errors, I’d say you’ve got a problem. Even if these are not humans but only search spiders, continuing to serve broken links will end up affecting your performance with the search engines. Too many broken links and the search engines won’t spider or re-spider as many pages as they might otherwise. You’ll also likely take a hit with trust and quality scores, which will affect search engine rankings.
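If you want to check that ratio yourself rather than relying on the plugin’s emails, you can count 404s straight from the server’s access log. Here’s a rough sketch assuming Apache’s combined log format, where the status code is the ninth field; the sample log lines and the /tmp path are made up for illustration, so point the awk command at your real access log instead:

```shell
#!/bin/sh
# Hypothetical sample of Apache combined-log lines; substitute your
# real access log path (e.g. /var/log/apache2/access.log).
cat > /tmp/sample_access.log <<'EOF'
1.2.3.4 - - [01/Nov/2008:10:00:00 -0500] "GET /resources/page.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
1.2.3.4 - - [01/Nov/2008:10:00:01 -0500] "GET /blog/resources/page.html HTTP/1.1" 404 512 "-" "Googlebot/2.1"
5.6.7.8 - - [01/Nov/2008:10:00:02 -0500] "GET /feed/ HTTP/1.1" 200 2048 "-" "FeedFetcher"
5.6.7.8 - - [01/Nov/2008:10:00:03 -0500] "GET /old-page.html HTTP/1.1" 404 512 "-" "msnbot/1.1"
EOF

# In the combined log format, field 9 is the HTTP status code.
awk '{ total++; if ($9 == 404) notfound++ }
     END { printf "%d of %d requests (%.0f%%) were 404s\n",
           notfound, total, 100 * notfound / total }' \
    /tmp/sample_access.log
```

Run against the sample above, this reports 2 of 4 requests (50%) as 404s; run against a real month of logs it gives you the number to track as you fix redirects.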
So yes, you should care and try to fix 404s within reason. I say “within reason” because over time blogs do tend to accumulate broken links. Old blog posts link out to pages that have moved or been removed, and I believe search engines take this into account. But again, we are concerned here with the number of broken links being found, which is quite high. You should try to get that down to under 3% at least. But if it’s just as easy to fix them all, then I would do that, especially considering you’ve changed your URL structure recently. And even more so if these are internal rather than external links.
Will the search engines eventually stop looking for old file names? Yes and no. If the search engine visits a page enough times and finds that it’s not there, then theoretically, yes, it will stop trying to access that page. But by relying on the engine to stop looking on its own, you’ll be affecting how the search engine spiders your site. You’re forcing it to make decisions about which links to follow and which pages have links worth following. Putting the search engine in this position leaves it prone to mistakes, such as not following links you do want it to follow.
The other issue here is whether those broken pages are linked from external sites. If someone is linking to a page that’s not there, then no link value is being passed. That might be fine, but depending on which site the link is on, the engine may continue trying to access the page. I’m just speaking theoretically here, but still, I think it makes the case for fixing any and all links possible.
The easiest solution is to implement a 301 redirect for every broken page or image that is being accessed. You don’t have to worry about uploading a file; just implement a permanent redirect for the broken URL, passing the visitor and spider through to another URL. Not only does this tell the search engines that the page has moved permanently, but it passes on any link value the old page has earned. That and it’s pretty seamless for the visitor and the spider.
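If the site runs on Apache, as most WordPress installs do, those 301s can live in the same .htaccess file WordPress already uses for its permalinks, above the WordPress rewrite block. This is just a sketch, and the paths are the hypothetical ones from the question:

```apache
# Permanent (301) redirect for a single page that moved when
# the /blog/ prefix was dropped:
Redirect 301 /blog/resources/page.html http://domain.com/resources/page.html

# Or catch every URL under the old /blog/ prefix with one rule:
RedirectMatch 301 ^/blog/(.*)$ http://domain.com/$1
```

Either way, a request for the old URL answers with a 301 status and the new location, so spiders update their index and inbound link value follows the page to its new address.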
A long question deserves a longer answer! I hope this helps and feel free to ask any followup questions. I’ll try to answer those in the comment thread below.