E-Marketing Performance Blog

Updating Your Robots.txt File

Dan Thies has found a handy, little-documented extension that Googlebot supports in robots.txt files: the wildcard.

User-agent: Googlebot
Disallow: /*sort=

This stops Googlebot from crawling any URL that contains the string “sort=”, no matter where in the URL that string occurs.
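Googlebot treats the `*` in a Disallow line as matching any sequence of characters, and a rule matches if it does so from the start of the URL path. A minimal Python sketch of that matching behavior (the function names here are mine, for illustration, not part of any robots.txt library):

```python
import re

def rule_to_regex(disallow_path):
    # Escape regex metacharacters, then treat the robots.txt '*'
    # as "any sequence of characters"
    return re.compile('^' + re.escape(disallow_path).replace(r'\*', '.*'))

def is_blocked(url_path, disallow_path):
    # A URL is blocked if the rule matches from the start of its path
    return bool(rule_to_regex(disallow_path).match(url_path))

print(is_blocked('/products.php?sort=price', '/*sort='))  # True
print(is_blocked('/products.php?page=2', '/*sort='))      # False
```

Any URL whose path-plus-query contains “sort=” matches the rule; a URL without it does not.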

This will come in handy for sites that use user or session IDs. We recently optimized a site for a client that does, and had them change their ID handling so that the ID is appended to the URL only once a product is added to the shopping cart, rather than immediately on arrival. A big plus. Unfortunately, once a visitor adds a product and surfs back into the site, the ID gets attached to every page, which can cause duplicate content problems.

This is the code we’ll be using for our client:

User-agent: Googlebot
Disallow: /*ps_session=

This will prevent any URL carrying the session ID from being spidered, and therefore keep pages with duplicate content out of Google’s index.
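Before deploying a rule like this, it is worth sanity-checking that it blocks only the session-ID URLs and leaves the clean ones crawlable. A quick check of the pattern as Googlebot would interpret it (the URLs below are made up for illustration):

```python
import re

# '/*ps_session=' as Googlebot interprets it:
# '/' followed by anything, followed by 'ps_session='
rule = re.compile(r'^/.*ps_session=')

print(bool(rule.match('/catalog/item.php?ps_session=abc123')))  # True
print(bool(rule.match('/catalog/item.php')))                    # False
```

The clean URL stays spiderable, while every session-tagged variant of it is excluded, which is exactly the duplicate-content behavior we want.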
