
E-Marketing Performance Blog

Updating Your Robots.txt File

Dan Thies has found a neat, little-known feature that Googlebot supports in your robots.txt file: the wildcard (*).

User-agent: Googlebot
Disallow: /*sort=

This tells Googlebot not to crawl any URL that contains the string “sort=”, no matter where that string occurs in the URL.
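To see how the wildcard is interpreted, here is a minimal sketch that turns a Disallow pattern into a regular expression. It assumes Google's documented matching rules: `*` matches any sequence of characters, a trailing `$` anchors the end of the URL, and matching starts at the beginning of the path. The function names are hypothetical. Python's standard `urllib.robotparser` only does plain prefix matching, so it won't evaluate these rules the way Googlebot does.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumes Google-style semantics: '*' matches any run of characters,
    a trailing '$' anchors the end of the URL, and every other character
    is matched literally from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then rejoin the pieces with '.*' where '*' was.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path: str, disallow: str) -> bool:
    """True if `path` (path plus query string) matches the Disallow pattern."""
    return robots_pattern_to_regex(disallow).match(path) is not None


# Hypothetical URLs checked against the rule from the post.
for path in ["/products?sort=price", "/shoes/list.php?page=2&sort=name", "/shoes/list.php?page=2"]:
    print(path, is_blocked(path, "/*sort="))
```

Because the rule is a plain substring match after the wildcard, it also catches parameters that merely end in “sort=”, such as `?resort=1`, so it's worth checking your own URLs before deploying a pattern.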

This will come in handy for sites that use user or session IDs. We recently optimized the site of one such client and had them change when the ID is attached: rather than appearing in the URL immediately, it is now added only once a product is placed in the shopping cart. A big plus. Unfortunately, once a visitor adds a product and surfs back into the site, the ID gets attached to every page. This can cause duplicate content problems, because the same page becomes reachable under many different URLs.
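The duplicate-content problem can be illustrated with a short sketch: the same product page, visited in different sessions, produces distinct URLs that collapse to a single page once the session parameter is removed. The parameter name `ps_session` comes from the client's rule later in this post; the domain, product URL, and the `strip_session` helper are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SESSION_PARAM = "ps_session"  # session parameter name used on the client's site


def strip_session(url: str) -> str:
    """Return `url` with the session-ID query parameter removed (hypothetical helper)."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != SESSION_PARAM
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical crawl: one product page reached in three different sessions.
crawled = [
    "https://www.example.com/product.php?id=42",
    "https://www.example.com/product.php?id=42&ps_session=a1b2c3",
    "https://www.example.com/product.php?id=42&ps_session=d4e5f6",
]
print(len(set(crawled)), "URLs ->", len({strip_session(u) for u in crawled}), "page")
```

A spider sees three different URLs here, but there is only one page of content behind them.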

This is the code we’ll be using for our client:

User-agent: Googlebot
Disallow: /*ps_session=

This keeps Googlebot from crawling any URL that contains the session ID, which should keep those duplicate pages out of Google’s index.
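As a quick sanity check before uploading the file, the rule can be tested against a few sample URLs. This is a minimal sketch assuming Google's wildcard semantics (`*` matches any run of characters, trailing `$` anchors the end); the catalog URLs and the `robots_pattern_to_regex` helper are hypothetical, not the client's real site.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a robots.txt path pattern, assuming Google-style wildcard rules."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' becomes '.*'; everything else is matched literally from the path start.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


rule = robots_pattern_to_regex("/*ps_session=")

# Hypothetical URLs (path plus query string) from the client's store.
urls = [
    "/catalog/widgets.php",
    "/catalog/widgets.php?ps_session=a1b2c3",
    "/catalog/widgets.php?color=blue&ps_session=a1b2c3",
]
for url in urls:
    status = "blocked" if rule.match(url) else "crawlable"
    print(f"{status:9}  {url}")
```

The session-free URL stays crawlable while both session variants are blocked. Without the wildcard, a rule like `Disallow: /ps_session=` would only match URLs whose path begins with `/ps_session=`, so it would miss all of these.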

Stoney G deGeyter

Stoney deGeyter is the author of The Best Damn Web Marketing Checklist, Period! He is the founder and CEO of Pole Position Marketing, a web presence optimization firm whose pit crew has been velocitizing websites since 1998. In his free time Stoney gets involved in community services and ministries with his “bride enjoy” and his children.
