Using a robots.txt File

The robots.txt file sits in the root directory of your site and tells search engine robots which parts of the site they may crawl. Creating one is fairly simple, but a mistake in it can cause serious problems with your search engine rankings.

The main benefit of a robots.txt file is that you can tell the search engines not to crawl certain areas or pages of your site. Let’s say you have information on your server that you don’t want the search engines to crawl and index. In that case, you would place the following in your robots.txt file:

User-agent: *
Disallow: /forbidden-directory/
Disallow: /allowed-directory/forbidden-page.htm

The User-agent line specifies which search engine spiders the rules that follow apply to. The “*” means the rules apply to all spiders. If you only want to exclude certain engines, you’ll need to find out the names of their crawlers and list those instead.
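To see how a compliant crawler might read these rules, here is a minimal sketch using Python's standard urllib.robotparser module. The domain www.example.com and the crawler names "ExampleBot" and "OtherBot" are made-up placeholders, and the second rule set, which targets a single crawler, is an illustration rather than something taken from the example above.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(robots_text):
    """Return a parser loaded with the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


# The rules from the example above, addressed to every spider.
ALL_SPIDERS = """\
User-agent: *
Disallow: /forbidden-directory/
Disallow: /allowed-directory/forbidden-page.htm
"""

# Hypothetical variant that excludes only one crawler.
# "ExampleBot" is a made-up name, not a real spider.
ONE_SPIDER = """\
User-agent: ExampleBot
Disallow: /forbidden-directory/
"""

SITE = "http://www.example.com"  # placeholder domain

everyone = parse_rules(ALL_SPIDERS)
blocked_dir = everyone.can_fetch("OtherBot", SITE + "/forbidden-directory/report.htm")
blocked_page = everyone.can_fetch("OtherBot", SITE + "/allowed-directory/forbidden-page.htm")
open_page = everyone.can_fetch("OtherBot", SITE + "/allowed-directory/other-page.htm")
print(blocked_dir, blocked_page, open_page)  # False False True

single = parse_rules(ONE_SPIDER)
named_bot = single.can_fetch("ExampleBot", SITE + "/forbidden-directory/report.htm")
other_bot = single.can_fetch("OtherBot", SITE + "/forbidden-directory/report.htm")
print(named_bot, other_bot)  # False True
```

With the “*” rules, every spider is kept out of the forbidden paths. With the single-crawler rules, only the named spider is excluded and all others remain free to crawl.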

While most crawlers politely honor the robots.txt file, some do not. Compliance is voluntary: it is up to each crawler’s programmers to decide whether to obey the file.
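The check happens on the crawler's side, which is why it cannot be enforced from your server. As a rough sketch of what a polite crawler does before requesting pages, the hypothetical helper below filters a list of URLs against the robots.txt rules. The function name allowed_urls, the crawler name "PoliteBot", and the URLs are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_text, user_agent, urls):
    """Keep only the URLs that the robots.txt text permits for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


robots_text = """\
User-agent: *
Disallow: /forbidden-directory/
"""

# Placeholder URLs a crawler might have queued up.
queue = [
    "http://www.example.com/index.htm",
    "http://www.example.com/forbidden-directory/secret.htm",
]

polite = allowed_urls(robots_text, "PoliteBot", queue)
print(polite)  # ['http://www.example.com/index.htm']
```

A crawler that skips this step simply requests everything in its queue, and nothing in the robots.txt file can stop it.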

Not every site needs a robots.txt file; however, it’s common for the search engines to look for this file before spidering any other pages. For this reason, it is better to give the search engines something rather than nothing, even if it’s just a blank file.
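A blank file contains no rules, so it places no restrictions on any spider. The short sketch below illustrates this with Python's standard urllib.robotparser module; the crawler name "AnyBot" and the URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# A blank robots.txt has no lines, so the parser ends up with no rules.
blank = RobotFileParser()
blank.parse("".splitlines())

anything_allowed = blank.can_fetch("AnyBot", "http://www.example.com/any-page.htm")
print(anything_allowed)  # True
```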

This post is part of a continuing series on the topic of:
Optimizing for Maximum Search Engine Performance

Sub-Topic: Search Friendly Elements.

Tagged As: Search & Marketing

EBLOG

Pole Position Marketing