Robots.txt is a plain text file placed in the root directory of your website (for example, http://www.example.com/robots.txt) to tell search engine spiders which pages or sections of your site they may crawl.
It can be used to:
The file itself is plain text, so it can be created in Notepad or any other basic text editor.
Most search engines request this robots.txt file as soon as their spiders arrive on your site. The file tells spiders which areas of the site they may visit; it is a request that well-behaved spiders obey rather than a lock on those areas. Because most search engines look for this file, it is a good idea to create one even if you do not need to exclude any directories or files. In that case, simply create a robots.txt file that allows the spiders to crawl the whole site.
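Because the file must sit at the root, a spider can work out where to look from any page address on your site. The short Python sketch below shows the idea; the function name robots_url is our own, chosen for illustration only. It keeps the scheme and host of a page URL and replaces everything after them with /robots.txt.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site serving page_url.

    Hypothetical helper: spiders only look for robots.txt at the root
    of the host, so the page's path, query and fragment are dropped.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.example.com/products/shoes.html?colour=red"))
# prints https://www.example.com/robots.txt
```

This is also why a robots.txt uploaded to a subdirectory, such as /pages/robots.txt, will never be found by the spiders.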
In its simplest form, each entry in the file has two lines:
User-Agent: [Spider name]
Disallow: [Directory/File Name]
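To make the two-line format concrete, here is a minimal Python sketch. The make_entry helper is hypothetical, and Googlebot and /private/ are just example values. The entry is read back with Python's standard urllib.robotparser module to confirm that it blocks only the named spider, and only from that directory.

```python
from urllib.robotparser import RobotFileParser


def make_entry(user_agent: str, path: str) -> str:
    """Build one robots.txt entry: a User-agent line, then a Disallow line.

    Hypothetical helper for illustration; robots.txt itself is just text.
    """
    return f"User-agent: {user_agent}\nDisallow: {path}\n"


text = make_entry("Googlebot", "/private/")
print(text, end="")

# Read the entry back with the standard library to check what it means.
parser = RobotFileParser()
parser.parse(text.splitlines())
print(parser.can_fetch("Googlebot", "/private/report.html"))  # False: blocked
print(parser.can_fetch("Googlebot", "/index.html"))           # True: outside /private/
print(parser.can_fetch("Bingbot", "/private/report.html"))    # True: entry names Googlebot only
```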
For example, the entry below uses the wildcard (*) in the User-Agent line to exclude all spiders from a given directory. Simply replace the folder name with the folder you want to keep the spiders out of. You could use this to exclude a directory you are still working on.
User-Agent: *
Disallow: /folder-name/
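If you want to double-check an entry like this before uploading it, Python's built-in urllib.robotparser module can read robots.txt rules and report whether a given path may be crawled. In this sketch /drafts/ stands in for your own folder name and ExampleBot is a made-up spider name; because of the wildcard, any name would give the same answers.

```python
from urllib.robotparser import RobotFileParser

# "/drafts/" is a placeholder for the folder you want to exclude.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Each path is compared with the Disallow value as a prefix.
for path in ["/drafts/new-page.html", "/drafts", "/about.html"]:
    print(path, parser.can_fetch("ExampleBot", path))
# /drafts/new-page.html False
# /drafts True
# /about.html True
```

Disallow values are matched as prefixes of the URL path, which is why the trailing slash matters: /drafts/ blocks everything inside the folder but not a page at /drafts itself, whereas /drafts without the slash would also block a page such as /drafts-old.html.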
To allow the spiders to crawl everything, simply use the wildcard (*) and leave the Disallow field empty.
User-agent: *
Disallow:
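The same standard-library module confirms that an empty Disallow value allows everything. As before, ExampleBot and the paths are made-up values used only for the check.

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow value means nothing is disallowed.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())
print(parser.can_fetch("ExampleBot", "/"))               # True
print(parser.can_fetch("ExampleBot", "/any/page.html"))  # True
```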
Similarly, to disallow everything, place a single slash (/) in the Disallow field. Be very careful with this, though: it is a one-character change that stops ALL spiders that obey robots.txt from crawling your site.
User-agent: *
Disallow: /
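Because a single slash is all that separates "crawl everything" from "crawl nothing", it can be worth checking a new robots.txt before you upload it. The sketch below is one way to do that in Python. The blocks_whole_site function is hypothetical, and it simply asks urllib.robotparser whether a spider may fetch your home page (/).

```python
from urllib.robotparser import RobotFileParser


def blocks_whole_site(robots_txt: str, agent: str = "ExampleBot") -> bool:
    """Return True if robots_txt stops `agent` from crawling the home page.

    Hypothetical pre-upload check to catch an accidental "Disallow: /".
    "ExampleBot" is a made-up default spider name.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")


print(blocks_whole_site("User-agent: *\nDisallow: /\n"))         # True
print(blocks_whole_site("User-agent: *\nDisallow:\n"))           # False
print(blocks_whole_site("User-agent: *\nDisallow: /drafts/\n"))  # False
```

The check only tests the spider name you pass in, so if your file has entries for specific spiders, call it once for each of those names as well.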
Paul Spreadbury
SEO Programmer