robots.txt is a plain-text file that search engine webcrawlers read to determine which parts of your site to crawl and index. It must be located in your document root (for example, example.com/robots.txt). If you don't have one, you can create a blank one. This brief guide will help you manage webcrawlers through your robots.txt file.
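For reference, a blank robots.txt is equivalent to one that explicitly disallows nothing, which permits all crawlers to index everything:

```
# Allow all crawlers to index everything
User-agent: *
Disallow:
```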
- Stopping bots entirely. If you wish to stop webcrawlers from crawling and indexing your site (this will also prevent your site from appearing in search results), add this code to your robots.txt:
# Block all search engine crawlers
User-agent: *
Disallow: /

- Stop bots from crawling parts of your site. You can allow some pages to be indexed while preventing others. Add this code to your robots.txt, replacing "/cgi-bin/", "/tmp/", and "/junk/" with the paths you wish to block:
# Block robots from specific folders / directories
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

- Google & Bing. Both of these webcrawlers obey robots.txt, but robots.txt only controls crawling; it does not remove pages that are already indexed. If you wish finer control over their crawler activity, or need to remove indexed pages, create accounts with Google Search Console and Bing Webmaster Tools.
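You can preview how a compliant crawler will interpret your rules with Python's built-in urllib.robotparser module. This sketch feeds it the directory-blocking rules from above; the example.com URLs are illustrative:

```python
from urllib.robotparser import RobotFileParser

# Parse the rules directly from a string; against a live site you would
# instead call rp.set_url("https://example.com/robots.txt") and rp.read().
rp = RobotFileParser()
rp.parse("""User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
""".splitlines())

# can_fetch(user_agent, url) reports whether the rules allow that URL
print(rp.can_fetch("*", "https://example.com/index.html"))    # True
print(rp.can_fetch("*", "https://example.com/tmp/file.txt"))  # False
```

This is a quick way to sanity-check a robots.txt edit before deploying it.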