Robots.txt is a plain text file that gives instructions to web crawlers (robots) about which areas of your site should not be crawled or indexed. The file lives in the root (top-level) directory of the site and is one of the first files a well-behaved crawler requests. Crawlers read the rules in robots.txt and proceed accordingly.
Syntax
User-agent: * (Agent Name)
Disallow: / (File Path)
In the lines above, Agent Name is replaced by the name of the search engine bot you wish to exclude (* means all bots), and File Path is replaced by the path of the file or directory you wish to exclude, written relative to the site root (for example, /private/), not as an absolute URL.
Example 1
User-agent: *
Disallow: /
Example 2
User-agent: Googlebot
Disallow: /
Example 1 disallows all bots from crawling the entire site. Example 2 applies the same rule only to Googlebot (Google's crawler user-agent token), instructing it not to crawl any part of the site while leaving other bots unaffected.
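Between these two extremes, a robots.txt can also exclude just one directory while leaving the rest of the site crawlable. The directory name below is only an illustration:

```
User-agent: *
Disallow: /cgi-bin/
```

With these rules, compliant crawlers skip everything under /cgi-bin/ but continue to crawl all other paths.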
When should you use robots.txt?
You should use robots.txt if there are scripts or pages on your server that you do not want bots to access, or if you want specific bots not to crawl the site at all. For smaller sites of fewer than a hundred pages, there is rarely any need for robots.txt, as there are usually no special scripts to exclude. Larger sites backed by big databases, however, often have pages or scripts that should be kept out of search results; in that case, robots.txt is the right tool. Keep in mind that robots.txt only asks crawlers to stay away; it is not an access-control mechanism and does not actually hide the files.
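The rules discussed above can also be checked programmatically. Here is a minimal sketch using Python's standard-library urllib.robotparser; the rules, domain, and URLs are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content blocking one directory for all bots.
rules = """\
User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# can_fetch(user_agent, url) reports whether a compliant crawler
# with that user-agent token may fetch the given URL.
print(parser.can_fetch("*", "https://example.com/cgi-bin/search.cgi"))  # False
print(parser.can_fetch("*", "https://example.com/index.html"))          # True
```

A crawler would normally call parser.set_url() and parser.read() to fetch the live file instead of parsing an inline string; the inline form is used here only to keep the sketch self-contained.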