It is great for your website or blog when search engine spiders visit frequently and index your content in their databases, but there are cases where parts of your site are not actually what you want indexed. For example, if you have two pages with the same content, or two versions of the same webpage, you should exclude one version from crawling or indexing; otherwise you risk a duplicate content penalty. Likewise, if there are pages you do not want users to see, you will want search engine crawlers to leave them out of the index as well.
One way to tell search engine crawlers which files and content on your website to avoid is with a robots.txt file.
Robots.txt is a plain text file you place on your website to tell search engine bots which pages or content they should not visit. It is essential to clarify that this file is advisory: it does not physically prevent a bot from crawling your website, and a disallowed URL can still appear in search results if other sites link to it. The robots.txt file must be placed in the root directory of your website; otherwise search engines will not find it, as they check only that one location and do not search the entire site for it.
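Because crawlers look for robots.txt only at the site root, the file's URL is fully determined by the site's scheme and host. As a quick sketch (the function name and URLs here are illustrative, not part of any standard API), you can derive that location from any page URL with Python's standard library:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site a page belongs to.
    Crawlers check only this location, never subfolders."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path with /robots.txt,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("https://example.com/blog/some-post.html"))
# → https://example.com/robots.txt
```

Whatever the page's path, the crawler requests the same single file at the domain root.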
There are different directives you can use in a robots.txt file; some common patterns are as follows:
(a) If you want to allow indexing of everything on your website:
(i) User-agent: *
(ii) Disallow:
(b) If you want to disallow indexing of everything on your website:
User-agent: *
Disallow: /
(c) If you want to disallow the indexing of a particular folder or file:
User-agent: *
Disallow: /folder/
Disallow: /file.html
(d) If you want to disallow crawlers from indexing a folder, except for allowing the indexing of one file in that particular folder:
User-agent: *
Allow: /folder/filename.html
Disallow: /folder/