Many web designers turn a deaf ear to robots.txt, saying it is merely the job of a search engine optimizer. But anyone who wants a website to do well in both design and SEO should learn features like this one, so let's take a look at the concept of robots.txt.

We all want our site to appear at the top of Google's search results, or more precisely, above our competitors' sites. We submit our site to directories, we blog daily, and all that stuff. But sometimes we do not want Google to index specific pages. This comes up especially when pages are under construction. In such cases we add a robots.txt file. The convention this file follows is called the Robots Exclusion Protocol.

So the obvious questions are: how do I create a robots.txt file for my site, and where should I put it?

A robots.txt file can be created with a simple text editor like Notepad. If you are familiar with Google Webmaster Tools, you can also create the file with the robots.txt generator tool available there. The filename should be in lowercase. The file should be placed in the root of the domain. Don't create a directory named robots and put the robots.txt file inside it. The proper path for the file is http://www.yousite.com/robots.txt. To see any site's robots.txt file, enter the above URL with the site name replaced.

The simplest robots.txt file syntax includes two lines:

User-agent:
Disallow:

The user-agent is the search engine's robot (a program that browses the web automatically). You can set a rule for a specific robot by listing its name, or cover all robots by putting an asterisk in the User-agent line. In the Disallow line you list the pages that you want to hide from the robot. Each entry should begin with a forward slash.

For example:

User-agent: *
Disallow: /

will block the entire site from all robots.

To block a directory we use: Disallow: /directory-name
To block a specific page we use: Disallow: /page.html

This was all about the basics of the Robots Exclusion Protocol. Thanks!
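If you want to check what a set of rules actually blocks before uploading the file, Python's standard urllib.robotparser module can read the rules from a string. What follows is only a rough sketch: the site http://www.example.com, the robot name ExampleBot and the helper functions build_parser and is_allowed are made up for illustration, and real search engine robots may interpret some rules differently from this parser.

```python
from urllib.robotparser import RobotFileParser

SITE = "http://www.example.com"  # hypothetical site, for illustration only


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, path: str, robot: str = "ExampleBot") -> bool:
    """Return True if the (hypothetical) robot may fetch the given path."""
    return parser.can_fetch(robot, SITE + path)


# Rules that block the entire site from all robots.
block_all = build_parser(
    "User-agent: *\n"
    "Disallow: /\n"
)

# Rules that block one directory and one page (placeholder names).
block_some = build_parser(
    "User-agent: *\n"
    "Disallow: /directory-name\n"
    "Disallow: /page.html\n"
)

if __name__ == "__main__":
    for path in ("/index.html", "/directory-name/a.html", "/page.html"):
        print(path, is_allowed(block_all, path), is_allowed(block_some, path))
```

With the first set of rules every path is refused. With the second set only the listed directory and page are refused, and a path such as /index.html stays open.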

As a webmaster, you definitely should use user-agent headers to manage server traffic. But understand that this is purely a pragmatic tactic and not a serious security measure.
I wrote more about this here:
Webmaster Tips: Blocking Selected User-Agents
http://faseidl.com/public/item/213126