
Role of Robots.txt

Ashish srivastava2592 05-Sep-2015

Robots.txt is a text file webmasters create to instruct robots how to crawl and index pages on their website. 

Robots.txt must be placed in the top-level directory of a web server in order to be useful, e.g. http://www.example.com/robots.txt. Crawlers only look for the file at the root of the host; a robots.txt placed in a subdirectory is simply ignored.

Your robots.txt file is what tells the search engines which pages to access and index on your website and which ones not to. For example, if you specify in your robots.txt file that you don't want the search engines to be able to access your thank-you page, that page won't show up in the search results and web users won't be able to find it. Keeping the search engines away from certain pages on your site is important both for the privacy of your site and for your SEO. This article will explain why this is and show you how to set up a good robots.txt file.
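For instance, a minimal robots.txt that keeps every crawler away from a thank-you page might look like the sketch below (the /thank-you/ path is only an illustration; substitute the real URL on your own site):

User-agent: *
Disallow: /thank-you/

The asterisk on the User-agent line means the record applies to every robot, and the Disallow line gives the path prefix that should not be crawled.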

Robots.txt files are useful if you want to do any of the following (a sample file combining these rules appears after the list):

•    Make search engines ignore any duplicate pages on your website

•    Keep search engines from indexing your internal search results pages

•    Keep search engines from indexing certain areas of your website, or the whole website

•    Keep search engines from indexing certain files on your website (images, PDFs, etc.)

•    Tell search engines where your sitemap is located
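As a sketch of several of these uses in one file, the following robots.txt blocks a few directories and points to a sitemap. All of the paths and the sitemap URL here are placeholder examples, not directives from any real site:

User-agent: *
Disallow: /print/
Disallow: /search-results/
Disallow: /private/
Disallow: /downloads/pdfs/
Sitemap: http://www.example.com/sitemap.xml

The Sitemap line is a later extension to the original robots.txt convention, but it is recognized by all major search engines.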

 

As mentioned above, the robots.txt file is a simple text file; open a plain text editor to create it. The content of a robots.txt file consists of so-called "records".

A record contains the instructions for a particular search engine. Each record consists of two fields: the User-agent line and one or more Disallow lines. Here's an example:

 

User-agent: googlebot

Disallow: /cgi-bin/

 

This robots.txt file would allow "googlebot", which is the search engine spider of Google, to retrieve every page from your site except for files from the "cgi-bin" directory. All files in the "cgi-bin" directory will be ignored by googlebot.

The Disallow value works as a path prefix rather than an exact match. If you enter

User-agent: googlebot
Disallow: /support

then both "/support-desk/index.html" and "/support/index.html", as well as all other files whose paths begin with "/support", would not be crawled by googlebot.
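Two more records follow directly from this prefix rule and are worth keeping on hand: "/" is a prefix of every path, so it blocks the whole site, while an empty Disallow value matches nothing, so it blocks nothing:

User-agent: *
Disallow: /

User-agent: *
Disallow:

The first record keeps every robot out of the entire site; the second explicitly allows every robot to crawl everything.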

It's important to update your robots.txt file whenever you add pages, files or directories to your site that you don't wish to be indexed by the search engines or accessed by web users. This will keep those areas out of the search results and give you the best possible results with your search engine optimization.

 

 


Updated 13-Dec-2017
