Create a robots.txt file for your Site or Blog

Creating a robots.txt file for your site is very easy, even for beginners. You don’t have to know anything about Search Engine Optimization (S.E.O.) to do this. Your own robots.txt file lets you communicate with the search engine bots that regularly crawl your site for fresh content. This is what my robots.txt file looks like:

#Created by www.CollegeStash.com do not remove this
sitemap: http://www.collegestash.com/sitemap.xml

User-agent:  *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /recommends/
Disallow: /go/
Disallow: /category/
Disallow: /tag/
Disallow: /archives/
Disallow: /comments/feed/
Disallow: /trackback/
Disallow: /index.php
Disallow: /xmlrpc.php
Disallow: /*?wptheme
Disallow: /*?comments=
Disallow: /search?
Disallow: /?p=*

User-agent: Mediapartners-Google*
Allow: /

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /
#Created by www.CollegeStash.com do not remove this

You can copy the text above, paste it into a plain text file named robots.txt, replace the sitemap URL with your own, and upload the file to the root directory of your server (the directory that contains your site’s index.php or index.html file). You can also download this file, edit the sitemap URL, and upload it directly to your root folder. The robots.txt file of any site is publicly available and can be copied, but I advise you to write your own, because it won’t take much of your time.
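Before you upload, you can sanity-check a draft on your own computer. Here is a minimal sketch using Python’s standard `urllib.robotparser` on a trimmed copy of the rules above. Keep two assumptions in mind: this parser only understands plain path prefixes, so wildcard lines like `Disallow: /*?wptheme` are left out, and it applies the first rule that matches rather than the longest one, which is what Google uses. The `check` helper, the `SomeBot` agent name and the test URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Trimmed copy of the rules above. Wildcard lines are omitted because
# urllib.robotparser only understands plain path prefixes.
DRAFT = """\
Sitemap: http://www.collegestash.com/sitemap.xml

User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /tag/

User-agent: Googlebot-Image
Allow: /wp-content/uploads/
"""


def check(robots_txt, agent, urls):
    """Return a dict mapping each URL to True (crawlable) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(agent, url) for url in urls}


if __name__ == "__main__":
    site = "http://www.collegestash.com"  # replace with your own domain
    urls = [
        site + "/",
        site + "/wp-admin/",
        site + "/tag/seo/",
        site + "/wp-content/uploads/logo.png",
    ]
    for agent in ("SomeBot", "Googlebot-Image"):
        for url, allowed in check(DRAFT, agent, urls).items():
            status = "allowed" if allowed else "blocked"
            print(f"{agent:<16} {status:<8} {url}")
```

When you run it, the generic bot is blocked from /wp-admin/, /tag/ and /wp-content/. Googlebot-Image can fetch the uploaded image, and it is also allowed into /wp-admin/. That is because a bot follows only the most specific User-agent group that names it, not the `*` group as well. If you want those Disallow lines to apply to a named bot, you have to repeat them in its group.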

Understanding the commands

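Allow: /
As the name suggests, this command allows bots to crawl the path given after it. “Allow: /” opens up the whole site, and a longer Allow line can reopen part of a blocked directory, for example Allow: /wp-content/uploads/ alongside Disallow: /wp-content/ in the same group.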
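When an Allow line and a Disallow line both match a URL, Google and the robots.txt standard (RFC 9309) use the most specific rule, meaning the longest matching path. If the two are tied, Allow wins. (Python’s `urllib.robotparser`, by contrast, simply uses the first matching line.) Below is a hand-written sketch of that longest-match rule, including the `*` and `$` wildcards. It is not Google’s actual parser. `is_allowed` and `_to_regex` are hypothetical helper names, and the sketch assumes you have already collected the rules for a single User-agent group.

```python
import re


def _to_regex(pattern):
    """Turn a robots.txt path pattern into a regex matched from the start of the path.

    '*' matches any sequence of characters; a trailing '$' means "end of URL".
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """Decide whether `path` may be crawled under one user-agent group.

    `rules` is a list of ("allow" | "disallow", pattern) pairs. The longest
    matching pattern wins, and Allow wins a tie. No match means allowed.
    """
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if not pattern:  # an empty "Disallow:" blocks nothing
            continue
        if _to_regex(pattern).match(path):
            if len(pattern) > best_len or (len(pattern) == best_len and kind == "allow"):
                best_len, allowed = len(pattern), kind == "allow"
    return allowed


rules = [
    ("disallow", "/wp-content/"),
    ("allow", "/wp-content/uploads/"),  # exception inside a blocked directory
    ("disallow", "/*?wptheme"),
]
print(is_allowed(rules, "/wp-content/uploads/photo.jpg"))  # True
print(is_allowed(rules, "/wp-content/plugins/x.js"))       # False
print(is_allowed(rules, "/hello-world/?wptheme=dark"))     # False
```

The Allow line wins for the uploads folder only because it is longer than the Disallow line. The order in which the two lines appear does not matter under this rule.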

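Disallow: /feed

This command blocks bots from crawling the “feed” sub-directory of your site. Matching works on the start of the URL path, so it also blocks any other URL that begins with /feed, such as /feedback; write Disallow: /feed/ if you only mean the directory.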
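You can confirm this prefix behaviour with Python’s standard `urllib.robotparser`. The sample paths, the example.com domain and the `AnyBot` agent name below are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /feed"])

# Disallow matches URL paths by prefix, so "/feedback" is caught as well.
for path in ("/feed", "/feed/rss2/", "/feedback", "/about/"):
    allowed = parser.can_fetch("AnyBot", "http://www.example.com" + path)
    status = "allowed" if allowed else "blocked"
    print(f"{path:<12} {status}")
```

With the rule changed to `Disallow: /feed/`, /feedback becomes crawlable again. Note that the bare /feed URL (without the trailing slash) would then be crawlable too.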

Sitemap: http://www.collegestash.com/sitemap.xml
This command tells search engine bots where to find your sitemap, so they can discover the pages you want crawled. It must be a full URL, and it applies to all bots no matter where it appears in the file.
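A crawler reads the Sitemap line separately from the User-agent groups. Here is a small sketch with Python’s standard `urllib.robotparser`, whose `site_maps()` method needs Python 3.8 or later. The rules are a shortened version of mine, and `AnyBot` is a made-up agent name.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: http://www.collegestash.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
# site_maps() returns every Sitemap URL in the file (or None if there are none),
# regardless of which User-agent group it appears near.
print(parser.site_maps())  # ['http://www.collegestash.com/sitemap.xml']
```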

# This is a comment, i.e., the text after this symbol on the same line will not be read by the bot.
The # symbol starts a single-line comment: bots ignore everything after it on that line, whether the # is at the start of the line or after a command.
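Here is a quick way to see comments being ignored, using Python’s standard `urllib.robotparser`. The /private/ path, example.com and `AnyBot` are placeholders made up for this example.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
# This whole line is a comment, so the parser skips it
User-agent: *         # applies to every bot
Disallow: /private/   # only "/private/" is read as the rule
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("AnyBot", "http://example.com/private/notes.html"))  # False
print(parser.can_fetch("AnyBot", "http://example.com/public/"))             # True
```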

What is a robots.txt file?

The robots.txt file tells search engines which parts of your site they should crawl and which parts they shouldn’t. Search engines use bots (also called crawlers) to crawl the web and add new content to their indexes, and the robots.txt file lets you communicate directly with those bots. For privacy and security reasons, people often want to keep a few areas of their website or blog out of popular search engines like Google, so that those pages aren’t easy for everyone to find. Keep in mind, though, that robots.txt is only a request that well-behaved bots choose to follow: the file itself is public, and a blocked page can still show up in search results if other sites link to it, so anything truly private should be protected with a password.

Advantages of having your own robots.txt file

Having a robots.txt file is good S.E.O. practice, because it lets you keep search engine crawlers out of the parts of your site you don’t want them to visit. To view the robots.txt file of any site, just type http://www.sitename.com/robots.txt into your browser’s address bar (for example, https://www.facebook.com/robots.txt).
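You can script the same lookup. Here is a minimal sketch using Python’s standard library. It assumes the site serves robots.txt over plain HTTP(S) and accepts Python’s default User-Agent, which some servers refuse. The helper names `robots_url` and `fetch_robots` are hypothetical.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def robots_url(page_url):
    """robots.txt always lives at the root of the host, so resolve
    '/robots.txt' against any page URL on that site."""
    return urljoin(page_url, "/robots.txt")


def fetch_robots(page_url, timeout=10):
    """Download and return the robots.txt text for the site hosting page_url."""
    with urlopen(robots_url(page_url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Only builds the URL; call fetch_robots() yourself to hit the network.
    print(robots_url("https://www.facebook.com/some/page?ref=home"))
```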

If you have any problems regarding this, then please feel free to mention them in the comments section below.