Robots.txt Plugins And Why You Should Have One
I’ve been using kbrobots for the past year. There are probably fancier plugins out there, but this one works fine for me. For those of you who don’t know, the robots.txt file sits in the root of your site and tells spiders, robots, and other agents what they should and shouldn’t crawl when they come to your site. This is important for a number of reasons, which we’ll get to below.
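To make the idea concrete, here is a minimal sketch using Python’s standard urllib.robotparser module to read a small robots.txt and ask whether a given agent may crawl a URL. The rules, the agent name BadBot, and the /private/ path are made up for illustration and are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration: "BadBot" and /private/ are made-up names.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `agent` may crawl `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A well-behaved spider falls under the "*" group.
print(is_allowed("Googlebot", "http://www.mydomain.com/about/"))      # True
print(is_allowed("Googlebot", "http://www.mydomain.com/private/x"))   # False
# The agent named in its own group is shut out of the whole site.
print(is_allowed("BadBot", "http://www.mydomain.com/about/"))         # False
```

Keep in mind that robots.txt is only a request. Polite crawlers follow it, but nothing forces a badly behaved bot to obey.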
First of all, you will need a plugin unless you’re comfortable going into the server root and creating a robots.txt file yourself. It’s really not that hard, but there are many good plugins out there that make it easy: a couple of clicks and you’re set up. Keep in mind that the file has to be in the root of your domain in order to work. If you regularly install WordPress in a directory, you will need to create a robots.txt file manually in the root.
Here’s what I mean. I have a number of blogs with an email capture on the root home page. If you go to www.mydomain.com, you will see the email capture, and after a successful submission you will go to the home page of the blog (usually a presell page of some kind). The blog itself is located at www.mydomain.com/articles. A plugin isn’t going to work for this type of setup because WordPress lives in a different directory (/articles), so you will have to set up the robots.txt file by hand, with paths that account for that directory.
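The reason the file has to live in the root is that crawlers only ask for /robots.txt at the top of the host, no matter which page they are visiting. A small sketch of that lookup, where robots_txt_url is a hypothetical helper name:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt location a crawler would check for this page."""
    parts = urlsplit(page_url)
    # Only the scheme and host are kept; the path is always /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.mydomain.com/articles/some-post/"))
# http://www.mydomain.com/robots.txt
```

So a file saved at www.mydomain.com/articles/robots.txt is simply never requested.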
The primary reason for having a robots.txt file is to avoid duplicate content being crawled. There is a lot of debate about what the term “duplicate content” really means, and that debate has spawned a lot of poorly spun content on the web that is difficult to read and makes no sense. There is a fear of being penalized by Google for duplicate content, and while no one knows for sure what Google is or isn’t doing, I personally think this “fear of Google” mindset has gotten out of hand. After all, how does it differ from syndicated content from the Associated Press, the Washington Post, or the New York Times? There is duplicate content all over the place.
The duplicate content I’m worried about when setting up a robots.txt file is content that is duplicated right on my own site. Let me explain. WordPress, for all the amazing and wonderful things it can do, generates a ton of duplicate content. Think about how your posts and pages can be found: in tag archives, category archives, author archives, date archives, and the list goes on. Robots.txt is set up to point the spiders toward the content you want indexed and away from any duplicate content. This is the duplicate content that Google will penalize you for, at least in my opinion.
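As a rough sketch of that idea, the rules below keep spiders out of the tag, category, and author archives while leaving the posts themselves crawlable. The paths assume WordPress’s default archive bases with pretty permalinks turned on, and the example URLs are made up. Date archives are left out because they would need one rule per year with this kind of simple prefix matching.

```python
from urllib.robotparser import RobotFileParser

# Assumed archive paths (WordPress defaults with pretty permalinks).
ARCHIVE_RULES = """\
User-agent: *
Disallow: /tag/
Disallow: /category/
Disallow: /author/
"""


def crawlable(urls, robots_txt=ARCHIVE_RULES, agent="*"):
    """Return only the URLs that the rules leave open to `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(agent, url)]


# Hypothetical URLs: one post plus three archive views of it.
urls = [
    "http://www.mydomain.com/my-first-post/",
    "http://www.mydomain.com/tag/seo/",
    "http://www.mydomain.com/category/news/",
    "http://www.mydomain.com/author/admin/",
]
print(crawlable(urls))  # ['http://www.mydomain.com/my-first-post/']
```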
Robots.txt can also keep the spiders from crawling any cloaked links, admin pages, login pages, catalog pages, etc. that you don’t want found. By the way, catalog pages can be a big source of off-site duplicate content as well if you are using descriptions and SKU numbers from eBay, Amazon, or other retailers on your site. Just think how many other sites besides yours might be doing the same (not to mention the actual source, like eBay or Amazon).
I’ll leave a link to the code I paste into my own robots.txt file, and you’re more than welcome to copy it. There are two versions of the code. The first is for when WordPress is installed in the root of the domain, i.e. www.mydomain.com. The second is for when WordPress is installed in a directory of the domain, i.e. www.mydomain.com/articles. Please note that unless your install directory is named articles, you will have to change the directory to match your own.
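The only difference between the two versions is the directory in front of each path. The sketch below generates both from one list. It is not the linked file: the blocked paths and the build_robots_txt helper are assumptions chosen for illustration, so swap in your own.

```python
# Hypothetical rule set for illustration; replace with the paths you want blocked.
BLOCKED_PATHS = ["/wp-admin/", "/wp-login.php", "/tag/", "/category/", "/author/"]


def build_robots_txt(install_dir: str = "") -> str:
    """Build robots.txt text, prefixing each path with the install directory."""
    cleaned = install_dir.strip("/")
    prefix = f"/{cleaned}" if cleaned else ""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {prefix}{path}" for path in BLOCKED_PATHS]
    return "\n".join(lines) + "\n"


print(build_robots_txt())            # WordPress installed in the root
print(build_robots_txt("articles"))  # WordPress installed in /articles
```

Either way, the resulting text still has to be saved as robots.txt in the root of the domain, not inside the install directory.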
Here’s the link: Robots.txt Example

I’ve never used a plugin to create a robots.txt file. I write mine by hand in Notepad, but from now on I’ll use kbrobots as you do…