Dealing with WordPress and the robots.txt file

It’s funny to me that there is so much confusion around sitemaps and robots.txt files, especially when it comes to WordPress.

And it is vitally important to get this straight! If you are considering monetizing your WordPress site, you need to get your on-site SEO right!

Well, I’ve been using the XML sitemap generator plugin for some time, and it works quite well, with one exception: if you turn on the setting that adds your sitemap location to robots.txt, it doesn’t work. Now, I don’t blame this on the sitemap plugin, but on WordPress itself.

It seems that WordPress isn’t creating the dynamic robots.txt file. Maybe I have file permission issues, but I’m not sure.

And I want to be sure! And I want it to be automated! Why should I have to fool around and wonder whether something as important as a robots.txt file is being left to chance?

Now, the robots.txt file is important because it is the file search engine robots read first to learn which parts of your site they may crawl. If you have problems here, it can hurt you. At worst, your pages are not indexed properly. At best, you don’t rank as well as you could.

Back to my story. I couldn’t depend on the robots.txt file to do two things automatically: keep the same pages from being indexed as duplicate content, and display the location of my sitemap.xml file. So I had simply skipped the sitemap.xml listing.
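
For anyone wondering what that sitemap listing actually is: it is a single Sitemap: line in robots.txt. Here is a rough Python sketch that reads the line back out. I'm assuming Python 3.8 or newer, since that is when site_maps() was added to the standard library. The sample text and the sitemap_urls helper are made up for illustration; the address is this site's sitemap.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /wp-admin

Sitemap: http://www.usingwp.com/sitemap.xml
"""


def sitemap_urls(robots_text):
    """Return the Sitemap URLs listed in a robots.txt, or an empty list."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(sitemap_urls(SAMPLE))
```

If the list comes back empty for your own file, the sitemap line is missing, which is exactly the problem I was having.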

After some reading, it seems that this is a fairly important component of on-site SEO. So, I had to find a better solution. Well, I believe that I have! Peter Coughlin has created a magic plugin and posted it on his website!

Take a cruise over to his site and check out his Robots.txt WordPress plugin. Now, this is a simple but slick plugin. And does it work well!

Within 5 minutes of downloading it, I had it installed, activated, modified and running! Next, I checked to see whether the robots.txt file had been written. All I have to say is, two thumbs up!

There was one small change that I did make to the robots.txt file. Oh, didn’t I mention that you can actually modify your robots.txt within WordPress? Well, now you can!

Below are the lines that I added near the bottom of the plugin configuration:

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /tag
Disallow: /author
Disallow: /wget/
Disallow: /httpd/
Disallow: /i/
Disallow: /f/
Disallow: /t/
Disallow: /c/
Disallow: /j/
Disallow: /*?

Now, I apologize, you’ll have to look and see what lines I’ve added. It took me about a minute to add the extra lines in the disallows.
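
If you want to see what a handful of these rules do before you trust them, here is a quick Python sketch using the standard library's robots.txt parser. A few assumptions on my part: the example.com URLs and the ExampleBot name are made up, the is_allowed helper is just for illustration, and I left the /*? line out because, as far as I can tell, this parser only does plain prefix matching and does not understand wildcards.

```python
from urllib.robotparser import RobotFileParser

# A subset of the rules above; the wildcard line (/*?) is deliberately omitted.
RULES = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /tag
Disallow: /author
"""


def is_allowed(rules_text, url, agent="ExampleBot"):
    """Return True if the given agent may fetch url under these rules."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://example.com/wp-admin/options.php",
        "http://example.com/tag/seo/",
        "http://example.com/2009/06/robots-txt/",
        "http://example.com/tagline-ideas/",
    ):
        print(url, is_allowed(RULES, url))
```

One thing worth noticing: because the rules are prefixes and /tag has no trailing slash, it also blocks a page like /tagline-ideas/. If you only want to block the tag archives, /tag/ is the safer spelling.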

My recommendation? Give it a try. Just make sure you’ve removed any manually made robots.txt file and then you should be golden!
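
If you would rather script that cleanup, here is a minimal sketch. It assumes you can run Python against the folder that holds your WordPress files. The function name, the example path and the .old suffix are my own choices for illustration. It renames the file instead of deleting it, so you keep a backup.

```python
from pathlib import Path
from typing import Optional


def retire_physical_robots(site_root: Path) -> Optional[Path]:
    """Rename a physical robots.txt to robots.txt.old.

    Returns the backup path, or None if there was no physical file.
    """
    real_file = site_root / "robots.txt"
    if not real_file.is_file():
        return None
    backup = real_file.with_name("robots.txt.old")
    # Note: replace() overwrites an existing robots.txt.old, if there is one.
    real_file.replace(backup)
    return backup


# Example call (hypothetical path, adjust to your own install):
# retire_physical_robots(Path("/path/to/your/site"))
```

After that, load yoursite/robots.txt in a browser and you should see the virtual file instead.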

So, Peter, 5 stars mate!

13 comments

Peter Coughlin June 15, 2009 at 4:52 am

Hey Frank, glad you liked the plugin, and thanks for the review!

By the way – I’m now hosting the plugin on WordPress.org, so you might want to check that you’re using that version to get all the updates, etc.

Thanks again,
Peter.

Pat June 21, 2009 at 10:45 am

I’m with you. I love the fact that I can go in and edit the lines of code. I’m also using XML-Sitemap, but I’m not sure whether I should check or uncheck the setting: “Add sitemap URL to the virtual robots.txt file.” It’s the tiny note below the check box that’s confusing…
“The virtual robots.txt generated by WordPress is used. A real robots.txt file must NOT exist in the blog directory!” I realize that both plugins utilize a virtual robots.txt file, but will they conflict with one another? And should I not worry about actually creating a robots.txt file and placing it at the root level so that Google can find it? Right now Google cannot find it, from what I see in Google Webmaster Tools. Please explain if you can. Thanks.

fthomas June 21, 2009 at 8:13 pm

Great question Pat,

You will want to uncheck that setting. Simply take a look at the listing that the robots.txt will contain from within the plugin’s page in the admin pages. You will see that it already has the sitemap location specified.

Also double check to make sure that there is no ‘real’ robots.txt present. If there is, simply rename it to robots.txt.old and see if that corrects the problem.

Frank

LoveGuru July 21, 2009 at 3:42 am

Hi, thanks for the link! I was looking for a plugin like this.

Lisa February 17, 2010 at 10:37 am

I am having that problem right now; Google hasn’t properly crawled my site yet. I will uncheck it. Should I also remove the robots.txt file I created on my web server?

fthomas February 17, 2010 at 10:40 am

Hi Lisa,

I’ve sent you off an email.

Frank

retro March 5, 2010 at 3:19 pm

Hey guy, I have a question. This robots.txt business has got me all confused. See, I installed WordPress in a separate directory but moved the index.php file to my main directory. Do I still need the dynamic robots.txt in the main directory, or should I just rely on the virtual one in the blog’s directory?

fthomas March 5, 2010 at 4:06 pm

The robots.txt file and the sitemap.xml work hand in hand. I do believe that you need the robots.txt in the main directory, but I’d double check that. Just be sure that the sitemap.xml that is called in the robots.txt is in its proper place too.

Check out the sitemap.xml and robots.txt file on this site. Just add the file name to the end of the domain name: http://www.usingwp.com/sitemap.xml and http://www.usingwp.com/robots.txt

Hope that helps!

Frank

retro March 7, 2010 at 1:59 am

Thanks for the help, this is really confusing. I removed my real robots.txt file, and the virtual one shows up when I check it in the regular domain/robots.txt format. I guess all I can do now is find a plugin that will allow me to edit that virtual file.

Webmaster support April 26, 2010 at 7:10 pm

Hi all, my website is htttp://www. rajneesh .me/ and it has many URLs like *removed* indexed on Google. How can I stop them from being indexed? I am also using robots.txt.

Please help thanks in advance.

fthomas April 27, 2010 at 4:02 pm

Hi Rajnish,

I took a look at your site and everything seems in order. Once Google indexes a page, the best you can do to have it removed is to send a request to Google to remove those pages. But I would recommend just setting up forwards (redirects) for indexed pages that are no longer on your site, so you can take advantage of the free traffic.

Frank

Webmaster support April 27, 2010 at 7:15 pm

Thanks for the reply, Frank.

TechNobel June 18, 2011 at 7:08 pm

Hi, I have installed this plugin, but I need to clarify some doubts. I have the following code near the end:

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot
Allow: /

User-agent: Mediapartners-Google
Allow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-login.php
Disallow: /wp-register.php

Now my question is: do you think anything is wrong with the above? I ask because I have checked it in Google Webmaster Tools, and it shows that “Adsbot-Google”, “Googlebot”, “Mediapartners-Google” – “Allowed by line 2: Disallow:”

HELP!
