Are You Using robots.txt?

What is it? robots.txt is a file that tells bots which pages of a site they should not access. Google’s bots, and those of other search engines, look for a robots.txt file before they crawl a site. If they don’t find one, they crawl freely. If they do, they typically respect its requests to skip certain pages. Remember, that is all this file is: a request. It is important that your robots.txt file be properly formatted so search engines can read your intent accurately.

Why use it? Webmasters use robots.txt when they want to exclude certain pages from search engine crawlers. Why would they want to do this? You might, for instance, want to keep crawlers away from sensitive information that is online or, more commonly, you might have duplicate content (for instance, product information that appears on more than one page) that you do not want crawled.

What else do I need to know?

Just because you use robots.txt does not mean someone cannot find your pages. It means that Google won’t crawl that content. The search engine may still index a blocked URL if links to it appear on other pages. This means the URL itself and other data, such as anchor text, may still show up in search results. Google advises using a noindex meta tag or an X-Robots-Tag HTTP header to prevent this from happening.
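As a minimal illustration, the noindex request can be placed in the page’s own HTML:

```html
<!-- Inside the page's <head>: asks search engines not to index this page -->
<meta name="robots" content="noindex">
```

Note that a crawler can only see this tag if it is allowed to fetch the page, so don’t combine noindex with a Disallow rule for the same URL in robots.txt.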

How do you implement robots.txt? Unfortunately, Google’s robots.txt generating tool is no more. You can still create the file manually or with other tools. To DIY:

  • Create a plain text (e.g., Notepad) file named “robots.txt”. When bots look for the robots.txt file, they take the URL’s path component and replace it with /robots.txt, so the file must live at the root of the domain. If our domain is www.seoisgreat.com, the file would be placed at: www.seoisgreat.com/robots.txt.
  • The file uses two instructions:

User-agent:

Disallow:

  • User-agent indicates which bots the instructions apply to. If you want to give all bots (Google, Yahoo, Bing, etc.) the same instructions, you write: User-agent: *

If you want to address a specific bot, you would write:  User-agent: Bingbot

This indicates that the request to ignore these pages is being made to Bing’s bot.

  • Disallow tells the bots which folders you would like them to ignore. For instance:  Disallow: /folder1

You can also choose to block entire sites, images, directories, and other pages or elements. If, for instance, you want to block your entire site from being crawled (which you probably don’t want to do!), you’d write:  Disallow: /
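Putting the pieces together, a small robots.txt file might look like this (the folder names here are placeholders, not recommendations):

```
# Rules for all bots
User-agent: *
Disallow: /folder1/
Disallow: /images/

# An extra rule just for Bing's crawler
User-agent: Bingbot
Disallow: /drafts/
```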

When you’ve created your robots.txt file, test it to ensure it works properly. You can find a host of helpful tools online to help you.
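One quick way to sanity-check your rules is Python’s standard-library robots.txt parser, which answers the same question a well-behaved bot asks. This sketch uses made-up rules and URLs for illustration:

```python
from urllib import robotparser

# Hypothetical rules for illustration; substitute your own file's contents.
rules = """\
User-agent: *
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# A URL under /private/ is disallowed for any bot...
print(rp.can_fetch("*", "https://www.seoisgreat.com/private/page.html"))

# ...while everything else remains crawlable.
print(rp.can_fetch("*", "https://www.seoisgreat.com/about.html"))
```

You can also point `RobotFileParser` at a live file with `set_url()` and `read()` to test the version actually deployed on your site.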
