robots.txt helps prevent unwanted content from appearing in search engines

SearchBlox is one of the few web crawlers that honor the robots.txt protocol, which allows website owners to define which URL patterns should be disallowed from appearing in search results. A robots.txt file is placed at the root of your website and indicates the parts of your site you don't want accessed by search engine crawlers. The file uses the Robots Exclusion Standard to control access to your website by section and by the names of specific web crawlers or user agents.

The following text within robots.txt tells all robots not to enter three directories:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
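
To see how a crawler that honors these rules would interpret them, here is a minimal sketch using Python's built-in urllib.robotparser module. The domain www.example.com is a placeholder, the rules are parsed from a string so nothing is fetched over the network, and this is only an illustration of the protocol, not SearchBlox's own implementation:

from urllib.robotparser import RobotFileParser

# The same rules as above, parsed from a string so no live request is needed.
rules = """
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# /cgi-bin/, /tmp/ and /junk/ are blocked for every user agent;
# other paths remain crawlable.
print(parser.can_fetch("AnyBot", "https://www.example.com/cgi-bin/script.pl"))  # False
print(parser.can_fetch("AnyBot", "https://www.example.com/about.html"))         # True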

This example tells all robots to stay away from one specific file:

User-agent: *
Disallow: /directory/video.html

It is also possible to list multiple robots, each with its own disallow rules. The example below demonstrates multiple user agents and directives:

User-agent: googlebot # all Google services
Disallow: /news/ # disallow this directory

User-agent: googlebot-news # only the news service
Disallow: / # disallow everything

User-agent: * # any robot
Disallow: /video/ # disallow this directory
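
The sketch below shows how these groups could be evaluated, again using Python's urllib.robotparser with a placeholder domain. One caveat: this module applies the first group whose user-agent name matches, so the more specific googlebot-news group is listed first in the string below; the rules themselves are unchanged from the example above, and the snippet is illustrative only.

from urllib.robotparser import RobotFileParser

# Same groups as above; the more specific googlebot-news group comes first
# because urllib.robotparser uses the first matching group.
rules = """
User-agent: googlebot-news
Disallow: /

User-agent: googlebot
Disallow: /news/

User-agent: *
Disallow: /video/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

site = "https://www.example.com"
print(parser.can_fetch("googlebot", site + "/news/today.html"))     # False: /news/ is blocked for googlebot
print(parser.can_fetch("googlebot", site + "/video/clip.html"))     # True: the googlebot group applies, not *
print(parser.can_fetch("googlebot-news", site + "/about.html"))     # False: everything is blocked for the news bot
print(parser.can_fetch("SomeOtherBot", site + "/video/clip.html"))  # False: falls back to the * group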

Contact us to learn more about managing your site search effectively and providing relevant search results to your users.