By Timo Selvaraj
SearchBlox is one of the few web crawlers that honor the robots.txt protocol, which lets website owners define which URL patterns are disallowed from appearing in search results. A robots.txt file is placed at the root of your website and indicates the parts of your site you don’t want accessed by search engine crawlers. The file uses the Robots Exclusion Standard to grant or deny access to your website section by section, either for all crawlers or for specific crawlers identified by their user-agent names.
To tell all robots not to enter three directories, robots.txt uses a single "User-agent: *" line followed by three Disallow lines, one per directory.
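As a rough illustration, the Python sketch below embeds such a rule set and checks it with the standard library's urllib.robotparser. The directory names (/cgi-bin/, /tmp/, /junk/), the crawler name AnyBot and the www.example.com host are placeholders assumed for this example, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: the three directory names are placeholders.
THREE_DIRS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
"""

parser = RobotFileParser()
parser.parse(THREE_DIRS_TXT.splitlines())

# "AnyBot" is a made-up crawler name; the "*" group applies to it.
tmp_ok = parser.can_fetch("AnyBot", "https://www.example.com/tmp/cache.html")
about_ok = parser.can_fetch("AnyBot", "https://www.example.com/about.html")

print(tmp_ok)    # False: the path starts with /tmp/
print(about_ok)  # True: no Disallow line matches
```

Each Disallow value is treated as a path prefix, so everything beneath the listed directories is excluded as well.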
To keep all robots away from one specific file, the Disallow line names the full path of that file rather than a directory.
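A minimal sketch of the single-file case follows, again using Python's urllib.robotparser. The path /directory/file.html, the may_fetch helper, the crawler name AnyBot and the www.example.com host are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: the file path is a placeholder.
ONE_FILE_TXT = """\
User-agent: *
Disallow: /directory/file.html
"""

file_parser = RobotFileParser()
file_parser.parse(ONE_FILE_TXT.splitlines())


def may_fetch(path):
    """Return True if a made-up crawler named AnyBot may fetch the path."""
    return file_parser.can_fetch("AnyBot", "https://www.example.com" + path)


print(may_fetch("/directory/file.html"))   # False: this file is disallowed
print(may_fetch("/directory/other.html"))  # True: sibling files stay open
```

Other files in the same directory remain crawlable, because only paths that begin with the listed value are excluded.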
It is also possible to list multiple robots, each with its own disallow rules. The example below shows multiple user agents and their directives:
User-agent: googlebot # all Google services
Disallow: /news/ # disallow this directory
User-agent: googlebot-news # only the news service
Disallow: / # disallow everything
User-agent: * # any robot
Disallow: /video/ # disallow this directory
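To make the effect of these groups concrete, here is a small Python sketch of how a crawler might pick its group and test a path. It is not SearchBlox's implementation. It assumes that a crawler obeys only the group whose name matches its own token exactly (ignoring case) and otherwise falls back to the "*" group, and it handles only User-agent and Disallow lines with plain prefix matching. The helper names parse_groups and is_allowed, the crawler name otherbot and the sample paths are hypothetical.

```python
# The rule set from the example above, comments included.
MULTI_AGENT_TXT = """\
User-agent: googlebot # all Google services
Disallow: /news/ # disallow this directory
User-agent: googlebot-news # only the news service
Disallow: / # disallow everything
User-agent: * # any robot
Disallow: /video/ # disallow this directory
"""


def parse_groups(text):
    """Map each user-agent token to its list of disallowed path prefixes."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            current = groups.setdefault(value.lower(), [])
        elif field == "disallow" and current is not None and value:
            current.append(value)
    return groups


def is_allowed(groups, agent, path):
    """Assumed policy: exact token match first, then the "*" group."""
    rules = groups.get(agent.lower(), groups.get("*", []))
    return not any(path.startswith(prefix) for prefix in rules)


groups = parse_groups(MULTI_AGENT_TXT)
print(is_allowed(groups, "googlebot", "/news/today.html"))   # False
print(is_allowed(groups, "googlebot", "/video/clip.mp4"))    # True
print(is_allowed(groups, "googlebot-news", "/index.html"))   # False
print(is_allowed(groups, "otherbot", "/video/clip.mp4"))     # False
```

Under this assumed policy, googlebot may still crawl /video/ because it follows its own group rather than the "*" group. Crawlers differ in how they select a group, so results from other parsers can vary.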
Contact us to learn more about managing your site search effectively and providing relevant search results to your users.