Crawling a website with the SearchBlox API

SearchBlox makes it simple to create a website search collection and set up site search, with a faceted search page, for your web pages and documents. As of version 8.4, it is now possible to create a website search collection programmatically and to control the crawler that indexes the web pages for site search.

Use the following REST API calls from any language to set up a website search collection:

#1 Create an HTTP collection

{
"apikey": "61282E82E5D6D8D409EFC87E8415CDAA",
"colname": "httpcollection",
"coltype": "http"
}

The first API call creates a new HTTP collection. The collection name given in "colname" is the one the later calls refer to.
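As a rough illustration, the first call could be issued from Python along the following lines. This is a minimal sketch, not the official client: the article does not state the endpoint, so `CREATE_URL` is a placeholder, and the use of a JSON POST with a `Content-Type: application/json` header is an assumption. Only the field names `apikey`, `colname` and `coltype` come from the payload above.

```python
import json
import urllib.request

# Placeholder (assumption): the article does not give the endpoint, so take
# the real URL from the API documentation of your SearchBlox installation.
CREATE_URL = "http://localhost:8080/REPLACE-WITH-CREATE-COLLECTION-ENDPOINT"


def build_create_payload(apikey: str, colname: str, coltype: str = "http") -> dict:
    """Return the JSON body for call #1 (field names are from the article)."""
    return {"apikey": apikey, "colname": colname, "coltype": coltype}


def post_json(url: str, payload: dict, timeout: float = 30.0) -> str:
    """POST the payload as JSON and return the raw response body as text.

    Assumption: the API accepts a JSON body sent with HTTP POST.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With a real endpoint in place, `post_json(CREATE_URL, build_create_payload(my_key, "httpcollection"))` would send the request; `my_key` stands for your own API key.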

#2 Provide the root URL(s) for the collection

{
"apikey": "61282E82E5D6D8D409EFC87E8415CDAA",
"colname": "httpcollection",
"rooturls": [
"http://www.cnn.com"
],
"allowpaths": [
".*"
],
"disallowpaths": [
"http://www.cnn.com/videos"
],
"allowformat": [
"HTML",
"text"
]
}

The second API call gives the crawler its root URL(s), together with the allowed paths, disallowed paths and document formats that define where it may go during the crawling process.
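Building this payload in code rather than by hand avoids small JSON slips such as an unbalanced quote. The sketch below is one possible helper, assuming only the field names shown in the payload above; the function name, its defaults and the empty-list check are illustrative choices, not part of the SearchBlox API.

```python
import json
from typing import Sequence


def build_settings_payload(
    apikey: str,
    colname: str,
    rooturls: Sequence[str],
    allowpaths: Sequence[str] = (".*",),
    disallowpaths: Sequence[str] = (),
    allowformat: Sequence[str] = ("HTML", "text"),
) -> dict:
    """Return the JSON body for call #2 (field names are from the article)."""
    if not rooturls:
        # Illustrative guard: a crawl with no starting point makes no sense.
        raise ValueError("at least one root URL is required")
    payload = {
        "apikey": apikey,
        "colname": colname,
        "rooturls": list(rooturls),
        "allowpaths": list(allowpaths),
        "disallowpaths": list(disallowpaths),
        "allowformat": list(allowformat),
    }
    # Serialise once so that a non-JSON value fails here, not on the server.
    json.dumps(payload)
    return payload
```

For the example in this article the call would be `build_settings_payload(my_key, "httpcollection", ["http://www.cnn.com"], disallowpaths=["http://www.cnn.com/videos"])`, where `my_key` stands for your own API key.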

#3 Index the website for search

{
"apikey": "61282E82E5D6D8D409EFC87E8415CDAA",
"colname": "httpcollection",
"action": "index"
}

The third API call starts the crawling process.
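The three calls can be chained from a script or a CMS hook. The sketch below shows one way to do that, under these assumptions: the endpoint URLs are supplied by the caller in a hypothetical `urls` dictionary (the article does not list them), and `send` is any function that posts a JSON body to a URL and returns the response text. Passing `send` in keeps the sequencing logic testable without a running server.

```python
from typing import Callable, Dict, List, Tuple

# A sender takes (url, json_body) and returns the response text.
Sender = Callable[[str, dict], str]


def run_crawl(
    send: Sender,
    urls: Dict[str, str],
    apikey: str,
    colname: str,
    root_url: str,
) -> List[str]:
    """Issue calls #1 to #3 in order and return each response.

    The keys "create", "settings" and "index" in `urls` are hypothetical
    names chosen for this sketch, not SearchBlox identifiers.
    """
    # The same key and collection name must be used in every call.
    base = {"apikey": apikey, "colname": colname}
    steps: List[Tuple[str, dict]] = [
        (urls["create"], {**base, "coltype": "http"}),
        (urls["settings"], {**base, "rooturls": [root_url]}),
        (urls["index"], {**base, "action": "index"}),
    ]
    return [send(url, body) for url, body in steps]
```

Whether the server expects a pause or a status check between the calls is not stated in this article, so a production version may need to inspect each response before moving on to the next step.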

Our HTTP REST API can be integrated into your CMS or website management application to completely manage the search process for websites, intranets and document folders.
Learn more about using the SearchBlox HTTP API to automate your website search.