Technology

SearchBlox is engineered from the ground up to be Java 2 Enterprise Edition (J2EE) compatible. The aim is to leverage the performance and scalability of J2EE technologies while enabling SearchBlox users to take advantage of the J2EE software and skills that their companies have already invested in. The SearchBlox architecture consists of several subsystems designed for optimal performance and scalability.

SearchBlox Architecture

AJAX-based Admin Console

The Admin Console enables administrators to configure and manage SearchBlox using just a browser. No more properties and configuration files to edit! Everything from creating new collections to setting the number of search results per page is done in the Admin Console. Access to the Admin Console is protected so that only the administrator can change the settings. The Admin Console is built using AJAX, JavaServer Pages (JSP) and Java Servlet technologies.

Crawler

SearchBlox has built-in filesystem, HTTP(S) and feed crawlers. No additional configuration is required to index HTTPS content. In addition, the HTTP(S) crawler can index content protected with HTTP Basic and form-based authentication. The filesystem and HTTP(S) crawlers support various filters that let the administrator specify which content to index and which content to exclude. The feed crawlers can automatically handle numerous feed formats. All the crawlers are optimized for performance and can index large amounts of content quickly and efficiently.
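To make the idea of include and exclude filters concrete, here is a minimal sketch in Java. It is not SearchBlox's implementation: the class name CrawlFilter, the use of regular expressions and the rule "index when some include pattern matches and no exclude pattern matches" are all assumptions chosen for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical include/exclude filter; names and matching rule are assumptions. */
final class CrawlFilter {
    private final List<Pattern> includes;
    private final List<Pattern> excludes;

    CrawlFilter(List<String> includeRegexes, List<String> excludeRegexes) {
        this.includes = compileAll(includeRegexes);
        this.excludes = compileAll(excludeRegexes);
    }

    private static List<Pattern> compileAll(List<String> regexes) {
        return regexes.stream().map(Pattern::compile).collect(Collectors.toList());
    }

    /** A path or URL is indexed when it matches some include pattern and no exclude pattern. */
    boolean shouldIndex(String pathOrUrl) {
        boolean included = includes.stream().anyMatch(p -> p.matcher(pathOrUrl).find());
        boolean excluded = excludes.stream().anyMatch(p -> p.matcher(pathOrUrl).find());
        return included && !excluded;
    }

    public static void main(String[] args) {
        // Example patterns only; a real deployment would use its own site and rules.
        CrawlFilter filter = new CrawlFilter(
                List.of("^https://www\\.example\\.com/"),
                List.of("\\.(zip|exe)$", "/private/"));
        System.out.println(filter.shouldIndex("https://www.example.com/docs/index.html"));
        System.out.println(filter.shouldIndex("https://www.example.com/private/plan.html"));
    }
}
```

In this sketch an exclude pattern always wins over an include pattern, which is a common convention for crawl filters but not something the text above specifies.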

Clustered Search Results

SearchBlox has a built-in clustering engine that automatically clusters (groups) search results into dynamic categories. This enables users to access the right information quickly and easily. The clustering engine uses software developed by the Carrot2 Project.

Spell Checker

The Spell Checker offers spelling suggestions to the search user when the search query has misspelled words. The Spell Checker uses a spell index created automatically using words from the indexed documents.
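The following Java sketch illustrates the general technique described above: a spell index is built from the words of indexed documents, and a query term that is missing from the index is matched against it. The class SpellIndexSketch, the use of Levenshtein edit distance and the frequency-based tie-break are assumptions made for illustration; the text does not say which algorithm SearchBlox actually uses.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical spell index built from document words; not SearchBlox's actual code. */
final class SpellIndexSketch {
    private final Map<String, Integer> wordCounts = new HashMap<>();

    /** Adds every word of an indexed document's text to the spell index. */
    void addDocument(String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!word.isEmpty()) {
                wordCounts.merge(word, 1, Integer::sum);
            }
        }
    }

    /** Suggests the closest indexed word within maxDistance edits, or empty if the term is known. */
    Optional<String> suggest(String term, int maxDistance) {
        String query = term.toLowerCase(Locale.ROOT);
        if (wordCounts.containsKey(query)) {
            return Optional.empty(); // the term is spelled like an indexed word
        }
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : wordCounts.entrySet()) {
            String word = entry.getKey();
            int count = entry.getValue();
            int distance = editDistance(query, word);
            if (distance > maxDistance) {
                continue;
            }
            // Prefer smaller distance, then higher frequency, then alphabetical order.
            boolean better = distance < bestDistance
                    || (distance == bestDistance && count > bestCount)
                    || (distance == bestDistance && count == bestCount && word.compareTo(best) < 0);
            if (better) {
                best = word;
                bestDistance = distance;
                bestCount = count;
            }
        }
        return Optional.ofNullable(best);
    }

    /** Classic Levenshtein distance using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        SpellIndexSketch index = new SpellIndexSketch();
        index.addDocument("SearchBlox uses Lucene for indexing");
        System.out.println(index.suggest("lucine", 2));
    }
}
```

A linear scan over the vocabulary is fine for a sketch; a production spell checker would typically use an n-gram or similar index to avoid comparing against every word.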

Categorizer

The Categorizer automatically generates "Browsable Categories" of all the content available in the index. To do this, the Categorizer uses the Category metadata available in feeds and documents. Using "Browsable Categories", users can quickly access content they are most interested in.

View Browsable Categories
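As a rough illustration of how category metadata can be turned into a browsable structure, the Java sketch below groups documents by their category values. The Doc type, its fields and the method name browsableCategories are hypothetical; the actual metadata format and data model used by SearchBlox are not described here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical grouping of documents by Category metadata; not SearchBlox's actual code. */
final class CategorizerSketch {

    /** Minimal stand-in for an indexed document: a title plus its category values. */
    static final class Doc {
        final String title;
        final List<String> categories;

        Doc(String title, List<String> categories) {
            this.title = title;
            this.categories = categories;
        }
    }

    /** Maps each category name to the titles of the documents that carry it, sorted by category. */
    static Map<String, List<String>> browsableCategories(List<Doc> docs) {
        Map<String, List<String>> byCategory = new TreeMap<>();
        for (Doc doc : docs) {
            for (String category : doc.categories) {
                byCategory.computeIfAbsent(category, k -> new ArrayList<>()).add(doc.title);
            }
        }
        return byCategory;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("Release notes", List.of("Products", "News")),
                new Doc("Pricing", List.of("Products")));
        System.out.println(browsableCategories(docs));
    }
}
```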

REST-API

The SearchBlox REST-API allows documents to be added to SearchBlox programmatically for indexing and searching, which greatly simplifies integration of SearchBlox with other applications. This is achieved using simple HTTP POST and GET actions. The SearchBlox REST-API also allows custom content to be added to SearchBlox. To aid use of the REST-API, SearchBlox has a built-in, browser-based SearchBlox Development Environment that allows developers to test the REST-API functionality without any coding.

The SearchBlox Development Environment is available at http://yourhost/searchblox/sde/index.jsp in your SearchBlox deployment.
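For readers who prefer code to the browser-based environment, the sketch below shows how HTTP POST and GET requests of the kind described above could be built with the standard java.net.http API (Java 11 or later). The endpoint URLs, the query parameter name, the content type and the payload are all placeholders supplied by the caller; none of them is taken from the SearchBlox REST-API, so consult the product documentation for the real values.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Builds generic POST/GET requests; endpoints and parameters are caller-supplied placeholders. */
final class RestApiSketch {

    /** Builds a POST that submits a document payload to the given (hypothetical) endpoint. */
    static HttpRequest addDocumentRequest(String endpoint, String contentType, String payload) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", contentType)
                .POST(HttpRequest.BodyPublishers.ofString(payload, StandardCharsets.UTF_8))
                .build();
    }

    /** Builds a GET that passes a URL-encoded query under the given (hypothetical) parameter name. */
    static HttpRequest searchRequest(String endpoint, String paramName, String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(endpoint + "?" + paramName + "=" + encoded))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Placeholder URLs: replace with the endpoints documented for your deployment.
        HttpRequest post = addDocumentRequest(
                "http://yourhost/searchblox/hypothetical-add", "text/xml", "<doc/>");
        HttpRequest get = searchRequest(
                "http://yourhost/searchblox/hypothetical-search", "query", "enterprise search");
        System.out.println(post.method() + " " + post.uri());
        System.out.println(get.method() + " " + get.uri());
    }
}
```

The sketch only builds the requests. To send one, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and inspect the returned status code and body.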

Reporting Engine

The real-time Reporting Engine gives administrators important information about what users are searching for. The reports include top search queries and zero-result queries for every collection for different time frames. The time frames can vary from the last 10 minutes to the previous 3 months. In addition, SearchBlox has detailed query logs that you can use for more detailed analysis.
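The two report types mentioned above can be derived from a query log with a simple aggregation. The Java sketch below assumes a hypothetical LogEntry record (timestamp, query text, result count); the actual log format and reporting code of SearchBlox are not described in this document.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Hypothetical query-log aggregation; the LogEntry shape is an assumption. */
final class QueryReportSketch {

    static final class LogEntry {
        final Instant time;
        final String query;
        final int resultCount;

        LogEntry(Instant time, String query, int resultCount) {
            this.time = time;
            this.query = query;
            this.resultCount = resultCount;
        }
    }

    private static boolean inWindow(LogEntry e, Instant now, Duration window) {
        Instant cutoff = now.minus(window);
        return !e.time.isBefore(cutoff) && !e.time.isAfter(now);
    }

    /** Returns up to limit queries from the window, most frequent first (ties broken alphabetically). */
    static List<Map.Entry<String, Long>> topQueries(
            List<LogEntry> log, Instant now, Duration window, int limit) {
        Map<String, Long> counts = log.stream()
                .filter(e -> inWindow(e, now, window))
                .collect(Collectors.groupingBy(e -> e.query, Collectors.counting()));
        Comparator<Map.Entry<String, Long>> byCountDesc = (a, b) -> {
            int c = Long.compare(b.getValue(), a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        };
        return counts.entrySet().stream().sorted(byCountDesc).limit(limit)
                .collect(Collectors.toList());
    }

    /** Returns the distinct queries in the window that produced no results. */
    static Set<String> zeroResultQueries(List<LogEntry> log, Instant now, Duration window) {
        return log.stream()
                .filter(e -> inWindow(e, now, window) && e.resultCount == 0)
                .map(e -> e.query)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        List<LogEntry> log = List.of(
                new LogEntry(now.minusSeconds(300), "lucene", 10),
                new LogEntry(now.minusSeconds(120), "xslt", 0));
        System.out.println(topQueries(log, now, Duration.ofMinutes(10), 5));
        System.out.println(zeroResultQueries(log, now, Duration.ofMinutes(10)));
    }
}
```

Changing the Duration argument gives the different time frames, for example Duration.ofMinutes(10) for the last 10 minutes or Duration.ofDays(90) as an approximation of three months.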

XSLT Engine

The XSLT Engine allows complete customization of the search results to suit your requirements. To customize the search results, all you need to do is create a new XSL stylesheet that meets your requirements (or modify the default XSL stylesheet we provide). Once configured, SearchBlox will use the new stylesheet when returning the search results. In addition, you can specify which XSL stylesheet to use at runtime. SearchBlox can also return search results as XML. For example, for the search query "searchblox":

Results using the default stylesheet

Results as XML
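To show what applying a stylesheet to XML search results involves, here is a minimal Java sketch using the standard javax.xml.transform API. The XML element names (results, result, title) and the tiny stylesheet are invented for this example and do not reflect the actual SearchBlox result format or its default stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Applies an XSL stylesheet to an XML document; the XML shape used in main is hypothetical. */
final class XsltSketch {

    /** Transforms resultsXml with stylesheetXsl and returns the rendered output as a string. */
    static String render(String resultsXml, String stylesheetXsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheetXsl)));
            StringWriter out = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(resultsXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Invented result format, for illustration only.
        String xml = "<results><result><title>A</title></result>"
                + "<result><title>B</title></result></results>";
        String xsl = "<xsl:stylesheet version=\"1.0\" "
                + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"text\"/>"
                + "<xsl:template match=\"/results\">"
                + "<xsl:for-each select=\"result\"><xsl:value-of select=\"title\"/>;</xsl:for-each>"
                + "</xsl:template></xsl:stylesheet>";
        System.out.println(render(xml, xsl));
    }
}
```

Selecting a stylesheet at runtime then amounts to passing a different stylesheetXsl value to render for the same XML results.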

Lucene

Apache Lucene is a high-performance, open-source text indexing and search API written entirely in Java. SearchBlox uses Lucene as its indexing and search engine. Lucene uses powerful, accurate and efficient search algorithms that are at least as good as, if not better than, those of commercial search engines like Google. Lucene is actively developed by a dedicated community. Using Lucene enables SearchBlox to take immediate advantage of the latest developments in search technology.