Product Features

End User Features

  • Seamlessly search across RSS and Atom Web Feeds, HTTP(S), File System and custom content
  • Automatically group search results into clusters (clustered search) for fast access to the right information
  • Advanced Search - Search by file format, language, keyword occurrence and modified date
  • Spelling Suggestions - Using words from indexed content
  • Date Range search - restrict search results to a particular date range
  • Automatic highlighting of user search query terms in HTML and PDF documents
  • Keyword-in-Context Display - search results are displayed with areas of content where the keyword occurs
  • User-defined number of search results per page
  • Simple and Advanced Query Syntax
  • Supports Boolean (AND, OR, NOT), fuzzy and fielded searches
  • Browsable Categories for quick access to categorized content
  • Sort - search results can be sorted by date, relevance or alphabetically
  • Hit Highlighting - query terms are highlighted in the content title and description
  • Collections - users can limit search to specific collections
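
As an illustration of the Keyword-in-Context Display listed above, here is a minimal Python sketch of how a snippet around each keyword occurrence could be extracted. It is not SearchBlox's implementation; the function name keyword_in_context and the width parameter are hypothetical, chosen only for this example.

```python
import re


def keyword_in_context(text, keyword, width=30):
    """Return one snippet per occurrence of `keyword` in `text`.

    Hypothetical helper for illustration only: each snippet keeps up to
    `width` characters on either side of the match.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    snippets = []
    for match in pattern.finditer(text):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        # Mark truncation so the reader can tell the snippet is an excerpt.
        prefix = "..." if start > 0 else ""
        suffix = "..." if end < len(text) else ""
        snippets.append(prefix + text[start:end].strip() + suffix)
    return snippets


if __name__ == "__main__":
    sample = "Enterprise search lets users search indexed content quickly."
    for snippet in keyword_in_context(sample, "search", width=10):
        print(snippet)
```

A production search engine would normally build snippets from the index's stored term positions rather than rescanning the document text; the regular expression here only keeps the sketch short.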

Administrator Features

  • AJAX-based Admin Console - easy to use and intuitive console to manage all aspects of the Search application
  • Featured Results - Highlight links in the search results page when the user enters specific search terms [Enterprise Edition Only]
  • Web-based editor for easy customization of search results
  • Fast deployment of clustered search results using the built-in clustering engine
  • Choice of Memory-Based Index (for very fast indexing) or Disk-Based Index (for large document collections)
  • Built-in Replication to synchronize search indexes across multiple instances of SearchBlox [Enterprise Edition Only]
  • Collections - create up to 250 document collections with customized settings
  • Look & Feel - search results are customizable using CSS or XSLT stylesheets and can also be delivered as XML
  • Automatic Generation of Browsable Categories using Category metadata in feeds and documents
  • Built-in Crawlers to index HTTP, HTTPS, File System, RSS and Atom Web Feed content
  • Built-in file serving of documents in File System Collections without URL mapping
  • Support for indexing content through Proxy Servers
  • Selective indexing of sections of HTML pages using <noindex> </noindex> or <!--stopindex--> <!--startindex--> tags
  • Protected Content - crawlers can index content protected with Basic HTTP and Form-Based Authentication
  • Reporting - real-time reporting with weekly, daily and hourly top queries and zero-match queries for up to 3 months
  • On-Demand & Scheduled Indexing of content
  • Check for duplicate documents during indexing
  • Addition and Deletion of individual documents from the index
  • Disable stemming for individual indexes
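
To illustrate the selective indexing feature listed above, here is a minimal Python sketch of how sections wrapped in <noindex> </noindex> or <!--stopindex--> <!--startindex--> could be removed from a page before its text is indexed. This is an assumption about the general technique, not SearchBlox's actual crawler code; the function name strip_excluded_sections is hypothetical.

```python
import re

# The two exclusion styles named in the feature list.
NOINDEX_TAGS = re.compile(
    r"<noindex>.*?</noindex>", re.IGNORECASE | re.DOTALL
)
STOP_START_COMMENTS = re.compile(
    r"<!--\s*stopindex\s*-->.*?<!--\s*startindex\s*-->",
    re.IGNORECASE | re.DOTALL,
)


def strip_excluded_sections(html):
    """Return `html` with the excluded sections removed (hypothetical helper)."""
    html = NOINDEX_TAGS.sub("", html)
    return STOP_START_COMMENTS.sub("", html)


if __name__ == "__main__":
    page = (
        "<p>keep</p>"
        "<noindex><p>navigation</p></noindex>"
        "<!--stopindex-->footer<!--startindex-->"
        "<p>also keep</p>"
    )
    print(strip_excluded_sections(page))
```

The sketch only removes properly paired markers; a stopindex comment with no matching startindex is left untouched, and a real crawler may handle that case differently.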

Developer Features

  • REST API - simple, platform-independent API to add and delete custom content
  • Automatically parse and index documents accessible over HTTP or stored in the file system
  • Override document content and/or metadata at index time
  • Built-in browser-based SearchBlox Development Environment to ease development

Content Features

  • Supported File Types
    • HTML
    • Word
    • Excel
    • PowerPoint
    • PDF
    • Text
    • RTF
  • Supported Feed Formats
    • RSS (0.90, 0.91 Netscape, 0.91 Userland, 0.92, 0.93, 0.94, 1.0 and 2.0)
    • Atom 0.3
  • Supported Languages - can index content in 37 languages
    • Arabic
    • Bengali
    • Chinese (Simplified)
    • Chinese (Traditional)
    • Czech
    • Danish
    • Dutch
    • English
    • Estonian
    • Finnish
    • French
    • German
    • Greek
    • Gujarati
    • Hebrew
    • Hindi
    • Hungarian
    • Italian
    • Japanese
    • Kannada
    • Korean
    • Latvian
    • Lithuanian
    • Malayalam
    • Norwegian
    • Polish
    • Portuguese
    • Russian
    • Romanian
    • Slovak
    • Slovenian
    • Spanish
    • Swedish
    • Tamil
    • Telugu
    • Thai
    • Turkish
  • Stopwords - separate stopword list for each supported language
  • MetaTags - supports standard meta tag fields (title, description, keywords)