Blog

Search files stored on AWS S3 using SearchBlox

Posted by Timo Selvaraj on July 16, 2011

We are excited to share a way to search through files stored on Amazon S3 using Lucene-based SearchBlox. With a large number of companies and developers using S3 for storage, the need to search through files stored in the cloud has emerged as a key requirement. SearchBlox provides an easy way to create an index of files on the local disk and search them, and s3fs provides a simple way to mount an S3 storage bucket on your server. With the combination of s3fs and SearchBlox, you have a powerful, free and simple way to search through the files stored on S3. SearchBlox also provides an end-user interface that lets you set up your search and allows users to search the files directly. Use Amazon’s Free Tier to set up your entire storage and search capability in the cloud. SearchBlox provides complete control over your index and search for free!
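
Before pointing a file collection at the mounted bucket, it can help to confirm that the mount is live and to see what kinds of files it holds. The following is a minimal sketch in Python, not part of SearchBlox or s3fs; the mount point /mnt/s3bucket and the function name summarize_tree are hypothetical.

```python
import os
from collections import Counter


def summarize_tree(root):
    """Count files under root, grouped by lowercased file extension."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower() or "(none)"
            counts[ext] += 1
    return counts


if __name__ == "__main__":
    mount_point = "/mnt/s3bucket"  # hypothetical s3fs mount point
    if not os.path.ismount(mount_point):
        print(f"{mount_point} is not a mount point; is the bucket mounted?")
    else:
        for ext, n in summarize_tree(mount_point).most_common():
            print(f"{ext}\t{n}")
```

Walking a large bucket through s3fs can be slow because every directory listing goes over the network, so a check like this is best run once rather than on a schedule.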

SearchBlox Version 6.4 released

Posted by Robert Selvaraj on June 23, 2011

SearchBlox V6.4 is now available. This release has a few new features and some important bug fixes.

- SearchBlox can now automatically detect text files on the file system and index them irrespective of their file extensions. This has been a long-standing feature request. You will now be able to use SearchBlox to search across repositories of text files such as source code files and log files. To exclude files with specific file extensions from being indexed, use the Disallow Filters.

- The filename of the indexed document is now available as a separate tag <filename></filename> in the XML search results

- HTTPS indexing is now functional in the SearchBlox Server packages

- Issue with indexing of some MS Office documents is now fixed

- Foreign characters in search queries using the basic search form now work correctly
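
The first item above, detecting text files regardless of extension, can be approximated with a content check. The sketch below is one common heuristic in Python and is not necessarily how SearchBlox does it: it samples the start of a file, rejects it if it contains NUL bytes, and otherwise accepts it if the sample decodes as UTF-8. The function name looks_like_text and the 4096-byte sample size are assumptions for illustration.

```python
import codecs


def looks_like_text(path, sample_size=4096):
    """Heuristically decide whether a file is text by sampling its first bytes."""
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    if b"\x00" in sample:
        return False  # NUL bytes rarely occur in plain-text files
    # An incremental decoder tolerates a multi-byte character that is cut
    # off at the end of the sample, but still raises on invalid bytes.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True
```

A heuristic like this treats UTF-16 text as binary (it contains NUL bytes) and treats empty files as text, so a real indexer would likely need additional rules.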

SearchBlox Vs Google Site Search – a comparison

Posted by Robert Selvaraj on June 14, 2011

If you are already using Google Site Search or considering using it, then you should consider using SearchBlox as your site search engine. It is fast, powerful and with the new cloud-hosted plans, requires no hardware/software deployment.

SearchBlox gives you full control over the indexing of your content: what should be indexed and when it should be indexed.

See how they compare:

Feature | Google Site Search | SearchBlox Cloud
Number of indexes/collections | One per license | Unlimited
Page Indexing | No control on which pages are indexed and when they are available for searching | Full control. Indexed pages are immediately available for searching
Index password protected pages | No | Yes
On-demand indexing | Yes. Pages indexed within 24 hours | Yes. Pages indexed immediately
On-demand indexing quota | Yes. Up to 10,000 pages for $2000 per year | No. Unlimited.
On-demand indexing frequency | Once every 24 hours | No limitations. Can be used any number of times.
XML search results | Yes | Yes
Multi language Support in search results | Yes | Yes
Promote specific pages in search results | Yes | Yes
Customize look & feel of search results thru web interface | Yes | Yes
Price | Variable depending on the number of search queries. Starts at $100 per year | Fixed for a cloud plan. Starts at $29 per month

Concept Search for Outlook PST archives including attachments – eDiscovery

Posted by Timo Selvaraj on March 4, 2011

SearchBlox 6.2 adds support for indexing and searching Outlook PST archives, including attachments. With the exponential growth of email in the enterprise and the increase in legal and compliance requirements, finding the right information within your emails and attachments for an eDiscovery request is always challenging and expensive. SearchBlox is a comprehensive solution that searches across your email archives, file systems, websites and social media streams to monitor emerging concepts and hidden trends.

SearchBlox provides advanced search features including date-range searching, fuzzy searching and concept searching, as well as standard Boolean keyword searching. Concept searching groups search results by important concepts in real time across your email messages, documents, social media RSS feeds and websites to help you connect the dots between disparate sources.

If you are currently using another solution for eDiscovery, give SearchBlox a try since it does not cost you anything. SearchBlox is the fast, free and flexible way for eDiscovery across all your information.

How SearchBlox differs from its competition:
- It’s FREE
- Outlook is not required for SearchBlox to index the PST files
- Create an unlimited number of collections, each with one or more PST files to index from your disk
- Index PST archive files in parallel to save time
- Index attachments for each email message and search them separately if required
- View the messages, attachments and headers right within SearchBlox
- Search for information within headers
- Use fielded searching to search for specific senders or recipients

Interested in challenging us with your complex eDiscovery needs? Contact us!

SearchBlox Concept Search with Emails

SearchBlox Email Viewer

SearchBlox WordPress Plugin now available

Posted by Timo Selvaraj on February 25, 2011

With WordPress becoming the website publishing platform of choice for a large number of web designers, SearchBlox provides an easy way to do a federated search of your posts, documents, RSS feeds and/or any custom information. With the new WordPress SearchBlox plugin, you can connect your WordPress site to a SearchBlox search server and offer Google-like instant search not only for your blog posts but also for any other information you like, including external content such as websites, RSS feeds or files on a disk. SearchBlox provides advanced search features like concept searching for fast access to a large amount of information.

Follow the steps below to get started:
1.) Download and set up your free SearchBlox search server, either on the same server or in the cloud
2.) Install the WordPress SearchBlox plugin
3.) Go to the SearchBlox admin section to let WordPress know where your SearchBlox server is, the API key (which is found within the admin console of SearchBlox) and the collection name for indexing
4.) Click Index to make your pages searchable
5.) Create a SearchBlox search page through the admin section
6.) Search!

You can also create your own search form with our JQuery Instant Search Plugin or, if you prefer a standard search, roll your own simple search form. If you choose to aggregate your blog posts with other information, such as indexed PDF files or an RSS feed search of jobs or Twitter messages, you can create a federated search page that combines everything into one search across all of your internal and external content. After the initial index of your site, blog posts are added, updated or deleted in the search index in real time.

Instant Search JQuery Plugin for SearchBlox

Posted by Timo Selvaraj on February 9, 2011

Install the Instant Search JQuery Plugin on any website and speed up the access to your information. This plugin makes it easy to change your ordinary search page into an instant search page with autocomplete. Did we say it is FREE to use?

Follow the steps below to use the Instant Search JQuery Plugin with SearchBlox:

1.) Download and install SearchBlox or use our Amazon Cloud version
2.) After you set up the search collections, turn on caching of search results in the admin section under the Results tab. (Please do not miss this step!)
3.) Download and unzip the SearchBlox Instant Search Plugin files.
4.) Edit index.html and add the URL of your SearchBlox installation.

That’s it!

Check out SearchBlox Instant Search to see how it looks!

SearchBlox on AWS Elastic Beanstalk

Posted by Robert Selvaraj on January 19, 2011

This morning Amazon announced the availability of AWS Elastic Beanstalk, an environment that automatically handles the deployment details of capacity provisioning, load balancing, auto-scaling, and application health monitoring for Java applications. Under the hood, it uses AWS services such as EC2, Amazon S3, Amazon Simple Notification Service, Elastic Load Balancing, and Auto-Scaling.

We deployed the SearchBlox.war file on AWS Elastic Beanstalk and it works great! Here is a screenshot of SearchBlox deployed on Elastic Beanstalk.

This instance can be accessed at http://searchblox.elasticbeanstalk.com. This demo contains over 1200 pages indexed from CNN.com.

How important is search analytics?

Posted by Timo Selvaraj on January 18, 2011

At the core of a search engine’s function is the ability to connect a search query with the right result. Because relevance is subjective to the user’s search query, a constant feedback loop is required to keep connecting the “dots” between the search term and the right document or page. I have always said that a search function that provides no insight into what users are looking for is pretty useless: you are essentially hoping that users find what they need without actually knowing whether they did. Taking it a step further, it is also important to understand which queries yield no results so that we can check the content (SEO for your own website or intranet) and ensure the right language is used, since users typically do not share the publisher’s terminology. Once we can see what users are finding and clicking on, or what is ending up on a “no results” page, a “discovery” process can include some of the following:

  • Type in the queries for at least the top 10 search terms to see what users are seeing
  • See which queries are not yielding any results
  • See which collections or areas of the intranet or website receive the queries that do or don’t provide the right results

    Based on your “discovery” process, you can see what gaps exist in connecting the dots. SearchBlox provides the basic building blocks for connecting the dots through its reporting framework and also provides the ability to “feature” results so that users connect with the relevant information they are looking for. At a minimum, a monthly review of the search analytics and adjustment of the featured results can help “connect the dots”.
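
The discovery steps above can be sketched in a few lines of Python. This is an illustration only and does not use the SearchBlox reporting framework: it assumes a hypothetical query log exported as (query, result_count) pairs and reports the most frequent queries along with the queries that returned nothing.

```python
from collections import Counter


def summarize_queries(log, top_n=10):
    """Return (top queries, zero-result queries) from (query, result_count) pairs."""
    volume = Counter()
    no_results = Counter()
    for query, result_count in log:
        # Normalize case and whitespace so "PTO" and "pto " count as one query.
        normalized = " ".join(query.lower().split())
        if not normalized:
            continue
        volume[normalized] += 1
        if result_count == 0:
            no_results[normalized] += 1
    return volume.most_common(top_n), no_results.most_common()


if __name__ == "__main__":
    sample_log = [  # hypothetical log entries
        ("Pricing", 12),
        ("pricing ", 12),
        ("vacation policy", 0),
        ("PTO", 0),
        ("pto", 0),
    ]
    top, missing = summarize_queries(sample_log)
    print("Top queries:", top)
    print("No results:", missing)
```

The zero-result list is usually the most actionable output: each entry is either a content gap or a vocabulary mismatch that a featured result could bridge.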

    Do you have any search analytics best practices to share? Post them here!

    Monetizing your vertical search engine with advertising

    Posted by Timo Selvaraj on January 5, 2011

    Over the years, we have seen several vertical search engines monetize content from their own or aggregated sources using advertising. Several customers have used SearchBlox as a vertical search engine for monetization, creating a revenue stream for the operators. We have also created specific features to support keyword-based targeting of text/graphic ads so that monetization is feasible for these operators. The Featured Results section within the admin console allows you to create a text or graphic ad for use with the featured_results_top/right xsl template. If you are interested in aggregating news feeds, job postings or any other type of textual content for the purpose of monetization, SearchBlox enables you to do that out of the box!

    Searching RSS feeds using SearchBlox

    Posted by Timo Selvaraj on December 22, 2010

    Want to keep track of tweets from multiple Twitter accounts through RSS, make job posting feeds searchable on your website, or simply aggregate news feeds from multiple websites? Here is an easy way to index RSS feeds and make them searchable.

    1. Set up an RSS collection from the SearchBlox dashboard.

    2. Simply cut and paste one or more RSS feed URLs into the Feed URL field.

    3. Hit the Index button.

    SearchBlox will retrieve the RSS file and index each item in it. It not only indexes each item within the RSS file but also fetches the URL of the item and indexes that page, making it searchable. You can also set up a schedule to retrieve the RSS file and check for new items in the feed. SearchBlox indexes a variety of feed formats, including RSS and Atom. Search results from the feeds can be mixed with your standard website or custom search results.
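
To make the per-item behavior described above concrete, here is a minimal Python sketch of the first half of that process: pulling each item's title and link out of a feed. It is not SearchBlox code, it handles only plain RSS 2.0 (Atom and RDF-based feeds use XML namespaces and need different lookups), and the function name rss_items is hypothetical.

```python
import xml.etree.ElementTree as ET


def rss_items(rss_xml):
    """Yield (title, link) for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:  # an item without a link has no page to fetch and index
            yield title, link


if __name__ == "__main__":
    sample_feed = """<rss version="2.0"><channel><title>Jobs</title>
    <item><title>Search Engineer</title><link>http://example.com/jobs/1</link></item>
    <item><title>Editor</title><link>http://example.com/jobs/2</link></item>
    </channel></rss>"""
    for title, link in rss_items(sample_feed):
        print(title, "->", link)
```

An indexer would then fetch each link and index the page content alongside the item's own title and description.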

    Quickly aggregate feeds from job sites, real estate listings or real-time news using SearchBlox’s RSS Collection. It is much easier than you might think.

    The SearchBlox API

    Posted by Timo Selvaraj on December 21, 2010

    What does the SearchBlox API allow you to do?

    The REST-based SearchBlox API allows you to add, update or delete a URL and its associated content in a search collection through a simple XML POST, in a language-independent manner. For applications that require an easy way to make a document, URL or any type of textual content searchable, we guarantee this is going to be faster than building on Lucene or Solr from the ground up.

    What are the use cases where the API can be used?

    If you have a custom application or system where you would like to have complete control over what gets indexed and made searchable including the ability to combine multiple sources of textual content, the API can help you accomplish that very easily.

    #1 Indexing URLs that your users have submitted, bookmarked or marked as favorites

    All you need to do is make a simple HTTP POST to the URL http://localhost:8080/searchblox/api/rest/add with the following XML message:

    <?xml version="1.0" encoding="utf-8"?>
    <searchblox apikey="16B93E58632880A80E0CE88F440981DB">
    <document colname="Custom_Collection" location="http://www.searchblox.com/">
    </document>
    </searchblox>

    That’s it! SearchBlox will retrieve the URL http://www.searchblox.com/, index it and make it searchable instantly. You can override any of the fields by providing the field value. For example, if you want to override the title with “Hello World” instead of the original page title, you provide that value in the message:

    <?xml version="1.0" encoding="utf-8"?>
    <searchblox apikey="16B93E58632880A80E0CE88F440981DB">
    <document colname="Custom_Collection" location="http://www.searchblox.com/">
    <title>Hello World</title>
    </document>
    </searchblox>
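
Because the API is just an XML POST, any language with an HTTP client can call it. The sketch below shows one way to build and send the add message from Python using only the standard library. The endpoint URL and message shape come from this post; the helper names build_message and post_message, the placeholder API key, and the text/xml content type are assumptions, so check the API documentation for what your server expects.

```python
import urllib.request
import xml.etree.ElementTree as ET

ADD_URL = "http://localhost:8080/searchblox/api/rest/add"  # endpoint from this post


def build_message(api_key, collection, location=None, fields=None):
    """Build the <searchblox> XML message as UTF-8 bytes."""
    root = ET.Element("searchblox", apikey=api_key)
    attrs = {"colname": collection}
    if location:
        attrs["location"] = location
    document = ET.SubElement(root, "document", attrs)
    # Optional field overrides, e.g. {"title": "Hello World"}.
    for name, value in (fields or {}).items():
        ET.SubElement(document, name).text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def post_message(url, payload, timeout=10):
    """POST the XML payload and return the raw response body."""
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "text/xml; charset=utf-8"},  # assumed content type
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    payload = build_message(
        "YOUR_API_KEY",  # placeholder; use the key from your admin console
        "Custom_Collection",
        location="http://www.searchblox.com/",
        fields={"title": "Hello World"},
    )
    print(payload.decode("utf-8"))
```

With a SearchBlox server running locally, passing the payload to post_message(ADD_URL, payload) would submit it. Building the message with ElementTree rather than string formatting means special characters in titles are escaped correctly.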

    To delete the same document from the index, make a POST to the delete URL http://localhost:8080/searchblox/api/rest/delete with the following XML message:

    <?xml version="1.0" encoding="utf-8"?>
    <searchblox apikey="16B93E58632880A80E0CE88F440981DB">
    <document colname="Custom_Collection" location="http://www.searchblox.com/">
    </document>
    </searchblox>

    #2 Indexing custom content that is generated from multiple sources

    You can submit any of the fields that SearchBlox allows without providing a URL or document to SearchBlox. For example, if you would like to index records from a custom data source with only title and description fields, you can create the following XML message:

    <?xml version="1.0" encoding="utf-8"?>
    <searchblox apikey="16B93E58632880A80E0CE88F440981DB">
    <document colname="Custom_Collection">
    <uid>1000</uid>
    <title>Hello World</title>
    <description>This is the description</description>
    </document>
    </searchblox>
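
For a custom data source, the same message can be generated per record. The Python sketch below turns a list of records into one XML message each, using the uid, title and description fields shown above. The function name record_to_message, the sample records and the placeholder API key are hypothetical; sending the messages is left out.

```python
import xml.etree.ElementTree as ET

FIELDS = ("uid", "title", "description")  # the fields used in the message above


def record_to_message(api_key, collection, record):
    """Build one <searchblox> message (UTF-8 bytes) for a single record."""
    root = ET.Element("searchblox", apikey=api_key)
    document = ET.SubElement(root, "document", colname=collection)
    for field in FIELDS:
        if field in record:
            # ElementTree escapes characters such as & and < in the text.
            ET.SubElement(document, field).text = str(record[field])
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    records = [  # hypothetical rows from a custom data source
        {"uid": 1000, "title": "Hello World", "description": "This is the description"},
        {"uid": 1001, "title": "Q&A <draft>", "description": "Needs escaping"},
    ]
    for record in records:
        message = record_to_message("YOUR_API_KEY", "Custom_Collection", record)
        print(message.decode("utf-8"))
```

Giving every record a stable uid matters here because there is no URL to identify the document when it is later updated or deleted.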

    These results can be mixed with standard search results and are made available instantly. The documentation provides a detailed list of fields and values that are acceptable for indexing and searching using the API. The SearchBlox API masks the complexities of Lucene and lets you focus on your information that needs to be made searchable.

    Comparison of SearchBlox Vs Google Mini

    Posted by Timo Selvaraj on December 13, 2010

    SearchBlox is a great (free) solution for website search. A number of our customers have implemented search functionality on public websites using SearchBlox.

    One of the questions that we are often asked is – how does SearchBlox compare with Google Mini for website search? Here is a detailed comparison of features:

    Feature | SearchBlox | Google Mini
    Number of documents supported | Unlimited | 300,000
    Price | Free | Price per server: 50k documents – $2,990; 100k documents – $3,990; 200k documents – $6,990; 300k documents – $9,990
    Support | Starts at $5,000 per organization | 2 years of support included in appliance price. Requires purchase of new appliance after 2 years
    Open standards | Yes. Based on Apache Lucene. | No

    Search Results
    Relevance ranking | Yes | Yes
    Sort by date | Yes | Yes
    Sort by alphabetical order | Yes. On document title. | No
    Sort Order | Ascending or descending for date, relevance and alphabetical sort | Ascending or descending for date sort
    Query terms highlighted | Yes | Yes
    Dynamic summaries | Yes | Yes
    Highlighting PDF hits | Yes | No
    Promote specific pages in search results | Yes | Yes

    Crawling
    Crawl third party websites | Yes | No
    Index multiple file types including HTML, PDF and Office | Yes | Yes
    Languages supported | 37 | 28
    Filter by file types, meta tags, websites | Yes | Yes
    Crawl password protected content | Yes | Yes
    Form-based authentication | Yes | No
    Duplicate removal | Yes | No
    Proxy server support | Yes | Yes

    Administration
    Web-based Admin Console | Yes | Yes
    Remote management | Yes | Yes
    Multiple Collections | Yes | Yes
    Add/Delete/Update specific URLs in real-time | Yes | No
    Search results XSLT customization | Yes | Yes
    Web based reports | Yes | Yes
    Full replication/mirroring | Yes | No

    Integration
    XML Search results output | Yes | Yes
    Indexing API | Yes | No

    Search Queries
    Spelling Suggestions | Yes | Yes
    Stemming | Yes | No
    Enable/disable stemming | Yes | No
    Customize stopwords | Yes | No
    Wildcards support | Yes | No
    Concept Search (Clustering) | Yes | No
    Search across collections | Yes | Yes
    Advanced search | Yes | Yes

    SearchBlox is now available as a FREE product with no limitations.

    Posted by Timo Selvaraj on November 12, 2010

    SearchBlox is pleased to announce the availability of SearchBlox Search Software as a completely FREE product. The product is now available with no limitations in terms of number of documents indexed and no restrictions in product functionality. SearchBlox will support the free product with a number of new paid support packages and free forum-based support. The paid support packages are designed to meet customers’ varied levels of support based on the type and number of search applications they develop and deploy.

    Why this change in business strategy?

    - With the explosive growth in content within organizations, per-document and per-server pricing is no longer cost-effective for customers. With a free product strategy, SearchBlox will offer customers the ability to index and search an unlimited number of documents across the organization for a fixed cost.

    - To drive large-scale adoption of SearchBlox. Apache Lucene has grown in popularity in the last few years as a powerful open source search API. Since 2003, SearchBlox has pioneered the use of Apache Lucene as its core search technology, backed with excellent support. With the free product, SearchBlox aims to be the #1 provider of Enterprise Apache Lucene search solutions.

    We will continue to innovate with several exciting product features in 2011.

    Stay tuned!
