By Timo Selvaraj
When it comes to big data search, your primary choices are Solr and Elasticsearch. Both open source enterprise search platforms can perform full-text search, faceted search, and similar requests that a typical Google search cannot realistically handle. SearchBlox uses Elasticsearch as its primary search engine for fast results that are easy to manage. The choice of Elasticsearch over Solr comes down to features such as distributed search and improved scaling.
SearchBlox is a popular choice for large scale searches because it is easy to manage and works essentially out of the box. Elasticsearch is a REST-based search engine powered by the Lucene library. Major features include:
• Hit highlighting
• Faceted search
• Full-text search
• Database integration
• Rich document handling
• Dynamic clustering
• Distributed search and index replication
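To make a few of these features concrete, here is a minimal sketch of a request body that combines full-text search, hit highlighting, and a facet. It only builds the JSON; nothing is sent to a server. The field names (content, category) and the helper name build_search_body are hypothetical, and the body assumes a recent Elasticsearch version, where facets are expressed as terms aggregations and the request would be posted to an index's _search endpoint.

```python
import json


def build_search_body(text, text_field="content", facet_field="category", size=10):
    """Build a search body with a full-text query, highlighting, and one facet.

    The field names are placeholders; substitute the fields of your own index.
    """
    return {
        "size": size,
        # Full-text search: analyze `text` and match it against one field.
        "query": {"match": {text_field: text}},
        # Hit highlighting: ask for highlighted snippets from the same field.
        "highlight": {"fields": {text_field: {}}},
        # Faceted search: bucket the matching documents by a field's values.
        "aggs": {"by_" + facet_field: {"terms": {"field": facet_field}}},
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("distributed search"), indent=2))
```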
The Basic Differences
Before looking at the differences between Solr and Elasticsearch, let's start with the similarities. Both platforms refer to multiple servers connected together as a cluster, and to a single instance as a node. Now, for the differences:
Solr:
• The main logical data structure is called a Collection, which is composed of many Shards.
• A Collection can hold an exact copy of a Shard, called a Replica.
• You must develop a custom search component to index different document types.
Elasticsearch:
• The top logical data structure is called an Index, which can have multiple Shards.
• Each Shard or Replica is a Lucene index.
• A single Index can hold multiple document types, so you can index different document structures in one place.
• Documents of different types can be kept separate when querying.
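As a rough illustration of how an Index relates to its Shards and Replicas, the sketch below builds the settings body Elasticsearch accepts when an index is created and counts how many Lucene indices result. The helper names are hypothetical and the numbers are examples only, not recommendations.

```python
def index_settings(primary_shards, replicas_per_shard):
    """Return a create-index body that sets the shard and replica counts."""
    if primary_shards < 1 or replicas_per_shard < 0:
        raise ValueError("need at least one primary shard and zero or more replicas")
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas_per_shard,
        }
    }


def total_shard_copies(primary_shards, replicas_per_shard):
    """Each primary shard plus each of its replicas is one Lucene index."""
    return primary_shards * (1 + replicas_per_shard)


if __name__ == "__main__":
    print(index_settings(3, 1))
    print(total_shard_copies(3, 1))  # 3 primaries + 3 replicas = 6
```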
Solr requires a schema.xml file to define its index structure, fields, and types. Elasticsearch is schemaless, which means you can start indexing documents without defining a schema first. You can still use a mapping to define your index structure, which Elasticsearch applies when new indices are created. When a document is indexed with a previously unseen field, Elasticsearch will also attempt to guess the field's type and add it to the mapping. This behavior is optional and can be turned off. In Elasticsearch, configuration settings are written to a configuration file (elasticsearch.yml). In Solr, they are defined in the solrconfig.xml file.
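For readers who do want to pin down an index structure, the following sketch builds an explicit mapping body and shows where automatic field creation can be switched off. It assumes the typeless mapping format used by recent Elasticsearch versions (older releases nest the properties under a type name). The helper name and the field names are hypothetical.

```python
import json


def explicit_mapping(fields, dynamic=True):
    """Build a mapping body from a {field_name: field_type} dictionary.

    `dynamic` controls what happens to previously unseen fields:
    True adds them automatically, False ignores them, "strict" rejects
    the document.
    """
    if dynamic not in (True, False, "strict"):
        raise ValueError("dynamic must be True, False, or 'strict'")
    return {
        "mappings": {
            "dynamic": dynamic,
            "properties": {name: {"type": ftype} for name, ftype in fields.items()},
        }
    }


if __name__ == "__main__":
    # Hypothetical fields for a document index.
    body = explicit_mapping(
        {"title": "text", "author": "keyword", "published": "date"},
        dynamic="strict",
    )
    print(json.dumps(body, indent=2))
```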
Zen Discovery is the name Elasticsearch uses for its cluster management mechanism. This is where nodes find each other and the master node for the cluster is elected. There is also an Apache ZooKeeper plugin that can be used in place of Zen Discovery. Solr, on the other hand, relies on an Apache ZooKeeper ensemble. ZooKeeper stores the configuration files and keeps track of nodes and cluster state. A new node must be pointed at a specific ZooKeeper ensemble.
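Both approaches depend on a majority of nodes agreeing on the state of the cluster. As a small sketch of that arithmetic, the function below computes the majority value commonly recommended for the discovery.zen.minimum_master_nodes setting in Elasticsearch versions that use Zen Discovery (releases before 7.0). The same majority rule determines the quorum of a ZooKeeper ensemble. The function name is hypothetical.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Return the smallest majority of the master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    for nodes in (1, 3, 5):
        print(nodes, "master-eligible nodes -> quorum of", minimum_master_nodes(nodes))
```

An odd number of master-eligible nodes is the usual choice, since going from three nodes to four raises the quorum to three without allowing any more failures.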
In conclusion, both Solr and Elasticsearch have their benefits. The differences between the two aren't as stark as they were when both platforms debuted. SearchBlox has opted to go with Elasticsearch mainly because of its ease of use, especially when it comes to configuration and cluster management. Solr is great for certain projects, but Elasticsearch really shines when you have a large number of documents to index.
See for yourself why SearchBlox with Elasticsearch makes a difference when it comes to performing a big data search. Find out more now!