Elasticsearch

Elasticsearch is an open-source, broadly distributable, readily scalable, enterprise-grade search engine. Accessible through an extensive and elaborate API, Elasticsearch can power extremely fast searches that support your data discovery applications.

Conventional SQL database management systems aren't really designed for full-text search, and they certainly don't perform well against loosely structured raw data that resides outside the database. On the same hardware, queries that would take more than 10 seconds using SQL can return results in under 10 milliseconds in Elasticsearch.
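To make the full-text idea concrete, here is a minimal sketch of a search request built with the Python standard library. It assumes a node at http://localhost:9200 and a hypothetical index called articles with a body field; none of these names come from a real setup. The request is only built, not sent, so the snippet runs without a server.

```python
import json
from urllib.request import Request


def build_match_search(base_url, index, field, text):
    """Build (but do not send) a full-text `match` query for one field."""
    body = {"query": {"match": {field: text}}}
    return Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, index, and field names, for illustration only.
req = build_match_search(
    "http://localhost:9200", "articles", "body", "distributed search"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it to a running node; the match query analyzes the text and scores documents by relevance rather than doing a literal substring comparison.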

Elasticsearch vs Solr

There are a few differences in the way Solr and Elasticsearch name certain concepts. Let's start with the basics. In both Elasticsearch and Solr, many servers connected together form a cluster, and a single instance of either is called a node. That's about it for nomenclature overlap.

The main logical data structure for Solr is called the Collection. A Collection is composed of Shards, which are really Lucene indices. A single Collection can have multiple Shards, and Shards can live on different Nodes. Because a Collection is composed of one or more Shards, a single Collection can be spread across multiple Nodes, giving you a distributed environment. In addition to that, a Collection can have Replicas: exact copies of a Shard whose main purpose is to enable scaling and to duplicate data in case of Node failures (i.e., High Availability).
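As a rough illustration of how Shards and Replicas are chosen for a Collection, the sketch below builds a Solr Collections API URL. The host http://localhost:8983/solr and the collection name articles are assumptions made for the example; the URL is only constructed, never requested.

```python
from urllib.parse import urlencode


def solr_create_collection_url(base_url, name, num_shards, replication_factor):
    """Build the Collections API URL that creates a sharded, replicated Collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical host and collection name, for illustration only.
url = solr_create_collection_url("http://localhost:8983/solr", "articles", 2, 2)
print(url)
```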

Elasticsearch, on the other hand, calls its top logical data structure an Index. Similar to a Collection in Solr, an Elasticsearch Index can have multiple Shards and Replicas. Here, too, Shards and Replicas are small Lucene indices that can be spread across the Cluster to create a distributed environment.
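The Elasticsearch counterpart is a settings body sent when an Index is created. The sketch below builds that body and counts how many Lucene indices the Cluster ends up hosting; the numbers are arbitrary examples, and the helper names are made up for this illustration.

```python
import json


def create_index_body(primary_shards, replicas_per_shard):
    """Settings body for creating an Index (sent with PUT /<index-name>)."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas_per_shard,
        }
    }


def total_shard_copies(primary_shards, replicas_per_shard):
    """Each primary Shard plus its Replicas is a separate Lucene index."""
    return primary_shards * (1 + replicas_per_shard)


body = create_index_body(3, 1)
print(json.dumps(body))
print(total_shard_copies(3, 1))
```

With 3 primary Shards and 1 Replica each, the Cluster holds 6 shard copies in total, which it can distribute across Nodes.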

But that's not all.

In Elasticsearch you can have multiple Types of documents in a single Index. This means that you can index documents with different structures (for example, users and their documents) in a single Index. Elasticsearch is able to distinguish those Types during indexing as well as querying. To achieve the same with Solr, you would have to simulate it inside your application or develop a custom search component. Note that this applies to older Elasticsearch versions: mapping Types were deprecated in Elasticsearch 6.x and removed in later releases.
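The sketch below contrasts the two approaches. In Elasticsearch versions that support Types, the Type is part of the document's URL path; simulating Types elsewhere usually means storing a discriminator field and filtering on it. The index name, the doc_type field, and the helper names are hypothetical, and the bool query with a filter clause assumes a reasonably recent Elasticsearch release.

```python
import json


def typed_document_path(index, doc_type, doc_id):
    """URL path of a document in Elasticsearch versions that support Types."""
    return f"/{index}/{doc_type}/{doc_id}"


def search_with_discriminator(field, text, type_field, type_value):
    """Simulate Types: full-text match restricted by an exact-value filter."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {field: text}}],
                "filter": [{"term": {type_field: type_value}}],
            }
        }
    }


# Hypothetical index, type, and field names, for illustration only.
path = typed_document_path("site", "user", "42")
query = search_with_discriminator("name", "alice", "doc_type", "user")
print(path)
print(json.dumps(query))
```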

Advantages:

  • Elasticsearch is distributed by design. Distribution is built in, so there is no need to set it up as a separate project. Replication is near real-time.

  • Elasticsearch fully supports Apache Lucene's near real-time search.

  • Multi-tenancy needs no special configuration, while Solr needs more advanced configuration.

  • Elasticsearch introduces the concept of the gateway, which facilitates full backups.

These technologies target somewhat different use cases.

Apache Solr

Apache Solr builds on the tools provided by Lucene. It's an easy-to-use, fast search server with additional features such as scalability (and more).

Elasticsearch

An open-source (Apache-licensed), distributed, RESTful search engine built on Apache Lucene.

Amazon CloudSearch

A fully managed cloud service that makes it easy to integrate a fast, highly scalable search engine into different apps. It's based on Solr, which means it runs Solr in the backend.

Elasticsearch Module

Search API provides a framework for easily creating searches on any entity known to Drupal, using any kind of search engine. In practice, this means it can work with a range of connectors (you can find more info here).

We're going to talk about the two most commonly used connectors: one for Solr and one for Elasticsearch.

Search API Solr is a module which provides a Solr backend for the Search API module.

Elasticsearch Connector is a set of modules designed to build a full Elasticsearch ecosystem in Drupal (which is available here). It uses the official Elasticsearch PHP library. You can also view the GitHub repo.

Install & Requirements

Drupal 8

Search API module

Elasticsearch 1.3.0+

Steps needed to integrate Elasticsearch with Drupal:

  1. Install the elasticsearch service and follow the instructions, found under "installation steps".

  2. Set up your Drupal site so it can talk to Elasticsearch

  3. Download the Elasticsearch connector module with the following command:

cd /path/to/drupal

composer require drupal/elasticsearch_connector

  4. Enable the module, either with Drush using drush en elasticsearch_connector or via the administration interface, as usual for a new module in Drupal 8

  5. Go to Configuration > Search and Metadata > Elasticsearch Connector and fill out the form to add a cluster

  6. Click Save and you can now proceed to configure a search index
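Before adding the cluster in Drupal, it can help to confirm that the Elasticsearch node meets the 1.3.0+ requirement. A node answers a GET request on its root URL with a JSON document that includes version.number. The sketch below checks such a response; the sample payload is a trimmed, illustrative stand-in rather than real output, and the helper names are made up for this example.

```python
import json
import re

MINIMUM_VERSION = (1, 3, 0)  # requirement stated in "Install & Requirements"


def parse_version(number):
    """Turn a version string such as '1.7.6' into a comparable tuple of ints."""
    digits = re.findall(r"\d+", number)[:3]
    digits += ["0"] * (3 - len(digits))  # pad short versions like '2.4'
    return tuple(int(d) for d in digits)


def meets_minimum(root_response, minimum=MINIMUM_VERSION):
    """Check the parsed JSON of GET / against the minimum version."""
    return parse_version(root_response["version"]["number"]) >= minimum


# Illustrative payload only; a real node returns more fields.
sample = json.loads('{"version": {"number": "1.7.6"}}')
print(meets_minimum(sample))
```

To run the check against a live node, fetch the root URL (for example with urllib.request.urlopen) and pass the decoded JSON to meets_minimum.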

Elasticsearch Conclusions

Solr and Elasticsearch are very similar search engines; both use the same search engine backend, i.e., Apache Lucene.

Solr is quite mature and more versatile, and is therefore widely used.

Elasticsearch was developed specifically to address Solr's shortcomings around the scalability requirements of modern cloud environments, which are more difficult to meet with Solr.

It may be more useful to compare Elasticsearch with the recently introduced Amazon CloudSearch, because both cover the same use cases. Solr and Elasticsearch, however, are quite similar. Elasticsearch may be more "scalable", but in the end both are intended for the same purpose.

References

Drupal.org

https://logz.io/blog/solr-vs-elasticsearch

https://www.lullabot.com/articles/indexing-content-from-drupal-8-to-elasticsearch