Elasticsearch: Basics

Elasticsearch is a NoSQL full-text search engine built on top of the Apache Lucene library. It is a near-real-time document store in which every field is indexed and searchable, and each record is a structured JSON document. It lets you index millions of documents and return relevant search results in a fraction of a second, and it can scale to hundreds of servers.

Installing Elasticsearch

You need Java (either a JRE or a JDK) installed before you can run Elasticsearch. Make sure the JAVA_HOME environment variable points to your Java installation and that Java's bin directory is on your PATH.

You can download Elasticsearch from the official Elastic website. Download the zip file and unzip it; Elasticsearch is then ready to run. Start it by going to the bin directory and running elasticsearch.bat on Windows or the elasticsearch script on Linux/macOS. Test it by opening the following URL in your browser:
http://localhost:9200/

Let's change a few settings. Stop the Elasticsearch server if it is running. Go to the config folder and open elasticsearch.yml. Uncomment cluster.name and give it a name of your choice. Then uncomment node.name and give it a suitable name. Run Elasticsearch again from the bin directory and check http://localhost:9200/. The response will now show your new settings.

Although the analogy is not exact, Elasticsearch terms can be roughly mapped to these relational database terms:

RDBMS Elasticsearch
Database Index
Table Type
Row Document
Column Field

When you save data in Elasticsearch, you save it in an index. In relational database terms, an index is like a database. An index is stored across multiple shards, which are a logical way of slicing the data into individual chunks so that they can easily be spread across multiple servers. Technically, each shard is an Apache Lucene index. Shards are stored on one or more servers called nodes, and together these nodes form a cluster.

Let's install Kibana. Kibana is an open-source analytics and visualization platform designed to work with Elasticsearch. You use Kibana to search, view, and interact with data stored in Elasticsearch indices. For now we will only use its Dev Tools console to interact with Elasticsearch. Download Kibana from the official Elastic website. The setup is simple: unzip the downloaded file and edit kibana.yml in the config folder. Uncomment elasticsearch.url (it should be http://localhost:9200), then run the kibana script in the bin directory. Now go to http://localhost:5601/ and click Dev Tools in the side navigation.

Let's create an index with a document:

POST library/books
{
  "title": "Learning Elastic search",
  "publish_date": "2017-03-18",
  "keywords": "search elasticsearch lucene"
}

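The request above indexes a single document. Because the library index does not exist yet, Elasticsearch creates it automatically and infers a mapping (the schema) from the document's fields. Let's break it down. We use an HTTP POST to create an entity in Elasticsearch. Kibana sends the request to the node URL configured in kibana.yml (localhost:9200). The first part of the path is the index name, library, followed by the type, books. Since no ID is given in the path, Elasticsearch generates one for the document. To check whether the index was created, use: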

GET library/books/_search

GET _cat/indices?v
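You can also send the same indexing request from code instead of the Kibana console. Below is a minimal sketch using only Python's standard library. It assumes a node at http://localhost:9200. The helper name build_index_request is hypothetical, and the function only builds the request; the commented-out urlopen call would actually send it. Here publish_date is written as 2017-03-18 (ISO 8601). Elasticsearch's default dynamic date detection recognizes that format, so the field is mapped as a date rather than as text.

```python
import json
import urllib.request

# Assumption: a local node at the default address used throughout this post.
ES_URL = "http://localhost:9200"


def build_index_request(index, doc_type, doc, base_url=ES_URL):
    """Build (but do not send) a POST /<index>/<type> request indexing `doc`.

    No ID is given in the path, so Elasticsearch would generate one.
    """
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/{doc_type}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


book = {
    "title": "Learning Elastic search",
    "publish_date": "2017-03-18",  # ISO 8601, detected as a date by dynamic mapping
    "keywords": "search elasticsearch lucene",
}

req = build_index_request("library", "books", book)
print(req.get_method(), req.full_url)  # POST http://localhost:9200/library/books

# Sending it requires a running node:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```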

When an index is created, Elasticsearch assigns it primary shards; by default (in Elasticsearch 5.x) there are 5. Each primary shard also gets replica shards; by default there is 1 replica. A cluster is identified by its name (cluster.name), so it is possible to run multiple Elasticsearch clusters on the same network. Nodes can have 4 types of roles, but the 2 most commonly used are master nodes (master-eligible nodes) and data nodes. We are running on our local machine, so this is a single-node cluster: it has no redundancy, but it is fine for development.

Shards are the partitions an index is split into for scaling. Shards are spread across the nodes, and together with replicas this avoids a single point of failure. Elasticsearch automatically rebalances the shards when a node fails. Note that each shard consumes memory, so on a single-node development cluster you may want to reduce the default of 5 shards to 1.

Replicas are simply copies of the primary shards. A replica takes over if its primary shard fails. A primary shard and its replicas are never placed on the same node. Replica shards can serve search requests, just like primary shards.
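Here is a worked example of the defaults above. An index with 5 primary shards and 1 replica has 5 × (1 + 1) = 10 shard copies. The toy model below is not Elasticsearch's real allocator, which also weighs disk usage, awareness rules and more. It applies just one rule from this section: a primary and its replicas never share a node. It then reports cluster health the way Elasticsearch defines it: green when every copy is assigned, yellow when only replicas are unassigned, and red when a primary is unassigned. The names ShardCopy, allocate and health are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ShardCopy:
    shard: int     # shard number, 0 .. number_of_shards - 1
    primary: bool  # True for the primary copy, False for a replica


def allocate(number_of_shards, number_of_replicas, nodes):
    """Toy allocator: put each copy on the least-loaded node that holds
    no other copy of the same shard. Returns (placement, unassigned)."""
    placement = {node: [] for node in nodes}
    unassigned = []
    copies = [ShardCopy(s, True) for s in range(number_of_shards)]
    copies += [ShardCopy(s, False)
               for s in range(number_of_shards)
               for _ in range(number_of_replicas)]
    for shard_copy in copies:
        # Rule from the text: a primary and its replicas never share a node.
        candidates = [n for n in nodes
                      if all(c.shard != shard_copy.shard for c in placement[n])]
        if not candidates:
            unassigned.append(shard_copy)
            continue
        target = min(candidates, key=lambda n: len(placement[n]))
        placement[target].append(shard_copy)
    return placement, unassigned


def health(unassigned):
    """red = a primary is unassigned, yellow = only replicas are
    unassigned, green = every copy is assigned."""
    if any(c.primary for c in unassigned):
        return "red"
    return "yellow" if unassigned else "green"


for nodes in (["node-1"], ["node-1", "node-2"]):
    placement, unassigned = allocate(5, 1, nodes)
    total = sum(len(v) for v in placement.values()) + len(unassigned)
    print(f"{len(nodes)} node(s): {total} copies, "
          f"{len(unassigned)} unassigned -> {health(unassigned)}")
# 1 node(s): 10 copies, 5 unassigned -> yellow
# 2 node(s): 10 copies, 0 unassigned -> green
```

This is why a single-node development cluster with the default settings reports yellow: the 5 replicas have nowhere to go. With 1 shard and 0 replicas it reports green.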

Let's delete the previously created index:

DELETE library

Let's create it again, this time passing index settings:

PUT /library
{
"settings":{
"number_of_shards":1,
"number_of_replicas":0
}
}

Check the indices again using GET _cat/indices?v

Let's try to change the number of replicas:

PUT /library/_settings
{
"number_of_replicas": 2
}

As you can see, the change succeeds, so the number of replicas can be changed after the index is created. Now let's try to change the number of shards:

PUT /library/_settings
{
"number_of_shards": 4
}

This request fails with an error: number_of_shards is a static setting, so the number of shards cannot be changed after the index is created.
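Why is the shard count fixed? Elasticsearch picks the primary shard for a document with a formula of the form shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document's _id. The sketch below reproduces that idea in Python. It uses zlib.crc32 as a stand-in for the Murmur3 hash Elasticsearch actually uses, so the shard numbers will not match a real cluster. The helper shard_for and the document IDs are hypothetical. The sketch counts how many of 1,000 documents would map to a different shard if an index went from 5 to 4 shards.

```python
import zlib


def shard_for(routing, number_of_shards):
    """Simplified routing: hash(_routing) % number_of_primary_shards.
    Stand-in hash (crc32), not Elasticsearch's Murmur3."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards


# Hypothetical document IDs; _routing defaults to the document _id.
doc_ids = [f"book-{i}" for i in range(1000)]

# Where would each document be looked up after going from 5 to 4 shards?
moved = sum(shard_for(d, 5) != shard_for(d, 4) for d in doc_ids)
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Most documents would suddenly be looked up on the wrong shard, so changing number_of_shards in place would mean relocating almost all the data. In practice you create a new index with the shard count you want and reindex into it.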

Elasticsearch 5 made some significant changes compared with previous versions. Version numbers were unified across all the components (which is why Elasticsearch jumped from 2.x to 5.0), and the associated projects were consolidated into the Elastic Stack. It is also faster and uses less disk space, among other improvements.

Don't think of Elasticsearch as a replacement for an RDBMS. Some of the differences:

  • Although Apache Lucene has transactions (a sequence of operations applied as a single unit), Elasticsearch does not support them. Hence rolling data back to a previous state is not possible. An RDBMS has supported this from the beginning.
  • An RDBMS usually stores normalized data, whereas Elasticsearch stores denormalized (redundant) data. Denormalized data removes the need for query-time joins, which are often expensive in time and resources. However, denormalized data requires more storage space.
  • An RDBMS is robust in the sense that if a big query is taking too much time or too many resources, it can be killed, and out-of-memory conditions are handled up to a certain limit. Elasticsearch offers much less control over runaway queries: its circuit breakers reject some memory-hungry requests, but they do not protect against every out-of-memory situation.
  • Elasticsearch has no built-in authentication/authorization comparable to an RDBMS, such as controlling who can access which tables or which commands a user may run. An add-on called X-Pack can provide this.
  • Elasticsearch is a search engine optimized for search and retrieval, whereas an RDBMS is designed to handle many writes.
  • Elasticsearch provides full-text search at lightning speed compared with databases like PostgreSQL. It is usually a good fit for read-heavy scenarios (data written once and queried often), such as log analysis with Logstash, time-based aggregations, and geo data points.
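The normalization trade-off in the second point can be illustrated with a tiny in-memory example. Plain Python dictionaries stand in for relational tables and for Elasticsearch documents; the data and function names are hypothetical. The normalized form needs a lookup (a join) at query time. The denormalized form answers the query directly, but it stores the author name once per book, so renaming an author must update every copy.

```python
# Hypothetical sample data, for illustration only.
authors = {1: {"name": "A. Author"}}
books = [
    {"title": "Learning Elastic search", "author_id": 1},
    {"title": "Second Book", "author_id": 1},
    {"title": "Third Book", "author_id": 1},
]


def search_normalized(books, authors, author_name):
    """RDBMS style: resolve author_id to a name (a join) at query time."""
    return [b["title"] for b in books
            if authors[b["author_id"]]["name"] == author_name]


def denormalize(books, authors):
    """Elasticsearch style: copy the author name into every book document."""
    return [{"title": b["title"], "author": authors[b["author_id"]]["name"]}
            for b in books]


def search_denormalized(docs, author_name):
    """No join needed: each document already carries the author name."""
    return [d["title"] for d in docs if d["author"] == author_name]


def rename_in_docs(docs, old_name, new_name):
    """The cost of redundancy: a rename must touch every copy."""
    changed = 0
    for d in docs:
        if d["author"] == old_name:
            d["author"] = new_name
            changed += 1
    return changed


docs = denormalize(books, authors)
print(search_denormalized(docs, "A. Author") == search_normalized(books, authors, "A. Author"))  # True
print(rename_in_docs(docs, "A. Author", "B. Author"))  # 3 (one row in the normalized authors table)
```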

Elasticsearch is most effective when used in conjunction with a database.