Elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant-capable full-text search engine behind a RESTful web interface. Elasticsearch is written in Java, released as open source under the Apache license, and is the second most popular enterprise search engine. It is designed for cloud deployments and offers real-time search while being stable, reliable, fast, and easy to install and use.
We build a website or application and want to add search to it, and then it hits us: getting search to work is hard. We want our search solution to be fast, we want zero configuration and a completely free search schema, we want to index data simply by sending JSON over HTTP, we want our search server to always be available, we want to start with one machine and scale to hundreds, we want real-time search, we want simple multi-tenancy, and we want a solution built for the cloud. Elasticsearch is designed to solve all these problems and more.
Elasticsearch is a newer member of the open source search platforms and a strong tool for real-time data analysis, and it has developed rapidly. It is based on Lucene, RESTful, distributed, and designed for the cloud; it offers real-time and full-text search; and it is described as stable, highly reliable, scalable, and easy to install and use. The introductions all sound very appealing, so it seemed worth taking out for a spin.
I ran a simple test on two identical virtual machines with about 20 million records. Elasticsearch inserted data much more slowly than MongoDB (tolerable), but searches and queries were more than 10 times faster. This was only the single-machine case; in a multi-machine cluster Elasticsearch should perform better. The following installation steps were completed on Ubuntu Server 14.04 LTS.
Installing Elasticsearch
After upgrading the system, install Oracle Java 7. Elasticsearch officially recommends Oracle JDK 7, so do not try JDK 8 or OpenJDK:
$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install software-properties-common
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer
Add the official Elasticsearch repository, then install Elasticsearch:
$ wget -O - http://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
$ echo "deb http://packages.elasticsearch.org/elasticsearch/1.1/debian stable main" | sudo tee -a /etc/apt/sources.list
$ sudo apt-get update
$ sudo apt-get install elasticsearch
Register Elasticsearch to start at boot, start the service, and use curl to check that the installation succeeded:
$ sudo update-rc.d elasticsearch defaults 95 1
$ sudo /etc/init.d/elasticsearch start
$ curl -X GET 'http://localhost:9200'
{
  "status": 200,
  "name": "Fer-de-Lance",
  "version": {
    "number": "1.1.1",
    "build_hash": "f1585f096d3f3985e73456debdc1a0745f512bbc",
    "build_timestamp": "2014-04-16T14:27:12Z",
    "build_snapshot": false,
    "lucene_version": "4.7"
  },
  "tagline": "You Know, for Search"
}
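The same check can be scripted. The following is a minimal sketch, assuming the node answers on http://localhost:9200 as in the curl call above. The helper names `parse_root_response` and `fetch_root` are illustrative only and are not part of any Elasticsearch API. It reads the JSON body and extracts the status and the version number.

```python
import json

try:  # Python 3
    from urllib.request import urlopen
except ImportError:  # Python 2, the default on Ubuntu 14.04
    from urllib2 import urlopen


def parse_root_response(body):
    """Return (status, version_number) from the JSON served at '/'."""
    info = json.loads(body)
    # "status" is read with .get() so a missing key yields None instead of an error
    return info.get("status"), info["version"]["number"]


def fetch_root(url="http://localhost:9200"):
    """Fetch the root endpoint as text (URL assumed from the curl test)."""
    response = urlopen(url, timeout=5)
    try:
        return response.read().decode("utf-8")
    finally:
        response.close()
```

Calling `parse_root_response(fetch_root())` against the installation above should return the same status and version number that curl printed.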
Marvel, Elasticsearch's cluster and data management interface, is very good. Unfortunately it is free only for development environments; if it were free everywhere it would be unbeatable. Installation is simple: install the plugin, restart the service, and the interface becomes available:
$ sudo /usr/share/elasticsearch/bin/plugin -i elasticsearch/marvel/latest
$ sudo /etc/init.d/elasticsearch restart
* Stopping Elasticsearch Server [OK]
* Starting Elasticsearch Server [OK]
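Right after a restart the node may need a few seconds before it answers HTTP requests, so a script that talks to it immediately can fail. The sketch below shows one way to wait for it, assuming the default address http://localhost:9200 used earlier. `wait_until_ready` and `http_probe` are hypothetical helper names, not Elasticsearch APIs.

```python
import time


def wait_until_ready(probe, attempts=10, delay=1.0, sleep=time.sleep):
    """Call probe() until it returns True or the attempts run out.

    probe is any zero-argument callable. Exceptions it raises (for example a
    refused connection while the service is still starting) count as "not ready".
    Returns the number of attempts used, or None if the node never answered.
    """
    for attempt in range(1, attempts + 1):
        try:
            if probe():
                return attempt
        except Exception:
            pass  # treat any failure as "not ready yet" and retry
        if attempt < attempts:
            sleep(delay)
    return None


def http_probe(url="http://localhost:9200"):
    """Build a probe that reports True when the URL answers with HTTP 200."""
    try:
        from urllib.request import urlopen  # Python 3
    except ImportError:
        from urllib2 import urlopen  # Python 2

    def probe():
        response = urlopen(url, timeout=2)
        try:
            return response.getcode() == 200
        finally:
            response.close()

    return probe
```

A call such as `wait_until_ready(http_probe())` can then be placed ahead of any code that needs the node to be up.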
Installing the Python client driver
As with MongoDB, we generally interact with Elasticsearch from programs. Elasticsearch has client drivers for many languages; only the Python driver is installed here. For other languages, see the official documentation.
$ sudo apt-get install python-pip
$ sudo pip install elasticsearch
Write a simple program to import the data of gene_info.txt into Elasticsearch:
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import csv

from elasticsearch import Elasticsearch


def import_to_db():
    es = Elasticsearch()  # connects to localhost:9200 by default
    # Python 2: the csv module expects the file to be opened in binary mode
    with open('gene_info.txt', 'rb') as f:
        data = csv.reader(f, delimiter='\t')
        next(data)  # skip the header line
        for row in data:
            doc = {
                'tax_id': row[0],
                'GeneID': row[1],
                'Symbol': row[2],
                'LocusTag': row[3],
                'Synonyms': row[4],
                'dbXrefs': row[5],
                'chromosome': row[6],
                'map_location': row[7],
                'description': row[8],
                'type_of_gene': row[9],
                'Symbol_from_nomenclature_authority': row[10],
                'Full_name_from_nomenclature_authority': row[11],
                'Nomenclature_status': row[12],
                'Other_designations': row[13],
                'Modification_date': row[14]
            }
            # one document per row, stored in index "gene" with type "gene_info"
            es.index(index="gene", doc_type='gene_info', body=doc)


def main():
    import_to_db()


if __name__ == "__main__":
    main()
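The script above sends one HTTP request per row, which may account for part of the slow insert speed seen in the test. The sketch below shows one possible alternative using the Python client's `helpers.bulk` function, which batches documents into bulk requests. The function names `rows_to_actions` and `bulk_import`, the `fields` argument, and the chunk size of 1000 are assumptions made for illustration.

```python
def rows_to_actions(rows, fields, index="gene", doc_type="gene_info"):
    """Yield one bulk action per row, pairing each column with its field name."""
    for row in rows:
        yield {
            "_index": index,
            "_type": doc_type,
            "_source": dict(zip(fields, row)),
        }


def bulk_import(rows, fields, chunk_size=1000):
    """Send the actions through the client's bulk helper (needs a running node)."""
    # imported here so rows_to_actions can be used without the client installed
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch()
    # helpers.bulk returns a tuple: (number of successes, list of errors)
    return helpers.bulk(es, rows_to_actions(rows, fields), chunk_size=chunk_size)
```

Here `rows` would be the `csv.reader` from the script above, and `fields` the list of the 15 column names in file order.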
Kibana is a powerful data visualization client. It integrates with Elasticsearch as a plugin and is easy to install: download and unpack it, restart the Elasticsearch service, and visit http://192.168.2.172:9200/_plugin/kibana/ to see the interface:
$ wget https://download.elasticsearch.org/kibana/kibana/kibana-3.0.1.tar.gz
$ tar zxvf kibana-3.0.1.tar.gz
$ sudo mkdir -p /usr/share/elasticsearch/plugins/kibana
$ sudo mv kibana-3.0.1 /usr/share/elasticsearch/plugins/kibana/_site
$ sudo /etc/init.d/elasticsearch restart