Distributed search solution selection 5 (final): elasticsearch

Source: Internet
Author: User
Tags: solr

 

Finally, I found Elasticsearch, a distributed search framework. As soon as I read about it, I knew it was what I had been looking for. It has basically every feature I want: distributed search, distributed indexing, zero configuration, automatic sharding, automatic index load balancing, automatic node discovery, and a RESTful API. So I started using it: I deployed it on four machines and imported the index. I set the number of shards to 3, so the index is split into three parts, and the number of copies to 2, so there are two complete copies of the index.
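As a rough illustration of the shard and copy settings described above, here is a minimal Python sketch that only builds the HTTP request for creating such an index; it does not contact a cluster. The host `localhost:9200` and the index name `articles` are assumptions, not values from my deployment. Note that in Elasticsearch `number_of_replicas` counts copies in addition to the primary, so depending on how "two copies" is read, the value could be 1 or 2; the sketch uses 2.

```python
import json

# Assumed values: the host/port and the index name "articles" are placeholders.
ES_URL = "http://localhost:9200"


def build_create_index_request(index_name, shards, replicas):
    """Return (method, url, body) for creating an index with the given layout."""
    settings = {
        "settings": {
            "index": {
                # How many pieces the index is split into.
                "number_of_shards": shards,
                # Extra copies of each shard, in addition to the primary.
                "number_of_replicas": replicas,
            }
        }
    }
    return "PUT", f"{ES_URL}/{index_name}", json.dumps(settings)


method, url, body = build_create_index_request("articles", shards=3, replicas=2)
print(method, url)
print(body)
```

The resulting request can be sent with any HTTP client, since everything in Elasticsearch is exposed through its RESTful API.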

Its management tool clearly shows how the index is distributed: where each part of the index lives, how much space it occupies, and how to manage it. I also found that when a host goes down, the system reallocates the content that was on that host to the other machines, and when the crashed host rejoins the cluster, index data is allocated back to it. These rules are configurable through parameters, so the behavior is flexible. I tested search performance, and query time was about 200 milliseconds. Repeated queries are served from cache, so they perform about the same as in Solr. However, detailed comparison testing showed that Solr's query performance is very poor while an index is being built, because Solr incurs I/O blocking during indexing, which drags down search performance. Elasticsearch does not have this problem, because it first keeps new index content in memory and persists it to disk only when memory runs short. It also has a queue, so the index is written to disk automatically when the system is idle.
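The buffering idea can be pictured with a toy model. The sketch below is not how Elasticsearch is actually implemented; the class `BufferedIndex` and its `max_buffered_docs` parameter are made up for illustration, and a plain list stands in for the disk. It only shows the general pattern: new documents go into memory first, a flush happens when the buffer is full (or when called explicitly, for example while the system is idle), and queries look at both places, so indexing does not force a disk write per document.

```python
class BufferedIndex:
    """Toy model: documents are buffered in memory and flushed in batches."""

    def __init__(self, max_buffered_docs):
        self.max_buffered_docs = max_buffered_docs
        self.buffer = []    # documents still held in memory
        self.segments = []  # batches already "written to disk" (a list stands in for disk)

    def index(self, doc):
        self.buffer.append(doc)
        # Flush only when the in-memory buffer is full.
        if len(self.buffer) >= self.max_buffered_docs:
            self.flush()

    def flush(self):
        # Move the buffered documents into a new flushed batch.
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def search(self, term):
        # Queries see both the flushed batches and the in-memory buffer.
        flushed = [d for seg in self.segments for d in seg]
        return [d for d in flushed + self.buffer if term in d]


idx = BufferedIndex(max_buffered_docs=2)
for text in ["solr test", "elasticsearch test", "zoie test"]:
    idx.index(text)

print(len(idx.segments), len(idx.buffer))
print(idx.search("test"))
```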

It supports four storage options: 1. An ordinary Lucene index stored on the local file system. 2. A distributed file system, such as freeds. 3. Hadoop HDFS. 4. Amazon's S3 cloud platform. It also supports a variety of plug-ins, for example river plug-ins that synchronize data from MongoDB and CouchDB, word segmentation plug-ins, a Hadoop plug-in, and scripting support plug-ins. A third-party plug-in emulates the Solr interface, which lets you switch a Solr-based system to Elasticsearch without changing the code. It is also a near-real-time search engine: once you index a document, you can search for it almost immediately. So I decided to use this distributed search framework.
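To make the near-real-time behavior concrete, here is a small Python sketch that builds the two requests involved: indexing a document and then searching for it. It only constructs the requests and does not send them. The host, the index name `articles`, and the sample document are assumptions. The `_doc` path follows recent Elasticsearch versions (older releases used a type name in that position), and the `refresh=true` parameter asks for the document to be made searchable right away instead of waiting for the next periodic refresh.

```python
import json
from urllib.parse import urlencode

ES_URL = "http://localhost:9200"  # assumed local node


def index_then_search_requests(index_name, doc_id, doc, query):
    """Return the index and search requests as (method, url, body) tuples."""
    # refresh=true asks Elasticsearch to make the document searchable before responding.
    index_req = (
        "PUT",
        f"{ES_URL}/{index_name}/_doc/{doc_id}?{urlencode({'refresh': 'true'})}",
        json.dumps(doc),
    )
    # A simple query-string search against the same index.
    search_req = (
        "GET",
        f"{ES_URL}/{index_name}/_search?{urlencode({'q': query})}",
        None,
    )
    return [index_req, search_req]


requests_to_send = index_then_search_requests(
    "articles", 1, {"title": "distributed search"}, "title:search"
)
for method, url, body in requests_to_send:
    print(method, url, body)
```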

 

Postscript: I also looked briefly at LinkedIn's Zoie, which is also a near-real-time search framework, but it does not support distributed search. LinkedIn has since developed Sensei, a distributed search framework based on Zoie. I have not studied it; try it if you have time.

 

Comparison of Elasticsearch and Solr: http://engineering.socialcast.com/2011/05/realtime-search-solr-vs-elasticsearch/
Official Elasticsearch website: http://www.elasticsearch.org/

References: http://www.searchtech.pro/articles/2013/02/18/1361194952868.html
