Basic concepts of Elasticsearch distributed search

Source: Internet
Author: User
Tags couchdb

Official Elasticsearch website: http://www.elasticsearch.org/

First, let's look at the overall framework of Elasticsearch:



Elasticsearch is a distributed search framework built on Lucene. It has the following features:

Distributed indexing and search

Automatic index sharding and load balancing

Automatic node discovery and cluster formation

RESTful API support

Easy configuration
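
As a rough illustration of the RESTful API mentioned in the list above, the following Python sketch only builds the pieces of a search request and does not send anything over the network. The host `http://localhost:9200`, the index name `articles`, and the field `title` are assumptions made up for this example; the helper `build_search_request` is hypothetical and not part of any Elasticsearch client.

```python
import json


def build_search_request(base_url, index, field, text):
    """Return (method, url, body) for a simple full-text match query.

    Hypothetical helper for illustration: it only assembles the request,
    it does not contact a cluster.
    """
    url = f"{base_url.rstrip('/')}/{index}/_search"
    body = json.dumps({"query": {"match": {field: text}}})
    return "GET", url, body


# Assumed values: a node on localhost and an index called "articles".
method, url, body = build_search_request(
    "http://localhost:9200", "articles", "title", "lucene"
)
print(method, url)
print(body)
```

Any HTTP client can then send the body to the URL, which is what makes the interface easy to use from any language.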


This is a third-party plug-in for managing Elasticsearch. It clearly shows how the indexes are distributed across the cluster: you can see which shard lives on which node and how much space it occupies, and you can also manage indexes from it.


When a host goes down, the system reallocates the content held on that host to other machines. When the crashed host rejoins the cluster, index data is allocated to it again. These rules are flexible and can be tuned through parameters. Elasticsearch first stores index content in memory and persists it to the hard disk when memory runs short. It also keeps a queue, so that the index is written to the hard disk automatically when the system is idle.
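
The memory-first behaviour described above can be pictured with a small toy model. This is only a sketch of the idea and not how Elasticsearch is implemented: the class `BufferedIndex`, its `max_in_memory` limit, and the plain list standing in for the hard disk are all assumptions for illustration.

```python
class BufferedIndex:
    """Toy model: keep documents in memory, spill to 'disk' when full."""

    def __init__(self, max_in_memory):
        self.max_in_memory = max_in_memory
        self.memory = []  # documents not yet persisted
        self.disk = []    # stands in for the on-disk index

    def add(self, doc):
        self.memory.append(doc)
        # Persist as soon as the in-memory buffer reaches its limit.
        if len(self.memory) >= self.max_in_memory:
            self.flush()

    def flush(self):
        """Write everything buffered in memory to 'disk'."""
        self.disk.extend(self.memory)
        self.memory.clear()


idx = BufferedIndex(max_in_memory=3)
for i in range(5):
    idx.add({"id": i})
print(len(idx.memory), len(idx.disk))  # 2 in memory, 3 on disk

# An idle-time flush would persist whatever is still buffered.
idx.flush()
print(len(idx.memory), len(idx.disk))  # 0 in memory, 5 on disk
```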

Four back-end storage options are available for the index:

1. The local file system, like an ordinary Lucene index;

2. A distributed file system, such as freeds;

3. Hadoop HDFS;

4. Amazon's S3 cloud platform.

It supports a variety of plug-ins, for example river plug-ins that synchronize with MongoDB and CouchDB, word segmentation plug-ins, Hadoop plug-ins, and scripting support plug-ins.

The following describes several Elasticsearch concepts:

Cluster

A cluster consists of multiple nodes, one of which is the master node. The master is chosen by election, and the notion of a master exists only inside the cluster. One of the ideas behind ES is decentralization, which literally means there is no central node. This is the view from outside the cluster: because the cluster is logically a single whole, communicating with any one node is equivalent to communicating with the entire ES cluster. The cluster name is set in the configuration file. Nodes on the same LAN that share a cluster name automatically form a cluster, with no other special configuration.
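
The grouping rule above (same cluster name, same cluster) can be sketched in a few lines of Python. The node names and cluster names are made up, and the election rule used here (pick the smallest node id) is only an assumption to make the example concrete; the text does not say how Elasticsearch actually elects its master.

```python
from collections import defaultdict


def form_clusters(nodes):
    """Group (node_id, cluster_name) pairs by cluster name."""
    clusters = defaultdict(list)
    for node_id, cluster_name in nodes:
        clusters[cluster_name].append(node_id)
    return {name: sorted(ids) for name, ids in clusters.items()}


def elect_master(node_ids):
    """Illustrative election rule only: the smallest node id wins."""
    return min(node_ids)


# Hypothetical nodes on one LAN, two different cluster names.
nodes = [("node-2", "search"), ("node-1", "search"), ("node-9", "logs")]
for name, members in form_clusters(nodes).items():
    print(name, members, "master:", elect_master(members))
```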

Shards

ES can divide a complete index into multiple shards. The advantage is that a large index can be split up and its shards distributed across different nodes, which makes distributed search possible. The number of shards must be specified when the index is created and cannot be changed afterwards.
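
One way to see why the shard count is fixed: a document is placed on a shard by hashing its routing value (by default its id) and taking the remainder modulo the number of shards. The sketch below illustrates that idea with CRC32 as a stand-in hash, which is an assumption; Elasticsearch uses its own hash function, and the document ids are invented.

```python
import zlib


def shard_for(doc_id, number_of_shards):
    """Pick a shard as hash(doc_id) modulo the shard count.

    CRC32 is a stand-in hash chosen for this sketch only.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


doc_ids = [f"doc-{i}" for i in range(10)]
for doc_id in doc_ids:
    # The same document can land on a different shard once the count changes,
    # so existing data would no longer be found where it was written.
    print(doc_id, shard_for(doc_id, 5), shard_for(doc_id, 6))
```

Because the placement depends on the shard count, changing that count would mean rewriting the whole index.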

Replicas

Replicas are copies of an index, and Elasticsearch lets you configure multiple copies. Replicas serve two purposes. The first is to improve fault tolerance: when a shard on a node is damaged or lost, it can be recovered from a replica. The second is to improve query efficiency: Elasticsearch automatically load-balances search requests.
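
To make the relationship between shards and replicas concrete, here is a small sketch that builds an index settings body using the `number_of_shards` and `number_of_replicas` settings and counts how many shard copies the cluster would hold. The helper names and the values 5 and 1 are assumptions for illustration.

```python
import json


def index_settings(number_of_shards, number_of_replicas):
    """Build a settings body that could be sent when creating an index."""
    return {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }


def total_shard_copies(number_of_shards, number_of_replicas):
    """Each primary shard plus its replicas: shards * (1 + replicas)."""
    return number_of_shards * (1 + number_of_replicas)


print(json.dumps(index_settings(5, 1), indent=2))
print(total_shard_copies(5, 1))  # 10
```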

Recovery

Recovery refers to data recovery or data redistribution. When a node joins or leaves the cluster, Elasticsearch redistributes the index shards according to the load on each machine. Data is also recovered when a failed node restarts.
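
The redistribution step can be pictured with a toy allocator that moves the shards of a departed node onto whichever surviving node currently holds the fewest shards. This is a deliberately simplified sketch: the function `reassign`, the node names, and the "fewest shards first" rule are assumptions, and replicas are ignored.

```python
def reassign(allocation, failed_node):
    """Move shards of failed_node to the least-loaded surviving nodes.

    allocation maps node name -> list of shard numbers.
    """
    survivors = {n: list(s) for n, s in allocation.items() if n != failed_node}
    if not survivors:
        raise ValueError("no surviving nodes to take over the shards")
    for shard in allocation.get(failed_node, []):
        # Least-loaded node first; ties are broken by node name.
        target = min(survivors, key=lambda n: (len(survivors[n]), n))
        survivors[target].append(shard)
    return survivors


allocation = {"node-1": [0, 1], "node-2": [2, 3], "node-3": [4]}
print(reassign(allocation, "node-1"))
# {'node-2': [2, 3, 1], 'node-3': [4, 0]}
```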

River

A river represents a data source for ES. It is also a way to synchronize data from other storage systems (such as databases) into ES. A river runs as a service inside ES and is packaged as a plug-in: it reads data from the source and indexes it into ES. The official rivers include CouchDB, RabbitMQ, Twitter, and Wikipedia.
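
The core job of a river, turning records from another store into index operations, can be sketched without any plug-in. The example below converts rows (as they might come from a database) into a newline-delimited body in the style of the bulk API. The helper `rows_to_bulk`, the index name `users`, and the sample rows are assumptions; this is not the code of any actual river.

```python
import json


def rows_to_bulk(index, rows, id_field="id"):
    """Turn rows into a newline-delimited bulk body.

    Each document takes two lines: an action line, then the document itself.
    """
    lines = []
    for row in rows:
        action = {"index": {"_index": index, "_id": str(row[id_field])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(row))
    # The bulk format expects a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Hypothetical rows read from an external data source.
rows = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
print(rows_to_bulk("users", rows), end="")
```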

Gateway

The gateway is the persistent storage mode for Elasticsearch indexes. By default, Elasticsearch keeps an index in memory first and persists it to the hard disk when memory is full. When the cluster is shut down and restarted, the index data is read back from the gateway. Elasticsearch supports several gateway types: the local file system (default), distributed file systems, Hadoop HDFS, and the Amazon S3 cloud storage service.

discovery.zen

This is the mechanism ES uses to discover nodes automatically. ES is a P2P-based system: it first looks for existing nodes by broadcast, and the nodes then communicate with each other over a multicast protocol. Point-to-point interaction is also supported.

Transport

Transport is the way ES nodes interact with one another and the way the cluster interacts with clients. By default, TCP is used for internal interaction. HTTP (with JSON), Thrift, Servlet, memcached, ZeroMQ, and other transport protocols are also supported (integrated through plug-ins).
