Official Elasticsearch website: http://www.elasticsearch.org/
First, let's look at the overall framework of Elasticsearch:
Elasticsearch is a distributed search framework built on Lucene. It has the following features:
Distributed indexing and search
Automatic index sharding and load balancing
Automatic machine discovery and cluster formation
RESTful API support
Easy configuration
A third-party management plug-in for Elasticsearch shows the index distribution clearly: you can see which shards are located on which node and how much space they occupy, and you can manage indexes from it.
When a host goes down, the system re-allocates the content held on that host to other machines. When the crashed host rejoins the cluster, indexes are re-allocated to it again. These rules can be tuned through parameters and are flexible. Elasticsearch stores index content in memory first and persists it to the hard disk when memory is insufficient. It also maintains a queue, so that the index is written to the hard disk automatically when the system is idle.
The following four back-end storage methods are available:
1. Stored in the local file system, like ordinary Lucene indexes;
2. Stored in a distributed file system, such as freeds;
3. Stored in Hadoop HDFS;
4. Stored in Amazon's S3 cloud platform.
Elasticsearch supports a variety of plug-ins, for example river plug-ins that synchronize with MongoDB and CouchDB, word segmentation plug-ins, Hadoop plug-ins, and script support plug-ins.
The following describes several concepts of Elasticsearch:
Cluster
A cluster consists of multiple nodes, one of which is the master node. The master node is chosen by election, and its role is internal to the cluster. One concept of ES is decentralization, which literally means there is no central node. This applies from outside the cluster: because an Elasticsearch cluster is logically a single whole, communicating with any node is equivalent to communicating with the entire ES cluster. You can set the cluster name in the configuration file. Machines on the same LAN that share the same cluster name form a cluster automatically, with no other special configuration.
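To make the naming rule concrete, here is a toy model that groups nodes by their configured cluster name (the setting is `cluster.name` in `elasticsearch.yml`). The node names, cluster names, and the `group_by_cluster` helper are hypothetical. This is only a sketch of the rule "same name, same cluster", not how Elasticsearch implements discovery.

```python
from collections import defaultdict


def group_by_cluster(nodes):
    """Group (node_name, cluster_name) pairs into clusters keyed by cluster name."""
    clusters = defaultdict(list)
    for node_name, cluster_name in nodes:
        clusters[cluster_name].append(node_name)
    return dict(clusters)


# Hypothetical nodes on one LAN, each with the cluster name from its config file.
nodes = [
    ("node-1", "search-prod"),
    ("node-2", "search-prod"),
    ("node-3", "search-test"),
]

clusters = group_by_cluster(nodes)
for name, members in clusters.items():
    print(name, members)
```

In this model, node-3 never joins the first two nodes, simply because its cluster name differs.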
Shards
ES can divide a complete index into multiple shards. The advantage is that a large index can be split into shards and distributed across different nodes, which makes distributed search possible. The number of shards must be specified when the index is created and cannot be changed afterwards.
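A simplified sketch of hash-based routing may help explain why the shard count is fixed: if a document's shard is chosen as a hash of its routing key modulo the number of primary shards, changing that number would send existing documents to different shards. The `shard_for` helper and the document ids are hypothetical, and `zlib.crc32` is only a stand-in for whatever hash function Elasticsearch actually uses.

```python
import zlib


def shard_for(routing_key: str, num_primary_shards: int) -> int:
    """Pick a shard number as hash(routing_key) modulo the shard count."""
    # Assumption: crc32 is used here purely for illustration.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


doc_ids = ["doc-1", "doc-2", "doc-3", "doc-4"]

# Placement of the same documents under two different shard counts.
placement_5 = {doc_id: shard_for(doc_id, 5) for doc_id in doc_ids}
placement_6 = {doc_id: shard_for(doc_id, 6) for doc_id in doc_ids}

print("5 shards:", placement_5)
print("6 shards:", placement_6)
```

Comparing the two printed mappings shows which documents would land on a different shard if the count changed after indexing.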
Replicas
A replica is a copy of an index, and Elasticsearch allows you to set multiple copies per index. Replicas serve two purposes. The first is to improve fault tolerance: when a shard on a node is damaged or lost, it can be recovered from a replica. The second is to improve query efficiency: Elasticsearch automatically load-balances search requests across the copies.
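The sketch below builds an index settings body with the `number_of_shards` and `number_of_replicas` settings, counts how many shard copies the cluster would hold, and then imitates load balancing by rotating search requests over the copies of one shard. The helper names and the node labels are hypothetical, and plain round-robin is an assumption made for illustration. The real selection logic may differ.

```python
import itertools
import json


def index_settings(number_of_shards: int, number_of_replicas: int) -> dict:
    """Build a settings body of the kind sent when creating an index."""
    return {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }


def total_shard_copies(body: dict) -> int:
    """Each primary shard exists once plus once per replica."""
    settings = body["settings"]
    return settings["number_of_shards"] * (1 + settings["number_of_replicas"])


body = index_settings(number_of_shards=5, number_of_replicas=1)
print(json.dumps(body))
print("shard copies in the cluster:", total_shard_copies(body))

# Toy load balancing: rotate searches over the copies of a single shard.
copies = ["primary@node-1", "replica@node-2"]
picker = itertools.cycle(copies)
served_by = [next(picker) for _ in range(4)]
print(served_by)
```

With one replica, losing the node that holds a primary still leaves one full copy of every shard on another node.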
Recovery
This refers to data recovery, or data redistribution. When a node joins or leaves, Elasticsearch redistributes the index shards based on machine load. When a failed node restarts, its data is also recovered.
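As a rough picture of redistribution, the toy allocator below spreads shard numbers evenly over whatever nodes are currently in the cluster, and is re-run when a node leaves or returns. The `allocate` helper and node names are hypothetical. The real allocator takes load and configurable rules into account and avoids moving shards unnecessarily, which this sketch does not attempt.

```python
def allocate(shards, nodes):
    """Spread shard ids over nodes round-robin and return {node: [shard ids]}."""
    if not nodes:
        raise ValueError("at least one node is required")
    allocation = {node: [] for node in nodes}
    for position, shard in enumerate(shards):
        allocation[nodes[position % len(nodes)]].append(shard)
    return allocation


shards = list(range(6))

before = allocate(shards, ["node-1", "node-2", "node-3"])
after_failure = allocate(shards, ["node-1", "node-2"])            # node-3 is down
after_rejoin = allocate(shards, ["node-1", "node-2", "node-3"])   # node-3 is back

print("before:       ", before)
print("after failure:", after_failure)
print("after rejoin: ", after_rejoin)
```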
River
A river represents a data source for ES, and is also a way of synchronizing data from other storage systems (such as databases) into ES. It runs as an ES service in the form of a plug-in: it reads data from the source and indexes it into ES. The official rivers include CouchDB, RabbitMQ, Twitter, and Wikipedia.
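The "read from a source, index into ES" idea can be sketched without the river plug-in interface itself. The example below turns rows from a hypothetical source into the newline-delimited body used by the bulk API, where each document is preceded by an action line. The `to_bulk_body` helper, the rows, and the index name are assumptions, and the exact fields accepted in the action line vary between Elasticsearch versions.

```python
import json


def to_bulk_body(rows, index_name):
    """Convert rows (dicts with an "id" key) into a bulk-style NDJSON body."""
    lines = []
    for row in rows:
        # Action line: tells the bulk endpoint where the next document goes.
        action = {"index": {"_index": index_name, "_id": str(row["id"])}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(row))
    # The bulk format expects every line, including the last, to end in a newline.
    return "\n".join(lines) + "\n"


# Hypothetical rows, as they might be read from a database table.
rows = [
    {"id": 1, "title": "Distributed search basics"},
    {"id": 2, "title": "Shards and replicas"},
]

bulk_body = to_bulk_body(rows, "articles")
print(bulk_body, end="")
```

A synchronization job would run this kind of conversion repeatedly on new or changed rows and send each body to the cluster.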
Gateway
The gateway represents the persistent storage mode of Elasticsearch indexes. By default, Elasticsearch stores indexes in memory first and persists them to the hard disk when memory is full. When the Elasticsearch cluster is shut down and restarted, the index data is read back from the gateway. Elasticsearch supports multiple types of gateways, including the local file system (default), distributed file systems, Hadoop HDFS, and the Amazon S3 cloud storage service.
discovery.zen
It represents the automatic node discovery mechanism of ES. ES is a P2P-based system: it first searches for existing nodes through broadcast, and nodes then communicate with each other through a multicast protocol. Point-to-point interaction is also supported.
Transport
It represents the interaction between ES nodes inside a cluster, and between the cluster and its clients. By default, TCP is used for internal interaction. HTTP (with JSON), Thrift, Servlet, memcached, ZeroMQ, and other transport protocols are also supported (integrated through plug-ins).
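For the HTTP transport, a client request is just a URL plus a JSON body. The sketch below builds a search request with Python's standard library but does not send it, so it runs without a cluster. The address `http://localhost:9200`, the index name `articles`, the field `title`, and the `build_search_request` helper are assumptions chosen for illustration.

```python
import json
import urllib.request


def build_search_request(base_url, index_name, field, text):
    """Build (but do not send) an HTTP search request with a JSON match query."""
    query = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        f"{base_url}/{index_name}/_search",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(
    "http://localhost:9200", "articles", "title", "distributed search"
)

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# Sending is left out on purpose; with a reachable node it would be:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))
```

Because any node can accept the request, `base_url` could point at any machine in the cluster.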
Basic concepts of distributed search elasticsearch