An example demonstrating the Elasticsearch cluster ecosystem, shards, and horizontal scaling

Source: Internet
Author: User

Elasticsearch is used to build highly available and scalable systems. You can get better performance either by buying more powerful servers (vertical scaling) or by adding more servers (horizontal scaling). Vertical scaling has its limits, so real scalability comes from scaling horizontally: adding nodes spreads the load and increases reliability. For most databases, horizontal scaling means that your program must change significantly to make use of the newly added machines. In contrast, Elasticsearch is inherently distributed: it knows how to manage multiple nodes to provide scalability and high availability, so your program does not need to care about this.

Clusters and nodes 

A node is a running Elasticsearch instance. A cluster is a set of nodes that share the same cluster.name. The nodes work together to share data and to provide failover and scaling. When a node is added or removed, the cluster detects the change and rebalances the data. One node in the cluster is elected as the master node, which manages cluster-wide changes such as creating or deleting indexes and adding or removing nodes. A single node can also form a cluster on its own.

Node communication: 

We can communicate with any node in the cluster, including the master node. Every node knows which node holds a given document and can forward the request to the node that holds the data. The node we talk to collects the responses from the other nodes and returns the combined result to the client. All of this is managed transparently by Elasticsearch.


Shards and replica shards
Elasticsearch uses shards to distribute data across your cluster. Think of a shard as a container for data: documents are stored in shards, and shards are allocated to the nodes in your cluster.
When your cluster grows or shrinks, Elasticsearch automatically migrates shards between nodes to keep the cluster balanced.

A shard is a minimal "worker unit" that stores only a part of all the data in the index. Our documents are stored and indexed in shards, but our programs do not communicate with shards directly. Instead, they communicate with the index. A shard in Elasticsearch is either a primary shard or a replica shard. A replica shard is just a copy of a primary shard. It provides a redundant copy of the data to protect against hardware failure, and it also serves read requests such as search and document retrieval. The number of primary shards and the number of replica shards can be set in the configuration file. However, the number of primary shards can only be defined when an index is created and cannot be changed afterwards. Copies of the same shard are never placed on the same node.

1) Sharding algorithm:

shard = hash(routing) % number_of_primary_shards

The routing value is an arbitrary string that defaults to the document's _id but can also be customized. The routing string is passed through a hash function to produce a number, which is divided by the number of primary shards to get a remainder. The remainder always lies in the range 0 to number_of_primary_shards - 1, and it is the number of the shard that holds the document.

This also explains why the number of primary shards can only be defined when an index is created and cannot be changed: if the number of primary shards changed later, all previous routing values would become invalid and the documents could no longer be found.

All document APIs (get, index, delete, bulk, update, and mget) accept a routing parameter, which customizes the mapping from documents to shards. A custom routing value can ensure that all related documents are stored on the same shard. For example, articles can be routed by user account so that all documents belonging to the same user are stored on the same shard.
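The routing formula above can be tried out with a small Python sketch. It is only an illustration under stated assumptions: zlib.crc32 stands in for the hash function (it is not the hash Elasticsearch uses internally), and shard_for, the document IDs, and the user name are hypothetical.

```python
import zlib


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    """Apply shard = hash(routing) % number_of_primary_shards.

    zlib.crc32 is a stand-in hash used for illustration only;
    it is NOT the hash function Elasticsearch uses internally.
    """
    if number_of_primary_shards < 1:
        raise ValueError("number_of_primary_shards must be at least 1")
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# Default routing: each document is routed by its own _id.
doc_ids = [f"doc-{i}" for i in range(1000)]
with_4_shards = {doc_id: shard_for(doc_id, 4) for doc_id in doc_ids}

# If the index had 5 primary shards instead, many documents would map elsewhere.
with_5_shards = {doc_id: shard_for(doc_id, 5) for doc_id in doc_ids}
moved = sum(1 for d in doc_ids if with_4_shards[d] != with_5_shards[d])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")

# Custom routing: every article routed by the same user account shares a shard.
user_shards = {shard_for("user-42", 4) for _article in range(10)}
print(f"articles routed by 'user-42' land on {len(user_shards)} shard")
```

The same routing value always yields the same shard number, which is why lookups work, and changing the divisor is what breaks them.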

2) Interaction between primary and replica shards:

Create, index, and delete requests are all write operations. They must complete successfully on the primary shard before they are copied to the related replica shards. The steps needed to successfully create, index, or delete a document on the primary and replica shards are:

1. The client sends a create, index, or delete request to Node 1.

2. The node uses the document's _id to determine that the document belongs to shard 0. It forwards the request to Node 3, where the primary copy of shard 0 is located.

3. Node 3 executes the request on the primary shard. If it succeeds, Node 3 forwards the request to the replica shards on Node 1 and Node 2. When all replica shards report success, Node 3 reports success to the requesting node, which reports success to the client.

By the time the client receives a successful response, the change has been applied to the primary shard and all replica shards. Your change has taken effect.
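The write steps above can be sketched as a toy in-memory model. This is not Elasticsearch code: ShardCopy, index_document, and the healthy flag are hypothetical names made up for the illustration, and the replicas are written one after another for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class ShardCopy:
    """One copy (primary or replica) of a shard, held by a node."""
    node: str
    healthy: bool = True
    docs: dict = field(default_factory=dict)

    def apply(self, doc_id: str, source: dict) -> bool:
        """Store the document; return False if this copy cannot accept writes."""
        if not self.healthy:
            return False
        self.docs[doc_id] = source
        return True


def index_document(primary: ShardCopy, replicas: list, doc_id: str, source: dict) -> bool:
    # The write must succeed on the primary shard first.
    if not primary.apply(doc_id, source):
        return False
    # The request is then forwarded to every replica shard.
    results = [replica.apply(doc_id, source) for replica in replicas]
    # Success is reported only when every replica reported success.
    return all(results)


primary = ShardCopy(node="Node 3")
replicas = [ShardCopy(node="Node 1"), ShardCopy(node="Node 2")]
ok = index_document(primary, replicas, "1", {"title": "hello"})
print(ok, [copy.docs for copy in [primary, *replicas]])
```

After a successful call, the document is present on the primary and on both replicas, which matches the guarantee described for a successful response.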

3) Parameters related to the replication process:


The default value of the replication parameter is sync, which means the primary shard waits for successful responses from the replica shards before it returns. If you set replication to async, the response is returned to the client as soon as the request has been executed on the primary shard. The request is still forwarded to the replica shards, but you will not know whether it succeeded there.

The default sync replication lets Elasticsearch apply back pressure. Async replication may overload Elasticsearch by sending too many requests without waiting for the other shards to be ready.


By default, the primary shard requires a quorum, that is, a majority of the shard copies (a copy can be the primary shard or a replica shard), to be available before it attempts a write. This prevents data from being written to the wrong side of a network partition. The quorum is calculated as follows:

int( (primary + number_of_replicas) / 2 ) + 1

The consistency parameter accepts the values one (only the primary shard), all (the primary shard and all replica shards), or the default quorum (a majority of the shard copies).

Note that number_of_replicas is the number of replica shards defined in the index settings, not the number of replica shards that are currently active. If you define that the index has three replicas, the quorum is: int( (primary + 3 replicas) / 2 ) + 1 = 3

However, if you have started only two nodes, the number of active shard copies cannot reach the quorum, and you cannot index or delete any documents.
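The quorum arithmetic above can be checked with a few lines of Python. This is a sketch of the formula only; quorum and meets_quorum are hypothetical helper names, not part of any Elasticsearch API.

```python
def quorum(number_of_replicas: int, primary: int = 1) -> int:
    """int( (primary + number_of_replicas) / 2 ) + 1"""
    return (primary + number_of_replicas) // 2 + 1


def meets_quorum(active_copies: int, number_of_replicas: int) -> bool:
    """True when enough copies of a shard are active to reach the quorum."""
    return active_copies >= quorum(number_of_replicas)


# Three replicas configured: (1 + 3) // 2 + 1
print(quorum(3))           # 3
# Two nodes can hold at most two copies of the same shard.
print(meets_quorum(2, 3))  # False
```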

Note: A new index has one replica by default, which means that two active shard copies would be required to satisfy the quorum. This default would prevent us from doing anything useful on a single-node cluster. To avoid this problem, the quorum requirement is enforced only when number_of_replicas is greater than 1.


When not enough shard copies are available, Elasticsearch waits for more shards to appear. By default it waits for up to one minute. If needed, you can set the timeout parameter to make it give up earlier: 100 means 100 milliseconds, and 30s means 30 seconds.


Cluster ecosystem:

1. Nodes can be added to or removed from a cluster to scale it up or down.

2. The number of primary shards is fixed once the index is created, but the number of replica shards can be changed at any time.

3. Copies of the same shard are never placed on the same node.


Cluster health:

An Elasticsearch cluster exposes many metrics that can be monitored, but the single most important one is cluster health. Elasticsearch reports health with three colors: green, yellow, and red.

Green: All primary and replica shards are available.

Yellow: All primary shards are available, but not all replica shards are available.

Red: Not all primary shards are available.
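The three colors above amount to a simple decision rule, sketched here in Python. The function name and its parameters are hypothetical and only restate the definitions; this is not how Elasticsearch computes health internally.

```python
def health_color(primaries_active: int, primaries_total: int,
                 replicas_active: int, replicas_total: int) -> str:
    """Map shard availability to the green / yellow / red status."""
    if primaries_active < primaries_total:
        return "red"      # not all primary shards are available
    if replicas_active < replicas_total:
        return "yellow"   # all primaries available, some replicas are not
    return "green"        # all primary and replica shards are available


print(health_color(4, 4, 8, 8))
print(health_color(4, 4, 0, 8))
print(health_color(3, 4, 8, 8))
```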

1. Create a single-node cluster

Our single-node cluster:

In this example we create an index named dobbyindex. By default an index is assigned five primary shards. Here we set four primary shards and two replicas (each primary shard has two replica shards):

PUT /dobbyindex
{
  "settings": {
    "number_of_shards": 4,
    "number_of_replicas": 2
  }
}
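As a quick check of what these settings imply, the following Python sketch builds the same request body and counts the shard copies the index will need. The variable names are illustrative; nothing is sent to a cluster.

```python
import json

number_of_shards = 4      # primary shards
number_of_replicas = 2    # replica shards per primary shard

# The same body as in the PUT /dobbyindex request.
body = json.dumps(
    {"settings": {"number_of_shards": number_of_shards,
                  "number_of_replicas": number_of_replicas}},
    indent=2,
)
print(body)

replica_shards = number_of_shards * number_of_replicas
total_shards = number_of_shards + replica_shards
print(f"{number_of_shards} primary + {replica_shards} replica = {total_shards} shards")
```

Four primary shards with two replicas each give eight replica shards, twelve shard copies in total.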

Index after creation:

The shards stored on the node es-node1 are as follows:

All four primary shards are assigned to es-node1, but the eight replica shards have not been assigned to any node. The cluster health status is as follows:

Cluster health: yellow (4 of 12)
The detailed information is:

{
  "cluster_name": "elasticsearch-cluster-centos",
  "status": "yellow",
  "timed_out": false,
  "number_of_nodes": 1,
  "number_of_data_nodes": 1,
  "active_primary_shards": 4,
  "active_shards": 4,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 8
}

This means that all primary shards are up and running, so the cluster can serve any request successfully, but not all replica shards are available.

In fact, all eight replica shards are in the unassigned state: they have not been allocated to any node. There is no point in storing copies of the same data on the same node, because if that node fails, all copies are lost. Our cluster is fully functional, but data could still be lost if the hardware fails.

2. Add failover

The cluster in the example above has a single point of failure and no redundant copy of the data. We can add nodes to protect against data loss. As long as the second node has the same cluster.name as the first node (elasticsearch-cluster-centos in this example), it automatically discovers the first node and joins its cluster.

If it does not, check the log to find out what went wrong. The cause may be that network broadcast is disabled or that a firewall blocks communication between the nodes.

After we start the second node, the shard layout in the cluster is as follows:

Four replica shards have been allocated to es-node2. However, we set the number of replicas to 2, so four replica shards are still unassigned. All primary shards are available but not all replica shards are, so the cluster health remains yellow:

Cluster health: yellow (8 of 12)

The detailed information is:

{
  "cluster_name": "elasticsearch-cluster-centos",
  "status": "yellow",
  "timed_out": false,
  "number_of_nodes": 2,
  "number_of_data_nodes": 2,
  "active_primary_shards": 4,
  "active_shards": 8,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 4
}

Therefore, we need one more node to hold the remaining replicas and make the cluster highly available, so we add a third node to the cluster:

After the third node is started, all shards in the cluster are allocated. es-node1 is the elected master of this cluster, and es-node2 and es-node3 are the other (slave) nodes. Newly indexed documents are first stored on the primary shard and then copied in parallel to the associated replica shards. This ensures that our data can be retrieved from both the primary and the replica shards.

The cluster health status is as follows:

Cluster health: green (12 of 12)
The detailed information is:

{
  "cluster_name": "elasticsearch-cluster-centos",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 3,
  "number_of_data_nodes": 3,
  "active_primary_shards": 4,
  "active_shards": 12,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0
}
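The active and unassigned shard counts in the health reports for one, two, and three nodes follow from a single rule: copies of the same shard are never placed on the same node. The Python sketch below reproduces those numbers for the dobbyindex settings. It is a simplified model, not the Elasticsearch allocator, and expected_shard_counts is a hypothetical helper.

```python
def expected_shard_counts(primaries: int, replicas_per_primary: int, data_nodes: int) -> dict:
    """Estimate active and unassigned shards once allocation has settled.

    Simplified model: each node holds at most one copy of a given shard,
    and at least one data node is running.
    """
    if data_nodes < 1:
        raise ValueError("at least one data node is required")
    copies_per_shard = 1 + replicas_per_primary
    placeable_copies = min(copies_per_shard, data_nodes)
    active = primaries * placeable_copies
    total = primaries * copies_per_shard
    unassigned = total - active
    return {
        "status": "green" if unassigned == 0 else "yellow",
        "active_shards": active,
        "unassigned_shards": unassigned,
    }


for nodes in (1, 2, 3):
    print(nodes, expected_shard_counts(primaries=4, replicas_per_primary=2, data_nodes=nodes))
```

With two replicas per primary shard, three nodes is the smallest cluster in which every copy can be placed, which is why the status only turns green after the third node joins.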

The following is a snapshot captured during shard allocation while the node es-node3 was being added.

3. Simulate node downtime and re-election of the cluster master
Our master node is es-node1. What happens if the master node goes down?

The master node corresponds to process number 7421. After it goes down, the cluster changes as follows:

es-node3 is elected as the new master node and es-node2 becomes a slave node. The placement of the primary and replica shards also changes: the primary shards are on es-node2 and the replica shards are on es-node3. Because not all shards can be allocated, the cluster health changes to yellow (all primary shards are available, but not all replica shards are available). We then restart the es-node1 node.

After the node is restarted, the health status returns to green, but the master and slave roles in the cluster have changed, and so have the locations of the primary shards.


4. Simulate adding nodes

In example 2, our cluster reached a highly available state with all shards of the index allocated. What happens to the shards if we keep adding nodes to expand the cluster? To find out, we add another node, es-node4.

After the expansion, we can see that the shards have been redistributed: es-node1 and es-node3 hold the primary shards, and es-node2, es-node3, and es-node4 hold replica shards. Because the earlier simulation took the master node down, es-node4 is the master node of the new cluster. The shard distribution on each node is as follows:

In this state all shards are allocated and the cluster health is green (all primary and replica shards are available).
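To get a feel for how twelve shard copies can spread evenly over four nodes, here is a toy round-robin allocator in Python. It is an assumption-laden illustration, not the Elasticsearch balancing algorithm, and its layout will not necessarily match the one shown in the article; allocate and the P/R labels are hypothetical.

```python
def allocate(primaries: int, replicas: int, nodes: list[str]) -> dict[str, list[str]]:
    """Spread shard copies round-robin, one copy of a shard per node at most."""
    copies_per_shard = 1 + replicas
    if len(nodes) < copies_per_shard:
        raise ValueError("not enough nodes to place every copy on a different node")
    layout: dict[str, list[str]] = {node: [] for node in nodes}
    slot = 0
    for shard in range(primaries):
        for copy in range(copies_per_shard):
            # Consecutive slots hit different nodes, so copies of one shard never collide.
            node = nodes[slot % len(nodes)]
            label = f"P{shard}" if copy == 0 else f"R{shard}"
            layout[node].append(label)
            slot += 1
    return layout


layout = allocate(4, 2, ["es-node1", "es-node2", "es-node3", "es-node4"])
for node, copies in layout.items():
    print(node, copies)
```

Each of the four nodes ends up with three shard copies, and no node holds two copies of the same shard.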


5. Dynamically decrease or increase the number of replicas

The number of replica shards can be changed dynamically on a running cluster, which allows us to scale up or down as needed.

For example, we perform a scale-down operation:

PUT /dobbyindex/_settings
{"number_of_replicas": 1}

Result returned: {"acknowledged": true}

We can see that the shards have been adjusted again: the primary shards are distributed across es-node1, es-node3, and es-node4, and the replica shards are distributed across es-node2, es-node3, and es-node4.
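The same settings update can be prepared from a program. The sketch below builds the HTTP request with Python's standard library without sending it. The base URL http://localhost:9200 is an assumption about where a node listens, and replica_update_request is a hypothetical helper name.

```python
import json
import urllib.request


def replica_update_request(base_url: str, index: str, number_of_replicas: int) -> urllib.request.Request:
    """Build (but do not send) a PUT /<index>/_settings request."""
    body = json.dumps({"number_of_replicas": number_of_replicas}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed address of a local node; adjust it for a real cluster.
request = replica_update_request("http://localhost:9200", "dobbyindex", 1)
print(request.get_method(), request.full_url, request.data.decode("utf-8"))

# With a cluster running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

With one replica per primary shard, the index needs 4 * (1 + 1) = 8 shard copies instead of 12.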

