A brief introduction to distributed Elasticsearch

Source: Internet
Author: User

Here we explain some common terms, such as cluster, node, and shard, and explore how to create your cluster, nodes, and shards.

A node is an Elasticsearch instance, and a cluster consists of one or more nodes that have the same cluster.name. The nodes work together to share data and load.

When a node is added or removed, the cluster detects the change and rebalances the data.

One node in the cluster is elected as the master node, which manages cluster-level changes such as creating or deleting indexes and adding or removing nodes. The master node does not need to take part in document-level changes or searches, which means that it does not become a bottleneck for the cluster as traffic grows. Any node can become the master node. The cluster in our example has only one node, so that node acts as the master.

As users, we can communicate with any node in the cluster, including the master node. Every node knows which node each document lives on and can forward a request to the appropriate node. The node we contact is responsible for collecting the data returned by the other nodes and returning the combined response to the client.

All this is handled by Elasticsearch.


Cluster Health

An Elasticsearch cluster exposes a great many statistics for monitoring, but the single most important one is cluster health. Cluster health has three states: green, yellow, or red.

GET /_cluster/health
The response looks like this:

{
    "cluster_name": "elasticsearch",
    "status": "yellow",
    "timed_out": false,
    "number_of_nodes": 1,
    "number_of_data_nodes": 1,
    "active_primary_shards": 10,
    "active_shards": 10,
    "relocating_shards": 0,
    "initializing_shards": 0,
    "unassigned_shards": ,
    "number_of_pending_tasks": 0,
    "number_of_in_flight_fetch": 0
}

The status field provides an overall indication of how the cluster is functioning.

The three colors mean the following:

Color    Meaning
green    All primary shards and replica shards are available
yellow   All primary shards are available, but not all replica shards are available
red      Not all primary shards are available
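
The table above can be turned into a small check in client code. The following is a minimal Python sketch, not an official client. It assumes a node reachable at http://localhost:9200 (an assumption, not something stated in this article), and the helper names describe_status and fetch_cluster_health are made up for illustration.

```python
import json
from urllib.request import urlopen

# Meaning of each status value, as listed in the table above.
STATUS_MEANING = {
    "green": "all primary and replica shards are available",
    "yellow": "all primary shards are available, but not all replica shards",
    "red": "not all primary shards are available",
}


def describe_status(health):
    """Return a one-line summary for a parsed /_cluster/health response."""
    status = health["status"]
    return f"{health['cluster_name']}: {status} ({STATUS_MEANING[status]})"


def fetch_cluster_health(base_url="http://localhost:9200"):
    """Call GET /_cluster/health on a node (the default URL is an assumption)."""
    with urlopen(f"{base_url}/_cluster/health") as response:
        return json.load(response)


# Works on any parsed response; here a hand-written sample is used so the
# example runs without a cluster.
sample = {"cluster_name": "elasticsearch", "status": "yellow"}
print(describe_status(sample))
```

With a running cluster, describe_status(fetch_cluster_health()) would summarize the live state instead of the sample.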

Next we explain what primary shards and replica shards are, and what these colors (states) mean in practice.

To add data to Elasticsearch, we need an index: a place where related data is stored.

In reality, an index is just a "logical namespace" that points to one or more shards.

A shard is a low-level "unit of work" that holds just a part of all the data in the index.

In the next chapter, "Deep Shards", we will explain in detail how sharding works. For now we only need to know that a shard is an instance of Lucene and is a complete search engine in its own right.

Our documents are stored and indexed in shards, but our applications do not talk to shards directly. Instead, they talk to an index.

Shards are how Elasticsearch distributes data across a cluster. Think of a shard as a container for data: documents are stored in shards, and shards are allocated to the nodes in your cluster. As your cluster grows or shrinks, Elasticsearch automatically migrates shards between nodes so that the cluster stays balanced.

A shard can be either a primary shard or a replica shard. Each document in your index belongs to a single primary shard, so the number of primary shards determines the maximum amount of data the index can hold.
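
To see why each document ends up in exactly one primary shard, it helps to picture routing as a hash of the document ID taken modulo the number of primary shards. The sketch below illustrates that idea only. It uses zlib.crc32 as a stand-in hash, which is not the hash function Elasticsearch uses, and pick_shard is a hypothetical helper name.

```python
import zlib


def pick_shard(doc_id, number_of_primary_shards):
    """Map a document ID to one primary shard number (illustrative only)."""
    # crc32 is a stand-in; Elasticsearch uses its own hash function.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


# The same ID always maps to the same shard for a given shard count.
for doc_id in ("1", "2", "3", "4"):
    print(doc_id, "->", "shard", pick_shard(doc_id, 3))
```

Because the result depends on the number of primary shards, the same document ID can map to a different shard number if that count changes.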

There is no fixed limit on the amount of data a primary shard can store. The practical limit depends entirely on your use case: the hardware you have, the size and complexity of your documents, how you index and query them, and the response times you expect.

A replica shard is just a copy of a primary shard. Replicas protect against data loss from hardware failure and also serve read requests, such as searching or retrieving a document.

The number of primary shards is fixed when the index is created, but the number of replica shards can be changed at any time.
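
As a hedged illustration, the request bodies for these two operations might look like the following. The index name blogs and the shard counts are hypothetical. number_of_shards and number_of_replicas are the index settings Elasticsearch uses for the primary shard count and the number of replicas per primary.

```python
import json

# Body for creating an index, e.g. PUT /blogs ("blogs" is a made-up name).
# number_of_shards is the primary shard count and is fixed at creation time.
create_index_body = {
    "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 1,
    }
}

# Body for changing the replica count later, e.g. PUT /blogs/_settings.
# Only number_of_replicas appears here; the primary count is left alone.
update_replicas_body = {"number_of_replicas": 2}

print(json.dumps(create_index_body, indent=2))
print(json.dumps(update_replicas_body, indent=2))
```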

A cluster health of yellow means that all primary shards are up and running, so the cluster can already serve any request, but not all replica shards are available yet.

In fact, all three replica shards are currently unassigned: they have not yet been allocated to a node. It makes no sense to store copies of the same data on the same node, because if that node fails, all copies of the data are lost.

Although our cluster is fully functional, there is still a risk of data loss due to hardware failure.

Running on a single node means there is a single point of failure: there is no backup of the data.

Fortunately, all we need to do to remove this single point of failure is start a second node. A newly indexed document is first stored on a primary shard and then copied to the corresponding replica shards.

This ensures that our data can be retrieved from either a primary shard or a replica shard.

Our cluster is now not only fully functional but also highly available.
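
The move from yellow to green when a second node starts can be reproduced with a toy model. The sketch below is not Elasticsearch's allocation algorithm. It encodes only the one rule described above (two copies of the same shard never share a node), and the function names and node names are hypothetical. It uses 3 primary shards with one replica each.

```python
def allocate(primaries, replicas_per_primary, nodes):
    """Toy allocator: never place two copies of one shard on the same node."""
    assignment = {node: [] for node in nodes}
    unassigned = []
    for shard in range(primaries):
        # Copy 0 is the primary; the remaining copies are its replicas.
        for copy in range(1 + replicas_per_primary):
            kind = "primary" if copy == 0 else "replica"
            # Eligible nodes do not yet hold any copy of this shard.
            candidates = [
                node for node in nodes
                if all(held != shard for held, _ in assignment[node])
            ]
            if candidates:
                # Prefer the least loaded eligible node.
                target = min(candidates, key=lambda node: len(assignment[node]))
                assignment[target].append((shard, kind))
            else:
                unassigned.append((shard, kind))
    return assignment, unassigned


def health(unassigned):
    """Derive a status color from the list of unassigned shard copies."""
    if any(kind == "primary" for _, kind in unassigned):
        return "red"
    return "yellow" if unassigned else "green"


for nodes in (["node-1"], ["node-1", "node-2"]):
    _, unassigned = allocate(3, 1, nodes)
    print(len(nodes), "node(s):", health(unassigned), "-", len(unassigned), "unassigned")
# 1 node(s): yellow - 3 unassigned
# 2 node(s): green - 0 unassigned
```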

How do we scale as the needs of our application grow? If we start a third node, the cluster reorganizes itself again.


Each shard is itself a complete search engine and can use all the resources of a single node.

With 6 shards (3 primary shards and 3 replica shards), we can scale out to as many as 6 nodes, with one shard on each node and each shard able to use 100% of that node's resources.
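
The arithmetic behind this limit is simple enough to write down. The helper below is a hypothetical illustration: the total number of shard copies is the number of primaries times one plus the replicas per primary, and that total is the largest number of nodes that can each hold a shard of this index.

```python
def total_shards(primaries, replicas_per_primary):
    """Total shard copies in an index: each primary plus its replicas."""
    return primaries * (1 + replicas_per_primary)


# 3 primaries with 1 replica each, as in the example above.
print(total_shards(3, 1))  # 6
```

Since the replica count can be changed at any time, raising replicas_per_primary is the way this total grows for an existing index.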


