I. Routing documents to shards
When you index a document, it is stored on a single primary shard. Elasticsearch decides which shard a document belongs to with a simple formula:
shard = hash(routing) % number_of_primary_shards
The routing value is an arbitrary string. It defaults to the document's _id but can be set to a custom value. The routing string is passed through a hash function to generate a number, which is divided by the number of primary shards. The remainder, always in the range 0 to number_of_primary_shards - 1, is the number of the shard where the document lives. This is why the number of primary shards can only be set when an index is created and can never be changed afterwards: if the number of primary shards changed, all previous routing values would become invalid and documents would never be found.
All document APIs accept a routing parameter that can be used to customize the document-to-shard mapping.
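The routing formula can be tried out with a short Python sketch. Elasticsearch itself hashes the routing value with Murmur3; to stay self-contained, this sketch substitutes zlib.crc32, so its shard numbers will not match a real cluster. The helper name shard_for is hypothetical. The sketch illustrates both points above: a custom routing value sends documents with different _ids to the same shard, and changing the number of primary shards changes where existing documents would be looked up.

```python
import zlib
from typing import Optional


def shard_for(doc_id: str, number_of_primary_shards: int,
              routing: Optional[str] = None) -> int:
    """Hypothetical helper: shard = hash(routing) % number_of_primary_shards.

    zlib.crc32 stands in for the Murmur3 hash Elasticsearch actually uses.
    """
    value = doc_id if routing is None else routing  # routing defaults to _id
    # crc32 returns an unsigned value, so the remainder is in 0..n-1.
    return zlib.crc32(value.encode("utf-8")) % number_of_primary_shards


if __name__ == "__main__":
    ids = [str(i) for i in range(1, 11)]

    # Default routing by _id: a document may map to a different shard once
    # the shard count changes, so a lookup would go to the wrong shard.
    moved = [d for d in ids if shard_for(d, 2) != shard_for(d, 3)]
    print("would be looked up on a different shard:", moved)

    # Custom routing: all posts routed by the same user share one shard.
    posts = ["post-1", "post-2", "post-3"]
    print({p: shard_for(p, 2, routing="user_123") for p in posts})
```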
II. How primary and replica shards interact
Suppose we have a cluster of three nodes. It contains one index called blogs that has two primary shards, and each primary shard has two replica shards. Copies of the same shard are never allocated to the same node.
We can send requests to any node in the cluster; every node is fully capable of serving any request. Every node knows which node holds each document, so it can forward a request directly to the node that needs to handle it.
When sending requests, it is good practice to round-robin through all the nodes in the cluster so that the load is spread evenly.
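Round-robin load balancing on the client side can be sketched with itertools.cycle. The RoundRobinClient class and the node addresses are hypothetical; official Elasticsearch clients ship their own connection-pool logic, which this sketch does not reproduce.

```python
from itertools import cycle


class RoundRobinClient:
    """Hypothetical client that spreads requests across all cluster nodes."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes = cycle(nodes)

    def next_node(self):
        # Each call returns the next node in turn, wrapping around at the end.
        return next(self._nodes)


client = RoundRobinClient(["node1:9200", "node2:9200", "node3:9200"])
targets = [client.next_node() for _ in range(5)]
print(targets)
# ['node1:9200', 'node2:9200', 'node3:9200', 'node1:9200', 'node2:9200']
```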
III. Creating, indexing, and deleting documents
Creating, indexing, and deleting documents are all write operations: they must complete successfully on the primary shard before they can be copied to the associated replica shards.
Here are the steps needed to create, index, or delete a document successfully on both the primary shard and its replica shards:
①: The client sends a create, index, or delete request to Node 1.
②: The node uses the document's _id to determine that the document belongs to shard 0. It forwards the request to Node 3, where the primary copy of shard 0 is allocated.
③: Node 3 executes the request on the primary shard. If it succeeds, Node 3 forwards the request to the corresponding replica shards on Node 1 and Node 2. Once all of the replica shards report success, Node 3 reports success to the requesting node, which in turn reports success to the client.
When the client receives a successful response, the document change has been applied to the primary shard and to all of its replica shards, so your change is in effect.
Replication
The default value for replication is sync. With sync replication, the primary shard waits for successful responses from the replica shards before returning.
If you set replication to async, success is returned to the client as soon as the request has been executed on the primary shard. The request is still forwarded to the replica shards, but you will not know whether they succeeded.
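The write path from section III, together with the replication setting, can be modeled in memory. This is a hypothetical simulation, not Elasticsearch code: ShardCopy, write, and the acknowledged_by field are invented for illustration, and a thread pool stands in for forwarding the request to several nodes at once. With replication="sync" the call returns only after every replica has applied the change; with "async" it returns right after the primary succeeds and the caller never learns the replicas' outcome.

```python
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)  # stands in for network forwarding


class ShardCopy:
    """One copy (primary or replica) of a shard, keeping documents in a dict."""

    def __init__(self, node):
        self.node = node
        self.docs = {}

    def apply(self, op, doc_id, source=None):
        if op == "create" and doc_id in self.docs:
            raise KeyError(f"document {doc_id} already exists")
        if op in ("create", "index"):
            self.docs[doc_id] = source
        elif op == "delete":
            self.docs.pop(doc_id, None)
        else:
            raise ValueError(f"unsupported operation: {op}")


def write(primary, replicas, op, doc_id, source=None, replication="sync"):
    """Run a write on the primary first, then forward it to every replica."""
    primary.apply(op, doc_id, source)  # if this raises, replicas are never touched
    futures = [_pool.submit(r.apply, op, doc_id, source) for r in replicas]
    if replication == "async":
        # Return immediately; replica success or failure is never reported.
        return {"acknowledged_by": [primary.node]}
    for f in futures:
        f.result()  # sync: wait for each replica, re-raising any failure
    return {"acknowledged_by": [primary.node] + [r.node for r in replicas]}


primary = ShardCopy("Node 3")
replicas = [ShardCopy("Node 1"), ShardCopy("Node 2")]
print(write(primary, replicas, "index", "1", {"title": "hello"}))
# {'acknowledged_by': ['Node 3', 'Node 1', 'Node 2']}
```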
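Consistency
By default, before attempting a write, the primary shard requires a quorum, that is, a majority of the shard copies (a copy can be the primary or a replica), to be available. This prevents data from being written to the wrong side of a network partition. The quorum is defined as:
int((primary + number_of_replicas) / 2) + 1
where primary is 1 (the primary shard itself) and number_of_replicas is the replica count configured for the index.
The consistency parameter accepts one (only the primary shard must be active), all (the primary and every replica must be active), or the default, quorum (a majority of the shard copies must be active).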
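As a worked example, the quorum formula can be evaluated in Python. The helper name required_active_copies is hypothetical. It follows the behavior of the older Elasticsearch releases this article describes, in which the quorum check is applied only when number_of_replicas is greater than 1 (otherwise a default one-replica index on a single node could never accept writes); later releases replaced consistency with the wait_for_active_shards setting. For the blogs index above, with two replicas per primary, the quorum is int((1 + 2) / 2) + 1 = 2.

```python
def required_active_copies(number_of_replicas: int, consistency: str = "quorum") -> int:
    """Shard copies that must be active before a write may proceed.

    number_of_replicas is the configured replica count, not the number
    of replicas that happen to be active right now.
    """
    total_copies = 1 + number_of_replicas  # the primary plus its replicas
    if consistency == "one":
        return 1
    if consistency == "all":
        return total_copies
    if consistency == "quorum":
        if number_of_replicas <= 1:
            return 1  # quorum only enforced when number_of_replicas > 1
        # int((primary + number_of_replicas) / 2) + 1, with primary = 1
        return total_copies // 2 + 1
    raise ValueError(f"unknown consistency level: {consistency}")


for replicas in (1, 2, 3, 5):
    print(replicas, required_active_copies(replicas))
# 1 1
# 2 2
# 3 3
# 5 4
```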
Timeout
If not enough shard copies are available, Elasticsearch waits for more shards to appear. By default it waits up to one minute; you can change this with the timeout parameter.
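The waiting behavior can be sketched as a polling loop with a deadline. This only models the observable behavior; Elasticsearch does not necessarily poll like this internally, and the names wait_for_active_copies, poll_s, clock, and sleep are hypothetical. The clock and sleep functions are injectable so the loop can be tested without real waiting.

```python
import time


def wait_for_active_copies(count_active, required, timeout_s=60.0, poll_s=0.5,
                           clock=time.monotonic, sleep=time.sleep):
    """Return True once count_active() >= required, or False after timeout_s.

    The 60-second default mirrors the one-minute default wait described above.
    """
    deadline = clock() + timeout_s
    while True:
        if count_active() >= required:
            return True  # enough shard copies: the write can go ahead
        if clock() >= deadline:
            return False  # give up: the write cannot proceed
        sleep(poll_s)


# Usage: the required number of copies becomes active on the third check.
checks = iter([1, 1, 2])
print(wait_for_active_copies(lambda: next(checks), required=2, timeout_s=5, poll_s=0.01))
# True
```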
IV. Retrieving documents
A document can be retrieved from the primary shard or from any of its replica shards.
Here are the steps to retrieve a document from either a primary or a replica shard:
①: The client sends a get request to Node 1.
②: The node uses the document's _id to determine that the document belongs to shard 0. Copies of shard 0 exist on all three nodes; this time, it forwards the request to Node 2.
③: Node 2 returns the document to Node 1, which returns it to the client.
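The read path can be sketched as a requesting node that routes a get to the right shard and then picks one of that shard's copies. The Coordinator class is hypothetical, zlib.crc32 stands in for Elasticsearch's Murmur3 routing hash, and rotating through the copies round-robin is an assumption used here as one simple way to spread reads across the primary and its replicas.

```python
import zlib
from itertools import cycle


class Coordinator:
    """Hypothetical requesting node that routes get requests to shard copies."""

    def __init__(self, copies_by_shard):
        # copies_by_shard maps shard number (0..n-1) to the nodes holding a copy.
        self.number_of_primary_shards = len(copies_by_shard)
        self._rotation = {shard: cycle(nodes) for shard, nodes in copies_by_shard.items()}

    def route_get(self, doc_id):
        shard = zlib.crc32(doc_id.encode("utf-8")) % self.number_of_primary_shards
        node = next(self._rotation[shard])  # next copy of this shard, round-robin
        return shard, node


# The blogs cluster: two primary shards, each with a copy on all three nodes.
coordinator = Coordinator({0: ["Node 3", "Node 1", "Node 2"],
                           1: ["Node 1", "Node 2", "Node 3"]})
for _ in range(3):
    print(coordinator.route_get("1"))
```

Repeated gets for the same _id always resolve to the same shard but are served by a different copy each time.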
V. Partial update of documents
Here are the steps for a partial update:
①: The client sends an update request to Node 1.
②: Node 1 forwards the request to Node 3, where the primary shard is allocated.
③: Node 3 retrieves the document from the primary shard, changes the JSON in the _source field, and tries to reindex the document on the primary shard. If the document has already been changed by another process, it retries step ③ up to retry_on_conflict times before giving up.
④: If Node 3 updates the document successfully, it forwards the new version of the document to the replica shards on Node 1 and Node 2 at the same time so they can reindex it. Once all replica shards report success, Node 3 reports success to the requesting node, which reports success to the client.
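Step ③ is a form of optimistic concurrency control: read the document and its version, apply the change, and reindex only if the version is still the same. The in-memory sketch below uses invented names (PrimaryShard, reindex_if_version, partial_update). It merges the partial document shallowly for brevity, whereas Elasticsearch merges nested objects, and it does not model replication to Node 1 and Node 2 from step ④.

```python
class VersionConflict(Exception):
    """Raised when the document changed since it was read."""


class PrimaryShard:
    """In-memory stand-in for a primary shard with per-document versions."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, _source dict)

    def get(self, doc_id):
        return self.docs[doc_id]

    def reindex_if_version(self, doc_id, expected_version, source):
        current_version, _ = self.docs[doc_id]
        if current_version != expected_version:
            raise VersionConflict(doc_id)  # another process got there first
        self.docs[doc_id] = (current_version + 1, source)
        return current_version + 1


def partial_update(shard, doc_id, partial, retry_on_conflict=0):
    """Retrieve, merge into _source, reindex; retry up to retry_on_conflict times."""
    for attempt in range(retry_on_conflict + 1):
        version, source = shard.get(doc_id)
        merged = {**source, **partial}  # shallow merge of the changed fields
        try:
            return shard.reindex_if_version(doc_id, version, merged)
        except VersionConflict:
            if attempt == retry_on_conflict:
                raise  # out of retries: give up and report the conflict


shard = PrimaryShard()
shard.docs["1"] = (1, {"title": "Elasticsearch", "views": 0})
print(partial_update(shard, "1", {"views": 1}, retry_on_conflict=3))  # 2
print(shard.get("1"))  # (2, {'title': 'Elasticsearch', 'views': 1})
```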
VI. Bulk requests
The mget and bulk APIs follow a pattern similar to the single-document APIs. The difference is that the requesting node knows which shard each document resides on. It breaks the multi-document request up into one multi-document request per shard and forwards each of these to the node that needs to handle it.
Once it receives an answer from each node, it collates their responses into a single response, which it returns to the client.
Here are the steps for an mget request:
1. The client sends an mget request to Node 1.
2. Node 1 builds a multi-get request for each shard and forwards it to the node holding the required primary or replica shard. Once all replies have been received, Node 1 builds the response and returns it to the client.
A routing parameter can be set for each document in the docs array.
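How the requesting node splits an mget request by shard and then collates the replies in request order can be sketched as follows. split_mget, collate, and the per-document routing key are hypothetical names, zlib.crc32 again stands in for the real Murmur3 routing hash, and the per-shard fetch is faked with a dictionary comprehension.

```python
import zlib
from collections import defaultdict


def shard_for(doc_id, number_of_primary_shards, routing=None):
    # zlib.crc32 is a stand-in for Elasticsearch's Murmur3 routing hash.
    value = doc_id if routing is None else routing
    return zlib.crc32(value.encode("utf-8")) % number_of_primary_shards


def split_mget(docs, number_of_primary_shards):
    """Group the requested docs by shard, remembering each one's position."""
    per_shard = defaultdict(list)
    for position, doc in enumerate(docs):
        shard = shard_for(doc["_id"], number_of_primary_shards, doc.get("routing"))
        per_shard[shard].append((position, doc))
    return dict(per_shard)


def collate(replies_by_shard, total):
    """Merge the per-shard replies into one response in the original order."""
    response = [None] * total
    for replies in replies_by_shard.values():
        for position, result in replies:
            response[position] = result
    return response


docs = [{"_id": "1"}, {"_id": "2"}, {"_id": "3", "routing": "user_123"}]
groups = split_mget(docs, 2)
# Pretend each shard copy found every document it was asked for.
replies = {shard: [(pos, {"_id": d["_id"], "found": True}) for pos, d in entries]
           for shard, entries in groups.items()}
print([r["_id"] for r in collate(replies, len(docs))])
# ['1', '2', '3']
```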
Here is the sequence of steps used to execute multiple create, index, delete, and update requests within a single bulk request:
1. The client sends a bulk request to Node 1.
2. Node 1 builds a bulk request for each shard and forwards these requests to the nodes hosting each of the primary shards involved.
3. Each primary shard executes the actions serially, one after another. As each action succeeds, the primary forwards the new document (or the deletion) to its replica shards and then moves on to the next action. Once the replica shards report that all actions have completed, the node reports success to the requesting node, which collates the responses and returns them to the client.
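The bulk steps can be simulated with plain dictionaries standing in for shard copies. This is a hypothetical sketch: execute_bulk, apply_on_primary, and the action dictionaries are invented, zlib.crc32 stands in for the Murmur3 routing hash, shards are processed one after another here although real nodes handle them in parallel, and the status codes are HTTP-style values chosen for illustration.

```python
import zlib
from collections import defaultdict


def shard_for(doc_id, number_of_primary_shards):
    # zlib.crc32 is a stand-in for Elasticsearch's Murmur3 routing hash.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


def apply_on_primary(store, action):
    """Execute one bulk action against a primary shard's dict of documents."""
    op, doc_id, source = action["op"], action["_id"], action.get("source")
    if op == "create":
        if doc_id in store:
            return {"_id": doc_id, "status": 409}  # already exists
        store[doc_id] = source
        return {"_id": doc_id, "status": 201}
    if op == "index":
        status = 200 if doc_id in store else 201
        store[doc_id] = source
        return {"_id": doc_id, "status": status}
    if op == "update":
        if doc_id not in store:
            return {"_id": doc_id, "status": 404}
        store[doc_id] = {**store[doc_id], **source}  # shallow partial update
        return {"_id": doc_id, "status": 200}
    if op == "delete":
        if doc_id not in store:
            return {"_id": doc_id, "status": 404}
        del store[doc_id]
        return {"_id": doc_id, "status": 200}
    raise ValueError(f"unsupported operation: {op}")


def execute_bulk(actions, primaries, replicas_of):
    """Group actions per shard, run them serially, replicate each success."""
    per_shard = defaultdict(list)
    for position, action in enumerate(actions):
        per_shard[shard_for(action["_id"], len(primaries))].append((position, action))

    responses = [None] * len(actions)
    for shard, items in per_shard.items():
        primary = primaries[shard]
        for position, action in items:  # one action after another
            result = apply_on_primary(primary, action)
            if result["status"] < 300:
                # Forward the new document (or the deletion) before the next action.
                for replica in replicas_of[shard]:
                    if action["op"] == "delete":
                        replica.pop(action["_id"], None)
                    else:
                        replica[action["_id"]] = primary[action["_id"]]
            responses[position] = result
    return responses  # collated in request order


primaries = {0: {}, 1: {}}
replicas_of = {0: [{}, {}], 1: [{}, {}]}
actions = [
    {"op": "index", "_id": "1", "source": {"title": "a"}},
    {"op": "create", "_id": "1", "source": {"title": "b"}},
    {"op": "update", "_id": "1", "source": {"views": 2}},
    {"op": "delete", "_id": "2"},
]
print([r["status"] for r in execute_bulk(actions, primaries, replicas_of)])
# [201, 409, 200, 404]
```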
The bulk API also accepts the replication and consistency parameters at the top level of the request, and the routing parameter in the metadata of each individual request.
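A sketch of a bulk request body carrying per-request routing in the action metadata follows. The helper bulk_body is hypothetical, and it assumes the metadata field names of recent Elasticsearch releases (_index, _id, routing); older releases may use different names. For update actions the caller is expected to pass the source line itself, for example {"doc": {...}}.

```python
import json


def bulk_body(actions, index):
    """Build a newline-delimited bulk body from (op, doc_id, source, routing) tuples."""
    lines = []
    for op, doc_id, source, routing in actions:
        meta = {"_index": index, "_id": doc_id}
        if routing is not None:
            meta["routing"] = routing  # per-request routing lives in the metadata
        lines.append(json.dumps({op: meta}))
        if op != "delete":  # delete actions have no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


print(bulk_body([("index", "1", {"title": "a"}, "user_123"),
                 ("delete", "2", None, None)], "blogs"), end="")
# {"index": {"_index": "blogs", "_id": "1", "routing": "user_123"}}
# {"title": "a"}
# {"delete": {"_index": "blogs", "_id": "2"}}
```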
Elasticsearch Introduction Series (vi) Distributed operations