Merkle Tree Algorithm in Detail

Source: Internet
Author: User
Tags: comparison, hash

The Merkle tree is the data structure Dynamo uses to check and synchronize data consistency between replicas. A Merkle tree is built from hashes of the data. It has the following characteristics:

1. The data structure is a tree. It can be a binary tree or a multi-way tree (this post uses a binary tree for the analysis).

2. The value of a leaf node is a unit of data from the data set, or the hash of that unit of data.

3. The value of a non-leaf node is the hash of the values of all its child nodes.

To make this concrete, assume there are two machines, A and B, and that A needs to hold the same 8 files in a directory as B. The files are F1, F2, F3, ..., F8. A Merkle tree lets us compare the two directories quickly. Suppose each machine builds a Merkle tree when the files are created. The figure is as follows:

From the figure, the value of the leaf node Node7 is V7 = hash(F1), the hash of file F1, and the value of its parent Node3 is V3 = hash(V7, V8), the hash of the values of its children Node7 and Node8. Nodes are numbered level by level starting from the root Node0, so the leaves Node7 to Node14 correspond to F1 to F8. This is how the hierarchical relationship is expressed. The value of the root node is in effect a unique fingerprint of the values of all the leaf nodes.
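The construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SHA-256 as the hash function, concatenation of the two child values as the input of the parent hash, and the helper names h and build_tree are all choices made here, not something the article or the Dynamo paper specifies. The node values are stored in a flat list so that node i has children 2i+1 and 2i+2, which reproduces the Node0 to Node14 numbering used in the text.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Hash helper. SHA-256 is an assumption; the article names no hash."""
    return hashlib.sha256(data).digest()


def build_tree(files: list[bytes]) -> list[bytes]:
    """Return the node values V0..V(2n-2) for n files.

    Node i has children 2i+1 and 2i+2, so with 8 files the root is
    Node0 and the leaves are Node7..Node14.
    """
    n = len(files)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch needs a power-of-two number of files")
    values = [b""] * (2 * n - 1)
    # Leaf nodes: the value is the hash of the file content.
    for k, content in enumerate(files):
        values[n - 1 + k] = h(content)
    # Non-leaf nodes, bottom-up: hash of the two child values.
    for i in range(n - 2, -1, -1):
        values[i] = h(values[2 * i + 1] + values[2 * i + 2])
    return values


# Placeholder contents standing in for F1..F8.
files = [f"contents of F{k}".encode() for k in range(1, 9)]
v = build_tree(files)
print(len(v))         # 15 nodes: Node0..Node14
print(v[0].hex())     # root value V0
```

With this layout, v[7] is hash(F1) and v[3] is the hash of v[7] and v[8], matching the relationship between Node3, Node7 and Node8 above.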

Suppose file F5 on A differs from F5 on B. How do we find the differing file using the Merkle trees of the two machines? The comparison proceeds as follows:

1. First compare V0 on the two machines. If the values differ, examine the children Node1 and Node2.

2. V1 is the same and V2 differs, so examine Node2's children, Node5 and Node6.

3. V5 differs and V6 is the same, so examine Node5's children, Node11 and Node12.

4. V11 differs and V12 is the same. Node11 is a leaf node, so retrieve its directory information.

5. The comparison is complete.
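The steps above can be sketched as a top-down walk over two trees. As before, this is a minimal illustration and not Dynamo's implementation: SHA-256, the flat-list layout (node i has children 2i+1 and 2i+2) and the names build_tree and diff are assumptions introduced here. The function also counts how many node values it compares.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # hash choice is an assumption


def build_tree(files):
    """Node values as a flat list; len(files) is assumed a power of two."""
    n = len(files)
    values = [b""] * (2 * n - 1)
    for k, content in enumerate(files):
        values[n - 1 + k] = h(content)
    for i in range(n - 2, -1, -1):
        values[i] = h(values[2 * i + 1] + values[2 * i + 2])
    return values


def diff(a, b):
    """Return (indices of differing leaf nodes, number of comparisons)."""
    differing, comparisons = [], 0
    first_leaf = len(a) // 2
    stack = [0]  # start at the root, Node0
    while stack:
        i = stack.pop()
        comparisons += 1
        if a[i] == b[i]:
            continue  # identical subtree, nothing below needs checking
        if i >= first_leaf:
            differing.append(i)  # a differing leaf: this file is out of sync
        else:
            stack.extend((2 * i + 2, 2 * i + 1))  # left child is visited first
    return differing, comparisons


files_a = [f"contents of F{k}".encode() for k in range(1, 9)]
files_b = list(files_a)
files_b[4] = b"a different F5"  # F5 differs between A and B

result = diff(build_tree(files_a), build_tree(files_b))
print(result)  # ([11], 7)
```

The walk visits Node0, Node1, Node2, Node5, Node11, Node12 and Node6, and reports Node11, the leaf that holds F5.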

The theoretical complexity of this process is O(log N). The actual number of comparisons is somewhat larger, because at each node whose values differ, every one of its children has to be compared. In the example above that is 7 comparisons: V0, then V1 and V2, V5 and V6, and V11 and V12. The process is illustrated in the following figure:


As the figure shows, this process quickly locates the file that differs.

If a file F9 is added to the directory on machine A, the whole Merkle tree changes as follows:

The steps that need to be performed are shown in red. The whole process starts at the leaf node and propagates straight back up to the root node.


If F1 is deleted from the directory, the operations on the whole tree are as follows:


Again, the operations that need to be performed are shown in red.
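Both updates, adding F9 and deleting F1, follow the same pattern: change one leaf, then recompute the hashes on the path from that leaf to the root. The sketch below shows this under a simplifying assumption that is not taken from the original figures: the tree has a fixed number of leaf slots (16 here), and an unused slot holds the hash of an empty marker. The class MerkleTree, its methods and the EMPTY marker are hypothetical names, and the node numbers differ from the 8-leaf tree in the text because the tree is larger.

```python
import hashlib

EMPTY = b""  # hypothetical marker for "no file in this slot"


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # hash choice is an assumption


class MerkleTree:
    def __init__(self, capacity: int):
        # capacity: number of leaf slots, assumed to be a power of two
        self.first_leaf = capacity - 1
        self.values = [b""] * (2 * capacity - 1)
        for i in range(self.first_leaf, len(self.values)):
            self.values[i] = h(EMPTY)
        for i in range(self.first_leaf - 1, -1, -1):
            self._rehash(i)

    def _rehash(self, i: int) -> None:
        self.values[i] = h(self.values[2 * i + 1] + self.values[2 * i + 2])

    def set_slot(self, slot: int, content: bytes) -> list[int]:
        """Write one leaf, then rehash its ancestors up to the root.

        Returns the node indices that were recomputed, leaf first.
        """
        i = self.first_leaf + slot
        self.values[i] = h(content)
        touched = [i]
        while i > 0:
            i = (i - 1) // 2  # move to the parent
            self._rehash(i)
            touched.append(i)
        return touched

    def delete_slot(self, slot: int) -> list[int]:
        return self.set_slot(slot, EMPTY)


tree = MerkleTree(16)
for k in range(8):  # F1..F8 occupy slots 0..7
    tree.set_slot(k, f"contents of F{k + 1}".encode())
root_before = tree.values[0]

added = tree.set_slot(8, b"contents of F9")  # add F9
deleted = tree.delete_slot(0)                # delete F1
print(added)    # [23, 11, 5, 2, 0]
print(deleted)  # [15, 7, 3, 1, 0]
```

Each update touches one node per level, so the cost grows with the height of the tree rather than with the number of files. One limitation of this sketch is that a genuinely empty file would hash to the same value as an unused slot.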


As the above shows, a Merkle tree makes verification of large data sets more efficient. The Dynamo paper uses Merkle trees to synchronize files and write operations across distributed nodes, especially after a service node has behaved abnormally. The details can be found in the Dynamo paper.



