Reprinted from: http://blog.csdn.net/yuanrxdu/article/details/22474697
A Merkle tree is the data structure Dynamo uses to check and synchronize data consistency between replicas. It is built from hashes of the data and has the following characteristics:
1. The data structure is a tree. It can be a binary tree or a multi-way tree (this post uses a binary tree for the analysis).
2. The value of each leaf node is a unit of data from the data set, or the hash of that unit.
3. The value of each non-leaf node is the hash of the values of all its child nodes.
To make this concrete, suppose there are two machines, A and B, and a directory on A must hold the same 8 files as the corresponding directory on B. The files are F1, F2, F3, ..., F8. A Merkle tree lets us compare the two directories quickly. Suppose each machine builds a Merkle tree when the files are created. A concrete example (shown as a figure in the original post):
From the figure, the value of leaf node Node7 is hash(F1), the hash of file F1. The value of its parent Node3 is hash(V7, V8), the hash of the values of its children Node7 and Node8. This is how the hierarchical relationship is expressed. The value of the root node is in effect a unique fingerprint of the values of all leaf nodes.
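The structure in this example can be sketched in Python. This is only an illustration, not Dynamo's implementation. It assumes SHA-256 as the hash, a leaf count that is a power of two, and heap-style numbering in which node i has children 2i+1 and 2i+2, which matches the node numbers used here (Node3 has children Node7 and Node8). The names h and build_merkle_tree and the file contents are hypothetical.

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    """Hash helper; SHA-256 is an assumption, any stable hash would do."""
    return hashlib.sha256(data).digest()


def build_merkle_tree(blocks: List[bytes]) -> List[bytes]:
    """Return the node values in heap order (index 0 is the root)."""
    n = len(blocks)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch expects a power-of-two number of leaves")
    values = [b""] * (2 * n - 1)
    # Leaves occupy the last n slots: Node7..Node14 when n == 8.
    for i, block in enumerate(blocks):
        values[n - 1 + i] = h(block)
    # Fill internal nodes bottom-up: value = hash of the two child values.
    for i in range(n - 2, -1, -1):
        values[i] = h(values[2 * i + 1] + values[2 * i + 2])
    return values


# Stand-ins for the contents of F1..F8.
files = [f"contents of F{i}".encode() for i in range(1, 9)]
tree = build_merkle_tree(files)
print(tree[0].hex())  # root value V0
```

Here tree[7] is hash(F1) and tree[3] is the hash of tree[7] concatenated with tree[8]. Concatenating the child values before hashing is one common choice, not something the original post specifies.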
Suppose file F5 on A differs from F5 on B. How do we find the differing file using the Merkle trees of the two machines? The comparison proceeds as follows:
1. Compare V0 on the two machines. If the values differ, examine the children Node1 and Node2.
2. V1 is the same, V2 differs. Examine Node2's children, Node5 and Node6.
3. V5 differs, V6 is the same. Examine Node5's children, Node11 and Node12.
4. V11 differs, V12 is the same. Node11 is a leaf node, so retrieve its directory information.
5. The search is complete.
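The comparison steps above can be sketched as a recursive descent over two trees. This is a minimal sketch under stated assumptions: both machines store their node values in the same heap order (node i has children 2i+1 and 2i+2), both trees have the same shape, and SHA-256 is the hash. The names h, build and diff are hypothetical.

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build(blocks: List[bytes]) -> List[bytes]:
    """Heap-ordered node values; assumes len(blocks) is a power of two."""
    n = len(blocks)
    values = [b""] * (n - 1) + [h(b) for b in blocks]
    for i in range(n - 2, -1, -1):
        values[i] = h(values[2 * i + 1] + values[2 * i + 2])
    return values


def diff(a: List[bytes], b: List[bytes], node: int = 0) -> List[int]:
    """Return the indices of leaf nodes whose values differ between a and b."""
    if a[node] == b[node]:
        return []          # identical subtree: nothing below needs checking
    left = 2 * node + 1
    if left >= len(a):
        return [node]      # no children, so this is a differing leaf
    # Values differ: both children have to be examined.
    return diff(a, b, left) + diff(a, b, left + 1)


files_a = [f"F{i}".encode() for i in range(1, 9)]
files_b = list(files_a)
files_b[4] = b"F5 modified on B"  # file 5 differs between the machines

print(diff(build(files_a), build(files_b)))  # [11]
```

Subtrees whose values match are skipped entirely, which is where the savings over a file-by-file comparison come from.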
In theory the complexity of this search is O(log N). In practice the cost is somewhat higher, because at every node whose value differs, each of its children must be compared. In the example above, seven values are compared (V0, V1, V2, V5, V6, V11, V12) even though log2(8) = 3. The original post illustrates the process with a figure.
By the same process, any file that differs between the two machines can be located quickly.
If a file F9 is added to the directory on machine A, the whole Merkle tree changes as shown in the original post's figure.
The nodes in red are the ones that need to be recomputed. The update starts at the leaf node and propagates straight up to the root node.
If F1 is deleted from the directory, the operations on the tree are as shown in the original post's figure.
The red text marks the operations that need to be performed.
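The leaf-to-root update used for both adding and deleting a file can be sketched as follows. The figures from the original post are not available here, so how the author's tree grows to hold F9 is unknown. This sketch instead assumes a fixed-capacity tree with 16 leaf slots, where an unused slot holds the hash of empty input, so its node numbers differ from those in the 8-file example. CAPACITY, EMPTY, new_tree and set_leaf are hypothetical names.

```python
import hashlib
from typing import List, Optional

CAPACITY = 16  # hypothetical number of leaf slots (a power of two)
EMPTY = hashlib.sha256(b"").digest()  # assumed placeholder for an unused slot


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def new_tree() -> List[bytes]:
    """Heap-ordered tree in which every leaf slot is still empty."""
    values = [b""] * (CAPACITY - 1) + [EMPTY] * CAPACITY
    for i in range(CAPACITY - 2, -1, -1):
        values[i] = h(values[2 * i + 1] + values[2 * i + 2])
    return values


def set_leaf(values: List[bytes], slot: int, content: Optional[bytes]) -> List[int]:
    """Write one leaf (None means delete), then rehash only its ancestors.

    Returns the indices of the nodes that were touched, leaf first.
    """
    node = CAPACITY - 1 + slot
    values[node] = EMPTY if content is None else h(content)
    touched = [node]
    while node > 0:
        node = (node - 1) // 2  # move to the parent
        values[node] = h(values[2 * node + 1] + values[2 * node + 2])
        touched.append(node)
    return touched


tree = new_tree()
for i in range(8):
    set_leaf(tree, i, f"F{i + 1}".encode())  # F1..F8

print(set_leaf(tree, 8, b"F9"))  # add F9
print(set_leaf(tree, 0, None))   # delete F1
```

Each call touches one leaf and its four ancestors instead of rehashing all 31 nodes, which is the leaf-to-root path the text describes.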
As the examples above show, a Merkle tree makes verification of large data sets more efficient. According to the Dynamo paper, Merkle trees are used to synchronize files and write operations across distributed nodes, especially after a service node has failed. See the Dynamo paper for details.