While looking into Ethereum recently, I ran into one of its important concepts, the Merkle tree. I had never heard of it before, so I looked up some material and studied it. I have not been working with it for long and my understanding is not very deep, so if anything here is wrong, I hope you will correct me.
Merkle tree concept
A Merkle tree, often called a hash tree, is, as the name implies, a tree that stores hash values. Each leaf of a Merkle tree is the hash of a data block (for example, a file or a collection of files). Each non-leaf node is the hash of the concatenation of its child nodes' values. [1]
1. Hash
A hash function maps data of arbitrary length to data of fixed length [2]. For a data integrity check, for example, the simplest approach is to hash the entire data set to get a fixed-length hash value and publish that value online. After downloading the data, a user hashes it again and compares the result with the published hash value. If the two are equal, the downloaded data is not corrupted. This works because even a slight change in the input produces a completely different hash, and it is hard to infer anything about the original input from the hash value. [3]
If you download from a stable server, a single hash is a reasonable choice. But if the data source is unstable, any corruption means the entire data set has to be downloaded again, which makes the download very inefficient.
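The single-hash check described above can be sketched in a few lines. This is only an illustration: SHA-256 is an assumed choice of hash function, and the names sha256_hex and is_intact are made up for this example.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def is_intact(downloaded: bytes, published_hash: str) -> bool:
    """Hash the downloaded bytes again and compare with the published hash."""
    return sha256_hex(downloaded) == published_hash

original = b"example file contents"
published = sha256_hex(original)  # the value the publisher would post online

print(is_intact(original, published))                  # True
print(is_intact(b"example file c0ntents", published))  # False
```

A single changed character is enough to make the check fail, but the check cannot tell which part of the data was damaged.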
2. Hash List
In a peer-to-peer network, data is downloaded from many machines at the same time, and many of those machines may be unstable or untrustworthy. To verify data integrity, a better approach is to split a large file into small blocks (for example, blocks of 2 KB each). The advantage is that if a block is damaged during transmission, only that block has to be downloaded again, not the entire file.
How do we determine that a small block is undamaged? We simply hash every block. With BT downloads, a hash list is downloaded before the real data. That raises the next question: how do we know the hash list itself is correct? The answer is to concatenate the hash values of all blocks and hash that long string once more, which yields the root hash of the hash list (top hash or root hash). When downloading, we first obtain the correct root hash from a trusted source, use it to verify the hash list, and then use the hash list to verify each data block.
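The hash list procedure above can be sketched as follows. SHA-256 and the helper names (split_blocks, hash_list, root_hash, find_bad_blocks) are assumptions made for this illustration, not part of any particular protocol.

```python
import hashlib

BLOCK_SIZE = 2048  # 2 KB blocks, as in the example above

def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def hash_list(blocks: list) -> list:
    """One hash per data block."""
    return [hashlib.sha256(block).digest() for block in blocks]

def root_hash(hashes: list) -> bytes:
    """Concatenate all block hashes and hash the result once more."""
    return hashlib.sha256(b"".join(hashes)).digest()

def find_bad_blocks(blocks: list, hashes: list) -> list:
    """Indices of blocks whose hash does not match the hash list."""
    return [i for i, (block, expected) in enumerate(zip(blocks, hashes))
            if hashlib.sha256(block).digest() != expected]

data = bytes(range(256)) * 40        # 10240 bytes, i.e. 5 blocks of 2 KB
blocks = split_blocks(data)
hashes = hash_list(blocks)
trusted_root = root_hash(hashes)     # obtained from a trusted source

# Receiver side: check the hash list first, then the individual blocks.
print(root_hash(hashes) == trusted_root)  # True
received = list(blocks)
received[3] = b"\x00" * BLOCK_SIZE        # simulate a damaged block
print(find_bad_blocks(received, hashes))  # [3]
```

Only the block at index 3 has to be fetched again; the other four are already known to be good.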
3. Merkle Tree
A Merkle tree can be seen as a generalization of the hash list (a hash list is a special Merkle tree: a multi-way tree of height 2).
At the bottom level, as with a hash list, the data is split into small blocks, each with its own hash. Going upward, however, we do not compute the root hash directly. Instead, each pair of adjacent hashes is concatenated into one string and that string is hashed, so every two hashes produce one parent hash. If a level contains an odd number of hashes, the last one is left without a partner; it is hashed on its own to produce its parent hash. Repeating the same procedure yields a smaller set of hashes at each new level, and the result is an upside-down tree. At the top only one hash remains, the root hash, which we call the Merkle root [3].
Before downloading a file over a peer-to-peer network, the file's Merkle root is obtained from a trusted source. Once the root is available, the Merkle tree itself can be fetched from untrusted sources and checked against the trusted root. If the tree is corrupted or forged, another copy is fetched from a different source, until a Merkle tree matching the trusted root is found.
The main difference between a Merkle tree and a hash list is that a single branch of a Merkle tree can be downloaded and verified immediately. Because the file is split into small blocks, a corrupted block can simply be downloaded again. For very large files, both the Merkle tree and the hash list become large, but a Merkle tree lets you download one branch at a time, verify it right away, and start downloading the data as soon as the branch passes verification. A hash list can only be verified after the entire list has been downloaded.
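To make the "download and verify one branch" idea concrete, here is a minimal sketch. It assumes SHA-256 and the pairing rule described above (a node without a partner is hashed on its own); the function names build_levels, branch, and verify are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(blocks: list) -> list:
    """Return every level of the tree, leaf hashes first, root level last."""
    level = [h(block) for block in blocks]
    levels = [level]
    while len(level) > 1:
        # Adjacent pairs are concatenated and hashed; a lone last node is
        # hashed on its own (the slice then holds just one hash).
        level = [h(b"".join(level[i:i + 2])) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def branch(levels: list, index: int) -> list:
    """Sibling hashes needed to recompute the root from leaf `index`.

    Each entry is (side, hash) with side "L" or "R", or None when the node
    has no sibling at that level.
    """
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            path.append(("L" if sibling < index else "R", level[sibling]))
        else:
            path.append(None)
        index //= 2
    return path

def verify(block: bytes, path: list, root: bytes) -> bool:
    """Recompute the root from one block and its branch."""
    node = h(block)
    for step in path:
        if step is None:
            node = h(node)
        else:
            side, sibling = step
            node = h(sibling + node) if side == "L" else h(node + sibling)
    return node == root

blocks = [b"block-%d" % i for i in range(5)]
levels = build_levels(blocks)
root = levels[-1][0]              # assumed to come from a trusted source
path = branch(levels, 4)          # could come from an untrusted peer

print(verify(blocks[4], path, root))    # True
print(verify(b"tampered", path, root))  # False
```

The verifier needs only one block, its branch, and the trusted root, not the whole tree or the whole hash list.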
Features of the Merkle tree
A Merkle tree is a tree. In most cases it is a binary tree, though it can also be a multi-way tree; whatever the branching factor, it has all the properties of a tree structure. The value of a leaf node is a unit of data from the data set or the hash of that unit. The value of a non-leaf node is computed with the hash algorithm from the values of all leaf nodes below it. [4] [5]
In general, a cryptographic hash function such as SHA-2 or MD5 is used for hashing. If the goal is only to detect unintentional corruption rather than deliberate tampering, a less secure but more efficient checksum such as CRC can be used instead.
Second-preimage attack: the Merkle root does not indicate the depth of the tree, which makes a second-preimage attack possible, in which an attacker creates a different document that has the same Merkle root. A simple fix is defined in Certificate Transparency: when computing the hash of a leaf node, prepend 0x00 to the data being hashed; when computing the hash of an internal node, prepend 0x01. Other implementations limit the hash tree root by prefixing each hash value with a depth prefix. The prefix decreases at every step, and an extracted hash chain is considered valid only if the prefix is still positive when the leaf is reached.
Operation of the Merkle tree
1. Creating a Merkle tree
Suppose there are 9 data blocks at the bottom level.
Step 1 (red line): hash each data block: node0i = hash(data0i), i = 1, 2, ..., 9
Step 2 (orange line): concatenate each pair of adjacent hashes and hash the result: node1((i+1)/2) = hash(node0i + node0(i+1)), i = 1, 3, 5, 7; for i = 9, node1((i+1)/2) = hash(node0i)
Step 3 (yellow line): repeat Step 2
Step 4 (green line): repeat Step 2
Step 5 (blue line): repeat Step 2 to produce the Merkle tree root
Clearly, creating a Merkle tree has O(n) complexity (meaning O(n) hash operations), where n is the number of data blocks. The height of the resulting Merkle tree is log2(n) + 1, with log2(n) rounded up when n is not a power of two.
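The five steps can be sketched as a short loop. SHA-256 and the name build_merkle_tree are assumptions for this illustration; the data block contents are made up.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_merkle_tree(blocks: list) -> list:
    """Return all levels of the tree, leaf hashes first, root level last."""
    level = [h(block) for block in blocks]          # Step 1
    levels = [level]
    while len(level) > 1:                           # Steps 2 to 5
        parents = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]                   # one hash if no partner
            parents.append(h(b"".join(pair)))
        level = parents
        levels.append(level)
    return levels

blocks = [("data0%d" % i).encode() for i in range(1, 10)]  # 9 data blocks
levels = build_merkle_tree(blocks)

print([len(level) for level in levels])    # [9, 5, 3, 2, 1]
print(len(levels))                         # 5
print(sum(len(level) for level in levels)) # 20
```

With 9 blocks the tree has 5 levels and needs 20 hash operations in total, which stays below 2n plus a small constant and is consistent with the O(n) estimate.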
2. Retrieving data blocks
To make this easier to follow, suppose there are two machines, A and B, and A needs to hold the same 8 files in a directory as B: F1, F2, F3, ..., F8. A Merkle tree lets us compare the two quickly. Suppose each machine built a Merkle tree when the files were created, as in the following figure:
From the figure, the value of leaf node Node7 is hash(F1), the hash of file F1, and the value of its parent Node3 is hash(V7, V8), the hash of the values of its children Node7 and Node8. This is how the hierarchical relationship is expressed. The value of the root node is in effect a unique fingerprint of the values of all leaf nodes.
Suppose file 5 on A differs from the one on B. How do we find the differing file using the Merkle tree information of the two machines? The comparison and retrieval process is as follows:
Step 1. Compare V0 on both machines. If the values differ, examine the children Node1 and Node2.
Step 2. V1 is the same, V2 differs. Examine Node2's children Node5 and Node6.
Step 3. V5 differs, V6 is the same. Examine Node5's children Node11 and Node12.
Step 4. V11 differs, V12 is the same. Node11 is a leaf node, so fetch its directory information.
Step 5. The comparison and retrieval are complete.
The theoretical complexity of this process is O(log n). The process is illustrated in the following figure:
As the figure shows, the whole process locates the differing file quickly.
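The comparison steps above can be sketched like this. The tree is stored in a flat list so that the list index matches the node numbering used in the text (Node0 is the root, Node7 to Node14 are the leaves for 8 files). SHA-256, the file contents, and the names build_tree and diff_leaves are assumptions for this example.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(files: list) -> list:
    """Return a list V where V[i] is the value of Node i.

    Assumes the number of files is a power of two. The children of
    Node i are Node 2i+1 and Node 2i+2.
    """
    n = len(files)
    tree = [b""] * (2 * n - 1)
    for k, content in enumerate(files):
        tree[n - 1 + k] = h(content)                 # leaf nodes
    for i in range(n - 2, -1, -1):
        tree[i] = h(tree[2 * i + 1] + tree[2 * i + 2])
    return tree

def diff_leaves(a: list, b: list, node: int = 0) -> list:
    """Indices of leaf nodes that differ, descending only where values differ."""
    if a[node] == b[node]:
        return []                                    # identical subtree, skip it
    left = 2 * node + 1
    if left >= len(a):
        return [node]                                # reached a leaf
    return diff_leaves(a, b, left) + diff_leaves(a, b, left + 1)

files_a = [b"content of F%d" % i for i in range(1, 9)]
files_b = list(files_a)
files_b[4] = b"a different F5"                       # file 5 differs on B

tree_a, tree_b = build_tree(files_a), build_tree(files_b)
print(diff_leaves(tree_a, tree_b))  # [11]
```

The search visits Node0, Node2, Node5, and Node11 on the differing path, plus one matching sibling at each level, which is where the logarithmic cost comes from.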
3. Update, insert, and delete
Although there is a lot of material on Merkle trees online, most of it covers retrieval and traversal and says little about update, insert, and delete operations. This confused me: the operations on a tree structure surely include update, insert, and delete as well as search. Later I found a question on StackExchange that made things a little clearer; see [6] for the original.
Updating a data block in a Merkle tree is actually simple: update the block, then update the hash values on the path from that block to the root. The structure of the tree does not change. Insert and delete operations, however, necessarily change the structure of the tree. The following diagram shows an insert operation:
After inserting data block 0 (taking the position of the data block into account), the structure of the Merkle tree looks like this:
The asker in [6] considers an insertion algorithm that satisfies the following conditions:
the number of re-hashing operations stays within log(n);
checking a data block stays within log(n) + 1;
unless the original number of blocks n is even, the tree has no orphan after the insertion, and if there is an orphan, it is the last data block;
the order of the data blocks stays consistent;
the Merkle tree remains balanced after the insertion.
Then the result of the insert above looks like this:
According to the answerer in [6], insertion and deletion in a Merkle tree is really an engineering problem, and different problems call for different insertion methods. If you want to guarantee that the tree is balanced or that its height is log(n), you can use any standard balanced binary tree scheme, such as an AVL tree, red-black tree, splay tree, or 2-3 tree. These balanced tree schemes perform an insertion in O(log n) time and guarantee a tree height of O(log n). It is then easy to see that updating all Merkle hashes can be done in O((log n)^2) time (O(log n) nodes need to be updated to maintain the height requirement, and for each of them the O(log n) nodes on the path to the root need to be updated). A more careful analysis shows that all hashes can actually be updated in O(log n) time, because the nodes to be changed are all related: they lie either on a single path from a leaf to the root or in a similar arrangement.
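The simple update case, where a block changes but the tree structure does not, can be sketched as follows. This assumes SHA-256, a number of blocks that is a power of two, and a flat-list layout in which the children of node i are nodes 2i+1 and 2i+2; build_tree and update_block are hypothetical names. Insertion and deletion are not covered here.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(blocks: list) -> list:
    """Flat-list Merkle tree: index 0 is the root, the last n entries are leaves."""
    n = len(blocks)
    tree = [b""] * (2 * n - 1)
    for k, block in enumerate(blocks):
        tree[n - 1 + k] = h(block)
    for i in range(n - 2, -1, -1):
        tree[i] = h(tree[2 * i + 1] + tree[2 * i + 2])
    return tree

def update_block(tree: list, k: int, new_block: bytes) -> int:
    """Replace block k and re-hash only the path up to the root.

    Returns the number of hash operations performed.
    """
    n = (len(tree) + 1) // 2
    i = n - 1 + k
    tree[i] = h(new_block)
    rehashed = 1
    while i > 0:
        i = (i - 1) // 2                              # move to the parent
        tree[i] = h(tree[2 * i + 1] + tree[2 * i + 2])
        rehashed += 1
    return rehashed

blocks = [b"block %d" % i for i in range(8)]
tree = build_tree(blocks)

blocks[5] = b"new contents"
print(update_block(tree, 5, blocks[5]))  # 4
print(tree == build_tree(blocks))        # True
```

For 8 blocks the update costs 4 hashes (the leaf plus 3 ancestors) instead of the 15 needed to rebuild the whole tree.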
The answerer in [6] also says that in most applications the structure of the Merkle tree (whether it is balanced, whether its height is bounded) does not matter, and neither does the order of the data blocks. You can therefore design insert and delete operations to suit the specific application. A generic Merkle tree insert or delete operation is not meaningful.
Applications of the Merkle tree
1. Digital signature
The Merkle tree was originally designed to handle Lamport one-time signatures efficiently. Each Lamport key can sign only a single message, but combined with a Merkle tree, many messages can be signed. This approach became an efficient digital signature framework, the Merkle Signature Scheme.
2. Peer-to-peer networks
In peer-to-peer networks, Merkle trees are used to make sure that data blocks received from other nodes are not corrupted or replaced, and even to check that other nodes are not cheating or publishing fake blocks. The familiar BT download uses peer-to-peer technology to transfer data between clients, which both speeds up downloads and reduces the load on the download server. BT stands for BitTorrent, a centrally indexed peer-to-peer file-sharing protocol [7].
To start a download, you must first obtain an index file with the extension .torrent (commonly called a seed) from the central index server. The torrent file contains information about the files to be shared, including the file name, size, hash information for the files, and a URL pointing to the tracker [8]. The hash information in the torrent file is a cryptographic digest of each block of the content to be downloaded, and these digests are used for verification during the download. A large torrent file becomes a bottleneck for web servers and cannot be embedded directly in RSS or spread with a gossip protocol. A related problem is the use of large data blocks: to keep the torrent file small, the number of block hashes has to be small, which means each block is relatively large. Large blocks hurt the efficiency of trading between nodes, because a block can only be traded with other nodes after it has been completely downloaded and verified.
The solution to both problems is to use a simple Merkle tree instead of a hash list. Build a binary tree with enough levels: each leaf node is the hash of a data block, and leaves that are not needed are filled with zeros. Each upper node is the hash of the concatenation of its children. The hash algorithm is SHA1, the same as in ordinary torrents. The data transfer process is similar to the one described in the first section.
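A minimal sketch of the zero-padded tree described above is shown below. It follows only the description in the text (SHA1 leaves, unused leaves filled with zeros, parents hashing the concatenation of their children); the exact details of the real BitTorrent extension may differ, and the name padded_merkle_root is made up for this example.

```python
import hashlib

ZERO_LEAF = bytes(20)  # 20 zero bytes, the same length as a SHA-1 digest

def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def padded_merkle_root(blocks: list) -> bytes:
    """Root hash of a full binary tree whose unused leaves are all zeros."""
    leaves = [sha1(block) for block in blocks]
    width = 1
    while width < len(leaves):       # smallest power of two that fits all leaves
        width *= 2
    leaves += [ZERO_LEAF] * (width - len(leaves))
    level = leaves
    while len(level) > 1:
        level = [sha1(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

blocks = [b"block a", b"block b", b"block c"]   # 3 blocks, padded to 4 leaves
root = padded_merkle_root(blocks)
print(len(root))  # 20
```

Because the tree is always full, every leaf has a sibling at every level, and only the root hash has to be stored in the torrent file.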
3. Trusted Computing
Trusted computing is a technology put forward by the Trusted Computing Group to provide endpoint trustworthiness for the computing platforms of nodes taking part in a distributed computing environment. Trusted computing technology introduces a Trusted Platform Module (TPM) at the hardware layer of the computing platform, which in effect gives the platform a hardware-based root of trust (RoT). Starting from the root of trust and using a chain-of-trust mechanism, trusted computing technology can measure the trustworthiness of the hardware and software layers of the local platform and store the measurement results reliably in the TPM's Platform Configuration Registers (PCRs). A remote computing platform can then verify the trustworthiness of the local platform through the remote attestation mechanism, by checking the measurement results held in the local PCRs. [10] Trusted computing lets the participating nodes of a distributed application drop their dependence on a central server and build trust directly from the TPM chip on the user's machine, which makes it possible to create secure distributed applications that are more scalable, more reliable, and more available. The core mechanism of trusted computing is remote attestation: the participating nodes of a distributed application establish mutual trust through it to ensure the security of the application.
[10] proposes a remote attestation mechanism based on the Merkle tree, whose core is an integrity measurement hash tree.
First, RAMT no longer maintains an integrity measurement list (ML) in the kernel, but an integrity measurement hash tree (IMHT). The data objects stored in the leaf nodes of the IMHT are the integrity hashes of the programs measured on the platform being verified, and its internal nodes are generated dynamically from the hashes of their concatenated child values, following the construction rule of the Merkle hash tree.
Second, to protect the integrity of the IMHT leaf nodes, RAMT uses a piece of storage in the TPM to hold the trusted root hash of the IMHT.
Third, the RAMT integrity verification process is based on the authentication path, which is the path from a leaf node to the root hash in the IMHT.
4. IPFS
IPFS (InterPlanetary File System) combines many excellent Internet technologies, such as DHT (distributed hash table), the Git version control system, and BitTorrent. It creates a peer-to-peer swarm that allows the exchange of IPFS objects. Together, all IPFS objects form a cryptographically authenticated data structure called the Merkle DAG.
An IPFS object is a data structure with two fields:
Data: unstructured binary data of size less than 256 kB
Links: an array of Link structures, through which the IPFS object links to other objects
A Link structure has three fields:
Name: the name of the link
Hash: the hash of the linked object
Size: the cumulative size of the linked object, including the objects it links to
Through Name and Links, a collection of IPFS objects forms a Merkle DAG (directed acyclic graph).
A small file (<256 kB) is an IPFS object without links.
A large file is represented by an object whose links point to the file's blocks (<256 kB each) and which holds only minimal data marking it as a large file. The names of these links are empty strings.
Directory structure: a directory is an IPFS object with no data, whose links point to the files and directories it contains.
IPFS can also represent the data structures Git uses, such as the Git commit object. The main feature of a commit object is that it has one or more links named 'parent0', 'parent1', and so on (pointing to previous versions), and a link named 'object' (called a tree in Git) that points to the file system structure referenced by the commit.
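The object and link structures described above can be modeled roughly as follows. This is a toy model only: the JSON serialization, the use of plain SHA-256 hex digests, and the way the cumulative size is computed are all assumptions made for illustration and are not how IPFS actually encodes or addresses objects. The class and function names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

@dataclass
class Link:
    name: str   # name of the link
    hash: str   # hash of the linked object
    size: int   # cumulative size of the linked object

@dataclass
class IPFSObject:
    data: bytes = b""
    links: list = field(default_factory=list)

    def object_hash(self) -> str:
        """Hash over the data and all links (toy serialization, an assumption)."""
        payload = json.dumps({
            "data": self.data.hex(),
            "links": [[l.name, l.hash, l.size] for l in self.links],
        }, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def cumulative_size(self) -> int:
        """Own data plus everything reachable through the links (simplified)."""
        return len(self.data) + sum(l.size for l in self.links)

def link_to(name: str, target: IPFSObject) -> Link:
    return Link(name, target.object_hash(), target.cumulative_size())

# A small file is an object without links; a directory has links but no data.
small_file = IPFSObject(data=b"hello world")
directory = IPFSObject(links=[link_to("hello.txt", small_file)])
before = directory.object_hash()

# Changing the file changes its hash, and therefore the directory's hash too.
changed_file = IPFSObject(data=b"hello world!")
directory2 = IPFSObject(links=[link_to("hello.txt", changed_file)])
print(before != directory2.object_hash())  # True
print(directory.cumulative_size())         # 11
```

Because every link carries the hash of its target, the hash of the top-level object authenticates everything below it, which is the Merkle DAG property.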
5. Bitcoin and Ethereum [12][13]
The earliest application of Merkle proofs was Bitcoin, described and created by Satoshi Nakamoto in 2009. The Bitcoin blockchain uses Merkle proofs to store the transactions in each block.
The benefit is the concept Nakamoto described as "simplified payment verification" (SPV): instead of downloading every transaction and every block, a "light client" can download only the block headers of the chain, an 80-byte piece of data per block containing only five elements:
the hash of the previous block header
a timestamp
the mining difficulty value
the proof-of-work nonce
the root hash of the Merkle tree containing the block's transactions
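As a rough illustration of how little data a light client handles per block, the sketch below unpacks an 80-byte header with Python's struct module. The field layout assumed here is the commonly documented one, which also includes a 4-byte version field in addition to the five elements listed above; the header bytes are synthetic values invented for the example, not a real block, and parse_header is a hypothetical name.

```python
import struct

# Assumed little-endian layout: version (4 bytes), previous header hash (32),
# Merkle root (32), timestamp (4), difficulty bits (4), nonce (4) = 80 bytes.
HEADER = struct.Struct("<i32s32sIII")

def parse_header(raw: bytes) -> dict:
    """Split an 80-byte block header into its fields."""
    if len(raw) != HEADER.size:
        raise ValueError("a block header must be exactly 80 bytes")
    version, prev_hash, merkle_root, timestamp, bits, nonce = HEADER.unpack(raw)
    return {
        "version": version,
        "prev_hash": prev_hash,
        "merkle_root": merkle_root,
        "timestamp": timestamp,
        "bits": bits,
        "nonce": nonce,
    }

# Synthetic header built from made-up values, for illustration only.
raw = HEADER.pack(1, bytes(32), bytes.fromhex("ab" * 32),
                  1500000000, 0x1D00FFFF, 42)
fields = parse_header(raw)

print(HEADER.size)                      # 80
print(fields["merkle_root"].hex()[:8])  # abababab
print(fields["nonce"])                  # 42
```

The Merkle root in the header is the value a light client compares against when it checks a Merkle proof for a transaction.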
If a client wants to confirm the status of a transaction, it simply asks for a Merkle proof showing that the particular transaction is in one of the Merkle trees whose root is in a block header of the main chain.
But Bitcoin's light clients have limitations. One is that although they can prove the inclusion of a transaction, they cannot prove anything about the current state (for example, digital asset holdings, name registrations, or the status of financial contracts).
How many bitcoins do you hold right now? A Bitcoin light client can use a protocol that queries multiple nodes and trusts that at least one of them will report any transaction spending from your addresses, which gets you quite far for that use case. But for other, more complex applications this is far from enough. The precise effect of a transaction can depend on several previous transactions, which themselves depend on earlier transactions, so ultimately you would have to verify every transaction in the entire chain. To get around this, Ethereum takes the Merkle tree concept one step further.
Ethereum's Merkle Proof
Every Ethereum block header contains not one Merkle tree but three trees, for three kinds of objects:
transactions
receipts (essentially pieces of data showing the effect of each transaction)
state
This makes possible a very advanced light client protocol that lets light clients easily make and verify answers to queries of the following kinds:
Has this transaction been included in a particular block?
Tell me all instances of an event of type X (for example, a crowdfunding contract reaching its goal) emitted by this address in the past 30 days.
What is the current balance of my account?
Does this account exist?
Pretend to run this transaction on this contract. What would the output be?