Merkle Tree Algorithm Explained (Blockchain)

Last Update:2018-08-20 Source: Internet

Author: User

Tags version control system tpm chip


Merkle Tree Concept

A Merkle tree, also known as a hash tree, is, as the name suggests, a tree that stores hash values. Each leaf of a Merkle tree is the hash of a block of data (for example, a file or a collection of files). Each non-leaf node is the hash of the concatenation of its child nodes' hashes. [1]

1. Hash

A hash function maps data of arbitrary length to data of fixed length [2]. For example, the simplest way to check data integrity is to hash the entire data set and publish the resulting fixed-length hash value online. After downloading the data, a user hashes it again and compares the result with the published hash value. If the two values are equal, the downloaded data is not corrupted. This works because a slight change in the input changes the hash result beyond recognition, and because it is difficult to work back from a hash value to the original input. [3]

If the data comes from a stable server, a single hash is acceptable. But if the data source is unstable, any corruption means the whole data set must be downloaded again, which is very inefficient.

2. Hash List

When data is transmitted over a peer-to-peer network, it is downloaded from many machines at the same time, and many of those machines must be considered unstable or untrustworthy. To verify integrity, a better approach is to divide a large file into small blocks (for example, blocks of 2 KB). The advantage is that if one block is corrupted in transit, only that block needs to be downloaded again, not the entire file.

How do we determine whether a small block is damaged? We simply compute a hash for each block. In a BT (BitTorrent) download, a hash list is downloaded before the real data. This raises a question: how do we know that the hash list itself is correct? The answer is to concatenate the hashes of all the blocks and hash the resulting long string, which gives the root hash of the hash list (also called the top hash). When downloading, first obtain the correct root hash from a trusted source, use it to verify the hash list, and then use the verified hash list to check each data block.
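The hash-list check described above can be sketched in a few lines of Python. This is a minimal illustration, not the BitTorrent format: the function names, the choice of SHA-256, and the 2 KB block size are assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 2048  # assumed block size: 2 KB, as in the example above


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def hash_list(blocks: list[bytes]) -> list[bytes]:
    """One SHA-256 digest per block."""
    return [hashlib.sha256(block).digest() for block in blocks]


def root_hash(hashes: list[bytes]) -> bytes:
    """Hash of the concatenation of all block hashes."""
    return hashlib.sha256(b"".join(hashes)).digest()


def find_bad_blocks(blocks, hashes, trusted_root):
    """Check the hash list against the trusted root, then check each block.

    Returns the indexes of the blocks that must be downloaded again.
    """
    if root_hash(hashes) != trusted_root:
        raise ValueError("hash list does not match the trusted root hash")
    return [i for i, block in enumerate(blocks)
            if hashlib.sha256(block).digest() != hashes[i]]
```

Only the blocks whose indexes are returned by `find_bad_blocks` need to be fetched again; the rest of the file is kept.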

3. Merkle Tree

A Merkle tree can be seen as a generalization of the hash list (a hash list is a special case: a multi-way Merkle tree of height 2).

At the bottom level, just as with a hash list, we divide the data into small blocks and compute a hash for each block. Going up, however, we do not compute the root hash directly. Instead, each pair of adjacent hashes is concatenated into one string and that string is hashed, so every two hashes combine to produce a "child hash". If a level contains an odd number of hashes, the last hash is left unpaired; it is hashed on its own to produce its child hash. Repeating this upward yields levels with fewer and fewer hashes and eventually forms an inverted tree. At the top only one hash is left, which we call the Merkle root [3].
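As a rough sketch of the pairing rule just described, the following Python computes a Merkle root. SHA-256 and the function names are assumptions for illustration; an unpaired last hash is hashed on its own, as the text states (other systems handle the odd case differently).

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def next_level(hashes: list[bytes]) -> list[bytes]:
    """Combine adjacent pairs; an unpaired last hash is hashed alone."""
    parents = []
    for i in range(0, len(hashes), 2):
        pair = hashes[i:i + 2]              # one or two hashes
        parents.append(sha256(b"".join(pair)))
    return parents


def merkle_root(blocks: list[bytes]) -> bytes:
    """Hash every block, then pair upward until one hash remains."""
    if not blocks:
        raise ValueError("at least one data block is required")
    level = [sha256(block) for block in blocks]
    while len(level) > 1:
        level = next_level(level)
    return level[0]
```

Changing any single block changes its leaf hash and therefore every hash on the way up to the root.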

Before downloading a file over a peer-to-peer network, the Merkle root of the file is first obtained from a trusted source. Once the root is available, the Merkle tree itself can be fetched from other, untrusted sources and checked against the trusted root. If the received Merkle tree is damaged or forged, another copy is fetched from a different source until one matches the trusted root.

The main difference between a Merkle tree and a hash list is that a single branch of a Merkle tree can be downloaded and validated immediately. Because the file is cut into small blocks, a corrupted block can simply be downloaded again. For a very large file, both the Merkle tree and the hash list become large, but with a Merkle tree you can download one branch at a time and verify it immediately; if the branch validates, you can go on to download the data. A hash list, by contrast, can only be verified after the entire list has been downloaded.
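To illustrate how one branch can be validated on its own, here is a minimal Python sketch. The helper names (`build_levels`, `branch`, `verify_block`) and the use of SHA-256 are assumptions for this example; the odd-node rule follows the description above, where an unpaired hash is hashed alone.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaf_hashes):
    """Return all levels, leaves first, root last (an odd hash is hashed alone)."""
    levels = [list(leaf_hashes)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([h(b"".join(cur[i:i + 2])) for i in range(0, len(cur), 2)])
    return levels


def branch(levels, index):
    """Sibling hashes needed to recompute the root from leaf `index`."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1                      # neighbour in the same pair
        if sibling < len(level):
            side = "left" if sibling < index else "right"
            path.append((side, level[sibling]))
        else:
            path.append((None, b""))             # unpaired node, hashed alone
        index //= 2
    return path


def verify_block(block, path, trusted_root):
    """Recompute the root from one block and its branch."""
    node = h(block)
    for side, sibling in path:
        if side == "left":
            node = h(sibling + node)
        elif side == "right":
            node = h(node + sibling)
        else:
            node = h(node)
    return node == trusted_root
```

A downloader that holds only the trusted root can accept or reject a single block using `verify_block`, without having the rest of the tree.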

Characteristics of the Merkle tree

A Merkle tree (MT) is a tree. Most are binary trees, but multi-way trees are also possible; whatever the branching factor, it has all the properties of a tree structure.

The value of a leaf node is a unit of data from the data set, or the hash of that unit.

The value of a non-leaf node is computed with the hash algorithm from the values of all the leaf nodes below it. [4] [5]

Typically, cryptographic hash functions such as SHA-2 or MD5 are used to compute the hashes. However, if the tree only needs to protect against accidental corruption rather than deliberate tampering, a less secure but more efficient checksum algorithm such as CRC can be used instead.

Second-preimage attack: the Merkle root does not indicate the depth of the tree, which allows a second-preimage attack in which an attacker creates a different document with the same Merkle root. A simple fix is defined in Certificate Transparency: when computing the hash of a leaf node, a 0x00 byte is prepended to the hashed data; when computing the hash of an internal node, a 0x01 byte is prepended. Other implementations limit the hash tree root by adding a depth prefix to the hash values: the prefix decreases at each step, and an extracted hash chain is valid only if the prefix is still positive when the leaf is reached.

Operations on the Merkle tree

1. Creating a Merkle Tree

There are 9 data blocks at the bottom.

Step 1 (red line): hash each data block: node0i = hash(data0i), i = 1, 2, ..., 9

Step 2 (orange line): concatenate each pair of adjacent hashes and hash the result: node1((i+1)/2) = hash(node0i + node0(i+1)), i = 1, 3, 5, 7; for i = 9, node1((i+1)/2) = hash(node0i)

Step 3 (yellow line): repeat Step 2

Step 4 (green line): repeat Step 2

Step 5 (blue line): repeat Step 2 to produce the Merkle tree root

It is easy to see that creating a Merkle tree has O(n) complexity (meaning O(n) hash operations), where n is the number of data blocks. The height of the resulting Merkle tree is log2(n) + 1, rounded up when n is not a power of two.
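The counts for the 9-block example can be checked with a short calculation. The sketch below does not hash anything; it only tracks how many nodes each level has under the pairing rule of Step 2. The function name is an assumption for illustration.

```python
import math


def level_sizes(n: int) -> list[int]:
    """Number of nodes on each level, leaves first, for n data blocks."""
    sizes = [n]
    while sizes[-1] > 1:
        sizes.append((sizes[-1] + 1) // 2)   # pairs, plus one if a node is left over
    return sizes


sizes = level_sizes(9)
hash_ops = sum(sizes)        # one hash operation per node
height = len(sizes)

print(sizes)                 # [9, 5, 3, 2, 1]
print(hash_ops, height)      # 20 5
```

Each level is about half the size of the one below, so the total number of hash operations stays roughly within 2n, which is where the O(n) bound comes from; here the height of 5 equals ceil(log2(9)) + 1.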

2. Retrieving data blocks

To make this concrete, suppose there are two machines, A and B, and that A needs to hold the same 8 files in the same directory as B: f1, f2, f3, ..., f8. A Merkle tree lets us compare the two quickly. Assume each machine builds a Merkle tree when the files are created, as in the following figure:

In the figure, the nodes are numbered level by level starting from the root Node0, so that the children of Node k are Node 2k+1 and Node 2k+2, and the leaves Node7 to Node14 correspond to f1 to f8. The value of leaf node Node7 is V7 = hash(f1), the hash of file f1, and the value of its parent Node3 is V3 = hash(V7, V8), the hash of the values of its children Node7 and Node8. Each level is computed from the level below in this way, so the value of the root node is in effect a unique fingerprint of the values of all the leaf nodes.

Suppose file f5 on A differs from the copy on B. How do we find the differing file using the Merkle trees of the two machines? The comparison proceeds as follows:

Step 1. Compare V0 on the two machines. It differs, so examine its children Node1 and Node2.

Step 2. V1 is the same, V2 differs. Examine Node2's children Node5 and Node6.

Step 3. V5 differs, V6 is the same. Examine Node5's children Node11 and Node12.

Step 4. V11 differs, V12 is the same. Node11 is a leaf node, so retrieve its file information.

Step 5. The retrieval is complete.

The theoretical complexity of this process is O(log N). The process is illustrated in the following figure:

As the figure shows, the whole process finds the differing file quickly.
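The comparison walk above can be sketched in Python. This is an illustration under stated assumptions: the tree is stored in an array where the children of node k are nodes 2k+1 and 2k+2 (which matches the node numbers used in the steps, such as Node2 to Node5 and Node6, and Node5 to Node11 and Node12), the number of files is a power of two, and SHA-256 and the file contents are made up for the example.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build(files):
    """Array-backed tree: the children of node k are 2k+1 and 2k+2.

    Assumes len(files) is a power of two (8 in the example above).
    """
    n = len(files)
    values = [b""] * (2 * n - 1)
    for i, content in enumerate(files):
        values[n - 1 + i] = h(content)               # leaves: Node7..Node14
    for k in range(n - 2, -1, -1):
        values[k] = h(values[2 * k + 1] + values[2 * k + 2])
    return values


def diff(a, b, k=0):
    """Return the leaf node numbers whose values differ between trees a and b."""
    if a[k] == b[k]:
        return []                                    # identical subtree, skip it
    if 2 * k + 1 >= len(a):
        return [k]                                   # a differing leaf
    return diff(a, b, 2 * k + 1) + diff(a, b, 2 * k + 2)


files_b = [f"contents of f{i}".encode() for i in range(1, 9)]
files_a = list(files_b)
files_a[4] = b"a different f5"                       # f5 differs on machine A

print(diff(build(files_a), build(files_b)))          # [11]
```

Subtrees whose values match are never descended into, which is why only one path from the root to Node11 is examined.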

3. Update, insert and delete

Although there is a great deal of material about Merkle trees on the Internet, most of it covers retrieval and traversal and says little about updating, inserting into, or deleting from a Merkle tree. I found this puzzling, since the operations on a tree structure surely include updates, inserts and deletes as well as search. Only after finding a question on StackExchange did it become a little clearer; see [6] for the original.

Updating a data block in a Merkle tree is simple: update the data block, then update the hash values on the path from that leaf to the root. This does not change the structure of the tree. Insertions and deletions, however, do change the structure. As the following illustration shows, one possible insert operation is:

After inserting block 0 (taking the position of the data block into account), the structure of the Merkle tree becomes:

The asker in [6] was looking for an insert algorithm that satisfies the following conditions:
- The number of re-hashing operations is bounded by log(n)
- Verifying a data block takes no more than log(n) + 1 operations
- Unless the n of the original tree is even, the tree after insertion has no orphans; if there is an orphan, it is the last data block
- The order of the data blocks is preserved
- The Merkle tree remains balanced after insertion

Under these conditions, the result of the insertion above would look like this:

According to the answerer in [6], insertion into and deletion from a Merkle tree is really an engineering problem, and different applications can handle it in different ways. If you want the tree to stay balanced, or its height to stay at log(n), you can use any of the standard balanced binary tree schemes, such as AVL trees, red-black trees, splay trees, or 2-3 trees. Their update procedures complete an insert in O(lg n) time and keep the tree height at O(lg n). It is then easy to see that updating all the Merkle hashes can be done in O((lg n)^2) time: O(lg n) nodes are touched to restore the height requirement, and for each of them the O(lg n) nodes on its path to the root must be updated. A more careful analysis shows that all the hashes can actually be updated in O(lg n) time, because the nodes to be changed are all related: they lie on a single path from a leaf to the root, or close to it.
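The simple case, updating one data block and re-hashing only the path from its leaf to the root, can be sketched as follows. The class is a hypothetical illustration, not taken from [6]: it assumes a fixed, power-of-two number of blocks stored in an array and uses SHA-256. It does not attempt insertion or deletion.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class MerkleTree:
    """Array-backed binary tree; assumes the block count is a power of two."""

    def __init__(self, blocks):
        self.n = len(blocks)
        self.nodes = [b""] * (2 * self.n - 1)
        for i, block in enumerate(blocks):
            self.nodes[self.n - 1 + i] = h(block)        # leaves
        for k in range(self.n - 2, -1, -1):
            self.nodes[k] = self._parent_hash(k)         # internal nodes

    def _parent_hash(self, k):
        return h(self.nodes[2 * k + 1] + self.nodes[2 * k + 2])

    @property
    def root(self):
        return self.nodes[0]

    def update(self, index, new_block):
        """Replace one block; return how many hashes were recomputed."""
        k = self.n - 1 + index
        self.nodes[k] = h(new_block)
        rehashed = 1
        while k > 0:
            k = (k - 1) // 2                             # move to the parent
            self.nodes[k] = self._parent_hash(k)
            rehashed += 1
        return rehashed
```

With 8 blocks, `update` recomputes 4 hashes (the leaf plus its three ancestors), which is log2(8) + 1, and the shape of the tree is unchanged.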

The answerer in [6] also points out that in most applications the structure of the Merkle tree (whether it is balanced, how tall it is) does not matter, and neither does preserving the order of the data blocks. You can therefore design insert and delete operations to suit the specific application; a generic Merkle tree insert or delete operation is of little use.

Applications of the Merkle tree

1. Digital signature

The Merkle tree was originally designed to handle Lamport one-time signatures efficiently. Each Lamport key can be used to sign only one message, but combining many keys under a Merkle tree allows multiple messages to be signed. This approach became an efficient digital signature framework, the Merkle Signature Scheme.

2. Peer-to-peer Network

In peer-to-peer networks, Merkle trees are used to ensure that data blocks received from other nodes have not been corrupted or replaced, and even to check that other nodes are not cheating by publishing false blocks. The familiar BT download uses peer-to-peer technology to transfer data between clients, which both speeds up downloads and reduces the load on the download server. BT, short for BitTorrent, is a peer-to-peer file sharing communication protocol with a central index [7].

To start a download you must first obtain, from the central index server, an index file with the .torrent extension (what is commonly called a seed). The torrent file contains information about the files to be shared, including file names, sizes, hash information for the files, and a URL that points to the tracker [8]. The hash information in the torrent file is a cryptographic digest of each block of the content to be downloaded, and it is used for verification at download time. Large torrent files are a bottleneck for web servers and cannot be included directly in RSS or gossiped around (spread by a gossip protocol). A related problem is the use of large data blocks: to keep the torrent file small, the number of hashes must be small, which means each data block is relatively large. Large blocks hurt the efficiency of exchanges between nodes, because a block can be traded with other nodes only after it has been fully downloaded and verified.

Both problems are solved by replacing the hash list with a simple Merkle tree. Design a binary tree with enough levels; the leaf nodes are the hashes of the data blocks, and missing leaves are filled with 0. Each upper node is the hash of the concatenation of its child nodes. The hash algorithm is SHA1, as in ordinary torrents. The data transfer process is similar to the one described in the first section.
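A rough sketch of the padded tree just described is shown below. It is an illustration only: the function name is made up, and the filler value (20 zero bytes, the length of a SHA1 digest) is an assumption about what "filled with 0" means here, not a statement of the exact torrent format.

```python
import hashlib

ZERO_HASH = bytes(20)   # assumed filler: 20 zero bytes, the size of a SHA1 digest


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def padded_root(piece_hashes):
    """Root of a full binary tree whose missing leaves are zero-filled."""
    if not piece_hashes:
        raise ValueError("at least one piece hash is required")
    width = 1
    while width < len(piece_hashes):
        width *= 2                                   # next power of two
    level = list(piece_hashes) + [ZERO_HASH] * (width - len(piece_hashes))
    while len(level) > 1:
        level = [sha1(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Because the tree is always full, every node has exactly two children, and only the 20-byte root needs to be published in place of the whole hash list.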

3. Trusted Computing

Trusted computing was proposed by the Trusted Computing Group to provide endpoint trustworthiness for the computing platforms of nodes participating in a distributed computing environment. Trusted computing technology introduces a Trusted Platform Module (TPM) into the hardware layer of the computing platform, which in effect gives the platform a hardware-based root of trust (RoT). Starting from the root of trust and using a chain-of-trust mechanism, trusted computing can measure the integrity of the local platform's hardware and software layer by layer, and the measurement results are stored securely in the TPM's Platform Configuration Registers (PCRs). A remote computing platform can then obtain the measurements held in the local PCRs through the remote attestation mechanism, and so verify the trustworthiness of the local platform. Trusted computing lets the participating nodes of a distributed application dispense with a central server and establish trust directly through the TPM chip on each user's machine, making it possible to build secure distributed applications with better scalability, higher reliability and higher availability [10]. The core mechanism of trusted computing is remote attestation: the participating nodes of a distributed application establish mutual trust through remote attestation to guarantee the security of the application.

Reference [10] proposes a remote attestation mechanism based on the Merkle tree, whose core is the integrity measurement hash tree.

First, what RAMT keeps in the kernel is no longer an integrity measurement list (ML) but an integrity measurement hash tree (IMHT). The leaf nodes of the IMHT store the integrity hashes of the programs measured on the platform being verified, while its internal nodes are generated dynamically from the hash values of their children according to the construction rules of the Merkle hash tree.

Second, to protect the integrity of the IMHT leaf nodes, RAMT uses a storage area in the TPM to hold the trusted value of the IMHT root hash.

Third, RAMT's integrity verification is based on the authentication path. The authentication path is the path in the IMHT from the leaf node to be verified up to the root hash.

4. IPFS

IPFS (InterPlanetary File System) combines many powerful Internet technologies, such as the DHT (distributed hash table), the Git version control system, and BitTorrent. It creates a peer-to-peer swarm that allows the exchange of IPFS objects. Together, the IPFS objects form a cryptographically authenticated data structure called the Merkle DAG.

An IPFS object is a data structure with two fields:
Data: binary data of less than 256 kB
Links: an array of Link structures, through which the IPFS object links to other objects

A Link structure has three fields:
Name: the name of the link
Hash: the hash of the linked IPFS object
Size: the cumulative size of the linked IPFS object, including the objects it links to

Through their names and links, the collection of IPFS objects forms a Merkle DAG (directed acyclic graph).

A small file (<256 kB) is represented by a single IPFS object with no links.

A large file is represented as a collection of file blocks (each <256 kB), plus one object with minimal data that represents the whole file and links to the blocks. The names of that object's links are empty strings.

Directory structure: a directory is an IPFS object with no data, whose links point to the files and directories it contains.

IPFS can also represent the data structures that Git uses, such as the Git commit object. The main feature of a commit object is that it has one or more links named 'parent0', 'parent1' and so on (which point to previous versions), and a link named 'object' (called tree in Git) that points to the file system structure referenced by the commit.
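The object and link structures described above can be modelled with two small Python classes. This is a sketch for intuition only: the class and method names are made up, the byte encoding is an invented stand-in rather than the real IPFS serialization, SHA-256 is an assumed hash, and the size calculation ignores encoding overhead.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Link:
    name: str
    hash: bytes      # hash of the linked object
    size: int        # cumulative size of the linked object


@dataclass
class Node:
    data: bytes = b""
    links: list = field(default_factory=list)

    def encode(self) -> bytes:
        """Illustrative encoding only, not the real IPFS wire format."""
        parts = [len(self.data).to_bytes(4, "big"), self.data]
        for link in self.links:
            name = link.name.encode()
            parts += [len(name).to_bytes(2, "big"), name, link.hash,
                      link.size.to_bytes(8, "big")]
        return b"".join(parts)

    def digest(self) -> bytes:
        return hashlib.sha256(self.encode()).digest()

    def total_size(self) -> int:
        return len(self.data) + sum(link.size for link in self.links)


def link_to(name: str, target: Node) -> Link:
    return Link(name, target.digest(), target.total_size())


# A large file split into two blocks, placed inside a directory.
chunk1 = Node(data=b"first chunk")
chunk2 = Node(data=b"second chunk")
big_file = Node(links=[link_to("", chunk1), link_to("", chunk2)])
directory = Node(links=[link_to("big.bin", big_file)])
```

Because each link stores the hash of its target, changing any block changes the hash of the file object and, in turn, the hash of every directory that contains it.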

5. Bitcoin and Ethereum [12][13]

The earliest application of Merkle proofs was Bitcoin, which was described and created by Satoshi Nakamoto in 2009. Bitcoin's blockchain uses Merkle proofs to store the transactions of each block.

The benefit of this is the concept that Satoshi described as "simplified payment verification" (SPV): a "light client" can download only the block headers of the chain, 80-byte chunks of data for each block that contain only five things, instead of downloading every transaction and every block:
the hash of the previous block header
a timestamp
a mining difficulty value
a proof-of-work nonce
the root hash of the Merkle tree containing the block's transactions

If the light client wants to determine the status of a transaction, it can simply ask for a Merkle proof showing that the particular transaction is in one of the Merkle trees whose root is in a block header of the main chain.
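As a small illustration of what a light client stores, the sketch below unpacks an 80-byte header with Python's `struct` module. Note that the actual Bitcoin header also begins with a 4-byte version field in addition to the five items listed above, and the difficulty is stored in a compact 4-byte form usually called "bits". The function and key names are assumptions for this example.

```python
import hashlib
import struct

# Little-endian: version, previous header hash, Merkle root, time, bits, nonce.
HEADER_FORMAT = "<I32s32sIII"   # 4 + 32 + 32 + 4 + 4 + 4 = 80 bytes


def parse_header(raw: bytes) -> dict:
    """Split an 80-byte block header into its fields."""
    if len(raw) != 80:
        raise ValueError("a block header is exactly 80 bytes")
    version, prev_hash, merkle_root, timestamp, bits, nonce = struct.unpack(
        HEADER_FORMAT, raw)
    return {
        "version": version,
        "prev_hash": prev_hash,
        "merkle_root": merkle_root,
        "timestamp": timestamp,
        "bits": bits,
        "nonce": nonce,
    }


def header_hash(raw: bytes) -> bytes:
    """Double SHA-256 of the header, which is how Bitcoin block hashes are computed."""
    return hashlib.sha256(hashlib.sha256(raw).digest()).digest()
```

A light client keeps only these headers; to check a transaction it compares the root recomputed from a Merkle branch with the `merkle_root` field of the relevant header.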

But Bitcoin's light clients have limitations. One is that, although they can prove the inclusion of a transaction, they cannot prove anything about the current state (for example, digital asset holdings, name registrations, or the status of financial contracts).

How many bitcoins do you have right now? A Bitcoin light client can use a protocol that involves querying multiple nodes and trusting that at least one of them will notify it of any transaction spending from its addresses, and this is enough for that use case. For other, more complex applications it is far from enough. The precise effect of a transaction can depend on the effects of several previous transactions, which themselves depend on earlier transactions, so ultimately you would have to verify every single transaction in the entire chain. To get around this, Ethereum takes the Merkle tree concept one step further.

Merkle Proofs in Ethereum

Every block header in Ethereum contains not just one Merkle tree but three trees, for three kinds of objects:
Transactions
Receipts (essentially, pieces of data showing the effect of each transaction)
State

This makes a very advanced light client protocol possible, one that lets light clients easily make and verify answers to the following kinds of queries:
Has this transaction been included in a particular block?
Tell me all instances of an event of type X (e.g. a crowdfunding contract reaching its goal) emitted by this address in the past 30 days.
What is the current balance of my account?
Does this account exist?
If this transaction were run on this contract, what would the output be?

The first is handled by the transaction tree; the third and fourth are handled by the state tree, and the second by the receipt tree. The first four are fairly straightforward to compute: the server simply finds the object, fetches the Merkle branch, and replies to the light client with the branch.

The fifth is also handled by the state tree, but the way it is computed is more complex. Here we need to construct a Merkle state transition proof. Essentially, it is a proof of the claim "if you run transaction T on the state with root S, the result will be a state with root S', with log L and output O" ("output" exists as a concept in Ethereum because every transaction is a function call; it is not theoretically necessary).

To compute the proof, the server locally creates a fake block, sets the state to S, and pretends to be a light client while applying the transaction. That is, if applying the transaction requires the client to determine the balance of an account, the light client (simulated by the server) makes a balance query. If the light client needs to check a particular item in the storage of a particular contract, the light client makes a query for that. The server (in its simulated light-client role) answers all of its own queries correctly, but it also keeps track of all the data it sends back.

The server then combines the data from all of these requests and sends it to the client as a proof.

The client then carries out exactly the same procedure, but uses the proof provided by the server as its database. If the client's result is the same as what the server claims, the client accepts the proof.

MPT (Merkle Patricia trees)

As mentioned earlier, the simplest kind of Merkle tree is in most cases a binary tree. The Merkle trees used in Ethereum are more complex, however; they are what we call the "Merkle Patricia tree".

Binary Merkle trees are very good data structures for authenticating information that is in list format (essentially, a series of data blocks one after another). They are also good for transaction trees, because it does not matter how much time it takes to edit a tree once it has been built: the tree is created once and then never changes.

For the state tree, however, the situation is more complex. The state in Ethereum essentially consists of a key-value map, where the keys are addresses and the values are account declarations, listing the balance, nonce, code and storage of each account (where the storage is itself a tree). For example, the genesis state of the Morden testnet is as follows:

Unlike the transaction history, however, the state tree needs to be updated frequently: account balances and nonces change often, and more importantly, new accounts are frequently inserted, and keys in storage are frequently inserted and deleted. What is needed is a data structure in which the new tree root can be computed quickly after an insert, update or delete operation, without recomputing the hashes of the whole tree. There are also two very desirable secondary properties:

The depth of the tree is bounded, even when an attacker deliberately crafts transactions to make the tree as deep as possible. Otherwise, an attacker could mount a denial-of-service (DoS) attack by manipulating the depth of the tree so that each update becomes extremely slow.

The root of the tree depends only on the data, not on the order in which updates are made. Making updates in a different order, or even recomputing the tree from scratch, does not change the root.

The MPT is the data structure that comes closest to satisfying all of these properties. The simplest explanation of how it works is that the key under which a value is stored is encoded into the path that must be followed down the tree. Each node has 16 children, so the path is determined by the hexadecimal encoding of the key: for example, the key 'dog' is hex-encoded as 6 4 6 15 6 7, so you start at the root, go down the 6th branch, then the 4th, then the 6th, then the 15th, and so on until you reach the leaf.

In practice, there are some additional optimizations that make the process much more efficient when the tree is sparse, but this is the basic principle.
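The key-to-path encoding is easy to reproduce. The sketch below splits each byte of a key into two 4-bit values and uses them to walk a plain 16-way trie built from nested dictionaries. It is a toy for intuition only: it leaves out the node hashing and the sparse-tree optimizations of the real MPT, and the function names are made up.

```python
def key_to_nibbles(key: bytes) -> list[int]:
    """Split each byte into its high and low 4-bit halves."""
    nibbles = []
    for byte in key:
        nibbles.append(byte >> 4)       # high 4 bits
        nibbles.append(byte & 0x0F)     # low 4 bits
    return nibbles


def put(root: dict, key: bytes, value) -> None:
    """Store value at the end of the path spelled out by the key's nibbles."""
    node = root
    for nibble in key_to_nibbles(key):
        node = node.setdefault(nibble, {})   # one of up to 16 children
    node["value"] = value


def get(root: dict, key: bytes):
    """Follow the key's nibbles from the root; None if the path is missing."""
    node = root
    for nibble in key_to_nibbles(key):
        if nibble not in node:
            return None
        node = node[nibble]
    return node.get("value")


print(key_to_nibbles(b"dog"))   # [6, 4, 6, 15, 6, 7]
```

The path depends only on the key, so inserting the same keys in any order produces the same trie shape, which is what makes an order-independent root possible.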

6. Other applications

Merkle trees have many other applications, such as Git, Amazon Dynamo, the Apache Wave protocol, the Tahoe-LAFS backup system, the Certificate Transparency framework, and NoSQL systems such as Apache Cassandra and Riak.

Original: https://blog.ethereum.org/2015/11/15/merkling-in-ethereum/

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
