Ethereum Source Analysis (52): Ethereum Fast Sync Algorithm

Source: Internet
Author: User
Tags: benchmark, pow

This PR aggregates a lot of small modifications to the core, trie, eth and other packages to collectively implement the eth/63 fast synchronization algorithm. In short: geth --fast.

## Algorithm

The goal of the fast sync algorithm is to exchange processing power for bandwidth usage. Instead of processing the entire blockchain one link at a time and replaying all transactions that ever happened in history, fast syncing downloads the transaction receipts along with the blocks, and pulls an entire recent state database. This allows a fast-synced node to still retain its status as an archive node containing all historical data for user queries (and thus not influence the network's health in general), but at the same time to reassemble a recent network state at a fraction of the time it would take full block processing.

An outline of the fast sync algorithm (a schematic code sketch follows the list):

- Similarly to classical sync, download the block headers and bodies that make up the blockchain
- Similarly to classical sync, verify the header chain's consistency (PoW, total difficulty, etc.)
- Instead of processing the blocks, download the transaction receipts as defined by the header
- Store the downloaded blockchain, along with the receipt chain, enabling all historical queries
- When the chain reaches a recent enough state (head - 1024 blocks), pause for state sync:
  - Retrieve the entire Merkle Patricia state trie defined by the root hash of the pivot point
  - For every account found in the trie, retrieve its contract code and internal storage state trie
- Upon successful trie download, mark the pivot point (head - 1024 blocks) as the current head
- Import all remaining blocks (1024) by fully processing them as in the classical sync
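The phases above map naturally onto a small driver loop. The following is a minimal, self-contained Go sketch of that control flow; every function it calls is a hypothetical stub standing in for geth's real downloader and trie machinery, not the actual API:

```go
package main

import "fmt"

// Hypothetical stubs standing in for geth's downloader/trie machinery;
// they exist only so the control-flow sketch below compiles and runs.
func fetchAndVerifyHeader(n uint64) error    { return nil }
func downloadBodyAndReceipts(n uint64) error { return nil }
func downloadStateTrie(pivot uint64) error   { return nil }
func processBlock(n uint64) error            { return nil }

// fastSync sketches the high-level phases of eth/63 fast sync.
func fastSync(head uint64) error {
	pivot := head - 1024 // state is pulled at head - 1024 blocks

	// Phase 1: headers, bodies and receipts up to the pivot, with
	// header-chain consistency checks but no EVM execution.
	for n := uint64(1); n <= pivot; n++ {
		if err := fetchAndVerifyHeader(n); err != nil {
			return err
		}
		if err := downloadBodyAndReceipts(n); err != nil {
			return err
		}
	}
	// Phase 2: pull the entire state trie rooted at the pivot header,
	// including contract code and storage tries for every account.
	if err := downloadStateTrie(pivot); err != nil {
		return err
	}
	// Phase 3: fully process the remaining 1024 blocks as in classical sync.
	for n := pivot + 1; n <= head; n++ {
		if err := processBlock(n); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	fmt.Println(fastSync(2048))
}
```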

## Analysis

By downloading and verifying the entire header chain, we can guarantee, with all the security of the classical sync, that the hashes (receipts, state tries, etc.) contained within the headers are valid. Based on those hashes, we can confidently download transaction receipts and the entire state trie afterwards. Additionally, by placing the pivot point (where fast sync switches to block processing) a bit below the current head (1024 blocks), we can ensure that even larger chain reorganizations can be handled without the need of a new sync (as we have all the state going that many blocks back).

## Caveats

The historical block-processing based synchronization mechanism has two (approximately similarly costing) bottlenecks: transaction processing and PoW verification. The baseline fast sync algorithm successfully circumvents the transaction processing, skipping the need to iterate over every single state the system was ever in. However, verifying the proof of work associated with each header is still a notably CPU intensive operation.

However, we can notice an interesting phenomenon during header verification. With a negligible probability of error, we can still guarantee the validity of the chain by verifying only every K-th header, instead of each and every one. By selecting one header at random out of every K headers to verify, we guarantee the validity of an N-length chain with the probability of (1/K)^(N/K) (i.e. we have a 1/K chance to spot a forgery in K blocks, a verification that is repeated N/K times).

Let's define the negligible probability Pn as the probability of obtaining a 256-bit SHA3 collision (i.e. the hash Ethereum is built upon): 1/2^128. To honor the Ethereum security requirements, we need to choose the minimum chain length N (below which we verify every header) and the maximum verification batch size K such that (1/K)^(N/K) <= Pn holds. Calculating this for various {N, K} pairs is pretty straightforward; a simple and lenient solution is http://play.golang.org/p/B-8sX_6Dq0.
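The linked playground snippet is not reproduced here, but the search it performs is simple enough to sketch. The program below is an independent reconstruction (not necessarily the original code): for each candidate N it finds the largest K satisfying (1/K)^(N/K) <= 1/2^128, which reproduces the table that follows up to small boundary rounding differences.

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// ln(Pn), where Pn = 1/2^128 is the SHA3 collision probability.
	logPn := -128 * math.Log(2)

	// For each minimum chain length N, find the largest batch size K
	// such that (1/K)^(N/K) <= Pn, i.e. -(N/K)*ln(K) <= ln(Pn).
	for n := 1024; n <= 3968; n += 128 {
		k := 1
		for -(float64(n)/float64(k+1))*math.Log(float64(k+1)) <= logPn {
			k++
		}
		fmt.Printf("N = %4d  K = %3d\n", n, k)
	}
}
```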

| N    | K   | N    | K   | N    | K   | N    | K   |
| ---- | --- | ---- | --- | ---- | --- | ---- | --- |
| 1024 | 43  | 1792 | 91  | 2560 | 143 | 3328 | 198 |
| 1152 | 51  | 1920 | 99  | 2688 | 152 | 3456 | 207 |
| 1280 | 58  | 2048 | 108 | 2816 | 161 | 3584 | 217 |
| 1408 | 66  | 2176 | 116 | 2944 | 170 | 3712 | 226 |
| 1536 | 74  | 2304 | 128 | 3072 | 179 | 3840 | 236 |
| 1664 | 82  | 2432 | 134 | 3200 | 189 | 3968 | 246 |



The above table should be interpreted as follows: if we verify every K-th header, then after N headers the probability of a forgery is smaller than the probability of an attacker producing a SHA3 collision. It also means that if a forgery is indeed detected, the last N headers should be discarded as not safe enough. Any {N, K} pair may be chosen from the above table; to keep the numbers looking reasonable, we chose N=2048, K=100. This will be fine tuned later, after being able to observe network bandwidth/latency effects and possibly behavior on more CPU-limited devices.
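As a quick sanity check (not part of the original PR), plugging the chosen pair into the bound confirms it sits comfortably below Pn:

```latex
\left(\tfrac{1}{K}\right)^{N/K}
  = \left(\tfrac{1}{100}\right)^{2048/100}
  = 10^{-40.96}
  \approx 1.1 \times 10^{-41}
  \;\le\; P_n = 2^{-128} \approx 2.9 \times 10^{-39}
```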

Using this caveat, however, would mean that the pivot point can be considered secure only after N headers have been imported after the pivot itself. To prove the pivot safe faster, we stop the "gapped verifications" X headers before the pivot point, and verify every single header onward, including an additional X headers post-pivot, before accepting the pivot's state. Given the above N and K numbers, we chose X=24 as a safe number.

With this caveat calculated, fast sync should be modified so that up to the pivot point - X, only every K=100-th header is verified (at random), after which all headers up to pivot point + X are fully verified before starting the state database download. Note: if a sync fails due to header verification, the last N headers must be discarded as they cannot be trusted enough.
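The resulting verification policy can be summarized in a few lines of Go. This is a schematic sketch of the rule just described, not the actual downloader code; the function name and parameters are illustrative, and the random 1/K coin flip per header approximates "one random header out of every K":

```go
package main

import (
	"fmt"
	"math/rand"
)

const (
	K = 100 // verify roughly one random header out of every K
	X = 24  // fully verify X headers on both sides of the pivot
)

// mustVerify reports whether the header at the given height needs full
// verification during fast sync, per the N=2048 / K=100 / X=24 scheme.
func mustVerify(height, pivot uint64) bool {
	// Inside the [pivot-X, pivot+X] window, every header is verified.
	if height+X >= pivot && height <= pivot+X {
		return true
	}
	// Outside the window, verify a header with probability 1/K,
	// approximating one randomly chosen header per batch of K.
	return rand.Intn(K) == 0
}

func main() {
	pivot := uint64(357677 - 1024)
	for _, h := range []uint64{100, pivot - 30, pivot - 10, pivot + 10} {
		fmt.Printf("header %d: verify = %v\n", h, mustVerify(h, pivot))
	}
}
```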



## Weakness

Blockchain protocols in general (i.e. Bitcoin, Ethereum, and the others) are susceptible to Sybil attacks, where an attacker tries to completely isolate a node from the rest of the network, making it believe a false truth as to what the state of the real network is. This permits the attacker to spend certain funds in both the real network and this "fake bubble". However, the attacker can only maintain this state as long as it keeps feeding new valid blocks it itself is forging; and to successfully shadow the real network, it needs to do this with a chain height and difficulty close to the real network's. In short, to pull off a successful Sybil attack, the attacker needs to match the network's hash rate, so it is a very expensive attack.

Compared to the classical Sybil attack, fast sync provides such an attacker with an extra ability: that of feeding a node a view of the network that is not only different from the real network, but that might also go around the EVM mechanics. The Ethereum protocol only validates state root hashes by processing all the transactions against the previous state root. By skipping the transaction processing, we cannot prove whether the state root contained within the fast sync pivot point is valid or not, so as long as an attacker can maintain a fake blockchain that is on par with the real network, it could create an invalid view of the network's state.

To avoid opening up nodes to this extra attacker ability, fast sync (besides being solely opt-in) will only ever run during an initial sync (i.e. when the node's own blockchain is empty). After a node has managed to successfully sync with the network, fast sync is forever disabled. This way anybody can quickly catch up with the network, but after the node has caught up, the extra attack vector is plugged. This feature permits users to safely use the fast sync flag (--fast) without having to worry about potential state root attacks happening to them in the future. As an additional safety feature, if a fast sync fails close to or after the random pivot point, fast sync is disabled as a safety precaution and the node reverts to full, block-processing based synchronization.
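In practice this means the flag is only honored on a node's very first sync. A typical first-start invocation, combining the two flags mentioned in this post, might look like:

```
geth --fast --cache=1024
```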

## Performance

To benchmark the performance of the new algorithm, four separate tests were run: full syncing from scratch on Frontier and Olympic, using both the classical sync as well as the new sync mechanism. In all scenarios there were two nodes running on a single machine: a seed node featuring a fully synced database, and a leech node with only the genesis block, pulling the data. In all test scenarios the seed node had a fast-synced database (smaller, less disk contention) and both nodes were given a 1GB database cache (--cache=1024).

The machine running the tests is a Zenbook Pro, Core i7 4720HQ, 12GB RAM, 256GB m.2 SSD, Ubuntu 15.04.


| Dataset (blocks, states)               | Normal sync (time, db) | Fast sync (time, db) |
| -------------------------------------- | ---------------------- | -------------------- |
| Frontier, 357677 blocks, 42.4K states  | 12:21 mins, 1.6 GB     | 2:49 mins, 235.2 MB  |
| Olympic, 837869 blocks, 10.2M states   | 4:07:55 hours, GB      | 31:32 mins, 3.8 GB   |



The resulting databases contain the entire blockchain (all blocks, all uncles, all transactions), every transaction receipt and its generated logs, and the entire state trie of the head 1024 blocks. This allows a fast-synced node to act as a full archive node for all intents and purposes.



## Closing Remarks

The fast sync algorithm requires the functionality defined by eth/63. Because of this, testing in the live network requires at least a handful of discoverable peers to update their nodes to eth/63. On the same note, verifying that the implementation is truly correct will also entail waiting for the wider deployment of eth/63.

