Practical Byzantine fault tolerance

Source: Internet
Author: User

From Practical Byzantine fault tolerance

The paper presents a replication algorithm that tolerates Byzantine faults. It opens directly with the advantages of the new algorithm: it works in an asynchronous environment (such as the Internet), and it improves response time by more than an order of magnitude over previous algorithms. There are certainly limitations as well, and we will try to find them.

Early on, the paper also tells us that one problem remains unsolved: fault-tolerant privacy. A faulty replica may leak information to an attacker.


Normal-case operation

Request buffering is proposed to reduce message traffic and CPU overhead when the system is heavily loaded. However, this is not the focus of the paper, so it is skipped here.

The model follows a client -> primary -> backups flow: the client first sends its request to the primary, and the primary then multicasts the request to the backups using a three-phase protocol. Let's look at the three phases: pre-prepare, prepare, and commit.

In the pre-prepare phase, the primary assigns a sequence number n to the request, multicasts a pre-prepare message with m piggybacked to all backups, and appends the message to its log. The message format is

<<PRE-PREPARE, v, n, d>_σp, m>

Here v is the view in which the message is sent, m is the client's request message, and d is the digest of m.
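As a rough illustration of these fields, the sketch below builds a pre-prepare message in Python. The names PrePrepare, digest, and make_pre_prepare are hypothetical, SHA-256 is an assumed choice of hash, and signatures are left out entirely.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PrePrepare:
    view: int    # v: view in which the message is sent
    seq: int     # n: sequence number assigned by the primary
    digest: str  # d: digest of the client request m


def digest(request: bytes) -> str:
    # SHA-256 is an assumption of this sketch, not the paper's choice of hash.
    return hashlib.sha256(request).hexdigest()


def make_pre_prepare(view: int, seq: int, request: bytes):
    # The request travels alongside the pre-prepare (piggybacked), not inside it.
    return PrePrepare(view, seq, digest(request)), request


msg, m = make_pre_prepare(view=0, seq=1, request=b"write x=1")
print(msg.view, msg.seq, msg.digest == digest(m))  # 0 1 True
```

Keeping only the digest inside the message is what keeps the pre-prepare small, whatever the size of the request.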

However, the request itself is not included inside the pre-prepare message, which keeps that message small. So what is the purpose of the pre-prepare message? It serves as proof that the request was assigned sequence number n in view v. If backup i accepts the pre-prepare message, it enters the prepare phase and multicasts the following message to all other replicas:

<PREPARE, v, n, d, i>_σi

Backup i also records both messages in its own log. If it does not accept the pre-prepare message, it does nothing. A replica (including the primary) accepts prepare messages and adds them to its log provided that their signatures are correct, their view number equals the replica's current view, and their sequence number lies between the low water mark h and the high water mark H. Once a replica has logged the request, a matching pre-prepare, and 2f matching prepares from different backups, the commit phase begins. The replica then multicasts the following message to the other replicas:

<COMMIT, v, n, D(m), i>_σi

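Replicas accept commit messages and insert them into their own logs when the messages pass the same checks: a correct signature, a view number equal to the replica's current view, and a sequence number between h and H.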
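The acceptance checks described above can be sketched as a small predicate. This is only an illustration: the Prepare class and the accepts function are hypothetical names, signature verification is reduced to a boolean flag, and whether each water mark is inclusive is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prepare:
    view: int
    seq: int
    digest: str
    sender: int


def accepts(msg, current_view, low, high, signature_ok=True):
    # signature_ok stands in for real signature verification (not modelled here).
    # Treating the range as low < seq <= high is an assumption of this sketch.
    return signature_ok and msg.view == current_view and low < msg.seq <= high


p = Prepare(view=0, seq=5, digest="abc", sender=2)
print(accepts(p, current_view=0, low=0, high=100))  # True
print(accepts(p, current_view=1, low=0, high=100))  # False: wrong view
print(accepts(p, current_view=0, low=5, high=100))  # False: at or below the low water mark
```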

The pre-prepare and prepare phases ensure that the non-faulty replicas agree on a total order for the requests within a view. The commit phase then ensures that every non-faulty replica executes the requests in the same order before sending its reply to the client. The overall message flow is: request, pre-prepare, prepare, commit, reply.
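To make the three phases more concrete, here is a minimal sketch of the bookkeeping one replica might do. The class ReplicaLog and its method names are hypothetical. The thresholds (2f matching prepares, 2f + 1 matching commits, with f the maximum number of faulty replicas) are the ones given in the PBFT paper; message authentication and view changes are not modelled.

```python
from collections import defaultdict


class ReplicaLog:
    """Counts matching messages per (view, seq, digest) for a single replica."""

    def __init__(self, f):
        self.f = f                         # maximum number of faulty replicas
        self.pre_prepared = set()          # keys with an accepted pre-prepare
        self.prepares = defaultdict(set)   # key -> ids of replicas that sent prepare
        self.commits = defaultdict(set)    # key -> ids of replicas that sent commit

    def add_pre_prepare(self, view, seq, digest):
        self.pre_prepared.add((view, seq, digest))

    def add_prepare(self, view, seq, digest, sender):
        self.prepares[(view, seq, digest)].add(sender)

    def add_commit(self, view, seq, digest, sender):
        self.commits[(view, seq, digest)].add(sender)

    def prepared(self, view, seq, digest):
        key = (view, seq, digest)
        # Pre-prepare plus 2f prepares from different replicas.
        return key in self.pre_prepared and len(self.prepares[key]) >= 2 * self.f

    def committed_local(self, view, seq, digest):
        key = (view, seq, digest)
        # Prepared plus 2f + 1 commits from different replicas.
        return self.prepared(view, seq, digest) and len(self.commits[key]) >= 2 * self.f + 1


log = ReplicaLog(f=1)
key = (0, 1, "d")
log.add_pre_prepare(*key)
log.add_prepare(*key, sender=1)
log.add_prepare(*key, sender=2)
print(log.prepared(*key))          # True
for s in (0, 1, 2):
    log.add_commit(*key, sender=s)
print(log.committed_local(*key))   # True
```

Storing sender ids in sets means a replica that repeats a message is counted only once.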



Non-determinism

If a file's modification time were taken from the local clock on each machine, the replicas would diverge, which introduces non-determinism. We therefore need a mechanism that makes all replicas select the same value for the modification time. The client cannot select this value in advance, because it does not know how its request will be ordered relative to the requests of other clients. In the end, the choice is handed to the primary: it selects the non-deterministic value, and the non-faulty replicas then agree on it through the three-phase protocol.
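A minimal sketch of this idea follows, assuming the primary attaches the chosen time to the request. The names Primary, choose_mtime, and apply_write are hypothetical, and keeping the value strictly increasing is an assumption of the sketch rather than something stated above.

```python
import time


class Primary:
    def __init__(self):
        self.last_mtime = 0.0

    def choose_mtime(self, now=None):
        # One value is chosen for all replicas; strictly increasing by assumption.
        now = time.time() if now is None else now
        self.last_mtime = max(now, self.last_mtime + 1e-6)
        return self.last_mtime


def apply_write(files, name, data, mtime):
    # Replicas use the agreed mtime carried with the request, never their own clock.
    files[name] = (data, mtime)


primary = Primary()
mtime = primary.choose_mtime(now=1000.0)
replicas = [{}, {}, {}, {}]
for state in replicas:
    apply_write(state, "a.txt", b"hello", mtime)
print(all(state == replicas[0] for state in replicas))  # True
```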


Reducing communication

All replicas return their execution results to the client. When the results are large, this costs considerable network bandwidth and CPU time. The optimization is to have one replica return the complete result while the other replicas return only a digest of it; the client can use these digests to check whether the complete result is correct.
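The client-side check might look roughly like the sketch below. The function names are hypothetical and SHA-256 is an assumed hash. The threshold of f + 1 matching replies is the one the PBFT paper uses for accepting a result; it is not stated in the text above.

```python
import hashlib


def digest(result: bytes) -> str:
    # SHA-256 is an assumption of this sketch.
    return hashlib.sha256(result).hexdigest()


def accept_result(full_result: bytes, reply_digests: dict, f: int) -> bool:
    """reply_digests maps replica id -> digest sent by the replicas that did not send the full result."""
    expected = digest(full_result)
    # The full reply counts once, plus every other replica whose digest matches it.
    matching = 1 + sum(1 for d in reply_digests.values() if d == expected)
    return matching >= f + 1


result = b"file contents"
print(accept_result(result, {1: digest(result), 2: "bogus"}, f=1))  # True
print(accept_result(result, {1: "bogus", 2: "bogus"}, f=1))         # False
```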


Cryptography

One improvement in the paper is to use traditional digital signatures only for the view-change and new-view messages, which are sent infrequently, and to authenticate all other, frequently sent messages with MACs. This eliminates the main performance bottleneck.
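As an illustration of MAC-based authentication, the sketch below uses Python's standard hmac module to attach one MAC per receiving replica to a multicast message (the paper calls such a vector of MACs an authenticator). HMAC-SHA256, the function names, and the example keys are assumptions of this sketch.

```python
import hashlib
import hmac


def make_authenticator(message: bytes, session_keys: dict) -> dict:
    # One MAC per receiving replica, each under the key shared with that replica.
    return {rid: hmac.new(key, message, hashlib.sha256).digest()
            for rid, key in session_keys.items()}


def verify(message: bytes, authenticator: dict, my_id: int, my_key: bytes) -> bool:
    tag = authenticator.get(my_id)
    if tag is None:
        return False
    expected = hmac.new(my_key, message, hashlib.sha256).digest()
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(tag, expected)


keys = {1: b"key-shared-with-1", 2: b"key-shared-with-2"}
auth = make_authenticator(b"prepare", keys)
print(verify(b"prepare", auth, 1, keys[1]))   # True
print(verify(b"tampered", auth, 1, keys[1]))  # False
```

Each receiver checks only its own entry, so verification costs one MAC computation regardless of the number of replicas.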


Implementation

snfsd performs file system operations directly on a memory-mapped file, which preserves locality, and it uses copy-on-write to reduce the space and time overhead of maintaining checkpoints.
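The copy-on-write idea can be illustrated with a toy model in which the state is a list of pages and a checkpoint saves an old page only when that page is first overwritten. The class CowState is hypothetical and is not how snfsd is actually implemented.

```python
class CowState:
    """State split into pages; a checkpoint keeps old copies only of pages written later."""

    def __init__(self, num_pages):
        self.pages = [b""] * num_pages
        self.checkpoints = {}  # seq -> {page index: contents at checkpoint time}

    def take_checkpoint(self, seq):
        # Nothing is copied up front; copies are made lazily on the first write.
        self.checkpoints[seq] = {}

    def write(self, index, data):
        for saved in self.checkpoints.values():
            # Preserve the pre-write contents for checkpoints that have not saved this page yet.
            saved.setdefault(index, self.pages[index])
        self.pages[index] = data

    def read_checkpoint(self, seq, index):
        # Pages never written since the checkpoint are read from the live state.
        return self.checkpoints[seq].get(index, self.pages[index])


state = CowState(num_pages=2)
state.write(0, b"a")
state.take_checkpoint(100)
state.write(0, b"b")
state.write(0, b"c")
print(state.pages[0], state.read_checkpoint(100, 0))  # b'c' b'a'
```

Only page 0 was copied for checkpoint 100; page 1 costs no extra space because it was never modified.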


Related work

Most earlier work on replication either ignored Byzantine faults or assumed a synchronous system model, and the paper addresses both of these gaps. Viewstamped replication and the well-known Paxos work in asynchronous systems, but they tolerate only benign faults. Tolerating Byzantine faults requires a more complicated protocol with cryptographic authentication, an extra pre-prepare phase, and a view-change mechanism to select the primary. One notable design choice in the paper is that a view change only selects a new primary; it never selects a different set of replicas to form the new view.

Some earlier agreement protocols tolerate Byzantine faults in asynchronous systems, but they do not provide a complete solution for state machine replication and cannot be used directly in practice. The algorithm in this paper provides Byzantine fault tolerance not only in the normal case but also when the primary is faulty.

Compared with Rampart and SecureRing, the algorithm in this paper is an order of magnitude faster. Both systems rely on failure detectors to decide which replicas are faulty, but failure detectors cannot be accurate in an asynchronous system, so they will sometimes misclassify a correct replica. More importantly, both systems remove the replicas they consider faulty from the group, so a misclassification also removes a non-faulty replica. The algorithm in this paper never excludes replicas from the group, so it does not have this problem.

Phalanx is another Byzantine fault-tolerant system that works in asynchronous environments. However, the algorithm in this paper is faster, because it uses MACs rather than public-key cryptography and therefore adds little delay on the critical path.


Conclusions

As a reader, I see the following contributions in the paper:

(i) It provides fault tolerance for Byzantine faults.

(ii) It is the first such algorithm that works correctly in an asynchronous environment (such as the Internet), and it improves performance by more than an order of magnitude over previous algorithms.

(iii) The algorithm is applied to NFS to implement BFS, with a series of optimizations: public-key signatures are replaced with MACs, the number and size of messages are reduced, and an incremental checkpoint-management technique is used.

(iv) When software errors occur, a system using this algorithm still works normally. It is powerless if all replicas share the same software error, but it can mask errors that occur independently at different replicas, including non-deterministic software errors, which are hard to detect.


Let's talk about the limitations of the algorithm:

It requires considerable resources. The paper notes that the resources needed to implement it, such as the number of replicas and the number of copies of the state, should be reduced.





