The design of distributed systems involves many protocols and mechanisms for solving reliability and data consistency problems, and the quorum mechanism is one of them. We first briefly introduce the read/write model in distributed systems.
Read/Write Model in Distributed Systems
A distributed system consists of multiple nodes (servers, storage devices, and so on). Because of network faults, downtime, and other failures, no single node can be guaranteed to run normally, and when the number of nodes is large, it is almost certain that some of them are faulty at any given time. To keep the system running and provide a reliable service, a distributed system stores multiple copies of its data. (Note: the copies are not just backups; they also serve requests.) If reading from one node fails, the system can switch to another node that holds a copy of the same data and return the result to the user, and this process is transparent to users. The side effect is that copies can become inconsistent. For example, after a user submits a change, any copy that has not yet been updated no longer matches the current data. The easiest way to solve this problem is:
Read One, Write All: after the user submits a modification, the system makes sure that every copy of the stored data has been updated before telling the user that the operation succeeded; a read only needs to query one of the copies and return it to the user. This solution works well when stored data is seldom modified (for example, historical data archived for later analysis). When changes are frequent, write latency becomes noticeable, and under concurrent or continuous writes efficiency drops further. In essence, this is because the write load and the read load are not balanced: reads are very cheap, and writes carry all the pressure.
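The scheme described above can be sketched in a few lines. This is only an in-memory illustration: the `Replica` class and the `write_all` and `read_one` functions are hypothetical names made up for this example, and real replicas would be reached over a network.

```python
class Replica:
    """One copy of the data, held in memory for illustration only."""

    def __init__(self):
        self.value = None


def write_all(replicas, value):
    """Update every replica before reporting success (the expensive part)."""
    for replica in replicas:
        replica.value = value
    return True  # reached only after every copy has been updated


def read_one(replicas, index=0):
    """Any single replica is enough, because all copies are identical."""
    return replicas[index].value


replicas = [Replica() for _ in range(3)]
acknowledged = write_all(replicas, "v1")
values = [read_one(replicas, i) for i in range(len(replicas))]
print(acknowledged, values)  # True ['v1', 'v1', 'v1']
```

The write touches all three copies while the read touches one, which is exactly the imbalance the paragraph above describes.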
Is there a solution that does not need to update every copy, yet still guarantees that valid data is returned to the user? The quorum mechanism is one option.
Starting from the drawer (pigeonhole) principle from elementary school
Why start with the drawer principle? First, everyone is familiar with it, and second, it closely resembles the quorum mechanism. Recall how it works: there are two drawers, each of which can hold at most two apples. If three apples are put into them, then no matter how they are placed, one drawer must contain two apples. Now change the setup slightly: the two drawers hold two red apples and two green apples. If we take out three apples, at least one of them must be red, because there are only two green ones. Regard a red apple as updated, valid data and a green apple as stale, invalid data. As you can see, we do not need to update all the data (not every apple has to be red) to get valid data; the price is that we must read multiple copies (take out several apples). This is the prototype of the quorum mechanism. In essence, it shifts part of the write-all load onto the read side, which now reads more than one copy.
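The apple argument is small enough to check by brute force. The sketch below enumerates every way of taking apples out of the two red and two green ones; the variable names are chosen for this example only.

```python
from itertools import combinations

apples = ["red", "red", "green", "green"]

# Every possible way of taking three of the four apples.
draws_of_three = list(combinations(apples, 3))
always_red = all("red" in draw for draw in draws_of_three)

# Taking only two apples is not enough: both could be green.
can_miss_red = any("red" not in draw for draw in combinations(apples, 2))

print(len(draws_of_three), always_red, can_miss_red)  # 4 True True
```

Three is the smallest number of apples that guarantees a red one, which is the same counting argument the quorum read relies on.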
Quorum Mechanism
The apple-and-drawer story is only an intuition; the formal definition of a quorum is given in the references listed at the end of this article.
In short, a quorum system is a collection of sets in which any two sets S and R have a non-empty intersection. This article does not go further into the mathematical definition; it is given here only for reference. If it is unclear, the distributed read/write model described earlier makes it easy to understand.
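The intersection property can be stated as a small check. This is a sketch under the assumption that a quorum system is a collection of node sets in which every pair overlaps; the function name `is_quorum_system` and the node names are hypothetical.

```python
from itertools import combinations


def is_quorum_system(quorums):
    """Return True if every pair of sets shares at least one element."""
    return all(a & b for a, b in combinations(quorums, 2))


# Every two-node majority out of three nodes overlaps with every other one.
majorities = [{"n1", "n2"}, {"n2", "n3"}, {"n1", "n3"}]

# Two disjoint sets break the property.
disjoint = [{"n1"}, {"n2", "n3"}]

print(is_quorum_system(majorities), is_quorum_system(disjoint))  # True False
```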
Back to the question from the beginning of the article: how does the quorum mechanism balance the read and write load in the read/write model? The key question is how many replicas must be read to guarantee that valid data is among them. Think of the red apples again. Assume there are N copies of the data in total, K of which have been updated and N-K of which have not. If we read any N-K+1 copies, at least one of them must belong to the K updated ones; this is the quorum intersection. We then only need to compare the N-K+1 copies we read, pick the one with the highest version, and return it to the user to obtain the latest data.
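As a minimal sketch of this read rule, the example below models each copy as a `(version, value)` pair, which is an assumption of this illustration, and checks every possible choice of copies for one small configuration (5 copies, 2 of them updated).

```python
from itertools import combinations


def read_latest(copies):
    """Pick the (version, value) pair with the highest version."""
    return max(copies, key=lambda copy: copy[0])


n, k = 5, 2
# k copies are at version 2 (updated); the other n - k are still at version 1.
replicas = [(2, "new")] * k + [(1, "old")] * (n - k)
read_size = n - k + 1

# Whichever n - k + 1 copies are read, the newest one is always version 2.
results = {read_latest(subset) for subset in combinations(replicas, read_size)}

# Reading one copy fewer can return stale data.
stale_possible = any(
    read_latest(subset)[0] == 1 for subset in combinations(replicas, n - k)
)

print(read_size, results, stale_possible)  # 4 {(2, 'new')} True
```

With these numbers a read must touch 4 of the 5 copies; updating more copies on each write (a larger K) shrinks the read size accordingly.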
What about the write model? After K replicas have been updated, the system can tell the user that the operation is complete instead of waiting for every copy to be written; it then updates the remaining copies in the background, which is transparent to users. Part of the write load is thus transferred to reads, which now read multiple copies, so that writes no longer carry all the pressure. The cost is weaker data consistency in the distributed system. How much load to transfer depends on the system's specific consistency requirements. As the CAP theorem tells us, there is no perfect solution.
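Putting the write and read sides together, a toy simulation might look like the following. Everything here is hypothetical and simplified: copies are `(version, value)` pairs in a list, replicas are picked at random, and `background_sync` stands in for whatever mechanism the real system uses to bring the remaining copies up to date.

```python
import random


def quorum_write(replicas, k, version, value):
    """Update k replicas and then acknowledge; the rest stay stale for now."""
    for i in random.sample(range(len(replicas)), k):
        replicas[i] = (version, value)


def quorum_read(replicas, k):
    """Read n - k + 1 replicas and return the copy with the highest version."""
    n = len(replicas)
    picked = random.sample(replicas, n - k + 1)
    return max(picked, key=lambda copy: copy[0])


def background_sync(replicas):
    """Bring every replica up to the newest version (done lazily in practice)."""
    newest = max(replicas, key=lambda copy: copy[0])
    for i in range(len(replicas)):
        replicas[i] = newest


replicas = [(1, "old")] * 5
quorum_write(replicas, k=3, version=2, value="new")
stale_before_sync = sum(1 for copy in replicas if copy[0] == 1)
before_sync = quorum_read(replicas, k=3)
background_sync(replicas)

print(stale_before_sync, before_sync)  # 2 (2, 'new')
```

Two copies are still stale when the write is acknowledged, yet a read of three copies is guaranteed to include an updated one. Choosing K is the trade-off: a larger K makes writes slower and reads smaller, and a smaller K does the opposite.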
References:
[1] Dahlia Malkhi, Michael Reiter. Byzantine quorum systems [J], 1998
[2] David Peleg, Avishai Wool. Crumbling walls: A class of practical and efficient quorum systems [J], 1997