Distributed Design and Development (II): Several Distributed Algorithms You Must Understand

Source: Internet
Author: User
Some difficult problems in distributed design and development can only be solved with the help of specific algorithms, the consistency problem in a distributed environment being one example. I feel that the following distributed algorithms must be understood (more will be added as my learning goes deeper):
  • Paxos Algorithm
  • Consistent Hash Algorithm

Paxos Algorithm
1) Problem Description
In a distributed setting, clients send a series of data update messages to the servers of a distributed cluster. Because the server nodes in the cluster synchronize data with one another, the data on every server node should be consistent once the clients' commands have been executed. However, due to network or other problems, different server nodes may receive the messages in different orders, which ultimately leaves the nodes' data inconsistent. The following figure shows the structure of the clients and servers:
[Figure: structure of the clients and the server cluster]

When Client1, Client2, and Client3 send message commands A, B, and C respectively to server1 ~ 4, network problems may cause server1 ~ 4 to receive the messages in different orders, which can leave the data on server1 ~ 4 inconsistent. In a distributed environment, this problem cannot be handled by synchronization as simply as on a single machine, and the Paxos algorithm is one solution to this kind of data inconsistency problem.
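To make the inconsistency concrete, here is a small Python sketch. The commands A, B, and C are hypothetical stand-ins (add 3, double, overwrite with 10), chosen only because they do not commute; every server applies the same three commands, and only the delivery order differs.

```python
# Hypothetical commands: each maps a replica's state (one integer) to a new state.
# They do not commute, so the order in which they are applied matters.
COMMANDS = {
    "A": lambda x: x + 3,  # sent by Client1: add 3
    "B": lambda x: x * 2,  # sent by Client2: double
    "C": lambda x: 10,     # sent by Client3: overwrite with 10
}

def apply_in_order(order, initial=0):
    """Apply the commands in the given delivery order and return the final state."""
    state = initial
    for name in order:
        state = COMMANDS[name](state)
    return state

# Same three commands on every server, but the network delivers them differently.
deliveries = {
    "server1": ["A", "B", "C"],
    "server2": ["C", "A", "B"],
    "server3": ["B", "C", "A"],
    "server4": ["C", "B", "A"],
}

final_states = {server: apply_in_order(order) for server, order in deliveries.items()}
print(final_states)
# → {'server1': 10, 'server2': 26, 'server3': 13, 'server4': 23}
```

Four servers, one set of commands, four different results: this is exactly what an agreement protocol has to prevent.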
2) The Algorithm Itself
I will not fully describe and derive the algorithm itself here; plenty of material on the Internet already does that. After studying it, though, I feel that "Paxos Made Simple", written by Leslie Lamport, the creator of the Paxos algorithm (who is now at Microsoft Research), is the best document for learning Paxos. Unlike most algorithm papers, it does not scare readers with a pile of formulas and mathematical symbols; instead, it explains in plain human language what problem Paxos solves and how it solves it. I will also take the opportunity to poke at academic researchers: if you want others to recognize your results, you must first learn how to make most people happy to read them, and this paper describing Paxos is an example for us all to learn from.
To put it bluntly, through its various steps and constraints, the Paxos algorithm is really a distributed election algorithm: its purpose is to hold an election over a pile of messages so that the message receivers or executors reach agreement and execute the messages in the same order. The simplest way to get everyone to execute commands in the same order is to serialize them. For example, you can put a FIFO queue in front of the distributed environment to receive all commands, and then every service node executes them in queue order. This certainly solves the consistency problem, but it does not fit the nature of a distributed system: what happens if this queue goes down or is overwhelmed? The advantage of Paxos is that it lets clients send commands to the servers independently, without affecting one another, while everyone still reaches agreement through an election. This approach is genuinely distributed and has better fault tolerance.
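The FIFO-queue alternative described above can be sketched in a few lines of Python. The `Sequencer` and `Replica` classes and the `("add", 3)`-style commands are hypothetical illustrations, not part of any library: the queue imposes one global order, so every replica ends in the same state, but the queue itself is a single point of failure.

```python
from collections import deque

class Sequencer:
    """Hypothetical central FIFO queue: the single point that orders all commands."""

    def __init__(self):
        self.queue = deque()
        self.up = True

    def submit(self, command):
        if not self.up:
            raise RuntimeError("sequencer is down: no command can be ordered")
        self.queue.append(command)

    def drain(self):
        """Return the globally ordered log and clear the queue."""
        log = list(self.queue)
        self.queue.clear()
        return log

class Replica:
    """A service node that applies commands strictly in log order."""

    def __init__(self):
        self.state = 0

    def apply(self, log):
        for op, arg in log:
            if op == "add":
                self.state += arg
            elif op == "mul":
                self.state *= arg
            elif op == "set":
                self.state = arg

seq = Sequencer()
# Whatever order commands reach the queue in becomes the one order everyone follows.
seq.submit(("set", 10))
seq.submit(("add", 3))
seq.submit(("mul", 2))
log = seq.drain()

replicas = [Replica() for _ in range(4)]
for r in replicas:
    r.apply(log)
print([r.state for r in replicas])  # → [26, 26, 26, 26]

# The weakness: once the queue is down, no client can make progress at all.
seq.up = False
try:
    seq.submit(("add", 1))
except RuntimeError as err:
    print(err)
```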
Speaking of the election itself, think of elections in real society: usually whoever gets the most votes wins. In Paxos, the proposal with the higher sequence number wins, and when a proposer is rejected (meaning the sequence number it used is not the highest), it joins the next round of the election with a higher number. As proposers keep taking part in this way, a sequence that everyone recognizes is eventually selected. To support this continuous election process, Paxos defines three roles (proposer, acceptor, and learner) and two phases (prepare and accept); for the specific responsibilities of the three roles and the detailed flow of the two phases, see "Paxos Made Simple". A Chinese developer has also made a good PPT that uses animations to show how Paxos runs. However, do not dive into the algorithm's details right away; first think about why these rules of the game were designed in the first place.
The biggest advantage of Paxos is that it imposes few restrictions. It allows any role to fail and any phase to be executed repeatedly, both of which are common in a distributed environment. As long as everyone follows the rules, the algorithm itself guarantees that the results stay consistent even when errors occur.
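Below is a minimal single-decree Paxos sketch in Python, loosely following the prepare/accept phases of "Paxos Made Simple". It rests on simplifying assumptions: acceptors are in-memory objects called synchronously rather than over a network, only one value is agreed on (real systems repeat this per log slot to order a stream of commands), there is no separate learner, and the names `Acceptor` and `propose` are hypothetical. It shows the two behaviors described above: a proposer with a non-highest number is rejected and must retry with a higher one, and a crashed minority of acceptors does not stop agreement.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Acceptor:
    promised: int = -1                          # highest proposal number promised so far
    accepted: Optional[Tuple[int, str]] = None  # (number, value) last accepted
    alive: bool = True                          # False simulates a crashed node

    def prepare(self, n):
        """Phase 1b: promise to ignore proposals numbered below n."""
        if not self.alive or n <= self.promised:
            return None                         # rejected, or no reply from a crashed node
        self.promised = n
        return ("promise", self.accepted)

    def accept(self, n, value):
        """Phase 2b: accept unless a higher-numbered prepare has been promised."""
        if not self.alive or n < self.promised:
            return False
        self.promised = n
        self.accepted = (n, value)
        return True

def propose(acceptors, n, value):
    """One round as a proposer using number n.
    Returns the chosen value, or None if the round failed (retry with a higher n)."""
    majority = len(acceptors) // 2 + 1
    # Phase 1a: send prepare(n) to every acceptor and collect promises.
    promises = [r for r in (a.prepare(n) for a in acceptors) if r is not None]
    if len(promises) < majority:
        return None
    # If some acceptor already accepted a value, adopt the one with the highest number.
    previously = [acc for _, acc in promises if acc is not None]
    if previously:
        value = max(previously)[1]
    # Phase 2a: ask every acceptor to accept (n, value).
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None

acceptors = [Acceptor() for _ in range(5)]
acceptors[4].alive = False                  # a crashed minority is tolerated

print(propose(acceptors, n=1, value="A"))   # → A
print(propose(acceptors, n=1, value="B"))   # → None (number 1 is not the highest)
print(propose(acceptors, n=2, value="B"))   # → A (retry must carry the chosen value)
```

The key design point is in Phase 1: a later proposer does not get to push its own value blindly, it inherits whatever a majority may already have accepted, which is what keeps every node's result the same.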
3) Algorithm Implementation
There are many Paxos implementations, the most famous of which is Google Chubby, but its source code is not available. For an open-source implementation, see libpaxos. In addition, ZooKeeper also solves data consistency problems with a Paxos-style protocol (its own ZAB, ZooKeeper Atomic Broadcast), so it is worth looking at how it does so.
4) Applicable Scenarios
Once you understand the ins and outs of Paxos, you will find that it applies in many scenarios. Tim has a blog post titled "Common application scenarios of Paxos in large systems". In my own project, naming services are where Paxos is used most widely; for more information, see ZooKeeper.
Consistent Hash Algorithm
1) Problem Description
Hashing is often used to distribute data across nodes in a distributed system. It works well as long as the set of data nodes does not change, but when nodes are added or removed, the modulus used in the hash function has to change, so all of the data must be redistributed across the nodes according to the new modulus. If the data volume is large, this work is often very hard to complete. The consistent hash algorithm is an optimization of plain hashing that solves these problems through a set of mapping rules.
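A quick Python sketch shows how much data moves under plain modulo placement when one node is added. The key names and node counts are illustrative assumptions, and MD5 is used only as a stable hash (Python's built-in `hash()` on strings is randomized per process). Going from 4 to 5 nodes, a key stays put only when `h % 4 == h % 5`, which holds for 4 of every 20 hash values, so roughly 80% of keys are expected to move.

```python
import hashlib

def stable_hash(key: str) -> int:
    # Python's built-in hash() for str is randomized per process, so use MD5
    # to get a placement that is stable across runs and machines.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def node_for(key: str, num_nodes: int) -> int:
    """Plain modulo placement: node index = hash(key) mod N."""
    return stable_hash(key) % num_nodes

keys = [f"key-{i}" for i in range(10000)]
before = {k: node_for(k, 4) for k in keys}  # 4 data nodes
after = {k: node_for(k, 5) for k in keys}   # one data node added
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # expected to be roughly 80%
```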
2) The Algorithm Itself
I will not fully elaborate on the consistent hash algorithm itself. A blog post titled "Consistent hash algorithm - consistent hashing" describes it very well, so I will not duplicate it here.
In fact, we can borrow the idea of consistent hashing in other areas of design and development as well. When a mapping or rule causes problems that are hard to maintain, we can consider abstracting these mappings or rules one level further, so that changes in the rules do not change the final data. Consistent hashing essentially turns the earlier point mapping into a range mapping, so that when one data node changes, the other data nodes are affected as little as possible. Operating systems face many storage problems of this kind. For example, to make better use of storage space, the operating system distinguishes different dimensions such as segments and pages and adds many mapping rules, with the goal of avoiding the cost of physical changes through flexible rules.
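The range mapping described above can be sketched as a hash ring in Python. This is a minimal illustration under stated assumptions, not the design of any particular library: nodes and keys are hashed with MD5 onto the same integer circle, each node owns the arc (range) ending at its position, and a key belongs to the first node at or after it, wrapping around at the end. The class name `HashRing` and the node names are hypothetical.

```python
import bisect
import hashlib

def stable_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent hash ring: each node owns the range of hash values
    from its predecessor's position (exclusive) up to its own (inclusive)."""

    def __init__(self, nodes=()):
        self._points = []  # sorted node positions on the ring
        self._owner = {}   # position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        h = stable_hash(node)
        bisect.insort(self._points, h)
        self._owner[h] = node

    def remove(self, node):
        h = stable_hash(node)
        self._points.remove(h)
        del self._owner[h]

    def lookup(self, key):
        # First node position >= the key's position; past the last one, wrap to index 0.
        i = bisect.bisect_left(self._points, stable_hash(key)) % len(self._points)
        return self._owner[self._points[i]]

keys = [f"key-{i}" for i in range(10000)]
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
before = {k: ring.lookup(k) for k in keys}

ring.add("node-e")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Only keys in the new node's range move, and they all move to the new node.
print(len(moved), all(after[k] == "node-e" for k in moved))
```

Compared with modulo placement, adding a node here only takes over one range from its neighbor; no key is shuffled between the existing nodes.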
3) Algorithm Implementation
The consistent hash algorithm is relatively simple, but there are many improved versions tailored to real-world situations. Their objectives come down to two points:

  • When a node changes, the other nodes are affected as little as possible.
  • After a node changes, data is redistributed as evenly as possible.
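One common improvement aimed at both goals is "virtual nodes": each physical node is placed on the ring at many positions, so its load evens out and a newly added node takes a little data from every existing node instead of a large slice from one neighbor. The Python sketch below assumes MD5 as the hash, a hypothetical `node#i` naming scheme for virtual positions, and a hypothetical `VirtualNodeRing` class; the replica count of 100 is an illustrative choice, not a recommendation from the source.

```python
import bisect
import hashlib
from collections import Counter

def stable_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class VirtualNodeRing:
    """Consistent hash ring where each physical node occupies `replicas` positions."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._points = []  # sorted virtual-node positions
        self._owner = {}   # position -> physical node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            h = stable_hash(f"{node}#{i}")  # hypothetical naming scheme for virtual nodes
            bisect.insort(self._points, h)
            self._owner[h] = node

    def lookup(self, key):
        i = bisect.bisect_left(self._points, stable_hash(key)) % len(self._points)
        return self._owner[self._points[i]]

def load(ring, keys):
    """Count how many keys each physical node receives."""
    return Counter(ring.lookup(k) for k in keys)

keys = [f"key-{i}" for i in range(20000)]
nodes = ["node-a", "node-b", "node-c", "node-d"]
for r in (1, 100):
    counts = load(VirtualNodeRing(nodes, replicas=r), keys)
    # With one position per node the split is usually lopsided; with 100 it evens out.
    print(r, [counts[n] for n in nodes])
```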

Implementing this algorithm is neither technically difficult nor much work. What you need to do is establish the mapping relationships you have designed, without relying on any framework or tool. There is a project called libconhash on SourceForge that you can refer to.
In my opinion, even developers who never deal with algorithms need to understand the two algorithms above. Algorithms are really strategies: in a distributed environment, we often need to design a strategy to solve difficult problems that cannot be solved by technology alone, and learning these algorithms can give us ideas for doing so.

