ZooKeeper (3): ZooKeeper working principles (detailed)

Source: Internet
Author: User

1. Roles in ZooKeeper

» Leader: initiates and decides votes, and updates the system state.
» Learner: covers both followers and observers. A follower accepts client requests, returns results to the client, and takes part in voting during leader election.
» Observer: accepts client connections and forwards write requests to the leader, but does not take part in voting; it only synchronizes with the leader's state. Observers exist to scale out the system and improve read performance.
» Client: the party that initiates requests.

The core of ZooKeeper is atomic broadcast, a mechanism that keeps the servers synchronized with one another. The protocol that implements this mechanism is called Zab (ZooKeeper Atomic Broadcast). Zab has two modes: recovery mode (leader election) and broadcast mode (synchronization). When the service starts, or after the leader crashes, Zab enters recovery mode; recovery mode ends once a leader has been elected and a majority of the servers have synchronized their state with it. State synchronization ensures that the leader and the other servers have the same system state.

• To keep transactions in a consistent order, ZooKeeper identifies each transaction with a monotonically increasing transaction ID, the ZXID. Every proposal is stamped with a ZXID when it is issued. A ZXID is a 64-bit number: the high 32 bits hold the epoch, which tracks changes of leadership (each newly elected leader starts a new epoch that marks its reign), and the low 32 bits hold an incrementing counter.
• Each server is in one of three states while it runs:
Looking: the server does not know who the leader is and is searching for one.
Leading: the server is the elected leader.
Following: a leader has been elected and the server is synchronizing with it.
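The ZXID layout above can be illustrated with a short Python sketch. The helper names `make_zxid`, `epoch_of` and `counter_of` are hypothetical (they are not part of ZooKeeper's API); the sketch assumes only what the text states, namely that the epoch occupies the high 32 bits and the counter the low 32 bits, and shows why any ZXID from a newer epoch compares greater than every ZXID from an older one.

```python
MASK_32 = 0xFFFFFFFF  # low 32 bits set


def make_zxid(epoch: int, counter: int) -> int:
    """Pack an epoch (high 32 bits) and a counter (low 32 bits) into one 64-bit ZXID."""
    if not (0 <= epoch <= MASK_32 and 0 <= counter <= MASK_32):
        raise ValueError("epoch and counter must each fit in 32 bits")
    return (epoch << 32) | counter


def epoch_of(zxid: int) -> int:
    """High 32 bits: the epoch of the leader that issued the proposal."""
    return zxid >> 32


def counter_of(zxid: int) -> int:
    """Low 32 bits: the counter that increments with each proposal in that epoch."""
    return zxid & MASK_32


# A late proposal from epoch 1 versus the first proposal of epoch 2.
old = make_zxid(epoch=1, counter=0xFFFF)
new = make_zxid(epoch=2, counter=1)
print(hex(old), hex(new), new > old)  # 0x10000ffff 0x200000001 True
```

Because the epoch sits in the high bits, plain integer comparison orders ZXIDs first by leadership period and then by position within that period.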

Further reading: http://www.cnblogs.com/lpshou/archive/2013/06/14/3136738.html

2. ZooKeeper read/write mechanism

» ZooKeeper is a cluster made up of multiple servers
» There is one leader and multiple followers
» Each server keeps a copy of the data
» Data is globally consistent
» Reads and writes are distributed across the servers
» Update requests are forwarded to, and carried out by, the leader

3. ZooKeeper guarantees

» Sequential updates: update requests from the same client are executed in the order in which they were sent
» Atomic updates: each data update either succeeds completely or fails
» Single system image: the client sees the same view of the data regardless of which server it is connected to
» Timeliness: within a bounded time window, the client is guaranteed to read the latest data

4. ZooKeeper data operation flow (write request)

Steps:
1. The client sends a write request to a follower.
2. The follower forwards the request to the leader.
3. On receiving it, the leader starts a vote and notifies the followers to vote.
4. The followers send their vote results to the leader.
5. The leader tallies the results; if the write is approved, the leader performs it, notifies the followers to perform it as well, and the write is committed.
6. The follower returns the result of the request to the client.
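Steps 3 to 5 can be sketched as a toy, in-memory simulation in Python. Everything here is hypothetical and heavily simplified (no networking, disk logging, timeouts or session handling): the `Leader` and `Follower` classes are illustrative names, an unreachable follower is modelled by an `alive` flag, and the leader is assumed to count its own vote. A write is committed only when the votes form a strict majority of the ensemble.

```python
from dataclasses import dataclass, field


@dataclass
class Follower:
    sid: int
    alive: bool = True
    pending: dict = field(default_factory=dict)    # proposals logged but not yet committed
    committed: list = field(default_factory=list)  # (zxid, data) pairs applied locally

    def on_proposal(self, zxid, data):
        """Step 4: log the proposal and return an ACK (None models an unreachable follower)."""
        if not self.alive:
            return None
        self.pending[zxid] = data
        return zxid

    def on_commit(self, zxid):
        """Apply a proposal once the leader announces that it is committed."""
        if self.alive and zxid in self.pending:
            self.committed.append((zxid, self.pending.pop(zxid)))


class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.next_zxid = 1

    def handle_write(self, data):
        """Steps 3-5: propose to every follower, tally ACKs, commit on a strict majority."""
        zxid = self.next_zxid
        self.next_zxid += 1
        acks = 1  # assumption: the leader votes for its own proposal
        for f in self.followers:
            if f.on_proposal(zxid, data) == zxid:
                acks += 1
        ensemble_size = len(self.followers) + 1
        if acks <= ensemble_size // 2:
            return False  # no majority: the write is not committed
        for f in self.followers:
            f.on_commit(zxid)
        return True  # step 6: the follower relays this outcome to the client


followers = [Follower(sid=2), Follower(sid=3)]
leader = Leader(followers)
print(leader.handle_write("set /app 1"))  # True: all 3 servers vote for it
followers[0].alive = False
followers[1].alive = False
print(leader.handle_write("set /app 2"))  # False: only the leader's own vote (1 of 3)
```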

A follower has four main functions:
• 1. Sending requests to the leader (Ping, Request, ACK and Revalidate messages);
• 2. Receiving and processing messages from the leader;
• 3. Receiving client requests; write requests are sent to the leader for a vote;
• 4. Returning results to the client.
The follower's message loop handles the following messages from the leader:
• 1. Ping message: a heartbeat message;
• 2. Proposal message: a proposal initiated by the leader, asking the followers to vote;
• 3. Commit message: information about the most recently committed proposal on the server side;
• 4. UpToDate message: indicates that synchronization is complete;
• 5. Revalidate message: depending on the leader's revalidation result, the follower either closes the session being revalidated or allows it to keep accepting messages;
• 6. Sync message: returns the sync result to the client; the sync is originally initiated by the client to force it to see the latest updates.

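5. ZooKeeper leader election

• Majority rule (more than half must agree)
– 3 machines, 1 goes down: the 2 survivors satisfy 2 > 3/2, so the cluster keeps working
– 4 machines, 2 go down: the 2 survivors do not satisfy 2 > 4/2, so the cluster stops working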
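The majority rule and the two examples above reduce to a one-line check. This is a minimal sketch; `has_quorum` is a hypothetical helper, and integer arithmetic (`2 * alive > ensemble_size`) is used to avoid comparing against a fractional n/2.

```python
def has_quorum(alive: int, ensemble_size: int) -> bool:
    """True if the surviving servers are strictly more than half of the ensemble."""
    return 2 * alive > ensemble_size


print(has_quorum(alive=2, ensemble_size=3))  # True:  3 machines, 1 down, 2 > 3/2
print(has_quorum(alive=2, ensemble_size=4))  # False: 4 machines, 2 down, 2 is not > 4/2
```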

• A proposes: "I should be elected. B, do you agree? C, do you agree?" B says: "I agree to elect A." C says: "I agree to elect A." (Note that this is already more than half; in the real world the election would already be over. But the computer world is very strict, so besides understanding the algorithm, let us keep simulating.)
• Then B proposes: "I want to elect myself. A, do you agree?" A says: "More than half have already agreed to elect me, so your proposal is invalid." C says: "More than half have already agreed to elect A, so B's proposal is invalid."
• Then C proposes: "I want to elect myself. A, do you agree?" A says: "More than half have already agreed to elect me, so your proposal is invalid." B says: "More than half have already agreed to elect A, so C's proposal is invalid."
• The election has produced a leader; the others become followers and simply obey the leader's orders. One small detail here: in this example, whichever server starts first becomes the leader.

6. ZXID

The status information (stat) of a znode contains cZxid, so what is a ZXID?
Every change to ZooKeeper's state corresponds to an increasing transaction ID, called the ZXID. Because ZXIDs only increase, if zxid1 is less than zxid2, then zxid1 happened before zxid2.

Creating a node, updating a node's data, or deleting a node changes ZooKeeper's state and therefore increases the ZXID.
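A toy model makes the point that every create, update and delete consumes a new ZXID. The `ToyZnodeStore` class below is hypothetical: it uses a plain counter in place of the epoch/counter ZXID and tracks only two stat fields, `cZxid` (the ZXID of the transaction that created the node) and `mZxid` (the ZXID of the last modification), which are named after fields in a real znode's stat.

```python
class ToyZnodeStore:
    """In-memory stand-in for a data tree: every create, update and delete takes a new ZXID."""

    def __init__(self):
        self.last_zxid = 0
        self.nodes = {}  # path -> {"data": ..., "cZxid": ..., "mZxid": ...}

    def _next_zxid(self):
        self.last_zxid += 1
        return self.last_zxid

    def create(self, path, data):
        if path in self.nodes:
            raise ValueError(f"{path} already exists")
        zxid = self._next_zxid()
        self.nodes[path] = {"data": data, "cZxid": zxid, "mZxid": zxid}

    def set_data(self, path, data):
        node = self.nodes[path]  # raises KeyError if the node does not exist
        node["data"] = data
        node["mZxid"] = self._next_zxid()

    def delete(self, path):
        del self.nodes[path]  # raises KeyError if the node does not exist
        self._next_zxid()


store = ToyZnodeStore()
store.create("/a", b"v1")    # ZXID 1
store.create("/b", b"v1")    # ZXID 2
store.set_data("/a", b"v2")  # ZXID 3
store.delete("/b")           # ZXID 4
print(store.nodes["/a"], store.last_zxid)  # {'data': b'v2', 'cZxid': 1, 'mZxid': 3} 4
```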

7. ZooKeeper working principle

» The core of ZooKeeper is atomic broadcast, a mechanism that keeps the servers synchronized with one another. The protocol that implements this mechanism is called Zab. Zab has two modes: recovery mode and broadcast mode.

When the service starts, or after the leader crashes, Zab enters recovery mode; recovery mode ends once a leader has been elected and a majority of the servers have synchronized their state with it.

State synchronization ensures that the leader and the other servers have the same system state.

» Once the leader has synchronized its state with a majority of the followers, it can start broadcasting messages, that is, enter broadcast mode. When a server joins the ZooKeeper service, it starts in recovery mode, discovers the leader, and synchronizes its state with the leader; once synchronization is complete, it also takes part in message broadcasting. The ZooKeeper service stays in broadcast mode until the leader crashes or loses the support of a majority of its followers.

» Broadcast mode requires proposals to be processed in order, so ZooKeeper uses a monotonically increasing transaction ID (ZXID) to guarantee this. Every proposal is stamped with a ZXID when it is issued.

A ZXID is a 64-bit number: the high 32 bits are the epoch, which identifies whether leadership has changed (each newly elected leader gets a new epoch), and the low 32 bits are an incrementing counter.

» When the leader crashes or loses the support of a majority of its followers, ZooKeeper enters recovery mode. Recovery mode elects a new leader and restores all servers to a correct state.

» After starting, each server asks the other servers whom they are voting for.
» When another server queries it, a server replies, based on its current state, with the ID of the leader it recommends and the ZXID of the last transaction it processed (at system startup, every server recommends itself).
» After receiving replies from all servers, a server works out which server has the largest ZXID and sets that server as its vote for the next round.
» The server that collects the most votes wins; if the winner has more than half of the votes, it is elected leader. Otherwise the process repeats until a leader is elected.
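The voting steps above can be sketched in Python under strong simplifying assumptions: every live server receives every other live server's reply, so the votes converge after a single exchange, and no messages are lost or delayed. Votes are compared by ZXID as described; breaking ties by the larger server ID mirrors what ZooKeeper's FastLeaderElection does, but the real implementation also compares election epochs and exchanges notifications asynchronously, so `elect` and `vote_key` are hypothetical, illustration-only helpers.

```python
from collections import Counter


def vote_key(vote):
    """Rank a vote (sid, zxid): larger ZXID first, then larger server id as the tie-breaker."""
    sid, zxid = vote
    return (zxid, sid)


def elect(last_zxids, alive):
    """
    last_zxids: server id -> ZXID of the last transaction that server processed.
    alive: ids of the servers that are up and can exchange votes.
    Returns the elected leader's id, or None if no candidate has more than half the votes.
    """
    ensemble_size = len(last_zxids)
    # At startup every live server recommends itself with its own last ZXID.
    votes = {sid: (sid, last_zxids[sid]) for sid in alive}
    # Each server looks at all replies and switches its vote to the best candidate seen.
    # Since everyone sees the same replies here, one exchange is enough to converge.
    best = max(votes.values(), key=vote_key, default=None)
    votes = {sid: best for sid in alive}
    tally = Counter(candidate for candidate, _ in votes.values())
    if not tally:
        return None
    winner, count = tally.most_common(1)[0]
    # Majority rule: more than half of the whole ensemble, not just of the live servers.
    return winner if 2 * count > ensemble_size else None


zxids = {1: 0x100000005, 2: 0x100000007, 3: 0x100000007}
print(elect(zxids, alive={1, 2, 3}))  # 3: ties with server 2 on ZXID, wins on server id
print(elect(zxids, alive={1, 2}))     # 2: highest ZXID among live servers, 2 of 3 votes
print(elect(zxids, alive={1}))        # None: 1 vote out of 3 is not a majority
```

The last call shows the majority requirement at work: a lone surviving server cannot elect itself.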

» The leader starts waiting for servers to connect.
» Each follower connects to the leader and sends it its largest ZXID.
» The leader determines the synchronization point based on the follower's ZXID.
» After synchronization completes, the leader notifies the follower that it is now in the UpToDate state.
» Once the follower receives the UpToDate message, it can accept client requests and provide service again.
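How the leader might pick a synchronization point from the follower's ZXID can be sketched as follows. This is a simplified, hypothetical model: the mode labels borrow the names of ZooKeeper's DIFF, TRUNC and SNAP sync packets, but the function assumes the leader keeps a sorted in-memory list of recently committed ZXIDs (`committed_log`) and ignores cases such as a follower whose history diverged in an older epoch.

```python
def sync_mode(follower_zxid, leader_last_zxid, committed_log):
    """
    Pick how to bring one follower up to date (simplified, hypothetical model).
    committed_log: sorted ZXIDs of recently committed transactions the leader still holds,
    assumed to end at leader_last_zxid.
    Returns (mode, payload).
    """
    if follower_zxid == leader_last_zxid:
        return ("DIFF", [])  # already in sync: an empty diff
    if follower_zxid > leader_last_zxid:
        # The follower has transactions the leader never committed: truncate back.
        return ("TRUNC", leader_last_zxid)
    if committed_log and committed_log[0] <= follower_zxid:
        # The gap is covered by the in-memory log: send only the missing transactions.
        return ("DIFF", [z for z in committed_log if z > follower_zxid])
    # Too far behind the in-memory log: ship a full snapshot instead.
    return ("SNAP", None)


log = [3, 4, 5]
print(sync_mode(5, 5, log))  # ('DIFF', [])
print(sync_mode(3, 5, log))  # ('DIFF', [4, 5])
print(sync_mode(7, 5, log))  # ('TRUNC', 5)
print(sync_mode(1, 5, log))  # ('SNAP', None)
```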

8. Data consistency and the Paxos algorithm

• The Paxos algorithm is said to be as hard to understand as it is famous, so let us first look at how data consistency is maintained. The principle is:
• In a distributed database system, if every node starts in the same state and every node executes the same sequence of operations, then all nodes end up in the same state.
The problem Paxos solves is ensuring that every node executes the same sequence of operations. That sounds easy: the master maintains a global write queue and every write must be numbered in that queue, so no matter how many nodes we write to, as long as the writes are numbered, consistency is guaranteed. Yes, that works, but what if the master goes down?
Paxos numbers writes globally by voting. At any moment only one write is approved: concurrent writes compete for votes, and only the write that wins a majority is approved (so only one write can ever be approved at a time). The writes that lose start a new round of voting, and so on; through round after round of voting, all writes end up strictly numbered and ordered. Numbers strictly increase, so if a node that has accepted write number 100 later receives write number 99 (because of unforeseen problems such as network latency), it immediately knows its data is inconsistent, automatically stops serving clients, and restarts the synchronization process. Individual node failures do not affect the data consistency of the whole cluster (with 2n+1 nodes in total, consistency holds as long as no more than n of them fail).
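The strict-numbering check in the paragraph above can be sketched as a toy replica that applies numbered writes in order and stops serving as soon as a number arrives out of sequence. The `Replica` class is hypothetical, and treating a gap in the numbering (not only a number that goes backwards) as an inconsistency is an added assumption; `resync` is a stand-in for a real synchronization protocol.

```python
class Replica:
    """Toy replica that applies globally numbered writes strictly in order."""

    def __init__(self):
        self.last_applied = 0
        self.state = []
        self.serving = True

    def receive(self, number, op):
        """Apply write `number` if it is exactly the next one; otherwise stop serving."""
        if not self.serving:
            return False
        if number != self.last_applied + 1:
            # A number that goes backwards (e.g. 99 after 100) or skips ahead means the
            # local data can no longer be trusted: stop serving until resynchronized.
            self.serving = False
            return False
        self.state.append(op)
        self.last_applied = number
        return True

    def resync(self, authoritative_state):
        """Copy the agreed sequence (numbered from 1) and resume service."""
        self.state = list(authoritative_state)
        self.last_applied = len(authoritative_state)
        self.serving = True


r = Replica()
for n in range(1, 101):
    r.receive(n, f"write-{n}")
print(r.receive(99, "late write-99"), r.serving)  # False False
```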
Summary
Originally a subproject of Hadoop, ZooKeeper is an essential module for Hadoop cluster management. It is mainly used to coordinate the data in a cluster: for example, it manages the NameNode in a Hadoop cluster, handles master election in HBase, and synchronizes state between servers.
9. Observer

• ZooKeeper needs to guarantee both high availability and strong consistency;
• To support more clients, more servers need to be added;
• As servers are added, the latency of the voting phase grows, which hurts performance;
• To balance scalability against high throughput, observers were introduced;
• Observers do not take part in voting;
• Observers accept client connections and forward write requests to the leader;
• Adding more observers increases scalability without reducing throughput.
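The trade-off that observers address comes down to arithmetic: observers add servers that can accept client connections, but only the leader and followers count toward the majority a write needs. `write_quorum` is a hypothetical helper written for this illustration.

```python
def write_quorum(followers: int, observers: int) -> tuple:
    """Return (servers accepting client connections, ACKs each write must collect)."""
    voters = 1 + followers        # the leader and its followers vote on proposals
    servers = voters + observers  # observers also accept client connections
    # Observers never count toward the majority, so only voters set the threshold.
    return servers, voters // 2 + 1


print(write_quorum(followers=2, observers=0))  # (3, 2)
print(write_quorum(followers=2, observers=6))  # (9, 2)
print(write_quorum(followers=8, observers=0))  # (9, 5)
```

Nine servers with six observers still need only 2 ACKs per write, while nine voting servers need 5.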

10. Why does a ZooKeeper cluster usually have an odd number of servers?

The leader election algorithm follows Paxos-style majority voting;
The core idea of Paxos: a write succeeds once a majority of the servers have written it successfully. With 3 servers, 2 successful writes are enough; with 4 or 5 servers, 3 are needed.
The number of servers is usually odd (3, 5, 7). With 3 servers, at most 1 server may fail; with 4 servers, still at most 1 server may fail.

So 3 servers and 4 servers tolerate the same number of failures; to save server resources, an odd number of servers is normally deployed.
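The fault-tolerance comparison can be checked for several ensemble sizes. `max_failures` is a hypothetical helper that follows the majority rule used throughout this article: with n servers, the survivors must still be more than n/2.

```python
def max_failures(servers: int) -> int:
    """Most servers that can fail while the survivors are still more than half of the ensemble."""
    return (servers - 1) // 2


for n in range(3, 8):
    print(f"{n} servers tolerate {max_failures(n)} failure(s)")
# 3 -> 1, 4 -> 1, 5 -> 2, 6 -> 2, 7 -> 3
```

Each even size tolerates no more failures than the odd size just below it, which is the reason given above for deploying an odd number of servers.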
