Function and Working Principle of ZooKeeper


1. What is ZooKeeper?

ZooKeeper is a distributed, open-source coordination service for distributed applications, and an open-source implementation of Google's Chubby. It acts as the manager of a cluster: it monitors the status of every node and decides the next reasonable action based on the feedback the nodes submit. On top of this it provides users with an easy-to-use interface and an efficient, robust system.

2. What does ZooKeeper provide?

1) A file system

2) A notification mechanism

3. The ZooKeeper File System

Each directory entry, such as Nameservice, is called a znode. As in a file system, we are free to add and remove znodes, and to add and remove child znodes under a znode; the difference is that a znode can also store data.

There are four types of znodes:

1. PERSISTENT: a persistent node

The node still exists after the client disconnects from ZooKeeper.

2. PERSISTENT_SEQUENTIAL: a persistent, sequentially numbered node

The node still exists after the client disconnects from ZooKeeper, and ZooKeeper appends a monotonically increasing sequence number to the node name.

3. EPHEMERAL: an ephemeral node

The node is deleted when the client's session with ZooKeeper ends.

4. EPHEMERAL_SEQUENTIAL: an ephemeral, sequentially numbered node

The node is deleted when the client's session with ZooKeeper ends, and ZooKeeper appends a monotonically increasing sequence number to the node name.
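The sequential flavors get their names by appending a zero-padded counter maintained by the parent node. A minimal pure-Python sketch of that naming scheme (an illustration, not the server implementation; the 10-digit zero-padding matches the format ZooKeeper uses):

```python
# Sketch of how ZooKeeper names sequential znodes: the parent keeps a
# monotonic counter, and the child name gets a zero-padded 10-digit suffix.
from itertools import count

class SequentialNamer:
    def __init__(self):
        self._counter = count()  # per-parent monotonic counter

    def create(self, prefix: str) -> str:
        # ZooKeeper formats the suffix as "%010d"
        return f"{prefix}{next(self._counter):010d}"

namer = SequentialNamer()
first = namer.create("/locks/lock-")
second = namer.create("/locks/lock-")
print(first)   # /locks/lock-0000000000
print(second)  # /locks/lock-0000000001
```

Because the suffix is zero-padded, lexicographic order of the names matches creation order, which the recipes below rely on.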


4. The ZooKeeper Notification Mechanism

A client registers watches on the znodes it cares about. When a watched znode changes (its data is modified, it is deleted, or its children are added or removed), ZooKeeper notifies the client.
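A toy model of this mechanism (pure Python with made-up names, not the real client API): real ZooKeeper watches are one-time triggers that must be re-registered after they fire, which the sketch mimics.

```python
# Toy znode with ZooKeeper-style one-shot watches: a watcher fires once
# on the next change and must be re-registered to see further changes.
class Znode:
    def __init__(self, data: bytes = b""):
        self.data = data
        self._watchers = []  # one-shot callbacks

    def watch(self, callback):
        self._watchers.append(callback)

    def set_data(self, data: bytes):
        self.data = data
        fired, self._watchers = self._watchers, []  # clear before firing
        for cb in fired:
            cb("NodeDataChanged", self)

events = []
node = Znode(b"v1")
node.watch(lambda ev, n: events.append((ev, n.data)))
node.set_data(b"v2")   # watcher fires once
node.set_data(b"v3")   # no watcher registered, no event
print(events)          # [('NodeDataChanged', b'v2')]
```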

5. What can ZooKeeper do?

1. Naming service 2. Configuration management 3. Cluster management 4. Distributed locks 5. Queue management

6. The ZooKeeper Naming Service

Creating a directory in the ZooKeeper file system gives it a unique path. When we cannot determine in advance which machines an upstream program will be deployed on (for example, when using tborg), the upstream and downstream programs can simply agree on a path and discover each other through it.

7. ZooKeeper Configuration Management

Programs always need configuration, and when a program is distributed across many machines, changing the configuration machine by machine becomes painful. Instead, put all of this configuration into ZooKeeper, stored in a znode, and have every application concerned watch that znode. Whenever the configuration changes, each application receives a notification from ZooKeeper and fetches the new configuration from it to apply to itself.


8. ZooKeeper Cluster Management

So-called cluster management comes down to two things: whether machines leave or join, and master election.

For the first point, all machines agree to create ephemeral nodes under a parent directory, say GroupMembers, and to watch the parent for child-node change events. When a machine dies, its connection to ZooKeeper is lost and the ephemeral node it created is deleted, so every other machine is notified that a sibling node is gone; everyone then knows that machine has left.

A new machine joining works the same way: every machine is notified that a new sibling node has appeared, and the head count is updated. For the second point, we change things slightly: all machines create ephemeral sequential nodes, and each time the machine whose node has the smallest sequence number is chosen as master.
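The "smallest sequence number wins" rule can be sketched like this (a simulation with made-up node names, not the real server protocol):

```python
# Simulate master election over ephemeral sequential nodes: each live
# machine owns one node, and the node with the smallest sequence number
# marks the current master.
def elect_master(children: list[str]) -> str:
    # children are names like "member-0000000007"; the numeric suffix
    # encodes creation order, so min() by that suffix picks the oldest
    return min(children, key=lambda name: int(name.rsplit("-", 1)[1]))

members = ["member-0000000003", "member-0000000001", "member-0000000002"]
print(elect_master(members))   # member-0000000001

# If the master dies, its ephemeral node disappears and everyone is
# notified; re-evaluating makes the next-smallest survivor the master.
members.remove("member-0000000001")
print(elect_master(members))   # member-0000000002
```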


9. ZooKeeper Distributed Locks

With ZooKeeper's consistent file system, the locking problem becomes easy. Lock services fall into two categories: one maintains exclusivity, the other controls ordering.

For the first category, we treat a znode on ZooKeeper as the lock, implemented by creating the znode. Every client tries to create the /distribute_lock node, and the client whose create succeeds holds the lock. The lock is released by deleting the /distribute_lock node that was created.

For the second category, /distribute_lock exists in advance, and every client creates an ephemeral sequential node under it. As with master election, the client with the smallest sequence number obtains the lock, and deletes its node when finished, passing the lock on in turn.


10. ZooKeeper Queue Management

There are two kinds of queues:

1. A synchronization queue: the queue becomes usable only when all of its members have gathered; otherwise it keeps waiting for all members to arrive.

2. A queue with FIFO enqueue and dequeue operations.

For the first kind, create an ephemeral node under an agreed directory and watch whether the number of child nodes has reached the number we require.

The second kind works on the same basic principle as the ordering scenario in the distributed lock service: take a number on enqueue, and dequeue by the smallest number.
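The synchronization queue is essentially a barrier: members register under an agreed directory, and everyone waits until the child count reaches the required size. A toy version with hypothetical names:

```python
# Toy synchronization queue (barrier): the queue "opens" only once the
# required number of members have created their nodes under the parent.
class SyncQueue:
    def __init__(self, required: int):
        self.required = required
        self.members = []  # child znodes of the agreed directory

    def join(self, member: str) -> bool:
        self.members.append(member)
        return self.ready()  # True once everyone has arrived

    def ready(self) -> bool:
        return len(self.members) >= self.required

barrier = SyncQueue(required=3)
print(barrier.join("worker-1"))   # False, still waiting
print(barrier.join("worker-2"))   # False
print(barrier.join("worker-3"))   # True, the queue is now usable
```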

11. Distribution and Data Replication

ZooKeeper provides a consistent data service as a cluster, so naturally it replicates data across all of its machines. The benefits of data replication are:

1. Fault tolerance: if one node fails, the whole system does not stop working; other nodes can take over its work.

2. Scalability: distribute the load across multiple nodes, or add nodes, to raise the system's load capacity.

3. Performance: let clients access a nearby node locally, improving access speed for users.

From the viewpoint of how transparent read and write access is to the client, data-replicated cluster systems fall into two types:

1. Write-master (WriteMaster): changes to data are submitted to a designated node. Reads have no such restriction and can go to any node. In this case the client must distinguish reads from writes, commonly known as read/write splitting.

2. Write-any: changes to data can be submitted to any node, just like reads. In this case the roles and membership changes of cluster nodes are transparent to the client.

For ZooKeeper, the approach is write-any. By adding machines, its read throughput and responsiveness scale very well, while for writes, throughput necessarily drops as machines are added (which is why observers were introduced); write responsiveness depends on the implementation: whether replication is delayed for eventual consistency, or performed immediately for a fast response.

12. ZooKeeper Roles

(figure omitted)

13. ZooKeeper and Clients

(figure omitted)

14. ZooKeeper Design Goals

1. Eventual consistency: no matter which server a client connects to, it is shown the same view. This is ZooKeeper's most important guarantee.

2. Reliability: simple, robust, and performant; if a message is accepted by one server, it will be accepted by all servers.

3. Timeliness: ZooKeeper guarantees that clients obtain server updates, or server-failure information, within a bounded time interval. However, because of network delays and other factors, ZooKeeper cannot guarantee that two clients see a newly updated value at the same moment; a client that needs the latest data should call sync() before reading it.

4. Wait-free: slow or failed clients must not interfere with the requests of fast clients, so that every client can make progress effectively.

5. Atomicity: an update either succeeds or fails; there is no intermediate state.

6. Ordering: this includes global order and partial order. Global order means that if message A is published before message B on one server, then A is published before B on every server. Partial order means that if message B is published by the same sender after message A, then A must precede B.

15. How ZooKeeper Works

The core of ZooKeeper is atomic broadcast, the mechanism that keeps the servers in sync. The protocol implementing this mechanism is called the Zab protocol. Zab has two modes: recovery mode (leader election) and broadcast mode (synchronization). When the service starts, or after the leader crashes, Zab enters recovery mode; recovery mode ends once a leader has been elected and a majority of servers have synchronized their state with the leader. State synchronization ensures that the leader and the servers hold the same system state.

To guarantee sequential consistency of transactions, ZooKeeper uses a monotonically increasing transaction id (zxid) to identify transactions. Every proposal carries a zxid when it is issued. The zxid is a 64-bit number: the high 32 bits are the epoch, which identifies changes of leadership: each time a new leader is elected, a new epoch begins, marking that leader's reign. The low 32 bits are an incrementing counter.
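The epoch/counter split can be checked with a few lines of bit arithmetic (the helper names are illustrative):

```python
# A zxid is 64 bits: the high 32 bits are the leader epoch, the low
# 32 bits an incrementing counter within that epoch.
def make_zxid(epoch: int, counter: int) -> int:
    return (epoch << 32) | (counter & 0xFFFFFFFF)

def epoch_of(zxid: int) -> int:
    return zxid >> 32

def counter_of(zxid: int) -> int:
    return zxid & 0xFFFFFFFF

z = make_zxid(epoch=2, counter=5)
print(hex(z))        # 0x200000005
print(epoch_of(z))   # 2
print(counter_of(z)) # 5

# A new leader bumps the epoch, so its first zxid orders after every
# zxid from the previous reign:
assert make_zxid(3, 0) > make_zxid(2, 0xFFFFFFFF)
```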

16. Server States in ZooKeeper

Each server is in one of three states while working:

LOOKING: the server does not know who the leader is and is searching for one.
LEADING: the server is the elected leader.
FOLLOWING: a leader has been elected and the server is synchronized with it.

17. ZooKeeper Leader Election (Basic Paxos)

When the leader crashes, or the leader loses a majority of its followers, ZK enters recovery mode, which must elect a new leader so that all servers return to a correct state. ZK has two election algorithms: one based on basic Paxos and one based on fast Paxos. The system's default election algorithm is fast Paxos.

1. An election thread is started by the current server; its job is to tally the votes and select the recommended server.

2. The election thread first sends an inquiry to all servers (including itself).

3. When the election thread receives a reply, it verifies that the reply belongs to its own inquiry (by checking that the zxid matches), obtains the replier's id (myid) and stores it in the current inquiry list, and finally obtains the leader information (id, zxid) proposed by the replier and stores it in the election's vote record table.

4. After receiving replies from all servers, it computes the server with the largest zxid and sets that server's information as the one to vote for next.

5. The thread sets the server with the largest zxid as the leader it currently recommends. If that server obtains n/2 + 1 of the servers' votes, it is set as the elected leader and each server updates its own state based on the winner's information; otherwise the process repeats until a leader is elected. From this flow we can conclude that for a leader to win the support of a majority, the total number of servers must be an odd number 2n+1, and the number of surviving servers must be at least n+1. This process is repeated after every server startup. In recovery mode, a server that has just recovered from a crash or just started restores its data and session information from a disk snapshot; ZK records a transaction log and takes periodic snapshots to make state recovery easier. The detailed flowchart of leader election is as follows:

(figure omitted)

18. ZooKeeper Leader Election (Fast Paxos)

In the fast Paxos flow, a server proposes to all servers that it become the leader. When the other servers receive the proposal, they resolve any conflicts between epoch and zxid, accept the proposal, and send the proposer a message that the acceptance is complete. Repeating this process eventually elects the leader.


19. The ZooKeeper Synchronization Process

After the leader is elected, ZK enters the state synchronization process:

1. The leader waits for servers to connect.

2. A follower connects to the leader and sends it its largest zxid.

3. The leader determines the synchronization point from the follower's zxid.

4. When synchronization completes, the leader notifies the follower that it is now in the UPTODATE state.

5. Once the follower receives the UPTODATE message, it can again accept client requests and serve them.


20. ZooKeeper Workflow: Leader

1. Recover data.

2. Maintain heartbeats with the learners, receive learner requests, and determine each learner request's message type.

3. Learner messages are mainly PING, REQUEST, ACK, and REVALIDATE messages, each handled differently:

A PING message is a learner heartbeat.

A REQUEST message carries a proposal sent by a follower, including write requests and synchronization requests.

An ACK message is a follower's reply to a proposal; once more than half of the followers have acknowledged it, the proposal is committed.

A REVALIDATE message is used to extend a session's effective time.


21. ZooKeeper Workflow: Follower

A follower has four main functions:

1. Send requests to the leader (PING, REQUEST, ACK, and REVALIDATE messages).

2. Receive messages from the leader and process them.

3. Receive client requests; if a request is a write, forward it to the leader to be voted on.

4. Return results to the client.

The follower's message loop handles the following messages from the leader:

1. PING: heartbeat message.

2. PROPOSAL: a proposal initiated by the leader, asking the follower to vote on it.

3. COMMIT: information about the latest committed proposal on the server side.

4. UPTODATE: indicates that synchronization is complete.

5. REVALIDATE: depending on the leader's revalidation result, close the session being revalidated or allow it to accept messages again.

6. SYNC: return the sync result to the client; this message is originally initiated by the client to force the latest update.


Well, that is my understanding of ZooKeeper. I will keep updating with new technology, so stay tuned!


