ZooKeeper functions and working principles

Source: Internet
Author: User

1.What ZooKeeper is

ZooKeeper is a distributed, open-source coordination service for distributed applications and an open-source implementation of Google's Chubby. It acts as the manager of a cluster: it monitors the state of each node and, based on the feedback the nodes submit, carries out the next reasonable operation. In the end it delivers an easy-to-use interface and a high-performance, functionally stable system to the user.

2.What ZooKeeper offers

1) File system

2) Notification mechanism

3.Zookeeper File System

Each directory entry, such as Nameservice, is called a znode. As with a file system, we are free to add and remove znodes, and to add and remove child znodes under a znode; the only difference is that a znode can also store data.
There are four types of znode:
1. PERSISTENT: persistent directory node
The node still exists after the client disconnects from ZooKeeper.
2. PERSISTENT_SEQUENTIAL: persistent, sequentially numbered directory node
The node still exists after the client disconnects from ZooKeeper, and ZooKeeper appends a sequence number to the node name.
3. EPHEMERAL: temporary directory node
The node is deleted after the client disconnects from ZooKeeper (that is, when its session ends).
4. EPHEMERAL_SEQUENTIAL: temporary, sequentially numbered directory node
The node is deleted after the client disconnects from ZooKeeper, and ZooKeeper appends a sequence number to the node name.
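
The four node types can be illustrated with a small in-memory model. This is only a sketch, not the ZooKeeper client API: the ZNodeTree and Session classes, the paths, and the create() signature are all made up for illustration. The 10-digit zero-padded suffix and the per-parent counter mirror how ZooKeeper names sequential nodes.

```python
class Session:
    """Hypothetical client session; closing it removes its ephemeral nodes."""
    def __init__(self, tree):
        self.tree = tree
        self.owned = []  # paths of ephemeral nodes created in this session

    def close(self):
        for path in self.owned:
            self.tree.nodes.pop(path, None)
        self.owned.clear()


class ZNodeTree:
    """Toy stand-in for the znode namespace (assumed, not the real server)."""
    def __init__(self):
        self.nodes = {"/": b""}
        self.counters = {}  # parent path -> next sequence number

    def create(self, session, path, data=b"", ephemeral=False, sequential=False):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError("parent does not exist: " + parent)
        if sequential:
            n = self.counters.get(parent, 0)
            self.counters[parent] = n + 1
            path = "%s%010d" % (path, n)  # append the sequence number
        if path in self.nodes:
            raise FileExistsError(path)
        self.nodes[path] = data
        if ephemeral:
            session.owned.append(path)
        return path


tree = ZNodeTree()
session = Session(tree)
tree.create(session, "/app")
persistent = tree.create(session, "/app/cfg", b"v1")
persistent_seq = tree.create(session, "/app/job-", sequential=True)
ephemeral = tree.create(session, "/app/live", ephemeral=True)
ephemeral_seq = tree.create(session, "/app/member-", ephemeral=True, sequential=True)

session.close()  # the client goes away: only persistent nodes survive
remaining = sorted(tree.nodes)
```

After session.close(), both ephemeral nodes are gone while /app/cfg and the numbered /app/job- node remain.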


4.Zookeeper notification mechanism

A client registers a watch on the directory nodes it cares about, and ZooKeeper notifies the client when such a node changes (its data changes, it is deleted, or child nodes are added or removed).
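
A minimal sketch of this register-then-notify pattern follows. The WatchedStore class and its method names are hypothetical, not the real client library. The sketch assumes the classic ZooKeeper behaviour that a watch fires once and must be registered again to receive later events.

```python
from collections import defaultdict


class WatchedStore:
    """Toy key-value store with one-shot watches (illustrative only)."""
    def __init__(self):
        self.data = {}
        self.watches = defaultdict(list)  # path -> callbacks waiting for one event

    def get(self, path, watch=None):
        if watch is not None:
            self.watches[path].append(watch)
        return self.data.get(path)

    def _fire(self, path, event):
        # Take the registered callbacks and clear them: each watch fires once.
        callbacks, self.watches[path] = self.watches[path], []
        for callback in callbacks:
            callback(event, path)

    def set(self, path, value):
        self.data[path] = value
        self._fire(path, "changed")

    def delete(self, path):
        self.data.pop(path, None)
        self._fire(path, "deleted")


events = []
store = WatchedStore()
store.set("/config", "v1")
store.get("/config", watch=lambda event, path: events.append((event, path)))
store.set("/config", "v2")  # fires the watch
store.set("/config", "v3")  # the watch was not re-registered, so nothing fires
```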

5.What ZooKeeper can do

1. Naming service
2. Configuration management
3. Cluster management
4. Distributed lock
5. Queue management

6.Zookeeper Naming Service

A naming service obtains the address of a resource or service, along with information about its provider, from a given name. With ZooKeeper it is easy to create a globally unique path, and this path can be used as a name that points to a cluster, the address of a provided service, a remote object, and so on. In short, using ZooKeeper as a naming service means using a path as the name, with the data stored at that path being the entity the name refers to.

Dubbo, Alibaba Group's open-source distributed service framework, uses ZooKeeper as its naming service to maintain a global list of service addresses. In the Dubbo implementation:

When a service provider starts, it writes its URL address under the /dubbo/${servicename}/providers directory on ZK, which completes the publication of the service.

When a service consumer starts, it subscribes to the provider URL addresses under the /dubbo/${servicename}/providers directory and writes its own URL address under the /dubbo/${servicename}/consumers directory.
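
The publish and subscribe steps can be sketched with a plain dictionary standing in for the ZooKeeper directories. The function names, the service name, and the URLs below are invented for illustration and are not Dubbo's actual code; only the two directory layouts come from the text.

```python
def providers_path(service):
    return "/dubbo/%s/providers" % service


def consumers_path(service):
    return "/dubbo/%s/consumers" % service


# directory path -> set of URL strings (stands in for child znodes)
registry = {}


def register_provider(service, url):
    """Provider start-up: publish its URL under the providers directory."""
    registry.setdefault(providers_path(service), set()).add(url)


def subscribe(service, consumer_url):
    """Consumer start-up: record itself and read the current provider list."""
    registry.setdefault(consumers_path(service), set()).add(consumer_url)
    return sorted(registry.get(providers_path(service), set()))


register_provider("com.example.DemoService", "dubbo://10.0.0.1:20880")
register_provider("com.example.DemoService", "dubbo://10.0.0.2:20880")
found = subscribe("com.example.DemoService", "consumer://10.0.0.9")
```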

7.Zookeeper Configuration Management

Programs always need configuration, and when a program is deployed across many machines, changing the configuration machine by machine becomes difficult. Instead, put all of the configuration on ZooKeeper, stored in one directory node, and have every relevant application watch that node. Once the configuration changes, each application receives a notification from ZooKeeper, fetches the new configuration from ZooKeeper, and applies it to the system.
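
As a rough sketch of that flow, the following toy model lets several applications listen on one configuration node and reload when it changes. ConfigNode, App, and the timeout_ms setting are assumptions made for the example; a real deployment would use a ZooKeeper client and its watch mechanism instead.

```python
class ConfigNode:
    """Toy config node: holds data and notifies every listener on change."""
    def __init__(self, data):
        self.data = data
        self.listeners = []

    def update(self, data):
        self.data = data
        for notify in list(self.listeners):
            notify()


class App:
    def __init__(self, name, node):
        self.name = name
        self.node = node
        self.config = node.data            # initial read at start-up
        node.listeners.append(self.reload)

    def reload(self):
        # On notification, fetch the new configuration and apply it.
        self.config = self.node.data


node = ConfigNode({"timeout_ms": 3000})
apps = [App("web-1", node), App("web-2", node)]
node.update({"timeout_ms": 5000})
```

A single update to the node reaches every application without touching the machines one by one.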


8.Zookeeper Cluster Management

Cluster management comes down to two things: knowing whether machines leave or join, and electing a master.
For the first point, all machines agree to create temporary directory nodes under the parent directory groupmembers, and then watch the parent node for child-change messages. Once a machine goes down, its connection to ZooKeeper is broken, the temporary node it created is deleted, and all other machines are notified that a sibling node has been removed, so everyone knows that machine is gone.
A new machine joining works the same way: all machines are notified that a new sibling node has appeared, and the head count goes up again. For the second point, we change the scheme slightly: all machines create temporary sequentially numbered nodes, and each time the machine with the smallest number is chosen as master.
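
Both points can be modelled in a few lines. The sketch below is an in-memory approximation: the Group class and host names are hypothetical, and leave() stands in for ZooKeeper deleting a machine's temporary node when its session ends.

```python
class Group:
    """Toy groupmembers directory whose children are numbered temporary nodes."""
    def __init__(self):
        self.next_seq = 0
        self.members = {}    # sequence number -> machine name
        self.listeners = []  # callbacks invoked with the current member list

    def _notify(self):
        snapshot = [self.members[seq] for seq in sorted(self.members)]
        for callback in self.listeners:
            callback(snapshot)

    def join(self, machine):
        seq = self.next_seq
        self.next_seq += 1
        self.members[seq] = machine
        self._notify()
        return seq

    def leave(self, seq):
        # Models the temporary node vanishing when the machine goes down.
        del self.members[seq]
        self._notify()

    def master(self):
        # The machine holding the smallest sequence number is the master.
        return self.members[min(self.members)] if self.members else None


group = Group()
seen = []
group.listeners.append(seen.append)

a = group.join("host-a")
b = group.join("host-b")
c = group.join("host-c")
first_master = group.master()

group.leave(a)  # host-a crashes
second_master = group.master()
```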


9.Zookeeper Distributed Lock

With ZooKeeper's consistent file system, the lock problem becomes easier. Lock services fall into two categories: one maintains exclusivity, the other controls ordering.
For the first category, we treat a znode on ZooKeeper as a lock, implemented by creating a znode. All clients try to create the /distribute_lock node, and the client that succeeds holds the lock. Deleting the /distribute_lock node it created releases the lock.
For the second category, /distribute_lock already exists, and all clients create temporary sequentially numbered nodes under it. As with master election, the client with the smallest number gets the lock, deletes its node when it is done, and the lock passes to the next client in turn.
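
The two lock styles can be sketched as follows. These classes are illustrative in-memory models with invented names, not a production lock recipe: ExclusiveLock mimics "whoever creates the node wins", and OrderedLock mimics numbered child nodes under /distribute_lock.

```python
class ExclusiveLock:
    """First category: only one client can create the lock node."""
    def __init__(self):
        self.owner = None

    def try_acquire(self, client):
        if self.owner is None:
            self.owner = client   # node creation succeeded
            return True
        return False              # node already exists

    def release(self, client):
        if self.owner == client:
            self.owner = None     # delete the node


class OrderedLock:
    """Second category: numbered children, smallest number holds the lock."""
    def __init__(self):
        self.next_seq = 0
        self.waiters = {}         # sequence number -> client name

    def enqueue(self, client):
        seq = self.next_seq
        self.next_seq += 1
        self.waiters[seq] = client
        return seq

    def holder(self):
        return self.waiters[min(self.waiters)] if self.waiters else None

    def release(self, seq):
        self.waiters.pop(seq, None)  # delete the child node


exclusive = ExclusiveLock()
got_a = exclusive.try_acquire("client-a")
got_b = exclusive.try_acquire("client-b")

ordered = OrderedLock()
seq_a = ordered.enqueue("client-a")
seq_b = ordered.enqueue("client-b")
holder_before = ordered.holder()
ordered.release(seq_a)
holder_after = ordered.holder()
```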


10.Zookeeper Queue Management

Two types of queues:
1. Synchronization queue: the queue becomes usable only when all of its members have gathered; until then it waits for every member to arrive.
2. FIFO queue: items are enqueued and dequeued in first-in, first-out order.
For the first type, create temporary directory nodes under an agreed directory, and watch whether the number of nodes has reached the number we require.
The second type follows the same principle as the ordering-control scenario of the distributed lock service: items are numbered when enqueued and dequeued in number order.
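
Both queue types reduce to simple bookkeeping, sketched here in memory. The class names and the required member count are assumptions for the example; in ZooKeeper the members and items would be child nodes of an agreed directory.

```python
class SyncQueue:
    """Synchronization queue: usable only once enough members have arrived."""
    def __init__(self, required):
        self.required = required
        self.present = set()

    def arrive(self, member):
        self.present.add(member)   # the member creates its temporary node
        return self.ready()

    def ready(self):
        return len(self.present) >= self.required


class FifoQueue:
    """FIFO queue: number items on the way in, take the smallest number out."""
    def __init__(self):
        self.next_seq = 0
        self.items = {}

    def put(self, item):
        self.items[self.next_seq] = item
        self.next_seq += 1

    def take(self):
        seq = min(self.items)
        return self.items.pop(seq)


sync_queue = SyncQueue(required=3)
states = [sync_queue.arrive(m) for m in ("w1", "w2", "w3")]

fifo = FifoQueue()
for job in ("first", "second", "third"):
    fifo.put(job)
taken = [fifo.take(), fifo.take()]
```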

11. Distributed and Data replication

ZooKeeper provides consistent data services as a cluster, so naturally it replicates data across all of its machines. Benefits of data replication:
1. Fault tolerance: an error on one node does not stop the whole system, because other nodes can take over its work;
2. Scalability: the load is distributed across multiple nodes, and adding nodes increases the system's load capacity;
3. Performance: clients access the nearest node locally, which improves access speed.

From the point of view of how transparent reads and writes are to the client, replicated cluster systems fall into two types:
1. Write master: modifications to the data are submitted to a designated node. Reads have no such restriction and can go to any node. In this case the client needs to distinguish between reads and writes, which is commonly known as read/write separation;
2. Write any: modifications to the data can be submitted to any node, just like reads. In this case the roles of the cluster nodes, and changes in them, are transparent to the client.

ZooKeeper uses the write-any approach. As machines are added, its read throughput and read responsiveness scale very well, while write throughput inevitably drops as machines are added (which is also why it introduced observers). Responsiveness depends on the implementation: whether replication is deferred to maintain eventual consistency, or performed immediately for a fast response.

12.Zookeeper Role Description

Role: Description
Leader: Provides read and write services to clients; responsible for initiating and resolving votes and for updating system state.
Follower: Provides read services to clients and forwards write requests to the leader. Takes part in voting during elections.
Observer: Provides read services to clients and forwards write requests to the leader. Does not vote in elections and does not take part in the "more than half written successfully" strategy. Improves the read performance of the cluster without affecting write performance. This role was newly added in the ZooKeeper 3.3 series.
Client: The initiator of requests, which connects to the ZooKeeper servers. A role independent of the ZooKeeper server cluster.
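
To make the observer's position concrete, here is a small sketch of a majority check in which only voting servers count. The function and server names are hypothetical, and treating the leader as one of the voters is an assumption of this sketch rather than something the table states.

```python
def write_succeeds(acks, voters):
    """True once more than half of the voting servers acknowledged a write."""
    counted = set(acks) & set(voters)  # acknowledgements from observers are ignored
    return len(counted) > len(voters) // 2


voters = {"leader", "follower-1", "follower-2"}
observers = {"observer-1", "observer-2"}

# Only the leader plus the observers respond: not a majority of voters.
only_observers_agree = write_succeeds({"leader"} | observers, voters)

# The leader and one follower respond: 2 of 3 voters is a majority.
majority_agrees = write_succeeds({"leader", "follower-1"}, voters)
```

Because observers never enter the count, adding them raises read capacity without enlarging the majority a write has to reach.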

13.Zookeeper and Client


14.Zookeeper Design Purpose

1. Eventual consistency: whichever server a client connects to, it is shown the same view. This is ZooKeeper's most important property.
2. Reliability: simple, robust, and well performing. If a message is accepted by one server, it will be accepted by all servers.
3. Timeliness: ZooKeeper ensures that a client obtains server updates, or learns of server failure, within a bounded time interval. However, because of network delay and other factors, ZooKeeper cannot guarantee that two clients see freshly updated data at the same moment. If you need the latest data, call the sync() interface before reading.
4. Wait-free: slow or failed clients must not interfere with the requests of fast clients, so that no client is held up by another.
5. Atomicity: an update either succeeds or fails; there is no intermediate state.
6. Order: this includes global order and partial order. Global order means that if message a is published before message b on one server, then a is published before b on all servers. Partial order means that if message b is published after message a by the same sender, then a is ordered before b.

15.Zookeeper Working principle

The core of ZooKeeper is atomic broadcast, a mechanism that keeps the servers synchronized with one another. The protocol that implements this mechanism is called the Zab protocol. Zab has two modes: recovery mode (leader election) and broadcast mode (synchronization). When the service starts or after the leader crashes, Zab enters recovery mode; recovery mode ends once a leader has been elected and a majority of servers have finished synchronizing their state with the leader. State synchronization guarantees that the leader and the other servers have the same system state.
To guarantee the sequential consistency of transactions, ZooKeeper identifies each transaction with an incrementing transaction ID (zxid). Every proposal is stamped with a zxid when it is issued. In the implementation, a zxid is a 64-bit number. Its high 32 bits are the epoch, which identifies whether the leader has changed: each time a leader is elected it gets a new epoch, marking that leader's term. The low 32 bits are an incrementing counter.
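
The 64-bit layout can be shown with two helper functions. The function names are made up for this sketch; the bit layout (epoch in the high 32 bits, counter in the low 32 bits) is the one described above.

```python
def make_zxid(epoch, counter):
    """Pack an epoch and a counter into one 64-bit transaction ID."""
    return (epoch << 32) | (counter & 0xFFFFFFFF)


def split_zxid(zxid):
    """Return (epoch, counter) for a packed transaction ID."""
    return zxid >> 32, zxid & 0xFFFFFFFF


last_of_epoch_1 = make_zxid(1, 5)
first_of_epoch_2 = make_zxid(2, 0)
epoch, counter = split_zxid(last_of_epoch_1)
```

Because the epoch occupies the high bits, any transaction from a newer leader compares greater than every transaction from an older one, even though its counter starts again from zero.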

16.Zookeeper Server Working States

Each server is in one of three states while working:
LOOKING: the current server does not know who the leader is and is searching for one
LEADING: the current server is the elected leader
FOLLOWING: a leader has been elected and the current server is synchronized with it

17.Zookeeper Leader Election Flow (Basic Paxos)

When the leader crashes or loses a majority of its followers, ZK enters recovery mode, in which a new leader must be elected so that all servers return to a correct state. ZK has two election algorithms: one based on a basic Paxos implementation, the other based on the fast Paxos algorithm. The system's default election algorithm is fast Paxos.

1. The election thread is the thread on the current server that initiates the election; its main job is to tally the votes and select the recommended server;
2. The election thread first sends an inquiry to all servers (including itself);
3. When the election thread receives a reply, it verifies that the reply belongs to the inquiry it initiated (by checking that the zxid matches), then obtains the other party's ID (myid) and stores it in the list of currently queried servers, and finally obtains the leader information (id, zxid) proposed by the other party and stores it in the vote record table for this election;
4. After receiving replies from all servers, it determines the server with the largest zxid and sets that server's information as the server to vote for next;
5. The thread sets the server with the largest zxid as the leader recommended by the current server. If the winning server has obtained n/2 + 1 of the server votes, the currently recommended leader becomes the winner and the server sets its own state according to the winner's information; otherwise, the process continues until a leader is elected.

From this analysis we can conclude that, for the leader to obtain the support of a majority of servers, the total number of servers should be odd (2n+1) and the number of surviving servers must be no fewer than n+1. Each server repeats the above process after it starts. In recovery mode, a server that has just recovered from a crash or has just started also restores its data and session information from disk snapshots; ZK records a transaction log and takes periodic snapshots so that state can be restored during recovery. The original article illustrates this election flow with a flowchart.
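
Steps 4 and 5 and the majority arithmetic can be sketched as below. The function names are invented, and breaking ties between equal zxids by the larger server ID is an assumption of this sketch; the text itself only says to pick the largest zxid.

```python
def pick_candidate(replies):
    """replies maps server id -> last zxid; prefer the largest zxid.

    Ties are broken by the larger server id (an assumption of this sketch).
    """
    return max(replies, key=lambda sid: (replies[sid], sid))


def has_majority(votes_for_winner, cluster_size):
    """The winner needs at least n/2 + 1 votes (integer division)."""
    return votes_for_winner >= cluster_size // 2 + 1


def tolerated_failures(cluster_size):
    """How many servers may fail while a majority can still be formed."""
    return cluster_size - (cluster_size // 2 + 1)


candidate = pick_candidate({1: 100, 2: 300, 3: 300})
enough = has_majority(3, 5)
not_enough = has_majority(2, 5)
failures_5 = tolerated_failures(5)
failures_6 = tolerated_failures(6)
```

A cluster of 6 tolerates no more failures than a cluster of 5, which is one way to see why an odd size of 2n+1 is recommended.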


18.Zookeeper Leader Election Flow (Fast Paxos)

In the fast Paxos flow, a server first proposes to all servers that it become the leader. When the other servers receive the proposal, they resolve any epoch and zxid conflicts, accept the proposal, and send a message back to the proposer saying that the proposal has been accepted. Repeating this process eventually elects a leader.


19.Zookeeper Synchronization Process

After the leader is elected, ZK enters the state synchronization process.
1. The leader waits for servers to connect;
2. Each follower connects to the leader and sends it its largest zxid;
3. The leader determines the synchronization point according to the follower's zxid;
4. After synchronization completes, the leader notifies the follower that it is now in the UPTODATE state;
5. Once the follower receives the UPTODATE message, it can again accept client requests and serve them.
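
A simplified sketch of steps 2 to 5: the follower reports its last zxid, the leader sends everything newer, and the follower marks itself up to date. The Follower class, the log format, and the operation strings are assumptions; this sketch does not cover a follower whose log is ahead of the leader's.

```python
def entries_to_send(leader_log, follower_last_zxid):
    """Return the leader's transactions that the follower has not seen yet."""
    return [(zxid, op) for zxid, op in leader_log if zxid > follower_last_zxid]


class Follower:
    def __init__(self, log):
        self.log = list(log)       # list of (zxid, operation) pairs
        self.up_to_date = False

    def last_zxid(self):
        return self.log[-1][0] if self.log else 0

    def sync_from(self, leader_log):
        self.log.extend(entries_to_send(leader_log, self.last_zxid()))
        self.up_to_date = True     # stands in for receiving the UPTODATE message


leader_log = [(1, "create /a"), (2, "set /a 1"), (3, "create /b")]
follower = Follower(leader_log[:1])          # the follower only has zxid 1
missing = entries_to_send(leader_log, follower.last_zxid())
follower.sync_from(leader_log)
```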


20.Zookeeper Work Flow-leader

1. Recover data;
2. Maintain heartbeats with learners, receive learner requests, and determine the message type of each request;
3. Learner messages are mainly PING, REQUEST, ACK, and REVALIDATE messages, each handled differently according to its type.
A PING message carries the learner's heartbeat;
A REQUEST message is a proposal sent by a follower, including write requests and synchronization requests;
An ACK message is a follower's reply to a proposal; once more than half of the followers have approved, the proposal is committed;
A REVALIDATE message is used to extend a session's validity period.
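
The per-type handling can be sketched as a small dispatcher. Only the four message names come from the text; the Leader class, its fields, the session timeout, and the zxid assignment are assumptions made to keep the example self-contained.

```python
class Leader:
    """Toy leader message loop (illustrative, not ZooKeeper's implementation)."""

    def __init__(self, followers, session_timeout=30):
        self.followers = set(followers)
        self.session_timeout = session_timeout
        self.last_ping = {}   # learner id -> time of last heartbeat
        self.pending = {}     # zxid -> set of followers that have ACKed
        self.committed = []   # zxids in commit order
        self.sessions = {}    # session id -> expiry time
        self.next_zxid = 1

    def handle(self, sender, kind, payload=None, now=0):
        if kind == "PING":
            self.last_ping[sender] = now
        elif kind == "REQUEST":
            # Turn the request into a proposal; the operation itself
            # (payload) is not applied in this sketch.
            zxid = self.next_zxid
            self.next_zxid += 1
            self.pending[zxid] = set()
            return zxid
        elif kind == "ACK":
            acked = self.pending.get(payload)
            if acked is not None and sender in self.followers:
                acked.add(sender)
                # Commit once more than half of the followers approved.
                if len(acked) > len(self.followers) // 2:
                    self.committed.append(payload)
                    del self.pending[payload]
        elif kind == "REVALIDATE":
            self.sessions[payload] = now + self.session_timeout
        else:
            raise ValueError("unknown message type: " + kind)
        return None


leader = Leader(["f1", "f2", "f3"])
zxid = leader.handle("f1", "REQUEST", "set /a 1")
leader.handle("f1", "ACK", zxid)
after_one_ack = list(leader.committed)
leader.handle("f2", "ACK", zxid)
after_two_acks = list(leader.committed)
leader.handle("f3", "ACK", zxid)             # late ACK for a committed proposal
leader.handle("f1", "REVALIDATE", "session-1", now=10)
```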


21.Zookeeper Work Flow-follower

A follower has four main functions:
1. Send requests to the leader (PING, REQUEST, ACK, and REVALIDATE messages);
2. Receive messages from the leader and process them;
3. Receive client requests and, if a request is a write, send it to the leader to be voted on;
4. Return results to the client.

The follower's message loop handles the following messages from the leader:
1. PING message: heartbeat message;
2. PROPOSAL message: the leader issues a proposal and asks the followers to vote;
3. COMMIT message: information about the latest proposal committed on the server side;
4. UPTODATE message: indicates that synchronization is complete;
5. REVALIDATE message: depending on the leader's revalidate result, either close the session being revalidated or allow it to accept messages;
6. SYNC message: returns the sync result to the client; this message is originally initiated by the client to force the latest updates.


Well, the above is my understanding of ZooKeeper. I will keep writing about new technologies for everyone, so please look forward to it ...

Original http://blog.csdn.net/xqb_756148978/article/details/52259381
