Basic concepts and important characteristics of ZooKeeper


Contents

    • 1. What is ZooKeeper
    • 2. ZooKeeper cluster roles
    • 3. ZooKeeper's data model
      • 3.1 Types of znodes
      • 3.2 Structure of a znode
    • 4. ZooKeeper's event notification mechanism
    • 5. How ZooKeeper ensures distributed data consistency: the ZAB protocol
      • 5.1 Processing flow for transaction requests
      • 5.2 Leader server election process
    • 6. How ZooKeeper tolerates server failures
    • 7. References
1. What is ZooKeeper

ZooKeeper is an important component of the big data ecosystem; if you have done any related development you have almost certainly come across it. It was open-sourced by Yahoo and has since become a top-level Apache project. Defined in one sentence, it is a high-throughput distributed coordination service. From this definition we can already see that ZooKeeper has at least the following characteristics:

    • 1. The primary role of ZooKeeper is to provide coordination services for distributed systems, including but not limited to: distributed locks, unified naming services, configuration management, load balancing, master election, and primary/standby switchover.
    • 2. ZooKeeper is itself usually deployed in a distributed fashion. A ZooKeeper service normally consists of multiple server nodes, and as long as more than half of the nodes survive, ZooKeeper can keep serving clients, so high availability is built in. A client can connect to any server node over TCP and request service from the cluster; how the cluster communicates internally and how it maintains distributed data consistency is transparent to the client, as shown in the figure below.
    • 3. ZooKeeper is designed for high throughput, so it performs very well under read-heavy, write-light workloads, as shown in the figure below.

The vertical axis is the number of client requests answered per second, and the horizontal axis is the percentage of read requests. Clearly, as the percentage of read requests grows, ZooKeeper's QPS grows with it.
The main reasons for ZooKeeper's high throughput are as follows:

1. Any server node in a ZooKeeper cluster can respond directly to a client's read request (write requests are handled differently, as discussed below), and read capacity can be scaled horizontally by adding nodes. This is the main reason for its high throughput.

2. ZooKeeper keeps the full data set in memory, so reads require no disk I/O and are much faster.

3. ZooKeeper relaxes the strong-consistency requirement for distributed data: the data is not guaranteed to be consistent in real time, and the replicas are allowed to converge within a time window (eventual consistency), which also improves throughput to some extent.

Write requests, i.e. transaction requests, are partly constrained by state synchronization between the server nodes. Simply adding server nodes to a ZooKeeper cluster therefore does not necessarily raise its throughput: more nodes help read throughput but lengthen the time needed to synchronize data across nodes, so the two must be balanced as appropriate.
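
As a point of reference, here is a minimal sketch of connecting to a cluster with the native ZooKeeper Java client and issuing a read. The host names, session timeout, and path are illustrative assumptions, not values from this article.

```java
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZkConnectDemo {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        // The connect string lists several cluster members; the client picks one
        // and fails over transparently if that server becomes unreachable.
        ZooKeeper zk = new ZooKeeper(
                "zk1:2181,zk2:2181,zk3:2181",   // hypothetical host names
                15000,                          // session timeout in milliseconds
                (WatchedEvent event) -> {
                    if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                        connected.countDown();
                    }
                });
        connected.await();
        // Any server in the cluster can answer this read directly.
        byte[] data = zk.getData("/app/config", false, null);  // assumes /app/config exists
        System.out.println(new String(data));
        zk.close();
    }
}
```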

2. ZooKeeper cluster roles

A ZooKeeper cluster has three types of server roles: Leader, Follower, and Observer.

    • Leader
      There is exactly one leader server during normal operation of the cluster; it is chosen by election. The leader handles all of the cluster's transaction requests and coordinates the other servers in the cluster.

    • Follower
      The main responsibilities of a follower are:
      • 1. Participating in leader election voting
      • 2. Voting on transaction request proposals
      • 3. Handling clients' non-transactional (read) requests and forwarding transactional (write) requests to the leader server
    • Observer
      An observer is a weakened follower. Like a follower, it can handle non-transactional read requests and forwards transaction requests to the leader server, but it does not participate in any form of voting, neither leader elections nor transaction proposal votes. This role was introduced mainly to increase the cluster's read throughput without affecting its transaction-processing capacity.

3. ZooKeeper's data model

ZooKeeper stores its data in memory; specifically, the znode is the smallest unit of data storage. Znodes are organized hierarchically into a tree, and the view exposed to clients resembles a Unix file system: the root znode corresponds to the root path of a Unix file system, and just as a Unix directory can contain subdirectories, znodes can be mounted under other znodes, resulting in the structure shown below.

Continuing the file-system analogy, a znode naturally combines the properties of a directory and a file: it can hold data like a file, and it can also have other znodes mounted under it like a directory. Since znodes come in different types, the latter is not true of every znode.

3.1 Types of znodes

By lifetime, znodes are divided into persistent nodes (PERSISTENT) and ephemeral nodes (EPHEMERAL). At creation time you can also ask the ZooKeeper server to append a sequence number to the node path, which records the order in which sibling nodes are created under the same parent.
Combining these options gives the following four znode types (a short creation sketch follows the list).

    • 1. Persistent node (PERSISTENT)
      The most common znode type. Once created, it remains on the server until a client explicitly deletes it. Child nodes can be created under persistent nodes.

    • 2. Persistent sequential node (PERSISTENT_SEQUENTIAL)
      Has the basic characteristics of a persistent node, plus a sequence number appended to the node path that records the order in which sibling nodes were created. The ZooKeeper server appends the number automatically; you only need to specify this node type when creating the znode.

    • 3. Ephemeral node (EPHEMERAL)
      The lifetime of an ephemeral node is tied to the client session: the node exists while the session is alive and is deleted automatically by the server when the session ends. Child nodes cannot be created under ephemeral nodes.

    • 4. Ephemeral sequential node (EPHEMERAL_SEQUENTIAL)
      Has the basic characteristics of an ephemeral node, plus a sequence number.
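
Below is a minimal sketch of creating the four node types with the native Java client's create call. The paths are illustrative, and the zk handle is assumed to be an already-connected ZooKeeper client (see the earlier sketch).

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeTypesDemo {
    // Assumes zk is an already-connected ZooKeeper handle.
    static void createExamples(ZooKeeper zk) throws Exception {
        // Persistent node: stays until explicitly deleted.
        zk.create("/app", "cfg".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // Persistent sequential node: the server appends a sequence number,
        // e.g. /app/task-0000000001; the actual path is returned.
        String seq = zk.create("/app/task-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);

        // Ephemeral node: removed automatically when this session ends.
        zk.create("/app/lock", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // Ephemeral sequential node: a common building block for distributed locks.
        zk.create("/app/lock-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        System.out.println("sequential node created at " + seq);
    }
}
```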

3.2 Structure of a znode

A znode consists of the data it stores plus state information. Fetching a znode with the get command returns output like the following:

The first line of the output is the znode's data; everything from cZxid onward is the znode's state information.
There are quite a few state fields; a few of the more important ones are described below:

    • cZxid
      The "created zxid", i.e. the ID of the transaction that created this znode.

    • mZxid
      The "modified zxid", i.e. the ID of the transaction that last updated this znode.

    • version
      The version number of the znode. A znode is created with version 0, and every update increments the version by 1, even when the value stored before and after the update is identical. Intuitively, the version is the number of times the znode has been updated. Because the version is part of the znode's state, the server can coordinate concurrent updates to the same znode by multiple clients. The mechanism is similar to CAS in Java: it is an optimistic locking strategy in which the version acts as the conflict detector. The client reads the znode's version, attaches it to its update request, and the server compares the submitted version with the znode's current version; the update is applied only if the two match (see the sketch after this list).
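
Below is a minimal sketch of this version-based optimistic update using the native Java client; the path /app/config and the integer payload are illustrative assumptions.

```java
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class OptimisticUpdateDemo {
    // Assumes zk is an already-connected ZooKeeper handle and /app/config holds an integer.
    static void increment(ZooKeeper zk) throws Exception {
        while (true) {
            Stat stat = new Stat();
            byte[] raw = zk.getData("/app/config", false, stat);   // read value + version
            int value = Integer.parseInt(new String(raw));
            try {
                // setData succeeds only if the znode's version still equals stat.getVersion();
                // otherwise another client updated it first and BadVersionException is thrown.
                zk.setData("/app/config", String.valueOf(value + 1).getBytes(), stat.getVersion());
                return;
            } catch (KeeperException.BadVersionException e) {
                // Conflict detected: loop and retry with the latest value and version.
            }
        }
    }
}
```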

4. ZooKeeper's event notification mechanism

ZooKeeper implements event notification through watchers. A client registers a watcher with the server to listen for certain events; once such an event occurs, the server sends a notification to the client. The main workflow is shown in the figure below.

Specifically, Watcher is the event-listener interface provided by the native ZooKeeper API. To listen for events, the user implements this interface and overrides its process(WatchedEvent event) method, which defines the callback logic run when the client receives an event notification from the server. Which server-side events can be watched? Notifications are grouped by connection state: SyncConnected, Disconnected, Expired, AuthFailed, and so on. The event types delivered under the SyncConnected state are the most commonly used:

    • None (-1)
      The client has successfully established a session with the server.

    • NodeCreated (1)
      The znode that the watcher is registered on has been created.

    • NodeDeleted (2)
      The znode that the watcher is registered on has been deleted.

    • NodeDataChanged (3)
      The data of the watched znode has changed. Note that this event fires even if the new data is identical to the old data; in other words, it is fair to say the real trigger condition is a change in the znode's version number, which increments on every update.

    • NodeChildrenChanged (4)
      The list of children of the watched znode has changed.

ZooKeeper's watcher mechanism has the following characteristics:

    • 1. When a watched event fires, the server sends the client a notification, but the notification does not contain the event's content. Take watching a znode's data as an example: when the data changes, the client receives a notification of type NodeDataChanged, but the new data itself is not in the notification; the client must fetch it explicitly after receiving the notification.

    • 2. A watcher is one-shot: once it has fired, it is removed. To keep listening you must re-register it after every notification. This design relieves pressure on the server but is somewhat unfriendly to developers; some open-source ZooKeeper clients (Apache Curator, for example) re-register for you and thereby offer persistent listening on an event. A sketch of a self-re-registering watcher follows this list.
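
Below is a minimal sketch of a data watcher written against the native Java API; because the watcher is one-shot and the notification carries no payload, the callback both re-reads the data and re-registers itself. The path and the zk handle are assumptions for illustration.

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class DataWatchDemo {
    // Assumes zk is an already-connected ZooKeeper handle and the znode at path exists.
    static void watchData(ZooKeeper zk, String path) throws Exception {
        Watcher watcher = new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getType() == Event.EventType.NodeDataChanged) {
                    try {
                        // The notification carries no payload, so fetch the new data ourselves,
                        // re-registering this (one-shot) watcher at the same time.
                        byte[] data = zk.getData(path, this, null);
                        System.out.println("new data: " + new String(data));
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }
        };
        // The initial read registers the watcher for the first time.
        zk.getData(path, watcher, null);
    }
}
```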

5. How ZooKeeper ensures distributed data consistency: the ZAB protocol

ZooKeeper uses the ZAB (ZooKeeper Atomic Broadcast) protocol to maintain distributed data consistency. ZAB is not a general-purpose distributed consensus algorithm; it is a crash-recoverable atomic message broadcast protocol designed specifically for ZooKeeper. ZAB has two basic modes: crash recovery and message broadcast. Crash-recovery mode is used to elect a new leader and synchronize data after cluster startup or after the leader crashes, while message-broadcast mode is used to process transaction requests. The two modes are described below.

5.1 Processing flow for transaction requests

The core of the ZAB protocol is its definition of how transaction requests are handled. The whole process can be summarized as follows:

    • 1. All transaction requests are handled by the cluster's leader server. The leader converts each transaction request into a proposal and assigns it a globally monotonically increasing unique ID, the transaction ID (zxid); proposals are then processed strictly in zxid order.
    • 2. The leader places the proposal into a queue it keeps for each follower (one dedicated queue per follower) and sends it to that follower in FIFO order.
    • 3. When a follower receives a proposal, it first writes it to local disk as a transaction log entry and then returns an ACK to the leader.
    • 4. Once the leader has received ACKs from more than half of the followers, it broadcasts a COMMIT message telling the followers to commit the proposal, and commits the proposal itself.

The whole process is shown in the figure below.
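
The following is a deliberately simplified sketch of the quorum rule behind step 4, written for illustration only; it is not ZooKeeper's actual server code.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Simplified illustration of the ZAB commit rule: a proposal may be committed
 * once ACKs from more than half of the voting servers have been collected.
 */
public class ProposalQuorumSketch {
    private final int clusterSize;                  // number of voting servers
    private final Set<Long> acks = new HashSet<>(); // server IDs that ACKed this proposal

    ProposalQuorumSketch(int clusterSize) {
        this.clusterSize = clusterSize;
    }

    /** Record an ACK from a server and report whether the proposal may be committed. */
    synchronized boolean ack(long serverId) {
        acks.add(serverId);
        return acks.size() > clusterSize / 2;       // strict majority
    }

    public static void main(String[] args) {
        ProposalQuorumSketch p = new ProposalQuorumSketch(5);
        p.ack(1);                                   // the leader's own implicit ACK
        p.ack(2);
        System.out.println(p.ack(3));               // true: 3 of 5 servers have ACKed
    }
}
```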

5.2 Leader server election process

When the cluster has no leader server, it elects one. This usually happens in two situations: 1. the cluster has just started, or 2. the cluster is running and the leader exits for some reason. Each server sends a message to all the other servers; the message can be pictured as a ballot, which carries two pieces of information: the server ID of the proposed leader (the number configured in the myid file) and that server's transaction ID. A transaction represents an operation that changes server state, so the larger a server's transaction ID (zxid), the newer its data. The whole process works as follows:

    • 1. Each server casts a ballot (SID, ZXID). In the first round every server votes for itself, so its first ballot contains its own server ID and its own latest transaction ID.
    • 2. Each server then receives the ballots of the others and regenerates its own ballot according to the following rule: compare the zxid in each received ballot with the zxid in its current ballot and keep the larger one; if the zxids are equal, keep the ballot with the larger server ID (SID). Each server then casts its regenerated ballot.

After several rounds of voting, a server that receives more than half of the votes is elected leader. From the analysis above, ZooKeeper's choice of leader is biased toward servers with larger zxids, i.e. the machines whose data is newest.

The whole process is shown in the figure below.
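
The following is a simplified sketch of the ballot-comparison rule described above (larger zxid wins, server ID breaks ties); it is only an illustration, not ZooKeeper's actual FastLeaderElection code.

```java
/** Simplified ballot for the election sketch: compare zxid first, then server ID. */
public class BallotSketch {
    final long sid;   // server ID of the proposed leader (from the myid file)
    final long zxid;  // that server's latest transaction ID

    BallotSketch(long sid, long zxid) {
        this.sid = sid;
        this.zxid = zxid;
    }

    /** Return the ballot that should win: larger zxid first, then larger sid. */
    static BallotSketch prefer(BallotSketch mine, BallotSketch received) {
        if (received.zxid != mine.zxid) {
            return received.zxid > mine.zxid ? received : mine;
        }
        return received.sid > mine.sid ? received : mine;
    }

    public static void main(String[] args) {
        BallotSketch mine = new BallotSketch(1, 100);
        BallotSketch other = new BallotSketch(3, 100);
        // Equal zxids, so the larger server ID (3) wins this comparison.
        System.out.println(BallotSketch.prefer(mine, other).sid);
    }
}
```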

6. How ZooKeeper tolerates server failures

ZooKeeper uses transaction logs and data snapshots to avoid losing data when a server fails.

    • The transaction log means that, before applying a transaction to its in-memory data, a server writes the operation to disk as a log entry; both the leader and the follower servers keep transaction logs.
    • A data snapshot is a periodic dump of the in-memory tree to disk. Note that the snapshot is "fuzzy": the in-memory data may still change while the snapshot is being written. Because ZooKeeper's transaction operations are idempotent, however, the data can be restored to its latest state by loading the snapshot into memory and then replaying the transaction log (see the sketch below).
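
Below is a purely conceptual sketch of this recovery idea (load the snapshot, then replay logged transactions newer than the snapshot); it is not ZooKeeper's actual recovery code, and the types involved are invented for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Conceptual sketch of recovery from a fuzzy snapshot plus a transaction log.
 * Because applying a transaction is idempotent, replaying a transaction that was
 * already reflected in the fuzzy snapshot is harmless.
 */
public class RecoverySketch {
    record Txn(long zxid, String path, String data) {}

    static Map<String, String> recover(Map<String, String> snapshot,
                                       long snapshotZxid,
                                       List<Txn> txnLog) {
        Map<String, String> tree = new TreeMap<>(snapshot);   // load the snapshot
        for (Txn txn : txnLog) {
            if (txn.zxid() > snapshotZxid) {
                tree.put(txn.path(), txn.data());              // idempotent "set data" replay
            }
        }
        return tree;
    }
}
```
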
7. References
    • From Paxos to ZooKeeper: Distributed Consistency Principles and Practice
    • "Big Data Day"
