ZooKeeper: A Distributed Coordination Service for Distributed Applications

Original article address: ZooKeeper: A Distributed Coordination Service for Distributed Applications

ZooKeeper is an open-source distributed coordination service for distributed applications. It exposes a simple set of primitives that distributed applications can build on to implement higher-level services, such as synchronization, configuration maintenance, groups, and naming. It is designed to be easy to program against, and it uses a data model styled after the familiar directory-tree structure of file systems. The ZooKeeper service runs in Java and has bindings for both Java and C.
Coordination services are notoriously hard to get right; they are especially prone to errors such as race conditions and deadlocks. The motivation behind ZooKeeper is to relieve distributed applications of the burden of implementing coordination services from scratch.

Design Objectives

ZooKeeper is simple. It allows distributed processes to coordinate with one another through a shared hierarchical namespace, organized much like a standard file system. The namespace consists of data registers called znodes (similar to files and directories in a file system). Unlike a typical file system, which is designed for storage, ZooKeeper keeps its data in memory, which gives it high throughput and low latency.
ZooKeeper is replicated. Like the distributed processes it coordinates, ZooKeeper itself is replicated over a set of hosts that together form an ensemble.

The servers that make up the ZooKeeper service must all know about each other. They maintain an in-memory image of the state, along with transaction logs and snapshots in a persistent store. As long as a majority of the servers are available, the ZooKeeper service is available.
A client connects to a single ZooKeeper server and maintains a TCP connection through which it sends requests, receives responses, receives watch events, and sends heartbeats. If the TCP connection to the server breaks, the client connects to a different server.
ZooKeeper is ordered. ZooKeeper stamps each update with a number that reflects the order of all ZooKeeper transactions. Subsequent operations can use this order to implement higher-level abstractions, such as synchronization primitives.
ZooKeeper is fast. It is especially fast in read-dominant workloads. ZooKeeper applications run on thousands of machines, and it performs best where reads outnumber writes, at a ratio of around 10:1.

Data Model and Hierarchical Namespace

The namespace provided by ZooKeeper is much like that of a standard file system. A name is a sequence of path elements separated by a slash (/). Every znode in ZooKeeper's namespace is identified by such a path.

Nodes and Ephemeral Nodes

Unlike a standard file system, each node in the ZooKeeper namespace can have data associated with it as well as children. It is like having a file system in which a file can also be a directory. (ZooKeeper is designed to store coordination data such as status information, configuration, and location information, so the data stored at each node is usually small, in the byte-to-kilobyte range.) We use the term znode to refer to a ZooKeeper data node.
Znodes maintain a stat structure that includes version numbers for data changes and ACL changes, along with timestamps, to allow cache validation and coordinated updates. Each time a znode's data changes, the version number increases. For instance, whenever a client retrieves data, it also receives the data's version.
The data stored at each znode in the namespace is read and written atomically. Reads return all the data bytes associated with a znode, and a write replaces all the data. Each node has an Access Control List (ACL) that restricts who can do what.
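To make the role of the version number concrete, here is a minimal sketch in Java, using the standard org.apache.zookeeper client, of a conditional write that succeeds only if the znode has not changed since it was read. The connect string localhost:2181 and the path /app/config are illustrative assumptions, not part of the original article.

// Sketch: conditional update using the znode version number (optimistic concurrency).
// Assumes a ZooKeeper server at localhost:2181 and an existing znode /app/config.
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class VersionedUpdate {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> { });

        Stat stat = new Stat();
        byte[] current = zk.getData("/app/config", false, stat); // stat now carries the version we read

        try {
            // The write succeeds only if no other client changed the znode after our read.
            zk.setData("/app/config", "new-value".getBytes(), stat.getVersion());
        } catch (KeeperException.BadVersionException e) {
            // Someone else updated the znode first; re-read and retry as appropriate.
        }
        zk.close();
    }
}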
ZooKeeper also has the notion of ephemeral nodes. These znodes exist as long as the session that created them is active; when the session ends, they are deleted. Ephemeral nodes are useful when you want to implement [tbd]. A minimal example follows.
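The following Java sketch (connect string and paths are assumptions) creates a persistent parent and an ephemeral child; the child is removed automatically when the session that created it ends.

// Sketch: an ephemeral znode lives only as long as the session that created it.
// Assumes a ZooKeeper server at localhost:2181 and that /services does not exist yet.
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class EphemeralExample {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> { });

        // A persistent znode outlives the session that created it ...
        zk.create("/services", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // ... while an ephemeral znode is deleted automatically when the session ends,
        // which makes it a natural building block for presence and liveness tracking.
        zk.create("/services/worker-1", "host:port".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        zk.close(); // the session ends here, so /services/worker-1 disappears
    }
}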

Conditional Updates and Watches

ZooKeeper supports the concept of watches. A client can set a watch on a znode. The watch is triggered and removed when the znode changes. When a watch is triggered, the client receives a packet saying that the znode has changed. In addition, if the connection between the client and one of the ZooKeeper servers is broken, the client receives a local notification, which can be used for [tbd].
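A minimal Java sketch of setting a watch (path and connect string are assumptions): the watch fires once on the next change, after which the client must re-register it if it wants further notifications.

// Sketch: registering a one-time watch on a znode's data.
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class WatchExample {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> { });

        Watcher watcher = new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                // Fires once when /app/config changes or is deleted; to keep
                // watching, re-read the data and register a new watch here.
                System.out.println("Got " + event.getType() + " on " + event.getPath());
            }
        };

        Stat stat = new Stat();
        byte[] data = zk.getData("/app/config", watcher, stat); // read data and set the watch

        Thread.sleep(60_000); // keep the session alive in this sketch so the watch can fire
        zk.close();
    }
}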

Guarantees

ZooKeeper is simple and fast. Although its design goal is to be a basis for building more complicated services, such as synchronization, it provides a set of guarantees:
Sequential Consistency: updates from a client are applied in the order in which they were sent.
Atomicity: updates either succeed or fail completely; there are no partial results.
Single System Image: a client sees the same view of the service regardless of which server it connects to.
Reliability: once an update has been applied, it persists from that time forward until a client overwrites it.
Timeliness: the client's view of the system is guaranteed to be up to date within a certain time bound.
For more information on these guarantees and how they can be used, see the official documentation and white paper. ([tbd])

Simple API

One of ZooKeeper's design goals is to provide a very simple programming interface, so it only provides the following operations:
create: creates a node at a location in the tree
delete: deletes a node
exists: tests if a node exists at a location
get data: reads the data from a node
set data: writes data to a node
get children: retrieves a list of children of a node
sync: waits for data to be propagated
For more information about these operations, and how they can be used to implement higher-level operations, see the official documentation. ([tbd])
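The following Java sketch (paths, data, and the connect string are illustrative assumptions) touches each of these operations once with the standard client.

// Sketch: one pass over the simple ZooKeeper API.
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class SimpleApiTour {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> { });

        // create: add a node at a location in the tree
        zk.create("/demo", "v1".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // exists: test whether a node is present (returns null if it is not)
        Stat stat = zk.exists("/demo", false);

        // get data / set data: read and atomically replace the node's data
        byte[] data = zk.getData("/demo", false, stat);
        zk.setData("/demo", "v2".getBytes(), stat.getVersion());

        // get children: list the names of the node's children
        zk.create("/demo/child", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        List<String> children = zk.getChildren("/demo", false);

        // sync: flush the channel between this client's server and the leader
        // (asynchronous in the Java client)
        zk.sync("/demo", (rc, path, ctx) -> { }, null);

        // delete: remove nodes, children first (-1 skips the version check)
        zk.delete("/demo/child", -1);
        zk.delete("/demo", -1);
        zk.close();
    }
}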

Implementation

The ZooKeeper Components diagram shows the high-level components of the ZooKeeper service. With the exception of the request processor, each of the servers that make up the ZooKeeper service replicates its own copy of each of these components.

The replicated database is an in-memory database containing the entire data tree. Updates are logged to disk for recoverability, and writes are serialized to disk before they are applied to the in-memory database.
Every ZooKeeper server services clients. Clients connect to exactly one server to submit requests. Read requests are serviced from the local replica of each server's database, while requests that change the state of the service (write requests) are processed by an agreement protocol.
As part of the agreement protocol, all write requests from clients are forwarded to a single server, called the leader. The rest of the ZooKeeper servers, called followers, receive message proposals from the leader and agree upon message delivery. The messaging layer takes care of replacing leaders on failure and synchronizing followers with the leader.
ZooKeeper uses a custom atomic messaging protocol. Because the messaging layer is atomic, ZooKeeper can guarantee that local replicas never diverge. When the leader receives a write request, it calculates what the state of the system will be when the write is applied and transforms this into a transaction that captures the new state.

Uses

The programming interface to ZooKeeper is deliberately simple. With these operations, however, you can implement higher-level operations, such as synchronization primitives, group membership, and ownership; one such sketch follows. For more on using ZooKeeper in distributed applications, see the white paper and video materials ([tbd]).
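As one illustration of such a higher-level operation, the sketch below builds a rudimentary group-membership service out of the primitives described earlier: each member registers an ephemeral sequential znode under a shared parent, and any client can list (and watch) the children to see who is alive. The path /group and the member prefix are assumptions for the example, not part of the original article.

// Sketch: group membership from ephemeral sequential znodes.
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class GroupMembership {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> { });

        // Ensure the group parent exists; tolerate the race where another member creates it first.
        if (zk.exists("/group", false) == null) {
            try {
                zk.create("/group", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            } catch (KeeperException.NodeExistsException ignored) { }
        }

        // Join the group: the ephemeral sequential znode representing this member
        // is deleted automatically if the member's session dies.
        String me = zk.create("/group/member-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        // List current members; passing true registers the session's default
        // watcher, so this client hears about future joins and departures.
        List<String> members = zk.getChildren("/group", true);
        System.out.println("Joined as " + me + "; current members: " + members);
    }
}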

Performance

ZooKeeper is designed to be highly performant. But is it? The ZooKeeper development team at Yahoo! Research confirms that it is, especially for read-dominant applications (see the figure), because write operations involve synchronizing the state of all ZooKeeper servers. (Reads dominating writes is the typical case for a coordination service.)

The figure shows the throughput of ZooKeeper release 3.2 running on servers with dual 2 GHz Xeon processors and two SATA 15K RPM drives. One drive was used as a dedicated ZooKeeper log device; snapshots were written to the drive holding the operating system. All reads and writes were on 1 KB of data. "Servers" in the figure refers to the size of the ZooKeeper ensemble, that is, the number of servers that make up the service. Approximately 30 other servers were used to simulate the clients. The ZooKeeper ensemble was configured such that leaders do not accept connections from clients.
Note: in version 3.2, read/write performance is roughly twice that of version 3.1.
The benchmark above also indicates that ZooKeeper is reliable. The figure below shows how ZooKeeper responds to a variety of failures. The events marked in the figure are:
1. Failure and recovery of a follower
2. Failure and recovery of a different follower
3. Failure of the leader
4. Failure and recovery of two followers
5. Failure of another leader

Reliability

To show the behavior of the system over time as failures are injected, we ran the same benchmark as above on a ZooKeeper service made up of seven machines, but this time we kept the write percentage at a constant 30%, which is a conservative estimate of the expected workload ratio.

There are a few important observations here. First, if followers fail and recover quickly, ZooKeeper is able to sustain a high throughput despite the failures. More importantly, the leader election algorithm allows the system to recover fast enough to prevent throughput from dropping substantially; in our observations, ZooKeeper takes less than 200 ms to elect a new leader. Third, as followers recover and start processing requests, ZooKeeper is able to raise throughput again.
The ZooKeeper Project

ZooKeeper has been successfully used in many industrial applications. At Yahoo!, it is used as the coordination and failure-recovery service for Yahoo! Message Broker, a highly scalable publish-subscribe system that manages thousands of topics for replication and data delivery. (The now-mainstream Kafka also uses ZooKeeper.) Many Yahoo! advertising systems also use ZooKeeper to implement reliable services.
We encourage users and developers to join the community. For more information, see the Apache ZooKeeper project.

