ZooKeeper Official Documentation: Overview

Source: Internet
Author: User

ZooKeeper: A Distributed Coordination Service for Distributed Applications

ZooKeeper is a distributed, open-source coordination service for distributed applications. It exposes a set of primitive operations that distributed applications can build upon to implement higher-level services such as synchronization, configuration management, naming, and group membership.

ZooKeeper is designed to be easy to program against, and its data model follows the familiar tree-structured directory layout of a file system.

ZooKeeper runs in Java and has bindings for both Java and C. Coordination services are notoriously hard to implement correctly; they are especially prone to errors such as race conditions and deadlock.

The motivation behind ZooKeeper is to relieve distributed applications of the burden of implementing coordination services from scratch.

Design goals

ZooKeeper is simple. It allows distributed processes to coordinate with each other through a shared hierarchical namespace, organized much like a standard file system.

The ZooKeeper namespace consists of data registers, called znodes, which are very similar to files and directories.

Unlike a typical file system, which is designed for storage, ZooKeeper keeps its data in memory, which is how it achieves high throughput and low latency.

The ZooKeeper implementation puts a premium on high performance, high availability, and strictly ordered access. The performance aspect means it can be used in large distributed systems; the reliability aspect keeps it from being a single point of failure; and the strict ordering means that sophisticated synchronization primitives can be implemented at the client.

ZooKeeper is replicated. Like the distributed processes it coordinates, ZooKeeper itself is intended to be replicated over a set of servers, which we call an ensemble.

The servers that make up the ZooKeeper service must all know about each other. Each maintains an in-memory image of the current state, along with transaction logs and snapshots in a persistent store.

As long as a majority of the servers are available, the ZooKeeper service will be available.
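As a rough sketch of this majority rule, the arithmetic can be written out directly. The class and method names below are hypothetical helpers for illustration, not part of ZooKeeper's API:

```java
// Sketch: majority-quorum arithmetic for a ZooKeeper ensemble.
// QuorumMath and its methods are illustrative names, not ZooKeeper code.
public class QuorumMath {
    // Smallest number of servers that forms a majority of an ensemble of size n.
    public static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // The service stays available as long as no more than this many servers fail.
    public static int tolerableFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }
}
```

This is also why ensembles are usually deployed with an odd number of servers: going from 5 servers to 6 raises the required quorum from 3 to 4 without tolerating any additional failures.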

Clients connect to a single ZooKeeper server and maintain a TCP connection, through which they send requests, get responses, receive watch events, and send heartbeats. If the TCP connection to a server breaks, the client connects to a different server.
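A minimal sketch of this failover behavior, assuming a hypothetical ServerPicker helper (the real Java client accepts a comma-separated server list and handles reconnection internally):

```java
import java.util.List;

// Sketch: how a client might cycle through the ensemble's server list
// when a connection drops. ServerPicker is an illustrative name only.
public class ServerPicker {
    private final List<String> servers;
    private int current = 0;

    public ServerPicker(List<String> servers) {
        if (servers.isEmpty()) throw new IllegalArgumentException("no servers");
        this.servers = servers;
    }

    // The server the client is currently connected to.
    public String connected() {
        return servers.get(current);
    }

    // Called when the TCP connection breaks: move to the next server,
    // wrapping around the list.
    public String reconnect() {
        current = (current + 1) % servers.size();
        return servers.get(current);
    }
}
```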

ZooKeeper is ordered. It stamps each update with a number that reflects the order of all transactions applied to the service.

Subsequent operations can use this order to achieve a higher level of abstraction, such as synchronizing primitives.
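The total ordering of updates can be sketched as a log that stamps each entry with a monotonically increasing id, loosely modeled on ZooKeeper's transaction id (zxid). OrderedLog and its methods are illustrative, not ZooKeeper code:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: every update receives a strictly increasing id, so any two
// updates can be compared and replayed in a single global order.
public class OrderedLog {
    private long nextZxid = 0;                       // last id handed out
    private final List<String> log = new ArrayList<>();

    // Record an update and return the id assigned to it.
    public synchronized long append(String update) {
        nextZxid++;
        log.add(nextZxid + ":" + update);
        return nextZxid;
    }

    // The full, totally ordered history of updates.
    public synchronized List<String> history() {
        return new ArrayList<>(log);
    }
}
```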

ZooKeeper is fast, especially in read-dominant workloads. ZooKeeper applications run on thousands of machines, and performance is best when reads outnumber writes, at ratios of around 10:1.

Data model and the hierarchical namespace

The namespace provided by ZooKeeper is much like that of a standard file system. A name is a sequence of path elements separated by a slash (/), and every znode in the namespace is identified by a path.

Nodes and ephemeral nodes

Unlike a standard file system, each node in a ZooKeeper namespace can have data associated with it as well as children.

It is like having a file system in which a file can also be a directory. ZooKeeper was designed to store coordination data: status information, configuration, location information, and so on. The data stored at each node is therefore usually small, ranging from a few bytes to a few kilobytes.

When discussing ZooKeeper data nodes, we use the term znode to make it clear what we are referring to.

Znodes maintain a stat structure that includes version numbers for data changes and ACL changes, along with timestamps, to allow cache validation and coordinated updates.

Each time a znode's data changes, its version number increases by one. For instance, whenever a client retrieves data, it also receives the current version of that data.
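This version check is what enables conditional updates: a write succeeds only if the caller's expected version matches the node's current version. VersionedNode below is a simplified, hypothetical stand-in for a znode's version handling, not ZooKeeper's implementation:

```java
// Sketch: compare-and-set on a single node, keyed by version number.
// A stale write (wrong expected version) is rejected so the caller
// can re-read and retry with fresh data.
public class VersionedNode {
    private byte[] data;
    private int version = 0;

    public VersionedNode(byte[] initial) {
        this.data = initial.clone();
    }

    public synchronized int getVersion() {
        return version;
    }

    public synchronized byte[] getData() {
        return data.clone();
    }

    // Returns the new version on success, or -1 if expectedVersion is stale.
    public synchronized int setData(byte[] newData, int expectedVersion) {
        if (expectedVersion != version) {
            return -1; // stale write rejected
        }
        data = newData.clone();
        version++;  // every successful change bumps the version by one
        return version;
    }
}
```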

Each node has an Access Control List (ACL) that strictly restricts who can do what.

ZooKeeper also has the notion of ephemeral nodes, which are tied to a session: these znodes exist as long as the session that created them is active, and they are deleted when the session ends.

Ephemeral nodes are useful when implementing certain coordination patterns.

Conditional updates and watches

ZooKeeper supports the concept of watches. A client can set a watch on a znode. The watch is triggered, and then removed, when the znode changes. When a watch is triggered, the client receives a packet telling it that the znode's data has been modified. And if the connection between the client and its ZooKeeper server is broken, the client receives a local notification. These features can be used to build higher-level services.

ZooKeeper's guarantees

ZooKeeper is simple and fast. Since its goal is to be a foundation for building more complicated services, such as synchronization, it provides a set of guarantees:

1 Sequential consistency: Updates from a client are applied in the order in which they were sent.

2 Atomicity: Updates either succeed completely or fail completely, with no partial results.

3 Single system image: A client sees the same view of the service regardless of the server it connects to.

4 Reliability: Once an update has been applied on the server, it persists from that time forward until another client overwrites it.

5 Timeliness: The client's view of the system is guaranteed to be up-to-date within a certain time bound.

Simple Operation API

One of ZooKeeper's design goals is to provide a very simple programming interface. As a result, it supports only the following operations:

1 create: creates a node at a location in the tree.
2 delete: deletes a node.
3 exists: tests whether a node exists at a location.
4 get data: reads the data from a node.
5 set data: writes data to a node.
6 sync: waits for data to be propagated to the other nodes.

For a more in-depth discussion of these operations, and of how they can be used to implement higher-level operations, see the usage examples.

Implementation

The ZooKeeper components diagram shows the high-level components of the ZooKeeper service. With the exception of the request processor, each server that makes up the ZooKeeper service holds its own copy of each component. The replicated database is an in-memory database containing the entire data tree. All update operations are logged to disk so they can be recovered after a failure, and writes are serialized before being applied to the in-memory database, which guarantees strong consistency. Each ZooKeeper server serves a number of clients; a client connects to exactly one server to submit requests and interact with ZooKeeper. Read requests are serviced from the local replica of each server's in-memory database (this boosts read performance and is why ZooKeeper performs best at read-to-write ratios of around 10:1). Requests that change the state of the service (write requests) are processed by an agreement protocol. The agreement protocol stipulates that all write requests are forwarded to a single elected server, called the leader; the remaining servers, called followers, receive proposals from the leader and agree upon message delivery.
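The read/write split described above can be sketched with a toy model: reads are served from the local replica of whichever server the client is connected to, while writes are driven through the leader into every replica. MiniEnsemble is illustrative only; the real agreement protocol is far richer:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: one in-memory path->data map per server. Reads touch only the
// connected server's copy; writes are applied to every replica, standing
// in for the leader broadcasting an agreed-upon change.
public class MiniEnsemble {
    private final List<Map<String, byte[]>> replicas = new ArrayList<>();

    public MiniEnsemble(int servers) {
        for (int i = 0; i < servers; i++) {
            replicas.add(new HashMap<>());
        }
    }

    // A read is served from the connected server's local replica,
    // which is why read-heavy workloads scale so well.
    public byte[] read(int connectedServer, String path) {
        return replicas.get(connectedServer).get(path);
    }

    // A write is forwarded to the leader, which drives the change
    // into every replica so the copies stay consistent.
    public void write(String path, byte[] data) {
        for (Map<String, byte[]> replica : replicas) {
            replica.put(path, data);
        }
    }
}
```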
The messaging layer takes care of replacing the leader when it fails and of synchronizing followers with the leader. ZooKeeper uses a custom atomic messaging protocol. Because the messaging layer is atomic, ZooKeeper can guarantee that the local replicas never diverge. When the leader receives a write request, it calculates what the state of the system will be after the write is applied, and transforms the operation into a transaction that captures this new state.

Use cases

The ZooKeeper programming interface is deliberately simple, but it can be used to implement higher-level operations such as synchronization primitives, group membership, leader election, and so on.

Some distributed applications use these interfaces to implement such higher-level features.

Performance

ZooKeeper was designed to be a high-performance framework, but is it in practice?

Research by the ZooKeeper development team at Yahoo! indicates that it is.

(See the figure: ZooKeeper throughput as the read-write ratio varies.)

Performance is especially good in applications where reads dominate, because write operations involve synchronizing the state of all the servers. Read-dominant workloads are the typical case for a coordination service, so this is exactly where performance matters most.

The throughput figures were taken with ZooKeeper release 3.2 running on servers with dual-core 2GHz processors and SATA 15K RPM drives. One drive was used as a dedicated ZooKeeper log device; snapshots were written to the OS drive.

Write requests were 1KB writes and read requests were 1KB reads.

"Servers" indicates the size of the ensemble, that is, the number of servers that make up the ZooKeeper service.

One qualification applies to the whole cluster: clients were not allowed to connect directly to the leader.

Reliability

To show the behavior of the system over time as failures occur, we ran a ZooKeeper service made up of 7 machines, saturating it with the same benchmark as before,

but this time holding the write percentage at a constant 30%, a conservative estimate of our expected workload.

There are a few important observations from this graph. First, if followers fail and recover quickly, ZooKeeper is able to sustain a relatively high throughput despite the failures.

But, more importantly, the leader election algorithm lets the system recover fast enough to prevent throughput from dropping substantially. ZooKeeper can elect a new leader in under 200 milliseconds.

Third, as followers recover, ZooKeeper's throughput rises again once they start processing requests.

The ZooKeeper project

ZooKeeper has been successfully used in many industrial applications. Within Yahoo!, it serves as the coordination and failure-recovery service for the Yahoo! Message Broker, a highly scalable publish-subscribe messaging system that manages thousands of topics for replication and data delivery.

ZooKeeper is also used within Yahoo! by the fetching service (the web crawlers) to manage failure recovery.

Many of Yahoo!'s advertising systems also use ZooKeeper to implement reliable services.
