How ZooKeeper works

Source: Internet
Author: User
Tags: zookeeper, client

ZooKeeper is generally used for distributed locking and is not suitable for distributed storage, because each ZooKeeper node, also known as a znode, has a storage limit of 1 MB.


The roles in ZooKeeper are client, leader, and learner. Learners are further divided into followers and observers.


The client initiates requests. A follower accepts client requests, returns results to the client, and participates in the voting process.


The leader initiates votes, makes the final decision, and updates the system state.


An observer does not participate in voting and only synchronizes its state with the leader. It accepts client connections and forwards write requests to the leader. Observers are used to scale out the system and increase read throughput.




The data model of ZooKeeper is similar to that of a traditional file system. The difference is that every ZooKeeper node can itself store data, up to 1 MB.

ZooKeeper is mainly used to store coordination data, such as status information, configuration, and location information.

Because ZooKeeper keeps its data in memory, it can achieve high throughput and low latency.


Reference: the following is excerpted from the Overview page of the ZooKeeper wiki.

ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical name space of data registers (we call these registers znodes), much like a file system. Unlike normal file systems, ZooKeeper provides its clients with high throughput, low latency, highly available, strictly ordered access to the znodes. The performance aspects of ZooKeeper allow it to be used in large distributed systems. The reliability aspects prevent it from becoming the single point of failure in big systems. Its strict ordering allows sophisticated synchronization primitives to be implemented at the client.

The name space provided by ZooKeeper is much like that of a standard file system. A name is a sequence of path elements separated by a slash ("/"). Every znode in ZooKeeper's name space is identified by a path, and every znode has a parent whose path is a prefix of the znode with one less element; the exception to this rule is root ("/"), which has no parent. Also, exactly like standard file systems, a znode cannot be deleted if it has any children.
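The path rules above can be sketched with a small in-memory tree. This is only an illustration under stated assumptions: the class `ZNodeTree` and its methods are hypothetical names, not part of any ZooKeeper client API, and the sample paths `/app` and `/app/config` are made up.

```python
class ZNodeTree:
    """Toy in-memory name space; not the real ZooKeeper data tree."""

    def __init__(self):
        # Map each path to its data; the root always exists.
        self.nodes = {"/": b""}

    @staticmethod
    def parent(path):
        """Return the parent path, or None for the root."""
        if path == "/":
            return None
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def children(self, path):
        """Return the paths whose parent is the given path."""
        return sorted(p for p in self.nodes if p != "/" and self.parent(p) == path)

    def create(self, path, data=b""):
        if path in self.nodes:
            raise KeyError("node exists: " + path)
        if self.parent(path) not in self.nodes:
            raise KeyError("no parent for: " + path)
        self.nodes[path] = data

    def delete(self, path):
        if path == "/":
            raise ValueError("cannot delete the root")
        if self.children(path):
            # Same rule as in the text: a znode with children cannot be deleted.
            raise ValueError("node has children: " + path)
        del self.nodes[path]


tree = ZNodeTree()
tree.create("/app")
tree.create("/app/config", b"timeout=30")
```

In this sketch, deleting `/app` fails until `/app/config` has been deleted first.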

The main differences between ZooKeeper and standard file systems are that every znode can have data associated with it (every file can also be a directory and vice-versa) and znodes are limited in the amount of data that they can have. ZooKeeper was designed to store coordination data: status information, configuration, location information, etc. This kind of meta-information is usually measured in kilobytes, if not bytes. ZooKeeper has a built-in sanity check of 1M, to prevent it from being used as a large data store, but in general it is used to store much smaller pieces of data.
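A sanity check of this kind might look like the following sketch. The constant `MAX_ZNODE_BYTES` and the function `check_znode_data` are hypothetical names, and the limit is assumed to be exactly 1 MiB for illustration; the real server's check is configurable and its exact value may differ slightly.

```python
MAX_ZNODE_BYTES = 1024 * 1024  # assumed limit for this sketch only


def check_znode_data(data: bytes) -> bytes:
    """Reject payloads larger than the assumed per-znode limit."""
    if len(data) > MAX_ZNODE_BYTES:
        raise ValueError(
            "znode data is %d bytes, limit is %d" % (len(data), MAX_ZNODE_BYTES)
        )
    return data


# Typical coordination data is tiny compared with the limit.
config = check_znode_data(b"leader=server-2;timeout=30")
```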


The service itself is replicated over a set of machines that comprise the service. These machines maintain an in-memory image of the data tree along with transaction logs and snapshots in a persistent store. Because the data is kept in memory, ZooKeeper is able to get very high throughput and low latency numbers. The downside to an in-memory database is that the size of the database that ZooKeeper can manage is limited by memory. This limitation is a further reason to keep the amount of data stored in znodes small.

The servers that make up the ZooKeeper service must all know about each other. As long as a majority of the servers are available, the ZooKeeper service will be available. Clients must also know the list of servers. The clients create a handle to the ZooKeeper service using this list of servers.
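The majority rule can be worked through with a few lines of arithmetic. The function names below are hypothetical and chosen only for this sketch.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def is_available(ensemble_size: int, servers_up: int) -> bool:
    """The service is available while a majority of servers is up."""
    return servers_up >= quorum_size(ensemble_size)


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail before the majority is lost."""
    return ensemble_size - quorum_size(ensemble_size)
```

Under this rule, 3 servers tolerate 1 failure and 5 servers tolerate 2, while 4 servers still tolerate only 1, which is one reason odd ensemble sizes are common.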

Clients only connect to a single ZooKeeper server. The client maintains a TCP connection through which it sends requests, gets responses, gets watch events, and sends heartbeats. If the TCP connection to the server breaks, the client will connect to a different server. When a client first connects to the ZooKeeper service, the first ZooKeeper server will set up a session for the client. If the client needs to connect to another server, this session will get reestablished with the new server.
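The failover behavior described above can be sketched as follows. Everything here is an assumption made for illustration: `ClientHandle` and `parse_servers` are hypothetical names, the host names are made up, no network traffic takes place, and the simple round-robin choice of the next server is not claimed to match what a real client library does.

```python
def parse_servers(connect_string):
    """Split "host:port,host:port" into a list of (host, port) tuples."""
    servers = []
    for item in connect_string.split(","):
        host, port = item.strip().rsplit(":", 1)
        servers.append((host, int(port)))
    return servers


class ClientHandle:
    """Hypothetical handle that tracks one current server and one session."""

    def __init__(self, connect_string):
        self.servers = parse_servers(connect_string)
        self.current = 0
        self.session_id = None

    def connect(self, new_session_id):
        # The first server contacted sets up the session.
        if self.session_id is None:
            self.session_id = new_session_id
        return self.servers[self.current]

    def on_connection_lost(self):
        # Move to the next server in the list. The session id is kept so
        # that the same session can be reestablished with the new server.
        self.current = (self.current + 1) % len(self.servers)
        return self.servers[self.current]
```

The point of the sketch is that the session identity belongs to the client handle, not to any single server connection.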

Read requests sent by a ZooKeeper client are processed locally at the ZooKeeper server to which the client is connected. If the read request registers a watch on a znode, that watch is also tracked locally at the ZooKeeper server. Write requests are forwarded to other ZooKeeper servers and go through consensus before a response is generated. Sync requests are also forwarded to another server, but do not actually go through consensus. Thus, the throughput of read requests scales with the number of servers and the throughput of write requests decreases with the number of servers.
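A toy cost model may make the scaling argument concrete. It is not ZooKeeper's actual consensus protocol: `ReplicaServer` and `ToyEnsemble` are hypothetical names, every server is assumed to acknowledge every proposal, and "messages" simply counts one proposal per server.

```python
class ReplicaServer:
    def __init__(self):
        self.data = {}

    def read(self, path):
        # A read is answered from this server's own copy; no other
        # server is involved.
        return self.data.get(path)


class ToyEnsemble:
    def __init__(self, size):
        self.servers = [ReplicaServer() for _ in range(size)]
        self.messages = 0  # proposals sent, counted for illustration

    def write(self, path, value):
        # A write is proposed to every server and succeeds once a
        # majority has acknowledged it. In this toy model every server
        # acknowledges, so the cost is one message per server.
        acks = 0
        for server in self.servers:
            self.messages += 1
            server.data[path] = value
            acks += 1
        return acks > len(self.servers) // 2
```

In this model, adding servers adds read capacity because each one answers reads independently, while every write has to reach more servers.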

Order is very important to ZooKeeper; almost bordering on obsessive-compulsive disorder. All updates are totally ordered. ZooKeeper actually stamps each update with a number that reflects this order. We call this number the zxid (ZooKeeper Transaction ID). Each update will have a unique zxid. Reads (and watches) are ordered with respect to updates. Read responses will be stamped with the last zxid processed by the server that services the read.
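The stamping described above can be sketched with a single counter. This is a simplification under stated assumptions: `ZxidServer` is a hypothetical name, there is only one server, and the zxid is modeled as a plain increasing integer.

```python
class ZxidServer:
    """Toy server that stamps updates and reads with a zxid."""

    def __init__(self):
        self.last_zxid = 0
        self.data = {}

    def apply_update(self, path, value):
        # Each update gets the next number, so updates are totally
        # ordered and every zxid is unique.
        self.last_zxid += 1
        self.data[path] = value
        return self.last_zxid

    def read(self, path):
        # A read response carries the last zxid this server processed,
        # which tells the client how recent the server's view is.
        return self.data.get(path), self.last_zxid
```

A client comparing the zxids on two read responses can tell which response reflects the later state.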



