In-Depth Understanding of ZooKeeper Fundamentals, Installation, and Deployment

Last Update:2016-04-22 Source: Internet

Author: User

1. What is ZooKeeper?

ZooKeeper is a reliable coordination system for large-scale distributed systems. It provides configuration maintenance, naming, distributed synchronization, and group services. Its goal is to encapsulate complex, error-prone critical services and give users an easy-to-use interface on top of an efficient, robust system.

2. ZooKeeper Features

ZooKeeper mainly provides the following guarantees:

1) Eventual consistency: every client is shown the same view of the data, regardless of which server it connects to. This is ZooKeeper's most important property.

2) Reliability: if a message is accepted by one server, it will be accepted by all servers.

3) Timeliness: ZooKeeper cannot guarantee that two clients see newly updated data at the same moment. If you need the latest data, call the sync() interface before reading.

4) Wait-free: slow or failed clients do not interfere with the requests of fast clients.

5) Atomicity: an update either succeeds or fails; there is no intermediate state.

6) Sequential order: all servers publish the same messages in the same order.

3. ZooKeeper Fundamentals

ZooKeeper Architecture
[Figure: ZooKeeper architecture diagram]

The key points of the ZooKeeper architecture are:
1) Each server stores a copy of the data in memory.
2) When ZooKeeper starts, a leader is elected from among the server instances. The election follows a Paxos-style majority-vote protocol; for now it is enough to know that such a protocol exists.
3) The leader handles data updates and similar operations. This uses the Zab protocol; again, it is enough to know that it exists.
4) An update is marked successful if and only if a majority of servers have applied the change in memory.

ZooKeeper Roles

ZooKeeper has three main categories of roles (leader, learner, and client; a learner is either a follower or an observer), as shown in the following table:

Role	Description
Leader	Initiates and resolves votes, and updates the system state.
Learner: Follower	Accepts client requests, returns results to the client, and votes during leader election.
Learner: Observer	Accepts client connections and forwards write requests to the leader. An observer does not vote; it only synchronizes the leader's state. Observers exist to scale the system and improve read speed.
Client	The application client that initiates requests.

Questions to think about:
1. Why are observers needed?
① ZooKeeper must provide high availability and strong consistency.
② Supporting more clients requires adding more servers.
③ Adding more followers increases the latency of the voting phase and hurts performance.
2. What role does the observer play in ZooKeeper?
① Observers do not take part in voting; they only synchronize the leader's state.
② Observers accept client connections and forward write requests to the leader.
③ Adding more observers increases scalability without reducing throughput.
3. Why is the number of servers in a ZooKeeper ensemble usually odd?
ZooKeeper's leader election uses a Paxos-style algorithm. Its core idea is that a write counts as successful once a majority of servers have written it.
① With 3 servers, at most 1 server may fail.
② With 4 servers, at most 1 server may fail as well.
Since 3 and 4 servers both tolerate only 1 failure, they are equally reliable, and the fourth server adds cost without adding fault tolerance. Choose an odd number of ZooKeeper servers; here, 3.
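
The majority rule above can be checked with a little arithmetic. The following is a minimal sketch, not ZooKeeper code; the function names `quorum_size` and `tolerated_failures` are made up for illustration.

```python
def quorum_size(n_servers: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return n_servers // 2 + 1


def tolerated_failures(n_servers: int) -> int:
    """How many servers may fail while a majority can still be formed."""
    return n_servers - quorum_size(n_servers)


if __name__ == "__main__":
    # Compare ensemble sizes side by side.
    for n in range(3, 8):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Going from 3 to 4 servers raises the quorum from 2 to 3 but leaves the number of tolerated failures at 1; only the step to 5 servers raises it to 2.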

ZooKeeper Write Data Flow
[Figure: ZooKeeper write data flowchart]

Writing data to ZooKeeper proceeds in the following steps:

1) The client sends a write request to a ZooKeeper server, for example Server1.

2) If Server1 is not the leader, it forwards the request to the leader; every ZooKeeper ensemble has exactly one leader among its servers. The leader broadcasts the write request to every server, such as Server1 and Server2, and each server notifies the leader once it has written the data successfully.

3) When the leader has received success notifications from a majority of servers, the write is considered successful. With three nodes, two successful writes are enough. The leader then tells Server1 that the data was written successfully.

4) Server1 informs the client that the write succeeded, and the whole write operation is considered complete.
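
The four steps above can be sketched as a toy in-process simulation. This is not the Zab protocol: there is no election, no network, and no recovery, and the `Server` and `Ensemble` classes are hypothetical names used only to show the forward, broadcast, and majority-acknowledgement sequence.

```python
class Server:
    """One ensemble member with its own in-memory copy of the data."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}


class Ensemble:
    """Toy ensemble with a fixed leader (no election is modelled)."""

    def __init__(self, names, leader_name):
        self.servers = {name: Server(name) for name in names}
        self.leader = self.servers[leader_name]
        self.trace = []  # human-readable record of the steps taken

    def client_write(self, via, path, value):
        """Steps 1-4: the client sends a write to the server named `via`."""
        entry = self.servers[via]
        self.trace.append(f"client -> {entry.name}: write {path}")
        if entry is not self.leader:
            # Step 2: a non-leader forwards the request to the leader.
            self.trace.append(f"{entry.name} -> {self.leader.name}: forward")
        ok = self._broadcast(path, value)
        # Step 4: the entry server reports the outcome to the client.
        self.trace.append(f"{entry.name} -> client: {'ok' if ok else 'failed'}")
        return ok

    def _broadcast(self, path, value):
        """Steps 2-3: the leader proposes the write and counts acknowledgements."""
        ackers = [s for s in self.servers.values() if s.alive]
        majority = len(self.servers) // 2 + 1
        if len(ackers) < majority:
            return False  # no majority, so nothing is committed
        for server in ackers:
            server.data[path] = value
        return True


if __name__ == "__main__":
    ensemble = Ensemble(["server1", "server2", "server3"], leader_name="server2")
    ensemble.servers["server3"].alive = False  # one of three servers is down
    print(ensemble.client_write("server1", "/config", "v1"))
    print("\n".join(ensemble.trace))
```

With one of three servers down the write still succeeds, because two acknowledgements form a majority; with two servers down it would be rejected.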

4. ZooKeeper Application Scenarios

Unified Naming Service
[Figure: naming structure of the unified naming service]

1. In a distributed environment, applications and services often need unified names so that different services are easy to identify.

1) This is similar to the mapping between domain names and IP addresses: an IP address is hard to remember, while a domain name is easy.

2) A name is used to look up a resource's or service's address, provider, and other information.

2. Service and application names are organized in a hierarchy.

1) Service names and address information can be written to ZooKeeper, and clients obtain the list of available services from ZooKeeper.
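
As a rough illustration of the hierarchy described above, the sketch below keeps service addresses under path-like names in a plain dictionary. It does not talk to ZooKeeper; the `NameRegistry` class, the `/services/<service>/<instance>` layout, and the addresses are assumptions made for the example.

```python
class NameRegistry:
    """In-memory stand-in for a tree of znodes used as a naming service."""

    def __init__(self):
        self._nodes = {}  # full path -> address stored at that path

    def register(self, service, instance, address):
        """Store an instance's address under /services/<service>/<instance>."""
        path = f"/services/{service}/{instance}"
        self._nodes[path] = address
        return path

    def lookup(self, service):
        """Return the addresses of all registered instances of a service."""
        prefix = f"/services/{service}/"
        return sorted(addr for path, addr in self._nodes.items()
                      if path.startswith(prefix))


if __name__ == "__main__":
    registry = NameRegistry()
    registry.register("orders", "node-1", "10.0.0.1:8080")
    registry.register("orders", "node-2", "10.0.0.2:8080")
    print(registry.lookup("orders"))
```

A client only needs to know the service name; the addresses behind it can change without the client being reconfigured.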

Configuration Management

[Figure: configuration management structure]

1. In a distributed environment, managing and synchronizing configuration files is a common problem.

1) Within a cluster, such as a Hadoop cluster, the configuration of all nodes must be consistent.

2) After a configuration file is modified, the change should be synchronized to every node quickly.

2. Configuration management can be implemented with ZooKeeper.
1) Write the configuration information to a znode in ZooKeeper.

2) Have each node watch this znode.

3) Once the data in the znode is modified, ZooKeeper notifies each node.
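
The write, watch, and notify cycle above can be sketched in-process. The classes `ConfigZnode` and `Node` are hypothetical and no ZooKeeper client is involved. One real detail is modelled: a ZooKeeper watch fires once, so each node re-registers its watch when it re-reads the data.

```python
class ConfigZnode:
    """Toy znode that holds configuration data and one-shot watches."""

    def __init__(self, data):
        self.data = data
        self._watchers = []

    def get(self, watch=None):
        """Read the data and optionally leave a one-shot watch."""
        if watch is not None:
            self._watchers.append(watch)
        return self.data

    def set(self, data):
        """Update the data and fire every pending watch exactly once."""
        self.data = data
        # Clear the list before firing so re-registered watches are kept.
        watchers, self._watchers = self._watchers, []
        for callback in watchers:
            callback(self)


class Node:
    """A cluster node that keeps a local copy of the configuration."""

    def __init__(self, name, znode):
        self.name = name
        self.config = znode.get(watch=self._on_change)

    def _on_change(self, znode):
        # Re-read the data and register a fresh watch for the next change.
        self.config = znode.get(watch=self._on_change)


if __name__ == "__main__":
    znode = ConfigZnode("replication=2")
    nodes = [Node(f"cs{i}", znode) for i in range(3)]
    znode.set("replication=3")
    print([n.config for n in nodes])
```

Because every node re-registers on each notification, a second or third update reaches all nodes as well.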

Cluster Management
[Figure: cluster management structure]

1. In a distributed environment, the state of every node must be known in real time.

1) Adjustments are made according to each node's real-time status.

2. This can be delegated to ZooKeeper.

1) Write each node's information to a znode in ZooKeeper.

2) Watch this znode to receive its status changes in real time.

3. Typical application

1) Master status monitoring and election in HBase.
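
One common way to realize the steps above, assumed here rather than stated in the text, is for each node to register an ephemeral znode, which ZooKeeper deletes when that node's session ends. The sketch below imitates this in memory; the `Group` class and its method names are hypothetical, and the listeners are persistent for brevity, unlike real one-shot watches.

```python
class Group:
    """Toy parent znode whose children behave like ephemeral znodes."""

    def __init__(self):
        self._children = {}   # session id -> node info
        self._listeners = []

    def subscribe(self, callback):
        """Register a callback that receives the sorted list of live nodes."""
        self._listeners.append(callback)

    def join(self, session_id, info):
        """A node registers itself; listeners see the new membership."""
        self._children[session_id] = info
        self._notify()

    def session_ended(self, session_id):
        """Models ZooKeeper deleting an ephemeral znode with its session."""
        if self._children.pop(session_id, None) is not None:
            self._notify()

    def live_nodes(self):
        return sorted(self._children.values())

    def _notify(self):
        for callback in self._listeners:
            callback(self.live_nodes())


if __name__ == "__main__":
    group = Group()
    group.subscribe(lambda nodes: print("live nodes:", nodes))
    group.join("session-1", "cs0")
    group.join("session-2", "cs1")
    group.session_ended("session-1")  # cs0 crashed or disconnected
```

A monitoring service subscribed this way learns about a failed node without polling it.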

Distributed Notification and Coordination

1. In a distributed environment, a service often needs to know the state of the sub-services it manages.

1) The NameNode needs to know the status of each DataNode.

2) The JobTracker needs to know the status of each TaskTracker.

2. A heartbeat detection mechanism can be implemented with ZooKeeper.

3. Information push can be implemented with ZooKeeper, which then acts as a publish/subscribe system.

Distributed Locks

Services running on different nodes may need to access some resources one at a time, which requires a distributed lock. Distributed locks built on ZooKeeper have the following characteristics:

1. ZooKeeper is strongly consistent. For example, if a ZooKeeper client runs on each node and they all try to create the same znode at the same time, only one client succeeds.

2. Exclusive locks. The client that successfully creates the znode holds the lock, and the other clients wait. When the holder is finished with the lock it deletes the znode, and the other clients try to create the znode again to acquire the lock.

3. Ordered locks. Each client creates an ephemeral znode under a common parent znode, using the mode CreateMode.EPHEMERAL_SEQUENTIAL, so that the sequence numbers under the parent znode determine the global order of access.
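
The ordering idea in point 3 can be sketched without a ZooKeeper server: each waiting client gets a name with an increasing, zero-padded sequence number, and the client with the lowest number holds the lock. The `LockQueue` class and the `lock-` prefix are hypothetical; the ten-digit padding mirrors the suffix ZooKeeper appends to sequential znodes.

```python
class LockQueue:
    """Toy parent znode handing out sequential child names for a lock."""

    def __init__(self):
        self._next_seq = 0
        self._waiting = {}  # child name -> client id

    def enqueue(self, client):
        """Create the client's sequential child and return its name."""
        name = f"lock-{self._next_seq:010d}"
        self._next_seq += 1
        self._waiting[name] = client
        return name

    def holder(self):
        """The client whose child has the lowest sequence number, if any."""
        if not self._waiting:
            return None
        return self._waiting[min(self._waiting)]

    def release(self, name):
        """Delete a child, as unlocking or a session ending would."""
        del self._waiting[name]


if __name__ == "__main__":
    queue = LockQueue()
    a = queue.enqueue("client-A")
    queue.enqueue("client-B")
    print(queue.holder())   # client-A arrived first
    queue.release(a)
    print(queue.holder())   # the lock passes to client-B
```

Because the names are zero-padded, comparing them as strings gives the same order as comparing the sequence numbers.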

Distributed Queues

Distributed queues come in two types:

1. A queue that becomes usable only when all of its members have arrived; until then it waits for the remaining members. This is a synchronization queue.

1) A job consists of multiple tasks, and the job is complete only when all of its tasks have completed.

2) You can create a /job directory for the job and then create an ephemeral znode in that directory for each completed task. Once the number of ephemeral znodes reaches the total number of tasks, the job is complete.

2. A queue with FIFO enqueue and dequeue operations, for example to implement the producer/consumer model.
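
The synchronization queue (the first type above) amounts to counting children under /job. A minimal in-memory sketch follows; the `JobBarrier` class and the task ids are hypothetical, and a set stands in for the children of the /job znode.

```python
class JobBarrier:
    """Toy /job znode: the job is done once every task has a child znode."""

    def __init__(self, total_tasks):
        self.total_tasks = total_tasks
        self._done = set()  # paths of the children created so far

    def task_completed(self, task_id):
        """Record a finished task and report whether the job is complete."""
        self._done.add(f"/job/{task_id}")  # creating the same child twice has no effect
        return self.is_complete()

    def is_complete(self):
        return len(self._done) >= self.total_tasks


if __name__ == "__main__":
    barrier = JobBarrier(total_tasks=3)
    for task in ["task-1", "task-2", "task-3"]:
        print(task, "-> job complete:", barrier.task_completed(task))
```

Using a set means a task that reports completion twice is counted only once, just as a znode path can exist only once.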

5. ZooKeeper Installation and Deployment

Distributed Mode

1. Upload the ZooKeeper installation package

2. Unpack it
    tar -zxvf zookeeper-3.4.5.tar.gz -C /zookeeper/
3. Configure (first on a single node)
3.1 Add a zoo.cfg configuration file
In the conf directory of the extracted package (/zookeeper/zookeeper-3.4.5/conf), rename zoo_sample.cfg to zoo.cfg:
    mv zoo_sample.cfg zoo.cfg

3.2 Modify the configuration file (zoo.cfg)

    dataDir=/zookeeper/zookeeper-3.4.5/data
    server.1=cs0:2888:3888
    server.2=cs1:2888:3888
    server.3=cs2:2888:3888

3.3 Create a myid file in the dataDir (/zookeeper/zookeeper-3.4.5/data). Its content is the N from that node's server.N entry (for server.2 the content is 2). On the first node, cs0:
    echo "1" > myid

3.4 Copy the configured ZooKeeper directory to the other nodes (the /zookeeper/ directory must already exist there)

    scp -r /zookeeper/zookeeper-3.4.5/ cs1:/zookeeper/
    scp -r /zookeeper/zookeeper-3.4.5/ cs2:/zookeeper/

3.5 Note: be sure to change the content of myid on the other nodes

On cs1, change the content of myid to 2 (echo "2" > myid)
On cs2, change the content of myid to 3 (echo "3" > myid)
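
Since the server.N lines and the myid files must agree on every node, it can help to derive both from one host list. The sketch below is a hypothetical helper, not part of ZooKeeper. It renders only the lines edited in step 3.2; the other settings that come with zoo_sample.cfg (such as tickTime and clientPort) are assumed to stay in the file unchanged.

```python
def render_server_lines(data_dir, hosts, peer_port=2888, election_port=3888):
    """Render the dataDir line and one server.N line per host, N starting at 1."""
    lines = [f"dataDir={data_dir}"]
    for server_id, host in enumerate(hosts, start=1):
        lines.append(f"server.{server_id}={host}:{peer_port}:{election_port}")
    return "\n".join(lines) + "\n"


def myid_for(host, hosts):
    """Content of the myid file on `host`: its N from the server.N line."""
    return str(hosts.index(host) + 1)


if __name__ == "__main__":
    hosts = ["cs0", "cs1", "cs2"]
    print(render_server_lines("/zookeeper/zookeeper-3.4.5/data", hosts), end="")
    for host in hosts:
        print(f"{host}: myid = {myid_for(host, hosts)}")
```

Keeping a single host list avoids the most common mistake in this step, which is copying the directory to another node and leaving the original node's myid in place.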

4. Start the cluster from the extracted directory
Start ZooKeeper on each node separately:
    bin/zkServer.sh start
Check the status of each node (leader or follower):
    bin/zkServer.sh status
