1. What is ZooKeeper?
ZooKeeper is a reliable coordination service for large-scale distributed systems. It provides features such as configuration maintenance, naming, distributed synchronization, and group services. Its goal is to encapsulate these complex and error-prone coordination services and deliver them to users through a simple interface, with an efficient and robust implementation.
2. Zookeeper features
ZooKeeper mainly includes the following features:
1), Eventual consistency: all clients eventually see the same view of the data. This is ZooKeeper's most important guarantee.
2), Reliability: once an update is accepted by one server, it will be accepted by all servers.
3), Timeliness: ZooKeeper does not guarantee that two clients see a newly updated value at the same moment. A client that needs the latest data should call the sync() interface before reading.
4), Wait-free: a slow or failed client does not block requests from fast clients.
5), Atomicity: an update either succeeds or fails; there is no intermediate state.
6), Sequentiality: all servers apply the same messages in the same order.
3. ZooKeeper Fundamentals
ZooKeeper Architecture
Let's take a look at the architecture diagram of ZooKeeper first.
From the ZooKeeper architecture diagram above, the following points are worth noting:
1), each server keeps a copy of the data in memory.
2), when ZooKeeper starts, a leader is elected from the ensemble (the election algorithm is based on Paxos; for now it is enough to know such a protocol exists).
3), the leader is responsible for handling data updates and similar operations (using the Zab protocol; again, knowing it exists is enough here).
4), an update succeeds if and only if a majority of the servers have successfully modified the data in memory.
Zookeeper role
There are three main categories of roles in ZooKeeper, as shown in the following table:

| Role | Description |
| --- | --- |
| Leader | Initiates and decides votes, and updates system state. |
| Learner / Follower | Receives client requests, returns results to the client, and votes during leader election. |
| Observer | Accepts client connections and forwards write requests to the leader, but does not take part in voting; it only synchronizes the leader's state. Observers exist to scale the system out and improve read performance. |
| Client | The application client, the initiator of requests. |
Thinking: 1, why do we need the Observer role?
① ZooKeeper must guarantee high availability and strong consistency.
② To support more clients, more servers have to be added.
③ But adding followers increases the latency of the voting phase and hurts performance, hence the Observer role.
2, what role does the Observer play in ZooKeeper?
① An Observer does not take part in voting; it only synchronizes the leader's state.
② An Observer accepts client connections and forwards write requests to the leader.
③ Adding more Observer nodes therefore improves scalability without hurting write throughput.
3, why is the number of ZooKeeper servers usually odd?
We know that ZooKeeper's leader election algorithm is based on Paxos, whose core idea is that data is considered written successfully once a majority of servers have written it.
① With 3 servers, at most 1 server may fail.
② With 4 servers, still at most 1 server may fail.
Since 3 and 4 servers both tolerate at most 1 failure, they provide the same reliability, so an odd number of ZooKeeper servers is preferred: 3 servers give the same fault tolerance as 4 at lower cost.
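The quorum arithmetic above can be checked with a few lines of Python (a standalone sketch of the majority rule, not ZooKeeper code):

```python
def tolerated_failures(n_servers: int) -> int:
    """A quorum needs a strict majority, so the cluster stays available
    as long as more than half of the servers are alive."""
    quorum = n_servers // 2 + 1
    return n_servers - quorum

# 3 and 4 servers both tolerate only 1 failure, so the 4th adds no resilience.
for n in (3, 4, 5):
    print(n, "servers tolerate", tolerated_failures(n), "failure(s)")
```

Running this shows 3 and 4 servers both tolerate 1 failure, while 5 servers tolerate 2, which is why ensembles grow in odd steps (3, 5, 7, ...).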
ZooKeeper Write Data Flow
The flowchart for ZooKeeper writing data is shown below.
ZooKeeper's write data flow is mainly divided into the following steps:
1), the Client sends a write request to, say, ZooKeeper Server1.
2), if Server1 is not the leader, it forwards the request to the leader (each ZooKeeper ensemble has exactly one leader). The leader broadcasts the write request to every server, for example Server1 and Server2, and each server notifies the leader once it has written the data successfully.
3), when the leader has received success notifications from a majority of the servers, the write is considered successful. With three nodes, it is enough for two of them to write successfully. After the write succeeds, the leader tells Server1 that the data was written successfully.
4), Server1 in turn notifies the Client that the write succeeded, and the whole write operation is considered complete.
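The steps above can be sketched as a toy in-memory simulation of the leader's quorum commit (the class and function names here are illustrative, not ZooKeeper's API):

```python
class Server:
    """Toy stand-in for one ensemble member."""
    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive
        self.data = {}

    def write(self, key, value):
        if not self.alive:
            return False          # a down server cannot acknowledge
        self.data[key] = value
        return True               # notify the leader of success

def leader_write(servers, key, value):
    """Broadcast the write; report success only if a majority acked."""
    acks = sum(s.write(key, value) for s in servers)
    return acks > len(servers) // 2

cluster = [Server("s1"), Server("s2"), Server("s3", alive=False)]
print(leader_write(cluster, "/config", "v1"))  # True: 2 of 3 servers acked
```

With one of three servers down the write still commits, matching step 3) above.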
4. ZooKeeper Application Scenario Summary
- Unified Naming Service
1, in a distributed environment, applications/services often need unified naming so that different services can be identified easily.
1) This is similar to the relationship between domain names and IP addresses: IPs are hard to remember, domain names are not.
2) By name, a client can obtain the address, provider, and other information of a resource or service.
2, service/application names can be organized hierarchically.
1) Service names and address information can be written to ZooKeeper, and clients obtain the list of available services from ZooKeeper.
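As a toy illustration of the naming idea, the snippet below uses a plain dict as a stand-in for the znode tree (the paths and helper names are hypothetical, not the real client API): providers register under a service path, and clients list the children to discover instances.

```python
# In-memory stand-in for the znode tree.
tree = {}

def create(path, data):
    """Register a znode, e.g. one per service instance."""
    tree[path] = data

def get_children(path):
    """List direct children of a path, like ZooKeeper's getChildren."""
    prefix = path.rstrip("/") + "/"
    return sorted(p[len(prefix):] for p in tree
                  if p.startswith(prefix) and "/" not in p[len(prefix):])

# Two providers of the (hypothetical) "orders" service register themselves.
create("/services/orders/host-a:8080", b"")
create("/services/orders/host-b:8080", b"")
print(get_children("/services/orders"))  # ['host-a:8080', 'host-b:8080']
```

A real deployment would use a ZooKeeper client library and ephemeral znodes so that crashed providers disappear from the list automatically.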
- Configuration Management
The configuration management structure diagram is shown below.
1, in a distributed environment, managing and synchronizing configuration files is a common problem.
1) In a cluster, the configuration of all nodes should be consistent, for example in a Hadoop cluster.
2) After a configuration file is modified, the change should be synchronized to every node quickly.
2, configuration management can be implemented with ZooKeeper.
1) The configuration information is written to a znode on ZooKeeper.
2) Each node watches (listens to) this znode.
3) Once the data in the znode is modified, ZooKeeper notifies each node.
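The watch-and-notify cycle can be mimicked in a few lines of Python (a toy model; real ZooKeeper watches are also one-shot and must be re-registered, which is the detail the example highlights):

```python
class ConfigZnode:
    """Toy znode with one-shot watches, mimicking ZooKeeper's notify-on-change."""
    def __init__(self, data):
        self.data = data
        self._watchers = []

    def get(self, watch=None):
        if watch:
            self._watchers.append(watch)   # watches fire once, then are cleared
        return self.data

    def set(self, data):
        self.data = data
        watchers, self._watchers = self._watchers, []
        for w in watchers:
            w(data)                        # notify every registered node

seen = []
node = ConfigZnode(b"timeout=30")

def on_change(data):
    seen.append(data)
    node.get(watch=on_change)   # re-register, as real clients must

node.get(watch=on_change)       # a cluster node starts watching the config
node.set(b"timeout=60")         # an admin updates the config znode
print(seen)                     # [b'timeout=60']
```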
- Cluster Management
1, in a distributed environment, the state of every node must be known in real time.
1) Adjustments can then be made according to the nodes' real-time state.
2, this can be delegated to ZooKeeper.
1) Node information is written to a znode on ZooKeeper.
2) Watching this znode yields its state changes in real time.
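The usual mechanism is that each node creates an ephemeral znode tied to its session; when the session dies, the znode disappears. A toy model of that behavior (class names are illustrative):

```python
class Session:
    """Toy session: the ephemeral znode vanishes when the session closes,
    which is how ZooKeeper detects a dead node."""
    def __init__(self, tree, path):
        self.tree, self.path = tree, path
        tree.add(path)          # node registers itself on startup

    def close(self):            # crash or disconnect
        self.tree.discard(self.path)

live_nodes = set()              # stands in for the children of /cluster
s1 = Session(live_nodes, "/cluster/node1")
s2 = Session(live_nodes, "/cluster/node2")
s2.close()                      # node2 "crashes"
print(sorted(live_nodes))       # ['/cluster/node1']
```

A monitor watching the /cluster znode's children would be notified of node2's departure immediately.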
3, typical application
1) Master status monitoring and election in HBase.
- Distributed notification and coordination
1, in a distributed environment, a service often needs to know the state of the sub-services it manages.
1) The NameNode needs to know the status of each DataNode.
2) The JobTracker needs to know the status of each TaskTracker.
2, a heartbeat detection mechanism can be implemented with ZooKeeper.
3, information push can also be implemented with ZooKeeper, which then acts as a publish/subscribe system.
- Distributed Lock
Different services on different nodes may need to access certain resources in a specific order, which requires a distributed lock. Distributed locks have the following characteristics:
1, ZooKeeper is strongly consistent. For example, if a ZooKeeper client runs on every node and they all try to create the same znode at the same time, only one client's create succeeds.
2, exclusive lock: the client whose znode creation succeeds holds the lock, while the other clients wait. When the current client is done with the lock, it deletes the znode, and the other clients try again to create the znode in order to acquire the lock.
3, ordered locking: each client creates a sequential ephemeral znode (CreateMode.EPHEMERAL_SEQUENTIAL) under a parent znode, so that the znode sequence numbers determine the global order of access.
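The sequence-number ordering can be simulated in plain Python (a toy model of the lock directory; the class and method names are illustrative, not a client API):

```python
import itertools

class LockDir:
    """Toy lock directory: each client creates a sequential znode;
    the lowest sequence number holds the lock."""
    def __init__(self):
        self._seq = itertools.count()
        self.nodes = []                       # (znode name, client) pairs

    def create_sequential(self, client):
        node = f"lock-{next(self._seq):010d}" # ZooKeeper zero-pads sequences
        self.nodes.append((node, client))
        return node

    def holder(self):
        return min(self.nodes)[1] if self.nodes else None

    def release(self, node):                  # delete, or session expiry
        self.nodes = [(n, c) for n, c in self.nodes if n != node]

d = LockDir()
a = d.create_sequential("A")
b = d.create_sequential("B")
print(d.holder())   # 'A' holds the lock (lowest sequence)
d.release(a)        # A's ephemeral znode is deleted
print(d.holder())   # 'B' acquires the lock next
```

In the real recipe each client watches only the znode immediately before its own, which avoids a thundering herd when the lock is released.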
- Distributed Queue
Distributed queues come in two kinds:
1, a queue that only becomes available once all of its members have gathered; until then it waits for the remaining members to arrive. This is a synchronization queue (barrier).
1) For example, a job consists of multiple tasks, and the job only runs after all tasks have completed.
2) A /job directory can be created for the job; each completed task then creates a temporary znode under that directory. Once the number of temporary znodes reaches the total number of tasks, the job run is complete.
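The barrier condition is just a count of children under the job znode; a minimal sketch (the /job directory is modeled as a dict):

```python
def barrier_done(job_dir: dict, total_tasks: int) -> bool:
    """The job is complete once the number of per-task znodes
    under the job directory reaches the total task count."""
    return len(job_dir) >= total_tasks

job = {}                         # stands in for the /job znode's children
for t in range(3):
    job[f"task-{t}"] = "done"    # each finished task creates a child znode
print(barrier_done(job, 4))      # False: one task still running
job["task-3"] = "done"
print(barrier_done(job, 4))      # True: all tasks have arrived
```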
2, a queue with FIFO enqueue and dequeue operations, for example to implement the producer/consumer model.
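The FIFO variant also rests on sequential znodes: producers create numbered children and consumers remove the lowest-numbered one. A toy in-memory sketch (illustrative names, not a client API):

```python
import itertools

class ZkQueue:
    """Toy FIFO queue on sequential znodes: producers create
    queue-<seq> children, consumers take the lowest sequence."""
    def __init__(self):
        self._seq = itertools.count()
        self.items = {}

    def offer(self, data):
        self.items[f"queue-{next(self._seq):010d}"] = data

    def take(self):
        name = min(self.items)    # lowest sequence number = oldest item
        return self.items.pop(name)

q = ZkQueue()
q.offer("a")
q.offer("b")
print(q.take(), q.take())  # a b
```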
5. ZooKeeper Installation Deployment
1. Upload Zookeeper installation package
2. Unzip
tar -zxvf zookeeper-3.4.5.tar.gz -C /zookeeper/
3. Configuration (first configured on a single node)
3.1 Adding a zoo.cfg configuration file
Under the extracted directory /zookeeper/zookeeper-3.4.5/conf, rename zoo_sample.cfg to zoo.cfg:
mv zoo_sample.cfg zoo.cfg
3.2 Modifying a configuration file (zoo.cfg)
dataDir=/zookeeper/zookeeper-3.4.5/data
server.1=cs0:2888:3888
server.2=cs1:2888:3888
server.3=cs2:2888:3888
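Besides dataDir and the server list, a working zoo.cfg also needs the basic timing and client-port settings. A typical complete file looks like the following (the values shown are common defaults; adjust them to your environment):

```properties
# basic time unit in milliseconds
tickTime=2000
# ticks a follower may take to connect and sync with the leader at startup
initLimit=10
# ticks a follower may lag behind the leader before being dropped
syncLimit=5
# where snapshots and the myid file live
dataDir=/zookeeper/zookeeper-3.4.5/data
# port clients connect to
clientPort=2181
# ensemble members: server.N=host:peerPort:electionPort
server.1=cs0:2888:3888
server.2=cs1:2888:3888
server.3=cs2:2888:3888
```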
3.3 Create a myid file in dataDir (/zookeeper/zookeeper-3.4.5/data) whose content is the N in server.N (for example, the content for server.2 is 2):
echo "1" > myid
3.4 Copy the configured ZooKeeper to the other nodes
scp -r /zookeeper/zookeeper-3.4.5/ cs1:/zookeeper/
scp -r /zookeeper/zookeeper-3.4.5/ cs2:/zookeeper/
3.5 Note: be sure to modify the content of myid on the other nodes.
On cs1, change the myid content to 2 (echo "2" > myid).
On cs2, change the myid content to 3 (echo "3" > myid).
4. Start the cluster
Start ZooKeeper on each node from its installation directory:
bin/zkServer.sh start
Check each node's status:
bin/zkServer.sh status