1. Overview
The ZooKeeper distributed service framework is a sub-project of Apache Hadoop. It is mainly used to solve data management problems frequently encountered in distributed applications, such as: unified naming service, status synchronization service, cluster management, and management of distributed application configuration items. ZooKeeper can be installed and run in standalone mode, but it is the distributed ZooKeeper cluster (one leader and multiple followers, elected according to certain policies) that provides stability and availability, and thereby makes distributed applications reliable. ZooKeeper maintains a hierarchical data structure that is very similar to a standard file system, as shown in the sketch below.
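A minimal sketch of such a tree (Server1 is the example used below; Server2 is hypothetical, added only for illustration):

/
└── NameService
    ├── Server1
    └── Server2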
The Zookeeper data structure has the following features:
Each sub-directory item, such as NameService, is called a znode. A znode is uniquely identified by its path; for example, the znode for Server1 is /NameService/Server1.
A znode can have child znodes, and each znode can store data. Note that znodes of the EPHEMERAL type cannot have children.
Znodes are versioned: each update to a znode's data increments its version number, so one access path is associated with multiple versions of data over time.
A znode can be a temporary (ephemeral) node: once the client that created it loses contact with the server, the znode is automatically deleted. ZooKeeper clients communicate with the server over a persistent connection kept alive by heartbeats; this connection state is called a session. If a znode is ephemeral and its creating session becomes invalid, the znode is deleted.
Znode names can be automatically numbered: when a node is created with the SEQUENTIAL flag, the server appends an ever-increasing sequence number to the requested name, so creating App when App1 already exists yields App2 rather than a name collision (in practice the suffix is a zero-padded counter such as App0000000002).
Znodes can be watched: both changes to the data stored in a znode and changes to its list of children can be monitored, and the watching client is notified once a change occurs. This is the core feature of ZooKeeper; many of its functions are built on it, and examples appear in the typical application scenarios below. The zkCli.sh sketch after this list exercises these features.
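These features can be tried directly with the zkCli.sh shell that ships with ZooKeeper. A minimal session sketch, assuming a server is already running on 127.0.0.1:2181 (the paths and data values are made up for illustration; the watch syntax shown is the 3.4-era "get path watch" form, while 3.5+ uses "get -w path"):

zkCli.sh -server 127.0.0.1:2181
create /NameService ""                          # persistent znode
create -e /NameService/Server1 "192.168.56.1"   # ephemeral: deleted when this session ends
create -s /NameService/App ""                   # sequential: server appends a counter, e.g. /NameService/App0000000001
get /NameService/Server1 watch                  # read the data and set a one-shot watch on it
ls /NameService watch                           # set a one-shot watch on the child list

The next modification of /NameService/Server1 (or of /NameService's children) triggers a notification in this session, after which the watch must be set again.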
2. Environment deployment
The deployment of the Zookeeper cluster is based on the Hadoop cluster deployed in the previous article. The cluster configuration is as follows:
Zookeeper1 rango 192.168.56.1
Zookeeper2 vm2 192.168.56.102
Zookeeper3 vm3 192.168.56.103
Zookeeper4 vm4 192.168.56.104
Zookeeper5 vm1 192.168.56.101
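If the node hostnames are not already resolvable from the earlier Hadoop setup, a corresponding /etc/hosts mapping on every node would look like this (an assumption based on the table above; adjust to your network):

192.168.56.1    rango
192.168.56.101  vm1
192.168.56.102  vm2
192.168.56.103  vm3
192.168.56.104  vm4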
3. Installation and configuration
3.1 Download and install Zookeeper
Download the latest ZooKeeper release from the Apache official website, decompress it into the /usr directory, and rename it zookeeper:
tar zxvf zookeeper-3.4.5.tar.gz; mv zookeeper-3.4.5 /usr/zookeeper
Set the zookeeper directory owner to hadoop:
chown -R hadoop:hadoop /usr/zookeeper
PS: you can install and configure everything on the master machine first, then copy the directory to the other cluster nodes with scp:
scp -r /usr/zookeeper <node-ip>:/usr
3.2 Configure Zookeeper
3.2.1 Create a data directory
Run:
mkdir /var/lib/zookeeper
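Since /usr/zookeeper was handed to the hadoop user above, it is reasonable to give that user the data directory as well (an assumption; skip this if you run ZooKeeper as root):

chown -R hadoop:hadoop /var/lib/zookeeper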
3.2.2 Configure environment variables
vim /etc/profile:
# set zookeeper path
export ZOOKEEPER_HOME=/usr/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
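Reload the profile so the current shell picks up the new variables (a standard step):

source /etc/profile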
3.2.3 Configure the Zookeeper cluster
cp /usr/zookeeper/conf/zoo_sample.cfg /usr/zookeeper/conf/zoo.cfg
vim /usr/zookeeper/conf/zoo.cfg:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# The directory where the snapshot is stored.
# Do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/var/lib/zookeeper
# The port at which the clients will connect
clientPort=2181
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=192.168.56.1:2888:3888
server.2=192.168.56.102:2888:3888
server.3=192.168.56.103:2888:3888
server.4=192.168.56.104:2888:3888
server.5=192.168.56.101:2888:3888
Note:
tickTime: the heartbeat interval, in milliseconds.
initLimit and syncLimit are both measured in ticks (with the values above, initLimit is 10 * 2000 ms = 20 s). initLimit bounds the time followers get to connect to the leader and finish the initial synchronization; if more than half of the followers fail to synchronize within this window, the leader gives up its leadership and a new leader election is held. If this happens frequently (it shows up in the logs), the value is probably set too small.
syncLimit bounds the time a follower gets to synchronize with the leader during normal operation. A follower that fails to synchronize within this time restarts itself, and all clients attached to it reconnect to another follower.
dataDir: the directory where ZooKeeper stores its persistent data. ZooKeeper holds two types of data, one that disappears after use and one that exists permanently; ZooKeeper's transaction log is also saved here by default.
server.A=B:C:D: A is a number identifying the server (its id); B is the server's IP address; C is the port this server uses to exchange information with the cluster's leader; and D is the port used among servers to elect a new leader when the current leader fails. In a pseudo-cluster (all instances on one host), B is identical for every entry, so each ZooKeeper instance must be assigned its own C and D ports, as sketched below.
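A hypothetical pseudo-cluster entry set on a single host (illustration only, not part of this deployment):

server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890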
Create a myid file in the data directory of each server. The file's content is the number A from that server's server.A line above:
echo <id> > /var/lib/zookeeper/myid
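For this cluster the mapping follows the server.N lines above, for example:

echo 1 > /var/lib/zookeeper/myid    # on rango (192.168.56.1)
echo 2 > /var/lib/zookeeper/myid    # on vm2 (192.168.56.102)
echo 3 > /var/lib/zookeeper/myid    # on vm3 (192.168.56.103)
echo 4 > /var/lib/zookeeper/myid    # on vm4 (192.168.56.104)
echo 5 > /var/lib/zookeeper/myid    # on vm1 (192.168.56.101)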
3.3 Start and stop the Zookeeper service
Start ZooKeeper by running zkServer.sh start on every node in the cluster:
[root@rango ~]# zkServer.sh start
JMX enabled by default
Using config: /usr/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
Check the status with zkServer.sh status:
[root@rango ~]# zkServer.sh status
JMX enabled by default
Using config: /usr/zookeeper/bin/../conf/zoo.cfg
Mode: follower
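Exactly one node in a healthy cluster reports Mode: leader; the rest report Mode: follower. The same script stops the service:

zkServer.sh stop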
PS: disable iptables before starting (assuming the cluster sits on a trusted internal network).
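For example, on a RHEL/CentOS 6-style system (an assumption; the exact commands depend on your distribution):

service iptables stop       # stop the firewall now
chkconfig iptables off      # keep it off across reboots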