Structure of this article:
The series has 10 parts in total. The first five are:
Zookeeper Series One: Zookeeper introduction
Zookeeper Series II: Zookeeper Data Model, namespace, and node concepts
Zookeeper Series III: Installation of Zookeeper
Zookeeper Series IV: Zookeeper Configuration
Zookeeper Series V: Zookeeper operation
---------------------------
Zookeeper Series One: Zookeeper introduction
Zookeeper is a distributed, open source coordination service designed for distributed applications. Distributed applications can build on it to implement higher-level services such as synchronization, configuration management, grouping, and naming. Zookeeper is designed to be easy to program against, and its data model uses the directory tree structure of the file systems we are familiar with. Zookeeper is written in Java and supports two programming languages for clients: Java and C.
As we all know, coordination services are hard to get right: they are very error-prone, and once something goes wrong it is difficult to get back to normal. For example, a coordination service can easily end up in a deadlock. Zookeeper was designed to relieve distributed applications of the coordination tasks they would otherwise have to implement themselves.
Zookeeper Series II: Zookeeper Data Model, namespace, and node concepts
Zookeeper data model and hierarchical namespace
The namespace that Zookeeper provides is very similar to that of a standard file system. A name is a sequence of path elements separated by a slash (/). Each node in Zookeeper is identified by a path.
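As a rough illustration of path-based naming, the sketch below splits a path into its elements and lists the paths of its ancestors. The helper names split_znode_path and ancestor_paths are hypothetical and are not part of any Zookeeper API; the validation rules are a simplified assumption.

```python
def split_znode_path(path):
    """Split an absolute, slash-separated path into its name elements."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute, i.e. start with '/'")
    if path == "/":
        return []  # the root node has no name elements
    if path.endswith("/"):
        raise ValueError("path must not end with '/'")
    parts = path[1:].split("/")
    if any(part == "" for part in parts):
        raise ValueError("path must not contain empty elements")
    return parts


def ancestor_paths(path):
    """Return the paths of all ancestors of a node, starting at the root."""
    parts = split_znode_path(path)
    if not parts:
        return []  # the root has no ancestors
    ancestors = ["/"]
    for i in range(1, len(parts)):
        ancestors.append("/" + "/".join(parts[:i]))
    return ancestors


print(split_znode_path("/app1/p_1"))
print(ancestor_paths("/app1/p_1"))
```

The example path /app1/p_1 is made up for illustration; any absolute path works the same way.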
The following figure shows the data model of the nodes in Zookeeper; the tree-like structure is easy to operate on and easy to understand.
Figure: Zookeeper hierarchical namespace
Nodes and temporary (ephemeral) nodes in Zookeeper
Zookeeper nodes are organized in a tree-like structure, and each node is identified and accessed by its path. In addition, each node carries some information of its own, including its data, data length, creation time, modification time, and so on. Because a node holds both data and a path to its children, a Zookeeper node can be regarded as both a file and a directory, with the characteristics of both. For ease of expression, we will use the term znode for the Zookeeper nodes discussed from here on.
Specifically, a znode maintains a data structure that includes version numbers for its data and its ACL (access control list), together with timestamps; these allow caches to be validated and updates to be coordinated. Each time the data in a znode is updated, its version number increases, much like a version counter or timestamp column in a database.
In addition, znode operations are atomic: the data of every znode in the namespace is read and written atomically. A read returns all the data associated with the znode, and a write replaces all of it. Each node also has an access control list that restricts which users may perform which actions.
Zookeeper also has temporary (ephemeral) nodes. Such a node exists only as long as the session that created it, and it is deleted when the session's life cycle ends. Temporary nodes play a very important role in some situations.
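The behaviour described above can be sketched with a small in-memory model. This is a toy under stated assumptions, not a Zookeeper client: the class ToyZnodeStore and its method names are hypothetical, and the real service adds persistence, ACL checks, and replication.

```python
class ToyZnodeStore:
    """In-memory toy model of znodes (hypothetical, for illustration only)."""

    def __init__(self):
        # path -> {"data": bytes, "version": int, "owner": session id or None}
        self.nodes = {}

    def create(self, path, data, session=None):
        """Create a node; passing a session id makes it a temporary node."""
        if path in self.nodes:
            raise KeyError("node already exists: " + path)
        self.nodes[path] = {"data": data, "version": 0, "owner": session}

    def get(self, path):
        """A read returns the whole data of the node plus its version."""
        node = self.nodes[path]
        return node["data"], node["version"]

    def set(self, path, data, expected_version):
        """A write replaces all the data and increments the version."""
        node = self.nodes[path]
        if node["version"] != expected_version:
            raise ValueError("version mismatch, node was updated concurrently")
        node["data"] = data
        node["version"] += 1
        return node["version"]

    def close_session(self, session):
        """Ending a session deletes the temporary nodes it created."""
        if session is None:
            return
        owned = [p for p, n in self.nodes.items() if n["owner"] == session]
        for path in owned:
            del self.nodes[path]


store = ToyZnodeStore()
store.create("/config", b"v1")                   # ordinary node
store.create("/workers/w1", b"", session="s1")   # temporary node
data, version = store.get("/config")
new_version = store.set("/config", b"v2", expected_version=version)
print(store.get("/config"), new_version)
store.close_session("s1")
print(sorted(store.nodes))
```

Passing the expected version on a write is what lets a client detect that someone else updated the node in the meantime.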
---------------------------
Zookeeper Series III: Installation of Zookeeper
Zookeeper can be installed in three modes: stand-alone mode, cluster mode, and cluster pseudo-distributed mode. Stand-alone installation is relatively simple; if this is your first contact with Zookeeper, installing it in stand-alone mode or cluster pseudo-distributed mode is recommended.
1) Stand-alone mode
First, download the latest stable version of Zookeeper from the Apache official website:
http://hadoop.apache.org/zookeeper/releases.html
For users in China, choosing the nearest mirror server can save a lot of download time:
http://labs.renren.com/apache-mirror//hadoop/zookeeper/
Zookeeper requires a Java environment, version Java 6 or later, which can be downloaded from the Sun official website; the Java environment variables must also be set. In addition, to make later operations easier, we configure the Zookeeper environment variables by adding the following lines to the /etc/profile file:
#Set Zookeeper environment
export ZOOKEEPER_HOME=/root/hadoop-0.20.2/zookeeper-3.3.1
export PATH=$PATH:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf
The Zookeeper server is contained in a single JAR file; to install the service, the user needs to create a configuration file and set it up. In the conf folder under the zookeeper-*.*.* directory (we take 3.3.1, the latest Zookeeper version at the time of writing, as the example, so "zookeeper-*.*.*" is written as "zookeeper-3.3.1" from here on), create a zoo.cfg file with the following contents:
tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
In this file, we need to specify the value of dataDir, which must point to a directory that is empty at the beginning. Note that parameter names are case-sensitive. The meaning of each parameter is as follows:
tickTime: The basic time unit, in milliseconds. It is used for heartbeats, and the minimum session timeout is twice the tickTime.
dataDir: The location where snapshots of the in-memory database are stored. Unless another location is configured, the transaction log of updates is stored here as well.
clientPort: The port on which the server listens for client connections.
When using stand-alone mode, users need to be aware that there is no Zookeeper replica in this configuration, so if the Zookeeper server fails, the Zookeeper service stops.
Code Listing A below is the Zookeeper configuration file, zoo.cfg, that we set up for our own environment:
Code Listing A: zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The directory where the snapshot is stored.
dataDir=/root/hadoop-0.20.2/zookeeper-3.3.1/snapshot/data
# The port at which the clients will connect
clientPort=2181
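Before starting a server it can help to check that a configuration file contains the three parameters described above. The following is a minimal sketch, assuming the camel-case spellings tickTime, dataDir, and clientPort; the function names parse_zoo_cfg and missing_required_keys are hypothetical helpers, not part of Zookeeper.

```python
REQUIRED_KEYS = ("tickTime", "dataDir", "clientPort")


def parse_zoo_cfg(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("not a key=value line: " + raw_line)
        settings[key.strip()] = value.strip()
    return settings


def missing_required_keys(settings):
    """Return the required parameters that are absent (names are case-sensitive)."""
    return [key for key in REQUIRED_KEYS if key not in settings]


LISTING = """\
# The number of milliseconds of each tick
tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
"""

settings = parse_zoo_cfg(LISTING)
print(settings)
print(missing_required_keys(settings))
# A lower-case key is treated as a different, unknown parameter.
print(missing_required_keys(parse_zoo_cfg("ticktime=2000\ndataDir=/var/zookeeper")))
```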
2) Cluster mode
To obtain a reliable Zookeeper service, users should deploy Zookeeper on a cluster. As long as a majority of the Zookeeper servers in the cluster are running, the service as a whole is available. Because a majority is required, it is best to use an odd number of machines. For example, a Zookeeper cluster of 5 machines can handle the failure of 2 machines.
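The majority rule can be worked through with a few lines of arithmetic. The sketch below is only an illustration; the function names quorum_size and tolerated_failures are made up for this example.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """Number of servers that may fail while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


for n in range(1, 8):
    print(n, "machines: majority =", quorum_size(n),
          ", tolerated failures =", tolerated_failures(n))
```

The loop shows why odd sizes are preferred: going from 5 to 6 machines raises the required majority from 3 to 4 but still tolerates only 2 failures.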
The steps that follow are similar to the stand-alone installation: we again need to set up the Java environment, download the latest stable Zookeeper version, and configure the appropriate environment variables. The difference lies in the conf/zoo.cfg configuration file, which must be set up on each machine; refer to the following configuration:
tickTime=2000
dataDir=/var/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
The "server.id=host:port:port" entries identify the individual Zookeeper servers. Every machine that is part of the cluster must know about the other machines in the ensemble, and it reads this information from the "server.id=host:port:port" lines. In the data directory of each server (the dataDir parameter), create a file named myid that contains a single line with that server's own ID. For example, server "1" should write "1" in its myid file. The ID must be unique within the ensemble and must be between 1 and 255. In each of these configuration lines, the first port is the one that follower machines use to connect to the leader machine, and the second port is the one used for leader election. In this example, each machine therefore uses three ports: the clientPort 2181, port 2888, and port 3888.
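To make the format concrete, the sketch below parses "server.id=host:port:port" lines, applies the 1 to 255 and uniqueness rules, and produces the text for a myid file. All names here (ServerEntry, parse_server_entry, check_unique_ids, myid_contents) are hypothetical helpers written for this example, and the regular expression is a simplified assumption about the line format.

```python
import re
from collections import namedtuple

ServerEntry = namedtuple("ServerEntry", "server_id host quorum_port election_port")

# server.<id>=<host>:<port followers use to reach the leader>:<election port>
_SERVER_LINE = re.compile(r"^server\.(\d+)=([^:]+):(\d+):(\d+)$")


def parse_server_entry(line):
    """Parse one server line and validate the ID range."""
    match = _SERVER_LINE.match(line.strip())
    if match is None:
        raise ValueError("not a server.id=host:port:port line: " + line)
    server_id = int(match.group(1))
    if not 1 <= server_id <= 255:
        raise ValueError("server id must be between 1 and 255")
    return ServerEntry(server_id, match.group(2),
                       int(match.group(3)), int(match.group(4)))


def check_unique_ids(entries):
    """Raise if two entries share the same server ID."""
    ids = [entry.server_id for entry in entries]
    if len(ids) != len(set(ids)):
        raise ValueError("server ids must be unique within the ensemble")


def myid_contents(server_id):
    """Text of the myid file: a single line holding the server's own ID."""
    return "%d\n" % server_id


lines = ["server.1=zoo1:2888:3888",
         "server.2=zoo2:2888:3888",
         "server.3=zoo3:2888:3888"]
entries = [parse_server_entry(line) for line in lines]
check_unique_ids(entries)
for entry in entries:
    print(entry, repr(myid_contents(entry.server_id)))
```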
We tested the Zookeeper service on a Hadoop cluster of three machines. Code Listing B below is the Zookeeper configuration file that we set up for our environment:
Code Listing B: zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# The directory where the snapshot is stored.
dataDir=/root/hadoop-0.20.2/zookeeper-3.3.1/snapshot/d1
# The port at which the clients will connect
clientPort=2181
server.1=ip1:2887:3887
server.2=ip2:2888:3888
server.3=ip3:2889:3889
In the server list, ip1, ip2, and ip3 stand for the IP addresses of the machines in the distributed Zookeeper cluster. Machine names can of course be used instead, but they must then be set up in the hosts file on Ubuntu. Readers can refer to the Ubuntu and Linux documentation for the relevant settings.
3) Cluster pseudo-distributed mode
In short, cluster pseudo-distributed mode is a Zookeeper service that simulates a cluster on a single machine.
So how do we configure Zookeeper in cluster pseudo-distributed mode? In the Zookeeper configuration file, the clientPort parameter sets the port on which clients connect to Zookeeper. In server.1=ip1:2887:3887, ip1 is the IP address of a machine that makes up the Zookeeper service, 2887 is the port used for communication between the machines that make up the service (followers connecting to the leader), and 3887 is the port used for leader election. In cluster pseudo-distributed mode, each configuration file simulates one machine, which means running multiple Zookeeper instances on a single machine. We therefore have to make sure that the ports in the configuration files, the clientPort values in particular, do not conflict.
The following is the cluster pseudo-distributed configuration we set up; zoo1.cfg, zoo2.cfg, and zoo3.cfg simulate a Zookeeper cluster of three machines. See Code Listing C:
Code Listing C: zoo1.cfg:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# The directory where the snapshot is stored.
dataDir=/root/hadoop-0.20.2/zookeeper-3.3.1/d_1
# The port at which the clients will connect
clientPort=2181
server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889
zoo2.cfg:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# The directory where the snapshot is stored.
dataDir=/root/hadoop-0.20.2/zookeeper-3.3.1/d_2
# The port at which the clients will connect
clientPort=2182
#the location of the log file
dataLogDir=/root/hadoop-0.20.2/zookeeper-3.3.1/logs
server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889
zoo3.cfg:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# The directory where the snapshot is stored.
dataDir=/root/hadoop-0.20.2/zookeeper-3.3.1/d_3
# The port at which the clients will connect
clientPort=2183
#the location of the log file
dataLogDir=/root/hadoop-0.20.2/zookeeper-3.3.1/logs
server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889
As you can see from the three code listings above, the files differ mainly in clientPort and dataDir. Also, do not forget to create a myid file in the directory given by each dataDir, so that each Zookeeper server instance knows its own ID.
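Writing three nearly identical files by hand is error-prone, so the layout can also be generated. The sketch below is one possible way to do it, assuming the port scheme of the listings (client ports 2181 to 2183, server ports 2887 to 2889 and 3887 to 3889); the function write_pseudo_cluster and the use of a temporary directory are assumptions made for this example.

```python
import os
import tempfile


def write_pseudo_cluster(base_dir, instances=3):
    """Write zooN.cfg, a d_N data directory, and a myid file per instance."""
    server_lines = ["server.%d=localhost:%d:%d" % (i, 2886 + i, 3886 + i)
                    for i in range(1, instances + 1)]
    cfg_paths = []
    for i in range(1, instances + 1):
        data_dir = os.path.join(base_dir, "d_%d" % i)
        os.makedirs(data_dir)
        # The myid file holds a single line with this instance's ID.
        with open(os.path.join(data_dir, "myid"), "w") as myid_file:
            myid_file.write("%d\n" % i)
        lines = ["tickTime=2000",
                 "initLimit=10",
                 "syncLimit=5",
                 "dataDir=" + data_dir,
                 "clientPort=%d" % (2180 + i)] + server_lines
        cfg_path = os.path.join(base_dir, "zoo%d.cfg" % i)
        with open(cfg_path, "w") as cfg_file:
            cfg_file.write("\n".join(lines) + "\n")
        cfg_paths.append(cfg_path)
    return cfg_paths


demo_dir = tempfile.mkdtemp()  # throwaway directory for the demonstration
for path in write_pseudo_cluster(demo_dir):
    print(path)
```

Each instance gets its own data directory and client port, while the server list is identical in all three files.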
That concludes the installation of Zookeeper. In the next section we look at the Zookeeper configuration parameters.
-------------------------------------------------
Zookeeper Series IV: Zookeeper Configuration
Zookeeper's features are controlled through its configuration file (zoo.cfg). There is a reason for this design: as the configurations in the previous part show, when a Zookeeper cluster is configured, the configuration files on all machines are identical (in cluster pseudo-distributed mode only a few lines differ). This makes deploying the Zookeeper service very convenient. If the servers do use different configuration files, you must make sure that the list of servers is the same in all of them.
Some parameters in the Zookeeper configuration file are optional, while others are required. The required parameters form the minimum configuration of Zookeeper.
The following parameters must be present in the minimum configuration:
1) Minimum Configuration
clientPort
The port on which the server listens for client connections;
dataDir
The location where snapshots of the in-memory database are stored;
Note: choose carefully where the transaction log is stored. Using a dedicated log device can greatly improve the performance of the system, while storing the logs on a busy device has a significant negative impact on performance.
tickTime
The basic time unit, in milliseconds. It is used to control heartbeats and timeouts; by default, the minimum session timeout is twice the tickTime.
2) Advanced Configuration
The following parameters are optional; users can set them to control the behavior of Zookeeper more precisely:
dataLogDir
This parameter makes the server write the transaction log to the directory specified by dataLogDir instead of the directory specified by dataDir. This allows a dedicated log device to be used and helps avoid competition between logging and snapshots. It is configured as follows:
#the location of the log file
dataLogDir=/root/hadoop-0.20.2/zookeeper-3.3.1/log/data_log
maxClientCnxns
This parameter limits the number of concurrent connections that a single client may open to a Zookeeper server; clients are distinguished by IP address. It can be used to block certain classes of DoS attacks. Setting it to 0, or leaving it unset, removes the limit on concurrent connections (note that newer Zookeeper releases apply a default limit when the parameter is omitted).
For example, here we set the value of maxClientCnxns to 1, as follows:
#set maxClientCnxns
maxClientCnxns=1
After starting Zookeeper, first connect to the Zookeeper server with one client. Then, when a second client from the same address attempts to connect, or the first client performs an operation that implicitly opens another connection, the limit is triggered. The system reports the relevant information, as shown in Figure 1 below:
Figure 1: Zookeeper maxClientCnxns exception
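The effect of the limit can be imitated with a small counter per IP address. This is a toy model under stated assumptions, not the server's actual implementation; the class ToyConnectionLimiter and the example addresses are invented for illustration.

```python
from collections import defaultdict


class ToyConnectionLimiter:
    """Counts open connections per client IP and enforces a maximum."""

    def __init__(self, max_client_cnxns):
        # 0 means "no limit", mirroring the description of the parameter.
        self.max_client_cnxns = max_client_cnxns
        self.open_by_ip = defaultdict(int)

    def try_connect(self, ip):
        """Return True if the connection is accepted, False if refused."""
        limit = self.max_client_cnxns
        if limit > 0 and self.open_by_ip[ip] >= limit:
            return False
        self.open_by_ip[ip] += 1
        return True

    def disconnect(self, ip):
        """Release one connection previously accepted for this IP."""
        if self.open_by_ip[ip] > 0:
            self.open_by_ip[ip] -= 1


limiter = ToyConnectionLimiter(max_client_cnxns=1)
print(limiter.try_connect("10.0.0.5"))  # first connection from this address
print(limiter.try_connect("10.0.0.5"))  # second connection from the same address
print(limiter.try_connect("10.0.0.6"))  # a different address has its own count
```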
minSessionTimeout and maxSessionTimeout
The minimum and maximum session timeout. By default, the minimum session timeout is 2 times the tickTime and the maximum session timeout is 20 times the tickTime. At startup, the system displays the corresponding information, as shown in Figure 2 below:
Figure 2: Default Session Timeout
As the figure shows, the minSessionTimeout and maxSessionTimeout values are both -1, which means the defaults are in effect. Now we set the system's minimum and maximum session timeout as follows:
#set minSessionTimeout
minSessionTimeout=1000
#set maxSessionTimeout
maxSessionTimeout=10000
When configuring minSessionTimeout and maxSessionTimeout, note that if the values are too small, a session may time out and be closed right after it has been established. In general, these values should not be set smaller than the value of tickTime.
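The relationship between tickTime and the two bounds can be sketched as follows. The code assumes that -1 stands for "use the default" and that the timeout a client asks for is clamped into the allowed range; the function names session_timeout_bounds and negotiate_session_timeout are hypothetical.

```python
def session_timeout_bounds(tick_time, min_session_timeout=-1, max_session_timeout=-1):
    """Return (low, high) in milliseconds; -1 selects the default multiple of tickTime."""
    low = 2 * tick_time if min_session_timeout == -1 else min_session_timeout
    high = 20 * tick_time if max_session_timeout == -1 else max_session_timeout
    return low, high


def negotiate_session_timeout(requested, tick_time,
                              min_session_timeout=-1, max_session_timeout=-1):
    """Clamp the timeout requested by a client into the allowed range."""
    low, high = session_timeout_bounds(tick_time, min_session_timeout,
                                       max_session_timeout)
    return max(low, min(requested, high))


print(session_timeout_bounds(2000))
print(negotiate_session_timeout(1000, 2000))
print(negotiate_session_timeout(60000, 2000))
print(negotiate_session_timeout(30000, 2000, 1000, 10000))
```

With tickTime=2000 and no explicit limits, the bounds work out to 4000 and 40000 milliseconds.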
3) Cluster configuration
initLimit
The amount of time, expressed in multiples of tickTime, that a follower (a "client" from the leader's point of view) is allowed to take to connect to the leader and complete its initial synchronization. The connection fails if this time is exceeded.
syncLimit
The maximum time, also expressed in multiples of tickTime, allowed for sending a request and receiving a reply between the leader and a follower. If a follower cannot communicate with the leader within this time, the follower is dropped.
---------------------
Zookeeper Series V: Zookeeper operation
Here we describe how to run Zookeeper in each of the installation modes from Zookeeper Series III: Installation of Zookeeper.
1) Stand-alone mode
Users can start the Zookeeper service with the following command:
zkServer.sh start
By default, this command uses the zoo.cfg configuration file in Zookeeper's conf folder. When the server starts successfully, you will see output similar to the following:
root@ubuntu:~# zkServer.sh start
JMX enabled by default
Using config: /root/hadoop-0.20.2/zookeeper-3.3.1/bin/../conf/zoo.cfg
Starting zookeeper ...
STARTED
... ...
2011-01-19 10:04:42,300 - WARN [main:QuorumPeerMain@105] - Either no config or no quorum defined in config, running in standalone mode
... ...
2011-01-19 10:04:42,419 - INFO [main:ZooKeeperServer@660] - tickTime set to 2000
2011-01-19 10:04:42,419 - INFO [main:ZooKeeperServer@669] - minSessionTimeout set to -1
2011-01-19 10:04:42,419 - INFO [main:ZooKeeperServer@678] - maxSessionTimeout set to -1
2011-01-19 10:04:42,560 - INFO [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:2181
2011-01-19 10:04:42,806 - INFO [main:FileSnap@82] - Reading snapshot /root/hadoop-0.20.2/zookeeper-3.3.1/data/version-2/snapshot.200000036
2011-01-19 10:04:42,927 - INFO [main:FileSnap@82] - Reading snapshot /root/hadoop-0.20.2/zookeeper-3.3.1/data/version-2/snapshot.200000036
2011-01-19 10:04:42,950 - INFO [main:FileTxnSnapLog@208] - Snapshotting: 400000058
As you can see, after a successful start the system lists the relevant environment and configuration information of the running Zookeeper instance.
2) Cluster mode
Cluster mode requires the user to run the command from the first part on each Zookeeper machine; we will not repeat it here.
3) Cluster pseudo-distributed mode
In cluster pseudo-distributed mode, we have only one machine but want to run three Zookeeper service instances. Using the command above unchanged will not work in this case. Instead, we run the command three times, once for each configuration file that we set up in Zookeeper Series III: Installation of Zookeeper, as follows:
zkServer.sh start zoo1.cfg
zkServer.sh start zoo2.cfg
zkServer.sh start zoo3.cfg
After running the first command, you will see some error messages, as shown in Figure 1 :