Zookeeper Series Collection (comprehensive and detailed)

Last Update:2018-07-26 Source: Internet

Author: User


Structure of this article:

A total of 10 series

Zookeeper Series One: Zookeeper introduction

Zookeeper Series II: Zookeeper Data Model, namespace, and node concepts

Zookeeper Series III: Installation of Zookeeper

Zookeeper Series IV: Zookeeper Configuration

Zookeeper Series V: Zookeeper Operation

---------------------------

Zookeeper Series One: Zookeeper introduction

Zookeeper is a distributed, open source coordination service for distributed applications. Distributed applications can build on it to implement higher-level services such as synchronization, configuration management, groups, and naming. Zookeeper is designed to be easy to program against, and its data model is organized like the familiar directory tree of a file system. Zookeeper itself is written in Java, and it provides client bindings for both Java and C.

As is well known, coordination services are error-prone and hard to recover once something goes wrong; for example, they can easily end up in deadlock. Zookeeper was designed to relieve distributed applications of the coordination tasks they would otherwise have to implement themselves.

Zookeeper Series II: Zookeeper Data Model, namespace, and node concepts

Zookeeper data model and hierarchical namespace

The namespace that Zookeeper provides is very similar to that of a standard file system. A name is a sequence of path elements separated by slashes (/), and every node in Zookeeper is identified by its path.

The following figure shows the data model of the nodes in Zookeeper. The tree structure is easy to work with and easy to understand.

Figure: Zookeeper hierarchical namespace

Nodes and ephemeral nodes in Zookeeper

Zookeeper nodes are organized in a tree, and each node is identified and accessed by its path. Each node also carries information of its own, including its data, data length, creation time, modification time, and so on. Because a node holds data and is also a step in the path to other nodes, a Zookeeper node can be regarded as both a file and a directory. For brevity, the rest of this article uses the term znode for a Zookeeper node.

Specifically, a znode maintains a data structure that includes version numbers for data changes, ACL (access control list) changes, and timestamps, which allows cache validation and coordinated updates. Each time the data of a znode changes, its version number increases, much like an update counter or timestamp column in a database.

Znode operations are also atomic: the data of each znode in the namespace is read and written atomically. A read returns all of the data associated with the znode, and a write replaces all of the data. In addition, each node has an access control list that restricts who can do what.

Zookeeper also has temporary (ephemeral) nodes. These nodes exist only as long as the session that created them is active, and they are deleted when that session ends. Ephemeral nodes play a very important role in some scenarios.
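
The znode properties described in this part (path addressing, a version counter per node, whole-value reads and writes, and ephemeral nodes tied to a session) can be illustrated with a small in-memory model. This is only a sketch for intuition, not Zookeeper's implementation or client API: the names `ToyZnodeTree`, `create`, `set_data`, and `close_session` are hypothetical, and sessions are represented by plain integers.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Znode:
    data: bytes
    version: int = 0                       # incremented on every data change
    ephemeral_owner: Optional[int] = None  # session id, or None if persistent


class ToyZnodeTree:
    """Toy model of a znode namespace keyed by slash-separated paths."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Znode] = {"/": Znode(b"")}

    def create(self, path: str, data: bytes, session: Optional[int] = None) -> None:
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = Znode(data, ephemeral_owner=session)

    def get(self, path: str) -> Tuple[bytes, int]:
        node = self.nodes[path]
        return node.data, node.version     # a read returns the whole value

    def set_data(self, path: str, data: bytes, expected_version: int) -> int:
        node = self.nodes[path]
        if expected_version != node.version:
            raise ValueError("version mismatch: node was updated in the meantime")
        node.data = data                   # a write replaces the whole value
        node.version += 1
        return node.version

    def close_session(self, session: int) -> None:
        # Ephemeral nodes disappear when the session that created them ends.
        owned = [p for p, n in self.nodes.items() if n.ephemeral_owner == session]
        for path in owned:
            del self.nodes[path]


tree = ToyZnodeTree()
tree.create("/app", b"cfg")                     # persistent node
tree.create("/app/worker-1", b"", session=7)    # ephemeral node of session 7
tree.set_data("/app", b"cfg2", expected_version=0)
tree.close_session(7)                           # removes /app/worker-1 only
```

Passing the expected version on a write is one way to show how a version number lets two writers coordinate: the second writer with a stale version is rejected instead of silently overwriting.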

---------------------------

Zookeeper Series III: Installation of Zookeeper

Zookeeper can be installed in three modes: stand-alone mode, cluster mode, and cluster pseudo-distributed mode. Stand-alone installation is the simplest; if you are new to Zookeeper, stand-alone mode or cluster pseudo-distributed mode is recommended.

1) Stand-alone mode

First, download the latest stable release of Zookeeper from the Apache website:

http://hadoop.apache.org/zookeeper/releases.html

Selecting the mirror server nearest to you can save a lot of download time, for example:

http://labs.renren.com/apache-mirror//hadoop/zookeeper/

Zookeeper needs a Java environment to run, and it requires Java 6 or later. Java can be downloaded from the Sun website; remember to set the Java environment variables. To make later operations more convenient, we also configure Zookeeper environment variables by adding the following lines to the /etc/profile file:

# Set Zookeeper environment

export ZOOKEEPER_HOME=/root/hadoop-0.20.2/zookeeper-3.3.1

export PATH=$PATH:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf

The Zookeeper server is contained in a single JAR file, and installing the service requires the user to create a configuration file. In the conf folder of the zookeeper-*.*.* directory (this article uses Zookeeper 3.3.1, the latest version at the time of writing, so "zookeeper-*.*.*" is written as "zookeeper-3.3.1" from here on), create a zoo.cfg file with the following contents:

tickTime=2000

dataDir=/var/zookeeper

clientPort=2181

In this file we must specify dataDir, which should point to a directory that is empty to begin with. The parameters have the following meanings:

tickTime: the basic time unit, in milliseconds. It is used for heartbeats, and the minimum session timeout is twice the tickTime.

dataDir: the location where the in-memory database snapshots are stored and, unless specified otherwise, where the transaction log of updates is written.

clientPort: the port that listens for client connections.

In stand-alone mode there is no replication, so if the Zookeeper server fails, the Zookeeper service stops.

Code Listing A below is the Zookeeper configuration file, zoo.cfg, that we set up for our own environment:

Code Listing A: zoo.cfg

# The number of milliseconds of each tick

tickTime=2000

# The directory where the snapshot is stored.

dataDir=/root/hadoop-0.20.2/zookeeper-3.3.1/snapshot/data

# The port at which the clients will connect

clientPort=2181
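
Because zoo.cfg is a plain key=value file, a few lines of Python are enough to read it and to check that the three settings of the minimum configuration are present. The sketch below is a hedged illustration only: `parse_zoo_cfg` and `check_minimum_config` are hypothetical helper names, not part of Zookeeper, and the sample text simply mirrors the minimal configuration shown earlier. Note that the check is case-sensitive, so a key written as `ticktime` would not count as `tickTime`.

```python
from typing import Dict


def parse_zoo_cfg(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blank lines and '#' comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        settings[key.strip()] = value.strip()
    return settings


def check_minimum_config(settings: Dict[str, str]) -> None:
    """Raise if any setting of the minimum configuration is missing."""
    required = ("tickTime", "dataDir", "clientPort")
    missing = [key for key in required if key not in settings]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))


SAMPLE = """\
# The number of milliseconds of each tick
tickTime=2000
# The directory where the snapshot is stored.
dataDir=/var/zookeeper
# The port at which the clients will connect
clientPort=2181
"""

config = parse_zoo_cfg(SAMPLE)
check_minimum_config(config)
print(config)
```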

2) Cluster mode

To obtain a reliable Zookeeper service, deploy Zookeeper on a cluster. As long as a majority of the Zookeeper servers in the cluster are up, the service as a whole is available. For this reason it is best to use an odd number of machines: with 5 machines, for example, Zookeeper can tolerate the failure of 2 of them.
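
The majority rule can be worked out with simple arithmetic, which also shows why an odd number of machines is preferred. The function names below are hypothetical and the snippet is only a worked example of the rule stated above.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers may fail while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


for n in range(3, 7):
    print(f"{n} servers: majority = {quorum_size(n)}, "
          f"tolerated failures = {tolerated_failures(n)}")
# 3 servers: majority = 2, tolerated failures = 1
# 4 servers: majority = 3, tolerated failures = 1
# 5 servers: majority = 3, tolerated failures = 2
# 6 servers: majority = 4, tolerated failures = 2
```

Going from 3 to 4 machines, or from 5 to 6, adds a server without increasing the number of failures the ensemble can survive.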

The next steps are similar to the stand-alone installation: set up the Java environment, download the latest stable Zookeeper release, and configure the corresponding environment variables. The difference is in the conf/zoo.cfg configuration file, which is set on every machine along the lines of the following configuration:

tickTime=2000

dataDir=/var/zookeeper/

clientPort=2181

initLimit=5

syncLimit=2

server.1=zoo1:2888:3888

server.2=zoo2:2888:3888

server.3=zoo3:2888:3888

Each line of the form server.id=host:port:port identifies one Zookeeper server, and every machine that is part of the cluster should know about the other machines in the ensemble. Each server gets this information from the server.id=host:port:port lines of the configuration file. In addition, create a file named myid in each server's data directory (the dataDir parameter). The file contains a single line holding only that server's ID; for example, server "1" writes "1" in its myid file. The ID must be unique within the ensemble and must be between 1 and 255. In each server line, the first port is the one that followers use to connect to the leader, and the second port is used for leader election. In this example, each machine therefore uses three ports: the clientPort 2181, port 2888, and port 3888.

We tested the Zookeeper service on a three-machine Hadoop cluster. Code Listing B below is the Zookeeper configuration file that we set up for our environment:

Code Listing B: zoo.cfg

# The number of milliseconds of each tick

tickTime=2000

# The number of ticks that the initial

# synchronization phase can take

initLimit=10

# The number of ticks that can pass between

# sending a request and getting an acknowledgement

syncLimit=5

# The directory where the snapshot is stored.

dataDir=/root/hadoop-0.20.2/zookeeper-3.3.1/snapshot/d1

# The port at which the clients will connect

clientPort=2181

server.1=ip1:2887:3887

server.2=ip2:2888:3888

server.3=ip3:2889:3889

In the server list above, ip1, ip2, and ip3 stand for the IP addresses of the machines that run the distributed Zookeeper service. Zookeeper servers can also be addressed by machine name, but the names must then be set up in the hosts file on Ubuntu. Readers can refer to Ubuntu and Linux documentation for details.
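
The server.id=host:port:port entries and the myid file described in this section lend themselves to a small consistency check. The following is a sketch under stated assumptions: `parse_servers`, `write_myid`, and `ServerEntry` are hypothetical helpers written for this article, the host names zoo1 to zoo3 come from the sample configuration, and the myid file is written to a temporary directory rather than a real dataDir.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict, NamedTuple


class ServerEntry(NamedTuple):
    host: str
    quorum_port: int    # followers connect to the leader on this port
    election_port: int  # used for leader election


SERVER_LINE = re.compile(r"^server\.(\d+)=([^:]+):(\d+):(\d+)$")


def parse_servers(text: str) -> Dict[int, ServerEntry]:
    """Collect server.id=host:port:port lines, validating the id range."""
    servers: Dict[int, ServerEntry] = {}
    for line in text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if not match:
            continue  # not a server entry (other setting, comment, blank)
        server_id = int(match.group(1))
        if not 1 <= server_id <= 255:
            raise ValueError(f"server id {server_id} is outside 1..255")
        if server_id in servers:
            raise ValueError(f"duplicate server id {server_id}")
        servers[server_id] = ServerEntry(
            match.group(2), int(match.group(3)), int(match.group(4)))
    return servers


def write_myid(data_dir: Path, server_id: int) -> Path:
    """Write the single-line myid file into the given data directory."""
    data_dir.mkdir(parents=True, exist_ok=True)
    myid = data_dir / "myid"
    myid.write_text(f"{server_id}\n")
    return myid


SAMPLE = """\
clientPort=2181
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
"""

servers = parse_servers(SAMPLE)
with tempfile.TemporaryDirectory() as tmp:
    path = write_myid(Path(tmp) / "zookeeper", 1)
    print(path.read_text().strip(), servers[1])
```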

3) Cluster pseudo-distributed mode

In short, cluster pseudo-distributed mode simulates a Zookeeper cluster on a single machine.

How, then, is Zookeeper configured in cluster pseudo-distributed mode? In the Zookeeper configuration file, the clientPort parameter sets the port on which clients connect to Zookeeper. In server.1=ip1:2887:3887, ip1 is the IP address of a machine that is part of the Zookeeper service, 2887 is the port used for communication between the machines that make up the service (followers connecting to the leader), and 3887 is the port used for leader election. In cluster pseudo-distributed mode, each configuration file simulates one machine; that is, multiple Zookeeper instances run on a single machine. We therefore have to make sure that the ports in the configuration files, and the clientPort in particular, do not conflict.

Below is the cluster pseudo-distributed configuration we used: zoo1.cfg, zoo2.cfg, and zoo3.cfg simulate a Zookeeper cluster of three machines. See Code Listing C:

Code Listing C: zoo1.cfg:

# The number of milliseconds of each tick

tickTime=2000

# The number of ticks that the initial

# synchronization phase can take

initLimit=10

# The number of ticks that can pass between

# sending a request and getting an acknowledgement

syncLimit=5

# The directory where the snapshot is stored.

dataDir=/root/hadoop-0.20.2/zookeeper-3.3.1/d_1

# The port at which the clients will connect

clientPort=2181

server.1=localhost:2887:3887

server.2=localhost:2888:3888

server.3=localhost:2889:3889

zoo2.cfg:

# The number of milliseconds of each tick

tickTime=2000

# The number of ticks that the initial

# synchronization phase can take

initLimit=10

# The number of ticks that can pass between

# sending a request and getting an acknowledgement

syncLimit=5

# The directory where the snapshot is stored.

dataDir=/root/hadoop-0.20.2/zookeeper-3.3.1/d_2

# The port at which the clients will connect

clientPort=2182

# The location of the log file

dataLogDir=/root/hadoop-0.20.2/zookeeper-3.3.1/logs

server.1=localhost:2887:3887

server.2=localhost:2888:3888

server.3=localhost:2889:3889

zoo3.cfg:

# The number of milliseconds of each tick

tickTime=2000

# The number of ticks that the initial

# synchronization phase can take

initLimit=10

# The number of ticks that can pass between

# sending a request and getting an acknowledgement

syncLimit=5

# The directory where the snapshot is stored.

dataDir=/root/hadoop-0.20.2/zookeeper-3.3.1/d_3

# The port at which the clients will connect

clientPort=2183

# The location of the log file

dataLogDir=/root/hadoop-0.20.2/zookeeper-3.3.1/logs

server.1=localhost:2887:3887

server.2=localhost:2888:3888

server.3=localhost:2889:3889

As the three listings above show, the configuration files differ in dataDir as well as in clientPort. Also, do not forget to create a myid file in the directory that each dataDir points to, so that each Zookeeper server instance knows its own ID.
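
Since every instance in cluster pseudo-distributed mode runs on the same machine, a duplicated clientPort or dataDir is an easy mistake to make when copying configuration files. The sketch below shows one way to detect it; `find_conflicts` is a hypothetical helper, and the dictionaries stand in for already parsed configuration files with shortened, made-up paths.

```python
from typing import Dict, List, Sequence


def find_conflicts(configs: Dict[str, Dict[str, str]],
                   keys: Sequence[str] = ("clientPort", "dataDir")) -> List[str]:
    """Report settings that must be unique per instance but are shared."""
    problems: List[str] = []
    for key in keys:
        seen: Dict[str, str] = {}   # value -> first file that used it
        for name, settings in configs.items():
            value = settings.get(key)
            if value is None:
                continue
            if value in seen:
                problems.append(
                    f"{key}={value} is used by both {seen[value]} and {name}")
            else:
                seen[value] = name
    return problems


good = {
    "zoo1.cfg": {"clientPort": "2181", "dataDir": "/tmp/zk/d_1"},
    "zoo2.cfg": {"clientPort": "2182", "dataDir": "/tmp/zk/d_2"},
    "zoo3.cfg": {"clientPort": "2183", "dataDir": "/tmp/zk/d_3"},
}
bad = dict(good, **{"zoo3.cfg": {"clientPort": "2183", "dataDir": "/tmp/zk/d_2"}})

print(find_conflicts(good))  # []
print(find_conflicts(bad))
```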

That concludes the installation of Zookeeper. The next part explains the Zookeeper configuration parameters.

-------------------------------------------------

Zookeeper Series IV: Zookeeper Configuration

The behavior of Zookeeper is controlled by its configuration file (zoo.cfg). There is a reason for this design: as the previous part showed, when a Zookeeper cluster is configured, the configuration files of all servers are identical (only in cluster pseudo-distributed mode do a few settings differ), which makes the Zookeeper service very convenient to deploy. If the servers do use different configuration files, make sure that the list of servers is the same in all of them.

Some parameters in the Zookeeper configuration file are optional, and others are required. The required parameters make up the minimum configuration.

The following parameters must be set in the minimum configuration:

1) Minimum Configuration

clientPort

The port that listens for client connections.

dataDir

The location where the in-memory database snapshots are stored.

Note: choose the location of the transaction log carefully. A dedicated log device can greatly improve system performance, whereas putting the log on a busy device will hurt performance significantly.

tickTime

The basic time unit, in milliseconds. It is used to regulate heartbeats and timeouts; by default, the minimum session timeout is twice the tickTime.

2) Advanced Configuration

The following parameters are optional. They can be used to fine-tune the behavior of Zookeeper:

dataLogDir

This option makes the server write the transaction log to the directory specified by dataLogDir instead of the directory specified by dataDir. This allows a dedicated log device to be used and helps avoid contention between logging and snapshots. It is configured as follows:

# The location of the log file

dataLogDir=/root/hadoop-0.20.2/zookeeper-3.3.1/log/data_log

maxClientCnxns

This option limits the number of concurrent connections that a single client may make to a Zookeeper server; clients are told apart by IP address. It can be used to prevent certain classes of DoS attacks. Setting it to 0, or leaving it unset, removes the limit on concurrent connections.

For example, here we set maxClientCnxns to 1:

# Set maxClientCnxns

maxClientCnxns=1

After starting Zookeeper, first connect one client to the Zookeeper server. When a second client then tries to connect, or when a client operation implicitly opens another connection, the maxClientCnxns limit is triggered, and the system reports it as shown in Figure 1 below:

Figure 1: Zookeeper maxClientCnxns exception
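
The effect of maxClientCnxns can be pictured as a counter of open connections per client IP address. The class below is a toy model written for illustration only; `ConnectionLimiter` and its methods are hypothetical names and say nothing about how the Zookeeper server implements the limit internally.

```python
from collections import Counter


class ConnectionLimiter:
    """Toy per-IP connection limit; a limit of 0 means unlimited."""

    def __init__(self, max_client_cnxns: int) -> None:
        self.limit = max_client_cnxns
        self.open_by_ip: Counter = Counter()

    def try_connect(self, ip: str) -> bool:
        if self.limit > 0 and self.open_by_ip[ip] >= self.limit:
            return False              # this IP already holds the maximum
        self.open_by_ip[ip] += 1
        return True

    def disconnect(self, ip: str) -> None:
        if self.open_by_ip[ip] > 0:
            self.open_by_ip[ip] -= 1


limiter = ConnectionLimiter(max_client_cnxns=1)
print(limiter.try_connect("192.168.0.10"))  # True: first connection
print(limiter.try_connect("192.168.0.10"))  # False: same IP, limit reached
print(limiter.try_connect("192.168.0.11"))  # True: a different IP
```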

minSessionTimeout and maxSessionTimeout

The minimum and the maximum session timeout. By default, the minimum session timeout is 2 times the tickTime and the maximum session timeout is 20 times the tickTime. The server prints these settings at startup, as shown in Figure 2 below:

Figure 2: Default session timeout

In the figure above, minSessionTimeout and maxSessionTimeout are both -1, which means that the defaults are in effect. We now set the minimum and the maximum session timeout explicitly:

# Set minSessionTimeout

minSessionTimeout=1000

# Set maxSessionTimeout

maxSessionTimeout=10000

When configuring minSessionTimeout and maxSessionTimeout, note that if the values are too small, a session may time out and be closed almost as soon as it has been established. As a rule, do not set them lower than tickTime.
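
The defaults and the effect of the two bounds can be worked through numerically. The sketch below assumes that a timeout requested by a client is clamped into the range between the minimum and the maximum, with -1 standing for "use the default"; the function names are hypothetical and the code is an illustration of the arithmetic, not server code.

```python
from typing import Tuple


def effective_bounds(tick_time: int,
                     min_session_timeout: int = -1,
                     max_session_timeout: int = -1) -> Tuple[int, int]:
    """Resolve -1 to the defaults of 2 and 20 times tickTime (milliseconds)."""
    lower = 2 * tick_time if min_session_timeout == -1 else min_session_timeout
    upper = 20 * tick_time if max_session_timeout == -1 else max_session_timeout
    return lower, upper


def negotiate_timeout(requested: int, tick_time: int,
                      min_session_timeout: int = -1,
                      max_session_timeout: int = -1) -> int:
    """Clamp the timeout a client asks for into the allowed range."""
    lower, upper = effective_bounds(tick_time, min_session_timeout,
                                    max_session_timeout)
    return max(lower, min(requested, upper))


print(effective_bounds(2000))                       # (4000, 40000)
print(negotiate_timeout(1000, 2000))                # 4000
print(negotiate_timeout(30000, 2000, 1000, 10000))  # 10000
```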

3) Cluster configuration

initLimit

The amount of time, expressed as a multiple of tickTime, that a follower (the leader's "client", so to speak) is allowed to take for its initial connection and synchronization to the leader. The connection fails if this time is exceeded.

syncLimit

The maximum amount of time, also expressed as a multiple of tickTime, allowed for a message exchange (a request and its reply) between the leader and a follower. A follower that cannot communicate with the leader within this time is dropped.
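
Because both limits are counted in ticks, the wall-clock time they allow depends on tickTime. As a worked example with the values used in this article's listings (tickTime=2000, initLimit=10, syncLimit=5); the helper name `limit_in_ms` is hypothetical:

```python
def limit_in_ms(limit_ticks: int, tick_time_ms: int) -> int:
    """Convert a limit given in ticks into milliseconds."""
    return limit_ticks * tick_time_ms


TICK_TIME = 2000  # milliseconds per tick

print("initLimit=10 ->", limit_in_ms(10, TICK_TIME), "ms")  # 20000 ms
print("syncLimit=5  ->", limit_in_ms(5, TICK_TIME), "ms")   # 10000 ms
```

With these settings a follower has 20 seconds for its initial synchronization and 10 seconds for each later exchange with the leader.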
---------------------

Zookeeper Series V: Zookeeper Operation

This part describes how to run Zookeeper in each of the installation modes covered in Zookeeper Series III: Installation of Zookeeper.

1) Stand-alone mode

Start the Zookeeper service with the following command:

zkServer.sh start

By default, this command uses the zoo.cfg configuration file in Zookeeper's conf folder. When the server starts successfully, you will see output similar to the following:

root@ubuntu:~# zkServer.sh start

JMX enabled by default

Using config: /root/hadoop-0.20.2/zookeeper-3.3.1/bin/../conf/zoo.cfg

Starting zookeeper ...

STARTED

... ...

2011-01-19 10:04:42,300 - WARN [main:QuorumPeerMain@105] - Either no config or no quorum defined in config, running in standalone mode

... ...

2011-01-19 10:04:42,419 - INFO [main:ZooKeeperServer@660] - tickTime set to 2000

2011-01-19 10:04:42,419 - INFO [main:ZooKeeperServer@669] - minSessionTimeout set to -1

2011-01-19 10:04:42,419 - INFO [main:ZooKeeperServer@678] - maxSessionTimeout set to -1

2011-01-19 10:04:42,560 - INFO [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:2181

2011-01-19 10:04:42,806 - INFO [main:FileSnap@82] - Reading snapshot /root/hadoop-0.20.2/zookeeper-3.3.1/data/version-2/snapshot.200000036

2011-01-19 10:04:42,927 - INFO [main:FileSnap@82] - Reading snapshot /root/hadoop-0.20.2/zookeeper-3.3.1/data/version-2/snapshot.200000036

2011-01-19 10:04:42,950 - INFO [main:FileTxnSnapLog@208] - Snapshotting: 400000058

As the output shows, once Zookeeper has started successfully it lists the relevant configuration information for the running instance.

2) Cluster mode

In cluster mode, run the command from part 1) on every Zookeeper machine; the details are not repeated here.

3) Cluster pseudo-distributed mode

In cluster pseudo-distributed mode we have only one machine, but we want to run three Zookeeper service instances, so the command above is no longer sufficient on its own. Instead, we use three commands to start the three Zookeeper instances configured in Zookeeper Series III: Installation of Zookeeper, one per configuration file, as shown below:

zkServer.sh start zoo1.cfg

zkServer.sh start zoo2.cfg

zkServer.sh start zoo3.cfg

After running the first command, you will see some error messages, as shown in Figure 1:

