ZooKeeper installation and configuration (reprint)

Source: Internet
Author: User

Reprinted from: http://www.cnblogs.com/sunddenly/p/4018459.html

1. ZooKeeper setup modes

There are three ways to install ZooKeeper: stand-alone mode, pseudo-cluster mode, and cluster mode.

Stand-alone mode: ZooKeeper runs on a single server; suitable for test environments.
Pseudo-cluster mode: multiple ZooKeeper instances run on one physical machine.
Cluster mode: ZooKeeper runs on a group of machines, called an "ensemble"; suitable for production environments.

ZooKeeper provides high availability through replication: the service keeps running as long as more than half of the machines in the ensemble are available. Why more than half? This follows from ZooKeeper's replication strategy: every modification to the znode tree is replicated to more than half of the machines in the ensemble.

1.1 ZooKeeper stand-alone mode setup

Download ZooKeeper: http://pan.baidu.com/s/1pjlwbr9

Decompress: tar -zxvf zookeeper-3.4.5.tar.gz
Rename: mv zookeeper-3.4.5 zk

Configuration file: delete the zoo_sample.cfg file in the conf directory and create a configuration file named zoo.cfg.
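The article does not show the contents of the stand-alone zoo.cfg. A minimal sketch, assuming the tick time, data directory and client port that appear elsewhere in this article (2000, /usr/local/zk/data, 2181). ZK_CONF_DIR is a name introduced only for this example; it defaults to a scratch directory so the snippet can be tried without touching a real installation:

```shell
# Hypothetical helper variable: the directory zoo.cfg is written to.
# Set it to /usr/local/zk/conf for a real installation.
ZK_CONF_DIR="${ZK_CONF_DIR:-$(mktemp -d)}"

# Write a minimal stand-alone configuration (values taken from this article).
cat > "$ZK_CONF_DIR/zoo.cfg" <<'EOF'
# The number of milliseconds of each tick
tickTime=2000
# The directory where the snapshot is stored
dataDir=/usr/local/zk/data
# The port at which the clients will connect
clientPort=2181
EOF

echo "wrote $ZK_CONF_DIR/zoo.cfg"
```

Stand-alone mode needs no server.X entries, which is why the file is this short.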


Configure environment variables: for convenience, add the following to the /etc/profile file:

export ZOOKEEPER_HOME=/usr/local/zk
export PATH=.:$HADOOP_HOME/bin:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin:$PATH

Start the ZooKeeper server: zkServer.sh start
Stop the ZooKeeper server: zkServer.sh stop

1.2 ZooKeeper pseudo-cluster mode setup

ZooKeeper can run on one machine not only in stand-alone mode but also as a simulated cluster, with the different nodes all running on the same computer. With Hadoop, pseudo-distributed and fully distributed operation differ considerably; with ZooKeeper, there is no essential difference between pseudo-cluster mode and cluster mode. Pseudo-cluster mode is therefore a convenient way to try ZooKeeper out and run experiments: test with a small amount of data in pseudo-cluster mode first and, once the test works, move to cluster mode for experiments with real data. This confirms feasibility and makes experimenting more efficient. The setup is simple and cheap and well suited to testing and learning; if you do not have enough machines at hand, you can deploy 3 servers on one machine.

1.2.1 Precautions

When 3 servers are deployed on one machine, each configuration file stands for one machine of the simulated cluster; that is, one machine runs several ZooKeeper instances. The port numbers in the configuration files must not conflict, and both clientPort and dataDir must differ between instances. In addition, create a myid file in each dataDir directory to identify the corresponding ZooKeeper server instance.

clientPort: when several servers run on one machine, each instance needs its own clientPort, for example 2181 for server1, 2182 for server2, and 2183 for server3.

dataDir and dataLogDir: these separate data files from log files, and each server needs its own paths for both.

server.X and myid: the X in server.X must match the number in that instance's data/myid file. The myid files of the 3 servers contain 0, 1 and 2, so the zoo.cfg of every server lists server.0, server.1 and server.2. Because all instances are on the same machine, the two ports that follow the host name must also differ between the 3 servers; otherwise the ports conflict.

Below is the pseudo-cluster configuration I used: zoo1.cfg, zoo2.cfg and zoo3.cfg simulate a ZooKeeper cluster of three machines. The listing of zoo1.cfg is as follows:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# The directory where the snapshot is stored.
dataDir=/usr/local/zk/data_1
# The port at which the clients will connect
clientPort=2181
# The location of the log file
dataLogDir=/usr/local/zk/logs_1
server.0=localhost:2287:3387
server.1=localhost:2288:3388
server.2=localhost:2289:3389

The listing of zoo2.cfg is as follows:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# The directory where the snapshot is stored.
dataDir=/usr/local/zk/data_2
# The port at which the clients will connect
clientPort=2182
# The location of the log file
dataLogDir=/usr/local/zk/logs_2
server.0=localhost:2287:3387
server.1=localhost:2288:3388
server.2=localhost:2289:3389

The listing of zoo3.cfg is as follows:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# The directory where the snapshot is stored.
dataDir=/usr/local/zk/data_3
# The port at which the clients will connect
clientPort=2183
# The location of the log file
dataLogDir=/usr/local/zk/logs_3
server.0=localhost:2287:3387
server.1=localhost:2288:3388
server.2=localhost:2289:3389
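The three files differ only in dataDir, dataLogDir and clientPort, so they can also be generated in a loop. The following is a sketch rather than the author's method: ZK_BASE is a hypothetical variable (defaulting to a scratch directory instead of /usr/local/zk), and mapping instance i to id i-1 is an assumption consistent with the myid values 0, 1 and 2 mentioned above.

```shell
# Hypothetical base directory for this sketch; the article uses /usr/local/zk.
ZK_BASE="${ZK_BASE:-$(mktemp -d)}"
mkdir -p "$ZK_BASE/conf"

# Instance i uses data_i, logs_i and client port 2180+i. Its id is i-1
# (an assumption), matching the server.0 .. server.2 entries that all
# three files share.
for i in 1 2 3; do
  id=$((i - 1))
  mkdir -p "$ZK_BASE/data_$i" "$ZK_BASE/logs_$i"
  echo "$id" > "$ZK_BASE/data_$i/myid"
  cat > "$ZK_BASE/conf/zoo$i.cfg" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=$ZK_BASE/data_$i
clientPort=$((2180 + i))
dataLogDir=$ZK_BASE/logs_$i
server.0=localhost:2287:3387
server.1=localhost:2288:3388
server.2=localhost:2289:3389
EOF
done

ls "$ZK_BASE/conf"
```

Generating the files from one template keeps the shared server list identical across instances, which is the property the precautions above ask for.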

1.2.2 Start

In pseudo-cluster mode, a single machine runs three ZooKeeper instances at the same time, so the stand-alone startup command does not work. Start the previously configured ZooKeeper services with the following three commands:

zkServer.sh start zoo1.cfg
zkServer.sh start zoo2.cfg
zkServer.sh start zoo3.cfg

[Figure: startup process]

[Figure: startup result]

After the first command runs, some exceptions are reported. Every ZooKeeper instance holds the global configuration, so it begins leader election as soon as it starts. The first instance to start therefore tries to communicate with the other two, which are not running yet, and logs these errors. They can be ignored: once instances 2 and 3 have started, the exceptions stop. You can then query the state of each instance with the following three commands:

zkServer.sh status zoo1.cfg
zkServer.sh status zoo2.cfg
zkServer.sh status zoo3.cfg

[Figure: running state of the ZooKeeper service]

1.3 ZooKeeper cluster mode setup

To obtain a reliable ZooKeeper service, deploy ZooKeeper on a cluster. As long as a majority of the ZooKeeper servers in the cluster are running, the service as a whole is available. As in the two previous modes, the environment variables must be configured. The parameters in the conf/zoo.cfg configuration file are identical on every machine.

1.3.1 Creating myid

Create a myid file in the dataDir directory (/usr/local/zk/data):

On the server0 machine the content is: 0
On the server1 machine the content is: 1
On the server2 machine the content is: 2

1.3.2 Writing the configuration file

In the conf directory, delete the zoo_sample.cfg file and create a configuration file named zoo.cfg with the parameter settings shown in the following listing:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# The directory where the snapshot is stored.
dataDir=/usr/local/zk/data
# The port at which the clients will connect
clientPort=2183
# The location of the log file
dataLogDir=/usr/local/zk/log
server.0=hadoop:2288:3388
server.1=hadoop0:2288:3388
server.2=hadoop1:2288:3388

1.3.3 Start

Start the ZooKeeper server on each of the 3 machines: zkServer.sh start

2. ZooKeeper configuration

ZooKeeper's behavior is controlled by its configuration file (zoo.cfg). The design is deliberate: as the configurations above show, in a ZooKeeper cluster the configuration file is exactly the same on every server, and only the content of the myid file differs. In pseudo-cluster mode a few settings differ between instances. This makes deploying the ZooKeeper service very convenient. If servers do use different configuration files, the list of servers in each file must match.

Some parameters in the ZooKeeper configuration file are optional and some are required. The required parameters form the minimum configuration; the later sections cover more detailed configuration.

2.1 Basic Configuration

The following parameters must be set in the minimum configuration:

(1) clientPort: the port on which the server listens for client connections.
(2) tickTime: the basic time unit, in milliseconds. It is the heartbeat interval between ZooKeeper servers and between clients and servers: a heartbeat is sent every tickTime. The minimum session timeout is twice the tickTime.
(3) dataDir: the location where the in-memory database snapshots are stored. Unless specified otherwise, the transaction log of updates is also stored here.

# The directory where the snapshot is stored
dataDir=/usr/local/zk/data

Choose the location of the transaction log with care. A dedicated log device can greatly improve performance; if the log is stored on a busy device, system performance suffers considerably.
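Because several settings are expressed in ticks rather than milliseconds, it can help to convert them to wall-clock time. A minimal sketch using the values from the listings earlier in this article (tickTime=2000, initLimit=10, syncLimit=5); the helper name ticks_to_ms is made up for this example:

```shell
# Values as they appear in the zoo.cfg listings of this article.
tickTime=2000   # milliseconds per tick
initLimit=10    # ticks allowed for the initial synchronization phase
syncLimit=5     # ticks allowed between a request and its acknowledgement

# Convert a number of ticks to milliseconds.
ticks_to_ms() {
  echo $(( $1 * tickTime ))
}

echo "heartbeat interval:      ${tickTime} ms"
echo "minimum session timeout: $(ticks_to_ms 2) ms"
echo "initLimit window:        $(ticks_to_ms "$initLimit") ms"
echo "syncLimit window:        $(ticks_to_ms "$syncLimit") ms"
```

With these values the minimum session timeout is 4000 ms, the initLimit window is 20000 ms and the syncLimit window is 10000 ms.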

2.2 Advanced Configuration

The following optional parameters let users specify ZooKeeper's behavior more precisely:

(1) dataLogDir

This option makes the server write the transaction log to the directory specified by dataLogDir instead of the one specified by dataDir. It allows a dedicated log device to be used and avoids contention between logging and snapshots.

(2) maxClientCnxns

This option limits the number of concurrent connections that a single client, identified by IP address, may make to one ZooKeeper server. It can block certain classes of DoS attack. Setting it to 0 removes the limit on concurrent connections; if the option is omitted, ZooKeeper's default limit applies.

For example, set maxClientCnxns to 1:

# set maxClientCnxns
maxClientCnxns=1

After starting ZooKeeper, connect one client to the server. If a second client from the same address then tries to connect, or the first client opens an implicit additional connection, the limit configured above takes effect.

(3) minSessionTimeout and maxSessionTimeout

The minimum and maximum session timeouts. By default, minSessionTimeout = 2 * tickTime and maxSessionTimeout = 20 * tickTime.

2.3 Cluster configuration

(1) initLimit

The maximum time, in ticks (units of tickTime), that a follower (a "client" from the leader's point of view) may take to connect to the leader and complete the initial synchronization. If it takes longer than this, the connection fails.

(2) syncLimit

The maximum time, in ticks, that may pass between a request and its acknowledgement when the leader and a follower exchange messages. A follower that cannot communicate with the leader within this time is dropped.

(3) server.A=B:C:D

A: a number, the ID of this server;
B: the IP address (or host name) of this server;
C: the port that followers use to connect to the leader;
D: the port used for leader election.
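To make the layout of a server entry concrete, here is a small sketch that splits one entry into its four fields using plain shell parameter expansion. The function name parse_server_line is hypothetical; the sample entry comes from the cluster listing in section 1.3.2. In ZooKeeper's own documentation, the first port is the one followers use to connect to the leader and the second is used for leader election.

```shell
# Split one "server.A=B:C:D" line into its four fields.
# Prints: id host port1 port2
parse_server_line() {
  line=$1
  key=${line%%=*}        # server.A
  value=${line#*=}       # B:C:D
  id=${key#server.}      # A
  host=${value%%:*}      # B
  ports=${value#*:}      # C:D
  echo "$id $host ${ports%%:*} ${ports#*:}"
}

parse_server_line "server.1=hadoop0:2288:3388"
```

For the sample entry this prints "1 hadoop0 2288 3388".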

(4) myid and zoo.cfg

Besides editing the zoo.cfg configuration file, cluster mode needs a file named myid in the dataDir directory. Create it yourself if it does not exist (dataDir itself may not exist either and must then be created too). The file contains a single number. ZooKeeper reads this file at startup and compares the number with the server entries in zoo.cfg to determine which server it is.

3. Building a ZooKeeper server cluster

Requirements:

(1) The ZK server cluster has at least 3 nodes.
(2) The system clocks of the servers must be consistent.
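The 3-node minimum follows from the "more than half" rule described at the start of this article. A short sketch of the arithmetic, with function names invented for this example:

```shell
# Smallest strict majority of an ensemble of n servers.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while a majority is still running.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5; do
  echo "ensemble=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

An ensemble of 3 tolerates one failed server, and so does an ensemble of 4; only at 5 servers does the count rise to two. This is why odd ensemble sizes are the usual choice.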

3.1 Installing and configuring ZK

(1) Use WinSCP to transfer ZK to /usr/local on the hadoop host. The version I used was zookeeper-3.4.5.tar.gz.

(2) In the /usr/local directory on hadoop, unpack zk....tar.gz and set the environment variables.

Decompress: in the /usr/local directory, execute: tar -zxvf zookeeper-3.4.5.tar.gz

Rename: rename the unpacked folder to zk by executing: mv zookeeper-3.4.5 zk

Set environment variables: execute vi /etc/profile and add: export ZOOKEEPER_HOME=/usr/local/zk. Then execute: source /etc/profile

3.2 Modifying the ZK configuration file

(1) Rename: in the /usr/local/zk/conf directory, rename zoo_sample.cfg to zoo.cfg by executing: mv zoo_sample.cfg zoo.cfg

(2) View: in the /usr/local/zk/conf directory, open the file with vi zoo.cfg. In this file dataDir is the data directory. Its default setting is /tmp/zookeeper, a temporary directory whose contents are lost on every reboot, so we use a directory of our own instead: /usr/local/zk/data.

(3) Create the folder: mkdir /usr/local/zk/data

(4) Create myid: in the data directory, create a file named myid with the content 0 (vi myid, then enter 0).

(5) Edit: execute vi zoo.cfg and set dataDir=/usr/local/zk/data.

Add: the server.N entries for the three hosts, as in the zoo.cfg listing of section 1.3.2.
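Steps (1), (3), (4) and (5) can also be done without an editor. A hedged sketch: ZK_HOME, CFG and SAMPLE are names introduced for this example, ZK_HOME defaults to a scratch directory rather than /usr/local/zk, and a three-line stand-in for zoo_sample.cfg is created only when neither file exists, so that the snippet also runs outside a real installation.

```shell
# Hypothetical variables for this sketch; the article installs to /usr/local/zk.
ZK_HOME="${ZK_HOME:-$(mktemp -d)}"
CFG="$ZK_HOME/conf/zoo.cfg"
SAMPLE="$ZK_HOME/conf/zoo_sample.cfg"
mkdir -p "$ZK_HOME/conf" "$ZK_HOME/data"     # step (3)

# Stand-in for the sample file shipped with ZooKeeper (scratch runs only).
if [ ! -f "$SAMPLE" ] && [ ! -f "$CFG" ]; then
  printf 'tickTime=2000\ndataDir=/tmp/zookeeper\nclientPort=2181\n' > "$SAMPLE"
fi

# Step (1): rename the sample file unless zoo.cfg already exists.
if [ ! -f "$CFG" ]; then
  mv "$SAMPLE" "$CFG"
fi

# Step (4): the first host gets id 0.
echo 0 > "$ZK_HOME/data/myid"

# Step (5): point dataDir at the new directory. Writing to a temporary file
# and moving it avoids relying on a particular sed -i dialect.
sed "s|^dataDir=.*|dataDir=$ZK_HOME/data|" "$CFG" > "$CFG.tmp"
mv "$CFG.tmp" "$CFG"

grep '^dataDir=' "$CFG"
```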


tickTime: the heartbeat interval between ZooKeeper servers, or between a client and a server; a heartbeat is sent every tickTime.

dataDir: as the name implies, the directory where ZooKeeper saves its data. By default, ZooKeeper also writes its transaction log files to this directory.

clientPort: the port that clients use to connect to the ZooKeeper server. ZooKeeper listens on this port and accepts client requests.

Once these items are configured, you can start ZooKeeper. After startup, use the command echo ruok | nc localhost 2181 to check whether ZooKeeper is serving requests; a healthy server answers imok.
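The same check can be wrapped in a small function for use in scripts. This is a sketch under assumptions: the name zk_is_ok is made up, nc must be installed, and the -w 2 timeout is an arbitrary choice for this example.

```shell
# Ask a ZooKeeper server whether it is running; succeeds only on "imok".
# Host and port default to the values used in this article.
zk_is_ok() {
  host=${1:-localhost}
  port=${2:-2181}
  reply=$(echo ruok | nc -w 2 "$host" "$port" 2>/dev/null)
  [ "$reply" = "imok" ]
}

if zk_is_ok localhost 2181; then
  echo "ZooKeeper is serving requests"
else
  echo "no imok reply from localhost:2181"
fi
```

The function returns a non-zero status when the port is closed, when nc is missing, or when the reply is anything other than imok, so it can be used directly in an if statement or a retry loop.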

3.3 Configuring additional nodes

(1) Copy the zk directory and the /etc/profile file from the hadoop host to hadoop0 and hadoop1. Execute:

scp -r /usr/local/zk hadoop0:/usr/local/
scp -r /usr/local/zk hadoop1:/usr/local/
scp /etc/profile hadoop0:/etc/
scp /etc/profile hadoop1:/etc/

ssh hadoop0

(2) Set the myid value on hadoop0 to 1 and the myid value on hadoop1 to 2.

4. Startup check

(1) Start: execute the command zkServer.sh start on each of the three nodes.

[Figures: startup output on the hadoop, hadoop0 and hadoop1 nodes]

(2) Check: execute the command zkServer.sh status on each of the three nodes. The output shows that hadoop and hadoop1 are followers and hadoop0 is the leader.

[Figures: status output on the hadoop, hadoop0 and hadoop1 nodes]
