ZooKeeper can be installed in three ways: stand-alone mode, pseudo-cluster mode, and cluster mode.
Stand-alone mode: ZooKeeper runs on only one server; suitable for test environments.
Pseudo-cluster mode: multiple ZooKeeper instances run on a single physical machine.
Cluster mode: ZooKeeper runs on a cluster of machines; suitable for production environments. Such a cluster is called an "ensemble".
ZooKeeper provides high availability through replication: the service keeps running as long as more than half of the machines in the ensemble are available. Why more than half? This follows from ZooKeeper's replication strategy: ZooKeeper ensures that every modification to the znode tree is replicated to more than half of the machines in the ensemble. Because any two majorities share at least one machine, every surviving majority includes a machine that holds each committed update.
1.1 ZooKeeper stand-alone mode construction
Download ZooKeeper: http://pan.baidu.com/s/1pjlwbr9
Decompress: tar -zxvf zookeeper-3.4.5.tar.gz
Rename: mv zookeeper-3.4.5 zk
Configuration file: in the conf directory, delete zoo_sample.cfg and create a configuration file named zoo.cfg with the following content (property names are case-sensitive):
tickTime=2000
dataDir=/usr/local/zk/data
dataLogDir=/usr/local/zk/datalog
clientPort=2181
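The stand-alone configuration can also be produced by a small script. The following is a minimal Python sketch, not part of the ZooKeeper distribution: `render_zoo_cfg` and `STANDALONE` are hypothetical names, and the property names are spelled in the camel case that ZooKeeper expects (tickTime, dataDir, dataLogDir, clientPort). The file is written to a scratch directory so the sketch can run anywhere; on a real host the target would be the conf directory of the installation.

```python
from pathlib import Path
import tempfile


def render_zoo_cfg(settings):
    """Render a dict of ZooKeeper settings as key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


# Values taken from the stand-alone configuration in this section.
STANDALONE = {
    "tickTime": 2000,
    "dataDir": "/usr/local/zk/data",
    "dataLogDir": "/usr/local/zk/datalog",
    "clientPort": 2181,
}

cfg_text = render_zoo_cfg(STANDALONE)

# Write to a temporary directory here; this path is only a stand-in
# for /usr/local/zk/conf/zoo.cfg.
with tempfile.TemporaryDirectory() as conf_dir:
    cfg_path = Path(conf_dir) / "zoo.cfg"
    cfg_path.write_text(cfg_text)
    written = cfg_path.read_text()

print(written)
```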
Configure environment variables: for convenience later, add the following to the /etc/profile file:
export ZOOKEEPER_HOME=/usr/local/zk
export PATH=.:$HADOOP_HOME/bin:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin:$PATH
Start the ZooKeeper server: zkServer.sh start
Stop the ZooKeeper server: zkServer.sh stop
1.2 ZooKeeper pseudo-cluster mode construction
Besides stand-alone mode, a single machine can also run ZooKeeper in a simulated cluster mode, in which the different nodes all run on the same computer. Hadoop behaves quite differently in pseudo-distributed mode than in fully distributed mode, but ZooKeeper running in pseudo-cluster mode does not differ in any essential way from ZooKeeper running in cluster mode. Pseudo-cluster mode is therefore a convenient way to try out ZooKeeper and run experiments. For example, you can first test with a small amount of data in pseudo-cluster mode and, once the test works, move to cluster mode for experiments with real data. This confirms feasibility early and makes the experiments more efficient. The setup is simple and cheap, which suits testing and learning: if you do not have enough machines at hand, you can deploy three servers on one machine.
1.2.1 Precautions
When three servers are deployed on a single machine, each configuration file stands for one machine of the simulated cluster; that is, one machine runs several ZooKeeper instances. You must make sure that the port numbers in the configuration files do not conflict and that, in addition to clientPort, dataDir also differs between instances. Also create a myid file in the directory that dataDir points to, to identify the corresponding ZooKeeper server instance.
clientPort: when several servers are deployed on one machine, each instance needs its own clientPort, for example 2181 for server 1, 2182 for server 2, and 2183 for server 3.
dataDir and dataLogDir: the data files and log files must also be kept apart, so each server uses different paths for these two settings.
server.X and myid: the X in server.X corresponds to the number in data/myid. The myid files of the three servers contain 0, 1, and 2 respectively, so the zoo.cfg of every server lists server.0, server.1, and server.2. Because all three run on the same machine, the two ports that follow the host name must differ between the three servers, otherwise the ports conflict.
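The precautions above amount to a uniqueness check across instances. The following Python sketch illustrates that check; it uses the port and directory values from the three configuration files listed later in this section. The function `find_conflicts` and the dictionary keys `quorumPort` and `electionPort` are hypothetical names chosen for this illustration, not zoo.cfg properties.

```python
from collections import Counter

# Settings that must be unique per instance when all instances share one host.
# "quorumPort" and "electionPort" are illustrative names for the two ports
# that follow the host name in a server.X line.
UNIQUE_KEYS = ("clientPort", "dataDir", "dataLogDir", "quorumPort", "electionPort")


def find_conflicts(instances):
    """Return {setting: [values used by more than one instance]}."""
    conflicts = {}
    for key in UNIQUE_KEYS:
        counts = Counter(instance[key] for instance in instances)
        duplicates = sorted(value for value, n in counts.items() if n > 1)
        if duplicates:
            conflicts[key] = duplicates
    return conflicts


instances = [
    {"clientPort": 2181, "dataDir": "/usr/local/zk/data_1",
     "dataLogDir": "/usr/local/zk/logs_1", "quorumPort": 2287, "electionPort": 3387},
    {"clientPort": 2182, "dataDir": "/usr/local/zk/data_2",
     "dataLogDir": "/usr/local/zk/logs_2", "quorumPort": 2288, "electionPort": 3388},
    {"clientPort": 2183, "dataDir": "/usr/local/zk/data_3",
     "dataLogDir": "/usr/local/zk/logs_3", "quorumPort": 2289, "electionPort": 3389},
]

print(find_conflicts(instances))
```

An empty result means that no two instances share a port or directory. Giving two instances the same clientPort would make that port show up under the "clientPort" key.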
The following is the pseudo-cluster configuration I used; zoo1.cfg, zoo2.cfg, and zoo3.cfg simulate a ZooKeeper cluster of three machines.
The listing of zoo1.cfg is as follows:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# The directory where the snapshot is stored.
dataDir=/usr/local/zk/data_1
# The port at which the clients will connect
clientPort=2181
# The location of the log file
dataLogDir=/usr/local/zk/logs_1
server.0=localhost:2287:3387
server.1=localhost:2288:3388
server.2=localhost:2289:3389
The listing of zoo2.cfg is as follows:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# The directory where the snapshot is stored.
dataDir=/usr/local/zk/data_2
# The port at which the clients will connect
clientPort=2182
# The location of the log file
dataLogDir=/usr/local/zk/logs_2
server.0=localhost:2287:3387
server.1=localhost:2288:3388
server.2=localhost:2289:3389
The listing of zoo3.cfg is as follows:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# The directory where the snapshot is stored.
dataDir=/usr/local/zk/data_3
# The port at which the clients will connect
clientPort=2183
# The location of the log file
dataLogDir=/usr/local/zk/logs_3
server.0=localhost:2287:3387
server.1=localhost:2288:3388
server.2=localhost:2289:3389
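Since the three files differ only in an index, they can be generated rather than edited by hand. The Python sketch below is one way to do that and is only an illustration: `pseudo_cluster_configs` is a hypothetical helper, and the mapping of zooN.cfg to myid N-1 is an assumption inferred from the order of the server.0 to server.2 entries. The sketch only builds the text in memory and does not touch the file system.

```python
def pseudo_cluster_configs(base_dir="/usr/local/zk", count=3):
    """Build config text and myid content for `count` instances on localhost."""
    # The server list is identical in every file.
    servers = [
        f"server.{i}=localhost:{2287 + i}:{3387 + i}" for i in range(count)
    ]
    configs = {}
    for i in range(count):
        n = i + 1  # file names and directories are numbered from 1
        lines = [
            "tickTime=2000",
            "initLimit=10",
            "syncLimit=5",
            f"dataDir={base_dir}/data_{n}",
            f"clientPort={2181 + i}",
            f"dataLogDir={base_dir}/logs_{n}",
            *servers,
        ]
        configs[f"zoo{n}.cfg"] = {
            "text": "\n".join(lines) + "\n",
            # Assumption: zooN.cfg belongs to server.(N-1).
            "myid": str(i),
            "myid_path": f"{base_dir}/data_{n}/myid",
        }
    return configs


configs = pseudo_cluster_configs()
for name, info in configs.items():
    print(name, "-> myid", info["myid"], "at", info["myid_path"])
```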
1.2.2 Start
In pseudo-cluster mode, a single machine runs three ZooKeeper instances at the same time, so the start command from stand-alone mode does not work on its own. Instead, start the previously configured ZooKeeper services with the following three commands, one per configuration file:
zkServer.sh start zoo1.cfg
zkServer.sh start zoo2.cfg
zkServer.sh start zoo3.cfg
The start-up process is shown in the following figure:
The start-up result is shown in the following figure:
After the first command runs, some exceptions are reported. Every ZooKeeper instance holds the configuration of the whole ensemble, so each one starts a leader election as soon as it is launched. The first instance therefore tries to communicate with the other two, which have not been started yet, and this produces the error messages. They can be ignored: once the second and third instances are up, the exceptions stop appearing. You can then query the state of each instance with the following three commands:
zkServer.sh status zoo1.cfg
zkServer.sh status zoo2.cfg
zkServer.sh status zoo3.cfg
The running state of the ZooKeeper services is shown in the following figure:
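The start-up behaviour described above follows from the "more than half" rule: an ensemble can serve requests only once a majority of its servers are running. The Python sketch below works through that arithmetic. The function names `quorum_size` and `tolerated_failures` are illustrative and are not ZooKeeper APIs.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that is more than half of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers may be down while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(f"{n} servers: quorum {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

For the three instances in this section the quorum is two, so a single running instance cannot form a majority by itself. The same arithmetic shows that a fourth server tolerates no more failures than three do.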
1.3 ZooKeeper cluster mode construction
To obtain a reliable ZooKeeper service, deploy ZooKeeper on a cluster. As long as a majority of the ZooKeeper servers in the cluster are running, the ZooKeeper service as a whole is available. The cluster is configured much like the previous two modes, and the environment variables must be set in the same way. The parameters in the conf/zoo.cfg configuration file are set identically on every machine.
1.3.1 Create myid
Create a myid file in the dataDir directory (/usr/local/zk/data) on each machine:
The content on the server0 machine is: 0
The content on the server1 machine is: 1
The content on the server2 machine is: 2
1.3.2 Writing the configuration file
In the conf directory, delete the zoo_sample.cfg file and create a configuration file named zoo.cfg with the parameter settings shown in the following listing of zoo.cfg:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# The directory where the snapshot is stored.
dataDir=/usr/local/zk/data
# The port at which the clients will connect
clientPort=2183
# The location of the log file
dataLogDir=/usr/local/zk/log
server.0=hadoop:2288:3388
server.1=hadoop0:2288:3388
server.2=hadoop1:2288:3388
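The tick-based settings in this listing are easier to reason about once converted to milliseconds. The Python sketch below does the conversion for the values used here; `limits_in_ms` is a hypothetical helper, and the two session timeout entries use the default multipliers of tickTime (2 and 20) that the advanced configuration section of this article describes.

```python
def limits_in_ms(tick_time, init_limit, sync_limit):
    """Convert tick-based limits to milliseconds."""
    return {
        "initLimit": init_limit * tick_time,
        "syncLimit": sync_limit * tick_time,
        # Default session timeout bounds, expressed as multiples of tickTime.
        "minSessionTimeout": 2 * tick_time,
        "maxSessionTimeout": 20 * tick_time,
    }


limits = limits_in_ms(tick_time=2000, init_limit=10, sync_limit=5)
for name, value in limits.items():
    print(f"{name}: {value} ms")
```

With tickTime=2000, a follower has 20 seconds for its initial synchronization and 10 seconds per request and acknowledgement exchange with the leader.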
1.3.3 Start
Start the ZooKeeper server on each of the three machines: zkServer.sh start
2. Configuration of ZooKeeper
ZooKeeper's behaviour is controlled by its configuration file (zoo.cfg). This design is deliberate: as the preceding configurations show, in cluster mode the configuration file is exactly the same on every server, and in pseudo-cluster mode only a few settings differ. This makes deploying the ZooKeeper service very convenient. If the servers do use different configuration files, you must make sure that the list of servers is the same in all of them.
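The requirement that the server list matches across configuration files can be checked mechanically. The Python sketch below is a minimal illustration that compares the server.X entries of several configuration texts; `server_entries` and `server_lists_match` are hypothetical helper names, and the sample texts are shortened stand-ins for real zoo.cfg files.

```python
def server_entries(cfg_text):
    """Collect the server.X=... entries of a zoo.cfg text into a dict."""
    entries = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip().startswith("server."):
            entries[key.strip()] = value.strip()
    return entries


def server_lists_match(*cfg_texts):
    """True when every configuration text declares the same server entries."""
    lists = [server_entries(text) for text in cfg_texts]
    return all(current == lists[0] for current in lists[1:])


cfg_a = "clientPort=2181\nserver.0=hadoop:2288:3388\nserver.1=hadoop0:2288:3388\n"
cfg_b = "# other host\nclientPort=2182\nserver.1=hadoop0:2288:3388\nserver.0=hadoop:2288:3388\n"
cfg_c = "clientPort=2183\nserver.0=hadoop:2288:3388\n"

print(server_lists_match(cfg_a, cfg_b))
print(server_lists_match(cfg_a, cfg_c))
```

The comparison ignores the order of the entries and every setting other than server.X, so files may differ in clientPort or dataDir and still pass.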
Some parameters in the ZooKeeper configuration file are optional and some are required. The required parameters form the minimum configuration. To configure ZooKeeper in more detail, refer to the sections that follow.
2.1 Basic Configuration
The following parameters must be set in the minimum configuration:
(1) clientPort: the port on which the server listens for client connections.
(2) tickTime: the basic time unit, in milliseconds. It is the heartbeat interval between ZooKeeper servers and between clients and servers: a heartbeat is sent every tickTime. The minimum session timeout is twice tickTime.
(3) dataDir: the location where snapshots of the in-memory database are stored. Unless a separate location is specified, the transaction log of updates to the database is stored here as well.
Choose the location of the transaction log with care. A dedicated log device can greatly improve performance, whereas putting the log on a busy storage device will significantly degrade it.
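A quick way to catch an incomplete minimum configuration is to parse the file and look for the three required names. The Python sketch below is an illustration only: `parse_cfg` and `missing_required` are hypothetical helpers, not part of ZooKeeper. The comparison is case-sensitive on the assumption that ZooKeeper matches property names exactly, so a key written as ticktime would not count as tickTime.

```python
REQUIRED = ("clientPort", "tickTime", "dataDir")


def parse_cfg(text):
    """Parse key=value lines, ignoring blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def missing_required(text):
    """Return the required parameter names that the text does not set."""
    settings = parse_cfg(text)
    return [name for name in REQUIRED if name not in settings]


good = "tickTime=2000\ndataDir=/usr/local/zk/data\nclientPort=2181\n"
bad = "ticktime=2000\nDatadir=/usr/local/zk/data\nclientPort=2181\n"

print(missing_required(good))
print(missing_required(bad))
```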
2.2 Advanced Configuration
The following optional parameters let users control ZooKeeper's behaviour more precisely:
(1) dataLogDir
This option makes the server write the transaction log to the directory specified by dataLogDir instead of the one specified by dataDir. This allows a dedicated log device to be used and helps avoid contention between logging and snapshots. The configuration is as follows:
# The directory where the snapshot is stored
dataDir=/usr/local/zk/data
(2) maxClientCnxns
This option limits the number of concurrent connections that a single client, identified by its IP address, may make to a ZooKeeper server. It can be used to block certain classes of DoS attack. Setting it to 0 removes the limit on concurrent connections; if the option is omitted, ZooKeeper's default limit applies.
For example, set maxClientCnxns to 1 as follows:
# set maxClientCnxns
maxClientCnxns=1
After starting ZooKeeper, first connect one client to the server. If a second client from the same IP address then tries to connect, or the first client opens another connection implicitly, the limit configured above takes effect.
(3) minSessionTimeout and maxSessionTimeout
These are the minimum and maximum session timeouts. By default, minSessionTimeout = 2 * tickTime and maxSessionTimeout = 20 * tickTime.
2.3 Cluster Configuration
(1) initLimit
The time, in units of tickTime, that a follower (a "client" from the leader's point of view) is allowed for its initial connection to and synchronization with the leader. If initialization takes longer than this, the connection fails.
(2) syncLimit
The maximum time, in units of tickTime, allowed for a request and its response when messages are exchanged between the leader and a follower. If a follower cannot communicate with the leader within this time, the follower is dropped.
(3) server.A=B:C:D
A: a number that identifies the server;
B: the IP address or host name of the server;
C: the port that followers use to connect to the leader, that is, the port for communication between servers;
D: the port used for leader election.
(4) myid and zoo.cfg
In cluster mode, besides editing the zoo.cfg configuration file, you must also create a file named myid in the dataDir directory. The file contains a single value. At startup, ZooKeeper reads this value and compares it with the server entries in zoo.cfg to determine which server it is.
3. Building a ZooKeeper server cluster
Construction requirements:
(1) The ZK server cluster has at least 3 nodes.
(2) The system time on the servers is kept consistent.
3.1 Installing and configuring ZK
(1) Use WinSCP to transfer ZK to /usr/local on the hadoop host. The version I used is zookeeper-3.4.5.tar.gz.
(2) In the /usr/local directory on hadoop, unpack zk....tar.gz and set the environment variables.
Unpack: in the /usr/local directory, run the command tar -zxvf zookeeper-3.4.5.tar.gz, as shown in the following figure:
Rename: rename the unpacked folder to zk by running the command mv zookeeper-3.4.5 zk, as shown in the following figure:
Set environment variables: run the command vi /etc/profile and add the line export ZOOKEEPER_HOME=/usr/local/zk, as shown in Figure 2.3. Then run the command source /etc/profile, as shown in the following figure:
3.2 Modifying the ZK configuration file
(1) Rename: in the /usr/local/zk/conf directory, rename zoo_sample.cfg to zoo.cfg by running the command mv zoo_sample.cfg zoo.cfg, as shown in the following figure:
(2) View: in the /usr/local/zk/conf directory, open the file with vi zoo.cfg; its content is shown in the figure below. In this file, dataDir is the data directory. Its default setting, /tmp/zookeeper, is a temporary directory whose contents are lost on every reboot, so here we use a directory of our own, /usr/local/zk/data.
(3) Create the folder: mkdir /usr/local/zk/data
(4) Create myid: in the data directory, create the file myid with the value 0: run vi myid and enter 0 as the content.
(5) Edit: run vi zoo.cfg and change the setting to dataDir=/usr/local/zk/data.
Add the following lines:
server.0=hadoop:2888:3888
server.1=hadoop0:2888:3888
server.2=hadoop1:2888:3888
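To illustrate how these lines relate to the myid file created in step (4), the Python sketch below parses the three entries and looks up the one that matches a given myid value. It is a simplified illustration, not ZooKeeper's own parsing code: `ServerEntry` and `parse_server_line` are hypothetical names, and the field order follows the ZooKeeper administrator documentation (host, then the port followers use to connect to the leader, then the leader election port).

```python
from typing import NamedTuple


class ServerEntry(NamedTuple):
    server_id: int
    host: str
    quorum_port: int    # port followers use to connect to the leader
    election_port: int  # port used for leader election


def parse_server_line(line):
    """Parse one 'server.A=B:C:D' line into a ServerEntry."""
    key, _, value = line.partition("=")
    server_id = int(key.strip().split(".", 1)[1])
    host, quorum_port, election_port = value.strip().split(":")
    return ServerEntry(server_id, host, int(quorum_port), int(election_port))


LINES = [
    "server.0=hadoop:2888:3888",
    "server.1=hadoop0:2888:3888",
    "server.2=hadoop1:2888:3888",
]

entries = {entry.server_id: entry for entry in map(parse_server_line, LINES)}

# Content of /usr/local/zk/data/myid on the hadoop host, as set in step (4).
myid = "0"
me = entries[int(myid)]
print(f"myid {myid} -> {me.host}, quorum port {me.quorum_port}, "
      f"election port {me.election_port}")
```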