I. Ways to set up ZooKeeper
ZooKeeper can be installed in three ways: stand-alone mode, pseudo-cluster mode, and cluster mode.
Stand-alone mode: ZooKeeper runs on a single server only; suitable for test environments.
Pseudo-cluster mode: multiple ZooKeeper instances run on a single physical machine.
Cluster mode: ZooKeeper runs on a cluster of machines, called an "ensemble"; suitable for production environments.
ZooKeeper provides high availability through replication: the service keeps running as long as more than half of the machines in the ensemble are available. Why more than half? Because of ZooKeeper's replication strategy: every modification to the znode tree is replicated to more than half of the machines in the ensemble, so any surviving majority contains at least one machine that has seen every committed update.
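The "more than half" rule is simple arithmetic, and a small sketch makes it concrete. The function names quorum_size and tolerated_failures are made up for this illustration; they are not ZooKeeper commands.

```shell
#!/bin/sh
# quorum_size N: smallest number of servers that is strictly more than half of N.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: how many servers may fail while a majority survives.
tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

for n in 3 4 5; do
    echo "ensemble=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

An ensemble of 4 tolerates no more failures than an ensemble of 3, which is why ensembles are usually given an odd number of servers.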
1.1 ZooKeeper stand-alone mode setup
Download ZooKeeper: http://pan.baidu.com/s/1pjlwbr9
Decompress: tar -zxvf zookeeper-3.4.5.tar.gz; rename: mv zookeeper-3.4.5 zk
Configuration file: in the conf directory, delete the zoo_sample.cfg file and create a configuration file named zoo.cfg with the following content:
tickTime=2000
dataDir=/usr/local/zk/data
dataLogDir=/usr/local/zk/datalog
clientPort=2181
Configure environment variables: for convenience later on, add the following to the /etc/profile file:
export ZOOKEEPER_HOME=/usr/local/zk
export PATH=.:$HADOOP_HOME/bin:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin:$PATH
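If the profile is edited by a script rather than by hand, it helps to make the edit safe to repeat. The following is a minimal sketch under that assumption: add_zk_env is a hypothetical helper, the PATH line is simplified to the ZooKeeper entry only, and the demo writes to a scratch file instead of the real /etc/profile.

```shell
#!/bin/sh
# add_zk_env FILE ZK_HOME: append the ZooKeeper exports to FILE unless a
# ZOOKEEPER_HOME export is already present, so running it twice is harmless.
add_zk_env() {
    profile=$1
    zk_home=$2
    if grep -q '^export ZOOKEEPER_HOME=' "$profile" 2>/dev/null; then
        return 0
    fi
    {
        echo "export ZOOKEEPER_HOME=$zk_home"
        echo 'export PATH=$ZOOKEEPER_HOME/bin:$PATH'
    } >> "$profile"
}

# Demo against a scratch file, not /etc/profile.
demo_profile=$(mktemp)
add_zk_env "$demo_profile" /usr/local/zk
add_zk_env "$demo_profile" /usr/local/zk   # second call adds nothing
cat "$demo_profile"
```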
Start the ZooKeeper server: zkServer.sh start; stop the ZooKeeper server: zkServer.sh stop
1.2 ZooKeeper pseudo-cluster mode setup
Besides stand-alone mode, a single machine can also run ZooKeeper in a simulated cluster mode, in which the different nodes all run on the same computer. Hadoop behaves quite differently in pseudo-distributed mode than in fully distributed mode, but for ZooKeeper there is no essential difference between pseudo-cluster mode and cluster mode. Pseudo-cluster mode is therefore very convenient for trying out ZooKeeper and for experiments: you can first test with a small amount of data in pseudo-cluster mode and, once the test works, move to cluster mode for experiments with real data. This confirms feasibility up front and makes experimenting more efficient. The setup is simple and cheap, and suits testing and learning; if you do not have enough machines at hand, you can deploy 3 servers on one machine.
1.2.1 Precautions
When 3 servers are deployed on a single machine, each configuration file used in pseudo-cluster mode stands for one machine; that is, one machine runs several ZooKeeper instances. You must make sure that the port numbers in the configuration files do not conflict, and besides clientPort, dataDir must also differ between instances. In addition, create a myid file in the directory that dataDir points to, to identify the corresponding ZooKeeper server instance.
clientPort: when several servers are deployed on one machine, each instance needs its own clientPort, for example 2181 for server 1, 2182 for server 2, and 2183 for server 3.
dataDir and dataLogDir: these also have to be kept apart. Separate the data files from the log files, and give each server its own paths for the two settings.
server.X and myid: the X in server.X corresponds to the number in data/myid. The myid files of the 3 servers contain 0, 1, and 2 respectively, so the zoo.cfg of every server lists server.0, server.1, and server.2. Because all instances are on the same machine, the two ports that follow the host name must be different for each of the 3 servers; otherwise the ports conflict.
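The port rule for the server.X lines can be checked mechanically before anything is started. This is a rough sketch only: duplicate_server_ports is a hypothetical helper, and the demo file deliberately contains a conflict so that there is something to report.

```shell
#!/bin/sh
# duplicate_server_ports FILE: print every host:port pair that is used more
# than once across the server.X lines of FILE (no output means no conflict).
duplicate_server_ports() {
    sed -n 's/^server\.[0-9]*=//p' "$1" |
        awk -F: '{ print $1 ":" $2; print $1 ":" $3 }' |
        sort | uniq -d
}

# Scratch config in which two instances wrongly share port 2287.
demo_bad=$(mktemp)
printf 'server.0=localhost:2287:3387\nserver.1=localhost:2287:3388\n' > "$demo_bad"
duplicate_server_ports "$demo_bad"
```

With distinct ports for every instance the function prints nothing.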
Here is the pseudo-cluster configuration I used. Three files, zoo1.cfg, zoo2.cfg, and zoo3.cfg, simulate a ZooKeeper cluster of three machines. The listing of zoo1.cfg is as follows:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/usr/local/zk/data_1
# the port at which the clients will connect
clientPort=2181
# the location of the log file
dataLogDir=/usr/local/zk/logs_1
server.0=localhost:2287:3387
server.1=localhost:2288:3388
server.2=localhost:2289:3389
The listing of zoo2.cfg is as follows:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/usr/local/zk/data_2
# the port at which the clients will connect
clientPort=2182
# the location of the log file
dataLogDir=/usr/local/zk/logs_2
server.0=localhost:2287:3387
server.1=localhost:2288:3388
server.2=localhost:2289:3389
The listing of zoo3.cfg is as follows:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/usr/local/zk/data_3
# the port at which the clients will connect
clientPort=2183
# the location of the log file
dataLogDir=/usr/local/zk/logs_3
server.0=localhost:2287:3387
server.1=localhost:2288:3388
server.2=localhost:2289:3389
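Since the three files differ only in dataDir, dataLogDir, and clientPort, they can also be generated. The sketch below is one possible way to do that, not part of the original setup: gen_pseudo_cluster is a hypothetical helper, and the demo writes into a scratch directory rather than /usr/local/zk.

```shell
#!/bin/sh
# gen_pseudo_cluster BASE CONF: write zoo1.cfg..zoo3.cfg into CONF and create
# data_N / logs_N under BASE, each data_N holding its myid file.
gen_pseudo_cluster() {
    base=$1
    conf=$2
    mkdir -p "$conf"
    i=1
    while [ "$i" -le 3 ]; do
        id=$((i - 1))                      # myid values are 0, 1, 2
        mkdir -p "$base/data_$i" "$base/logs_$i"
        echo "$id" > "$base/data_$i/myid"
        cat > "$conf/zoo$i.cfg" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=$base/data_$i
clientPort=$((2180 + i))
dataLogDir=$base/logs_$i
server.0=localhost:2287:3387
server.1=localhost:2288:3388
server.2=localhost:2289:3389
EOF
        i=$((i + 1))
    done
}

# Demo in a scratch directory.
demo_root=$(mktemp -d)
gen_pseudo_cluster "$demo_root/zk" "$demo_root/zk/conf"
grep '^clientPort=' "$demo_root"/zk/conf/zoo*.cfg
```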
1.2.2 Start
In pseudo-cluster mode a single machine runs three ZooKeeper instances at the same time, so the start command used in stand-alone mode does not work as is. Instead, start the configured ZooKeeper instances with the following three commands:
zkServer.sh start zoo1.cfg
zkServer.sh start zoo2.cfg
zkServer.sh start zoo3.cfg
After the first command runs, some exceptions are logged. Every ZooKeeper instance knows the full server list from its configuration and starts leader election as soon as it is launched, so the first instance tries to communicate with the other two, which have not been started yet, and reports errors. Simply ignore them: once instances 2 and 3 are up, the exceptions stop. You can then query the running state of the ZooKeeper service with the following three commands:
zkServer.sh status zoo1.cfg
zkServer.sh status zoo2.cfg
zkServer.sh status zoo3.cfg
1.3 ZooKeeper cluster mode setup
To obtain a reliable ZooKeeper service, deploy ZooKeeper on a cluster. As long as a majority of the ZooKeeper servers in the cluster are up, the service as a whole is available. Cluster setup is similar to the previous two modes and also requires the environment variables to be configured. The conf/zoo.cfg configuration file has the same parameter settings on every machine.
1.3.1 Creating myid
Create a myid file in the dataDir directory (/usr/local/zk/data).
The content on the server0 machine is: 0
The content on the server1 machine is: 1
The content on the server2 machine is: 2
1.3.2 Writing the configuration file
In the conf directory, delete the zoo_sample.cfg file and create a configuration file zoo.cfg with the parameter settings shown in the following listing:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/usr/local/zk/data
# the port at which the clients will connect
clientPort=2183
# the location of the log file
dataLogDir=/usr/local/zk/log
server.0=hadoop:2288:3388
server.1=hadoop0:2288:3388
server.2=hadoop1:2288:3388
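As the comments in the file say, initLimit and syncLimit are counted in ticks, so their real duration depends on tickTime. A small sketch of the conversion follows; cfg_get and limit_ms are hypothetical helper names, and the demo reads a scratch file holding the three values used in this article.

```shell
#!/bin/sh
# cfg_get KEY FILE: print the value of the first KEY=value line in FILE.
cfg_get() {
    sed -n "s/^$1=//p" "$2" | head -n 1
}

# limit_ms KEY FILE: convert a tick-based limit (initLimit, syncLimit)
# to milliseconds by multiplying it with tickTime.
limit_ms() {
    ticks=$(cfg_get "$1" "$2")
    tick_ms=$(cfg_get tickTime "$2")
    echo $(( ticks * tick_ms ))
}

# Scratch config with the values from the listing.
demo_cfg=$(mktemp)
printf 'tickTime=2000\ninitLimit=10\nsyncLimit=5\n' > "$demo_cfg"
echo "initLimit = $(limit_ms initLimit "$demo_cfg") ms"
echo "syncLimit = $(limit_ms syncLimit "$demo_cfg") ms"
```

With these values, initial synchronization may take up to 20 seconds and a request/acknowledgement round up to 10 seconds.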
1.3.3 Start
Start the ZooKeeper server on each of the 3 machines: zkServer.sh start
II. ZooKeeper configuration
ZooKeeper's behavior is controlled by its configuration file (zoo.cfg). This design is deliberate: as the configurations above show, in a ZooKeeper cluster the configuration file is exactly the same on every server, and in pseudo-cluster mode only a few entries differ. This makes deploying the ZooKeeper service very convenient. If servers do use different configuration files, make sure the server lists in all of them match.
Some parameters in the ZooKeeper configuration file are optional and some are required. The required ones make up the minimum configuration. To configure ZooKeeper in more detail, refer to the following.
2.1 Basic configuration
The following parameters are required in the minimum configuration:
(1) clientPort: the port on which the server listens for client connections.
(2) tickTime: the basic time unit, in milliseconds. It is the heartbeat interval between ZooKeeper servers and between clients and servers; a heartbeat is sent every tickTime. The minimum session timeout is twice the tickTime.
(3) dataDir: the location where the in-memory database snapshots are stored; unless dataLogDir is set, the transaction log of updates is stored here as well.
Choose the location of the transaction log carefully. A dedicated log device can greatly improve performance, whereas putting the log on a busy storage device will noticeably hurt it.
2.2 Advanced configuration
The following optional parameters let users tune ZooKeeper's behavior more precisely:
(1) dataLogDir
This option makes the server write the transaction log to the directory given by dataLogDir instead of the one given by dataDir. That allows a dedicated log device to be used and avoids contention between log writes and snapshots. The configuration is as follows:
# The directory where the snapshot is stored
dataDir=/usr/local/zk/data
# The directory where the transaction log is stored
dataLogDir=/usr/local/zk/datalog
(2) maxClientCnxns
This option limits the number of concurrent connections that a single client, identified by IP address, may open to a ZooKeeper server. It can block certain classes of DoS attack. Setting it to 0 removes the limit on concurrent connections.
For example, set maxClientCnxns to 1 as follows:
# set maxClientCnxns
maxClientCnxns=1
After starting ZooKeeper, connect one client to the server. If a second client from the same IP address then tries to connect, or the existing client implicitly opens another connection, the limit configured above takes effect and the extra connection is refused.
(3) minSessionTimeout and maxSessionTimeout
The minimum and maximum session timeouts. By default, minSessionTimeout = 2 * tickTime and maxSessionTimeout = 20 * tickTime.
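The default bounds follow directly from tickTime. The sketch below computes them and, as an illustration of how such bounds are typically applied, clamps a requested timeout into that window. Both function names are made up for this example, and the clamping is a simplified model rather than ZooKeeper's actual code.

```shell
#!/bin/sh
# session_bounds TICKTIME_MS: print "min max" using the default factors 2 and 20.
session_bounds() {
    echo "$(( 2 * $1 )) $(( 20 * $1 ))"
}

# negotiate_timeout TICKTIME_MS REQUESTED_MS: clamp a requested session
# timeout into the [min, max] window derived from tickTime.
negotiate_timeout() {
    min=$(( 2 * $1 ))
    max=$(( 20 * $1 ))
    if [ "$2" -lt "$min" ]; then
        echo "$min"
    elif [ "$2" -gt "$max" ]; then
        echo "$max"
    else
        echo "$2"
    fi
}

session_bounds 2000            # bounds for tickTime=2000
negotiate_timeout 2000 1000    # request below the minimum
negotiate_timeout 2000 60000   # request above the maximum
```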
2.3 Cluster configuration
(1) initLimit
The maximum time, in ticks (multiples of tickTime), that a follower (a "client" from the leader's point of view) may take to connect to the leader and complete initial synchronization. If it takes longer than this, the connection fails.
(2) syncLimit
The maximum time, in ticks, allowed between sending a request and receiving the acknowledgement when leader and follower exchange messages. If a follower cannot communicate with the leader within this time, the follower is dropped.
(3) server.A=B:C:D
A: a number, the ID of this server;
B: the IP address (or host name) of this server;
C: the port that followers use to connect to the leader and exchange data with it;
D: the port used for leader election.
(4) myid and zoo.cfg
Besides zoo.cfg, cluster mode requires a file named myid in the dataDir directory. The file contains a single number. On startup ZooKeeper reads this file and compares the number with the server.A entries in zoo.cfg to determine which server it is.
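To make the relationship between myid and the server.A=B:C:D lines concrete, here is a small sketch that looks up the entry matching a myid file and splits it into its fields. It only imitates the lookup for illustration; parse_server_line and entry_for_myid are hypothetical helpers, and the scratch files stand in for dataDir/myid and conf/zoo.cfg.

```shell
#!/bin/sh
# parse_server_line LINE: split "server.A=B:C:D" into its four fields.
parse_server_line() {
    line=$1
    a=${line%%=*}; a=${a#server.}     # A: server number
    rest=${line#*=}                   # "B:C:D"
    b=${rest%%:*}                     # B: address
    ports=${rest#*:}                  # "C:D"
    echo "A=$a B=$b C=${ports%%:*} D=${ports#*:}"
}

# entry_for_myid MYID_FILE CFG_FILE: print the server.A line whose A equals
# the number stored in MYID_FILE; returns non-zero if no line matches.
entry_for_myid() {
    id=$(tr -d '[:space:]' < "$1")
    grep "^server\.$id=" "$2"
}

# Scratch files standing in for dataDir/myid and conf/zoo.cfg.
demo_dir=$(mktemp -d)
echo 1 > "$demo_dir/myid"
cat > "$demo_dir/zoo.cfg" <<EOF
server.0=hadoop:2888:3888
server.1=hadoop0:2888:3888
server.2=hadoop1:2888:3888
EOF
parse_server_line "$(entry_for_myid "$demo_dir/myid" "$demo_dir/zoo.cfg")"
```

A myid value with no matching server.A line makes entry_for_myid fail, which is a cheap sanity check to run before starting a node.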
III. Building a ZooKeeper server cluster
Requirements:
(1) The ZK server cluster has at least 3 nodes.
(2) The system time on all servers is consistent.
3.1 Installing and configuring ZK
(1) Use WinSCP to transfer ZK to /usr/local on the hadoop host; the version I used is zookeeper-3.4.5.tar.gz.
(2) In the /usr/local directory on hadoop, unpack zk....tar.gz and set the environment variables.
Decompress: in the /usr/local directory, run: tar -zxvf zookeeper-3.4.5.tar.gz
Rename: rename the unpacked folder to zk by running: mv zookeeper-3.4.5 zk
Set environment variables: run vi /etc/profile and add: export ZOOKEEPER_HOME=/usr/local/zk. Then run: source /etc/profile
3.2 Modifying the ZK configuration file
(1) Rename: in the /usr/local/zk/conf directory, rename zoo_sample.cfg to zoo.cfg by running: mv zoo_sample.cfg zoo.cfg
(2) View: in the /usr/local/zk/conf directory, open the file with vi zoo.cfg. In this file, dataDir is the data directory. Its default is /tmp/zookeeper, a temporary directory whose contents may be lost on reboot, so we use our own directory instead: /usr/local/zk/data.
(3) Create the folder: mkdir /usr/local/zk/data
(4) Create myid: in the data directory, create the file myid with the value 0: run vi myid and enter 0.
(5) Edit: run vi zoo.cfg and change the setting to dataDir=/usr/local/zk/data.
Add:
server.0=hadoop:2888:3888
server.1=hadoop0:2888:3888
server.2=hadoop1:2888:3888
tickTime: the heartbeat interval between ZooKeeper servers and between clients and servers; a heartbeat is sent every tickTime.
dataDir: as the name implies, the directory where ZooKeeper stores its data; by default, ZooKeeper also writes its transaction log files to this directory.
clientPort: the port clients use to connect to the ZooKeeper server; ZooKeeper listens on this port and accepts client requests.
Once these items are configured you can start ZooKeeper. After startup, use the command echo ruok | nc localhost 2181 to check whether ZooKeeper is running.
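A running server answers the ruok command with the text imok, so the check is easy to wrap in a script. The following is a minimal sketch; interpret_ruok and zk_is_ok are hypothetical names, and zk_is_ok is only defined here, not called, because it needs nc and a live server.

```shell
#!/bin/sh
# interpret_ruok REPLY: a running server answers "ruok" with "imok".
interpret_ruok() {
    if [ "$1" = "imok" ]; then
        echo "running"
    else
        echo "not responding"
    fi
}

# zk_is_ok HOST PORT: send "ruok" with nc and interpret the reply.
# Defined only; it requires nc and a started ZooKeeper server.
zk_is_ok() {
    interpret_ruok "$(echo ruok | nc "$1" "$2" 2>/dev/null)"
}

interpret_ruok "imok"   # reply from a running server
interpret_ruok ""       # no reply at all
```

On a machine with a started server, zk_is_ok localhost 2181 would perform the real check. Note that imok only shows that the process is up; it does not prove that the server has joined the quorum.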
3.3 Configuring the other nodes
(1) Copy the zk directory and the /etc/profile file from the hadoop host to hadoop0 and hadoop1. Run:
scp -r /usr/local/zk/ hadoop0:/usr/local/
scp -r /usr/local/zk/ hadoop1:/usr/local/
scp /etc/profile hadoop0:/etc/
scp /etc/profile hadoop1:/etc/
ssh hadoop0
source /etc/profile
vi /usr/local/zk/data/myid
exit
ssh hadoop1
source /etc/profile
vi /usr/local/zk/data/myid
exit
(2) Set the value in myid on hadoop0 to 1 and the value in myid on hadoop1 to 2, matching the server.1 and server.2 entries in zoo.cfg.
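With more than a couple of nodes, the copy and myid steps are worth scripting. The sketch below is a dry run under stated assumptions: plan_node is a hypothetical helper that only prints the commands for one node, and the host-to-ID mapping follows the server.N lines in zoo.cfg (hadoop0 is server.1, hadoop1 is server.2).

```shell
#!/bin/sh
# plan_node HOST ID: print (do not run) the commands that copy the
# installation to HOST and write ID into its myid file.
plan_node() {
    echo "scp -r /usr/local/zk/ $1:/usr/local/"
    echo "scp /etc/profile $1:/etc/"
    echo "ssh $1 'echo $2 > /usr/local/zk/data/myid'"
}

# Host-to-ID mapping taken from the server.N lines in zoo.cfg.
plan_node hadoop0 1
plan_node hadoop1 2
```

Reviewing the printed commands before executing them avoids writing the wrong ID to a node.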
IV. Startup check
(1) Start: run zkServer.sh start on each of the three nodes.
(2) Check: run zkServer.sh status on each of the three nodes. The output shows that hadoop and hadoop1 are followers and hadoop0 is the leader.
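The role of a node is reported on the "Mode:" line of the status output, so it can be extracted with a one-line filter. This is a sketch only: zk_mode is a hypothetical helper, and the sample input merely imitates the shape of the status output, with an illustrative config path.

```shell
#!/bin/sh
# zk_mode: read `zkServer.sh status` output on stdin and print the role
# found on the "Mode:" line (for example leader or follower).
zk_mode() {
    sed -n 's/^Mode: *//p'
}

# Sample text shaped like the status output; the path is illustrative.
printf 'Using config: /usr/local/zk/bin/../conf/zoo.cfg\nMode: leader\n' | zk_mode
```

On a real node the filter would be used as zkServer.sh status | zk_mode, assuming the Mode line is written to standard output.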