Hadoop diary day20 --- ZooKeeper series (2)

Source: Internet
Author: User
ZooKeeper environment configuration

1. How to build ZooKeeper

ZooKeeper can be installed in three ways: standalone mode, pseudo-cluster mode, and cluster mode.

Standalone mode: ZooKeeper runs on a single server; suitable for test environments.
Pseudo-cluster mode: multiple ZooKeeper instances run on one physical machine.
Cluster mode: ZooKeeper runs on a cluster of machines; suitable for production. Such a cluster is called an "ensemble".

ZooKeeper achieves high availability through replication. As long as more than half of the machines in the ensemble are available, the service continues. Why more than half? This follows from ZooKeeper's replication policy: ZooKeeper ensures that every modification to the znode tree is copied to more than half of the machines in the ensemble, so any majority that remains available includes at least one machine holding the latest state.
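The "more than half" rule is easy to check with a little arithmetic. The following is a minimal sketch, not ZooKeeper code; the function names are made up for illustration.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that is strictly more than half."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers may be down while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    # Columns: ensemble size, majority needed, failures tolerated
    print(n, quorum_size(n), tolerated_failures(n))
```

Note that four servers tolerate no more failures than three, which is why ensembles are usually given an odd number of machines.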

1.1 ZooKeeper standalone mode setup

(1) Download ZooKeeper: http://pan.baidu.com/s/1pJlwbR9

(2) Extract: tar -zxvf zookeeper-3.4.5.tar.gz, then rename: mv zookeeper-3.4.5 zk

(3) Configuration file: delete the zoo_sample.cfg file in the conf directory and create a configuration file named zoo.cfg with the following content:

tickTime=2000
dataDir=/usr/local/zk/data
dataLogDir=/usr/local/zk/datalog
clientPort=2181

(4) Configure environment variables: to simplify later operations, add the following lines to the /etc/profile file:

export ZOOKEEPER_HOME=/usr/local/zk
export PATH=.:$HADOOP_HOME/bin:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin:$PATH

(5) Start the ZooKeeper server: zkServer.sh start; stop the ZooKeeper server: zkServer.sh stop

1.2 ZooKeeper pseudo-cluster mode setup

Besides standalone mode, ZooKeeper can simulate cluster mode on a single machine by running several nodes side by side. Hadoop behaves quite differently in pseudo-distributed mode than in fully distributed mode, but ZooKeeper operations in pseudo-cluster mode are essentially the same as in cluster mode. This makes pseudo-cluster mode very convenient for getting to know ZooKeeper and running experiments. For example, we can first test with a small amount of data in pseudo-cluster mode and, once the test works, move to cluster mode for experiments on real data. This both checks feasibility early and greatly improves the efficiency of the experiment. The setup is simple and cheap, and well suited to testing and learning: if you do not have enough machines, you can deploy three servers on one machine.

1.2.1. Notes

Three servers are deployed on one machine. Each configuration file used in pseudo-cluster mode stands in for one machine, so multiple ZooKeeper instances run on a single host. The port numbers in the configuration files must not conflict, and besides clientPort, dataDir must also differ. In addition, create a myid file in the directory that dataDir points to, identifying the corresponding ZooKeeper server instance.

(1) clientPort: when several servers are deployed on one machine, each needs its own clientPort. For example, server1 uses 2181, server2 uses 2182, and server3 uses 2183.

(2) dataDir and dataLogDir: these should be distinct from each other so that data files and log files are kept apart, and the paths must differ from server to server.

(3) server.X and myid: the number X in server.X must match the number in data/myid. The three servers' myid files contain 0, 1, and 2 respectively, so the zoo.cfg of every server lists server.0, server.1, and server.2. Because all three run on the same machine, the two ports that follow each address must also differ between the three servers; otherwise the ports conflict.
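The three rules above can be checked mechanically before starting anything. This is a rough sketch rather than a ZooKeeper tool: the dictionary layout and the field names "peer" and "election" are invented for illustration, and the values are the ones used for the three local instances in this section.

```python
from collections import Counter

# Hypothetical description of the three local instances (illustrative layout).
instances = [
    {"myid": 0, "clientPort": 2181, "dataDir": "/usr/local/zk/data_1",
     "peer": 2287, "election": 3387},
    {"myid": 1, "clientPort": 2182, "dataDir": "/usr/local/zk/data_2",
     "peer": 2288, "election": 3388},
    {"myid": 2, "clientPort": 2183, "dataDir": "/usr/local/zk/data_3",
     "peer": 2289, "election": 3389},
]


def find_conflicts(instances):
    """Return (field, value) pairs that more than one instance uses."""
    conflicts = []
    for field in ("myid", "clientPort", "dataDir"):
        counts = Counter(inst[field] for inst in instances)
        conflicts += [(field, value) for value, n in counts.items() if n > 1]
    # On a single machine every port, whatever its role, must be unique.
    ports = Counter(
        port
        for inst in instances
        for port in (inst["clientPort"], inst["peer"], inst["election"])
    )
    conflicts += [("port", port) for port, n in ports.items() if n > 1]
    return conflicts


print(find_conflicts(instances))
```

An empty list means the three instances can coexist on one host as far as these settings are concerned.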

The following is the pseudo-cluster configuration. zoo1.cfg, zoo2.cfg, and zoo3.cfg simulate a ZooKeeper cluster of three machines. For details, see Code Lists 1.1-1.3.

Code List 1.1 zoo1.cfg

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/usr/local/zk/data_1
# the port at which the clients will connect
clientPort=2181
# the location of the log file
dataLogDir=/usr/local/zk/logs_1
server.0=localhost:2287:3387
server.1=localhost:2288:3388
server.2=localhost:2289:3389

Fig 1.1

Code List 1.2 zoo2.cfg

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/usr/local/zk/data_2
# the port at which the clients will connect
clientPort=2182
# the location of the log file
dataLogDir=/usr/local/zk/logs_2
server.0=localhost:2287:3387
server.1=localhost:2288:3388
server.2=localhost:2289:3389

Fig 1.2

Code List 1.3 zoo3.cfg

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/usr/local/zk/data_3
# the port at which the clients will connect
clientPort=2183
# the location of the log file
dataLogDir=/usr/local/zk/logs_3
server.0=localhost:2287:3387
server.1=localhost:2288:3388
server.2=localhost:2289:3389

Fig 1.3
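The three files differ only in dataDir, dataLogDir, and clientPort, so they can be produced from one template. The sketch below is an illustration under that assumption; the helper names are made up, and it writes into a scratch directory instead of /usr/local/zk so that it can be tried safely.

```python
import tempfile
from pathlib import Path

# Template built from the settings shown in the code lists above.
TEMPLATE = """tickTime=2000
initLimit=10
syncLimit=5
dataDir={base}/data_{n}
clientPort={client_port}
dataLogDir={base}/logs_{n}
server.0=localhost:2287:3387
server.1=localhost:2288:3388
server.2=localhost:2289:3389
"""


def render_config(index: int, base: str = "/usr/local/zk") -> str:
    """Render zoo<N>.cfg for the instance whose myid is `index` (0, 1 or 2)."""
    return TEMPLATE.format(base=base, n=index + 1, client_port=2181 + index)


def write_pseudo_cluster(target: Path) -> None:
    """Write zoo1.cfg..zoo3.cfg and data_N/myid under `target`."""
    for index in range(3):
        n = index + 1
        (target / f"zoo{n}.cfg").write_text(render_config(index, base=str(target)))
        data_dir = target / f"data_{n}"
        data_dir.mkdir(parents=True, exist_ok=True)
        # myid holds the X of the matching server.X line
        (data_dir / "myid").write_text(f"{index}\n")


with tempfile.TemporaryDirectory() as tmp:
    write_pseudo_cluster(Path(tmp))
    print(sorted(p.name for p in Path(tmp).iterdir()))
```

Generating the files this way keeps the server list identical across all three, which is the part that must not drift.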

1.2.2 Startup

In pseudo-cluster mode we have only one machine but need to run three ZooKeeper instances at the same time, so the standalone startup command is not enough. Instead, start each instance with its own configuration file:

zkServer.sh start zoo1.cfg
zkServer.sh start zoo2.cfg
zkServer.sh start zoo3.cfg

The startup process is shown in Figures 1.4-1.5:

 

Fig 1.4

Fig 1.5

After the first command is run, some exceptions may be reported. Every ZooKeeper instance knows the full server list and starts leader election as soon as it starts up, so the first instance tries to communicate with the other two. Since those two have not been started yet, the connection attempts fail and the exceptions are logged.

We can simply ignore them. Once instances "2" and "3" in the figure have started, the exceptions stop. You can then query the running status of each instance with the following three commands:

zkServer.sh status zoo1.cfg
zkServer.sh status zoo2.cfg
zkServer.sh status zoo3.cfg

The running status of the ZooKeeper service is shown in Figure 1.6.

 

Fig 1.6

1.3 ZooKeeper cluster mode setup

To obtain a reliable ZooKeeper service, deploy ZooKeeper on a cluster. As long as a majority of the ZooKeeper servers in the cluster are running, the service as a whole is available. The cluster configuration is similar to the previous two modes, and the environment variables must be configured as well. The parameters in the conf/zoo.cfg configuration file are the same on every machine.

1.3.1 Create myid

Create a myid file in the dataDir directory (/usr/local/zk/data).

Content on server 0: 0
Content on server 1: 1
Content on server 2: 2

1.3.2 Write the configuration file

Delete the zoo_sample.cfg file in the conf directory and create a configuration file named zoo.cfg, as shown in Figure 2.4.

Code List 2.4 parameter settings in zoo.cfg

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/usr/local/zk/data
# the port at which the clients will connect
clientPort=2183
# the location of the log file
dataLogDir=/usr/local/zk/log
server.0=hadoop:2288:3388
server.1=hadoop0:2288:3388
server.2=hadoop1:2288:3388

Fig 2.4

1.3.3 Startup

Start the ZooKeeper server by running zkServer.sh start on each of the three machines.

2. Configuration of ZooKeeper

ZooKeeper is controlled and managed through its configuration file, zoo.cfg, and there is a reason for this design. As the setups above show, the configuration files are identical on every machine in cluster mode, and differ only in a few places in pseudo-cluster mode. This makes deploying the ZooKeeper service very convenient. If the servers do use different configuration files, make sure the server list is the same in all of them.

Some parameters in the ZooKeeper configuration file are optional and some are required. The required parameters make up the minimum configuration. The following sections describe the configuration in more detail.

2.1 Basic configuration

The minimum configuration must contain the following parameters:

  clientPort: the port on which the server listens for client connections.
  tickTime: the basic time unit, in milliseconds. It is the interval at which heartbeats are exchanged between ZooKeeper servers and between clients and servers: one heartbeat is sent every tickTime. The minimum session timeout is twice tickTime.
  dataDir: the location where snapshots of the in-memory database are stored. Unless specified otherwise, the transaction log of updates to the database is also written to this location.

Choose the location of the transaction log carefully. A dedicated log device can greatly improve performance, whereas keeping the log on a busy storage device will seriously hurt it.

2.2 Advanced configuration

The following parameters are optional. They let you tune the behavior of ZooKeeper more finely:

(1) dataLogDir

  This option makes the server write transaction logs to the directory specified by dataLogDir instead of the one specified by dataDir. This allows a dedicated log device to be used and avoids contention between logging and snapshots. The configuration is as follows:

# The directory where the snapshot is stored
dataDir=/usr/local/zk/data
# The directory where the transaction log is stored
dataLogDir=/usr/local/zk/datalog

(2) maxClientCnxns

This option limits the number of concurrent connections that a single client may open to one ZooKeeper server. Clients are distinguished by IP address. The option helps prevent certain classes of DoS attack. Setting it to 0 removes the limit on concurrent connections.

For example, set maxClientCnxns to 1 as follows:

# Set maxClientCnxns
maxClientCnxns=1

After ZooKeeper starts, connect one client to the server. If a second client from the same IP address then tries to connect, or the first client opens another connection implicitly, the limit configured above takes effect and the new connection is refused.
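To make the per-IP counting concrete, here is a toy model of such a limit. It is not how ZooKeeper implements the option; the class and method names are invented purely to illustrate the behavior described above.

```python
from collections import defaultdict


class ConnectionLimiter:
    """Toy model of a per-IP cap on concurrent connections (0 = unlimited)."""

    def __init__(self, max_client_cnxns: int):
        self.max_client_cnxns = max_client_cnxns
        self.open_by_ip = defaultdict(int)

    def try_connect(self, ip: str) -> bool:
        """Return True if the connection is accepted, False if refused."""
        limit = self.max_client_cnxns
        if limit > 0 and self.open_by_ip[ip] >= limit:
            return False  # this IP already holds the maximum number
        self.open_by_ip[ip] += 1
        return True

    def disconnect(self, ip: str) -> None:
        """Release one connection slot held by `ip`, if any."""
        if self.open_by_ip[ip] > 0:
            self.open_by_ip[ip] -= 1


limiter = ConnectionLimiter(max_client_cnxns=1)
print(limiter.try_connect("192.168.1.10"))  # first connection from this IP
print(limiter.try_connect("192.168.1.10"))  # second one from the same IP
print(limiter.try_connect("192.168.1.11"))  # a different IP has its own count
```

The limit is counted per address, so several clients behind one NAT gateway or on one host share the same budget.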

(3) minSessionTimeout and maxSessionTimeout

These are the minimum and maximum session timeouts. By default, minSessionTimeout = 2 * tickTime and maxSessionTimeout = 20 * tickTime.
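A client asks for a session timeout when it connects, and the server keeps the value within these two bounds. The function below is a small sketch of that clamping under the default bounds given above; its name and signature are assumptions made for this example.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms=2000,
                               min_ms=None, max_ms=None):
    """Clamp a requested session timeout to [min, max], in milliseconds."""
    if min_ms is None:
        min_ms = 2 * tick_time_ms    # default minSessionTimeout
    if max_ms is None:
        max_ms = 20 * tick_time_ms   # default maxSessionTimeout
    return max(min_ms, min(requested_ms, max_ms))


for requested in (1000, 30000, 60000):
    print(requested, negotiated_session_timeout(requested))
```

With tickTime=2000, a request for 1 second is raised to 4 seconds and a request for 60 seconds is cut to 40 seconds.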

2.3 Cluster configuration

(1) initLimit

This option is the time allowed for the initial connection, measured in ticks (multiples of tickTime): how long a follower (the "client" from the leader's point of view) may take to connect to the leader and synchronize with it. If initialization takes longer than this, the connection fails.

(2) syncLimit

This option is the maximum time, also in ticks, that may pass between a request and its response when the leader and a follower exchange messages. If a follower cannot communicate with the leader within this time, the follower is dropped.
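Because both limits are counted in ticks, their real duration depends on tickTime. A quick worked calculation, using the values from the configuration files earlier in this article (the helper name is made up):

```python
def limits_in_ms(tick_time_ms: int, init_limit: int, sync_limit: int) -> dict:
    """Convert initLimit and syncLimit from ticks to milliseconds."""
    return {
        "initLimit_ms": tick_time_ms * init_limit,
        "syncLimit_ms": tick_time_ms * sync_limit,
    }


# tickTime=2000, initLimit=10, syncLimit=5
print(limits_in_ms(2000, 10, 5))
```

So with these settings a follower has 20 seconds to connect and synchronize, and 10 seconds to answer the leader afterwards.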

(3) server.A=B:C:D

A: the number of the server;
B: the IP address or host name of the server;
C: the port that followers use to connect to the leader and exchange data with it;
D: the port used for leader election.
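The server.A=B:C:D format is regular enough to parse with a short script, for example to compare the server lists of several configuration files. This is only a sketch: the type and field names simply reuse the letters A to D from the list above and are not part of any ZooKeeper API.

```python
import re
from typing import NamedTuple


class ServerEntry(NamedTuple):
    server_id: int  # A
    host: str       # B
    port_c: int     # C
    port_d: int     # D


_SERVER_LINE = re.compile(r"^server\.(\d+)\s*=\s*([^:\s]+):(\d+):(\d+)\s*$")


def parse_servers(config_text: str) -> list:
    """Extract every server.A=B:C:D line from the text of a zoo.cfg file."""
    entries = []
    for line in config_text.splitlines():
        match = _SERVER_LINE.match(line.strip())
        if match:
            sid, host, port_c, port_d = match.groups()
            entries.append(ServerEntry(int(sid), host, int(port_c), int(port_d)))
    return entries


sample = """tickTime=2000
server.0=hadoop:2888:3888
server.1=hadoop0:2888:3888
server.2=hadoop1:2888:3888
"""
for entry in parse_servers(sample):
    print(entry)
```

Lines that are not server entries, such as tickTime or comments, are skipped.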

(4) myid and zoo.cfg

Besides zoo.cfg, cluster mode requires one more file, myid. It lives in the dataDir directory and contains a single value: the A of this server. On startup ZooKeeper reads the file and compares the value with the server entries in zoo.cfg to determine which server it is.
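A mismatch between myid and the server list is an easy mistake to make when copying directories between machines. The check below is a hedged sketch of the comparison described above, not ZooKeeper's own startup code; the function names are illustrative and the example uses a temporary directory in place of the real dataDir.

```python
import tempfile
from pathlib import Path


def read_myid(data_dir: Path) -> int:
    """Read the server number stored in <dataDir>/myid."""
    return int((data_dir / "myid").read_text().strip())


def check_myid(data_dir: Path, configured_ids: set) -> int:
    """Return the myid value, or fail if zoo.cfg has no matching server.A."""
    server_id = read_myid(data_dir)
    if server_id not in configured_ids:
        raise ValueError(
            f"myid {server_id} has no matching server.{server_id} entry"
        )
    return server_id


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "myid").write_text("1\n")
    # The ids 0, 1, 2 correspond to server.0, server.1, server.2
    print(check_myid(data_dir, {0, 1, 2}))
```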

3. Build a ZooKeeper server cluster

Requirements:

1> The ZK server cluster must have at least three nodes.
2> The system time of all servers must be consistent.

3.1 Install and configure ZK

(1) Use WinSCP to transfer ZK to /usr/local on the hadoop host. I use zookeeper-3.4.5.tar.gz.

(2) In the /usr/local directory of the hadoop host, extract zk....tar.gz and set the environment variable.

A) Extract: in the /usr/local directory, run tar -zxvf zookeeper-3.4.5.tar.gz, as shown in Figure 2.1.

Fig 2.1

B) Rename: rename the extracted folder to zk with the command mv zookeeper-3.4.5 zk, as shown in Figure 2.2.

Fig 2.2

C) Set environment variables: run vi /etc/profile and add export ZOOKEEPER_HOME=/usr/local/zk, as shown in Figure 2.3. Then run source /etc/profile, as shown in Figure 2.4.

Fig 2.3

Fig 2.4

3.2 Modify the ZK configuration file

(1) Rename: in the /usr/local/zk/conf directory, rename zoo_sample.cfg to zoo.cfg with the command mv zoo_sample.cfg zoo.cfg, as shown in Figure 2.5.

Fig 2.5

(2) View: in the /usr/local/zk/conf directory, open the file with vi zoo.cfg, as shown in Figure 2.6. In this file, dataDir is the directory where ZooKeeper stores its data. It defaults to /tmp/zookeeper, a temporary directory whose contents are lost after a restart, so here we use /usr/local/zk/data instead.

Fig 2.6

(3) Create a folder: mkdir /usr/local/zk/data

(4) Create myid: in the data directory, create a file named myid with vi myid and set its content to 0.

(5) Edit: run vi zoo.cfg, change the setting to dataDir=/usr/local/zk/data, and add:

server.0=hadoop:2888:3888
server.1=hadoop0:2888:3888
server.2=hadoop1:2888:3888

tickTime: the interval at which heartbeats are exchanged between ZooKeeper servers, or between a client and a server; one heartbeat is sent every tickTime.

dataDir: as the name implies, the directory where ZooKeeper stores its data. By default, ZooKeeper also writes its transaction log files to this directory.

clientPort: the port clients use to connect to the ZooKeeper server. ZooKeeper listens on this port and accepts client requests.

Once these items are configured, you can start ZooKeeper. After it has started, run echo ruok | nc localhost 2181 to check whether ZooKeeper is serving requests; a healthy server replies imok.
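The same ruok check can be scripted when nc is not available. The sketch below sends the four-letter command over a plain TCP socket and reads the reply until the server closes the connection. The function names are made up for this example, and newer ZooKeeper releases may require such commands to be explicitly enabled in the server configuration, so treat it as an illustration for the 3.4.5 setup used here.

```python
import socket


def four_letter_word(host: str, port: int, command: str = "ruok",
                     timeout: float = 3.0) -> str:
    """Send a four-letter command and return the server's reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def is_serving(host: str = "localhost", port: int = 2181) -> bool:
    """Return True if the server answers ruok with imok."""
    try:
        return four_letter_word(host, port) == "imok"
    except OSError:
        # Connection refused, timed out, or otherwise unreachable
        return False
```

Calling is_serving("localhost", 2181) on the hadoop host corresponds to the nc command above; pass the clientPort you actually configured.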

3.3 Configure other nodes

(1) Copy the zk directory and the /etc/profile file from the hadoop host to hadoop0 and hadoop1. Run the following commands:

scp -r /usr/local/zk/ hadoop0:/usr/local/
scp -r /usr/local/zk/ hadoop1:/usr/local/
scp /etc/profile hadoop0:/etc/
scp /etc/profile hadoop1:/etc/

ssh hadoop0
source /etc/profile
vi /usr/local/zk/data/myid
exit

ssh hadoop1
source /etc/profile
vi /usr/local/zk/data/myid
exit

(2) Set the value of myid on hadoop0 to 1 and the value of myid on hadoop1 to 2, matching the server.1 and server.2 entries in zoo.cfg.

4. Startup and verification

(1) Start: run the zkServer.sh start command on each of the three nodes.

hadoop node: see Figure 3.1.

 

Fig 3.1

hadoop0 node: see Figure 3.2.

Fig 3.2

hadoop1 node: see Figure 3.3.

Fig 3.3

(2) Verify: run the zkServer.sh status command on each of the three nodes. The figures below show that hadoop and hadoop1 are followers and hadoop0 is the leader.

hadoop node: see Figure 3.4.

Fig 3.4

hadoop0 node: see Figure 3.5.

Fig 3.5

hadoop1 node: see Figure 3.6.
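If you collect the text printed by zkServer.sh status on each node, the roles can be checked in a few lines. This sketch assumes the output contains a line of the form "Mode: leader" or "Mode: follower"; the sample text and function names below are illustrative and were not captured from the cluster in this article.

```python
import re


def parse_mode(status_output: str):
    """Return the role named on the 'Mode:' line, or None if absent."""
    match = re.search(r"^Mode:\s*(\w+)", status_output, flags=re.MULTILINE)
    return match.group(1) if match else None


def has_single_leader(modes: dict) -> bool:
    """A healthy ensemble has exactly one leader."""
    return list(modes.values()).count("leader") == 1


# Assumed shape of the status output on each host (illustrative only).
outputs = {
    "hadoop": "Using config: /usr/local/zk/bin/../conf/zoo.cfg\nMode: follower\n",
    "hadoop0": "Using config: /usr/local/zk/bin/../conf/zoo.cfg\nMode: leader\n",
    "hadoop1": "Using config: /usr/local/zk/bin/../conf/zoo.cfg\nMode: follower\n",
}
modes = {host: parse_mode(text) for host, text in outputs.items()}
print(modes, has_single_leader(modes))
```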

Coming soon: operations and examples of ZooKeeper, so stay tuned. The content of this issue is for your reference. If anything is wrong, I hope you can correct it.

Share more and benefit more.
I am everyone and everyone is me.
The rose leaves the remaining fragrance in your hands.

Fig 3.6

 
