Hadoop Series, Part II: Distributed Installation of ZooKeeper


1 Overview

The ZooKeeper distributed service framework started as a sub-project of Apache Hadoop. It is mainly used to solve data-management problems that come up frequently in distributed applications, such as a unified naming service, state synchronization, cluster management, and management of distributed application configuration items. ZooKeeper can be installed and run in standalone mode, but it is normally deployed as a distributed cluster (one leader and multiple followers) that follows well-defined policies to keep the ZooKeeper service stable and available; this is what makes the distributed applications built on top of it reliable. ZooKeeper maintains a hierarchical data structure that is very similar to a standard file system: a tree of nodes addressed by slash-separated paths.

 

The Zookeeper data structure has the following features:

  1. Each node in the tree, such as NameService, is called a znode and is uniquely identified by its path. For example, the path of the Server1 znode is /NameService/Server1.

  2. A znode can have child znodes, and every znode can also store data. Note that znodes of the EPHEMERAL type cannot have children.

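  3. Znodes are versioned. Each znode carries a version number that is incremented every time its data changes. ZooKeeper keeps only the current data, not old copies, but a client can make an update conditional on the version it last read, so concurrent writers do not silently overwrite each other.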

  4. A znode can be temporary (ephemeral). A ZooKeeper client communicates with the server over a persistent connection that is kept alive with heartbeats; this connection state is called a session. When the session of the client that created an ephemeral znode ends or becomes invalid, the znode is deleted automatically.

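  5. Znode names can be numbered automatically. When a znode is created as sequential, ZooKeeper appends a monotonically increasing 10-digit counter maintained by the parent znode, so creating App repeatedly yields names such as App0000000001 and App0000000002.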

  6. Znodes can be watched. A client can set a watch on a znode to be notified when the data stored in it or its list of children changes; a watch fires once and must be set again to receive further notifications. This is ZooKeeper's core feature, and many of its functions, such as distributed locks and configuration management, are built on it.
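To make the features above concrete, here is a minimal in-memory sketch in Python. It is a toy model for illustration only, not ZooKeeper's data store or its client API; ToyZnodeTree, Znode, BadVersionError and all method names are hypothetical. It shows path-addressed znodes, the rule that ephemeral znodes have no children, version checks on update, sequential names with a 10-digit suffix, removal of ephemeral znodes when a session closes, and one-time watches. As a simplification, the sequence counter here advances only on sequential creations.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


class BadVersionError(Exception):
    """Raised when a conditional update sees an unexpected version (hypothetical name)."""


@dataclass
class Znode:
    data: bytes = b""
    version: int = 0                        # bumped on every successful set_data
    ephemeral_owner: Optional[int] = None   # session id if the znode is ephemeral
    children: Dict[str, "Znode"] = field(default_factory=dict)
    seq_counter: int = 0                    # next suffix for sequential children


class ToyZnodeTree:
    """Toy in-memory model of the znode features above; not the ZooKeeper API."""

    def __init__(self) -> None:
        self.root = Znode()
        self.watches: Dict[str, List[Callable[[str], None]]] = {}

    def _lookup(self, path: str) -> Znode:
        node = self.root
        for part in filter(None, path.split("/")):
            node = node.children[part]      # KeyError if the path does not exist
        return node

    def exists(self, path: str) -> bool:
        try:
            self._lookup(path)
            return True
        except KeyError:
            return False

    def watch(self, path: str, callback: Callable[[str], None]) -> None:
        self.watches.setdefault(path, []).append(callback)

    def _fire(self, path: str) -> None:
        # Watches are one-time triggers: they are removed before being called.
        for callback in self.watches.pop(path or "/", []):
            callback(path or "/")

    def create(self, path: str, data: bytes = b"", session: Optional[int] = None,
               sequential: bool = False) -> str:
        parent_path, _, name = path.rpartition("/")
        parent = self._lookup(parent_path)
        if parent.ephemeral_owner is not None:
            raise ValueError("ephemeral znodes cannot have children")
        if sequential:
            name = f"{name}{parent.seq_counter:010d}"   # e.g. App0000000000
            parent.seq_counter += 1
        if name in parent.children:
            raise KeyError(f"{parent_path}/{name} already exists")
        parent.children[name] = Znode(data=data, ephemeral_owner=session)
        self._fire(parent_path)             # the parent's child list changed
        return f"{parent_path}/{name}"

    def set_data(self, path: str, data: bytes, expected_version: int = -1) -> int:
        node = self._lookup(path)
        if expected_version != -1 and expected_version != node.version:
            raise BadVersionError(path)     # -1 means "accept any version"
        node.data = data
        node.version += 1
        self._fire(path)
        return node.version

    def close_session(self, session: int) -> None:
        """Delete every ephemeral znode owned by the session, as on session expiry."""
        def sweep(node: Znode, path: str) -> None:
            for name, child in list(node.children.items()):
                if child.ephemeral_owner == session:
                    del node.children[name]
                    self._fire(path)
                else:
                    sweep(child, f"{path}/{name}")
        sweep(self.root, "")
```

In this model a watch on /NameService fires once when a child is added, and a second child creation produces no further event unless the watch is registered again, mirroring the one-time behaviour described in item 6.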

 

2 Environment deployment

The deployment of the Zookeeper cluster is based on the Hadoop cluster deployed in the previous article. The cluster configuration is as follows:

Server       Hostname  IP address
Zookeeper1   rango     192.168.56.1
Zookeeper2   vm2       192.168.56.102
Zookeeper3   vm3       192.168.56.103
Zookeeper4   vm4       192.168.56.104
Zookeeper5   vm1       192.168.56.101
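A ZooKeeper ensemble keeps serving requests only while a strict majority (a quorum) of its servers is running. The short Python sketch below, with hypothetical helper names, computes the quorum size and how many server failures an ensemble can tolerate; for the five servers listed above, that is a quorum of three and up to two failed servers.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers may fail while a quorum is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(f"{n} servers: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

The four-server line shows why odd ensemble sizes are usual: a fourth server raises the quorum to three without tolerating any more failures than three servers do.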

 

3 Installation and configuration

3.1 Download and install ZooKeeper

Download the latest ZooKeeper release from the Apache website (this article uses 3.4.5), extract it, and move it to /usr/zookeeper:

tar zxvf zookeeper-3.4.5.tar.gz; mv zookeeper-3.4.5 /usr/zookeeper

Make hadoop the owner of the zookeeper directory:

chown -R hadoop:hadoop /usr/zookeeper

PS: You can install and configure ZooKeeper on the master machine first and then copy it to the other nodes in the cluster with scp:

scp -r /usr/zookeeper <node-ip>:/usr

 

3.2 Configure ZooKeeper

3.2.1 Create a data directory

Run the following on every node:

mkdir /var/lib/zookeeper

 

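3.2.2 Configure environment variables

Add the following to /etc/profile (vim /etc/profile):

# set zookeeper path
export ZOOKEEPER_HOME=/usr/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin

Then run source /etc/profile (or log in again) so the current shell picks up the change.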

 

3.2.3 Configure the ZooKeeper cluster

cp /usr/zookeeper/conf/zoo_sample.cfg /usr/zookeeper/conf/zoo.cfg

vim /usr/zookeeper/conf/zoo.cfg:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/var/lib/zookeeper
# the port at which the clients will connect
clientPort=2181
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

server.1=192.168.56.1:2888:3888
server.2=192.168.56.102:2888:3888
server.3=192.168.56.103:2888:3888
server.4=192.168.56.104:2888:3888
server.5=192.168.56.101:2888:3888

Note:

tickTime: the basic time unit, in milliseconds; it is also the interval at which heartbeats are sent.

initLimit and syncLimit: both are measured in ticks, that is, as multiples of tickTime (with the settings above, initLimit is 10 * 2000 ms = 20 s). initLimit sets how long followers have to connect to and synchronize with the leader. If more than half of the followers fail to finish synchronizing within that time, the leader gives up its leadership and a new leader election is held. If this happens frequently, the logs will show it, and it usually means the value is set too small.

syncLimit sets how long a follower may take to synchronize with the leader (5 * 2000 ms = 10 s here). If a follower cannot complete synchronization within that time, it restarts itself, and all clients connected to it reconnect to another follower.

dataDir: the directory where ZooKeeper stores its persistent data (snapshots of its in-memory data tree). ZooKeeper holds two kinds of znodes: ephemeral ones, which disappear when the session that created them ends, and persistent ones, which remain until they are deleted. Unless dataLogDir points elsewhere, ZooKeeper's transaction log is also saved here.

server.A=B:C:D: A is a number identifying the server; B is the server's IP address; C is the port the server uses to exchange information with the cluster's leader; D is the port used for leader election, that is, the port the servers use to talk to each other to elect a new leader if the current one fails. In a pseudo-cluster (several instances on one machine), B is the same for every instance, so each instance must be assigned its own port numbers.
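The settings explained above can also be checked mechanically. The following Python sketch (parse_zoo_cfg, limit_seconds, check_port_conflicts and ServerEntry are hypothetical helpers, not part of ZooKeeper) splits a zoo.cfg body into plain settings and server.A=B:C:D entries, converts initLimit and syncLimit from ticks to seconds, and flags the host/port clash that a pseudo-cluster must avoid. It assumes the simple key=value and server.A=B:C:D forms used in this article and does not handle other syntax.

```python
import re
from typing import Dict, List, NamedTuple, Tuple


class ServerEntry(NamedTuple):
    server_id: int      # A: must equal the number in that host's myid file
    host: str           # B: IP address (or hostname) of the server
    quorum_port: int    # C: port followers use to talk to the leader
    election_port: int  # D: port used during leader election


SERVER_VALUE = re.compile(r"^([^:\s]+):(\d+):(\d+)$")


def parse_zoo_cfg(text: str) -> Tuple[Dict[str, str], List[ServerEntry]]:
    """Split a zoo.cfg body into plain settings and server.A=B:C:D entries."""
    settings: Dict[str, str] = {}
    servers: List[ServerEntry] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                        # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {raw!r}")
        key, value = key.strip(), value.strip()
        if key.startswith("server."):
            m = SERVER_VALUE.match(value)
            if m is None:
                raise ValueError(f"bad server entry: {raw!r}")
            servers.append(ServerEntry(int(key[len("server."):]), m.group(1),
                                       int(m.group(2)), int(m.group(3))))
        else:
            settings[key] = value
    return settings, servers


def limit_seconds(settings: Dict[str, str], name: str) -> float:
    """initLimit and syncLimit are counted in ticks; convert them to seconds."""
    return int(settings[name]) * int(settings["tickTime"]) / 1000


def check_port_conflicts(servers: List[ServerEntry]) -> None:
    """In a pseudo-cluster every entry shares B, so C and D must all differ."""
    used = set()
    for s in servers:
        for port in (s.quorum_port, s.election_port):
            if (s.host, port) in used:
                raise ValueError(f"{s.host}:{port} is assigned twice")
            used.add((s.host, port))
```

With the zoo.cfg above, limit_seconds gives 20.0 for initLimit and 10.0 for syncLimit, and check_port_conflicts passes because each server has its own IP address.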

 

Create a file named myid in the data directory of each server. It contains only the number A from that server's server.A line in zoo.cfg. For example, on rango (server.1):

echo 1 > /var/lib/zookeeper/myid
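Because every node needs a different myid, it can help to derive the value from the hostname instead of typing it by hand on each machine. The Python sketch below assumes the hostname-to-id mapping from the table and the server.N lines above; SERVER_IDS and write_myid are hypothetical names.

```python
from pathlib import Path

# Hostname -> server id, matching the server.N lines in zoo.cfg above.
SERVER_IDS = {"rango": 1, "vm2": 2, "vm3": 3, "vm4": 4, "vm1": 5}


def write_myid(data_dir: str, hostname: str) -> Path:
    """Write this host's server id into <data_dir>/myid, like `echo 1 > .../myid`."""
    server_id = SERVER_IDS[hostname]    # KeyError for a host outside the ensemble
    path = Path(data_dir) / "myid"
    path.write_text(f"{server_id}\n")
    return path
```

On each node, a call such as write_myid("/var/lib/zookeeper", socket.gethostname()) would then write the right file, assuming gethostname() returns the short names used in the table rather than fully qualified ones.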

 

3.3 Start and stop the ZooKeeper service

Start ZooKeeper on every node in the cluster with zkServer.sh start (zkServer.sh stop stops it again):

[root@rango ~]# zkServer.sh start
JMX enabled by default
Using config: /usr/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper... STARTED

Check the status with zkServer.sh status. In a healthy cluster one node reports Mode: leader and the others report Mode: follower:

[root@rango ~]# zkServer.sh status
JMX enabled by default
Using config: /usr/zookeeper/bin/../conf/zoo.cfg
Mode: follower
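The same Mode information can also be read over the network: ZooKeeper answers short "four-letter word" commands such as srvr on the client port (2181 here), which lets one machine check all five servers. The Python sketch below, with hypothetical function names, sends srvr and extracts the Mode line. Newer ZooKeeper releases only answer commands listed in 4lw.commands.whitelist, so this assumes srvr is allowed on your servers.

```python
import socket


def four_letter_word(host: str, cmd: str = "srvr", port: int = 2181,
                     timeout: float = 5.0) -> str:
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:               # the server closes the connection after replying
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply: str) -> str:
    """Return the value of the 'Mode:' line, e.g. 'leader' or 'follower'."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no Mode line in reply")
```

For example, looping over the five IP addresses from the cluster table and printing parse_mode(four_letter_word(ip)) for each should show one leader and four followers.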

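PS: Disable iptables (acceptable on an internal network) before starting ZooKeeper; otherwise the servers may be unable to reach each other on ports 2181, 2888 and 3888.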

