Literally, ZooKeeper means "zoo keeper", which is a fitting name. Many projects in the Hadoop ecosystem use animals as their logos, such as the elephant of Hadoop, so we can guess that ZooKeeper does some kind of management work for these animals.
I. ZooKeeper Basic Introduction
1.1 The zoo also needs to be safe
ZooKeeper began as a sub-project of Hadoop. It coordinates the distributed frameworks associated with Hadoop, such as Hadoop, Hive, and Pig. These are all "animals", hence the name ZooKeeper, the "zoo keeper". A zoo holds a lot of animals, and visitors can find each kind by following the guide map the zoo provides, rather than stumbling upon them unprepared as they would in a primeval jungle. To keep the animals where they belong, instead of wandering into each other's areas or fighting each other, the zoo keeper has to classify and manage them according to their habits, so that we can watch them safely and with peace of mind.
1.2 In-process coordination methods
In practice, ZooKeeper is mainly aimed at highly reliable coordination for large-scale distributed systems. From this definition we know that ZooKeeper is a coordination system and that what it coordinates is a distributed system. Speaking of coordination, think of the traffic wardens at many real-life intersections: holding a small red flag, they direct whether vehicles and pedestrians may pass. If we compare the vehicles and pedestrians to units of execution (threads) running on a computer, what is the warden? Many people will say: isn't that a lock? Yes. In a concurrent environment, to prevent several units of execution from modifying shared data at the same time and corrupting it, we rely on coordination mechanisms such as locks, so that some threads operate on the resource first while the others wait. For in-process locks, the language platforms we use already give us plenty of options. In C#, for example, the most common approach is to build a synchronized block with the lock keyword, which is syntactic sugar:
    int Withdraw(int amount)
    {
        if (balance < 0)
        {
            throw new Exception("Negative Balance");
        }
        // Only one thread at a time may run the block guarded by thisLock.
        lock (thisLock)
        {
            if (balance >= amount)
            {
                Console.WriteLine("Balance before Withdrawal: " + balance);
                Console.WriteLine("Amount to withdraw: -" + amount);
                balance = balance - amount;
                Console.WriteLine("Balance after Withdrawal: " + balance);
                return amount;
            }
            else
            {
                return 0;
            }
        }
    }
1.3 Coordination in a distributed environment
For coordination within a process we can use the mechanisms that the language, the platform, and the operating system provide. But what if we are in a distributed environment? That is, our programs run on different machines, which may sit in the same rack, in the same room, or in different data centers. How do we achieve coordination in such an environment? This is exactly the job of a distributed coordination service.
For this purpose Google built Chubby, and ZooKeeper is an open-source counterpart to Chubby.
Definition: ZooKeeper is a highly available, high-performance, consistent open-source coordination service designed for distributed applications. It provides one basic service: a distributed lock service. Because ZooKeeper is open source, developers later built other uses on top of the distributed lock: configuration maintenance, group services, distributed message queues, distributed notification/coordination, and so on.
1.4 Zookeeper Application Scenarios
(1) Unified Naming Service
Suppose a group of servers provides a service to clients (for example, a website cluster of n servers built with LVS that offers a web service to users), and each time a client requests the service we want it to be able to find one server in that cluster that can serve the request. For this scenario our program must keep a list of these servers, and on every request the client reads the server list from it. This list clearly cannot be stored on a single node: if that node went down, the whole cluster would fail. We want the list to be highly available. The highly available solution is to store the list in a distributed fashion, managed jointly by the servers that store it. If one of the storing servers breaks, another can immediately take its place; likewise, a failed server in the list can be removed from it so that it leaves the cluster. None of this is done by the failed server itself but by the healthy servers in the cluster. This is an active distributed data structure: it can proactively change the state of its data items when external conditions change. The ZooKeeper framework provides this service. It is called the unified naming service, and it resembles the JNDI service in Java EE.
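The idea of a server list that cleans itself up can be sketched as follows. This is a minimal in-memory model, not the ZooKeeper client API: the class ServiceRegistry, the session ids, and the port 8080 are assumptions made for illustration. It imitates ZooKeeper's ephemeral nodes, which disappear when the session that created them ends.

```python
class ServiceRegistry:
    """In-memory stand-in for a naming service (hypothetical, not the ZooKeeper API)."""

    def __init__(self):
        # service name -> {server address: id of the session that registered it}
        self._entries = {}

    def register(self, service, address, session_id):
        """Record that `address` serves `service` for as long as the session lives."""
        self._entries.setdefault(service, {})[address] = session_id

    def session_expired(self, session_id):
        """Drop every entry owned by a dead session, like ephemeral nodes vanishing."""
        for servers in self._entries.values():
            for address in [a for a, s in servers.items() if s == session_id]:
                del servers[address]

    def lookup(self, service):
        """Return the addresses currently registered for `service`, sorted."""
        return sorted(self._entries.get(service, {}))


registry = ServiceRegistry()
registry.register("web", "192.168.80.100:8080", session_id=1)
registry.register("web", "192.168.80.101:8080", session_id=2)
registry.session_expired(1)      # the first server crashed; its entry disappears
print(registry.lookup("web"))    # ['192.168.80.101:8080']
```

Clients only ever call lookup, so a failed server drops out of the list without having to do anything itself.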
(2) Distributed lock Service
A distributed system operating on data typically reads it, analyzes it, and finally modifies it. These operations may be spread across different nodes in the cluster, which raises the problem of consistency: if the operations are not consistent, we get a wrong result. In a single-process program consistency is easy to achieve, but in a distributed system it is much harder, because the operations run in independent processes on different servers, and their intermediate results and coordination have to travel over the network. ZooKeeper provides a lock service that addresses this problem and lets us keep data operations consistent when they are distributed.
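One common way to build such a lock on ZooKeeper is to have every client create an ephemeral sequential node and let the lowest sequence number hold the lock. The sketch below models only that ordering rule in memory; LockQueue and the client names are hypothetical, and a real client would talk to a ZooKeeper ensemble instead.

```python
class LockQueue:
    """Toy model of a sequence-number lock; the lowest ticket holds the lock."""

    def __init__(self):
        self._next_seq = 0
        self._waiting = {}   # sequence number -> client name

    def enqueue(self, client):
        """Like creating an ephemeral sequential node; returns the client's ticket."""
        seq = self._next_seq
        self._next_seq += 1
        self._waiting[seq] = client
        return seq

    def holder(self):
        """Return the client with the smallest ticket, or None if the lock is free."""
        if not self._waiting:
            return None
        return self._waiting[min(self._waiting)]

    def release(self, seq):
        """Like deleting the node (or its session expiring): the next ticket takes over."""
        self._waiting.pop(seq, None)


lock = LockQueue()
ticket_a = lock.enqueue("node-A")
ticket_b = lock.enqueue("node-B")
print(lock.holder())     # node-A
lock.release(ticket_a)
print(lock.holder())     # node-B
```

Because tickets are handed out in order, waiting clients acquire the lock in the order they asked for it.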
(3) Configuration management
In a distributed system we deploy one service application to n servers whose configuration files are identical (for example, in the distributed website framework I designed there are 4 servers, all identical, with the same configuration files). If a configuration option changes, we have to edit every one of these files. With only a few servers this is not too cumbersome, but with many, such as the Hadoop clusters of thousands of servers run by some large internet companies, changing configuration options becomes tedious and risky. This is where ZooKeeper comes in handy: we can use it as a highly available configuration store. We copy the cluster configuration into a node of the ZooKeeper file system, and every server watches that node. As soon as the configuration changes, each server receives a notification from ZooKeeper and re-synchronizes its configuration from ZooKeeper. ZooKeeper also guarantees that the update is atomic, so every server's configuration is updated correctly.
As can be seen, ZooKeeper here is a typical application of the Observer pattern.
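The watch-and-notify flow above can be sketched in a few lines. This is a minimal in-memory model, not the ZooKeeper client API: ConfigNode, AppServer, and the timeout=... strings are hypothetical names chosen for illustration. It also models one detail of ZooKeeper 3.4.x: a watch fires only once, so each server registers a new watch when it re-reads the data.

```python
class ConfigNode:
    """Stand-in for one configuration node with one-shot watches."""

    def __init__(self, data):
        self.data = data
        self._watchers = []

    def get(self, watcher=None):
        """Return the data and optionally register a one-shot watch callback."""
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data

    def set(self, data):
        """Store new data, then fire and discard every registered watch."""
        self.data = data
        pending, self._watchers = self._watchers, []
        for notify in pending:
            notify()


class AppServer:
    """A server that keeps a local copy of the shared configuration."""

    def __init__(self, name, node):
        self.name = name
        self.node = node
        self.config = None
        self.reload()

    def reload(self):
        # Re-read the data and re-arm the watch in the same call.
        self.config = self.node.get(watcher=self.reload)


node = ConfigNode("timeout=30")
servers = [AppServer("server-%d" % i, node) for i in range(4)]
node.set("timeout=60")
print([s.config for s in servers])
```

The administrator changes the configuration in one place, and all four servers pick up the new value without being edited individually.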
(4) Cluster management
Cluster management is hard, and adding the ZooKeeper service to a distributed system makes it much easier. The most troublesome part of cluster management is node failure handling. ZooKeeper lets the cluster choose a healthy node as the master, and the master knows the health of every server in the cluster. Once a node fails, the master notifies the other servers so that the compute tasks are redistributed among the remaining nodes. ZooKeeper not only detects the failure but also helps screen the failed server to see what kind of failure it is. If the failure is repairable, it can be repaired automatically, or the system administrator can be told the cause of the error so that the problem can be located and the node repaired quickly. You may ask: what if the master itself fails? ZooKeeper has considered this too. It has a leader election algorithm, so the master is chosen dynamically: when the master fails, ZooKeeper can immediately select a new master to manage the cluster.
PS: for more on master election, see this article by Suddenly: http://www.cnblogs.com/sunddenly/p/4033574.html. The part of that article on distributed lock application scenarios gives a detailed introduction to master election.
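The failover behaviour described in this section can be sketched as follows. It assumes one common recipe, in which every server holds a sequence number and the live server with the smallest number is the master. The functions elect_master and redistribute and the task names are hypothetical, and this is not ZooKeeper's internal election protocol.

```python
def elect_master(live):
    """Return the live server with the smallest sequence number, or None.

    `live` maps a sequence number to a server name.
    """
    if not live:
        return None
    return live[min(live)]


def redistribute(tasks, live):
    """Spread tasks round-robin over the surviving servers, in sequence order."""
    servers = [live[seq] for seq in sorted(live)]
    if not servers:
        return {}
    assignment = {name: [] for name in servers}
    for i, task in enumerate(tasks):
        assignment[servers[i % len(servers)]].append(task)
    return assignment


live = {0: "hadoop-master", 1: "hadoop-slave1", 2: "hadoop-slave2"}
tasks = ["task-1", "task-2", "task-3", "task-4"]
print(elect_master(live))          # hadoop-master
del live[0]                        # the master fails and drops out
print(elect_master(live))          # hadoop-slave1
print(redistribute(tasks, live))
```

After the old master drops out, the next server in sequence order takes over and the four tasks are split between the two survivors.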
II. Setting up a ZooKeeper cluster-mode environment
2.1 Typical architecture of ZooKeeper cluster mode
(1) The typical architecture diagram is as follows:
(2) The architecture used for this test is as follows:
2.2 ZooKeeper cluster mode setup steps
Note: a ZooKeeper server cluster should have at least 3 nodes, and the system clocks of the servers should be kept consistent.
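The reason for the 3-node minimum is simple arithmetic: a ZooKeeper cluster stays available only while a majority of its servers are up. The helper names below are hypothetical; the sketch just works out the majority rule for a few cluster sizes.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers may fail while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 2, 3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

With 1 or 2 servers no failure can be tolerated, 3 servers tolerate one failure, and a 4th server adds nothing over 3, which is why odd cluster sizes are the usual choice.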
(1) Upload the ZooKeeper installation package with an FTP tool. I use version 3.4.5:
Download link: http://pan.baidu.com/s/1qWyoFhU
(2) Unpack the ZooKeeper installation package and rename the extracted folder to zookeeper:
① tar -zvxf zookeeper-3.4.5.tar.gz
② mv zookeeper-3.4.5 zookeeper
(3) Modify the environment variables: vim /etc/profile
① Add a line: export ZOOKEEPER_HOME=/usr/local/zookeeper
② Modify PATH: export PATH=.:$HADOOP_HOME/bin:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin:$PATH
③ Make the configuration take effect: source /etc/profile
(4) Enter the ZooKeeper conf directory and rename the sample file: mv zoo_sample.cfg zoo.cfg
(5) Edit zoo.cfg: vim zoo.cfg
① Change dataDir=/usr/local/zookeeper/data
② Add server.0=hadoop-master:2888:3888
server.1=hadoop-slave1:2888:3888
server.2=hadoop-slave2:2888:3888
(6) Create the data folder and the myid file:
① Create the data folder: mkdir /usr/local/zookeeper/data
② Create the myid file inside the data folder: vim myid, and set its content to 0 on the first server.
(7) Copy the zookeeper directory to the other two servers (-r is needed to copy a directory):
① scp -r /usr/local/zookeeper hadoop-slave1:/usr/local/
② scp -r /usr/local/zookeeper hadoop-slave2:/usr/local/
(8) Copy the environment variable configuration file to the other two servers:
① scp /etc/profile hadoop-slave1:/etc
② scp /etc/profile hadoop-slave2:/etc
(9) Modify the myid file on the other two servers: set it to 1 and 2 respectively.
(10) Start ZooKeeper by running this command on each of the three nodes: zkServer.sh start
(11) Verify the role of each ZooKeeper cluster node by running this command on each of the three nodes: zkServer.sh status
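The relationship between the server.N lines in zoo.cfg and the myid file on each machine is easy to get wrong, so here is a small sketch that derives both from one host list. The helper functions render_zoo_cfg and myid_for are hypothetical. The sketch produces only the lines edited in step (5) and leaves the other settings from zoo_sample.cfg untouched.

```python
def render_zoo_cfg(hosts, data_dir="/usr/local/zookeeper/data",
                   peer_port=2888, election_port=3888):
    """Return the dataDir line plus one server.N line per host, numbered from 0."""
    lines = ["dataDir=%s" % data_dir]
    for server_id, host in enumerate(hosts):
        lines.append("server.%d=%s:%d:%d" % (server_id, host, peer_port, election_port))
    return "\n".join(lines)


def myid_for(hosts, host):
    """Return the text that belongs in the myid file on `host`."""
    return str(hosts.index(host))


hosts = ["hadoop-master", "hadoop-slave1", "hadoop-slave2"]
print(render_zoo_cfg(hosts))
for host in hosts:
    print(host, "-> myid", myid_for(hosts, host))
```

The number N in server.N must match the content of the myid file on that host, which is why steps (6) and (9) set 0, 1, and 2 on the three machines.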
Roles: ZooKeeper has the following roles:
① Leader: responsible for initiating votes and deciding their outcome, and for updating the system state.
② Learner: includes followers and observers. A follower accepts client requests and returns results to the client, and it takes part in voting during leader election. An observer accepts client connections and forwards write requests to the leader, but it does not take part in voting and only synchronizes the leader's state. The purpose of observers is to scale the system and improve read speed.
III. A simple ZooKeeper test
With the cluster environment set up, we can run a simple read/write consistency test. We use zkCli.sh in the bin directory of ZooKeeper to perform the following operations:
(1) Perform a write on one node, 192.168.80.100: create /mytest test
(2) Perform a read on the other two nodes: get /mytest
TIP: from any single node you can connect to a remote node with zkCli.sh -server hadoop-slave1:2181
(3) Perform a modify operation on one node, 192.168.80.101:
(4) Perform a read on the other two nodes:
Resources
(1) Zhang Shanyu, "ZooKeeper Distributed Lock Service": http://www.cnblogs.com/shanyou/archive/2012/09/22/2697818.html
(2) Summer forest, "Distributed website architecture follow-up: ZooKeeper technology": http://www.cnblogs.com/sharpxiajun/archive/2013/06/02/3113923.html
(3) Horizontal knife Day smile, "ZooKeeper: What is ZooKeeper?": http://www.cnblogs.com/yuyijq/p/3391945.html
(4) Suddenly, "Hadoop log Day20: ZooKeeper": http://www.cnblogs.com/sunddenly/p/4033574.html
Author: Zhou Xurong
Source: http://www.cnblogs.com/edisonchou/
The copyright of this article is owned jointly by the author and the blog park (cnblogs). Reprints are welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original must be given in a prominent position on the article page.
Hadoop Learning Notes 14: ZooKeeper Environment Setup