Zookeeper usage and principles (1)


Zookeeper Introduction
Zookeeper is software that provides consistency services for distributed applications. It is a sub-project of the open-source Hadoop project and was implemented following the paper <The Chubby lock service for loosely-coupled distributed systems> published by Google. We will first install and use this software, and then explore some of the important consistency algorithms behind it.

Install and use zookeeper
The installation of zookeeper can basically follow the steps on http://hadoop.apache.org/zookeeper/docs/current/zookeeperStarted.html. Here I mainly describe how to deploy a cluster, since the official page does not cover that part ("Running Replicated ZooKeeper") in much detail.

Because machines are in short supply here, the three servers are all deployed on one machine; if your machines are also limited, you can do the same. I created three folders, as shown below:
server1  server2  server3

Then, inside each folder, extract a downloaded zookeeper package and create a few more folders. The overall structure is as follows; the last entry is the directory extracted from the downloaded archive.
data  datalog  logs  zookeeper-3.3.2

First, go into the data directory and create a file named myid containing a single number: write 1 into the myid file under server1, 2 into the myid file under server2, and 3 into the myid file under server3.

Then enter the zookeeper-3.3.2/conf directory. In a freshly extracted package there are three files: configuration.xsl, log4j.properties, and zoo_sample.cfg. The first thing we need to do is create a configuration file named zoo.cfg in this directory; you can simply rename (or copy) zoo_sample.cfg to zoo.cfg. The configuration content is as follows:
tickTime=2000
initLimit=5
syncLimit=2
dataDir=xxxx/zookeeper/server1/data
dataLogDir=xxx/zookeeper/server1/datalog
clientPort=2181
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890

Most of these settings are explained clearly on the official website. Note, however, that because multiple servers are deployed on one machine here, each server needs a different clientPort: for example server1 uses 2181, server2 uses 2182, and server3 uses 2183. dataDir and dataLogDir also have to be different for each server.

The only thing to note about the last few lines is that the number x in server.x corresponds to the number in data/myid. We wrote 1, 2, and 3 into the myid files of the three servers, so zoo.cfg in each server lists server.1, server.2, and server.3. Also, because the servers run on the same machine, the two back-end ports of the three servers must all differ, otherwise port conflicts occur: the first port is used for information exchange between cluster members, and the second port is used to elect a new leader when the leader fails.

In the zookeeper-3.3.2/bin directory, run ./zkServer.sh start to start a server. At this point a large number of errors will be reported. It actually does not matter: the cluster currently has only one server up, and that server initiates leader election requests to the server list in zoo.cfg, so the errors simply mean it cannot reach the other servers. Once the second zookeeper instance is started, a leader can be elected and the consistency service becomes usable. This is because a cluster of 2n+1 machines can tolerate n failures and still elect a leader and serve clients; for example, with our 3 machines (n = 1), the cluster keeps working as long as any 2 of them are up.

Next we can use it. We can first use the interactive client that ships with zookeeper to get a feel for what zookeeper does. Go into zookeeper-3.3.2/bin of any of the three servers and run ./zkCli.sh -server 127.0.0.1:2182; here I connect to the instance listening on port 2182.

First, we can type any command; since zookeeper does not recognize it, it prints the help for the available commands, such as:

ls (list the child nodes of a path),
ls2 (list the child nodes of a path together with stat information such as update counts),
create (create a node),
get (get a node's data, including its version information),
set (modify a node's data),
delete (delete a node)

By practicing the commands above, we can see that zookeeper uses a tree structure similar to a file system: data can be attached to a node, and nodes can be created, modified, and deleted. We also find that when a node is changed, the live machines in the cluster all converge on the same updated data.

Zookeeper Data Model
After using zookeeper for a while, we find that its data model is similar to the file system of an operating system: a tree of nodes.



(1) Each node in zookeeper is called a znode and is identified by a unique path. For example, the server2 node under /app3 is identified by the path /app3/server2.
(2) A znode can have child znodes and can also store data, but nodes of the ephemeral type cannot have children.
(3) The data in a znode is versioned: if a path stores several versions of data, a query against that path must say which version it wants. (A small Java sketch after this list illustrates working with versions.)
(4) A znode can be a temporary (ephemeral) node. Once the client that created it loses contact with the server, the znode is automatically deleted. A zookeeper client communicates with the server over a long-lived connection kept alive by heartbeats; this connection state is called a session. If a znode is ephemeral and its session expires, the znode is deleted.
(5) A znode's name can be automatically numbered: if app1 already exists and another node with the same name is created, it is automatically named app2, and so on.
(6) A znode can be watched, both for changes to the data it stores and for changes to its set of child nodes. Once a change happens, the watching client is notified. This is the most important feature zookeeper offers to applications; features such as centralized configuration management, cluster management, and distributed locks can all be built on it.
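To make point (3) concrete, here is a minimal Java sketch of a version-checked update. The server address 127.0.0.1:2181 and the znode /app1 are reused from the examples in this article purely as placeholders, and the class name and data value are made up for the illustration. It reads a znode together with its Stat and then updates it only if nobody else has modified it in the meantime:

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class VersionedUpdate {
    public static void main(String[] args) throws Exception {
        // 5 s session timeout, no default watcher; values are arbitrary for this sketch.
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 5000, null);

        Stat stat = new Stat();
        // stat.getVersion() holds the data version we read.
        byte[] data = zk.getData("/app1", false, stat);

        try {
            // Conditional update: succeeds only if the version on the server
            // still matches the one we read; otherwise BadVersionException is thrown.
            zk.setData("/app1", "newvalue".getBytes(), stat.getVersion());
        } catch (KeeperException.BadVersionException e) {
            System.out.println("someone else updated /app1 first; re-read and retry");
        }

        zk.close();
    }
}

Passing -1 as the version, as in the API example later in this article, skips this check and overwrites unconditionally.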

Using zookeeper from Java code
Using zookeeper from Java mainly means referencing its jar, creating a ZooKeeper instance, and calling its interface methods. The main operations are creating, deleting, and modifying znodes, and registering watches on znode changes and handling them.

The main API usage and explanation are as follows:

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

// Create a ZooKeeper instance. The first parameter is the target server address and port,
// the second is the session timeout, and the third is the watcher callback.
ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 500000, new Watcher() {
    // Receives all triggered watch events
    public void process(WatchedEvent event) {
        // do something
    }
});

// Create the node /root with data "mydata", without ACL restrictions, as a persistent node
// (i.e. it does not disappear when the client that created it shuts down).
zk.create("/root", "mydata".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

// Create a child znode childone under /root with data "childone", without ACL restrictions, persistent.
zk.create("/root/childone", "childone".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

// Get the names of the child nodes under /root; returns a List<String>.
zk.getChildren("/root", true);

// Get the data stored at /root/childone; returns a byte[].
zk.getData("/root/childone", true, null);

// Modify the data at /root/childone. The third parameter is the expected version;
// -1 ignores the version check and overwrites directly.
zk.setData("/root/childone", "childonemodify".getBytes(), -1);

// Delete /root/childone. The second parameter is the expected version; -1 deletes regardless of version.
zk.delete("/root/childone", -1);

// Close the session.
zk.close();

 

 
Implementation of mainstream zookeeper application scenarios (excluding official examples)

(1) Configuration Management
Centralized configuration management is very common in application clusters. A commercial company generally implements a centralized configuration center so that different application clusters can share their respective configurations, and so that every machine in a cluster can be notified when a configuration changes.

Zookeeper makes this kind of centralized configuration management easy to implement. For example, put all of app1's configuration under the znode /app1 and have every machine in app1 watch that node (zk.exists("/app1", true)) with a Watcher callback implemented. When the data under /app1 changes on zookeeper, each machine is notified, its Watcher callback runs, and the application then fetches the new data (zk.getData("/app1", false, null)).
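As an illustration, here is a minimal sketch of such a watcher, assuming a server reachable at 127.0.0.1:2181 and the /app1 configuration node from the example above; the class name and timeout are arbitrary. Zookeeper watches are one-shot, so the watch is re-registered on every read:

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.EventType;
import org.apache.zookeeper.ZooKeeper;

public class ConfigWatcher implements Watcher {
    private final ZooKeeper zk;

    public ConfigWatcher(String hosts) throws Exception {
        // 5 s session timeout (arbitrary); this instance also receives watch events.
        zk = new ZooKeeper(hosts, 5000, this);
    }

    public byte[] readConfig() throws Exception {
        // Re-register the watch on every read so we keep getting notified.
        return zk.getData("/app1", this, null);
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == EventType.NodeDataChanged) {
            try {
                byte[] newConfig = readConfig(); // fetch the updated data and re-watch
                System.out.println("config changed: " + new String(newConfig));
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ConfigWatcher w = new ConfigWatcher("127.0.0.1:2181");
        w.readConfig();               // initial read, sets the watch
        Thread.sleep(Long.MAX_VALUE); // keep the session alive and wait for changes
    }
}

In a real application the new configuration would of course be applied rather than just printed.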

The example above watches the configuration at a coarse granularity for simplicity; finer-grained data can be watched level by level on child nodes. All of this is up to your own design.
(2) Cluster Management
In an application cluster, we often need every machine to know which machines in the cluster (or in another cluster it depends on) are alive, and to be notified quickly, without human intervention, when a machine goes down or drops off the network.

Zookeeper also makes this easy to implement. For example, I create a znode /app1servers on the zookeeper server. When each machine in the cluster starts, it creates an ephemeral node under it: server1 creates /app1servers/server1 (IP addresses can be used to guarantee uniqueness), server2 creates /app1servers/server2, and both server1 and server2 watch the parent node /app1servers, so that any change to the parent node's data or children notifies the watching clients. Because an ephemeral node has the important property that it disappears when the creating client's connection breaks or its session expires, the corresponding node vanishes as soon as a machine goes down or is cut off; every client in the cluster that watches /app1servers is then notified and can fetch the latest member list.
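Here is a minimal sketch of that registration, assuming the parent node /app1servers already exists and that the connection string and member name are supplied by the caller; the class and method names are made up for this illustration:

import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.EventType;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class ClusterMember implements Watcher {
    private final ZooKeeper zk;

    public ClusterMember(String hosts, String myName) throws Exception {
        zk = new ZooKeeper(hosts, 5000, this);
        // Register this machine as an ephemeral child; it disappears automatically
        // when this client's session ends (crash, shutdown, network loss).
        zk.create("/app1servers/" + myName, myName.getBytes(),
                Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        listMembers(); // initial list, also sets a children watch
    }

    private void listMembers() throws Exception {
        // Passing 'true' re-registers the default watcher (this object) for child changes.
        List<String> members = zk.getChildren("/app1servers", true);
        System.out.println("live servers: " + members);
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == EventType.NodeChildrenChanged) {
            try {
                listMembers(); // a member joined or left; refresh and re-watch
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}

Each surviving member simply refreshes its view of /app1servers whenever the set of children changes.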

Another application scenario is electing a master for the cluster: once the master fails, a new master can immediately be chosen from the slaves. The implementation steps are the same as above, except that when a machine starts, the node it creates under /app1servers is of the EPHEMERAL_SEQUENTIAL type, so that every node is automatically numbered, for example:

zk.create("/testRootPath/testChildPath1", "1".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
zk.create("/testRootPath/testChildPath2", "2".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
zk.create("/testRootPath/testChildPath3", "3".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
// Create another child node
zk.create("/testRootPath/testChildPath4", "4".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
System.out.println(zk.getChildren("/testRootPath", false));

 

Printed result: [testChildPath10000000000, testChildPath20000000001, testChildPath40000000003, testChildPath30000000002]

For comparison, if the child nodes are created as plain ephemeral (non-sequential) nodes:

// Create the root node
zk.create("/testRootPath", "testRootData".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
// Create child nodes
zk.create("/testRootPath/testChildPath1", "1".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
zk.create("/testRootPath/testChildPath2", "2".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
zk.create("/testRootPath/testChildPath3", "3".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
// Create another child node
zk.create("/testRootPath/testChildPath4", "4".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
System.out.println(zk.getChildren("/testRootPath", false));

 

 

Printed result: [testChildPath2, testChildPath1, testChildPath4, testChildPath3]

By convention, the node with the smallest sequence number is taken as the master. So when we watch /app1servers we obtain the server list, and as long as every machine in the cluster applies the rule that the node with the smallest number is the master, the master is effectively elected. When the master goes down, its znode disappears, the new server list is pushed to the clients, and each node again treats the smallest-numbered node as the master, which achieves dynamic master election.
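A minimal sketch of that selection rule is shown below. It assumes every server registered itself under /app1servers with the same name prefix as an EPHEMERAL_SEQUENTIAL node, so that sorting the child names lexicographically also orders them by sequence number; the class and field names are made up for the illustration:

import java.util.Collections;
import java.util.List;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.EventType;
import org.apache.zookeeper.ZooKeeper;

public class MasterElection implements Watcher {
    private final ZooKeeper zk;
    private final String myNode; // name of the sequential node this server created

    public MasterElection(ZooKeeper zk, String myNode) throws Exception {
        this.zk = zk;
        this.myNode = myNode;
        checkMaster();
    }

    private void checkMaster() throws Exception {
        // Fetch the current members and re-register the children watch.
        List<String> children = zk.getChildren("/app1servers", this);
        // With a common prefix, lexicographic order equals sequence-number order.
        Collections.sort(children);
        String smallest = children.get(0);
        if (smallest.equals(myNode)) {
            System.out.println("I am the master: " + myNode);
        } else {
            System.out.println("Current master is: " + smallest);
        }
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == EventType.NodeChildrenChanged) {
            try {
                checkMaster(); // membership changed (e.g. master died); re-evaluate
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}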


Summary 

We have made initial use of zookeeper and sketched implementation ideas for several application scenarios. In the next article, we will try to explore zookeeper's high availability and leader election algorithms.

 

Reposted from: http://www.blogjava.net/BucketLi/archive/2010/12/21/341268.html
