Statement
- This article is based on CentOS 6.x + CDH 5.x
What ZooKeeper is for
If you have followed the previous tutorials, you will have seen ZooKeeper come up several times: Hadoop's automatic failover relies on ZooKeeper, and HBase RegionServers use it as well. It is not only Hadoop, either; the increasingly popular Storm also uses ZooKeeper. So what exactly is ZooKeeper for?
- ZooKeeper is a distributed, open-source coordination service for distributed applications.
- ZooKeeper's goal is to encapsulate complex, error-prone coordination services and offer users an easy-to-use interface on top of an efficient, stable system.
- ZooKeeper provides a small set of simple primitives, with both Java and C interfaces.
In ZooKeeper, a znode is a node addressed by a path, much like a path in a UNIX file system, and data can be stored in or read from it. If the ephemeral flag is set when a znode is created, the znode is removed from ZooKeeper when the client that created it loses its connection to ZooKeeper (more precisely, when its session ends). ZooKeeper uses watchers to deliver event notifications. When a client receives an event, such as a connection timeout, a change to a node's data, or a change to a node's children, it can invoke the corresponding handler. The ZooKeeper wiki shows how to use ZooKeeper to implement event notification, queues, priority queues, locks, shared locks, revocable shared locks, and two-phase commit.
Simply put: by registering znodes in ZooKeeper, we can automatically monitor whether the machines that created them are still alive, without implementing that monitoring ourselves. In other words, ZooKeeper answers the user's question "is this node still alive?"
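The liveness idea can be sketched with a toy in-memory model. This is not the ZooKeeper API: the class ToyZooKeeper, its methods, the session numbers, and the /workers paths are all hypothetical names made up for illustration. It only shows the rule described above, that an ephemeral znode disappears when the session that created it ends.

```python
class ToyZooKeeper:
    """In-memory stand-in for a ZooKeeper server (illustration only)."""

    def __init__(self):
        # path -> (data, owning session id, or None for a persistent znode)
        self.znodes = {}

    def create(self, path, data, session=None, ephemeral=False):
        if path in self.znodes:
            raise KeyError(f"{path} already exists")
        owner = session if ephemeral else None
        self.znodes[path] = (data, owner)

    def exists(self, path):
        return path in self.znodes

    def close_session(self, session):
        """Remove every ephemeral znode owned by the session that ended."""
        owned = [p for p, (_, owner) in self.znodes.items() if owner == session]
        for path in owned:
            del self.znodes[path]


zk = ToyZooKeeper()
zk.create("/workers", "registry")  # persistent znode
zk.create("/workers/host1", "up", session=101, ephemeral=True)
zk.create("/workers/host2", "up", session=102, ephemeral=True)

zk.close_session(101)  # host1 crashes or loses its connection

alive = sorted(p for p in zk.znodes if p.startswith("/workers/"))
print(alive)  # ['/workers/host2']
```

Listing the children of /workers is then enough to know which hosts are still connected; nobody has to poll the hosts directly.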
Quick Start Installation
Server: install on each machine that needs to be monitored
    yum install zookeeper-server
Client
    yum install zookeeper-client
Configuration
Edit /etc/zookeeper/conf/zoo.cfg. This file is identical on all machines.
    maxClientCnxns=50
    # The number of milliseconds of each tick
    tickTime=2000
    # The number of ticks that the initial
    # synchronization phase can take
    initLimit=10
    # The number of ticks that can pass between
    # sending a request and getting an acknowledgement
    syncLimit=5
    # the directory where the snapshot is stored.
    dataDir=/var/lib/zookeeper
    # the port at which the clients will connect
    clientPort=2181
    server.1=host1:2888:3888
    server.2=host2:2888:3888
- tickTime is given in milliseconds and is the basic time unit ZooKeeper uses for session timeouts and heartbeats. Other time settings are expressed as multiples of tickTime rather than as milliseconds directly, which is a little simpler for the user: you only write a multiplier.
- dataDir is the directory that stores snapshots of the in-memory data and, unless configured otherwise, the transaction log of updates.
- clientPort is the port that clients connect to.
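To make the "multiples of tickTime" rule concrete, here is a small sketch that reads the relevant settings and converts them to milliseconds. The parse_cfg helper is a hypothetical name written for this example, not part of ZooKeeper, and the embedded text is just the timing lines from the configuration above.

```python
ZOO_CFG = """\
# The number of milliseconds of each tick
tickTime=2000
initLimit=10
syncLimit=5
"""


def parse_cfg(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


cfg = parse_cfg(ZOO_CFG)
tick_ms = int(cfg["tickTime"])
init_limit_ms = int(cfg["initLimit"]) * tick_ms  # initial synchronization phase
sync_limit_ms = int(cfg["syncLimit"]) * tick_ms  # request/acknowledgement gap

print(init_limit_ms, sync_limit_ms)  # 20000 10000
```

With the values above, initLimit=10 allows 20 seconds for the initial synchronization phase and syncLimit=5 allows 10 seconds between a request and its acknowledgement.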
Usage
Start ZooKeeper on all machines:
    service zookeeper-server start
Run zookeeper-client to enter the client shell.
Use the help command to see which commands are available:
    [zk: localhost:2181(CONNECTED) 0] help
    ZooKeeper -server host:port cmd args
        connect host:port
        get path [watch]
        ls path [watch]
        set path data [version]
        rmr path
        delquota [-n|-b] path
        quit
        printwatches on|off
        create [-s] [-e] path data acl
        stat path [watch]
        close
        ls2 path [watch]
        history
        listquota path
        setAcl path acl
        getAcl path
        sync path
        redo cmdno
        addauth scheme auth
        delete path [version]
        setquota -n|-b val path
Let's try the simplest ls command.
    [zk: localhost:2181(CONNECTED) 1] ls /
    [hadoop-ha, hbase, zookeeper]
    [zk: localhost:2181(CONNECTED) 2] ls /hadoop-ha
    [mycluster]
You can see that there are three nodes under the root, and one node under hadoop-ha. These nodes exist because Hadoop and HBase were already installed in the previous tutorials. At this point we can see that ZooKeeper maintains a namespace organized like a folder structure. The things stored in this namespace are znodes, and a znode can have child nodes of its own.
Create a Node
Next, let's try to create a new node
    [zk: localhost:2181(CONNECTED) 3] create /zk_test my_data
    Created /zk_test
    [zk: localhost:2181(CONNECTED) 4] ls /
    [hadoop-ha, hbase, zookeeper, zk_test]
Here zk_test is the name of the node we are creating, and my_data is the node's data. We can use the get command to see the node's data.
Node data
    [zk: localhost:2181(CONNECTED) 5] get /zk_test
    my_data
    cZxid = 0x2200000019
    ctime = Sun Jan 18 02:30:56 PST 2015
    mZxid = 0x2200000019
    mtime = Sun Jan 18 02:30:56 PST 2015
    pZxid = 0x2200000019
    cversion = 0
    dataVersion = 0
    aclVersion = 0
    ephemeralOwner = 0x0
    dataLength = 7
    numChildren = 0
Before explaining these fields, let's introduce a concept: ZooKeeper tracks time in several ways.
- Zxid: every change to the ZooKeeper state receives a stamp in the form of a zxid, the ZooKeeper transaction ID. Zxids expose the total order of all changes in ZooKeeper. Each change has a unique zxid, and if zxid1 is smaller than zxid2, then zxid1 happened before zxid2.
- Version numbers: every change to a node increases one of that node's version numbers by one. There are three of them: version (the number of changes to the znode's data), cversion (the number of changes to the znode's children), and aversion (the number of changes to the znode's ACL).
- Ticks: in multi-server ZooKeeper, the servers use ticks to define the timing of events such as status uploads, session timeouts, and connection timeouts between peers. The tick time is only exposed indirectly, through the minimum session timeout (twice the tick time): if a client requests a session timeout shorter than the minimum, the server tells the client that the session timeout is actually the minimum.
- Real time: ZooKeeper does not use real time, or clock time, at all, except to put timestamps into the stat structure when a znode is created or modified.
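Two of these rules are easy to check with a few lines of arithmetic. The sketch below uses the zxids from the transcripts in this tutorial. The helper names split_zxid and negotiated_timeout are hypothetical. The split of a zxid into a 32-bit epoch and a 32-bit counter is a detail of ZooKeeper's implementation that the article itself does not cover, and the sketch deliberately ignores the upper bound that a server also places on session timeouts.

```python
def split_zxid(zxid):
    """Split a 64-bit zxid into (epoch, counter): the high and low 32 bits."""
    return zxid >> 32, zxid & 0xFFFFFFFF


def negotiated_timeout(requested_ms, tick_ms):
    """Raise a too-short requested session timeout to the minimum (2 ticks)."""
    return max(requested_ms, 2 * tick_ms)


created = 0x2200000019   # zxid of the create transaction
modified = 0x220000001A  # zxid of a later set transaction

# A smaller zxid means the change happened earlier.
print(created < modified)             # True
print(split_zxid(created))            # (34, 25)

# With tickTime=2000 the minimum session timeout is 4000 ms.
print(negotiated_timeout(1000, 2000))   # 4000
print(negotiated_timeout(10000, 2000))  # 10000
```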
The fields returned by get have the following meanings:
- cZxid: the zxid of the transaction that created the znode
- mZxid: the zxid of the transaction that last modified the znode
- ctime: the time the znode was created, in milliseconds since the epoch
- mtime: the time the znode was last modified, in milliseconds since the epoch
- pZxid: the zxid of the change that last modified the children of this znode
- cversion: the number of changes to the znode's children
- dataVersion: the number of changes to the znode's data
- aclVersion: the number of changes to the znode's ACL
- ephemeralOwner: the session ID of the znode's owner if the znode is ephemeral, or zero if it is not
- dataLength: the length of the znode's data
- numChildren: the number of children of the znode
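As a quick way to work with these fields, the following sketch parses the text printed by get into a dictionary. The parse_get function is a hypothetical helper written for this example; real clients obtain the same information as a stat structure through the Java or C interface rather than by parsing shell output.

```python
GET_OUTPUT = """\
my_data
cZxid = 0x2200000019
ctime = Sun Jan 18 02:30:56 PST 2015
mZxid = 0x2200000019
mtime = Sun Jan 18 02:30:56 PST 2015
pZxid = 0x2200000019
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 7
numChildren = 0
"""


def parse_get(output):
    """Return (data, stat) where stat maps each field name to its text value."""
    lines = output.strip().splitlines()
    data, stat = lines[0], {}
    for line in lines[1:]:
        key, _, value = line.partition(" = ")
        stat[key] = value
    return data, stat


data, stat = parse_get(GET_OUTPUT)

# ephemeralOwner is zero for a persistent znode.
is_ephemeral = int(stat["ephemeralOwner"], 16) != 0

# dataLength matches the length of the stored data.
length_matches = len(data) == int(stat["dataLength"])

print(data, is_ephemeral, length_matches)  # my_data False True
```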
Modify Node
You do not need to fully understand the fields above. Let's modify the data and then look at the node again.
    [zk: localhost:2181(CONNECTED) 6] set /zk_test junk
    cZxid = 0x2200000019
    ctime = Sun Jan 18 02:30:56 PST 2015
    mZxid = 0x220000001a
    mtime = Sun Jan 18 02:55:35 PST 2015
    pZxid = 0x2200000019
    cversion = 0
    dataVersion = 1
    aclVersion = 0
    ephemeralOwner = 0x0
    dataLength = 4
    numChildren = 0
    [zk: localhost:2181(CONNECTED) 7] get /zk_test
    junk
    cZxid = 0x2200000019
    ctime = Sun Jan 18 02:30:56 PST 2015
    mZxid = 0x220000001a
    mtime = Sun Jan 18 02:55:35 PST 2015
    pZxid = 0x2200000019
    cversion = 0
    dataVersion = 1
    aclVersion = 0
    ephemeralOwner = 0x0
    dataLength = 4
    numChildren = 0
Comparing this with the earlier output, we can see that the following fields have changed:
- mZxid: changed to a new value, the zxid of the latest modification
- mtime: the last-modified time changed
- dataVersion: went from 0 to 1, meaning the data version increased by 1
- dataLength: the data length changed
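The optional [version] argument of set in the help output ties into dataVersion: a write that names a version only succeeds if the znode is still at that version. The toy model below illustrates that check. ToyZnode and BadVersion are hypothetical names for this sketch, not ZooKeeper classes; the convention that -1 means "do not check the version" follows the ZooKeeper client API.

```python
class BadVersion(Exception):
    """Raised when the caller's expected version is out of date."""


class ToyZnode:
    def __init__(self, data):
        self.data = data
        self.data_version = 0

    def set(self, data, expected_version=-1):
        # -1 skips the check; any other value must match the current version.
        if expected_version != -1 and expected_version != self.data_version:
            raise BadVersion(
                f"expected {expected_version}, znode is at {self.data_version}"
            )
        self.data = data
        self.data_version += 1
        return self.data_version


node = ToyZnode("my_data")
print(node.set("junk"))                       # 1, unconditional write
print(node.set("newer", expected_version=1))  # 2, version matched

try:
    node.set("stale", expected_version=1)     # someone else already wrote
except BadVersion as err:
    rejected = str(err)
    print("rejected:", rejected)

print(node.data, node.data_version)           # newer 2
```

This is how two clients can avoid overwriting each other's updates: each reads the current dataVersion, then passes it back when writing.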
What is an ACL
ZooKeeper uses ACLs to control access to nodes. The ACL implementation is very similar to UNIX file permissions: permission bits define which node operations are allowed or disallowed, and the scope to which the bits apply. Unlike standard UNIX permissions, a ZooKeeper node is not limited to the three standard scopes of user (file owner), group, and other. ZooKeeper has no concept of a node owner. Instead, an ACL specifies a set of IDs and the permissions associated with those IDs.
Also note that an ACL applies only to one particular node. In particular, it does not apply to that node's children. For example, if /app can only be read by ip:172.16.16.1 while /app/status is readable by everyone, then anyone can read /app/status. ACLs are not recursive.
ZooKeeper supports pluggable authentication schemes. IDs are specified in the form scheme:id, where scheme is the authentication scheme the ID belongs to. For example, ip:172.16.16.1 is the ID of the host with address 172.16.16.1.
When a client connects to ZooKeeper and authenticates itself, ZooKeeper associates all the IDs that correspond to the client with the client's connection. When the client attempts to access a node, ZooKeeper checks those IDs against the node's ACL. An ACL consists of (scheme:expression, perms) pairs. The format of expression is specific to the scheme. For example, (ip:19.22.0.0/16, READ) gives READ permission to any client whose IP address starts with 19.22.
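The matching rules can be sketched in a few lines. This is a simplified model, not ZooKeeper's implementation: the ACLS table, the functions id_matches and can, and the /reports path are hypothetical names for illustration, and only the ip scheme and the built-in world:anyone ID are modeled.

```python
import ipaddress

READ = "READ"

# path -> list of (scheme:expression, set of permissions)
ACLS = {
    "/app": [("ip:172.16.16.1", {READ})],
    "/app/status": [("world:anyone", {READ})],
    "/reports": [("ip:19.22.0.0/16", {READ})],  # hypothetical path
}


def id_matches(acl_id, client_ip):
    """Check whether a client address satisfies one scheme:expression ID."""
    scheme, _, expression = acl_id.partition(":")
    if scheme == "world":
        return expression == "anyone"
    if scheme == "ip":
        network = ipaddress.ip_network(expression, strict=False)
        return ipaddress.ip_address(client_ip) in network
    return False


def can(path, perm, client_ip):
    """Consult only the ACL of this exact path; parents are never checked."""
    return any(
        perm in perms and id_matches(acl_id, client_ip)
        for acl_id, perms in ACLS.get(path, [])
    )


print(can("/app", READ, "10.0.0.5"))         # False
print(can("/app/status", READ, "10.0.0.5"))  # True: ACLs are not recursive
print(can("/reports", READ, "19.22.7.9"))    # True: inside 19.22.0.0/16
print(can("/reports", READ, "19.23.0.1"))    # False
```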
For now it is enough to know the concept; the details of using ACLs can be learned when you actually need them.
Delete a node
    [zk: localhost:2181(CONNECTED) 8] delete /zk_test
    [zk: localhost:2181(CONNECTED) 9] ls /
    [hadoop-ha, hbase, zookeeper]
You can see that the node is deleted.
In fact, this tutorial only shows how ZooKeeper manages nodes. It does not explain how ZooKeeper watches nodes or flags them, because those topics are somewhat complicated and of little practical use to ordinary developers; they are issues that the developers of Hadoop or Storm typically need to consider. As end users, we only need an intuitive understanding of what ZooKeeper is and what it looks like.
Resources
- http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html
- http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#sc_zkDataModel_znodes
- http://blog.163.com/wm_at163/blog/static/132173490201232423051163/
Alex's Hadoop Tutorial for Beginners: Lesson 9, ZooKeeper Introduction and Usage