Note: this article is reposted from someone else's work. Original: http://www.cnblogs.com/xymqx/p/4465610.html
I love technology, and I humbly beg everyone to keep sharing good technical articles like this one. Thank you!
ZooKeeper is a sub-project of Apache Hadoop: a distributed service framework mainly used to solve data management problems commonly encountered in distributed applications, such as unified naming service, state synchronization, cluster management, and management of distributed application configuration items. This article explains, from a user's perspective, the meaning of each item in ZooKeeper's installation and configuration files, analyzes ZooKeeper's typical application scenarios (configuration file management, cluster management, synchronization locks, Leader election, queue management, etc.), and implements them in Java with sample code.
Contents:
1. Installation and configuration details
2. How to use
3. Typical ZooKeeper application scenarios
4. Summary
Detailed installation and configuration
This article is based on the stable 3.2.2 version of ZooKeeper; the latest version can be obtained from the official site http://hadoop.apache.org/zookeeper/. ZooKeeper installation is very simple. Its installation and configuration are described below in two modes: stand-alone mode and cluster mode.
Stand-alone mode
Single-machine installation is very simple: just obtain the ZooKeeper package and extract it into a directory such as /home/zookeeper-3.2.2. The startup scripts are in the bin directory; under Linux the startup script is zkServer.sh. Version 3.2.2 does not ship a startup script for Windows, so to start ZooKeeper under Windows you have to write one yourself, as shown in Listing 1:
Listing 1. Windows ZooKeeper startup script

```
setlocal
set ZOOCFGDIR=%~dp0%..\conf
set ZOO_LOG_DIR=%~dp0%..
set ZOO_LOG4J_PROP=INFO,CONSOLE
set CLASSPATH=%ZOOCFGDIR%
set CLASSPATH=%~dp0..\*;%~dp0..\lib\*;%CLASSPATH%
set CLASSPATH=%~dp0..\build\classes;%~dp0..\build\lib\*;%CLASSPATH%
set ZOOCFG=%ZOOCFGDIR%\zoo.cfg
set ZOOMAIN=org.apache.zookeeper.server.ZooKeeperServerMain
java "-Dzookeeper.log.dir=%ZOO_LOG_DIR%" "-Dzookeeper.root.logger=%ZOO_LOG4J_PROP%" -cp "%CLASSPATH%" %ZOOMAIN% "%ZOOCFG%" %*
endlocal
```
Before executing the startup script, there are a few basic configuration items to set. The ZooKeeper configuration files live in the conf directory, which contains zoo_sample.cfg and log4j.properties. What you need to do is rename zoo_sample.cfg to zoo.cfg, because ZooKeeper looks for that file as its default configuration at startup. The meaning of each item in this configuration file is described in detail below.
```
tickTime=2000
dataDir=D:/devtools/zookeeper-3.2.2/build
clientPort=2181
```
- tickTime: the interval, in milliseconds, at which heartbeats are exchanged between ZooKeeper servers, or between a client and a server; a heartbeat is sent every tickTime.
- dataDir: as the name implies, the directory where ZooKeeper saves its data; by default, ZooKeeper writes its data log files to this directory as well.
- clientPort: the port on which clients connect to the ZooKeeper server; ZooKeeper listens on this port and accepts client access requests.
Once these items are configured you can start ZooKeeper. After starting, check whether ZooKeeper is already serving: with `netstat -ano` you can verify that the clientPort you configured appears among the listening ports.
Cluster mode
ZooKeeper can not only serve on a single machine but also form a multi-machine cluster. In fact, ZooKeeper also supports a pseudo-cluster mode, i.e. running multiple ZooKeeper instances on one physical machine. The following describes installation and configuration for cluster mode.
Installing and configuring ZooKeeper in cluster mode is not very complex either; all you have to do is add a few configuration items. In addition to the three items above, cluster mode adds the following:
```
initLimit=5
syncLimit=2
server.1=192.168.211.1:2888:3888
server.2=192.168.211.2:2888:3888
```
- initLimit: the maximum number of heartbeat intervals that may elapse while a client initializes its connection. The "client" here is not a client of the ZooKeeper service but a Follower server connecting to the Leader within the ZooKeeper cluster. If the Leader has not received a response after initLimit heartbeats (i.e. tickTime intervals), the connection attempt fails. With the values above, the total time is 5*2000 = 10 seconds.
- syncLimit: the maximum number of tickTime intervals allowed for sending a request and receiving a response between Leader and Follower. Here the total is 2*2000 = 4 seconds.
- server.A=B:C:D: A is a number identifying which server this is; B is the server's IP address; C is the port this server uses to exchange information with the cluster's Leader; D is the port used for the election that picks a new Leader in case the current Leader goes down, i.e. the port the servers use to communicate with each other during the election. In a pseudo-cluster configuration, B is the same for all instances, so the instances cannot share communication ports and must each be assigned different port numbers.
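Both limits are expressed in ticks, not milliseconds, so the actual time budget depends on tickTime. A minimal arithmetic sketch (the class and method names are mine, not part of ZooKeeper):

```java
public class ZkTimeouts {
    // Wall-clock budget = limit (in ticks) * tickTime (ms per tick).
    static long timeoutMs(int limitTicks, int tickTimeMs) {
        return (long) limitTicks * tickTimeMs;
    }

    public static void main(String[] args) {
        int tickTime = 2000;  // ms, from the sample zoo.cfg
        System.out.println(timeoutMs(5, tickTime));  // initLimit: 10000 ms = 10 s
        System.out.println(timeoutMs(2, tickTime));  // syncLimit: 4000 ms = 4 s
    }
}
```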
In cluster mode, besides modifying the zoo.cfg configuration file, you must also create a file named myid in the dataDir directory. It contains a single number; at startup ZooKeeper reads this file and compares the number against the configuration in zoo.cfg to determine which server this instance is.
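The comparison described above can be sketched as follows; this is an illustration, not ZooKeeper's actual startup code (class and method names are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MyidLookup {
    // Given the contents of dataDir/myid and the server.N entries parsed from
    // zoo.cfg, return this server's own "host:quorumPort:electionPort" entry.
    static String resolveSelf(String myidContents, Map<Integer, String> servers) {
        int id = Integer.parseInt(myidContents.trim());
        return servers.get(id);
    }

    public static void main(String[] args) {
        Map<Integer, String> servers = new LinkedHashMap<>();
        servers.put(1, "192.168.211.1:2888:3888");
        servers.put(2, "192.168.211.2:2888:3888");
        // A myid file containing "2" marks this machine as server.2
        System.out.println(resolveSelf("2\n", servers));  // 192.168.211.2:2888:3888
    }
}
```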
Data model
ZooKeeper maintains a hierarchical data structure very similar to a standard file system, as shown in Figure 1:
Figure 1 Zookeeper data structure
Zookeeper This data structure has the following features:
- Every subdirectory entry, such as NameService, is called a znode. A znode is uniquely identified by its path; for example, the Server1 znode is identified as /NameService/Server1.
- A znode can have child znodes, and each znode can also store data. Note that znodes of the ephemeral type cannot have children.
- Znode data is versioned: each znode can store multiple versions of its data, i.e. one access path can hold several versions of the data.
- A znode can be ephemeral: once the client that created it loses contact with the server, the znode is deleted automatically. ZooKeeper clients and servers communicate over long-lived connections kept alive by heartbeats; this connection state is called a session. If a znode is ephemeral, it is deleted when that session expires.
- Znode names can be automatically numbered: if App1 already exists and another node is created under the same name, it is automatically named App2.
- Znodes can be watched, including for changes to the data they store and changes to their children. Once a change happens, the clients that set the watch are notified. This is ZooKeeper's core feature; many of ZooKeeper's functions are built on it, as the typical application scenarios below will show.
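On the automatic numbering point: in practice ZooKeeper implements sequential nodes by appending a 10-digit, zero-padded counter to the requested name, so the "App2" in the bullet above is a simplification. A small sketch of that convention, with helper names of my own choosing:

```java
public class SequentialNames {
    // Append the 10-digit zero-padded counter, as the server does.
    static String sequentialName(String prefix, int counter) {
        return String.format("%s%010d", prefix, counter);
    }

    // Recover the counter from a created node name.
    static int sequenceOf(String name, String prefix) {
        return Integer.parseInt(name.substring(prefix.length()));
    }

    public static void main(String[] args) {
        System.out.println(sequentialName("app", 1));           // app0000000001
        System.out.println(sequentialName("app", 2));           // app0000000002
        System.out.println(sequenceOf("app0000000002", "app")); // 2
    }
}
```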
How to use
As a distributed service framework, ZooKeeper is mainly used to solve consistency problems of application systems in distributed clusters. It can provide data storage based on a directory-node tree, similar to a file system, but ZooKeeper is not intended for storing data as such: its role is primarily to maintain and monitor changes in the state of the data you store. By monitoring those state changes, data-based cluster management becomes possible. The typical problems ZooKeeper can solve are described in detail later; first, here is an introduction to ZooKeeper's interface and a simple usage example.
List of common interfaces
To connect a client to the ZooKeeper server, create an instance of org.apache.zookeeper.ZooKeeper and then invoke the interface this class provides to interact with the server.
As mentioned above, ZooKeeper is primarily used to maintain and monitor the state of data stored in a directory node tree. All of our operations on ZooKeeper are operations on that tree: creating a directory node, setting a node's data, getting all child nodes of a node, setting a node's permissions, monitoring a node's state changes, and so on.
These interfaces are shown in the following table:
Table 1. org.apache.zookeeper.ZooKeeper method list
| Method | Description |
| ------ | ----------- |
| String create(String path, byte[] data, List<ACL> acl, CreateMode createMode) | Creates a directory node at the given path and sets its data. CreateMode identifies four kinds of node: PERSISTENT, a persistent node whose stored data is not lost; PERSISTENT_SEQUENTIAL, a sequentially auto-numbered node whose number is incremented by 1 past the largest existing sibling, with the actual node name returned to the client on success; EPHEMERAL, a temporary node that is deleted automatically once the session of the client that created it times out; and EPHEMERAL_SEQUENTIAL, a temporary auto-numbered node. |
| Stat exists(String path, boolean watch) | Checks whether a path exists and sets whether to watch this node. The watcher here is the one specified when the ZooKeeper instance was created; an overloaded exists method can specify a particular watcher. |
| Stat exists(String path, Watcher watcher) | Overloaded method that sets a specific watcher on the node. The Watcher is a core ZooKeeper feature: it can monitor changes to the node's data and to its children. Once those states change, the server notifies every watcher set on the node, so each client quickly learns that the state of a node it cares about has changed and can respond accordingly. |
| void delete(String path, int version) | Deletes the node corresponding to path. A version of -1 matches any version and deletes all of the node's data. |
| List<String> getChildren(String path, boolean watch) | Gets all child nodes under the given path. getChildren likewise has an overloaded method that sets a specific watcher to monitor the children's state. |
| Stat setData(String path, byte[] data, int version) | Sets data on path. You can specify the version number of the data; a version of -1 matches any version. |
| byte[] getData(String path, boolean watch, Stat stat) | Gets the data stored at this path. The data version and other metadata are returned through stat, and you can also choose whether to watch this node's data state. |
| void addAuthInfo(String scheme, byte[] auth) | The client submits its authorization information to the server; the server verifies the client's access rights based on it. |
| Stat setACL(String path, List<ACL> acl, int version) | Resets a node's access permissions. Note that permissions in ZooKeeper are not transitive: a parent node's permissions are not passed to its children. An ACL consists of two parts: perms and id. Perms include ALL, READ, WRITE, CREATE, DELETE, and ADMIN. The id identifies the list of identities allowed to access the node; by default there are two: ANYONE_ID_UNSAFE = new Id("world", "anyone") and AUTH_IDS = new Id("auth", ""), meaning respectively that anyone may access and that the creator has access. |
| List<ACL> getACL(String path, Stat stat) | Gets a node's list of access permissions. |
Besides the methods listed in the table above, there are several other overloads, such as ones taking a callback class or a specific watcher; see the API documentation for org.apache.zookeeper.ZooKeeper.
Basic operations
Here is sample code for basic ZooKeeper operations, to give you a concrete feel for ZooKeeper. The listing below covers creating a connection to the ZooKeeper server and the most basic data operations:
Listing 2. ZooKeeper Basic Operation Example
```java
// Create a connection to the server
ZooKeeper zk = new ZooKeeper("localhost:" + CLIENT_PORT,
        ClientBase.CONNECTION_TIMEOUT, new Watcher() {
            // Monitor all triggered events
            public void process(WatchedEvent event) {
                System.out.println("has triggered " + event.getType() + " event!");
            }
        });
// Create a root directory node
zk.create("/testRootPath", "testRootData".getBytes(), Ids.OPEN_ACL_UNSAFE,
        CreateMode.PERSISTENT);
// Create a child directory node
zk.create("/testRootPath/testChildPathOne", "testChildDataOne".getBytes(),
        Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
System.out.println(new String(zk.getData("/testRootPath", false, null)));
// List the child directory nodes
System.out.println(zk.getChildren("/testRootPath", true));
// Modify the child node's data
zk.setData("/testRootPath/testChildPathOne", "modifyChildDataOne".getBytes(), -1);
System.out.println("Directory node status: [" + zk.exists("/testRootPath", true) + "]");
// Create another child directory node
zk.create("/testRootPath/testChildPathTwo", "testChildDataTwo".getBytes(),
        Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
System.out.println(new String(zk.getData("/testRootPath/testChildPathTwo", true, null)));
// Delete the child directory nodes
zk.delete("/testRootPath/testChildPathTwo", -1);
zk.delete("/testRootPath/testChildPathOne", -1);
// Delete the parent directory node
zk.delete("/testRootPath", -1);
// Close the connection
zk.close();
```
The results of the output are as follows:
```
has triggered None event!
testRootData
[testChildPathOne]
Directory node status: [5,5,1281804532336,1281804532336,0,1,0,0,12,1,6]
has triggered NodeChildrenChanged event!
testChildDataTwo
has triggered NodeDeleted event!
has triggered NodeDeleted event!
```
While watching of a directory node is enabled, the Watcher object's process method is called whenever that node's state changes.
Typical application scenarios for ZooKeeper
From a design-pattern perspective, ZooKeeper is a distributed service-management framework based on the observer pattern: it stores and manages the data everyone cares about and accepts registrations from observers. Once the state of that data changes, ZooKeeper notifies the registered observers so they can react accordingly, achieving a Master/Slave-style management model within the cluster. For ZooKeeper's detailed architecture and other internals, read the ZooKeeper source code.
These typical scenarios are described in detail below: what problems can ZooKeeper help us solve? The answers follow.
Unified Naming Services (name service)
In distributed applications, a complete set of naming conventions is often needed, producing names that are unique yet easy for people to recognize and remember. A tree-shaped, hierarchical directory structure is usually the ideal choice: it is both human-friendly and guaranteed not to produce duplicates. Speaking of which, you might think of JNDI; indeed, ZooKeeper's name service does much the same thing as JNDI. Both associate a hierarchical directory structure with certain resources, but ZooKeeper's name service allows broader associations: perhaps you do not need to associate a name with a particular resource at all and only need a name that does not repeat, like a unique numeric primary key in a database.
The name service is already a built-in ZooKeeper feature; you only need to call the ZooKeeper API. For example, calling the create interface is all it takes to create a directory node.
Configuration management (config Management)
Configuration management is common in distributed application environments: the same application system runs on multiple PC servers, and some configuration items of the applications they run are identical. If those shared items must be changed, every PC server running the application has to be modified, which is very troublesome and error-prone.
Configuration information like this can be handed to ZooKeeper to manage: store the configuration in a ZooKeeper directory node, and have every application machine that needs it watch that node's state. Once the configuration changes, each machine receives a notification from ZooKeeper, then fetches the new configuration from ZooKeeper and applies it to its system.
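The notify-then-refetch flow above can be sketched without a live ZooKeeper as a one-shot observer: clients are only told that the node changed and must read the new value back themselves. Everything here (class, interface, and method names) is illustrative, not ZooKeeper's API:

```java
import java.util.ArrayList;
import java.util.List;

public class ConfigWatchSketch {
    interface Client { void onConfigChanged(ConfigWatchSketch store); }

    private byte[] data;
    private final List<Client> watchers = new ArrayList<>();

    void watch(Client c) { watchers.add(c); }
    byte[] read() { return data; }

    void write(byte[] newData) {
        data = newData;
        List<Client> toNotify = new ArrayList<>(watchers);
        watchers.clear();                 // like a ZooKeeper watch, fires only once
        for (Client c : toNotify) c.onConfigChanged(this);
    }

    public static void main(String[] args) {
        ConfigWatchSketch store = new ConfigWatchSketch();
        store.watch(s -> System.out.println("config changed, now: " + new String(s.read())));
        store.write("timeout=30".getBytes());  // prints: config changed, now: timeout=30
        store.write("timeout=60".getBytes());  // no output: the watch fired once and is gone
    }
}
```

Note the one-shot behavior: a real ZooKeeper watch is also consumed when it fires, so a client that wants continued notifications must re-set the watch after each one.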
Figure 2. Configuration management structure diagram
Cluster Management (Group membership)
ZooKeeper makes cluster management easy. When multiple servers form a service cluster, a "manager" must know the service status of every machine in the current cluster; once a machine can no longer provide service, the rest of the cluster must know about it so they can adjust and redistribute the service strategy. Likewise, when you increase the cluster's service capacity by adding one or more servers, the "manager" must know that too.
ZooKeeper not only helps you maintain the service status of the machines in the current cluster, it can also help you elect the "manager" that manages the cluster. That is another ZooKeeper capability: Leader election.
Cluster management is implemented by creating a directory node of type EPHEMERAL on ZooKeeper. Each server then calls getChildren(String path, boolean watch) on the parent of the node it created, with watch set to true. Because the nodes are ephemeral, when the server that created one dies, that node is deleted; the parent's children change, and the watch set by getChildren fires, so every other server learns that a server has died. Newly added servers work on the same principle.
How does ZooKeeper implement Leader election, i.e. choose a Master server? As above, every server creates an EPHEMERAL directory node, but this time a sequential one as well, making it an EPHEMERAL_SEQUENTIAL node. The reason is that this gives every server a number, and we can choose the server with the smallest current number as the Master. If that smallest-numbered server dies, its node is deleted (being ephemeral), a new smallest-numbered node appears in the current list, and we select that node's server as the new Master. This achieves dynamic Master election and avoids the single point of failure of a traditional single-Master design.
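The selection rule — smallest sequence number wins — can be sketched in isolation. The child names below are hypothetical examples in ZooKeeper's zero-padded sequential format:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class MasterPick {
    // Because sequential names are zero-padded to a fixed width, the
    // lexicographically smallest name is also the numerically smallest.
    static String pickMaster(List<String> children) {
        return Collections.min(children);
    }

    public static void main(String[] args) {
        List<String> children =
                Arrays.asList("node0000000007", "node0000000003", "node0000000010");
        System.out.println(pickMaster(children));  // node0000000003
    }
}
```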
Figure 3. Cluster management structure diagram
The sample code for this section is as follows, and the complete code is shown in the attachment:
Listing 3. Leader election key code
```java
void findLeader() throws InterruptedException {
    byte[] leader = null;
    try {
        leader = zk.getData(root + "/leader", true, null);
    } catch (Exception e) {
        logger.error(e);
    }
    if (leader != null) {
        following();
    } else {
        String newLeader = null;
        try {
            byte[] localhost = InetAddress.getLocalHost().getAddress();
            newLeader = zk.create(root + "/leader", localhost,
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        } catch (Exception e) {
            logger.error(e);
        }
        if (newLeader != null) {
            leading();
        } else {
            mutex.wait();
        }
    }
}
```
Shared Lock (Locks)
Shared locks are easy to implement within one process but hard across processes or between different servers. ZooKeeper makes this easy too: a server that wants the lock creates an EPHEMERAL_SEQUENTIAL directory node, then calls the getChildren method to check whether the smallest node in the current child list is the one it created itself. If it is, the server has acquired the lock. If not, it calls exists(String path, boolean watch) to watch for changes in the child list on ZooKeeper, repeating until its own node is the smallest in the list, at which point it obtains the lock. Releasing the lock is just as easy: delete the directory node you created earlier.
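The core decision in that recipe — "is my znode the smallest child?" — can be sketched on its own; the node names here are hypothetical:

```java
import java.util.Arrays;

public class LockOrder {
    // Sort the (zero-padded, fixed-width) child names and check whether our
    // own znode sorts first; if so, we hold the lock.
    static boolean holdsLock(String myZnode, String[] children) {
        String[] sorted = children.clone();
        Arrays.sort(sorted);
        return myZnode.equals(sorted[0]);
    }

    public static void main(String[] args) {
        String[] children = {"lock0000000002", "lock0000000005", "lock0000000001"};
        System.out.println(holdsLock("lock0000000001", children));  // true
        System.out.println(holdsLock("lock0000000005", children));  // false
    }
}
```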
Figure 4. Zookeeper implementation of Locks flowchart
The implementation code of the synchronization lock is as follows: The complete code is shown in the attachment.
Listing 4. Key code for Sync lock
```java
void getLock() throws KeeperException, InterruptedException {
    List<String> list = zk.getChildren(root, false);
    String[] nodes = list.toArray(new String[list.size()]);
    Arrays.sort(nodes);
    if (myZnode.equals(root + "/" + nodes[0])) {
        doAction();
    } else {
        waitForLock(nodes[0]);
    }
}

void waitForLock(String lower) throws InterruptedException, KeeperException {
    Stat stat = zk.exists(root + "/" + lower, true);
    if (stat != null) {
        mutex.wait();
    } else {
        getLock();
    }
}
```
Queue Management
ZooKeeper can handle two types of queues:

1. Synchronization queue: the queue becomes available only when all its members have gathered; otherwise it keeps waiting for all members to arrive.
2. FIFO queue: members enqueue and dequeue in first-in-first-out order, for example to implement a producer-consumer model.
The implementation of the synchronization queue with Zookeeper is as follows:
Create a parent directory /synchronizing, and have every member watch (set a watch on) whether the flag directory /synchronizing/start exists. Then each member joins the queue by creating a temporary node /synchronizing/member_i, after which it lists all children of /synchronizing, i.e. the member_i nodes, and checks whether their count has reached the expected number of members. If it is still less, the member waits for /synchronizing/start to appear; if it is already equal, the member creates /synchronizing/start.
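The decision each member makes after joining can be sketched as follows, a ZooKeeper-free illustration of the count check described above (class and method names are mine):

```java
import java.util.Arrays;
import java.util.List;

public class SyncQueueCheck {
    // After creating its own member_i node, a member lists /synchronizing and
    // decides whether to wait or to create the /synchronizing/start flag.
    static String nextAction(List<String> children, int groupSize) {
        long members = children.stream().filter(c -> c.startsWith("member_")).count();
        return members < groupSize ? "wait" : "create start";
    }

    public static void main(String[] args) {
        System.out.println(nextAction(Arrays.asList("member_1", "member_2"), 3));             // wait
        System.out.println(nextAction(Arrays.asList("member_1", "member_2", "member_3"), 3)); // create start
    }
}
```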
It is easier to understand with the following flowchart:
Figure 5. Synchronization Queue Flowchart
The key code for the synchronization queue is as follows, and the complete code is shown in the attachment:
Listing 5. Synchronization queue
```java
void addQueue() throws KeeperException, InterruptedException {
    zk.exists(root + "/start", true);
    zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE,
            CreateMode.EPHEMERAL_SEQUENTIAL);
    synchronized (mutex) {
        List<String> list = zk.getChildren(root, false);
        if (list.size() < size) {
            mutex.wait();
        } else {
            zk.create(root + "/start", new byte[0], Ids.OPEN_ACL_UNSAFE,
                    CreateMode.PERSISTENT);
        }
    }
}
```
When the queue is not yet full, the member calls wait() and then waits for the watch notification. The watch code is as follows:
```java
public void process(WatchedEvent event) {
    if (event.getPath().equals(root + "/start")
            && event.getType() == Event.EventType.NodeCreated) {
        System.out.println("Get notified");
        super.process(event);
        doAction();
    }
}
```
FIFO queue with Zookeeper implementation ideas are as follows:
The idea is very simple: create sequential children /queue_i under a specific directory, so every member added to the queue gets a number. To dequeue, the getChildren() method returns all current queue elements, and the one with the smallest number is consumed, which guarantees FIFO order.
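The dequeue rule can be sketched in isolation: queue items are named element<seq> by PERSISTENT_SEQUENTIAL, so the consumer parses the numeric suffix after the 7-character "element" prefix (this is what the substring(7) call in the consumer code does) and takes the smallest:

```java
import java.util.Arrays;
import java.util.List;

public class FifoPick {
    // Find the child with the smallest sequence number, i.e. the oldest item.
    static String nextToConsume(List<String> children) {
        String best = null;
        int min = Integer.MAX_VALUE;
        for (String c : children) {
            int seq = Integer.parseInt(c.substring("element".length()));
            if (seq < min) { min = seq; best = c; }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> items =
                Arrays.asList("element0000000005", "element0000000002", "element0000000009");
        System.out.println(nextToConsume(items));  // element0000000002
    }
}
```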
Below is a sample code for the form of a queue of producers and consumers, the complete code can be seen in the attachment:
Listing 6. Producer Code
```java
boolean produce(int i) throws KeeperException, InterruptedException {
    ByteBuffer b = ByteBuffer.allocate(4);
    byte[] value;
    b.putInt(i);
    value = b.array();
    zk.create(root + "/element", value, ZooDefs.Ids.OPEN_ACL_UNSAFE,
            CreateMode.PERSISTENT_SEQUENTIAL);
    return true;
}
```
Listing 7. Consumer Code
```java
int consume() throws KeeperException, InterruptedException {
    int retValue = -1;
    Stat stat = null;
    while (true) {
        synchronized (mutex) {
            List<String> list = zk.getChildren(root, true);
            if (list.size() == 0) {
                mutex.wait();
            } else {
                Integer min = new Integer(list.get(0).substring(7));
                for (String s : list) {
                    Integer tempValue = new Integer(s.substring(7));
                    if (tempValue < min) min = tempValue;
                }
                byte[] b = zk.getData(root + "/element" + min, false, stat);
                zk.delete(root + "/element" + min, 0);
                ByteBuffer buffer = ByteBuffer.wrap(b);
                retValue = buffer.getInt();
                return retValue;
            }
        }
    }
}
```
Summarize
ZooKeeper, a sub-project of the Hadoop project, is an essential module for Hadoop cluster management; it is mainly used to control cluster data, for example managing the NameNode in Hadoop clusters, and, in HBase, Master election and state synchronization between servers.
This article introduced the basics of ZooKeeper and described several typical application scenarios. These are ZooKeeper's basic capabilities; most importantly, ZooKeeper provides a good mechanism for distributed cluster management, namely a hierarchical directory-tree data structure with effective management of the nodes in that tree. Many kinds of distributed data-management models can be designed on top of it, not just the few common scenarios mentioned above.