As a distributed service framework, Zookeeper is mainly used to solve the problem of data consistency in a distributed cluster. It provides data storage based on a directory node tree similar to a file system, and its main function is to maintain and monitor changes in the state of the stored data; by monitoring these state changes, data-based cluster management can be achieved.
1 Zookeeper basic framework
The main roles in a Zookeeper cluster are the Leader, the Learners (Followers and Observers; Observers are added when the number of servers grows to the point where the extra voting pressure lowers throughput) and the Clients:
Leader: the leader, responsible for initiating and deciding votes and for updating the system state.
Follower: accepts client requests, returns results to the client, and participates in voting.
Observer: accepts client requests and forwards write requests to the Leader, but does not participate in voting. The purpose of the Observer is to scale the system and improve read speed.
Client: the client, which initiates requests to Zookeeper.
The basic architecture diagram of Zookeeper is as follows:
Leader Main functions:
Recovery of data;
Maintain a heartbeat with the Learners, receive Learner requests, and determine the Learner request message type. The main Learner message types are PING, REQUEST, ACK, and REVALIDATE messages, each of which is handled differently: a PING message carries the Learner's heartbeat; a REQUEST message is a proposal sent by a Follower, including write requests and synchronization requests; an ACK message is a Follower's reply to a proposal, and once more than half of the Followers acknowledge, the proposal is committed; a REVALIDATE message is used to extend a session's valid time.
Follower Basic functions:
Send requests to the Leader (PING, REQUEST, ACK, and REVALIDATE messages);
Receive and process messages from the Leader;
Receive client requests and, for write requests, forward them to the Leader for a vote;
The only difference between the main functions of an Observer and a Follower is that the Observer does not participate in Leader voting.
Zookeeper Configuration Introduction:
tickTime: the basic time unit, in milliseconds. It is the heartbeat interval maintained between Zookeeper servers, or between a client and a server; one heartbeat is sent every tickTime.
dataDir: the location of the in-memory database snapshots; as the name implies, it is the directory where Zookeeper stores its data. By default, Zookeeper also writes its data log files to this directory.
clientPort: the port on which the Zookeeper server listens for and accepts client connection requests.
initLimit: this configuration item sets the maximum number of heartbeat intervals (tickTime) Zookeeper tolerates when a client (here meaning a Follower server in the cluster connecting to the Leader) initializes its connection. If the Zookeeper server has not received a reply after more than initLimit heartbeats, the connection is considered failed. With initLimit=5 and tickTime=2000, the total time is 5*2000=10 seconds.
syncLimit: this configuration item sets the maximum length of time for a message, request and response between the Leader and a Follower, which cannot exceed syncLimit heartbeats (tickTime). With syncLimit=2 and tickTime=2000, the total time is 2*2000=4 seconds.
server.A=B:C:D: A is the server's number; B is the server's IP address; C is the port the server uses to exchange information with the Leader of the cluster; D is the port used to hold an election and choose a new Leader if the cluster's Leader goes down.
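A minimal zoo.cfg sketch that ties the parameters above together (the host names and the dataDir path are illustrative placeholders, not taken from the original text):

tickTime=2000
initLimit=5
syncLimit=2
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888

Here 2888 plays the role of the C port (Follower-to-Leader communication) and 3888 the D port (Leader election), matching the server.A=B:C:D description.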
Basic operations under the Zookeeper cluster:
To view the Zookeeper service roles:
Basic Command actions:
./zookeeper/server004/zookeeper/bin/zkCli.sh
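As a rough sketch (the server path and the output shown are illustrative), each server's role can be checked with zkServer.sh status, and zkCli.sh opens an interactive shell for basic operations:

./zookeeper/server004/zookeeper/bin/zkServer.sh status
# prints "Mode: leader" on the elected Leader and "Mode: follower" on the others

./zookeeper/server004/zookeeper/bin/zkCli.sh -server 127.0.0.1:2181
# inside the shell:
#   ls /             list the children of the root znode
#   create /demo hi  create a znode named /demo storing "hi"
#   get /demo        read the data stored in /demo
#   delete /demo     delete the znode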
2 Zookeeper Basic Introduction
Zookeeper's data structure is a tree, very similar to a standard file system. Each node has a unique path identifier; for example, the Server1 node is identified as /nameservice/server1.
1). Znode
Each node in the Zookeeper data structure is called a Znode. Each Znode has a unique path, can have child node directories, and can store data. Znode data is versioned: each Znode can hold multiple versions of its data, that is, a single access path can store multiple copies of data.
Znode Basic Type:
PERSISTENT: a persistent Znode. Once created, the data stored in the Znode does not disappear automatically unless the client actively deletes it.
PERSISTENT_SEQUENTIAL: a sequentially numbered Znode. The number is automatically incremented by 1 based on the Znode numbers that already exist, and the node is not lost when the session is disconnected.
EPHEMERAL: a temporary Znode. A client connecting to the ZK service establishes a session and then uses that connection instance to create this type of Znode; once the client closes its ZK connection, the server clears the session, and the Znodes created in that session disappear from the namespace. In short, the life cycle of this type of Znode is the same as that of the connection established by the client.
EPHEMERAL_SEQUENTIAL: a temporary, automatically numbered Znode. The node number is incremented automatically, but the node disappears when the session ends.
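A minimal Java sketch of the four node types above, using the standard Zookeeper client API (the connection string and paths are illustrative placeholders):

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class CreateModeDemo {
    public static void main(String[] args) throws Exception {
        // Session timeout of 3000 ms; no default watcher registered.
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 3000, null);

        // Persistent node: survives the session.
        zk.create("/demo", "root".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        // Persistent sequential node: the server appends an increasing numeric suffix.
        zk.create("/demo/seq-", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
        // Ephemeral node: removed automatically when this session ends.
        zk.create("/demo/ephemeral", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        // Ephemeral sequential node: numbered and tied to the session (used for the lock example later).
        zk.create("/demo/lock-", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        zk.close();
    }
}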
Zookeeper is only responsible for coordinating data, and Znode data is generally small, measured in KB. Zookeeper's client and server implementation classes check that the data stored in a Znode is less than 1 MB. If the data were large, synchronizing it between servers would take a long time and affect system performance.
2). Watcher
In Zookeeper, how can a client be notified when some behavior (event) occurs on a Znode? Zookeeper uses the watcher mechanism: a watch is set through a read operation on the Zookeeper service, and other operations on the service can then trigger the watch.
The types of watches in Zookeeper:
Exists watch: triggered by NodeCreated, NodeDeleted, and NodeDataChanged events on the path.
GetData watch: triggered by NodeDataChanged and NodeDeleted events on the path.
GetChildren watch: triggered by NodeChildrenChanged events on the path (a child created or deleted under it) and by NodeDeleted on the path itself.
Setting a watch on a node in Zookeeper is one-shot: the watch is removed once it is triggered on the Znode, so if a Znode needs long-term attention, the watch must be set again after each event is triggered.
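A minimal Java sketch of a one-shot watch that re-registers itself after each event (the connection string and path are illustrative placeholders):

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class RewatchDemo {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 3000, null);
        watchNode(zk, "/demo");
        Thread.sleep(Long.MAX_VALUE);   // keep the session alive for the demo
    }

    // exists() registers a watch that fires on NodeCreated, NodeDeleted and NodeDataChanged.
    static void watchNode(final ZooKeeper zk, final String path) throws Exception {
        zk.exists(path, new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                System.out.println("event on " + event.getPath() + ": " + event.getType());
                try {
                    watchNode(zk, path);   // the watch is one-shot, so set it again
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
    }
}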
3). Basic operation
To create a node:
String create(String path, byte[] data, List<ACL> acl, CreateMode createMode)
Creates a directory node at the given path and sets data on it; CreateMode identifies the four types of directory nodes.
To delete a node:
void delete(String path, int version)
Deletes the directory node at path; a version of -1 matches any version and deletes all data of this directory node.
To query whether a node exists:
Stat exists(String path, boolean watch) / Stat exists(String path, Watcher watcher)
Determines whether the path exists and sets whether to watch this directory node.
To get the node data:
byte[] getData(String path, boolean watch, Stat stat)
Gets the data stored in the directory node at path; the data's version and other metadata are returned through stat, and you can also set whether to watch the state of this directory node's data.
To set node data:
Stat setData(String path, byte[] data, int version)
Sets data on path; you can specify the version of the data, and a version of -1 matches any version.
To get the child nodes of a node:
List<String> getChildren(String path, boolean watch)
Gets all child directory nodes under the given path; getChildren also has an overloaded method that takes a specific Watcher to monitor the children's state.
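A short Java sketch exercising the operations above in sequence (the connection string and paths are illustrative placeholders):

import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class BasicOpsDemo {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 3000, null);

        // create: build a persistent node and store some bytes in it
        zk.create("/config", "v1".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // exists: check the node and, optionally, leave a watch (false = no watch here)
        Stat stat = zk.exists("/config", false);

        // getData: read the bytes back; the Stat object is filled with version information
        byte[] data = zk.getData("/config", false, stat);
        System.out.println(new String(data) + " version=" + stat.getVersion());

        // setData: -1 matches any version; pass stat.getVersion() instead for optimistic locking
        zk.setData("/config", "v2".getBytes(), -1);

        // getChildren: list the children of the root node
        List<String> children = zk.getChildren("/", false);
        System.out.println(children);

        // delete: -1 again matches any version
        zk.delete("/config", -1);
        zk.close();
    }
}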
3 Basic application of Zookeeper
1. Distributed lock
Basic idea:
1. First create a lock directory (Znode), usually named after the entity being locked, e.g. /lock_node;
2. A client that wants to acquire the lock creates a Znode under the lock directory as a child of /lock_node; the node type is an ordered temporary node (EPHEMERAL_SEQUENTIAL);
3. The current client calls getChildren(/lock_node) to get all child nodes of the lock directory, without setting a watch, and then looks for sibling nodes numbered lower than its own;
4. If no node with a smaller number exists, the current client's sequence number is the smallest: it has acquired the lock, and the process ends;
5. If one exists, the client watches the ordered temporary node whose number is immediately smaller than its own;
6. If the state of the watched node changes, jump to step 3 and continue until the client exits the lock competition.
Basic code:
try {
    if (lockName.contains(splitStr)) {
        throw new Exception("lockName can not contain \\u000B");
    }
    // Step 2: create an ordered temporary node under the lock directory
    myNode = zookeeper.create(lockRoot + "/" + lockName + splitStr,
            new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
            CreateMode.EPHEMERAL_SEQUENTIAL);
    // Step 3: get all children of the lock directory
    List<String> childrenList = zookeeper.getChildren(lockRoot, true);
    List<String> childrenName = new ArrayList<String>();
    for (String children : childrenList) {
        String node = children.split(splitStr)[0];
        if (node.equals(lockName)) {
            childrenName.add(children);
        }
    }
    Collections.sort(childrenName);
    // Step 4: check whether this client holds the smallest node; the smallest node gets the lock
    if (myNode.equals(lockRoot + "/" + childrenName.get(0))) {
        return true;
    }
    String subMyNode = myNode.substring(myNode.lastIndexOf("/") + 1);
    /***
     * Step 5: find the neighbor with the next smaller sequence number and wait on it
     */
    waitNode = childrenName.get(Collections.binarySearch(childrenName,
            subMyNode) - 1);
2. Distributed queues:
One is the conventional first-in, first-out (FIFO) queue;
The other waits until all queue members have gathered, and only then do the members execute in a unified order.
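A rough Java sketch of the FIFO variant, using PERSISTENT_SEQUENTIAL children under a queue znode (the /queue path and connection handling are assumptions; a production queue also needs watches and retry handling):

import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SimpleZkQueue {
    private final ZooKeeper zk;
    private final String root = "/queue";   // assumed to already exist as a persistent node

    public SimpleZkQueue(ZooKeeper zk) {
        this.zk = zk;
    }

    // offer: each element becomes a sequential child, so creation order defines FIFO order
    public void offer(byte[] data) throws Exception {
        zk.create(root + "/item-", data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT_SEQUENTIAL);
    }

    // poll: take the child with the smallest sequence number, read it, then delete it
    public byte[] poll() throws Exception {
        List<String> children = zk.getChildren(root, false);
        if (children.isEmpty()) {
            return null;
        }
        Collections.sort(children);          // smallest suffix = head of the queue
        String head = root + "/" + children.get(0);
        byte[] data = zk.getData(head, false, null);
        zk.delete(head, -1);
        return data;
    }
}

With several consumers the delete can fail if another consumer has already taken the head; a more complete implementation catches that case and retries with the next child.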
3. Zookeeper and HBase
HRegionServers register themselves in Zookeeper in an ephemeral way, so the HMaster can sense the health status of each HRegionServer at any time;
Zookeeper also avoids the HMaster single point of failure problem.
4. Tbschedule/codis: Configuration Management
Zookeeper can easily implement cluster management. If multiple servers form a service cluster, a "supervisor" must know the service status of every machine in the current cluster; once a machine can no longer provide service, the other machines in the cluster must be informed so that the service allocation policy can be adjusted. Likewise, when the cluster's service capacity is expanded by adding one or more servers, the supervisor must be informed as well.
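A minimal Java sketch of this pattern, assuming each worker registers an ephemeral child under a hypothetical /cluster znode (created beforehand as a persistent node) and the "supervisor" watches the children list; the names and connection handling are illustrative:

import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ClusterMonitorDemo {

    // Each worker registers itself: the ephemeral node vanishes if the worker's session dies.
    static void register(ZooKeeper zk, String workerId) throws Exception {
        zk.create("/cluster/" + workerId, new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    }

    // The supervisor lists the members and re-registers the children watch each time it fires.
    static void watchMembers(final ZooKeeper zk) throws Exception {
        List<String> members = zk.getChildren("/cluster", new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                try {
                    watchMembers(zk);   // watches are one-shot: set it again
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
        System.out.println("current members: " + members);
    }
}

When a machine goes down, its ephemeral node disappears, the children watch fires, and the supervisor sees the new membership list and can reallocate work accordingly.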