Zookeeper Study Summary, Chapter 2: Zookeeper in Depth

I actually finished writing this Zookeeper study series a while ago, but I have been busy preparing for work and have not kept it updated. Here it is for everyone; if anything in the text is off, corrections are very welcome and greatly appreciated.

1. Data Model

1.1. Suitable only for storing small data

ZK maintains a logical tree hierarchy in which the nodes are called znodes, and each znode has an ACL (access control list) for permission control. Zookeeper is designed for coordinating services, so a znode stores small amounts of data rather than large volumes; the data is typically within 1 MB.

1.2. Atomicity of operations

Znode reads and writes are atomic: a read returns the complete data or fails, and a write replaces the complete data or fails. You never read or write only part of the data.

1.3. Znode paths

Znode paths use the same format as Unix file system paths, but only absolute paths are supported; relative paths are not, and the path segments "." and ".." are not allowed.

1.4. Ephemeral and persistent znodes

There are two types of znodes: ephemeral and persistent. An ephemeral znode's life cycle is tied to the session between the client that created it and the server; the znode is removed once the client disconnects.

1.5. Sequential znodes

A sequential znode has a Zookeeper-assigned sequence number in its name. If you set the sequential flag when creating a znode, a number generated by a monotonically increasing counter is appended to the name when the znode is created. For example, if the path passed in at creation is "/aa/bb", the created node may be "/aa/bb0002", and the next one created may be "/aa/bb0003".

There are four CreateMode values for znode creation: EPHEMERAL (ephemeral znode), EPHEMERAL_SEQUENTIAL (ephemeral sequential znode), PERSISTENT (persistent znode) and PERSISTENT_SEQUENTIAL (persistent sequential znode). If you have read the previous blog post, the API calls here should be easy to follow; see: http://www.cnblogs.com/leocook/p/zk_0.html.
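To make the four modes concrete, here is a minimal sketch in Java, assuming an already connected ZooKeeper handle named zk, made-up paths under /app, and imports from org.apache.zookeeper:

// The four CreateMode values; each call may throw KeeperException / InterruptedException.
zk.create("/app/config", "v1".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE,
        CreateMode.PERSISTENT);               // survives client disconnects
zk.create("/app/worker", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
        CreateMode.EPHEMERAL);                // removed when this session ends
String seq = zk.create("/app/bb", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
        CreateMode.PERSISTENT_SEQUENTIAL);    // create() returns the actual name, e.g. "/app/bb0000000002"
zk.create("/app/member-", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
        CreateMode.EPHEMERAL_SEQUENTIAL);     // ephemeral and numbered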

1.6. Watches

Watches were described in detail in the previous blog post, including watches on the connection and watches on znodes. They play an important role in building a stable Zookeeper application, which will be discussed below.
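As a brief reminder of what a watch looks like in code, here is a minimal sketch, assuming an already connected handle zk and a made-up path; the details are covered in the previous post:

// Register a one-shot watch: the Watcher fires once when the node is created, deleted
// or its data changes, and must be re-registered if further notifications are wanted.
Stat stat = zk.exists("/app/config", new Watcher() {
    @Override
    public void process(WatchedEvent event) {
        System.out.println("got " + event.getType() + " on " + event.getPath());
    }
});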

2. ACLs

ACL stands for access control list. A znode is created together with an ACL list. ZK provides the following authentication schemes:

    • Digest

User name + password authentication.

    • Host

Authentication by the client's hostname.

    • Ip

Authentication by the client's IP address.

    • Auth

Authentication using the session ID.

    • World

No authentication is performed. This scheme is special; it comes up again in section 2.1 below, on adding authentication info to the ZK connection.

The ACL permissions are CREATE (create child nodes), READ (read the znode's data and list its children), WRITE (set the znode's data), DELETE (delete child nodes) and ADMIN (set the znode's ACL).

When setting ACLs, you can add authentication info to the connection between the ZK client and server, and you can set an ACL on a znode when the znode is created. Once the znode exists, a ZK client can perform an operation on it only if the client meets the znode's permission requirements:

2.1. Adding authentication info to the ZK connection

You can use the addAuthInfo() method of the ZooKeeper object to add an authentication scheme. For example, to authenticate with the digest scheme: zk.addAuthInfo("digest", "username:passwd".getBytes());

When the ZooKeeper object is created, the world scheme is added during initialization. The authentication ID for the world scheme is "anyone".

If the connection creates a znode, the auth scheme is also added to it with an empty string ("") as the authentication ID, meaning the session ID is used for verification.
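Putting section 2.1 together, here is a minimal sketch; the connection string, session timeout and credentials are placeholders:

// A new connection starts out with the world scheme (ID "anyone"); addAuthInfo()
// adds digest credentials on top so digest-protected znodes become accessible.
ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000, event -> { });
zk.addAuthInfo("digest", "bob:secret".getBytes());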

2.2. Setting ACLs for znodes
    • Create an ACL yourself

When creating an ACL object yourself, you can use the ACL(int perms, Id id) constructor of the ACL class:

The perms parameter represents the permissions. The related constants are defined in the interface org.apache.zookeeper.ZooDefs.Perms: READ (1), WRITE (2), CREATE (4), DELETE (8), ADMIN (16) and ALL (the bitwise OR of the others).

The id parameter identifies the authentication scheme and identity; it can be created with the constructor Id(String scheme, String id). The scheme parameter is the authentication scheme: digest, host or ip. The id is the corresponding identity: for digest a username and password pair such as "user:passwd" (note that in a stored ACL entry the password part is kept in hashed form), for host a hostname such as "localhost", and for ip an IP address such as "192.168.1.120". A sketch combining this with the preset ACLs follows at the end of this section.

    • Using the ACLs preset in the API

You can set the ACL list for a znode when you create it. Some permission sets are already defined in the interface org.apache.zookeeper.ZooDefs.Ids, for example:

(1) OPEN_ACL_UNSAFE: completely open. This is in fact the world scheme; since every ZK connection carries the world scheme, a znode created with OPEN_ACL_UNSAFE is open to all connections.

(2) CREATOR_ALL_ACL: all permissions go to the connection that created the znode. This uses the auth scheme, verifying against the session ID, so when CREATOR_ALL_ACL is set, the connection that created the znode can make any change to it.

(3) READ_ACL_UNSAFE: readable by all clients. This is again the world scheme; since every ZK connection carries the world scheme, a znode created with READ_ACL_UNSAFE can be read by any connection.

Note: the descriptions of the auth and world schemes above come from my own reading of the source code and are offered for reference.
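To make both approaches concrete, here is a minimal sketch, assuming an already connected handle zk, made-up paths and credentials, and imports from org.apache.zookeeper, org.apache.zookeeper.data and java.util:

// (a) Build an ACL yourself: read-only access for a single client IP address.
List<ACL> readOnlyFromOneIp = new ArrayList<ACL>();
readOnlyFromOneIp.add(new ACL(ZooDefs.Perms.READ, new Id("ip", "192.168.1.120")));
zk.create("/app/readonly", "data".getBytes(), readOnlyFromOneIp, CreateMode.PERSISTENT);

// (b) Use a preset ACL: only the creating, authenticated session gets full rights.
zk.addAuthInfo("digest", "bob:secret".getBytes());
zk.create("/app/private", "data".getBytes(), ZooDefs.Ids.CREATOR_ALL_ACL, CreateMode.PERSISTENT);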

3. Operating modes

Zookeeper has two operating modes: standalone mode and replicated mode.

3.1. Standalone mode

Only one Zookeeper service instance runs, so high reliability and resilience are not guaranteed. It can be used in test environments but is not recommended for production.

3.2. Replicated mode

Replicated mode is cluster mode, with multiple Zookeeper instances running; it is recommended that the ZK instances run on different servers. Data is continuously synchronized between the instances in the cluster. As long as more than half of the instances keep running normally, the ZK service can operate normally. For example, with 5 ZK instances, if 2 fail and 3 remain, the service still works; with 6 instances, if 3 fail, it cannot work properly.

Each znode modification is replicated to more than half of the machines, which guarantees that at least one machine holds the latest state; the remaining replicas eventually catch up to the new state. To implement this, Zookeeper uses the Zab protocol, which has two phases that may be repeated indefinitely:

    • Leader election

All ZK instances in the cluster elect a leader instance; the other instances are called followers. If the leader fails, the remaining instances elect a new leader and continue providing service together; if the previous leader recovers, it becomes a follower. Leader election is a fast process with no noticeable performance impact.

The leader's main job is to coordinate all instances so that write operations are atomic: all writes are forwarded to the leader, the leader broadcasts the update to all followers, and once more than half of the instances have applied the write, the leader commits it and the client receives a successful response.

    • Atomic Broadcast

As stated above, all writes are forwarded to the leader, the leader broadcasts the update to all followers, and once more than half of the instances have applied the write, the leader commits it and the client receives a successful response. In this way client writes are atomic: each write either succeeds or fails. The logic is very similar to a database two-phase commit protocol.

3.3. Data consistency in replicated mode

Each znode write is equivalent to a transaction commit in a database, and each write has a globally unique ID called the zxid (ZooKeeper transaction id). Zookeeper orders operations by zxid: an operation with a smaller zxid is applied first. The following properties of ZK guarantee the consistency of its data:

    • Sequential consistency

Write operations from any single client are committed in the order they are sent. If a client changes a znode's value to a and then changes it to b (with no other modifications in between), no client will read a after having read b.

    • Atomicity

As discussed above, a write operation has only two outcomes, success or failure; there is no such thing as a write that is only partially applied.

    • Single system image

A client sees the same view of the service regardless of which instance it connects to, and never sees state older than what it has already seen. If the instance the client is connected to fails and the client tries to reconnect to another instance in the cluster, instances that lag behind the failed one will not accept the connection; only instances whose state is the same as or newer than that of the failed instance will accept it.

    • Durability

Once a write operation completes, it is persisted and will not be undone by server failures.

    • Timeliness

Before reading a znode, the sync method should be called so that the ZK instance serving the read catches up with the leader and the latest content can be read.

Note: the sync call is asynchronous, and there is no need to wait for it to return; the ZK server guarantees that all subsequent operations are executed only after the sync operation has completed, even if they are issued before the sync finishes.
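A minimal sketch of the read-after-sync pattern just described; the path is made up and zk is an existing connected handle:

// sync() is asynchronous; as noted above there is no need to block on its callback,
// because operations issued after it are executed only once the sync has completed.
zk.sync("/app/config", new AsyncCallback.VoidCallback() {
    @Override
    public void processResult(int rc, String path, Object ctx) {
        // rc reports whether the sync itself succeeded
    }
}, null);
byte[] latest = zk.getData("/app/config", false, null);  // sees the leader's latest state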

4. Improving fault tolerance in Zookeeper applications

Distributed environments are complex: unreliable networks, single points of failure and other problems occur frequently. These issues must be considered carefully when building a distributed application, so how to build a resilient distributed application is a topic worth discussing. Each exception in the Java API corresponds to a class of failure modes; below we discuss some of the failures that can occur in a Zookeeper application through the exceptions of the Java API.

4.1. Some common exceptions in the Java API
    • InterruptedException

If a client operation is interrupted, an InterruptedException is thrown. This is not necessarily a failure; it only indicates that the corresponding Zookeeper operation was cancelled.

    • KeeperException

Thrown when the server signals an error or communication with the server fails. The class currently has 21 subclasses, divided into 3 major categories:

(1) State exceptions

A state exception occurs when an operation cannot be applied to the current ZK state. For example, specifying an incorrect version number when updating data throws a BadVersionException, and creating a child node under an ephemeral znode throws a NoChildrenForEphemeralsException.

(2) Recoverable exceptions

Exceptions from which the ZK session can recover are called recoverable exceptions. When the ZK connection is lost, a ConnectionLossException is thrown and ZK automatically tries to reconnect to keep the session intact. However, ZK cannot determine whether the operation associated with the ConnectionLossException was executed successfully (it may have been only partially applied), so whether to re-execute the operation depends on whether the operation is idempotent.

An idempotent operation produces the same result whether it is executed once or several times; a non-idempotent operation produces different results when executed more than once. A non-idempotent operation cannot simply be retried blindly.

The write operations are create, delete and set-data. In a distributed environment, deleting a znode and modifying a znode's data are idempotent; only creation may be non-idempotent, and creating a sequential znode is a non-idempotent operation.

So how do we avoid creating duplicate sequential znodes? Let's discuss it:

Hypothetical scenario:

client: the client's task is to create exactly one sequential znode per connection to the ZK server;

ConnectionLossException: after the exception is thrown, the session is not invalidated, but ZK cannot determine whether the create operation succeeded.

We know that a sequential znode's name has the form "znodeName<sequentialNumber>", and that each ZK client-server session has a globally unique session ID. We can embed the session ID in the znode name, like this: "znodeName<sessionId><sequentialNumber>" (the sequential number is unique relative to the parent znode). This way, before creating a znode we can check whether the parent already has a child whose name starts with "znodeName<sessionId>"; if it does, a previous attempt already succeeded, which ensures each client connection creates only one znode (a code sketch is given below).

When does this scenario come up? It is one of the core ideas behind implementing a distributed lock. So what is a distributed lock? A separate blog post will explain it together with a code implementation.
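Here is a minimal sketch of the duplicate-avoidance idea, assuming an existing parent znode /locks, an already connected handle zk, a made-up name prefix, and an enclosing method that declares throws KeeperException, InterruptedException; error handling other than connection loss is omitted:

// Create exactly one ephemeral sequential znode per session. The session ID is embedded
// in the name so that, after a ConnectionLossException, a retry can tell whether an
// earlier attempt already succeeded instead of creating a duplicate.
String prefix = "lock-" + zk.getSessionId() + "-";
String created = null;
while (created == null) {
    try {
        for (String child : zk.getChildren("/locks", false)) {
            if (child.startsWith(prefix)) {
                created = "/locks/" + child;   // a previous attempt got through
                break;
            }
        }
        if (created == null) {
            created = zk.create("/locks/" + prefix, new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        }
    } catch (KeeperException.ConnectionLossException e) {
        // Unknown whether the create reached the server; loop, re-check the children and retry.
    }
}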

(3) Unrecoverable exceptions

When an unrecoverable exception occurs, all ephemeral znodes are lost; the program must explicitly rebuild the ZK connection and reconstruct the znode state. For example, session expiry throws a SessionExpiredException, and an authentication failure throws an AuthFailedException.

(4) Catching and handling exceptions

Each subclass corresponds to an exception state, and each has an error code that can be obtained with the code() method. There are two ways to handle this type of exception (a sketch of both follows the list):

1. Detect the error code (by calling the code() method) to determine what kind of exception occurred, then decide what remedial action to take;

2. Catch the corresponding KeeperException subclass, then perform the appropriate action in each catch block.
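A minimal sketch of the two approaches, using an arbitrary setData() call on a made-up path (zk is an existing handle):

// Approach 1: catch the broad KeeperException and inspect its error code.
try {
    zk.setData("/app/config", "v2".getBytes(), 3);
} catch (KeeperException e) {
    if (e.code() == KeeperException.Code.BADVERSION) {
        // someone else updated the znode first: re-read it, then retry with the new version
    } else if (e.code() == KeeperException.Code.CONNECTIONLOSS) {
        // connection lost: decide whether to retry (see the recoverable exceptions above)
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}

// Approach 2: catch the specific subclasses and act in each catch block.
try {
    zk.setData("/app/config", "v2".getBytes(), 3);
} catch (KeeperException.BadVersionException e) {
    // version conflict
} catch (KeeperException e) {
    // everything else
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}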

4.2. Building a reliable Zookeeper application

As mentioned above, the ZK server side may suffer network failures or single points of failure, so how do we write a reliable ZK client program that copes with potentially unstable ZK instances? Here we use writing data to a znode as an example:

/**
 * Write a configuration value to the given znode, creating it if necessary.
 *
 * @throws KeeperException the server signalled an error or communication with the server
 *         failed. The class currently has 21 subclasses, divided into 3 categories:
 *         1. state exceptions (e.g. BadVersionException, NoChildrenForEphemeralsException);
 *         2. recoverable exceptions (e.g. ConnectionLossException);
 *         3. unrecoverable exceptions (e.g. SessionExpiredException, AuthFailedException).
 *         Each subclass has an error code, available via the code() method. Handle it either
 *         by checking the error code and deciding on a remedy, or by catching the specific
 *         subclass and acting in each catch block.
 * @throws InterruptedException the Zookeeper operation was interrupted. This is not
 *         necessarily a failure; it only indicates that the operation was cancelled.
 */
public static void write(String path, String value) throws KeeperException, InterruptedException {
    int retries = 0;
    while (true) {
        try {
            Stat stat = zk.exists(path, false);
            if (stat == null) {
                // The znode does not exist yet: create it as a persistent, openly accessible node.
                zk.create(path, value.getBytes(CHARSET), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            } else {
                // The znode exists: overwrite its data regardless of the current version (-1).
                zk.setData(path, value.getBytes(CHARSET), -1);
            }
            break;
        } catch (KeeperException.SessionExpiredException e) {
            // The session has expired: rethrow so the caller can recreate the ZooKeeper object.
            throw e;
        } catch (KeeperException.AuthFailedException e) {
            // Authentication failed: rethrow so the caller can terminate the program.
            throw e;
        } catch (KeeperException e) {
            // Any other KeeperException: retry until the maximum number of attempts is reached.
            if (retries == MAX_RETRIES) {
                throw e;
            }
            retries++;
            TimeUnit.SECONDS.sleep(RETRY_PERIOD_SECONDS);
        }
    }
}

If you are a Java developer, I don't think the code above needs any explanation. Below is how the caller handles it:

int flag = 0;
while (true) {
    try {
        write(path, value);
        break;
    } catch (KeeperException.SessionExpiredException e) {
        // The session expired: recreate the ZooKeeper object to start a new session.
        e.printStackTrace();
        zk = new ZooKeeper(hosts, SESSION_TIMEOUT, this);
    } catch (KeeperException.AuthFailedException e) {
        // Authentication failed: terminate the program.
        e.printStackTrace();
        flag = 1;
        break;
    } catch (KeeperException e) {
        // Retried too many times, or some other error occurred: give up and exit.
        e.printStackTrace();
        flag = 1;
        break;
    } catch (IOException e) {
        // Failed to create the ZooKeeper object: unable to connect to the ZK cluster.
        e.printStackTrace();
        flag = 1;
        break;
    }
}
System.exit(flag);

That covers writing a recoverable Zookeeper application; once this part is understood, the rest should follow by analogy.

Upcoming posts will walk through several Zookeeper development examples, such as a distributed configuration system and a distributed lock implementation.

Reference Address: http://zookeeper.apache.org/doc/r3.4.6/

Reference book: Hadoop: The Definitive Guide
