Is ZooKeeper's update notification reliable? Read the source code to find the answer!

Source: Internet
Author: User
Tags: zookeeper, zookeeper client

Guide:

After running into a problem where ZooKeeper update notifications were not being received, I started questioning the reliability of node-change notifications. By reading the source code I traced how ZK watches are registered and triggered, then locally debugged and simulated the scenarios in which ZK updates are unreliable, and arrived at the corresponding solutions.

The process was tortuous, but the root cause of the problem also sat at the very bottom. This article ultimately explains the root cause of the missed updates; I hope it is helpful to others.

ZooKeeper is commonly used for configuration storage, distributed locks, and similar functions. Reading configuration from the ZooKeeper server on every access would be very inefficient. Fortunately, ZooKeeper provides a node-update notification mechanism: you only need to set a watch on the node, and any update to it is pushed to the client as a notification.

As illustrated: the application client typically connects to a ZkServer and reads the data of the ZK node ZkNode (the stored data is usually kept in application memory, value in the example), while also setting a watch. When the ZkNode is updated in any way, the ZkServer sends a notify to the client, whose watch then runs the corresponding action for the event; here we assume the action is to update the client's local data. This model lets configuration be pushed to the client asynchronously, without a remote read on every access, which greatly improves read performance. (The re-regist step exists because a watch on a node is one-shot: after each notification it must be registered again.) But is this notify reliable? If a notification fails, won't the client keep reading a stale local value forever?
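Because watches are one-shot, the re-regist step in the figure is mandatory: a client that does not re-register silently misses every later update. A minimal self-contained Java sketch of that semantics (a toy watch table, not ZooKeeper's real classes):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy model of ZooKeeper's one-shot watch semantics: triggering a path
// removes its watchers, so a client that does not re-register misses
// every subsequent update. (Illustrative only, not ZooKeeper's classes.)
class OneShotWatchTable {
    private final Map<String, List<Consumer<String>>> watches = new HashMap<>();

    void watch(String path, Consumer<String> watcher) {
        watches.computeIfAbsent(path, k -> new ArrayList<>()).add(watcher);
    }

    // Deliver the event to the current watchers, then forget them (one-shot).
    void trigger(String path, String event) {
        List<Consumer<String>> ws = watches.remove(path);
        if (ws != null) {
            for (Consumer<String> w : ws) {
                w.accept(event);
            }
        }
    }

    // The watcher below does NOT re-register, so only the first update is seen.
    static List<String> demoMissedUpdate() {
        OneShotWatchTable table = new OneShotWatchTable();
        List<String> seen = new ArrayList<>();
        table.watch("/config", seen::add);
        table.trigger("/config", "update-1"); // delivered
        table.trigger("/config", "update-2"); // silently missed
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demoMissedUpdate()); // prints [update-1]
    }
}
```

In real code the watcher's callback would call something like zk.exists or zk.getData with watch=true again before acting on the event.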

Because this kind of problem is hard to pin down in the production network, I downloaded the source locally and ran a simulated ZkServer and ZkClient to observe the delivery of notifications.

1. Clone the source: https://github.com/apache/zookeeper

2. cd into the project path and run ant eclipse to load the project dependencies.

3. Import the project into IDEA.

https://stackoverflow.com/questions/43964547/how-to-import-zookeeper-source-code-to-idea

See the question above for related issues and steps.

Run ZkServer first. QuorumPeerMain is the server's startup class; you can find this entry point from zkServer.sh under bin. Note the startup argument pointing to the configuration file, which specifies parameters such as the listen port.

Before doing this, set the relevant breakpoints.

First, let's look at how the server handles a client setting up a listener.
ZkClient communicates with ZkServer using NIO, and the ZooKeeper threading model uses two threads:

SendThread is dedicated to sending requests; each request is encapsulated as a Packet (containing the node name, watch description, and so on) and sent to the server.

EventThread is dedicated to handling the events that SendThread parses out of the responses it receives.

On the server side there are two main processors: SyncRequestProcessor, responsible for data synchronization within the cluster (including after a leader election), and FinalRequestProcessor, which actually handles the received requests (Packets).

The ZooKeeperServer processPacket method is what handles the received requests:

public void processPacket(ServerCnxn cnxn, ByteBuffer incomingBuffer) throws IOException {
    // We have the request, now process and setup for next
    InputStream bais = new ByteBufferInputStream(incomingBuffer);
    BinaryInputArchive bia = BinaryInputArchive.getArchive(bais);
    RequestHeader h = new RequestHeader();
    h.deserialize(bia, "header");
    // Through the magic of byte buffers, txn will not be
    // pointing to the start of the txn
    incomingBuffer = incomingBuffer.slice();
    if (h.getType() == OpCode.auth) {
        // Authentication request processing
        LOG.info("got auth packet " + cnxn.getRemoteSocketAddress());
        AuthPacket authPacket = new AuthPacket();
        ByteBufferInputStream.byteBuffer2Record(incomingBuffer, authPacket);
        String scheme = authPacket.getScheme();
        ServerAuthenticationProvider ap = ProviderRegistry.getServerProvider(scheme);
        Code authReturn = KeeperException.Code.AUTHFAILED;
        if (ap != null) {
            try {
                authReturn = ap.handleAuthentication(
                        new ServerAuthenticationProvider.ServerObjs(this, cnxn),
                        authPacket.getAuth());
            } catch (RuntimeException e) {
                LOG.warn("Caught runtime exception from AuthenticationProvider: "
                        + scheme + " due to " + e);
                authReturn = KeeperException.Code.AUTHFAILED;
            }
        }
        if (authReturn == KeeperException.Code.OK) {
            if (LOG.isDebugEnabled()) {
                LOG.debug("Authentication succeeded for scheme: " + scheme);
            }
            LOG.info("auth success " + cnxn.getRemoteSocketAddress());
            ReplyHeader rh = new ReplyHeader(h.getXid(), 0,
                    KeeperException.Code.OK.intValue());
            cnxn.sendResponse(rh, null, null);
        } else {
            if (ap == null) {
                LOG.warn("No authentication provider for scheme: "
                        + scheme + " has " + ProviderRegistry.listProviders());
            } else {
                LOG.warn("Authentication failed for scheme: " + scheme);
            }
            // send a response...
            ReplyHeader rh = new ReplyHeader(h.getXid(), 0,
                    KeeperException.Code.AUTHFAILED.intValue());
            cnxn.sendResponse(rh, null, null);
            // ... and close connection
            cnxn.sendBuffer(ServerCnxnFactory.closeConn);
            cnxn.disableRecv();
        }
        return;
    } else {
        if (h.getType() == OpCode.sasl) {
            Record rsp = processSasl(incomingBuffer, cnxn);
            ReplyHeader rh = new ReplyHeader(h.getXid(), 0,
                    KeeperException.Code.OK.intValue());
            cnxn.sendResponse(rh, rsp, "response"); // not sure about 3rd arg..what is it?
            return;
        } else {
            Request si = new Request(cnxn, cnxn.getSessionId(), h.getXid(),
                    h.getType(), incomingBuffer, cnxn.getAuthInfo());
            si.setOwner(ServerCnxn.me);
            // Always treat packet from the client as a possible
            // local request.
            setLocalSessionFlag(si);
            // Handed over to FinalRequestProcessor for processing
            submitRequest(si);
        }
    }
    cnxn.incrOutstandingRequests(h);
}

FinalRequestProcessor is where the request is resolved. Once the client connection has been established, an exist command it sends will fall into this part of the handling logic.


ZKDatabase is built from the ZkServer data persisted on disk, and you can see that this is where the listening watch is added.

Next we need to understand how the watch is triggered when the server receives a node-update event.
First, two concepts: the requests that FinalRequestProcessor handles fall into two kinds, transactional and non-transactional. The exist command is a non-transactional operation, and the code above handles its logic; transactional operations, such as setData, are handled in the following code.

private ProcessTxnResult processTxn(Request request, TxnHeader hdr, Record txn) {
    ProcessTxnResult rc;
    int opCode = request != null ? request.type : hdr.getType();
    long sessionId = request != null ? request.sessionId : hdr.getClientId();
    if (hdr != null) {
        // hdr is the transaction header; a setData operation, for example,
        // is taken over by ZKDatabase, which modifies ZK's data storage model
        rc = getZKDatabase().processTxn(hdr, txn);
    } else {
        rc = new ProcessTxnResult();
    }
    if (opCode == OpCode.createSession) {
        if (hdr != null && txn instanceof CreateSessionTxn) {
            CreateSessionTxn cst = (CreateSessionTxn) txn;
            sessionTracker.addGlobalSession(sessionId, cst.getTimeOut());
        } else if (request != null && request.isLocalSession()) {
            request.request.rewind();
            int timeout = request.request.getInt();
            request.request.rewind();
            sessionTracker.addSession(request.sessionId, timeout);
        } else {
            LOG.warn("*****>>>>> Got " + txn.getClass() + " " + txn.toString());
        }
    } else if (opCode == OpCode.closeSession) {
        sessionTracker.removeSession(sessionId);
    }
    return rc;
}

If you set a breakpoint here, you can intercept the update operation on the node.

Set breakpoints at these two places and you can follow the whole watch setup process.

Next, look at how to start the ZooKeeper client. ZooKeeperMain is the client's entry point, which can likewise be found from bin/zkCli.sh. Note that the parameters must set the server's connection address.


Modify the ZooKeeperMain constructor to set a watch listener on the node.

public ZooKeeperMain(String args[]) throws IOException, InterruptedException, KeeperException {
    cl.parseOptions(args);
    System.out.println("Connecting to " + cl.getOption("server"));
    connectToZK(cl.getOption("server"));
    while (true) {
        // Simulate registering a watch on the /zookeeper node
        zk.exists("/zookeeper", true);
        System.out.println("wait");
    }
}

Start the client.

Because we want to observe the node-change process, the client above registers a watch on the node. We then need another client to modify that node, which we can simply do from the command line.

At this point the command-line zkClient updates the /zookeeper node, and the server stops at the handling code for the setData event.

public Stat setData(String path, byte data[], int version, long zxid,
        long time) throws KeeperException.NoNodeException {
    Stat s = new Stat();
    DataNode n = nodes.get(path);
    if (n == null) {
        throw new KeeperException.NoNodeException();
    }
    byte lastdata[] = null;
    synchronized (n) {
        lastdata = n.data;
        n.data = data;
        n.stat.setMtime(time);
        n.stat.setMzxid(zxid);
        n.stat.setVersion(version);
        n.copyStat(s);
    }
    // now update if the path is in a quota subtree.
    String lastPrefix = getMaxPrefixWithQuota(path);
    if (lastPrefix != null) {
        this.updateBytes(lastPrefix, (data == null ? 0 : data.length)
            - (lastdata == null ? 0 : lastdata.length));
    }
    // trigger the watch listeners
    dataWatches.triggerWatch(path, EventType.NodeDataChanged);
    return s;
}

At this point, the class we care about appears: WatchManager.

package org.apache.zookeeper.server;

import java.io.PrintWriter;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.EventType;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * This class manages watches. It allows watches to be associated with a string
 * and removes watchers and their watches in addition to managing triggers.
 */
class WatchManager {
    private static final Logger LOG = LoggerFactory.getLogger(WatchManager.class);

    // Stores the path -> watchers relationship
    private final Map<String, Set<Watcher>> watchTable =
        new HashMap<String, Set<Watcher>>();

    // Stores which path nodes each watcher listens on
    private final Map<Watcher, Set<String>> watch2Paths =
        new HashMap<Watcher, Set<String>>();

    synchronized int size() {
        int result = 0;
        for (Set<Watcher> watches : watchTable.values()) {
            result += watches.size();
        }
        return result;
    }

    // Add a listener
    synchronized void addWatch(String path, Watcher watcher) {
        Set<Watcher> list = watchTable.get(path);
        if (list == null) {
            // don't waste memory if there are few watches on a node
            // rehash when the 4th entry is added, doubling size thereafter
            // seems like a good compromise
            list = new HashSet<Watcher>(4);
            watchTable.put(path, list);
        }
        list.add(watcher);

        Set<String> paths = watch2Paths.get(watcher);
        if (paths == null) {
            // cnxns typically have many watches, so use default cap here
            paths = new HashSet<String>();
            watch2Paths.put(watcher, paths);
        }
        paths.add(path);
    }

    // Remove a watcher
    synchronized void removeWatcher(Watcher watcher) {
        Set<String> paths = watch2Paths.remove(watcher);
        if (paths == null) {
            return;
        }
        for (String p : paths) {
            Set<Watcher> list = watchTable.get(p);
            if (list != null) {
                list.remove(watcher);
                if (list.size() == 0) {
                    watchTable.remove(p);
                }
            }
        }
    }

    Set<Watcher> triggerWatch(String path, EventType type) {
        return triggerWatch(path, type, null);
    }

    // Trigger the watches
    Set<Watcher> triggerWatch(String path, EventType type, Set<Watcher> supress) {
        WatchedEvent e = new WatchedEvent(type,
                KeeperState.SyncConnected, path);
        Set<Watcher> watchers;
        synchronized (this) {
            watchers = watchTable.remove(path);
            if (watchers == null || watchers.isEmpty()) {
                if (LOG.isTraceEnabled()) {
                    ZooTrace.logTraceMessage(LOG,
                            ZooTrace.EVENT_DELIVERY_TRACE_MASK,
                            "No watchers for " + path);
                }
                return null;
            }
            for (Watcher w : watchers) {
                Set<String> paths = watch2Paths.get(w);
                if (paths != null) {
                    paths.remove(path);
                }
            }
        }
        for (Watcher w : watchers) {
            if (supress != null && supress.contains(w)) {
                continue;
            }
            // send the notification
            w.process(e);
        }
        return watchers;
    }
}

Focus on the triggerWatch method: you can see that the watch is removed first, and only then is the notification sent to the client connection stored in the watcher.

@Override
public void process(WatchedEvent event) {
    ReplyHeader h = new ReplyHeader(-1, -1L, 0);
    if (LOG.isTraceEnabled()) {
        ZooTrace.logTraceMessage(LOG, ZooTrace.EVENT_DELIVERY_TRACE_MASK,
                "Deliver event " + event + " to 0x"
                + Long.toHexString(this.sessionId)
                + " through " + this);
    }

    // Convert WatchedEvent to a type that can be sent over the wire
    WatcherEvent e = event.getWrapper();

    sendResponse(h, e, "notification");
}

There is no acknowledgement mechanism here, and the watch is not written back if the send fails.
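This "remove first, deliver, no acknowledgement" order is exactly what makes a failed delivery unrecoverable: by the time process(e) fails, the watch is already gone. A self-contained toy reproduction (illustrative classes, not ZooKeeper's):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy reproduction of triggerWatch's "remove first, then deliver" order:
// a failed delivery loses both the notification and the watch itself.
class LossyTrigger {
    interface Watcher {
        void process(String event) throws Exception;
    }

    private final Map<String, Set<Watcher>> watchTable = new HashMap<>();

    void addWatch(String path, Watcher w) {
        watchTable.computeIfAbsent(path, k -> new HashSet<>()).add(w);
    }

    // Mirrors WatchManager.triggerWatch: watchers are removed before delivery,
    // and there is no acknowledgement or re-queue on failure.
    int triggerWatch(String path, String event) {
        Set<Watcher> watchers = watchTable.remove(path);
        if (watchers == null) {
            return 0;
        }
        int delivered = 0;
        for (Watcher w : watchers) {
            try {
                w.process(event);
                delivered++;
            } catch (Exception sendFailure) {
                // swallowed: the server does not restore the watch
            }
        }
        return delivered;
    }

    static boolean watchSurvivesFailedDelivery() {
        LossyTrigger t = new LossyTrigger();
        t.addWatch("/config", e -> { throw new Exception("connection broken"); });
        t.triggerWatch("/config", "update");            // delivery fails
        return t.triggerWatch("/config", "update") > 0; // watch already gone
    }

    public static void main(String[] args) {
        System.out.println("watch survives failed delivery: "
                + watchSurvivesFailedDelivery()); // prints false
    }
}
```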

Conclusion:
At this point we know the watch notification mechanism is unreliable: ZkServer does not guarantee that notifications arrive. Although ZkClient and ZkServer keep the link alive with a heartbeat mechanism, if the connection drops during a notification and is immediately re-established afterwards, the watch state is not restored.

Now that we know notifications are unreliable and can be lost, the way zkClient is used needs to be revised.

Local storage should no longer sit statically waiting for watch updates. Instead, introduce a caching mechanism that periodically pulls from ZK and re-registers the watch (ZkServer deduplicates: a watch of the same type on the same node is not registered twice).
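A minimal sketch of this periodic-pull cache, assuming a hypothetical fetcher function that stands in for the real ZooKeeper read-and-re-watch call (e.g. a getData with the watch flag set):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Sketch of the "periodic pull + re-register" fallback: even if a watch
// notification is lost, the cached value converges on the next scheduled pull.
class PollingCache {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    // Hypothetical stand-in for a real ZooKeeper read that also
    // re-registers the watch on the node.
    private final Function<String, String> fetcher;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    PollingCache(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    // One pull: read the current value (re-registering the watch) and cache it.
    void refresh(String path) {
        cache.put(path, fetcher.apply(path));
    }

    // Periodic pull: a lost notification is repaired at the next refresh.
    void track(String path, long periodMillis) {
        scheduler.scheduleAtFixedRate(() -> refresh(path),
                0, periodMillis, TimeUnit.MILLISECONDS);
    }

    String get(String path) {
        return cache.get(path);
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

The watch then serves only to make updates fast; correctness comes from the bounded staleness of the polling interval.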

Another approach is for the client, upon receiving a disconnect notification, to re-register all watches on the nodes it cares about. However, in the production incident I hit, the client received neither the update notification nor any disconnect error message, so this still needs further confirmation. My understanding is limited; corrections welcome :D

There have been new developments on the StackOverflow question:

https://stackoverflow.com/questions/49328151/is-zookeeper-node-change-notification-reliable-under-situations-of-connection-lo

It turns out the official documentation already explains that when the connection breaks, the client does some recovery work on watches. PS: the client-side strategy I described above has in fact already been implemented officially...

The client keeps the session alive via heartbeats. If it finds the connection has dropped, it re-establishes the connection and sends the server the set of watches it had registered together with the last zxid it saw for each node. If the client's zxid is smaller than the server's, the node changed during the disconnect, and the server triggers the notification.
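The recovery check described there boils down to a zxid comparison on reconnect. A toy sketch of that rule (names are illustrative, not ZooKeeper's wire format):

```java
// Sketch of the watch-recovery check: on reconnect the client sends the last
// zxid it saw for each watched path, and the server fires a notification if
// the node's modification zxid advanced during the disconnect.
class WatchRecovery {
    static boolean shouldNotify(long clientLastSeenZxid, long serverNodeMzxid) {
        // client's zxid smaller than the node's mzxid => the node changed
        // while the client was away, so fire the pending watch immediately
        return clientLastSeenZxid < serverNodeMzxid;
    }

    public static void main(String[] args) {
        System.out.println(shouldNotify(100, 100)); // false: nothing changed
        System.out.println(shouldNotify(100, 105)); // true: a missed update
    }
}
```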

In this sense, the ZooKeeper notification mechanism is at least reliable as documented: there is at least a mechanism to guarantee it (except for exist watches). But the problem I encountered was still unexplained. I regret not preserving the scene to dig deeper; I planned to switch the implementation back to the original for further verification, and to update here once the cause was found.

Final conclusion update!

After further in-depth reading of the Apache ZK forum and source code, one important piece of information emerged.

The disconnection mentioned above splits into two scenarios, recoverable and unrecoverable, distinguished mainly by the session lifetime. All client operations, including watches, are tied to the session. If the client re-establishes the connection before the session timeout expires, the watches are reset after the connection is rebuilt. If the connection is not successfully re-established before the session times out, the session enters the expired state. The following link describes the process:

How should I handle SESSION_EXPIRED?

In this case, the ZooKeeper client will reconnect, but the session will be a brand-new one; none of the previous state is preserved.
private void conLossPacket(Packet p) {
    if (p.replyHeader == null) {
        return;
    }
    switch (state) {
    case AUTH_FAILED:
        p.replyHeader.setErr(KeeperException.Code.AUTHFAILED.intValue());
        break;
    case CLOSED:
        // the session is closed; return directly
        p.replyHeader.setErr(KeeperException.Code.SESSIONEXPIRED.intValue());
        break;
    default:
        p.replyHeader.setErr(KeeperException.Code.CONNECTIONLOSS.intValue());
    }
    // if the session has not expired, the session state (watches) is re-registered
    finishPacket(p);
}
1. What is ZooKeeper session expiration?

Generally we use ZooKeeper as a cluster; for example, a client establishes a session with a ZooKeeper cluster of three instances.

Within this session, the client actually connects at random to one of the ZK providers and exchanges heartbeats with it. The ZK cluster manages the session and maintains the session information on all providers, including the ephemeral data and the watchers defined in the session.

If the network is poor, or one provider in the ZK cluster goes down, connection loss can occur; for example, the client's connection to ZK provider1 drops. The client does not need to do anything (the ZooKeeper API handles it for us), just wait to reconnect to another provider. This process can end in one of two results:

1) The connection is re-established within the session timeout.

In this case the client successfully switches to another provider, say provider2. Because ZK synchronizes session-related data across all providers, this can be considered a seamless migration.

2) The connection is not re-established within the session timeout.

This is the session-expire case: the ZooKeeper cluster considers the session ended and clears all data related to it, including ephemeral nodes and registered watchers.

If the client reconnects to the ZooKeeper cluster after the session timeout, ZooKeeper will unfortunately raise a session expired exception and will not rebuild the session; that is, the ephemeral data and watchers will not be rebuilt.
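The two outcomes can be summarized as a simple decision on the disconnect duration relative to the session timeout. A toy sketch of the rule, not ZooKeeper's actual API:

```java
// Toy decision table for the two disconnect outcomes: reconnect within the
// session timeout keeps the session (watches and ephemeral nodes survive);
// reconnect after it gets SESSION_EXPIRED and nothing is rebuilt.
class SessionOutcome {
    static String onReconnect(long disconnectedMillis, long sessionTimeoutMillis) {
        return disconnectedMillis < sessionTimeoutMillis
                ? "RECOVERED (watches and ephemerals intact)"
                : "EXPIRED (watches and ephemerals cleared)";
    }

    public static void main(String[] args) {
        System.out.println(onReconnect(5_000, 30_000));  // recovered
        System.out.println(onReconnect(60_000, 30_000)); // expired
    }
}
```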

Our ZookeeperProcessor implementation is based on the Apache Curator client package.

Apache Curator's error-handling mechanism

Curator handles session expiry by letting you register a ConnectionStateListener; when session expiry occurs, user-supplied logic runs (for example, resetting watches). Unfortunately, we did not handle this event, so after the connection was lost and rebuilt, our app kept reading the old data!

Here we made another mistake: locally caching the ZooKeeper node data. The ZooKeeper client already has a local caching mechanism, but we added another layer on top. (Note: there was a reason for this: ZK node data is a binary array, and the business usually needs it deserialized; our cache exists to avoid repeated deserialization cost.) Precisely because of our own local cache, even after ZK reconnected, the old value was still being read!

At this point the mystery is fully solved. Clearly several aspects of the previous implementation were wrong, causing all sorts of strange bugs. The fix now is to listen for the reconnect notification and, on receiving it, actively invalidate the local cache (we still keep the cache to save deserialization cost; the zkClient cache only holds the binary and would still deserialize on every read). Code:

curatorFramework.getConnectionStateListenable().addListener(new ConnectionStateListener() {
    @Override
    public void stateChanged(CuratorFramework client, ConnectionState newState) {
        switch (newState) {
        case CONNECTED:
            break;
        case RECONNECTED:
            LOG.error("zookeeper connection reconnected");
            System.out.println("zookeeper connection reconnected");
            // Originally invalidateAll was used, but that drops every cached
            // value at once; if many nodes are watched, the resulting
            // simultaneous ZK reads could momentarily block the service.
            // So use the Guava cache refresh method instead: it updates
            // asynchronously and keeps returning the old value until the
            // update completes.
            for (String key : classInfoMap.keySet()) {
                zkDataCache.refresh(key);
            }
            break;
        case LOST:
            // session timed out / connection lost: do nothing here,
            // keep using the cache
            LOG.error("zookeeper connection lost");
            System.out.println("zookeeper connection lost");
            break;
        case SUSPENDED:
            break;
        default:
            break;
        }
    }
});
