Server Entry Point
The startup code for the server is in the zkServer.sh script.
zkServer.sh is similar to the startup scripts in /etc/init.d/; it parses the given command through the shell's case statement. The specific commands are as follows:
1. start: launches org.apache.zookeeper.server.quorum.QuorumPeerMain in the background via nohup
2. start-foreground: runs org.apache.zookeeper.server.quorum.QuorumPeerMain in the foreground
3. stop: kills the process launched by start
4. restart: calls stop and then start in turn, restarting the server
5. status: checks the running state via org.apache.zookeeper.client.FourLetterWordMain
6. upgrade: performs an online upgrade via org.apache.zookeeper.server.upgrade.UpgradeMain
7. print-cmd: prints the startup command that start would execute
As can be seen from zkServer.sh, the entry class of the ZooKeeper server is QuorumPeerMain.
In the entry function, the number of servers configured in the zoo.cfg file decides whether the server starts in standalone mode or in cluster mode.
If no servers are configured in zoo.cfg, the server starts in standalone mode by default and the startup parameters are passed to ZooKeeperServerMain::main; otherwise it starts in cluster mode.
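A simplified sketch of this dispatch, based on QuorumPeerMain.initializeAndRun in the 3.4.x source (the datadir purge task and error handling are omitted here):

protected void initializeAndRun(String[] args) throws ConfigException, IOException {
    QuorumPeerConfig config = new QuorumPeerConfig();
    if (args.length == 1) {
        config.parse(args[0]);
    }

    if (args.length == 1 && config.getServers().size() > 0) {
        // servers are configured in zoo.cfg: run in cluster (quorum) mode
        runFromConfig(config);
    } else {
        LOG.warn("Either no config or no quorum defined in config, running in standalone mode");
        // hand the arguments over to the standalone entry point
        ZooKeeperServerMain.main(args);
    }
}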
In this section we set cluster mode aside for now and only look at how the server runs in standalone mode.
Standalone mode startup process
As stated above, in standalone mode QuorumPeerMain passes the startup parameters to ZooKeeperServerMain::main.
In ZooKeeperServerMain::main, ZooKeeper parses the config file and then calls runFromConfig to initialize the server.
public void runFromConfig(ServerConfig config) throws IOException {
    final ZooKeeperServer zkServer = new ZooKeeperServer();

    // Registers shutdown handler which will be used to know the
    // server error or shutdown state changes.
    final CountDownLatch shutdownLatch = new CountDownLatch(1);
    zkServer.registerServerShutdownHandler(
            new ZooKeeperServerShutdownHandler(shutdownLatch));

    cnxnFactory = ServerCnxnFactory.createFactory();
    cnxnFactory.configure(config.getClientPortAddress(),
            config.getMaxClientCnxns());
    cnxnFactory.startup(zkServer);

    shutdownLatch.await();
    shutdown();

    cnxnFactory.join();
    if (zkServer.canShutdown()) {
        zkServer.shutdown();
    }
}
Reading the source, we find that the cnxnFactory.startup method starts three threads: the NIOServerCnxnFactory thread (started as a Runnable), PrepRequestProcessor, and SyncRequestProcessor.
After these threads start, the main thread blocks on shutdownLatch.await(), which keeps the process from exiting.
The exit logic can be seen in ZooKeeperServerShutdownHandler::handle:
if (state == State.ERROR || state == State.SHUTDOWN) {
    shutdownLatch.countDown();
}
When the ZooKeeperServer enters an error or shutdown state, shutdownLatch.countDown() is called; shutdownLatch.await() then returns and the main thread proceeds into the shutdown flow.
Server socket
In study notes (1), we saw that on the client side the SendThread keeps a long-lived socket connection to the server; correspondingly, on the server side there is a server socket responsible for receiving the requests sent by clients.
String serverCnxnFactoryName = System.getProperty(ZOOKEEPER_SERVER_CNXN_FACTORY);
if (serverCnxnFactoryName == null) {
    serverCnxnFactoryName = NIOServerCnxnFactory.class.getName();
}
A ServerCnxnFactory object is constructed in runFromConfig; by default it is a NIOServerCnxnFactory, the server-side counterpart of the client's ClientCnxnSocketNIO class.
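The factory is instantiated reflectively from a system property; roughly, ServerCnxnFactory.createFactory looks like this (simplified from the 3.4.x source):

public static ServerCnxnFactory createFactory() throws IOException {
    String serverCnxnFactoryName = System.getProperty(ZOOKEEPER_SERVER_CNXN_FACTORY);
    if (serverCnxnFactoryName == null) {
        // no factory configured: fall back to the NIO implementation
        serverCnxnFactoryName = NIOServerCnxnFactory.class.getName();
    }
    try {
        return (ServerCnxnFactory) Class.forName(serverCnxnFactoryName).newInstance();
    } catch (Exception e) {
        IOException ioe = new IOException("Couldn't instantiate " + serverCnxnFactoryName);
        ioe.initCause(e);
        throw ioe;
    }
}

Back in runFromConfig, the configure and startup calls on the created factory look like this: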
@Override
public void configure(InetSocketAddress addr, int maxcc) throws IOException {
    thread = new ZooKeeperThread(this, "NIOServerCxn.Factory:" + addr);
    // ... (socket setup omitted)
}

@Override
public void startup(ZooKeeperServer zks) throws IOException, InterruptedException {
    start();
    setZooKeeperServer(zks);
    zks.startdata();
    zks.startup();
}
The NIOServerCnxnFactory class itself implements the Runnable interface; in NIOServerCnxnFactory::startup a daemon thread is started to respond to the requests coming from clients.
Responding to socket requests
On the client side there is a SendThread dedicated to the socket connection with the server. Similarly, on the server side the NIOServerCnxnFactory class has its own thread dedicated to reading the data sent by clients.
As shown in the figure, after the client successfully establishes a connection with the server, the client's requests are written to the socket by ClientCnxnSocketNIO, read and processed by NIOServerCnxnFactory, and the response is then written back through the socket.
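The reading side of that loop lives in NIOServerCnxnFactory.run(); a simplified sketch (connection limits, shuffling of the selected keys and most error handling are omitted) looks roughly like this:

public void run() {
    while (!ss.socket().isClosed()) {
        try {
            selector.select(1000);
            for (SelectionKey k : selector.selectedKeys()) {
                if ((k.readyOps() & SelectionKey.OP_ACCEPT) != 0) {
                    // a new client connection: register it for reads and attach a ServerCnxn
                    SocketChannel sc = ((ServerSocketChannel) k.channel()).accept();
                    sc.configureBlocking(false);
                    SelectionKey sk = sc.register(selector, SelectionKey.OP_READ);
                    NIOServerCnxn cnxn = createConnection(sc, sk);
                    sk.attach(cnxn);
                } else if ((k.readyOps() & (SelectionKey.OP_READ | SelectionKey.OP_WRITE)) != 0) {
                    // an existing connection is ready: let doIO read or write the socket
                    NIOServerCnxn c = (NIOServerCnxn) k.attachment();
                    c.doIO(k);
                }
            }
            selector.selectedKeys().clear();
        } catch (Exception e) {
            LOG.warn("Ignoring exception", e);
        }
    }
}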
Most data requests are parsed step by step in doIO, assembled into a Request object, and handed to ZooKeeperServer::submitRequest for consumption. The exact consumption path will be explained later; here we only outline the communication logic of the socket.
The implementation of Watcher
ServerCnxn implements the Watcher interface. If a request is found to carry a watcher, the ServerCnxn is added to the listener list; when the specified node changes, the corresponding ServerCnxn method is called back and sendResponse notifies the client that the node information has changed.
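Concretely, the server-side callback is ServerCnxn::process; in the NIO implementation it roughly does the following (simplified from NIOServerCnxn in the 3.4.x source):

@Override
synchronized public void process(WatchedEvent event) {
    // an xid of -1 marks this reply as a notification rather than a response to a request
    ReplyHeader h = new ReplyHeader(-1, -1L, 0);

    // convert WatchedEvent to a type that can be sent over the wire
    WatcherEvent e = event.getWrapper();

    sendResponse(h, e, "notification");
}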
ZooKeeper data structure
ZooKeeper is a distributed coordination framework built on a node model; it stores data in nodes addressed like file paths.
At runtime all node information is loaded into memory, and each node is built as a DataNode object, called a znode.
Three-layer data cache
Znodes change frequently because of client reads and writes. To make data access more efficient, ZooKeeper holds node data in a three-layer cache.
outstandingChanges
outstandingChanges lives in ZooKeeperServer and holds node changes that have just been made but not yet synchronized to ZKDatabase.
ZKDatabase
ZKDatabase manages the node data inside ZooKeeper.
ZKDatabase holds a DataTree object, and the DataTree maintains a ConcurrentHashMap named nodes that keeps the full node information in memory.
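For reference, the relevant field and accessor in DataTree look roughly like this (a simplified excerpt, assuming the 3.4.x layout):

// DataTree keeps every znode in a path-keyed hash map
private final ConcurrentHashMap<String, DataNode> nodes =
        new ConcurrentHashMap<String, DataNode>();

public DataNode getNode(String path) {
    return nodes.get(path);
}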
During cnxnFactory.startup, the serialized node information is restored into memory via zkDb.loadDataBase().
Disk files
The disk files consist of two parts: FileSnap and FileTxnLog. As the names imply, FileSnap stores snapshots of the ZooKeeper node information at a point in time, while FileTxnLog stores the concrete changes made to node information.
Data persistence in ZKDatabase
ZooKeeper coordinates distributed applications by keeping node information consistent.
About snapshots and transactions
Similar to Hadoop, ZooKeeper also has the concepts of snapshot and transaction.
A snapshot corresponds to the complete state of the data at a point in time, while a transaction represents a change instruction applied to the data.
When snapshot A applies an instruction, its data state is updated and it becomes snapshot B.
When the server exits abnormally or restarts, there are two ways to restore the data nodes to a given state: one is to replay every transaction from the beginning again; the other is to restore the data to a correct snapshot and then replay only the transactions recorded after that snapshot. The first approach requires keeping every instruction since the very first startup, and the replay time grows linearly with the number of instructions, which hurts restore efficiency. Therefore the second approach, snapshot + transaction, is normally used for data restoration.
Data loading process
If ZooKeeper is not starting for the first time, the data it held before shutting down needs to be restored.
From the previous section we know the relationship between snapshots and transactions. Returning to the source code, we can see that ZooKeeperServer::loadData() ends up calling the following code:
public long loadDataBase() throws IOException {
    PlayBackListener listener = new PlayBackListener() {
        public void onTxnLoaded(TxnHeader hdr, Record txn) {
            Request r = new Request(null, 0, hdr.getCxid(), hdr.getType(), null, null);
            addCommittedProposal(r);
        }
    };

    long zxid = snapLog.restore(dataTree, sessionsWithTimeouts, listener);
    return zxid;
}

public long restore(DataTree dt, Map<Long, Integer> sessions,
        PlayBackListener listener) throws IOException {
    snapLog.deserialize(dt, sessions);
    FileTxnLog txnLog = new FileTxnLog(dataDir);
    TxnIterator itr = txnLog.read(dt.lastProcessedZxid + 1);
    long highestZxid = dt.lastProcessedZxid;
    TxnHeader hdr;
    try {
        while (true) {
            hdr = itr.getHeader();
            if (hdr == null) {
                return dt.lastProcessedZxid;
            }
            processTransaction(hdr, dt, sessions, itr.getTxn());
            listener.onTxnLoaded(hdr, itr.getTxn());
            if (!itr.next()) break;
        }
    } finally {
        if (itr != null) {
            itr.close();
        }
    }
    return highestZxid;
}
snapLog is a FileTxnSnapLog, which is composed of a FileSnap and a FileTxnLog.
FileSnap is the utility class for snapshot files; its serialize and deserialize methods serialize and deserialize the DataTree object.
FileTxnLog is the utility class for the transaction log. Through txnLog.read we obtain the transaction logs written after the snapshot, and processTransaction applies each transaction to the DataTree, restoring the original state.
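processTransaction itself mostly just distinguishes session transactions from everything else; a simplified sketch based on FileTxnSnapLog in the 3.4.x source (error handling omitted):

public void processTransaction(TxnHeader hdr, DataTree dt,
        Map<Long, Integer> sessions, Record txn)
        throws KeeperException.NoNodeException {
    ProcessTxnResult rc;
    switch (hdr.getType()) {
    case OpCode.createSession:
        // record the session and its timeout, then let DataTree update lastProcessedZxid
        sessions.put(hdr.getClientId(), ((CreateSessionTxn) txn).getTimeOut());
        rc = dt.processTxn(hdr, txn);
        break;
    case OpCode.closeSession:
        sessions.remove(hdr.getClientId());
        rc = dt.processTxn(hdr, txn);
        break;
    default:
        // ordinary transactions are replayed directly against the in-memory tree
        rc = dt.processTxn(hdr, txn);
    }
}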
Through loadDataBase() we have successfully loaded the node information saved in the disk files into memory; from this point on, the server can start consuming incoming socket requests.
Processing session requests
In the section on responding to socket requests, we saw that a daemon thread is started in NIOServerCnxnFactory; it picks up socket requests in a while loop and dispatches them to doIO for handling.
ZooKeeper treats every socket connection as a session with a timeout. Each connection is wrapped in an NIOServerCnxn object; when the session is determined to be connecting to the ZooKeeperServer for the first time, the connect message is read first, and SessionTrackerImpl maintains the queue of currently live sessions.
SessionTrackerImpl is a standalone thread dedicated to checking whether sessions are still alive.
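Its main loop groups sessions into expiration buckets and expires whole buckets as their deadline passes; roughly (simplified from the 3.4.x SessionTrackerImpl, field initialization and some logging omitted):

synchronized public void run() {
    try {
        while (running) {
            currentTime = System.currentTimeMillis();
            if (nextExpirationTime > currentTime) {
                // nothing can expire before nextExpirationTime, so sleep until then
                this.wait(nextExpirationTime - currentTime);
                continue;
            }
            SessionSet set = sessionSets.remove(nextExpirationTime);
            if (set != null) {
                for (SessionImpl s : set.sessions) {
                    // mark the session closing and ask the server to expire it
                    setSessionClosing(s.sessionId);
                    expirer.expire(s);
                }
            }
            nextExpirationTime += expirationInterval;
        }
    } catch (InterruptedException e) {
        LOG.error("Unexpected interruption", e);
    }
}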
Other socket messages, from sessions that are not connecting for the first time, are consumed through readRequest.
RequestProcessor task chain
Three RequestProcessor objects are created in ZooKeeperServer::setupRequestProcessors: FinalRequestProcessor, SyncRequestProcessor and PrepRequestProcessor. PrepRequestProcessor and SyncRequestProcessor both extend the Thread class and run as separate threads.
readRequest extracts the request information by deserializing the incoming packet and then calls ZooKeeperServer::submitRequest to process the data.
public void submitRequest(Request si) {
    firstProcessor.processRequest(si);
}
For a ZooKeeperServer in standalone mode, the firstProcessor is the PrepRequestProcessor class.
PrepRequestProcessor
PrepRequestProcessor is the starting point of the whole task chain.
PrepRequestProcessor::processRequest does not process the request immediately; instead it adds the request to the run queue submittedRequests, where it waits to be executed.
PrepRequestProcessor's own thread continually pulls Request objects from the queue and calls pRequest(request). In pRequest, depending on the kind of request, the request is transformed into a different Record object, and addChangeRecord puts the resulting ChangeRecord into zookeeperServer.outstandingChanges; at this point the node data has not yet been synchronized to the DataTree.
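The consuming side of that queue is PrepRequestProcessor.run(); stripped of error handling it is roughly:

@Override
public void run() {
    try {
        while (true) {
            // block until a request has been queued by processRequest()
            Request request = submittedRequests.take();
            if (Request.requestOfDeath == request) {
                // poison pill used during shutdown
                break;
            }
            // validate the request, turn it into ChangeRecords / txns,
            // then pass it on to the next processor
            pRequest(request);
        }
    } catch (InterruptedException e) {
        LOG.error("Unexpected interruption", e);
    }
}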
According to the three-layer cache model of the nodes, when node information is needed it is first looked up in outstandingChangesForPath; if the corresponding node information is not found there, it is fetched via zkDb::getNode.
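A simplified sketch of that lookup order, loosely based on PrepRequestProcessor.getRecordForPath (the ChangeRecord fields built from the DataNode are elided here):

ChangeRecord lastChange;
synchronized (zks.outstandingChanges) {
    // 1. the freshest state is a pending, not-yet-applied change
    lastChange = zks.outstandingChangesForPath.get(path);
    if (lastChange == null) {
        // 2. otherwise fall back to the in-memory DataTree via ZKDatabase
        DataNode n = zks.getZKDatabase().getNode(path);
        if (n != null) {
            // build a ChangeRecord from the current node state (details elided)
            lastChange = new ChangeRecord(-1, path, n.stat, 0, null);
        }
    }
}
if (lastChange == null || lastChange.stat == null) {
    throw new KeeperException.NoNodeException(path);
}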
SyncRequestProcessor
SyncRequestProcessor, as the downstream consumer of PrepRequestProcessor, is responsible for writing transactions to the txnlog and periodically building snapshot files.
The request is written to the transaction log in zkDatabase.append; if the number of transactions logged so far exceeds the threshold, a snapshot thread is started and the DataTree is written to disk as a snapshot.
In takeSnapshot, the snapshot is saved to disk by serializing the current DataTree structure.
When the number of requests SyncRequestProcessor has batched exceeds the threshold of 1000, flush() is called to pass the tasks one by one to the downstream RequestProcessor for handling.
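Putting those pieces together, the main loop of SyncRequestProcessor.run() is roughly the following (condensed from the 3.4.x source; the random snapshot roll, shutdown handling and error paths are omitted):

while (true) {
    Request si = queuedRequests.take();
    if (zks.getZKDatabase().append(si)) {   // request carries a txn: append to the log
        logCount++;
        if (logCount > snapCount / 2) {     // enough entries since the last snapshot
            zks.getZKDatabase().rollLog();
            // serialize the DataTree in a background thread
            new Thread("Snapshot Thread") {
                public void run() {
                    try {
                        zks.takeSnapshot();
                    } catch (Exception e) {
                        LOG.warn("Unexpected exception", e);
                    }
                }
            }.start();
            logCount = 0;
        }
    }
    toFlush.add(si);
    if (toFlush.size() > 1000) {
        flush(toFlush);                      // hand the batch to the next processor
    }
}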
FinalRequestProcessor
FinalRequestProcessor, as the end of the task chain in standalone mode, mainly does the following work.
while (!zks.outstandingChanges.isEmpty()
        && zks.outstandingChanges.get(0).zxid <= request.zxid) {
    ChangeRecord cr = zks.outstandingChanges.remove(0);
    if (cr.zxid < request.zxid) {
        LOG.warn("Zxid outstanding " + cr.zxid + " is less than current " + request.zxid);
    }
    if (zks.outstandingChangesForPath.get(cr.path) == cr) {
        zks.outstandingChangesForPath.remove(cr.path);
    }
}
if (request.hdr != null) {
    TxnHeader hdr = request.hdr;
    Record txn = request.txn;

    rc = zks.processTxn(hdr, txn);
}
It calls zks.processTxn() to merge the request data into the DataTree, and it cleans up the stale entries in zks.outstandingChanges, preventing outstandingChanges from growing without bound.
Task Chain Summary
The ZooKeeper server processes requests through a three-layer task chain.
The first layer builds a temporary node object in outstandingChanges, so that subsequent requests can quickly read the latest state of the corresponding node.
The second layer converts the request data into the transaction log and writes it to disk, so that the node data can be restored after a restart; it also saves snapshots periodically based on the logging activity.
The third layer merges the request data into the DataTree in bulk and clears the temporary node objects built by the first layer.
Summary
The ZooKeeper server uses a DataTree to hold all node information in memory, and persists historical node data on disk through snapshots and transaction log files.
When responding to a request, the server dispatches the request data to the RequestProcessor task chain for consumption.
Within the task chain, thread safety and data consistency are guaranteed by using a single thread.
Precisely because ZooKeeper relies on a single thread to guarantee data thread safety, its efficiency under heavy traffic is worth questioning; later we can look at whether ZooKeeper in cluster mode optimizes this part.