Objective
In HDFS, we all know that a DataNode proves it is "alive" by sending heartbeat messages to the NameNode at regular intervals. Besides this liveness check, the heartbeat also carries the DataNode's own block report information to the NameNode so that the cluster state stays up to date, and the NameNode then replies to each DataNode with commands to execute. Seen this way, the work performed on every heartbeat is fairly "heavy". This creates a problem: on the one hand the DataNode needs to report its block information to the NameNode in time, and on the other hand it must also wait for the NameNode to return the heartbeat commands. In other words, if the NameNode is busy, heartbeats are processed unusually slowly, which may delay the DataNode's next heartbeat; in the worst case the DataNode is judged to be a dead node. The natural improvement is to separate the heartbeat "liveness" check from the current logic and implement a lightweight DataNode liveness check. In this article we call this mechanism the Lifeline.
Datanode Lifeline Introduction
The full name of this feature is the DataNode Lifeline Protocol. As functionality has been added to the HDFS code, each heartbeat has become heavier and heavier. In fact, the two parts of the heartbeat process, reporting block information and waiting for the reply commands, can be separated, and the DataNode Lifeline message is built on the former. The concept of the DataNode Lifeline can be summarized in one sentence:
A DataNode Lifeline message is essentially a lightweight block information report: it does not need to wait for a NameNode response, yet it still achieves the purpose of checking DataNode liveness.
Application scenarios of Datanode Lifeline
So which abnormal situations can the DataNode Lifeline help us solve? Here are two examples that illustrate its importance.
Scenario One: NameNode in a persistently busy state
In the original logic, a DataNode's heartbeat report is blocking: it must wait for the NameNode to finish processing and return the reply commands, and the DataNode then executes those commands. This leads to a serious problem. When the NameNode is processing a large block report, it must hold a lock in order to keep the data consistent, so other heartbeats are forced into a blocked state, waiting for the current processing to finish. This delays the DataNode's next heartbeat, and if the delay exceeds the maximum heartbeat tolerance time (630 seconds by default), the DataNode is considered a dead node, which ultimately triggers a large amount of completely unnecessary replication work.
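As a side note, the 630-second default mentioned above follows from the standard HDFS expiry formula, 2 × dfs.namenode.heartbeat.recheck-interval + 10 × dfs.heartbeat.interval. The small snippet below is just a quick check of that arithmetic with the default configuration values (the class name is made up for illustration):

// Quick arithmetic check of the default dead-node threshold, assuming the
// standard HDFS formula and default configuration values.
public class HeartbeatExpireIntervalCheck {
  public static void main(String[] args) {
    long heartbeatRecheckIntervalMs = 300_000; // dfs.namenode.heartbeat.recheck-interval (5 min)
    long heartbeatIntervalSec = 3;             // dfs.heartbeat.interval (3 s)
    long expireIntervalMs =
        2 * heartbeatRecheckIntervalMs + 10 * heartbeatIntervalSec * 1000;
    System.out.println(expireIntervalMs / 1000 + " s"); // prints: 630 s
  }
}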
Scenario Two: DataNode in a persistently busy state
It is not only the NameNode that can be busy; the DataNode itself can be busy as well. For example, the BPServiceActor thread that the DataNode uses to send heartbeats may suddenly get stuck at one of its steps, so it can no longer perform the block report in time, and after a long enough delay the DataNode will again be considered a dead node.
From these scenarios we can see that tying the liveness check to heartbeat processing and command handling is not very reliable. A better approach is to build a lightweight messaging interface and run it in a sending thread separate from the BPServiceActor heartbeat thread, so that the DataNode's liveness status can always be kept up to date. And that is exactly what the DataNode Lifeline does.
Datanode Lifeline Design
At the time of writing, the DataNode Lifeline message feature has not yet been released; it is implemented in the community issue HDFS-9239 (DataNode Lifeline Protocol: an alternative protocol for reporting DataNode liveness). After reading its design document, the author summarizes the following key design points:
- A lightweight block report protocol is defined. The format of the reported information is exactly the same as the current heartbeat, except that no results need to be returned and the NameNode does not need to take a lock to process it.
- DataNode Lifeline messages are sent periodically from a separate thread, so they are not affected by the main heartbeat-sending thread and can act as a fallback for it.
These are the two main points; for more details you can read the DataNode Lifeline design document linked at the end of this article.
Core implementation of Datanode Lifeline protocol
The following analyzes the DataNode Lifeline feature at the source code level, hoping to help you understand it better.
First, the new interface method is added in protobuf (the response here is empty):
// The request reuses HeartbeatRequestProto, so no new request message is needed.
// The response is empty: there is no command dispatch.
message LifelineResponseProto {
}

service DatanodeLifelineProtocolService {
  rpc sendLifeline(hadoop.hdfs.datanode.HeartbeatRequestProto)
      returns(LifelineResponseProto);
}
After the interface is defined, the corresponding server-side and client-side PB translator methods are implemented; they are not covered in detail here. Next, let's look at how DataNode Lifeline messages are sent to the NameNode.
The lifeline message sending thread service is defined as an inner class of BPServiceActor; its definition is as follows:
private final class LifelineSender implements Runnable, Closeable {

  // Address of the corresponding NameNode lifeline RPC endpoint
  private final InetSocketAddress lifelineNnAddr;
  // Lifeline message sending thread
  private Thread lifelineThread;
  // Client-side translator used to make the lifeline RPC call
  private DatanodeLifelineProtocolClientSideTranslatorPB lifelineNamenode;

  public LifelineSender(InetSocketAddress lifelineNnAddr) {
    this.lifelineNnAddr = lifelineNnAddr;
  }
  ...
Next, the main logic for sending the lifeline message:
@Override
public void run() {
  // The lifeline RPC depends on registration with the NameNode, so wait for
  // initial registration to complete.
  ...
  while (shouldRun()) {
    try {
      if (lifelineNamenode == null) {
        lifelineNamenode = dn.connectToLifelineNN(lifelineNnAddr);
      }
      // If the current time has reached the lifeline send period, send the lifeline message
      sendLifelineIfDue();
      Thread.sleep(scheduler.getLifelineWaitTime());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } catch (IOException e) {
      LOG.warn("IOException in LifelineSender for " + BPServiceActor.this, e);
    }
  }
  LOG.info("LifelineSender for " + BPServiceActor.this + " exiting.");
}
Now let's step into the sendLifelineIfDue method:
private void sendLifelineIfDue() throws IOException {
  // Get the current send time
  long startTime = scheduler.monotonicNow();
  // If the next scheduled lifeline time has not been reached yet, skip this send
  if (!scheduler.isLifelineDue(startTime)) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Skipping sending lifeline for " + BPServiceActor.this +
          ", because it is not due.");
    }
    return;
  }
  if (dn.areHeartbeatsDisabledForTests()) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Skipping sending lifeline for " + BPServiceActor.this +
          ", because heartbeats are disabled for tests.");
    }
    return;
  }
  // Otherwise send the lifeline message, i.e. the block report information
  sendLifeline();
  // Record metric statistics for lifeline messages
  dn.getMetrics().addLifeline(scheduler.monotonicNow() - startTime);
  // Schedule the next lifeline send time
  scheduler.scheduleNextLifeline(scheduler.monotonicNow());
}
The method that sets the next lifeline send time is as follows:
long scheduleNextLifeline(long baseTime) {
  // Numerical overflow is possible here and is okay.
  nextLifelineTime = baseTime + lifelineIntervalMs;
  return nextLifelineTime;
}
Here the lifeline message is sent at an interval of 3 times the heartbeat interval, which is 9 seconds by default (the default heartbeat interval is 3 seconds).
If you look more carefully, a subtle question arises: when the heartbeat sending thread is blocked, the lifeline sending thread can indeed take over reporting the block information, but when both threads are running normally, wouldn't the block report information be sent twice? This is a subtle point that the author had not considered before, and the designers handled it with a clever trick.
After each successful heartbeat, the next lifeline send time is rescheduled as well, indicating that no lifeline message needs to be sent during that period, because the heartbeat has already reported the block information to the NameNode.
This logic is added to the method that sets the next heartbeat send time:
long scheduleNextHeartbeat() {
  // Numerical overflow is possible here and is okay.
  nextHeartbeatTime = monotonicNow() + heartbeatIntervalMs;
  // Using the next heartbeat time as the base, reschedule the next lifeline send time
  scheduleNextLifeline(nextHeartbeatTime);
  return nextHeartbeatTime;
}
With this handling, as long as heartbeats are sent normally, the lifeline message thread will never actually send anything.
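To make this interplay concrete, here is a minimal standalone simulation (not the Hadoop source; the class name, constants, and main method are made up for illustration) of the scheduling rule described above: every successful heartbeat pushes the next lifeline time out, so the lifeline only becomes due once heartbeats stall.

// Standalone sketch of the heartbeat/lifeline scheduling interplay.
public class LifelineSchedulerDemo {
  static final long HEARTBEAT_INTERVAL_MS = 3_000;
  static final long LIFELINE_INTERVAL_MS = 3 * HEARTBEAT_INTERVAL_MS; // 9 s

  long nextLifelineTime;

  long scheduleNextLifeline(long baseTime) {
    nextLifelineTime = baseTime + LIFELINE_INTERVAL_MS;
    return nextLifelineTime;
  }

  long scheduleNextHeartbeat(long now) {
    long nextHeartbeatTime = now + HEARTBEAT_INTERVAL_MS;
    // As in the code above: the next heartbeat time becomes the new base for the lifeline.
    scheduleNextLifeline(nextHeartbeatTime);
    return nextHeartbeatTime;
  }

  boolean isLifelineDue(long now) {
    return now >= nextLifelineTime;
  }

  public static void main(String[] args) {
    LifelineSchedulerDemo s = new LifelineSchedulerDemo();
    s.scheduleNextHeartbeat(0);                  // heartbeat sent at t = 0 s
    System.out.println(s.isLifelineDue(9_000));  // false: next lifeline is due at 3 s + 9 s = 12 s
    // If the heartbeat thread then stalls, 13 seconds later the lifeline becomes due.
    System.out.println(s.isLifelineDue(13_000)); // true
  }
}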
Now let's look at how the NameNode handles the lifeline message, in the class FSNamesystem. Because the lifeline message is only used to deliver block report information, it is handled in a lightweight way and does not need to acquire the lock. The code is as follows:
void handleLifeline(DatanodeRegistration nodeReg, StorageReport[] reports,
    long cacheCapacity, long cacheUsed, int xceiverCount, int xmitsInProgress,
    int failedVolumes, VolumeFailureSummary volumeFailureSummary)
    throws IOException {
  int maxTransfer = blockManager.getMaxReplicationStreams() - xmitsInProgress;
  blockManager.getDatanodeManager().handleLifeline(nodeReg, reports,
      getBlockPoolId(), cacheCapacity, cacheUsed, xceiverCount, maxTransfer,
      failedVolumes, volumeFailureSummary);
}
In contrast, the normal heartbeat handling method does need to acquire the lock, as shown below:
HeartbeatResponse handleHeartbeat(DatanodeRegistration nodeReg,
    StorageReport[] reports, long cacheCapacity, long cacheUsed,
    int xceiverCount, int xmitsInProgress, int failedVolumes,
    VolumeFailureSummary volumeFailureSummary,
    boolean requestFullBlockReportLease) throws IOException {
  // Need to acquire the lock
  readLock();
  try {
    // Get the DataNode commands
    final int maxTransfer = blockManager.getMaxReplicationStreams() - xmitsInProgress;
    DatanodeCommand[] cmds = blockManager.getDatanodeManager().handleHeartbeat(
        nodeReg, reports, getBlockPoolId(), cacheCapacity, cacheUsed,
        xceiverCount, maxTransfer, failedVolumes, volumeFailureSummary);
    ...
    return new HeartbeatResponse(cmds, haState, rollingUpgradeInfo,
        blockReportLeaseId);
  } finally {
    // Release the lock when processing finishes
    readUnlock("handleHeartbeat");
  }
}
Therefore, the traditional heartbeat handling path is subject to a certain amount of lock contention. The subsequent logic that updates the reported block state is essentially the same in both methods; for the detailed code, readers can study HDFS-9239.
Finally, the author believes the benefit of this feature will be more noticeable as the cluster grows larger, because the NameNode and DataNodes are then more likely to be in a busy state. Also note that this feature is not turned on by default: you need to configure the RPC address in the configuration item dfs.namenode.lifeline.rpc-address to enable it.
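As a rough illustration, enabling the feature might look like the hdfs-site.xml fragment below. The host and port are example values only, and dfs.datanode.lifeline.interval.seconds is listed from the author's reading of HDFS-9239 as an optional override (by default the lifeline interval is 3 times the heartbeat interval); please verify both keys against your Hadoop version.

<!-- Example only: address and the optional interval key are assumptions to verify. -->
<property>
  <name>dfs.namenode.lifeline.rpc-address</name>
  <value>namenode.example.com:8050</value>
</property>
<property>
  <name>dfs.datanode.lifeline.interval.seconds</name>
  <value>9</value>
</property>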
Resources
[1]. DataNode Lifeline Protocol: an alternative protocol for reporting DataNode liveness (HDFS-9239)
[2]. Datanode-lifeline-protocol.pdf