DataNode Data Processing Center: DataXceiver


Preface

Recently I saw an article on the CSDN home page marking the 10th anniversary of Hadoop, and I could not help feeling what a great system it is. Over the past decade Hadoop has evolved and changed a great deal, hatching many sub-projects along the way, and the ecosystem around it keeps growing richer. As an excellent distributed system, it has much that is worth studying. Lately I have been reading the DataXceiver-related code, and this article is a summary of those days of study.


Why study DataXceiver?

Looking from the big picture down to the details shows how important it is. When we use the Hadoop system, what do we value most? Storage. And during storage, what do we look at most? The data, of course. All of that data lives on the DataNodes, so it is important to master how a DataNode reads and writes data, and the control center for that is the DataXceiver.


Definition of DataXceiver

What is DataXceiver for? Many people know the DataNode but not this other very important thread, the DataXceiver. In the Hadoop source, the comment on DataXceiver reads as follows:

/**
 * Thread for processing incoming/outgoing data stream.
 */
class DataXceiver extends Receiver implements Runnable {
  ...
The gist is "a thread for processing incoming/outgoing data streams"; my personal reading is that it is the data-flow processing center. The number of DataXceiver threads can, to a certain extent, reflect how busy a node is. The DataXceiver class contains a great many variables and methods, and I do not suggest reading the code line by line. When learning a mechanism or a principle, the main thing to grasp is the structure. For the DataXceiver class, our goal is to understand what the class mainly does, which objects call it upstream, and which classes it calls downstream; the code details can be analyzed as specific problems arise. Otherwise you may get dizzy circling through the complex internal logic; after all, this is a mature distributed program, not something you can digest in half an hour.
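As a side note on the busyness claim above: one rough way to watch the xceiver count of a live DataNode is through its HTTP /jmx endpoint. The sketch below is an illustration under stated assumptions, not part of the original article: the host, the port (50075 is the usual DataNode HTTP port in Hadoop 2.x), and the XceiverCount attribute name should all be checked against your own version's JMX output.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class XceiverCountProbe {
  public static void main(String[] args) throws Exception {
    // Hypothetical host/port; adjust to your cluster.
    URL jmx = new URL("http://datanode-host:50075/jmx");
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(jmx.openStream(), StandardCharsets.UTF_8))) {
      // Print only the line carrying the xceiver count, if the metric is exposed.
      reader.lines()
            .filter(line -> line.contains("XceiverCount"))
            .forEach(System.out::println);
    }
  }
}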


Structure of the DataXceiver

To better understand this "data center", we need to grasp the overall structure of the class before looking at its internal methods:


First, this is a thread service, so its entry point must be the run() method; starting from run() we can find the methods connected to it.

/**
 * Read/write data from/to the DataXceiverServer.
 */
@Override
public void run() {
  int opsProcessed = 0;
  Op op = null;
  ...
  // We process requests in a loop, and stay around for a short timeout.
  // This optimistic behaviour allows the other end to reuse connections.
  // Setting keepalive timeout to 0 disables this behavior.
  do {
    updateCurrentThreadName("Waiting for operation #" + (opsProcessed + 1));

    try {
      if (opsProcessed != 0) {
        assert dnConf.socketKeepaliveTimeout > 0;
        peer.setReadTimeout(dnConf.socketKeepaliveTimeout);
      } else {
        peer.setReadTimeout(dnConf.socketTimeout);
      }
      op = readOp();
    } catch (InterruptedIOException ignored) {
      // Time out while we wait for client rpc
      break;
    } catch (IOException err) {
      // Since we optimistically expect the next op, it's quite normal to get EOF here.
      if (opsProcessed > 0 &&
          (err instanceof EOFException || err instanceof ClosedChannelException)) {
        if (LOG.isDebugEnabled()) {
          LOG.debug("Cached " + peer + " closing after " + opsProcessed + " ops");
        }
      } else {
        incrDatanodeNetworkErrors();
        throw err;
      }
      break;
    }

    // restore normal timeout
    if (opsProcessed != 0) {
      peer.setReadTimeout(dnConf.socketTimeout);
    }

    opStartTime = monotonicNow();
    processOp(op);
    ++opsProcessed;
  } while ((peer != null) && (!peer.isClosed() && dnConf.socketKeepaliveTimeout > 0));
  ...
In the main loop of run() you can see a readOp() paired with a processOp(). Op is the opcode; readOp() reads the opcode from the input stream:

/** Read an Op.  It also checks protocol version. */
protected final Op readOp() throws IOException {
  final short version = in.readShort();
  if (version != DataTransferProtocol.DATA_TRANSFER_VERSION) {
    throw new IOException("Version Mismatch (Expected: " +
        DataTransferProtocol.DATA_TRANSFER_VERSION +
        ", Received: " + version + " )");
  }
  return Op.read(in);
}
processOp() then dispatches to the corresponding handler based on the opcode:

/** Process op by the corresponding method. */
protected final void processOp(Op op) throws IOException {
  switch(op) {
  case READ_BLOCK:
    opReadBlock();
    break;
  case WRITE_BLOCK:
    opWriteBlock(in);
    break;
  case REPLACE_BLOCK:
    opReplaceBlock(in);
    break;
  case COPY_BLOCK:
    opCopyBlock(in);
    break;
  case BLOCK_CHECKSUM:
    opBlockChecksum(in);
    break;
  case TRANSFER_BLOCK:
    opTransferBlock(in);
    break;
  case REQUEST_SHORT_CIRCUIT_FDS:
    opRequestShortCircuitFds(in);
    break;
  case RELEASE_SHORT_CIRCUIT_FDS:
    opReleaseShortCircuitFds(in);
    break;
  case REQUEST_SHORT_CIRCUIT_SHM:
    opRequestShortCircuitShm(in);
    break;
  default:
    throw new IOException("Unknown op " + op + " in data stream");
  }
}
There are 9 opcodes in total, each with a corresponding handler method. With this, the basic structure of the DataXceiver gradually becomes clear; it can be shown with the following picture:


The Sender in the top left of the picture will be explained later on; it can be ignored for now.


Downstream handler methods of DataXceiver

The structure in the previous section showed 9 methods handling the corresponding opcodes, plus 2 response/reply methods. The 9 methods can be roughly divided into 2 broad categories:

1. Ordinary block read/write operations: readBlock, writeBlock, transferBlock, copyBlock, replaceBlock, and blockChecksum.

2. ShortCircuit-related operations: the remaining methods, whose names all contain ShortCircuit and which serve short-circuit reads.

The following is a detailed analysis of these methods by scenario.

1.readBlock

As the method name suggests, this reads block data; it is generally used for remote reads and also for ordinary (non-short-circuit) local reads.
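To connect this with everyday client code, here is a minimal sketch (the path and cluster settings are invented for illustration): a plain HDFS read through the public FileSystem API is what ultimately arrives at a DataXceiver as a READ_BLOCK op.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadBlockDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical file; each block fetched under the hood becomes a READ_BLOCK op.
    try (FileSystem fs = FileSystem.get(conf);
         FSDataInputStream in = fs.open(new Path("/demo/input.txt"))) {
      byte[] buffer = new byte[4096];
      int n;
      while ((n = in.read(buffer)) > 0) {
        System.out.write(buffer, 0, n);
      }
    }
  }
}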

2.writeBlock

Writes the block passed in as a parameter to the list of target nodes (the write pipeline).
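Symmetrically, an ordinary client write such as the sketch below (again with an invented path) reaches the DataXceiver of each pipeline node as a WRITE_BLOCK op.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.nio.charset.StandardCharsets;

public class WriteBlockDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(conf);
         // The DFSOutputStream underneath sends WRITE_BLOCK to the pipeline nodes.
         FSDataOutputStream out = fs.create(new Path("/demo/output.txt"))) {
      out.write("hello dataxceiver".getBytes(StandardCharsets.UTF_8));
    }
  }
}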

3.transferBlock

Transfers the specified replica to a list of target nodes; the official comment reads:

Transfer a replica to the Datanode targets.
4.copyBlock

Copies block data; the principle is similar to readBlock, and both use BlockSender's send method to transfer the data.

5.replaceBlock

replaceBlock in the DataXceiver is really a "move block" operation; it is generally performed during data balancing.

6.blockChecksum

Reads the checksum from the header of the block's metadata file.
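For a client-side view of this handler, consider the file-level checksum call sketched below (the path is invented): HDFS computes a file checksum by combining per-block checksums, and fetching each block's checksum from its DataNode is where BLOCK_CHECKSUM ops come from.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockChecksumDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(conf)) {
      // Each block's checksum is requested via a BLOCK_CHECKSUM op,
      // then the results are combined into one file-level checksum.
      FileChecksum checksum = fs.getFileChecksum(new Path("/demo/input.txt"));
      System.out.println(checksum);
    }
  }
}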

The ShortCircuit read mechanism in HDFS

The ShortCircuit-related methods are deliberately separated into their own module here, because the ShortCircuit read mechanism is a concept introduced in later versions of HDFS and some readers may not yet know it, so let me first popularize this knowledge.


The origin of ShortCircuit

In the early days, to make data processing more efficient, Hadoop kept data as local as possible so as to avoid a large number of remote read operations; the term for reading data on the same node is "local read". Gradually, although the proportion of local reads did improve, it still did not look optimal: even when the data is local, every client read still has to pass through the DataNode layer, which costs a round of network communication. Could local data be read in a way similar to reading the local file system directly? ShortCircuit reads were born of exactly this idea.

ShortCircuit implementation of local reads

ShortCircuit reads are commonly known as "short-circuit reads". The later implementation of this feature adopted a technique from the Linux operating system: the Unix domain socket. A Unix domain socket is a means of communication between processes, and what matters most here is that it can pass file descriptors between processes, which is what makes this kind of inter-process cooperation possible. For more detail on ShortCircuit local reads, read the original article "How Improved Short-Circuit Local Reads Bring Better Performance and Security to Hadoop".
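As a minimal illustration of what a Unix domain socket is (not Hadoop's own implementation, which lives behind org.apache.hadoop.net.unix.DomainSocket with native code), here is a self-contained sketch using the standard java.nio API available since Java 16. Note that the standard Java API exchanges bytes over the socket but does not expose file-descriptor passing; Hadoop relies on native code for that part.

import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class UdsDemo {
  public static void main(String[] args) throws Exception {
    Path socketFile = Path.of("/tmp/uds_demo.sock");
    Files.deleteIfExists(socketFile);
    UnixDomainSocketAddress addr = UnixDomainSocketAddress.of(socketFile);

    // "Server" side: listen on a filesystem path instead of a TCP port.
    try (ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
      server.bind(addr);

      // "Client" side: connect through the socket file; no TCP stack involved.
      try (SocketChannel client = SocketChannel.open(addr);
           SocketChannel accepted = server.accept()) {
        client.write(ByteBuffer.wrap("hello over UDS".getBytes(StandardCharsets.UTF_8)));
        ByteBuffer buf = ByteBuffer.allocate(64);
        accepted.read(buf);
        buf.flip();
        System.out.println(StandardCharsets.UTF_8.decode(buf));
      }
    } finally {
      Files.deleteIfExists(socketFile);
    }
  }
}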

The ShortCircuit mechanism

HDFS uses short-circuit memory segments to implement this kind of read. The brief process by which the DFSClient performs a local read through ShortCircuit is as follows:

1. The DFSClient requests a shared memory segment from the DataNode.

2. The ShortCircuitRegistry object generates and manages these memory segments.

3. Before reading locally, the DFSClient requests the needed file descriptors from the DataNode; this corresponds to the requestShortCircuitFds method.

4. During this period, the block has a slot used to track its state.

5. When the read is finished, the corresponding release operation is performed.
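To make the flow above concrete, short-circuit reads are switched on through configuration on the client and DataNode. Below is a hedged sketch: the two property names (dfs.client.read.shortcircuit and dfs.domain.socket.path) are the standard ones, but the socket path and file path are invented examples and must match your own DataNode's configuration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShortCircuitReadDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Enable short-circuit reads on the client side.
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    // Must point at the same Unix domain socket path the DataNode is configured with.
    conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");

    try (FileSystem fs = FileSystem.get(conf)) {
      // Local replicas of this file can now be read without going
      // through the DataNode's data transfer path.
      fs.open(new Path("/demo/input.txt")).close();
    }
  }
}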

Here is the official explanation from the source code:

/**
 * Manages client short-circuit memory segments on the DataNode.
 *
 * DFSClients request shared memory segments from the DataNode.  The
 * ShortCircuitRegistry generates and manages these segments.  Each segment
 * has a randomly generated 128-bit ID which uniquely identifies it.  The
 * segments each contain several "slots."
 *
 * Before performing a short-circuit read, DFSClients must request a pair of
 * file descriptors from the DataNode via the REQUEST_SHORT_CIRCUIT_FDS
 * operation.  As part of this operation, DFSClients pass the ID of the shared
 * memory segment they would like to use to communicate information about this
 * replica, as well as the slot number within that segment they would like to
 * use.  Slot allocation is always done by the client.
 *
 * Slots are used to track the state of the block on both the client and
 * datanode.  When this DataNode mlocks a block, the corresponding slots for
 * the replicas are marked as "anchorable".  Anchorable blocks can be safely
 * read without verifying the checksum.  This means that BlockReaderLocal
 * objects using these replicas can skip checksumming.  It also means that we
 * can do zero-copy reads on these replicas (the ZCR interface has no way of
 * verifying checksums.)
 *
 * When a DN needs to munlock a block, it needs to first wait for the block to
 * be unanchored by clients doing a no-checksum read or a zero-copy read.  The
 * DN also marks the block's slots as "unanchorable" to prevent additional
 * clients from initiating these operations in the future.
 *
 * The counterpart of this class on the client is {@link DfsClientShmManager}.
 */


Upstream invocation of DataXceiver

The upstream callers of the DataXceiver are the senders of the op opcodes. Searching for the places where each Op is sent shows that they all come from the same class: Sender. Take Op.COPY_BLOCK as an example:

@Override
public void copyBlock(final ExtendedBlock blk,
    final Token<BlockTokenIdentifier> blockToken) throws IOException {
  OpCopyBlockProto proto = OpCopyBlockProto.newBuilder()
      .setHeader(DataTransferProtoUtil.buildBaseHeader(blk, blockToken))
      .build();

  send(out, Op.COPY_BLOCK, proto);
}
The remaining 8 methods are similar. Now we can explain why Sender exists. Although the Sender object is the direct sender of the opcode, it is not the original caller of these methods; we need to search upward from here to find the first trigger. To save space, here is the result directly:

One such trigger turns out to be the Dispatcher class used in the Balancer operation. As the figure shows, the real initiators of reads and writes are the classes we often encounter: DFSClient, DFSOutputStream, and BlockReader. With that, the upstream and downstream processing of the DataXceiver is fully connected.


DataXceiver and DataXceiverServer

Speaking of the DataXceiver, we have to mention the DataXceiverServer. The DataXceiverServer keeps a record every time a new DataXceiver thread is started; DataXceivers are created in its main loop:

@Override
public void run() {
  Peer peer = null;
  while (datanode.shouldRun && !datanode.shutdownForUpgrade) {
    try {
      peer = peerServer.accept();

      // Make sure the xceiver count is not exceeded
      int curXceiverCount = datanode.getXceiverCount();
      if (curXceiverCount > maxXceiverCount) {
        throw new IOException("Xceiver count " + curXceiverCount
            + " exceeds the limit of concurrent xcievers: "
            + maxXceiverCount);
      }

      new Daemon(datanode.threadGroup,
          DataXceiver.create(peer, datanode, this))
          .start();
    } catch (SocketTimeoutException ignored) {
      ...
The new DataXceiver then adds itself to the two map objects of the DataXceiverServer:

/**
 * Read/write data from/to the DataXceiverServer.
 */
@Override
public void run() {
  int opsProcessed = 0;
  Op op = null;
  try {
    dataXceiverServer.addPeer(peer, Thread.currentThread(), this);
    ...
synchronized void addPeer(Peer peer, Thread t, DataXceiver xceiver)
    throws IOException {
  if (closed) {
    throw new IOException("Server closed.");
  }
  peers.put(peer, t);
  peersXceiver.put(peer, xceiver);
}
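The maxXceiverCount guard in the accept loop above is configurable. As a hedged tuning sketch (the value 8192 is only an example), the limit is controlled by the dfs.datanode.max.transfer.threads property, historically known as dfs.datanode.max.xcievers:

import org.apache.hadoop.conf.Configuration;

public class XceiverLimitConfig {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Raise the concurrent DataXceiver thread limit on a busy DataNode.
    conf.setInt("dfs.datanode.max.transfer.threads", 8192);
    System.out.println(conf.get("dfs.datanode.max.transfer.threads"));
  }
}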
So the relationship between DataXceiver and DataXceiverServer can be represented by the following structure:


Supplement

One small addition: while reading the DataXceiver source recently, I found a confusing point. Some exception logs are output at an inaccurate level (INFO), which is not conducive to spotting abnormal log records, so I submitted an issue to the community, HDFS-9727.


RELATED LINKS

Issue Link: https://issues.apache.org/jira/browse/HDFS-9727

Github Patch Link: https://github.com/linyiqun/open-source-patch/tree/master/hdfs/HDFS-9727

