HDFS 1.0 Source Code Analysis: DataNode-Side Data Storage and Management (DataStorage and FSDataset)


This section introduces data storage and management on the DataNode (DN). Logically we store data into the HDFS file system, but how is the data concretely stored on each DN? That involves several fairly large classes: DataStorage, Storage, FSDataset, and so on. When I first read this part of the DN source code it was not very clear to me; now that the ideas have settled, I am writing them down as tips for readers who are just starting on the code.

We configure this option in hdfs-site.xml:

    <property>
      <name>dfs.data.dir</name>
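For reference, a complete entry might look like the sketch below; dfs.data.dir takes a comma-separated list of local directories, and the example paths are made up for illustration:

    <property>
      <name>dfs.data.dir</name>
      <!-- comma-separated local directories; these example paths are hypothetical -->
      <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
    </property>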
This configuration item specifies where each DN stores its data. Under each configured path you will find the following directories:

    blocksBeingWritten  current  detach  storage  tmp

blocksBeingWritten holds the blocks currently being written; when a block is finished it is moved from blocksBeingWritten into current, the home directory of the finalized block data. detach holds hard links and is used mainly for copy-on-write. tmp stores temporary data, which the DN deletes when it starts up and runs its checks.

First look at DataStorage, which plays the data-management role on the DN.

    public class DataStorage extends Storage {
As the class declaration shows, DataStorage is a specialization of Storage for DN data management. It is easy to guess that the NN has a corresponding specialized class (yes, FSImage). The main function of the Storage base class is to hold the storage information common to both.

Let's look at the role DataStorage plays when the DN starts.

    void startDataNode(Configuration conf,
                       AbstractList<File> dataDirs, SecureResources resources
                       ) throws IOException {
      ...
      storage = new DataStorage();
      ...
      storage.recoverTransitionRead(nsInfo, dataDirs, startOpt);
The last line above is the main interaction with DataStorage inside startDataNode. First, the initialization of DataStorage:

    DataStorage() {
      super(NodeType.DATA_NODE);
      storageID = "";
    }
Its main job is to declare its own storage type (DATA_NODE). Next look at the recoverTransitionRead method; the core code is as follows:

    for (Iterator<File> it = dataDirs.iterator(); it.hasNext();) {
      File dataDir = it.next();
      StorageDirectory sd = new StorageDirectory(dataDir); // initialize storage in root
      StorageState curState;
      try {
        curState = sd.analyzeStorage(startOpt);
        // sd is locked but not opened
        switch (curState) {
        case NORMAL:
          break;
        case NON_EXISTENT:
          // ignore this storage
          LOG.info("Storage directory " + dataDir + " does not exist.");
          it.remove();
          continue;
        case NOT_FORMATTED: // format
          LOG.info("Storage directory " + dataDir + " is not formatted.");
          LOG.info("Formatting ...");
          format(sd, nsInfo);
          break;
        default:  // recovery part is common
          sd.doRecover(curState);
        }

The loop above checks every storage path configured in hdfs-site.xml. The check itself is done by StorageDirectory's analyzeStorage method, which returns the current state of the storage (these states were covered in an earlier post; they exist mainly because an in-progress upgrade can leave the DN in one of several intermediate states; see the listing below). If the state is NORMAL, nothing needs to be done. NON_EXISTENT means the directory in the configuration item does not exist, so it is dropped from the list. NOT_FORMATTED means the directory has not been formatted yet, and format(sd, nsInfo) formats it. All other states are handled by sd.doRecover(curState).
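For orientation, here is the set of states analyzeStorage can return, as defined in Hadoop 1.x's Storage.StorageState enum; the names are quoted from memory of that source and the per-state comments are my own gloss, so treat both as approximate:

    public enum StorageState {
      NON_EXISTENT,        // configured directory is missing or unusable
      NOT_FORMATTED,       // directory exists but has never been formatted
      COMPLETE_UPGRADE,    // interrupted upgrade: finish it
      RECOVER_UPGRADE,     // interrupted upgrade: roll it back
      COMPLETE_FINALIZE,   // interrupted finalize: finish it
      COMPLETE_ROLLBACK,   // interrupted rollback: finish it
      RECOVER_ROLLBACK,    // interrupted rollback: undo it
      COMPLETE_CHECKPOINT, // interrupted checkpoint: finish it
      RECOVER_CHECKPOINT,  // interrupted checkpoint: undo it
      NORMAL               // nothing to do
    }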

StorageDirectory objects appeared throughout the description above and play a key role in each step, so let's analyze the main role of the StorageDirectory class. StorageDirectory is an inner class of Storage, the parent class of DataStorage:

    public class StorageDirectory {
      File root;              // root directory
      FileLock lock;          // storage lock
      StorageDirType dirType; // storage dir type
There are three main member variables; we focus on root, which is initialized to one of the configured storage paths. Now for the main methods, starting with analyzeStorage. The code is fairly long, so rather than paste it I will briefly describe the logic. First it determines whether the root path exists, returning StorageState.NON_EXISTENT if it does not exist and we are not currently executing the StartupOption.FORMAT command. It then determines whether root is a directory and is writable; if either check fails it also returns StorageState.NON_EXISTENT. If the configured path exists, the directory is locked first. A common technique for locking a directory is to create an empty lock file and take a file lock on it; locking that blank file is what controls synchronized access to the whole directory.
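Here is a minimal sketch of that locking technique, not the HDFS source itself: create an empty marker file in the directory and take a FileLock on it. HDFS names the file in_use.lock; everything else here is illustrative:

    import java.io.File;
    import java.io.RandomAccessFile;
    import java.nio.channels.FileLock;

    public class DirLockDemo {
      // Returns the lock on success; the holder releases it with lock.release().
      public static FileLock lockDir(File dir) throws Exception {
        File lockFile = new File(dir, "in_use.lock"); // empty marker file
        RandomAccessFile raf = new RandomAccessFile(lockFile, "rws");
        FileLock lock = raf.getChannel().tryLock();   // non-blocking attempt
        if (lock == null) {
          raf.close();
          throw new IllegalStateException("Directory already locked: " + dir);
        }
        return lock;
      }
    }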

    boolean hasPrevious = getPreviousDir().exists();
    boolean hasPreviousTmp = getPreviousTmp().exists();
    boolean hasRemovedTmp = getRemovedTmp().exists();
    boolean hasFinalizedTmp = getFinalizedTmp().exists();
    boolean hasCheckpointTmp = getLastCheckpointTmp().exists();
It then checks which of the marker files produced by the various operations exist, and returns a different state depending on which are present.
Based on the returned state, the doRecover method of the same class does the corresponding processing:

    public void doRecover(StorageState curState) throws IOException {
      File curDir = getCurrentDir();
      String rootPath = root.getCanonicalPath();
      switch (curState) {
      case COMPLETE_UPGRADE:  // mv previous.tmp -> previous
        LOG.info("Completing previous upgrade for storage directory "
                 + rootPath + ".");
        rename(getPreviousTmp(), getPreviousDir());
        return;
This is only a small part of the code, but it shows the function's main job: based on the directory's current state, restore it to a stable state.

At this point we have covered the checking work DataStorage does. After the check, doTransition(getStorageDir(idx), nsInfo, startOpt) performs any necessary transitions, driven mainly by the VERSION file in the current directory and by the startOpt option (honestly I have not looked at this part very carefully; the details deserve closer study). Finally, this.writeAll() writes the meta information out.

From the analysis above we basically understand DataStorage's role: check that each configured path is usable and in a consistent state, and process and transition those states, restoring inconsistent states where possible. DataStorage is thus essentially a preprocessing step.

Now for the main flow of FSDataset. As before, start with the relationship between FSDataset and the DN, in the startDataNode method:

    this.data = new FSDataset(storage, conf);
This creates the FSDataset object. Walking through the object's construction reveals the composition and functionality of the FSDataset class.

    // get the configured DataNode data directories
    String[] dataDirs = conf.getStrings(DataNode.DATA_DIR_KEY);

    int volsConfigured = 0;

    if (dataDirs != null)
      volsConfigured = dataDirs.length;

    // compute the number of configured directories that are currently unusable
    int volsFailed = volsConfigured - storage.getNumStorageDirs();
    ...
    // each volume corresponds to one configured path
    FSVolume[] volArray = new FSVolume[storage.getNumStorageDirs()];
    for (int idx = 0; idx < storage.getNumStorageDirs(); idx++) {
      volArray[idx] = new FSVolume(storage.getStorageDir(idx).getCurrentDir(), conf);
    }
    // FSVolumeSet manages all the volumes
    volumes = new FSVolumeSet(volArray);
    // for each volume, scan the files underneath and load volumeMap
    volumes.getVolumeMap(volumeMap);
    File[] roots = new File[storage.getNumStorageDirs()];
    for (int idx = 0; idx < storage.getNumStorageDirs(); idx++) {
      roots[idx] = storage.getStorageDir(idx).getCurrentDir();
    }
    asyncDiskService = new FSDatasetAsyncDiskService(roots);
    registerMBean(storage.getStorageID());
First the constructor reads the number of configured storage directories and compares it with the number of directories that are actually usable, to see whether the configured requirement is met; a sketch of that check appears below. Further down we can see that each configured directory is managed by an FSVolume object, so next let's look at what happens while this FSVolume object is initialized. FSVolume is an inner class of FSDataset.
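A hedged sketch of the usability comparison described above; the tolerance property name "dfs.datanode.failed.volumes.tolerated" is quoted from memory of Hadoop 1.x (default 0), and the class is illustrative rather than HDFS source:

    import java.io.IOException;

    public class VolumeCheckSketch {
      // volsConfigured: paths in dfs.data.dir; volsUsable: paths that passed analyzeStorage
      static void checkVolumes(int volsConfigured, int volsUsable,
                               int volsTolerated) throws IOException {
        int volsFailed = volsConfigured - volsUsable;
        if (volsFailed > volsTolerated) {
          throw new IOException("Too many failed volumes: " + volsFailed
              + " (tolerated: " + volsTolerated + ")");
        }
      }

      public static void main(String[] args) {
        try {
          checkVolumes(3, 3, 0); // all usable: ok
          checkVolumes(3, 2, 0); // one failed, none tolerated: rejected
        } catch (IOException e) {
          System.out.println("rejected: " + e.getMessage());
        }
      }
    }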

    class FSVolume {
      private File currentDir;
      private FSDir dataDir;
      private File tmpDir;
      private File blocksBeingWritten; // clients write here
      private File detachDir;          // copy-on-write for blocks in snapshot
      private DF usage;
      private DU dfsUsage;
      private long reserved;

      FSVolume(File currentDir, Configuration conf) throws IOException {
        // space reserved per disk; used to decide whether a disk counts as full
        this.reserved = conf.getLong("dfs.datanode.du.reserved", 0);
        this.dataDir = new FSDir(currentDir);
        this.currentDir = currentDir;
        boolean supportAppends = conf.getBoolean("dfs.support.append", false);
        File parent = currentDir.getParentFile();

        this.detachDir = new File(parent, "detach");
        if (detachDir.exists()) {
          recoverDetachedBlocks(currentDir, detachDir);
        }
        ...
        this.usage = new DF(parent, conf);
        this.dfsUsage = new DU(parent, conf);
        this.dfsUsage.start();
Look first at FSVolume's data members: currentDir, tmpDir, blocksBeingWritten and detachDir correspond to the directory names under the configured path mentioned earlier. The more intriguing members are dataDir, usage and dfsUsage, which have somewhat unusual types.

The constructor that follows initializes these member variables. First look at dataDir: the code shows it is of type FSDir, so let's see what FSDir concretely does. FSDir is also an inner class of FSDataset.

    public FSDir(File dir) throws IOException {
      this.dir = dir;
      this.children = null;
      if (!dir.exists()) {
        if (!dir.mkdirs()) {
          throw new IOException("Mkdirs failed to create " + dir.toString());
        }
      } else {
        File[] files = FileUtil.listFiles(dir);
        int numChildren = 0;
        for (int idx = 0; idx < files.length; idx++) {
          if (files[idx].isDirectory()) {
            numChildren++;
          } else if (Block.isBlockFilename(files[idx])) {
            numBlocks++;
          }
        }
        if (numChildren > 0) {
          children = new FSDir[numChildren];
          int curdir = 0;
          for (int idx = 0; idx < files.length; idx++) {
            if (files[idx].isDirectory()) {
              children[curdir] = new FSDir(files[idx]);
              curdir++;
            }
          }
        }
      }
    }
The code above should be easy to understand: this is the legendary tree structure, here in its array-of-children implementation (^_^).
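To make the traversal concrete, here is a standalone sketch (not the HDFS source) that walks a directory tree the same way and collects block files, assuming the HDFS 1.x convention that block data files are named blk_<id> with no .meta suffix:

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    public class BlockScanSketch {
      static List<File> scan(File dir, List<File> out) {
        File[] files = dir.listFiles();
        if (files == null) return out;
        for (File f : files) {
          if (f.isDirectory()) {
            scan(f, out); // recurse, mirroring the FSDir children array
          } else if (f.getName().startsWith("blk_")
                     && !f.getName().endsWith(".meta")) {
            out.add(f);   // a block data file
          }
        }
        return out;
      }

      public static void main(String[] args) {
        for (File b : scan(new File(args[0]), new ArrayList<File>())) {
          System.out.println(b);
        }
      }
    }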

Continue with the FSVolume member usage, an object of type DF. DF is not an inner class of FSVolume; the strange-looking name actually corresponds to the Linux df command. From DF's methods it is easy to see that it mainly tracks the disk space usage of each configured path, used chiefly during block writes to decide which configured path a block should be written to.

Next, the member dfsUsage, of type DU; obviously DU corresponds to the Linux du command. Its main job is to obtain the amount of space currently used, i.e. the size of the stored content. Particularly noteworthy is the start method:

    public void start() {
      // only start the thread if the interval is sane
      if (refreshInterval > 0) {
        refreshUsed = new Thread(new DURefreshThread(),
            "refreshUsed-" + dirPath);
        refreshUsed.setDaemon(true);
        refreshUsed.start();
      }
    }
This method creates a thread object refreshUsed, a daemon thread that keeps running in the background, with a configurable execution interval. The thread executes the run method of DU's parent class Shell, and the concrete shell command it executes is built in:

    protected String[] getExecString() {
      return new String[] {"du", "-sk", dirPath};
    }

This returns the du command line that the refresh thread runs.
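And a self-contained sketch of the same idea (not Hadoop's Shell/DU classes): run du -sk on a directory and parse the used-kilobytes figure from its output:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class DuOnce {
      static long usedKb(String dirPath) throws Exception {
        Process p = new ProcessBuilder("du", "-sk", dirPath).start();
        try (BufferedReader r = new BufferedReader(
                 new InputStreamReader(p.getInputStream()))) {
          String line = r.readLine(); // e.g. "123456\t/data/dfs/dn"
          p.waitFor();
          return Long.parseLong(line.split("\\s+")[0]);
        }
      }

      public static void main(String[] args) throws Exception {
        System.out.println(usedKb(args[0]) + " KB used");
      }
    }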

The rest of the FSVolume constructor mainly processes the directories under the data directory: for example it deletes the contents of tmp, and it handles the data under blocksBeingWritten according to whether append is enabled.

Now think about it: each block the DN receives is written to one of several configured paths, and multiple independent paths are not easy to manage. The following member exists to solve exactly that problem:

    volumes = new FSVolumeSet(volArray);
    // for each volume, scan the files underneath and load volumeMap
    volumes.getVolumeMap(volumeMap);
An FSVolumeSet manages all of the FSVolumes mentioned above (each FSVolume corresponds to one configured path). Following the logic, let's look at FSVolumeSet's main components:

    synchronized FSVolume getNextVolume(long blockSize) throws IOException {
      int startVolume = curVolume;
      // a DataNode manages several paths; pick a suitable one to store this block
      while (true) {
        FSVolume volume = volumes[curVolume];
        curVolume = (curVolume + 1) % volumes.length;
        if (volume.getAvailable() > blockSize) { return volume; }
        if (curVolume == startVolume) {
          throw new DiskOutOfSpaceException(
              "Insufficient space for an additional block");
        }
      }
    }

    synchronized List<FSVolume> checkDirs() {
      ArrayList<FSVolume> removed_vols = null;

      for (int idx = 0; idx < volumes.length; idx++) {
        FSVolume fsv = volumes[idx];
        try {
          fsv.checkDirs();
        } catch (DiskErrorException e) {
          DataNode.LOG.warn("Removing failed volume " + fsv + ": ", e);
          if (removed_vols == null) {
            removed_vols = new ArrayList<FSVolume>(1);
          }
          removed_vols.add(volumes[idx]);
          volumes[idx].dfsUsage.shutdown(); // shut down the running DU thread
          volumes[idx] = null;              // remove the volume
        }
      }

      // repair array - copy non-null elements
      int removed_size = (removed_vols == null) ? 0 : removed_vols.size();
      if (removed_size > 0) {
        FSVolume fsvs[] = new FSVolume[volumes.length - removed_size];
        for (int idx = 0, idy = 0; idx < volumes.length; idx++) {
          if (volumes[idx] != null) {
            fsvs[idy] = volumes[idx];
            idy++;
          }
        }
        volumes = fsvs; // replace array of volumes
      }

First, the getNextVolume method: its main role is to cycle through the DN's write paths and use volume.getAvailable to find a path able to hold the given block. volume.getAvailable in turn calls the getAvailable method of FSVolume's usage member (the DF object) mentioned above.
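The selection policy is easy to isolate. Below is a runnable sketch of the same round-robin logic, with the per-volume available space faked as a long[]; everything here is illustrative rather than the HDFS classes:

    public class RoundRobinPickSketch {
      private final long[] available; // fake per-volume free space, in bytes
      private int curVolume = 0;

      RoundRobinPickSketch(long[] available) { this.available = available; }

      synchronized int getNextVolume(long blockSize) {
        int startVolume = curVolume;
        while (true) {
          int idx = curVolume;
          curVolume = (curVolume + 1) % available.length;
          if (available[idx] > blockSize) return idx;
          if (curVolume == startVolume)
            throw new RuntimeException("Insufficient space for an additional block");
        }
      }

      public static void main(String[] args) {
        RoundRobinPickSketch p =
            new RoundRobinPickSketch(new long[]{10000, 500, 80000});
        System.out.println(p.getNextVolume(1024)); // 0
        System.out.println(p.getNextVolume(1024)); // 2 (skips volume 1: too small)
      }
    }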

Then the other method, checkDirs: its main role is to check the FSVolume for each configured path and evict paths with disk problems from the current FSVolumeSet. The check relies mainly on FSVolume's own checkDirs method:

    void checkDirs() throws DiskErrorException {
      dataDir.checkDirTree();
      DiskChecker.checkDir(tmpDir);
      DiskChecker.checkDir(blocksBeingWritten);
    }
Notice that FSDir's checkDirTree, invoked here, also boils down to calls to DiskChecker.checkDir, just applied over the directory tree that FSDir manages for each path. DiskChecker.checkDir does three main checks: whether the path is a directory, whether it is readable, and whether it is writable.
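A minimal sketch of those three checks (the real DiskChecker also creates a missing directory before checking; that part is omitted here):

    import java.io.File;
    import java.io.IOException;

    public class SimpleDiskCheck {
      static void checkDir(File dir) throws IOException {
        if (!dir.isDirectory()) throw new IOException("not a directory: " + dir);
        if (!dir.canRead())     throw new IOException("not readable: " + dir);
        if (!dir.canWrite())    throw new IOException("not writable: " + dir);
      }

      public static void main(String[] args) throws IOException {
        checkDir(new File(args[0]));
        System.out.println("ok: " + args[0]);
      }
    }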

Now look at the other important call, volumes.getVolumeMap(volumeMap). This method simply invokes the getVolumeMap method of every FSVolume that the FSVolumeSet manages, and each FSVolume object executes dataDir.getVolumeMap over the directory tree it manages. So finally, FSDir's getVolumeMap method:

    if (children != null) {
      for (int i = 0; i < children.length; i++) {
        children[i].getVolumeMap(volumeMap, volume);
      }
    }

    // traverse all the block files in this directory
    File blockFiles[] = dir.listFiles();
    if (blockFiles != null) {
      for (int i = 0; i < blockFiles.length; i++) {
        DataNode.LOG.info("filename is: " + blockFiles[i].getName());
        if (Block.isBlockFilename(blockFiles[i])) {
          DataNode.LOG.info("in getVolumeMap: is blockfile: "
              + blockFiles[i].getName());
          long genStamp = FSDataset.getGenerationStampFromFile(blockFiles,
              blockFiles[i]);
          volumeMap.put(new Block(blockFiles[i], blockFiles[i].length(), genStamp),
                        new DatanodeBlockInfo(volume, blockFiles[i]));
        }
      }
    }
You can see first a recursive call into each subdirectory, then an iteration over every file in the directory; the end result is volumeMap, a HashMap<Block, DatanodeBlockInfo>. The Block object stores:

    private long blockId;
    private long numBytes;
    private long generationStamp;
This is the meta information for each block, the same information recorded on the NN. Now look at DatanodeBlockInfo:

    private FSVolume volume;   // volume where the block belongs
    private File file;         // block file
    private boolean detached;  // copy-on-write done for block
This records which FSVolume on the DN each block is actually stored on. Now we can understand volumeMap's role: what a client obtains from the NN is Block-typed meta information, but to get the actual data the DN must know where the block physically lives; volumeMap performs exactly this mapping.
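A toy illustration of that mapping, using simplified stand-in types (Java 16+ records; none of this is HDFS source):

    import java.io.File;
    import java.util.HashMap;
    import java.util.Map;

    public class VolumeMapSketch {
      record BlockKey(long blockId, long numBytes, long generationStamp) {}
      record BlockInfo(String volume, File file) {}

      public static void main(String[] args) {
        Map<BlockKey, BlockInfo> volumeMap = new HashMap<>();
        volumeMap.put(new BlockKey(42L, 1024L, 7L),
            new BlockInfo("/data/1/dfs/dn",
                          new File("/data/1/dfs/dn/current/blk_42")));
        // a read request arrives carrying only the Block-level metadata...
        BlockInfo where = volumeMap.get(new BlockKey(42L, 1024L, 7L));
        System.out.println("serve bytes from " + where.file());
      }
    }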

With that, the more critical functions in FSDataset have been introduced; FSDataset is the key class for managing block data on the DN.

Finally, exchanges, corrections, and criticism are welcome.

