[Reading Hadoop Source Code] [8] DataNode: DataStorage

Source: Internet
Author: User

DataNode storage management can be divided into two parts:

1. Storage-related classes describe, from a macro perspective, how each storage directory is organized. They manage the directories and files under each path specified by the dfs.data.dir property, such as current, previous, detach, tmp, and storage, and define the operations that apply to a storage directory as a whole;

2. Dataset-related classes describe how block files and their metadata files are organized, for example the file layout inside the current directory, and define the operations on block files.

Because the NameNode also uses Storage but does not store block files, storage management is split into these two parts.

Storage

First, look at the DataNode configuration: a DataNode's local data can be spread across multiple disks, configured as follows:

<property>
  <name>dfs.data.dir</name>
  <value>/data/HDFS/dfs/data,/data/HDFS/dfs/data2</value>
</property>

The directories /data/HDFS/dfs/data and /data/HDFS/dfs/data2 together hold all the data managed by this DataNode.
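Because the value is a comma-separated list, each entry becomes one storage directory. The sketch below shows that splitting step in plain Java. DataDirConfig and parseDataDirs are hypothetical names; Hadoop itself reads the property through its Configuration class, not through this code.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: splits a dfs.data.dir value
// into one File per local storage path.
class DataDirConfig {
    static List<File> parseDataDirs(String value) {
        List<File> dirs = new ArrayList<>();
        for (String part : value.split(",")) {
            String path = part.trim();   // tolerate spaces around the commas
            if (!path.isEmpty()) {       // skip empty entries, e.g. a trailing comma
                dirs.add(new File(path));
            }
        }
        return dirs;
    }
}
```

With the configuration above, parseDataDirs("/data/HDFS/dfs/data,/data/HDFS/dfs/data2") yields two File objects, one per disk.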

Inside one of these directories, the structure may look like this:

/current/VERSION
        /blk_<id_1>
        /blk_<id_1>.meta
        /blk_<id_2>
        /blk_<id_2>.meta
        /...
        /blk_<id_64>
        /blk_<id_64>.meta
        /subdir0/
        /subdir1/
        /...
        /subdir63/
/previous/
/detach/
/tmp/
/in_use.lock
/storage
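The block and metadata file names in this listing follow a simple pattern, which the sketch below encodes. BlockFileNames and its methods are hypothetical. The .meta naming follows the simplified listing above; later Hadoop releases also put a generation stamp into the metadata file name (blk_<id>_<genstamp>.meta).

```java
// Hypothetical helper mirroring the listing above:
// a block file is "blk_<id>" and its checksum file is "blk_<id>.meta".
class BlockFileNames {
    static final String BLOCK_PREFIX = "blk_";
    static final String META_SUFFIX = ".meta";

    static String blockFile(long blockId) {
        return BLOCK_PREFIX + blockId;
    }

    static String metaFile(long blockId) {
        return blockFile(blockId) + META_SUFFIX;
    }

    /** True only for "blk_<digits>", so metadata files are not counted as blocks. */
    static boolean isBlockFile(String name) {
        if (!name.startsWith(BLOCK_PREFIX)) {
            return false;
        }
        String id = name.substring(BLOCK_PREFIX.length());
        return id.matches("-?\\d+");   // block IDs are longs and may be negative
    }
}
```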

Here is a rough summary of the mapping relationships in the DataNode:

StorageDirectory corresponds to one local storage path, such as /data/HDFS/dfs/data.

FSDir corresponds to current and its subdirectories subdir*.

DataStorage manages all local storage paths in a unified way: it manages the StorageDirectory objects, but not the individual data files inside each storage path.
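The mapping can be pictured with two small classes. SimpleStorageDir and SimpleDataStorage are hypothetical, stripped-down stand-ins for StorageDirectory and DataStorage: one object per storage path, plus one object that holds the list of paths and never touches block files. FSDir is not modeled. The method names addStorageDir, getNumStorageDirs, and getStorageDir are borrowed from the recoverTransitionRead listing in this article, and the directory names match those used here.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for StorageDirectory: one object per storage path.
class SimpleStorageDir {
    final File root;   // e.g. /data/HDFS/dfs/data

    SimpleStorageDir(File root) {
        this.root = root;
    }

    File currentDir()   { return new File(root, "current"); }
    File versionFile()  { return new File(currentDir(), "VERSION"); }
    File previousDir()  { return new File(root, "previous"); }
    File previousTmp()  { return new File(root, "previous.tmp"); }
    File removedTmp()   { return new File(root, "removed.tmp"); }
    File finalizedTmp() { return new File(root, "finalized.tmp"); }
}

// Hypothetical stand-in for DataStorage: holds every storage path,
// but knows nothing about the block files inside them.
class SimpleDataStorage {
    private final List<SimpleStorageDir> dirs = new ArrayList<>();

    void addStorageDir(SimpleStorageDir sd) { dirs.add(sd); }
    int getNumStorageDirs()                 { return dirs.size(); }
    SimpleStorageDir getStorageDir(int idx) { return dirs.get(idx); }
}
```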

Class diagram related to DataStorage

In the class diagram above, NamespaceInfo holds the version information of the namespace for the entire HDFS cluster. This version is generated when the NameNode is formatted, and the DataNode obtains it every time it starts and registers with the NameNode. If the DataNode is starting for the first time, it saves this version permanently. Otherwise, it compares the version saved at first startup with the one it has just obtained, and if they do not match, startup is aborted.
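The version check amounts to a small piece of state. In the hypothetical sketch below, an int stands in for the namespace version carried by NamespaceInfo and is kept in memory only, whereas the real DataNode persists it in the VERSION file of each storage directory. NamespaceCheck and its methods are illustrative names, not Hadoop API.

```java
import java.io.IOException;

// Hypothetical sketch of the startup check; not Hadoop code.
class NamespaceCheck {
    private Integer storedVersion;   // null until the first successful startup

    void checkOnStartup(int versionFromNameNode) throws IOException {
        if (storedVersion == null) {
            // First startup: remember the version (Hadoop writes it to VERSION).
            storedVersion = versionFromNameNode;
            return;
        }
        if (storedVersion.intValue() != versionFromNameNode) {
            // Mismatch: refuse to start rather than mix data from two namespaces.
            throw new IOException("Incompatible namespace version: stored "
                    + storedVersion + ", NameNode reports " + versionFromNameNode);
        }
    }

    Integer getStoredVersion() {
        return storedVersion;
    }
}
```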

DataStorage plays its main role when the DataNode starts. StorageDirectory provides coarse-grained transactional operations, and DataStorage carries out these transactions on each StorageDirectory. When the DataNode starts, it performs the following operations:

The recoverTransitionRead function constructs a StorageDirectory for each storage path, analyzes the current state of each path, and repairs any path found in an abnormal state. Once all storage paths are in a correct state, it applies to each path the operation requested by the startup option the user gave when starting the node (backup, upgrade, rollback, recovery, or finalize). If all of these operations succeed, it updates the storage version information and the VERSION file under each storage path. The main flow of recoverTransitionRead is as follows:

void recoverTransitionRead(NamespaceInfo nsInfo, Collection<File> dataDirs,
                           StartupOption startOpt) throws IOException {
  // 1. For each data directory calculate its state and check whether all is
  //    consistent before transitioning. Format and recover.
  this.storageID = "";
  this.storageDirs = new ArrayList<StorageDirectory>(dataDirs.size());
  ArrayList<StorageState> dataDirStates = new ArrayList<StorageState>(dataDirs.size());
  for (Iterator<File> it = dataDirs.iterator(); it.hasNext();) { // for each local storage path
    File dataDir = it.next();
    StorageDirectory sd = new StorageDirectory(dataDir);
    StorageState curState;
    try {
      curState = sd.analyzeStorage(startOpt); // analyze the state of this storage path
      // sd is locked but not opened
      switch (curState) {
      case NORMAL:
        break;
      case NON_EXISTENT:
        // ignore this storage
        LOG.info("Storage directory " + dataDir + " does not exist.");
        it.remove();
        continue;
      case NOT_FORMATTED: // format
        LOG.info("Storage directory " + dataDir + " is not formatted.");
        LOG.info("Formatting ...");
        format(sd, nsInfo);
        break;
      default: // recovery part is common
        sd.doRecover(curState); // first restore the directory to a normal state
      }
    } catch (IOException ioe) {
      sd.unlock();
      throw ioe;
    }
    // add to the storage list
    addStorageDir(sd);
    dataDirStates.add(curState);
  }

  // 2. Do transitions. Each storage directory is treated individually.
  //    During startup some of them can upgrade or rollback
  //    while others could be up to date for the regular startup.
  for (int idx = 0; idx < getNumStorageDirs(); idx++) {
    // perform upgrade, rollback, or finalize on each storage path in turn
    doTransition(getStorageDir(idx), nsInfo, startOpt);
  }

  // 3. Update all storages. Some of them might have just been formatted.
  this.writeAll();
}
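Phase 1 of recoverTransitionRead boils down to a mapping from each directory's state to an action. The sketch below isolates that decision. TransitionPlanner, its Action enum, and the reduced State enum are hypothetical (Hadoop's StorageState has more recovery states than the two shown here), but the NORMAL, NON_EXISTENT, NOT_FORMATTED, and default branches mirror the switch above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of phase 1's decision logic; not Hadoop code.
class TransitionPlanner {
    // Reduced set of states; Hadoop's StorageState has more recovery states.
    enum State { NORMAL, NON_EXISTENT, NOT_FORMATTED, RECOVER_UPGRADE, RECOVER_ROLLBACK }

    enum Action { USE_AS_IS, DROP, FORMAT, RECOVER }

    static Action actionFor(State state) {
        switch (state) {
            case NORMAL:        return Action.USE_AS_IS;
            case NON_EXISTENT:  return Action.DROP;     // it.remove() in the listing
            case NOT_FORMATTED: return Action.FORMAT;   // format(sd, nsInfo)
            default:            return Action.RECOVER;  // sd.doRecover(curState)
        }
    }

    /** Keeps insertion order; directories that do not exist are left out. */
    static Map<String, Action> plan(Map<String, State> analyzed) {
        Map<String, Action> kept = new LinkedHashMap<>();
        for (Map.Entry<String, State> e : analyzed.entrySet()) {
            Action action = actionFor(e.getValue());
            if (action != Action.DROP) {
                kept.put(e.getKey(), action);
            }
        }
        return kept;
    }
}
```

Separating the decision from the file-system work makes it easy to see why a missing disk only shrinks the list of storage directories instead of aborting startup.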

The four operations:

format: create the VERSION file.
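The VERSION file is a plain Java properties file. The sketch below shows a format step under that assumption. FormatSketch is hypothetical, and the keys namespaceID, layoutVersion, cTime, and storageType are modeled on a DataNode VERSION file but may not match every Hadoop release exactly; storageID is left out for brevity.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Properties;

// Hypothetical sketch of format(): start with an empty current/ directory
// and write a VERSION properties file into it.
class FormatSketch {
    static File format(File storageRoot, int namespaceId, int layoutVersion, long cTime)
            throws IOException {
        File current = new File(storageRoot, "current");
        deleteRecursively(current);                      // discard any old contents
        if (!current.mkdirs()) {
            throw new IOException("Cannot create " + current);
        }
        Properties props = new Properties();
        props.setProperty("namespaceID", String.valueOf(namespaceId));
        props.setProperty("layoutVersion", String.valueOf(layoutVersion));
        props.setProperty("cTime", String.valueOf(cTime));
        props.setProperty("storageType", "DATA_NODE");
        File version = new File(current, "VERSION");
        try (OutputStream out = new FileOutputStream(version)) {
            props.store(out, null);                      // writes key=value lines
        }
        return version;
    }

    static void deleteRecursively(File f) throws IOException {
        File[] children = f.listFiles();                 // null if f is not a directory
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        if (f.exists() && !f.delete()) {
            throw new IOException("Cannot delete " + f);
        }
    }
}
```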

doUpgrade: upgrade the storage directory

1. Delete previous, if it exists.
2. Rename current to previous.tmp.
3. Hard-link the block files in previous.tmp into a new current directory.
4. Write the new VERSION file.
5. Rename previous.tmp to previous.
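Hard links are what make this upgrade cheap: previous and the new current share the same block data on disk, so the backup costs almost no extra space, and deleting a block from current later does not remove it from previous. The sketch below follows the five steps with java.nio.file. UpgradeSketch is hypothetical, the filter that links only blk_* files and subdir* directories is an assumption about which entries count as block data, and the VERSION it writes is simplified to a single key.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the upgrade steps; not Hadoop's implementation.
class UpgradeSketch {
    static void doUpgrade(Path root, int newLayoutVersion) throws IOException {
        Path current = root.resolve("current");
        Path previous = root.resolve("previous");
        Path previousTmp = root.resolve("previous.tmp");
        deleteRecursively(previous);                  // 1. drop any old backup
        Files.move(current, previousTmp);             // 2. current -> previous.tmp
        Files.createDirectory(current);
        linkBlocks(previousTmp, current);             // 3. hard-link blocks into new current
        writeVersion(current.resolve("VERSION"), newLayoutVersion); // 4. new VERSION
        Files.move(previousTmp, previous);            // 5. previous.tmp -> previous
    }

    // Links blk_* files and recreates subdir* directories; the old VERSION is
    // not linked because the new current gets its own VERSION in step 4.
    static void linkBlocks(Path from, Path to) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(from)) {
            for (Path src : entries) {
                String name = src.getFileName().toString();
                Path dst = to.resolve(name);
                if (Files.isDirectory(src) && name.startsWith("subdir")) {
                    Files.createDirectory(dst);
                    linkBlocks(src, dst);
                } else if (Files.isRegularFile(src) && name.startsWith("blk_")) {
                    Files.createLink(dst, src);       // second name for the same data, no copy
                }
            }
        }
    }

    // Simplified: only layoutVersion is written.
    static void writeVersion(Path file, int layoutVersion) throws IOException {
        Properties props = new Properties();
        props.setProperty("layoutVersion", String.valueOf(layoutVersion));
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, null);
        }
    }

    static void deleteRecursively(Path p) throws IOException {
        if (!Files.exists(p)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(p)) {
            // reverse lexicographic order puts children before their parents
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path q : paths) {
            Files.delete(q);
        }
    }
}
```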

doRollback: roll back to the previous version

1. Rename current to removed.tmp.
2. Rename previous to current.
3. Delete removed.tmp.
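Renaming current aside instead of deleting it first means an interrupted rollback leaves a recognizable removed.tmp directory behind, which is the kind of intermediate state the default recovery branch of recoverTransitionRead exists to handle. The sketch below is a hypothetical illustration of the three steps; returning false when previous is missing is an assumption made here so the caller can tell that there was nothing to roll back.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the rollback steps; not Hadoop's implementation.
class RollbackSketch {
    /** Returns false when there is no previous directory to roll back to. */
    static boolean doRollback(Path root) throws IOException {
        Path current = root.resolve("current");
        Path previous = root.resolve("previous");
        Path removedTmp = root.resolve("removed.tmp");
        if (!Files.isDirectory(previous)) {
            return false;                          // nothing to restore
        }
        Files.move(current, removedTmp);           // 1. current -> removed.tmp
        Files.move(previous, current);             // 2. previous -> current
        deleteRecursively(removedTmp);             // 3. delete removed.tmp
        return true;
    }

    static void deleteRecursively(Path p) throws IOException {
        if (!Files.exists(p)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(p)) {
            // reverse lexicographic order puts children before their parents
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path q : paths) {
            Files.delete(q);
        }
    }
}
```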

doFinalize: commit (finalize) the storage directory upgrade

1. Rename previous to finalized.tmp.
2. Delete finalized.tmp.
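Finalizing renames before deleting so that a crash in the middle of the deletion never leaves a half-deleted directory still named previous, which could be mistaken for a complete backup. Because the upgrade used hard links, deleting the old names does not touch the block data that current still references. The sketch below is a hypothetical illustration; returning false when previous is missing is an assumption made here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the finalize steps; not Hadoop's implementation.
class FinalizeSketch {
    /** Returns false when there is no previous directory, i.e. nothing to finalize. */
    static boolean doFinalize(Path root) throws IOException {
        Path previous = root.resolve("previous");
        Path finalizedTmp = root.resolve("finalized.tmp");
        if (!Files.isDirectory(previous)) {
            return false;
        }
        Files.move(previous, finalizedTmp);        // 1. previous -> finalized.tmp
        deleteRecursively(finalizedTmp);           // 2. delete finalized.tmp
        return true;
    }

    static void deleteRecursively(Path p) throws IOException {
        if (!Files.exists(p)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(p)) {
            // reverse lexicographic order puts children before their parents
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path q : paths) {
            Files.delete(q);
        }
    }
}
```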