Datanode storage can be divided into two parts:
1. Storage-related classes describe the organization of each storage directory at a high level. They manage the directories and files under the paths specified by the HDFS property dfs.data.dir (current, previous, detach, tmp, storage, and so on) and define the operations that apply to a storage directory as a whole.
2. Dataset-related classes describe the organization of block files and their metadata files, such as the file layout under the current directory and the operations on block files.
The split exists because the namenode also uses storage but does not store block files, so the storage logic shared by both is kept separate from the block-file logic.
Storage
First, look at the datanode configuration. A datanode's local data can be spread across multiple disks, configured as follows:
<property>
  <name>dfs.data.dir</name>
  <value>/data/HDFS/dfs/data,/data/HDFS/dfs/data2</value>
</property>
The directories /data/HDFS/dfs/data and /data/HDFS/dfs/data2 hold all the data managed by this datanode.
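As a rough illustration of how such a comma-separated value maps to individual storage paths, the sketch below splits the string into one File per entry. The class DataDirConfig and the method parseDataDirs are hypothetical names for this example; this is not how Hadoop itself reads the property.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop.
class DataDirConfig {

    /** Splits a comma-separated dfs.data.dir value into one File per storage path. */
    static List<File> parseDataDirs(String value) {
        List<File> dirs = new ArrayList<>();
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            // Skip empty entries caused by stray commas or whitespace.
            if (!trimmed.isEmpty()) {
                dirs.add(new File(trimmed));
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Each printed line is one local storage path of the datanode.
        for (File dir : parseDataDirs("/data/HDFS/dfs/data,/data/HDFS/dfs/data2")) {
            System.out.println(dir.getPath());
        }
    }
}
```

Each resulting path is what the rest of this article calls a local storage path.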
Inside one of these directories, the layout typically looks like this:
/current/VERSION
        /blk_<id_1>
        /blk_<id_1>.meta
        /blk_<id_2>
        /blk_<id_2>.meta
        /...
        /blk_<id_64>
        /blk_<id_64>.meta
        /subdir0/
        /subdir1/
        /...
        /subdir63/
/previous/
/detach/
/tmp/
/in_use.lock
/storage
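To make the naming scheme in the current directory concrete, here is a minimal sketch that classifies an entry by its file name. It assumes the layout shown above and that block IDs are decimal integers; the class BlockDirEntries and its patterns are hypothetical and only cover the names listed in this article (real Hadoop versions may use additional name components).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop.
class BlockDirEntries {

    enum Kind { BLOCK, META, SUBDIR, OTHER }

    // Assumed name formats, taken from the directory listing in this article.
    private static final Pattern BLOCK = Pattern.compile("blk_-?\\d+");
    private static final Pattern META = Pattern.compile("blk_-?\\d+\\.meta");
    private static final Pattern SUBDIR = Pattern.compile("subdir\\d+");

    /** Classifies an entry of the current directory by its file name. */
    static Kind classify(String name) {
        if (BLOCK.matcher(name).matches()) {
            return Kind.BLOCK;
        }
        if (META.matcher(name).matches()) {
            return Kind.META;
        }
        if (SUBDIR.matcher(name).matches()) {
            return Kind.SUBDIR;
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        String[] names = {"VERSION", "blk_1001", "blk_1001.meta", "subdir0"};
        for (String name : names) {
            System.out.println(name + " is " + classify(name));
        }
    }
}
```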
Here is a rough summary of the mapping relationships in the datanode:
StorageDirectory corresponds to one local storage path, such as /data/HDFS/dfs/data.
FSDir corresponds to current and its subdirectories subdir*.
DataStorage manages all local storage paths in a unified manner: it manages the StorageDirectory objects, not the individual data files inside each storage path.
Class diagram of DataStorage and related classes
In the preceding class diagram, NamespaceInfo carries the version number of the namespace of the whole HDFS cluster. This version number is generated when the namenode is formatted, and the datanode obtains it each time it starts up and registers with the namenode. On its first startup, the datanode saves the version number permanently. On later startups, it compares the saved version number with the one reported by the namenode; if they do not match, startup is terminated.
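The version check described above can be sketched roughly as follows. This is a simplified illustration, not the Hadoop implementation: the class NamespaceCheck, the method verifyNamespaceId, and the use of IllegalStateException are hypothetical choices for this example, and the saved version number is modeled as a nullable Integer, where null means the datanode has never started before.

```java
// Hypothetical helper, not part of Hadoop.
class NamespaceCheck {

    /**
     * Returns the namespace version the datanode should keep on disk.
     *
     * @param savedId    version saved by an earlier startup, or null on first startup
     * @param reportedId version reported by the namenode during registration
     */
    static int verifyNamespaceId(Integer savedId, int reportedId) {
        if (savedId == null) {
            // First startup: adopt the namenode's version and persist it.
            return reportedId;
        }
        if (savedId.intValue() != reportedId) {
            // Mismatch: this datanode belongs to a different namespace, so refuse to start.
            throw new IllegalStateException("Incompatible namespace versions: saved="
                    + savedId + ", namenode=" + reportedId);
        }
        return savedId;
    }

    public static void main(String[] args) {
        System.out.println(verifyNamespaceId(null, 42));
        System.out.println(verifyNamespaceId(42, 42));
    }
}
```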
DataStorage does most of its work when the datanode starts. StorageDirectory provides coarse-grained transactional operations, and DataStorage is what drives those operations on each StorageDirectory. The following happens when the datanode starts:
The recoverTransitionRead function constructs a StorageDirectory for each storage path, analyzes the current state of each path, and repairs any path that is in an abnormal state. Once all storage paths are in a correct state, it applies to each storage path the operation the user specified when starting the node (backup, upgrade, rollback, recovery, or finalize). If the operation succeeds on every storage path, it updates the storage version information under each path and rewrites the corresponding VERSION file. The main flow of recoverTransitionRead is as follows:
void recoverTransitionRead(NamespaceInfo nsInfo,
                           Collection<File> dataDirs,
                           StartupOption startOpt) throws IOException {
    // 1. For each data directory calculate its state and
    //    check whether all is consistent before transitioning.
    //    Format and recover.
    this.storageID = "";
    this.storageDirs = new ArrayList<StorageDirectory>(dataDirs.size());
    ArrayList<StorageState> dataDirStates = new ArrayList<StorageState>(dataDirs.size());
    for (Iterator<File> it = dataDirs.iterator(); it.hasNext();) { // for each local storage path
        File dataDir = it.next();
        StorageDirectory sd = new StorageDirectory(dataDir);
        StorageState curState;
        try {
            curState = sd.analyzeStorage(startOpt); // analyze the state of the local storage path
            // sd is locked but not opened
            switch (curState) {
            case NORMAL:
                break;
            case NON_EXISTENT:
                // ignore this storage
                LOG.info("Storage directory " + dataDir + " does not exist.");
                it.remove();
                continue;
            case NOT_FORMATTED:
                LOG.info("Storage directory " + dataDir + " is not formatted.");
                LOG.info("Formatting ...");
                format(sd, nsInfo); // format
                break;
            default: // recovery part is common
                sd.doRecover(curState); // restore to a normal state first
            }
        } catch (IOException ioe) {
            sd.unlock();
            throw ioe;
        }
        // add to the storage list
        addStorageDir(sd);
        dataDirStates.add(curState);
    }

    // 2. Do transitions.
    //    Each storage directory is treated individually.
    //    During startup some of them can upgrade or rollback
    //    while others could be up to date for the regular startup.
    for (int idx = 0; idx < getNumStorageDirs(); idx++) {
        // perform upgrade, rollback, or finalize on each local storage path in turn
        doTransition(getStorageDir(idx), nsInfo, startOpt);
    }

    // 3. Update all storages. Some of them might have just been formatted.
    this.writeAll();
}
The four operations:
format: create the VERSION file
doUpgrade: upgrade the storage directory
    1. Delete previous
    2. Rename current to previous.tmp
    3. Hard-link the files in previous.tmp into a new current
    4. Write the VERSION file
    5. Rename previous.tmp to previous
doRollback: roll back the storage directory
    1. Rename current to removed.tmp
    2. Rename previous to current
    3. Delete removed.tmp
doFinalize: commit (finalize) the storage directory upgrade
    1. Rename previous to finalized.tmp
    2. Delete finalized.tmp
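The directory shuffling behind doUpgrade, doRollback, and doFinalize can be sketched with java.nio.file as shown below. This is a simplified model under stated assumptions, not the Hadoop code: the class StorageTransitions and its helpers are hypothetical, the VERSION file is modeled as a plain string, all files other than VERSION are hard-linked, and the file system is assumed to support hard links and directory renames.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical model of the transitions, not part of Hadoop.
class StorageTransitions {

    /** Recursively deletes a directory tree if it exists. */
    static void deleteTree(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order lists children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    /** Hard-links every file under 'from' into 'to', skipping files named VERSION. */
    static void linkTree(Path from, Path to) throws IOException {
        Files.createDirectories(to);
        List<Path> entries;
        try (Stream<Path> list = Files.list(from)) {
            entries = list.collect(Collectors.toList());
        }
        for (Path p : entries) {
            Path target = to.resolve(p.getFileName());
            if (Files.isDirectory(p)) {
                linkTree(p, target);
            } else if (!p.getFileName().toString().equals("VERSION")) {
                Files.createLink(target, p);
            }
        }
    }

    static void doUpgrade(Path sd, String newVersion) throws IOException {
        Path current = sd.resolve("current");
        Path previous = sd.resolve("previous");
        Path tmp = sd.resolve("previous.tmp");
        deleteTree(previous);                 // 1. delete previous
        Files.move(current, tmp);             // 2. current -> previous.tmp
        linkTree(tmp, current);               // 3. hard-link previous.tmp into a new current
        Files.write(current.resolve("VERSION"),
                newVersion.getBytes(StandardCharsets.UTF_8)); // 4. write a fresh VERSION file
        Files.move(tmp, previous);            // 5. previous.tmp -> previous
    }

    static void doRollback(Path sd) throws IOException {
        Path removed = sd.resolve("removed.tmp");
        Files.move(sd.resolve("current"), removed);                // 1. current -> removed.tmp
        Files.move(sd.resolve("previous"), sd.resolve("current")); // 2. previous -> current
        deleteTree(removed);                                       // 3. delete removed.tmp
    }

    static void doFinalize(Path sd) throws IOException {
        Path finalized = sd.resolve("finalized.tmp");
        Files.move(sd.resolve("previous"), finalized); // 1. previous -> finalized.tmp
        deleteTree(finalized);                         // 2. delete finalized.tmp
    }

    public static void main(String[] args) throws IOException {
        Path sd = Files.createTempDirectory("storage-dir");
        Path current = Files.createDirectories(sd.resolve("current"));
        Files.write(current.resolve("VERSION"), "v1".getBytes(StandardCharsets.UTF_8));
        Files.write(current.resolve("blk_1"), "data".getBytes(StandardCharsets.UTF_8));
        doUpgrade(sd, "v2");
        System.out.println("previous exists after upgrade: " + Files.exists(sd.resolve("previous")));
        doFinalize(sd);
        System.out.println("previous exists after finalize: " + Files.exists(sd.resolve("previous")));
        deleteTree(sd);
    }
}
```

The temporary names (previous.tmp, removed.tmp, finalized.tmp) mark an operation that is still in progress, so a directory left behind with one of these names indicates an interrupted transition that must be recovered at the next startup.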