Balancing data between dfs.data.dir directories on a Hadoop DataNode

Source: Internet
Author: User

Problem: As the amount of data stored in the cluster grew, the DataNode disks became almost full (the previous setting was dfs.data.dir=/data/HDFS/dfs/data), and the disk monitoring program on each machine kept raising alarms.

Each machine's storage was doubled by adding a new disk, mounted at /data/HDFS/dfs/data2 (new setting: dfs.data.dir=/data/HDFS/dfs/data,/data/HDFS/dfs/data2). This raises a new problem: the original disk is still full and still triggers alarms. How can the data be balanced across the two disks?


Solution: Move some of the block subfolders from one dfs.data.dir directory to the other.

Principle: The NameNode (NN) stores the inode tree of the files in HDFS and the list of blocks that make up each file. Which DataNode (DN) machines hold each block is learned from the block reports that every DataNode sends when the cluster starts. When a DataNode starts, it scans the current directory under each dfs.data.dir folder (a folder is only valid if its VERSION file and the other required information files exist) and reports the blocks it finds there to the NN. For details, see the FSDataset code of the DataNode.
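Because the DataNode rebuilds its block list from whatever it finds on disk, one way to sanity-check a manual move is to count the block files under each directory before and after. The following is a minimal standalone sketch, not Hadoop code. The class and method names are hypothetical, and it assumes block files are named with a blk_ prefix and contain no dot (their metadata files end in .meta), which, as far as I know, is what Block.isBlockFilename checks in Hadoop releases of this era.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Hypothetical helper, not part of Hadoop: counts block files under storage directories. */
public class BlockCounter {

    /** Assumption: a block file is named blk_<id> with no dot; *.meta files are metadata. */
    public static boolean looksLikeBlockFile(String name) {
        return name.startsWith("blk_") && name.indexOf('.') < 0;
    }

    /** Recursively counts block files under root, for example .../data/current. */
    public static long countBlocks(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> looksLikeBlockFile(p.getFileName().toString()))
                .count();
        }
    }

    /** Prints the block count of every directory given on the command line, plus the total. */
    public static void main(String[] args) throws IOException {
        long total = 0;
        for (String dir : args) {
            long n = countBlocks(Paths.get(dir));
            System.out.println(dir + ": " + n + " block files");
            total += n;
        }
        System.out.println("total: " + total);
    }
}
```

Run it against both current directories while the DataNode is stopped. The per-directory counts change after the move, but the total should not.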


1. Stop the cluster.

2. Modify the dfs.data.dir configuration.

3. Start the cluster (start only HDFS first). The purpose of this step is to let the DataNode format /data/HDFS/dfs/data2 and create its system files and folders (for example: current, current/VERSION, detach, storage).

4. Use http://namenodeaddress:50070/fsck to check the file system and record the result, so that it can be compared with the fsck result after the change to confirm that the file system is still intact.

5. Stop the cluster.

6. Go to /data/HDFS/dfs/data/current and mv some of the larger subfolders (the system-generated ones named subdir**) to /data/HDFS/dfs/data2/current.

7. Start the cluster (HDFS first, to make checking easier).

8. Run http://namenodeaddress:50070/fsck again and compare the result with the earlier one. If nothing went wrong, the two should be the same.
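Choosing which subfolders to move can be scripted. The sketch below is only an illustration under my own assumptions: the class and method names are hypothetical, and it merely plans the move (it lists the largest subdir* folders until a target number of bytes is reached) and leaves the actual mv to you, since the two directories sit on different disks. Run it only while the DataNode is stopped.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

/** Hypothetical planning helper, not part of Hadoop. */
public class SubdirMovePlanner {

    /** Total size in bytes of all regular files under dir. */
    public static long sizeOf(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    /**
     * Returns the names of the subdir* folders directly under srcCurrent, largest
     * first, that together hold at least bytesToMove bytes (or all of them if
     * they hold less than that).
     */
    public static List<String> planMove(Path srcCurrent, long bytesToMove) throws IOException {
        Map<String, Long> sizes = new HashMap<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(srcCurrent, "subdir*")) {
            for (Path p : ds) {
                if (Files.isDirectory(p)) {
                    sizes.put(p.getFileName().toString(), sizeOf(p));
                }
            }
        }
        List<String> names = new ArrayList<>(sizes.keySet());
        // Largest folder first; ties are broken by name so the plan is deterministic.
        names.sort((a, b) -> {
            int c = Long.compare(sizes.get(b), sizes.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        List<String> plan = new ArrayList<>();
        long planned = 0;
        for (String name : names) {
            if (planned >= bytesToMove) {
                break;
            }
            plan.add(name);
            planned += sizes.get(name);
        }
        return plan;
    }

    /** Usage: java SubdirMovePlanner <srcCurrent> <dstCurrent> <bytesToMove> */
    public static void main(String[] args) throws IOException {
        Path src = Paths.get(args[0]);
        for (String name : planMove(src, Long.parseLong(args[2]))) {
            // Only prints the commands; nothing is moved by this program.
            System.out.println("mv " + src.resolve(name) + " " + args[1] + "/");
        }
    }
}
```

Passing roughly half of the used space on the old disk as bytesToMove gives an approximately even split. Blocks stored directly in current, outside any subdir* folder, are left where they are.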


PS: The same experiment was carried out on the dev machines first, after backing up the dev data files. The dev cluster has now been running for a while without problems.

Source code:

In the DataNode, each dfs.data.dir folder corresponds to an FSDir object, and each subfolder inside it corresponds to an FSDir as well. Every time the DataNode starts, it creates an FSDir object for each dfs.data.dir folder, and the constructor counts the blocks stored under that FSDir. Because the constructor works from whatever folders exist on disk when the DataNode starts, manually moved subfolders are picked up. This is the principle behind the change.

    public FSDir(File dir) throws IOException {
      this.dir = dir;
      this.children = null;
      // If the folder does not exist, create it.
      // Existing folders are never overwritten or deleted, which is what
      // makes moving blocks by hand possible.
      if (!dir.exists()) {
        if (!dir.mkdirs()) {
          throw new IOException("Mkdirs failed to create " + dir.toString());
        }
      } else {
        File[] files = dir.listFiles();
        int numChildren = 0;
        // Count the subfolders (child FSDirs) and the files (blocks).
        for (int idx = 0; idx < files.length; idx++) {
          if (files[idx].isDirectory()) {
            numChildren++;
          } else if (Block.isBlockFilename(files[idx])) {
            numBlocks++;
          }
        }
        // Create an FSDir object for each subfolder.
        if (numChildren > 0) {
          children = new FSDir[numChildren];
          int curdir = 0;
          for (int idx = 0; idx < files.length; idx++) {
            if (files[idx].isDirectory()) {
              children[curdir] = new FSDir(files[idx]);
              curdir++;
            }
          }
        }
      }
    }


Code for adding a block to an FSDir:

    /**
     * Moves (renames) a block file in tmp and its meta file into current.
     * The caller first calls file = addBlock(b, src, false, false) and then
     * addBlock(b, src, true, true).
     * boolean createOk: whether child FSDirs may be created inside a child
     *   FSDir; if no subdirectory has room (all are full), new subdirectories
     *   can be created inside the subdirectories.
     * boolean resetIdx: false means continue from the child used last time.
     */
    private File addBlock(Block b, File src, boolean createOk,
                          boolean resetIdx) throws IOException {
      // This directory is not full: add the block directly to it.
      if (numBlocks < maxBlocksPerDir) {
        File dest = new File(dir, b.getBlockName());
        File metaData = getMetaFile(src, b);
        File newmeta = getMetaFile(dest, b);
        if (!metaData.renameTo(newmeta) || !src.renameTo(dest)) {
          throw new IOException("could not move files for " + b +
                                " from tmp to " + dest.getAbsolutePath());
        }
        if (DataNode.LOG.isDebugEnabled()) {
          DataNode.LOG.debug("addBlock: Moved " + metaData + " to " + newmeta);
          DataNode.LOG.debug("addBlock: Moved " + src + " to " + dest);
        }
        numBlocks += 1;
        return dest;
      }

      // The block has to go into a subdirectory, so pick one.
      // resetIdx picks a random child to start from, which affects how evenly
      // blocks are spread across the child FSDirs.
      if (lastChildIdx < 0 && resetIdx) {
        // reset so that all children will be checked
        lastChildIdx = random.nextInt(children.length);
      }

      // Starting from lastChildIdx, find a child that can take the block.
      if (lastChildIdx >= 0 && children != null) {
        // Check if any child-tree has room for a block.
        for (int i = 0; i < children.length; i++) {
          int idx = (lastChildIdx + i) % children.length;
          // While adding to a child, try not to create subdirectories inside it.
          File file = children[idx].addBlock(b, src, false, resetIdx);
          if (file != null) {
            lastChildIdx = idx;
            return file;
          }
        }
        lastChildIdx = -1;
      }

      if (!createOk) {
        return null;
      }

      // Children are created only when there are none. So after subfolders have
      // been moved in by hand, this level contains only those few moved folders.
      // Without manual intervention, the DataNode creates maxBlocksPerDir
      // subfolders at once.
      // maxBlocksPerDir is set by dfs.datanode.numblocks (default 64). It is both
      // the maximum number of blocks and the maximum number of child FSDirs
      // in each FSDir.
      if (children == null || children.length == 0) {
        children = new FSDir[maxBlocksPerDir];
        for (int idx = 0; idx < maxBlocksPerDir; idx++) {
          children[idx] = new FSDir(new File(dir, DataStorage.BLOCK_SUBDIR_PREFIX + idx));
        }
      }

      // Now pick a child randomly for creating a new set of subdirs.
      lastChildIdx = random.nextInt(children.length);
      return children[lastChildIdx].addBlock(b, src, true, false);
    }

I pasted the code above because I was initially worried that the DataNode might have some mechanism that clears files or folders you add by hand, or that removes subfolders that do not follow its conventions (normally, when subfolders are created, the 64 folders subdir0 to subdir63 are created at once, as the second code listing shows). After reading the code, I found no such logic.
