HDFS Custom Small File Analysis Feature

Preface

After reading the title, some readers may wonder: why link HDFS to small file analysis? Isn't Hadoop designed to favor large files rather than files smaller than its storage unit? What is the practical use of such a feature? There is actually a lot to say about small files in HDFS. The concern is not how small each file is, but how many of them there are. Too many files arise because the external program writes too little data into each individual file; for the same total amount of data written, files that are too small obviously result in a large number of small files being generated. Many people do not think much of small files, reasoning that the overall data volume stays stable, so the impact on HDFS cannot be large. This reasoning overlooks the indirect effect of small files: the growth of the metadata. That growth directly aggravates the load on the NameNode, because the NameNode has to keep track of all these blocks and store their meta information in its own memory. Hence the topic of this article: by adapting HDFS to provide a small file analysis feature, we can locate the small files in the cluster and ultimately mitigate the NameNode load.

Background Knowledge

Before explaining the custom small file analysis feature, it is necessary to understand some relevant parts of existing HDFS, covering the following 4 points:

1. NameNode metadata storage
2. The impact of excessive NameNode memory
3. Existing solutions for excessive NameNode memory
4. The existing file analysis feature of HDFS

NameNode Metadata Storage

Many Hadoop users tend not to manage the files they write in day-to-day use, resulting in a lot of small files, some of them only around 10 bytes. This effectively turns HDFS into a small file system and makes it hard to exploit its real strengths. Given this massive amount of metadata, how can we see, in one step, how much of the NameNode's memory it accounts for? Here is a simple way: the jmap command shipped with Java, used as follows:

jmap -histo:live <pid>

Here the process ID should be the NameNode process ID. (If you want to try it, it is recommended to do it on the standby NameNode, because if the NameNode memory is very large, jmap will take longer and will have an impact on the process itself.) Here is what I got on my test cluster:

 num     #instances         #bytes  class name
----------------------------------------------
   1:        xxxx            xxxx   org.apache.hadoop.hdfs.server.namenode.INodeFile
   2:        xxxx            xxxx   [Ljava.lang.Object;
   3:        xxxx            xxxx   org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoContiguous
   4:        xxxx            xxxx   [B
   5:        xxxx            xxxx   [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfoContiguous;
   6:        xxxx            xxxx   [Lorg.apache.hadoop.util.LightWeightGSet$LinkedElement;
   7:        xxxx            xxxx   org.apache.hadoop.hdfs.server.namenode.INodeDirectory
   8:        xxxx            xxxx   java.util.ArrayList
   9:        xxxx            xxxx   org.apache.hadoop.hdfs.protocol.HdfsFileStatus
  10:        xxxx            xxxx   org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload
  11:        xxxx            xxxx   <constMethodKlass>

We mainly care about how the class objects above are ranked; the specific values you can obtain by testing on your own cluster. Basically INodeFile is the most numerous: if the cluster contains tens of millions of files, the instance count of this object will also reach tens of millions. Right behind it is BlockInfoContiguous, which is also very numerous, because it stores the location information of adjacent replica blocks; specifically, it keeps the details of the previous replica and the next replica of the current block. We also see the INodeDirectory object here, which corresponds to the meta information of directories. Finally, I want to correct a misunderstanding many people may have about HDFS in-memory metadata:

Many people may feel that a new file in HDFS simply means a few extra KB of file meta information for the NameNode, so adding, say, a million files should not be a big problem. But we only see the visible increase in data; the associated objects behind it are easily ignored, such as the BlockInfoContiguous objects.
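
To make that hidden cost concrete, here is a rough back-of-the-envelope sketch. It is only an approximation: the commonly cited figure of roughly 150 bytes of NameNode heap per namespace object (an inode or a block) is a rule of thumb, and the class name and numbers below are purely illustrative.

// Rough estimate of extra NameNode heap caused by small files, assuming the
// commonly cited approximation of ~150 bytes of heap per namespace object.
// The real cost varies with file name length, replication, JVM and version.
public class NameNodeHeapEstimate {
  private static final long BYTES_PER_OBJECT = 150L;  // approximation only

  public static void main(String[] args) {
    long smallFiles = 1_000_000L;   // e.g. a million small files
    long blocksPerFile = 1L;        // a small file occupies a single block
    // each file contributes one INodeFile plus its block objects
    long objects = smallFiles * (1 + blocksPerFile);
    long heapBytes = objects * BYTES_PER_OBJECT;
    System.out.printf("~%d MB of extra NameNode heap for %d small files%n",
        heapBytes / (1024 * 1024), smallFiles);
  }
}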

The Impact of Excessive NameNode Memory

Too much NameNode metadata causes it to use too much memory, and high NameNode memory usage easily triggers GC, which in turn affects the response speed of file operations across the whole cluster. In particular, when large volumes of file requests arrive at the same time, you may sometimes find that the NameNode responds very slowly.

Existing Solutions for Excessive NameNode Memory

If the NameNode memory has already reached a very large value, what existing solutions do we have? Here is the answer:

Use HDFS Federation to spread the pressure of storing metadata on a single NameNode across multiple namespaces.

A simple deployment of HDFS Federation can be done with ViewFs; for ViewFs-related content, see my other article on the HDFS cross-cluster data consolidation scheme ViewFileSystem. But then again, if we do not deal with the source of the small file problem, no number of namespaces will help: the NameNode memory will keep growing rapidly and eventually hit a bottleneck.
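
As a quick illustration of the ViewFs idea, the sketch below wires one client-side namespace over two federated nameservices. The mount table name (cluster), the nameservice names (ns1, ns2) and the paths are made up for illustration; in a real deployment these fs.viewfs.mounttable.* entries normally live in core-site.xml, and the nameservices must be defined in hdfs-site.xml, rather than being set in code like this.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ViewFsMountSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "viewfs://cluster/");
    // /user is served by the first nameservice, /data by the second,
    // so each NameNode only has to hold part of the namespace metadata
    conf.set("fs.viewfs.mounttable.cluster.link./user", "hdfs://ns1/user");
    conf.set("fs.viewfs.mounttable.cluster.link./data", "hdfs://ns2/data");

    FileSystem fs = FileSystem.get(URI.create("viewfs://cluster/"), conf);
    // Paths such as /user/... are transparently routed to hdfs://ns1
    System.out.println("Client namespace: " + fs.getUri());
  }
}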

HDFS's Existing File Analysis Feature

Among the existing HDFS file analysis features there is no small file analysis as such; the closest thing today is the FileDistribution processor of the hdfs oiv command. You can think of it as a file classifier. It works as follows:

It offline-parses the NameNode fsimage file, counts the parsed files by size interval, and then outputs the statistics.

The range and number of intervals are determined by two user-supplied values: maxSize, the maximum file size considered, and step, the width of each interval. For example, if we set maxSize to 10 MB and step to 2 MB, the range is divided into 5 + 1 intervals; the +1 is because 0-0 is counted as its own interval. The intervals are then 0-0, 0-2, 2-4, and so on up to 8-10, and anything larger than 10 MB is counted in the last interval, i.e. 8-10 MB. There are other parameters under the hdfs oiv command that can be used in practice; because the image file is analyzed offline, there is no effect on the online NameNode. Since the FileDistribution processor is the closest existing feature to the small file analysis we want, we can build on this command and modify it. As for how the small file feature is implemented, keep reading.
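
To make the interval logic concrete, here is a small standalone sketch using the same bucket formula that appears later in the run() method, and assuming the distribution array is sized as 1 + maxSize / step as in the stock calculator. The sample file sizes are arbitrary.

public class FileDistributionBucketSketch {
  public static void main(String[] args) {
    long maxSize = 10L * 1024 * 1024;  // 10 MB
    int steps = 2 * 1024 * 1024;       // 2 MB per interval
    // 5 + 1 buckets: [0-0], (0-2], (2-4], (4-6], (6-8], (8-10]
    int[] distribution = new int[1 + (int) (maxSize / steps)];

    long[] sampleSizes = {0L, 512L * 1024, 3L * 1024 * 1024, 64L * 1024 * 1024};
    for (long fileSize : sampleSizes) {
      // same formula as the FileDistribution processor
      int bucket = fileSize > maxSize ? distribution.length - 1
          : (int) Math.ceil((double) fileSize / steps);
      distribution[bucket]++;
      System.out.println("fileSize=" + fileSize + " bytes -> bucket " + bucket);
    }
  }
}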

HDFS Small File Analysis: Principle and Design

In this section, let's talk about the principle and design of the small file analysis feature: how do we implement it? First, we need to be clear about one principle:

The small file analysis feature only analyzes; it does not act on what it finds. The analysis results are there to help subsequent processing.

Because the small files being analyzed are existing HDFS business data files, processing them directly is obviously not a reasonable approach; the results of the analysis are used to guide subsequent improvements and optimizations. So in this modification, we only need to output the information about the files we consider small. But we still need to work out some details, such as the following:

1. How do we ensure that the new functionality does not affect the original behavior of the command?
2. How do we define the standard for a small file, i.e. below what size do we consider a file small?
3. Can we count only the small files under a given path, rather than doing a full scan?

Below I go through these points one by one, combined with my subsequent implementation.

First, point 1: we can preserve the original behavior by adding parameters. I added a new -printSmallFiles parameter to the FileDistribution processor, indicating that the user wants to perform the additional small file analysis.

Point 2: we can define a small file as any file falling into the 0 interval or the first interval above it, i.e. the smallest intervals. For example, if we define step as 1 MB, then files in the 0-1 MB range (including files of size 0) are considered small files, and we output their information.

Point 3: to analyze only a specified path, we can perform file path matching during the internal counting; if the path of the file currently being analyzed does not match the path given in the command parameters, we simply skip it. In the implementation below, I use the -prefixPath parameter as the path pattern to match; it is followed by one or more paths to analyze, separated by commas.

HDFS Small File Analysis Feature Implementation

If you have fully understood the design in the previous section, the code will look very simple. Here I give a brief walkthrough; at the end of the article there is a link to a patch based on hadoop-2.7.1 and the complete .java file.

Initialization

First comes the handling of our 2 newly added parameters. In the OfflineImageViewerPB class, we first add the definitions of the 2 new options:

  private static Options buildOptions() {
    Options options = new Options();

    ...
    options.addOption("maxSize", true, "");
    options.addOption("step", true, "");
    // The 2 newly added options, so they can be parsed later
    options.addOption("prefixPath", true, "");
    options.addOption("printSmallFiles", false, "");
    options.addOption("addr", true, "");
    ...

    return options;
  }

Then the parsing of the command parameters:

  public static int run(String[] args) throws Exception {
    Options options = buildOptions();

    ...
    Configuration conf = new Configuration();
    try (PrintStream out = outputFile.equals("-") ?
        System.out : new PrintStream(outputFile, "UTF-8")) {
      switch (processor) {
      case "FileDistribution":
        long maxSize = Long.parseLong(cmd.getOptionValue("maxSize", "0"));
        int step = Integer.parseInt(cmd.getOptionValue("step", "0"));
        // The new parameters are parsed and passed into the calculator object
        String prefixPath = cmd.getOptionValue("prefixPath", "");
        boolean printSmallFiles = cmd.hasOption("printSmallFiles");
        out.println("maxSize: " + maxSize + ", step: " + step
            + ", printSmallFiles: " + printSmallFiles
            + ", prefixPath: " + prefixPath);
        new FileDistributionCalculator(conf, maxSize, step, out,
            printSmallFiles, prefixPath)
            .visit(new RandomAccessFile(inputFile, "r"));
        break;
      ...

Next we move into the FileDistributionCalculator class, where I have added several new member variables:

final class FileDistributionCalculator {
  private final static long MAX_SIZE_DEFAULT = 0x2000000000L;  // 1/8 TB = 2^37
  private final static int INTERVAL_DEFAULT = 0x200000;        // 2 MB = 2^21
  private final static int MAX_INTERVALS = 0x8000000;          // 128 M = 2^27
  ...

  // Whether to print small file info
  private boolean printSmallFiles;
  // The list of target prefix strings to match
  private String[] prefixStrs;
  // Small file counts: parent directory -> count
  private HashMap<String, Integer> smallFilesMap;
  // In-memory metadata store for directory information
  private InMemoryMetadataDB metadataMap = null;
  ...

These variables are then assigned in the constructor:

  FileDistributionCalculator(Configuration conf, long maxSize, int steps,
      PrintStream out, boolean printSmallFiles, String prefixPath) {
    this.conf = conf;
    this.maxSize = maxSize == 0 ? MAX_SIZE_DEFAULT : maxSize;
    ...
    this.printSmallFiles = printSmallFiles;
    this.smallFilesMap = new HashMap<String, Integer>();

    if (this.prefixPath != null && this.prefixPath.length() > 0) {
      out.println("prefixPath: " + this.prefixPath);
      this.prefixStrs = this.prefixPath.split(",");
      out.println("prefixStrs: " + prefixStrs);
    }

    if (this.printSmallFiles) {
      this.metadataMap = new InMemoryMetadataDB();
    }
  }

Transforming the FsImage Parsing Process

In the existing parsing of the image by the hdfs oiv FileDistribution processor, only file-level information is resolved; the parent directory information is not, which means we cannot obtain the full path of a file. So we need to make a small modification here.

Borrowing from the full fsimage parsing process, I added the following code to parse directory-related information and keep it in an in-memory object (this step adds some extra time to the parsing of the image file):

  void visit(RandomAccessFile file) throws IOException {
    if (!FSImageUtil.checkFileFormat(file)) {
      throw new IOException("Unrecognized FSImage");
    }
    FileSummary summary = FSImageUtil.loadSummary(file);
    try (FileInputStream in = new FileInputStream(file.getFD())) {
      // Newly added code: load directory inodes; skipped unless -printSmallFiles was given
      if (printSmallFiles) {
        ImmutableList<Long> refIdList = null;
        for (FileSummary.Section section : summary.getSectionsList()) {
          if (SectionName.fromString(section.getName()) == SectionName.INODE_REFERENCE) {
            in.getChannel().position(section.getOffset());
            InputStream is = FSImageUtil.wrapInputStreamForCompression(conf,
                summary.getCodec(), new BufferedInputStream(
                    new LimitInputStream(in, section.getLength())));
            // Load INodeReferences so all inodes can be processed; snapshots are ignored for now
            out.println("Loading inode references");
            refIdList = FSImageLoader.loadINodeReferenceSection(is);
          }
        }
        for (FileSummary.Section s : summary.getSectionsList()) {
          if (SectionName.fromString(s.getName()) == SectionName.INODE) {
            in.getChannel().position(s.getOffset());
            InputStream is = FSImageUtil.wrapInputStreamForCompression(conf,
                summary.getCodec(), new BufferedInputStream(
                    new LimitInputStream(in, s.getLength())));
            loadDirectoriesInINodeSection(is);
          }
        }
        for (FileSummary.Section s : summary.getSectionsList()) {
          if (SectionName.fromString(s.getName()) == SectionName.INODE_DIR) {
            in.getChannel().position(s.getOffset());
            InputStream is = FSImageUtil.wrapInputStreamForCompression(conf,
                summary.getCodec(), new BufferedInputStream(
                    new LimitInputStream(in, s.getLength())));
            buildNamespace(is, refIdList);
          }
        }
      }
      ...
    }
  }
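
The patch relies on an in-memory metadata map (loadDirectoriesInINodeSection, buildNamespace and the InMemoryMetadataDB used later) whose code is not reproduced in this article. Conceptually it only has to remember directory names and child-to-parent links so that a parent path can be rebuilt for any inode. The hypothetical stand-in below sketches that idea; it is not the class used in the patch.

import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, for illustration only: remembers every directory
// inode and every child -> parent link so that getParentPath() can later
// rebuild the absolute path of a file's parent directory.
final class SimpleDirectoryMap {
  private final Map<Long, String> dirNames = new HashMap<Long, String>();
  private final Map<Long, Long> parents = new HashMap<Long, Long>();

  void addDirectory(long inodeId, String name) {
    dirNames.put(inodeId, name);
  }

  void addChild(long parentId, long childId) {
    parents.put(childId, parentId);
  }

  String getParentPath(long inodeId) {
    StringBuilder path = new StringBuilder();
    Long current = parents.get(inodeId);
    while (current != null) {
      String name = dirNames.get(current);
      if (name == null || name.isEmpty()) {
        break;  // reached the root inode, which has an empty name
      }
      path.insert(0, "/" + name);
      current = parents.get(current);
    }
    return path.length() == 0 ? "/" : path.toString();
  }
}
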
Modifying the FileDistribution Statistic Counting

The changes to the statistic counting are mainly in the run method. The code is as follows:

  private void run(InputStream in) throws IOException {
    INodeSection s = INodeSection.parseDelimitedFrom(in);
    for (int i = 0; i < s.getNumInodes(); ++i) {
      INodeSection.INode p = INodeSection.INode.parseDelimitedFrom(in);
      if (p.getType() == INodeSection.INode.Type.FILE) {
        ...

        // Determine whether small file output is requested and whether the
        // file falls into the first 2 (smallest) intervals
        if (printSmallFiles && (bucket == 1 || bucket == 0)) {
          // Count the small file
          increaseSmallFilesCount(prefixStrs, p.getId(), p.getName()
              .toStringUtf8());
        }

      } else if (p.getType() == INodeSection.INode.Type.DIRECTORY) {
        ++totalDirectories;
      }
      ...
    }
  }

Now let's look inside increaseSmallFilesCount:

  private void increaseSmallFilesCount(String[] prefixPaths, long nodeId,
      String pathStr) {
    int count = 0;
    String parentPath = "";
    // For small file counting, we only need the parent directory of the file
    try {
      parentPath = metadataMap.getParentPath(nodeId);
    } catch (Exception e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    }

    boolean isMatch = false;
    if (prefixPaths == null || prefixPaths.length == 0) {
      isMatch = true;
    } else {
      // Path matching: only keep the paths we want to match
      for (String str : prefixPaths) {
        if (str != null && str.length() > 0 && parentPath.startsWith(str)) {
          isMatch = true;
          break;
        }
      }
    }

    // Judge if the parentPath matches the target prefixPath
    if (!isMatch) {
      return;
    }

    // Retrieve the existing count
    if (!smallFilesMap.containsKey(parentPath)) {
      count = 0;
    } else {
      count = smallFilesMap.get(parentPath);
    }

    // Update the count
    count++;
    smallFilesMap.put(parentPath, count);
  }

It is important to point out that when storing the statistics we do not need to keep a per-file record, because that would produce far too many entries (and file names rarely repeat). Instead, the strategy here is to aggregate on the parent directory: how many small files does this directory contain?
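
How the collected map is finally written out is not shown above; a minimal sketch of such an output step might look like the hypothetical helper below (the method name and output format are made up for illustration):

  // Hypothetical helper, not part of the original patch: print the aggregated
  // parent-directory counts once the inode section has been fully processed.
  private void dumpSmallFiles(PrintStream out) {
    out.println("Small files (parent directory : count):");
    for (java.util.Map.Entry<String, Integer> entry : smallFilesMap.entrySet()) {
      out.println(entry.getKey() + " : " + entry.getValue());
    }
  }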

Test Result Output

Here are the results of a run on my test cluster.

Enter the test command:

hdfs oiv -p FileDistribution -printSmallFiles -prefixPath /user -i inputfile -o outputfile

The test output results are omitted here.

For the full code of the HDFS small file analysis feature, follow the source link at the end of the article; it contains the hadoop-2.7.1-based patch and the complete class file, which you can apply to your own code with git apply.

Optimizations to the FileDistribution Processing Flow

While implementing the small file analysis feature, I made 2 optimizations to the FileDistribution processing flow.

First, I fixed an array out-of-bounds error that occasionally occurs when FileDistribution computes the file distribution. This out-of-bounds exception directly aborts the current parsing run, which is a very bad experience for the user. I avoided the exception by adding the following code:

  private void run(InputStream in) throws IOException {
    ...
    maxFileSize = Math.max(fileSize, maxFileSize);
    totalSpace += fileSize * f.getReplication();
    ...
    int bucket = fileSize > maxSize ? distribution.length - 1
        : (int) Math.ceil((double) fileSize / steps);
    if (bucket >= distribution.length) {
      bucket = distribution.length - 1;
      out.println("Bucket index is out of index, fileSize: " + fileSize
          + ", step: " + steps);
    }

This exception generally occurs when maxSize is not evenly divisible by the step value. The detailed scenario is described in HDFS-10691, which I filed; I also provided a patch, which has been accepted by the community.
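
As a worked example of that boundary case (assuming the distribution array is sized as 1 + maxSize / step, which is how the stock calculator allocates it): with maxSize = 10 MB and step = 3 MB the array has 4 slots, but a file of exactly 10 MB is not larger than maxSize, so it falls through to the ceil() branch and asks for bucket 4.

public class Hdfs10691Demo {
  public static void main(String[] args) {
    long maxSize = 10L * 1024 * 1024;   // 10 MB
    int steps = 3 * 1024 * 1024;        // 3 MB: does not divide 10 MB evenly
    int[] distribution = new int[1 + (int) (maxSize / steps)];  // length 4 (indices 0..3)

    long fileSize = maxSize;            // exactly 10 MB, so fileSize > maxSize is false
    int bucket = (int) Math.ceil((double) fileSize / steps);    // ceil(10/3) = 4
    System.out.println("bucket=" + bucket + ", length=" + distribution.length);
    // Without the guard shown above, distribution[bucket]++ would throw
    // ArrayIndexOutOfBoundsException here.
  }
}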

Second, I optimized the output of the statistics to improve its readability. The first impression of the current FileDistribution processor output is that it is not very easy to understand, as follows:

Size            NumFiles
0               3982
2097152         444827
4194304         877
6291456         587
8388608         216
10485760        315
12582912
14680064        215
16777216        108
18874368        84
...

It shows raw byte sizes with their corresponding counts. I changed it to display proper intervals and to convert the byte sizes into a human-readable format, as follows:

Size            NumFiles
(0, 0]          3937
(0, 2 M]        447105
(2 M, 4 M]      795
(4 M, 6 M]      530
(6 M, 8 M]      171
(8 M, 10 M]     305
(10 M, 12 M]
...

For the user, this display is much friendlier than the original.
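
The conversion to readable interval labels can be done with Hadoop's own StringUtils.byteDesc() helper; the short sketch below shows the general idea (the loop and label format here are illustrative rather than the exact code in the patch):

import org.apache.hadoop.util.StringUtils;

public class ReadableIntervalLabels {
  public static void main(String[] args) {
    long step = 2L * 1024 * 1024;   // 2 MB per interval
    int numBuckets = 6;             // [0-0] plus five 2 MB intervals

    System.out.println("(0, 0]");   // zero-length files get their own bucket
    for (int i = 1; i < numBuckets; i++) {
      String lower = (i == 1) ? "0" : StringUtils.byteDesc((i - 1) * step);
      String upper = StringUtils.byteDesc(i * step);
      System.out.println("(" + lower + ", " + upper + "]");
    }
  }
}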

Related Links

1. https://issues.apache.org/jira/browse/HDFS-10691
2. Implementation source: https://github.com/linyiqun/open-source-patch/tree/master/hdfs/others/HDFS-OIV-PrintSmallFiles
