ViewFileSystem: A Data Merging Scheme for Cross-Cluster HDFS

Preface

In many cases we run into the need to merge data. For example, there were originally clusters A and B, and the administrators later decide that having 2 clusters makes data access inconvenient, so they want to merge them into one larger cluster with all the data in one place. One way to do this is to copy the data across clusters with Hadoop's DistCp tool. Of course, this brings plenty of problems if the data volume is very large. This article introduces another solution, ViewFileSystem, which can be called the view file system. Its main idea is to keep a single unified logical view across the different clusters while each cluster continues to manage itself.

Traditional Data Consolidation Scheme

For comparison, let's first describe the common practice for merging data: relocating it. Taking HDFS as an example, a remote copy usually means the DistCp tool. Although DistCp exists for exactly this purpose, the following problems appear as the data volume grows:

1. The copy period is too long. If the data volume is very large and the total bandwidth of the machine room is limited, copying will take a very long time.
2. While the copy is in progress, the original data keeps changing, and how to synchronize those changes also has to be considered.

These 2 points are, in my view, the most prominent problems. Next comes a proper introduction to the concept of ViewFileSystem, which may not be well known to many people.

ViewFileSystem: The View File System

The preface has already partly mentioned the concept of ViewFileSystem. First of all, understand one core principle:

ViewFileSystem is not a new file system; it is only a logical view file system, unique at the logical level.

How should this sentence be understood? ViewFileSystem does one thing for you:

It maps the real file paths of the individual clusters to newly defined ViewFileSystem paths.

The meaning of the above sentence is just like a mount in a file system. Going further, ViewFileSystem maintains a mount table on each client; this table records the pointing relationship from view-file-system paths to physical cluster paths. The mount table of course holds more than one relationship; there can be many. For example, the many-to-one relationships shown below:

/user,          hdfs://nn1/containinguserdir/user
/project/foo,   hdfs://nn2/projects/foo
/project/bar,   hdfs://nn3/projects/bar

The left side is the path in ViewFileSystem; the right side is the real cluster path it represents. For example, the client path /project/foo/data.txt would resolve to hdfs://nn2/projects/foo/data.txt. So you can understand that what ViewFileSystem really does is path resolution, i.e. route resolution. Here's a simple schematic:

How this is configured and used is given later in this article.

ViewFileSystem's Internal Implementation

From the above we basically know that ViewFileSystem plays a route-resolution role; the real requests are still processed inside each real cluster. How is this "route resolution" implemented inside ViewFileSystem? Keep reading.

Directory Mount Points

Since everything revolves around route resolution, the design of the mount point is very important. Let's look at how this class is defined in ViewFileSystem:

  static public class MountPoint {
    // source path: the src of the mount
    private Path src;
    // target paths the directory points to -- the real paths; there can be several
    private URI[] targets;  // targets of the mount; multiple targets imply a merge mount
    MountPoint(Path srcPath, URI[] targetURIs) {
      src = srcPath;
      targets = targetURIs;
    }
    Path getSrc() {
      return src;
    }
    URI[] getTargets() {
      return targets;
    }
  }

In general a mount point is one-to-one, but if different clusters have directories with the same name, it can be one-to-many; in Hadoop this is called a merge mount. That feature is not yet complete and is still under development; see the related issue HADOOP-8298.

Parsing and Storing Mount Points

In ViewFileSystem's initialization, parsing and storing the mount points is a very important step. The result of that process lives in the following field:

  InodeTree<FileSystem> fsState;  // the FS state, i.e. the mount table

Enter the initialize implementation of ViewFileSystem:

  public void initialize(final URI theUri, final Configuration conf)
      throws IOException {
    super.initialize(theUri, conf);
    setConf(conf);
    config = conf;
    // now build the client-side view (i.e. the client-side mount table) from config
    final String authority = theUri.getAuthority();
    try {
      myUri = new URI(FsConstants.VIEWFS_SCHEME, authority, "/", null, null);
      // pass the conf information in to initialize fsState
      fsState = new InodeTree<FileSystem>(conf, authority) {
        ...

Then step into the InodeTree constructor:

  protected InodeTree(final Configuration config, final String viewName)
      throws UnsupportedFileSystemException, URISyntaxException,
    ...

    final String mtPrefix = Constants.CONFIG_VIEWFS_PREFIX + "." +
                            vName + ".";
    final String linkPrefix = Constants.CONFIG_VIEWFS_LINK + ".";
    final String linkMergePrefix = Constants.CONFIG_VIEWFS_LINK_MERGE + ".";
    boolean gotMountTableEntry = false;
    final UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    for (Entry<String, String> si : config) {
      final String key = si.getKey();
      // determine whether the key name starts with the fs.viewfs.mounttable prefix
      if (key.startsWith(mtPrefix)) {
        ...
        // get the real target paths of the mapping; there may be several, separated by ','
        final String target = si.getValue();  // link or merge link
        createLink(src, target, isMergeLink, ugi);
      }
    }
    ...

The storing of the mount-point relationships actually happens in the createLink method. Enter this method:

  private void createLink(final String src, final String target,
      final boolean isLinkMerge, final UserGroupInformation aUgi)
      throws URISyntaxException, IOException,
      FileAlreadyExistsException, UnsupportedFileSystemException {
    // validate that src is a valid absolute path
    final Path srcPath = new Path(src);
    if (!srcPath.isAbsoluteAndSchemeAuthorityNull()) {
      throw new IOException("ViewFs: Non absolute mount name in config: " + src);
    }

    // split the path to be added on the '/' delimiter
    final String[] srcPaths = breakIntoPathComponents(src);
    // set the current node to the root node
    INodeDir<T> curInode = root;
    ...

Note the last line of code executed above: the INodeDir class appears, and curInode is set to the root node. This will actually be very useful. Continue by viewing the definition of the INodeDir class:

  /**
   * Internal class to represent an internal dir of the mount table
   * @param <T>
   */
  static class INodeDir<T> extends INode<T> {
    // child nodes
    final Map<String, INode<T>> children = new HashMap<String, INode<T>>();
    // file system associated with this mount directory
    T InodeDirFs = null;  // file system of this internal directory of mount table
    boolean isRoot = false;
    ...

From the above we can see a parent-child relationship here: each directory has its own target file system, and a child may itself be an INodeDir or another INode subclass. Combined with the earlier splitting of the path on the '/' separator, we can roughly infer that ViewFileSystem stores mount points in a trie-like (dictionary-tree) structure.

The code below basically confirms this conjecture; it searches for the closest existing directory in the tree:

    ...
    int i;
    // ignore the first initial slash (an empty component), then process every
    // component except the last
    for (i = 1; i < srcPaths.length - 1; i++) {
      // the current path component
      final String iPath = srcPaths[i];
      // search for it starting from the current directory inode
      INode<T> nextInode = curInode.resolveInternal(iPath);
      if (nextInode == null) {
        // not found: the current node has no entry for this component yet,
        // so add a new INodeDir with its target file system information
        INodeDir<T> newDir = curInode.addDir(iPath, aUgi);
        newDir.InodeDirFs = getTargetFileSystem(newDir);
        // and take it as the next node, i.e. the node to descend into
        nextInode = newDir;
      }
      if (nextInode instanceof INodeLink) {
        // error - expected a dir but got a link
        throw new FileAlreadyExistsException("Path " + nextInode.fullPath +
            " already exists as link");
      } else {
        // still an inode directory: make the subdirectory the current
        // directory and keep looking downward
        assert(nextInode instanceof INodeDir);
        curInode = (INodeDir<T>) nextInode;
      }
    }
    ...

After locating the last level of the directory tree, the link information for the new target URI is added there.
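To make this storage-and-lookup scheme concrete, here is a minimal self-contained sketch — my own illustration, not Hadoop's actual InodeTree/INodeDir/INodeLink code — where mount sources are split on '/' into a trie, and resolution walks down to the node holding a target URI, appending whatever path remains:

```java
import java.util.HashMap;
import java.util.Map;

// Trie sketch of a ViewFileSystem-style mount table: internal nodes play the
// role of INodeDir; a node carrying a target URI plays the role of INodeLink.
class MountTrie {
    private final Map<String, MountTrie> children = new HashMap<>();
    private String targetUri;  // non-null only at a "link" node

    // Register a mount: split src on '/' and descend, creating dirs as needed.
    void addLink(String src, String target) {
        MountTrie cur = this;
        for (String seg : src.split("/")) {
            if (seg.isEmpty()) continue;  // skip the leading empty component
            cur = cur.children.computeIfAbsent(seg, k -> new MountTrie());
        }
        cur.targetUri = target;
    }

    // Resolve a client path: walk down until a link node is reached, then
    // return the target URI plus the remaining path components.
    String resolve(String path) {
        MountTrie cur = this;
        String[] segs = path.split("/");
        int i = 0;
        for (; i < segs.length; i++) {
            if (segs[i].isEmpty()) continue;
            MountTrie next = cur.children.get(segs[i]);
            if (next == null) break;
            cur = next;
            if (cur.targetUri != null) { i++; break; }  // reached the link
        }
        if (cur.targetUri == null) return null;  // no mount covers this path
        StringBuilder remaining = new StringBuilder();
        for (; i < segs.length; i++) {
            if (!segs[i].isEmpty()) remaining.append('/').append(segs[i]);
        }
        return cur.targetUri + remaining;
    }

    public static void main(String[] args) {
        MountTrie root = new MountTrie();
        root.addLink("/project/foo", "hdfs://nn2/projects/foo");
        root.addLink("/project/bar", "hdfs://nn3/projects/bar");
        System.out.println(root.resolve("/project/foo/data.txt"));
        // prints hdfs://nn2/projects/foo/data.txt
    }
}
```

The real implementation additionally attaches a file system object to every internal directory and rejects a link where a directory already exists, but the descent-by-path-component idea is the same.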

So you can see that the final target file system and its concrete information sit in the INodeLink, while the locations of all mounted directories are broken apart into a tree keyed by path-segment strings. In other words, when a query path configured in ViewFileSystem comes in, it is parsed down to the corresponding INodeDir, and finally the corresponding INodeLink is taken out. The storage model diagram is as follows:

The corresponding parsing logic is likewise a level-by-level search through the INodeDir storage relationships, which space does not permit covering again.

ViewFileSystem Request Handling

Now another question arises: how does ViewFileSystem handle the various HDFS requests coming from clients? Take mkdir as an example:

  @Override
  public boolean mkdirs(final Path dir, final FsPermission permission)
      throws IOException {
    // parse via the fsState object
    InodeTree.ResolveResult<FileSystem> res =
      fsState.resolve(getUriPath(dir), false);
    // get the real target file system and let it process the request
    return res.targetFileSystem.mkdirs(res.remainingPath, permission);
  }

Here fsState.resolve performs the level-by-level search through the INodeDirs mentioned earlier. Once the corresponding file system is found, the remaining path is passed as the parameter of the final call to the real file system.

The call flowchart for this procedure is as follows:

Path Wrapping in ViewFileSystem

As a view file system, ViewFileSystem must maintain exact logical consistency, so all returned file-attribute information gets a layer of wrapping and adaptation. For example, suppose I set the following mount information in advance:

/project/viewfstmp,   hdfs://nn1/projects/tmp

The former is my ViewFileSystem path; the latter is the real storage path in the underlying file system. Assume there are 3 child files in the real file system:

/projects/tmp/child1
/projects/tmp/child2
/projects/tmp/child3

On the ViewFileSystem side, when I run the command hadoop fs -ls /project/viewfstmp, the information displayed should look like this:

/project/viewfstmp/child1
/project/viewfstmp/child2
/project/viewfstmp/child3

Because the file paths in the listing have been rewritten, everything is dressed up in the paths configured in ViewFileSystem. This requires wrapping the FileStatus actually returned by the real file system: basic attributes such as size and modification time are returned unchanged, while anything path-related has to be modified. The ViewFsFileStatus class is derived for exactly this purpose:

class ViewFsFileStatus extends FileStatus {
   // the original FileStatus information
   final FileStatus myFs;
   // the modified path information
   Path modifiedPath;
   ViewFsFileStatus(FileStatus fs, Path newPath) {
     myFs = fs;
     modifiedPath = newPath;
   }
   ...

In this class, getPath is overridden:

   @Override
   public Path getPath() {
     // override the getPath method to return the modified path information
     return modifiedPath;
   }

The other basic attribute methods simply call through to the originals:

...
   @Override
   public short getReplication() {
     return myFs.getReplication();
   }

   @Override
   public long getModificationTime() {
     return myFs.getModificationTime();
   }

   @Override
   public long getAccessTime() {
     return myFs.getAccessTime();
   }
...

This achieves the effect described in the earlier example; you can try it yourself. A diagram of this process follows:

ViewFileSystem Performance Optimization

The performance-tuning point in ViewFileSystem mainly concerns resolve, the most frequently executed path-parsing operation, where a large performance boost is actually possible. As described earlier, ViewFileSystem currently splits each mount path and saves the pieces into a tree structure, and resolving likewise requires splitting the incoming path and parsing it level by level. There is nothing wrong with this method itself, but when the mount table is small its efficiency is not high, and frequently splitting and joining strings is itself not a fast operation. So a TODO was declared in ViewFileSystem describing an improvement that could be made in the future:

TODO: - more efficient to not split the path, but simply compare
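As a rough illustration of what this TODO suggests — my own sketch, not Hadoop code — the mount table could be kept as a flat map and resolved by longest-prefix string comparison at '/' boundaries, with no splitting and no tree:

```java
import java.util.HashMap;
import java.util.Map;

// Flat-map sketch of the mount table: resolution is plain string comparison,
// matching the longest mount source that is a '/'-boundary prefix of the path.
class FlatMountTable {
    private final Map<String, String> mounts = new HashMap<>();

    void addLink(String src, String target) {
        mounts.put(src, target);
    }

    String resolve(String path) {
        String best = null;
        for (String src : mounts.keySet()) {
            // src covers path if they are equal, or src is a prefix ending
            // exactly at a '/' boundary of path
            boolean covers = path.equals(src)
                || (path.startsWith(src) && path.charAt(src.length()) == '/');
            if (covers && (best == null || src.length() > best.length())) {
                best = src;  // keep the longest (most specific) mount
            }
        }
        if (best == null) return null;  // no mount covers this path
        return mounts.get(best) + path.substring(best.length());
    }

    public static void main(String[] args) {
        FlatMountTable t = new FlatMountTable();
        t.addLink("/viewfstmp", "hdfs://nn1/tmp");
        System.out.println(t.resolve("/viewfstmp/share.tar.gz"));
        // prints hdfs://nn1/tmp/share.tar.gz
    }
}
```

This scans every entry per lookup, which is exactly why it only pays off when the mount table is small.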

That is, the mount relationships could be saved directly in a map structure and resolution done by simple string comparison, with no complicated parent-child structure to maintain. Of course, this only works well when the number of mount-table records is small; but going one step further, the directories we choose to map are usually just a few top-level directories, so the record count is generally not large. If this change were made, the logic for adding links and for parsing paths would of course change with it.

Using ViewFileSystem

Finally, let's briefly describe how ViewFileSystem is configured for use. This divides into the following steps.

First step: Create a ViewFs name
Configure the fs.defaultFS property in core-site.xml, as follows:

<property>
    <name>fs.defaultFS</name>
    <value>viewfs://MultipleCluster</value>
</property>

Step Two: Add a mount relationship

  <property>
    <name>fs.viewfs.mounttable.MultipleCluster.link./viewfstmp</name>
    <value>hdfs://nn1/tmp</value>
  </property>

The nn1 here is the real cluster address. Note that the mount-table name in fs.viewfs.mounttable.MultipleCluster.link (the part between mounttable. and .link) must match the name defined in the viewfs URI above.

After these 2 steps, the ViewFileSystem configuration is basically complete; it really is very simple.
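Putting the two steps together, a complete client-side core-site.xml for this article's example (mount-table name MultipleCluster, namenode nn1) would look roughly like this:

```xml
<configuration>
  <!-- Step 1: make viewfs:// the default file system;
       the authority names the mount table -->
  <property>
    <name>fs.defaultFS</name>
    <value>viewfs://MultipleCluster</value>
  </property>
  <!-- Step 2: mount /viewfstmp onto the real cluster path hdfs://nn1/tmp -->
  <property>
    <name>fs.viewfs.mounttable.MultipleCluster.link./viewfstmp</name>
    <value>hdfs://nn1/tmp</value>
  </property>
</configuration>
```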

Then run the fs -ls command separately in the cluster before configuration and after configuration, and you will find that the subdirectory file information is exactly the same.
Before configuration, in the cluster where nn1 is located:

$ hadoop fs -ls /tmp
Found 2 items

-rw-r--r--   2 data    supergroup  193488274 2016-04-13 14:21 /tmp/share.tar.gz
drwxr-xr-x   - data    supergroup          0 2016-03-15 15:39 /tmp/sparkjars

In the configured cluster:

$ hadoop fs -ls /viewfstmp
Found 2 items

-rw-r--r--   2 data    supergroup  193488274 2016-04-13 14:21 /viewfstmp/share.tar.gz
drwxr-xr-x   - data    supergroup          0 2016-03-15 15:39 /viewfstmp/sparkjars

If you run hadoop fs -ls /tmp in the configured cluster, it will instead report that the file cannot be found:

$ hadoop fs -ls /tmp
ls: `/tmp': No such file or directory

Because logically, this directory has already been mounted to ViewFs's /viewfstmp. These mount relationships are maintained in the client's memory and do not require a restart of the NameNode or DataNodes.
