Introduction to Hadoop FileSystem

Source: Internet
Author: User


Before studying the Hadoop FileSystem module, it is well worth learning how the Linux local file system is designed and implemented. Many of the ideas carry over, and this background makes Hadoop FileSystem much easier to understand. Hadoop FileSystem borrows strengths from many file systems, and its design has plenty worth learning from. The FileSystem discussed here is not HDFS alone; HDFS is just one implementation of it. FileSystem is an abstraction over all the concrete file systems, and in the code it is literally an abstract class.

FileSystem Introduction

Before introducing Hadoop's file system, we first need one concept: the VFS (Virtual File System). A VFS exposes a set of functions to the user, such as read(), write(), and the other common file system operations, without the user knowing which file system is actually in use. The call is ultimately handled by a concrete implementation subclass, which may be sub-file system 1, sub-file system 2, or sub-file system 3. This is what makes the file system "virtual". The benefit is strong extensibility through interface-oriented design: if you need to develop a new file system for a particular requirement, the interface does not have to change. The following is a VFS model diagram:
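
The VFS idea can also be sketched in a few lines of Java. This is only an illustration under stated assumptions: the names VirtualFileSystem, MemoryBackedFs, LocalDiskFs, and VfsSketch are hypothetical and are not Hadoop classes. The point is that the calling code depends only on the abstract type, so a new storage backend can be added without changing the interface.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction: the interface that callers see.
abstract class VirtualFileSystem {
    abstract void write(String path, byte[] data);
    abstract byte[] read(String path);
    abstract String name();
}

// Implementation 1: keeps file contents in a map in memory.
class MemoryBackedFs extends VirtualFileSystem {
    private final Map<String, byte[]> files = new HashMap<>();

    @Override void write(String path, byte[] data) {
        files.put(path, data.clone());
    }

    @Override byte[] read(String path) {
        byte[] data = files.get(path);
        if (data == null) {
            throw new IllegalArgumentException("No such file: " + path);
        }
        return data.clone();
    }

    @Override String name() { return "memory"; }
}

// Implementation 2: stores files on the local disk under a root directory.
// Paths are expected to be relative (no leading slash).
class LocalDiskFs extends VirtualFileSystem {
    private final Path root;

    LocalDiskFs(Path root) { this.root = root; }

    // Convenience factory that places the root in a fresh temporary directory.
    static LocalDiskFs inTempDir() {
        try {
            return new LocalDiskFs(Files.createTempDirectory("vfs-sketch"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override void write(String path, byte[] data) {
        try {
            Files.write(root.resolve(path), data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override byte[] read(String path) {
        try {
            return Files.readAllBytes(root.resolve(path));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override String name() { return "local"; }
}

class VfsSketch {
    // Client code: it never needs to know which implementation it was given.
    static String roundTrip(VirtualFileSystem fs, String path, String text) {
        fs.write(path, text.getBytes(StandardCharsets.UTF_8));
        return new String(fs.read(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        VirtualFileSystem[] all = { new MemoryBackedFs(), LocalDiskFs.inTempDir() };
        for (VirtualFileSystem fs : all) {
            System.out.println(fs.name() + ": " + roundTrip(fs, "hello.txt", "hello"));
        }
    }
}
```

Adding a third backend would mean writing one more subclass; roundTrip and every other caller would stay untouched.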



A few of the major subclasses are worth noting: InMemoryFileSystem, an in-memory file system whose use is no longer recommended; LocalFileSystem, the local file system; and ChecksumDistributedFileSystem, which adds a checksum function. The full class name of what we usually call HDFS is DistributedFileSystem, which directly extends FileSystem. This class is placed in the hdfs package and is not part of the Hadoop Common module.
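
To make the checksum function concrete, here is a minimal sketch of the general technique: compute one checksum per fixed-size chunk when data is written, then recompute and compare on read. The class name ChunkChecksums, the choice of CRC32, and the 512-byte chunk size are assumptions made for illustration; this is not the code of Hadoop's checksum file systems.

```java
import java.util.zip.CRC32;

class ChunkChecksums {
    // Computes one CRC32 value per chunk of at most chunkSize bytes.
    static long[] compute(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = (data.length + chunkSize - 1) / chunkSize;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int offset = i * chunkSize;
            int len = Math.min(chunkSize, data.length - offset);
            crc.reset();
            crc.update(data, offset, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Returns the index of the first chunk whose checksum no longer matches,
    // or -1 when every chunk matches.
    static int firstCorruptChunk(byte[] data, int chunkSize, long[] expected) {
        long[] actual = compute(data, chunkSize);
        int common = Math.min(actual.length, expected.length);
        for (int i = 0; i < common; i++) {
            if (actual[i] != expected[i]) {
                return i;
            }
        }
        // A different chunk count means the data was truncated or extended.
        return actual.length == expected.length ? -1 : common;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1300];                 // three chunks of up to 512 bytes
        long[] sums = compute(data, 512);
        data[600] ^= 0x01;                            // simulate corruption in the second chunk
        System.out.println("first corrupt chunk: " + firstCorruptChunk(data, 512, sums));
    }
}
```

Keeping checksums per chunk rather than per file means a reader can verify only the range it actually reads and can tell which part of the file is damaged.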

FileSystem Package Structure

The structure of the fs package is summarized in the following figure. Because the code version I studied is an early one, it does not support many FileSystem subclasses, and only a few of them need attention:


FileSystem IO input/output system

The design of the I/O classes is very important, because all the file system operations that follow depend on them. The classes involved are quite complex, so I chose class diagrams as the most direct way to present them.




Class diagram of the output stream:


I am not sure why, but the input side seems to have relatively few classes, so the two hierarchies do not look fully symmetrical.

FileSystem file basic description

In such a large file system, how a file is represented is fundamental. In Java, as we all know, a file is represented by the File class, which also offers a variety of file operation methods. In Hadoop, a file is described by the FileStatus class:

    public class FileStatus implements Writable, Comparable {
      // file path, which contains the URI (Uniform Resource Identifier)
      private Path path;
      // file length
      private long length;
      // whether this is a directory
      private boolean isdir;
      // number of block replicas
      private short block_replication;
      // block size
      private long blocksize;
      // time the file was last modified
      private long modification_time;
      // time the file was last accessed
      private long access_time;
      // read and write permissions for the owner, the user group, and others
      private FsPermission permission;
      // file owner
      private String owner;
      // group the file belongs to
      private String group;
      .....

It holds a lot of metadata about the file. Two fields deserve attention, Path and FsPermission: one handles the path and the other handles permissions. First, Path:
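
As a rough analogy, the same kind of metadata snapshot can be built for a local file with the standard java.nio.file API. The class names SimpleFileStatus and FileStatusSketch are hypothetical, and only a few of the fields listed above are modeled; this is a sketch of the idea, not of Hadoop's own FileStatus.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical, simplified metadata snapshot for one file or directory.
final class SimpleFileStatus {
    final String path;            // location as a URI string
    final long length;            // size in bytes, 0 for directories
    final boolean isDir;          // whether this entry is a directory
    final long modificationTime;  // milliseconds since the epoch

    SimpleFileStatus(String path, long length, boolean isDir, long modificationTime) {
        this.path = path;
        this.length = length;
        this.isDir = isDir;
        this.modificationTime = modificationTime;
    }

    // Reads the attributes of a local path in one call and freezes them.
    static SimpleFileStatus of(Path p) {
        try {
            BasicFileAttributes attrs = Files.readAttributes(p, BasicFileAttributes.class);
            long size = attrs.isDirectory() ? 0L : attrs.size();
            return new SimpleFileStatus(p.toUri().toString(), size,
                    attrs.isDirectory(), attrs.lastModifiedTime().toMillis());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override public String toString() {
        return path + " length=" + length + " isDir=" + isDir
                + " modified=" + modificationTime;
    }
}

class FileStatusSketch {
    // Writes the content to a temporary file, captures its status, then deletes it.
    static SimpleFileStatus statusOfTempFile(byte[] content) {
        try {
            Path tmp = Files.createTempFile("status-sketch", ".bin");
            Files.write(tmp, content);
            SimpleFileStatus status = SimpleFileStatus.of(tmp);
            Files.delete(tmp);
            return status;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(statusOfTempFile(new byte[] {1, 2, 3, 4, 5}));
    }
}
```

The status object is a snapshot: it stays valid as a value even after the file it describes has changed or been deleted.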

    public class Path implements Comparable {
      /** The directory separator, a slash. */
      public static final String SEPARATOR = "/";
      public static final char SEPARATOR_CHAR = '/';
      public static final String CUR_DIR = ".";
      static final boolean WINDOWS = System.getProperty("os.name").startsWith("Windows");
      // the URI that identifies the resource
      private URI uri; // a hierarchical URI
      ...

Path locates a file by its URI. Next is FsPermission:
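
To see what a hierarchical URI carries, the sketch below takes one apart with the standard java.net.URI class. The host name namenode.example.com, the port, the file path, and the helper class PathUriSketch are made up for illustration; they are not taken from the article or from Hadoop's Path class.

```java
import java.net.URI;

class PathUriSketch {
    static final String SEPARATOR = "/";

    // Returns the last component of the URI's path, or "" for the root
    // or for a URI that has no path at all.
    static String getName(URI uri) {
        String path = uri.getPath();
        if (path == null) {
            return "";
        }
        return path.substring(path.lastIndexOf(SEPARATOR) + 1);
    }

    public static void main(String[] args) {
        // Hypothetical location of a file on a distributed file system.
        URI uri = URI.create("hdfs://namenode.example.com:9000/user/data/part-00000");
        System.out.println("scheme    = " + uri.getScheme());     // which file system
        System.out.println("authority = " + uri.getAuthority());  // which server
        System.out.println("path      = " + uri.getPath());       // where on that server
        System.out.println("name      = " + getName(uri));

        // normalize() removes "." and ".." segments from the path.
        System.out.println(URI.create("hdfs://namenode.example.com/a/./b/../c").normalize());
    }
}
```

The scheme is what lets a single path type address different file systems: the same code can handle a local file URI and a remote one.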

    public class FsPermission implements Writable {
      private static final Log LOG = LogFactory.getLog(FsPermission.class);
      ....
      // POSIX permission style
      // the user, the user group, and others each have their own access permissions
      private FsAction useraction = null;
      private FsAction groupaction = null;
      private FsAction otheraction = null;
      ....

As you can see, the permission model is essentially the same as the one in Linux: the user, the user group, and others each get their own access permissions, expressed as rwx in three bits. Anyone who understands Linux file permission management will find this familiar. The three bits are enumerated in FsAction:

    public enum FsAction {
      // POSIX style
      // represented by 3 bits, giving eight possible combinations
      NONE("---"),
      EXECUTE("--x"),
      WRITE("-w-"),
      WRITE_EXECUTE("-wx"),
      READ("r--"),
      READ_EXECUTE("r-x"),
      READ_WRITE("rw-"),
      ALL("rwx");
      ....

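
The three-bit encoding is easy to try out in isolation. In the sketch below the enum constants are declared in the same order as above, so each constant's ordinal equals its bit pattern (r = 4, w = 2, x = 1). The names PermAction and PermissionSketch and the two helper methods are hypothetical and written only for this illustration.

```java
// Constants are ordered so that ordinal() is the 3-bit rwx value.
enum PermAction {
    NONE("---"), EXECUTE("--x"), WRITE("-w-"), WRITE_EXECUTE("-wx"),
    READ("r--"), READ_EXECUTE("r-x"), READ_WRITE("rw-"), ALL("rwx");

    final String symbol;

    PermAction(String symbol) { this.symbol = symbol; }

    // True when this action grants every bit that 'other' asks for.
    boolean implies(PermAction other) {
        return (ordinal() & other.ordinal()) == other.ordinal();
    }
}

class PermissionSketch {
    // Converts a 9-bit mode such as 0755 into its symbolic form:
    // three bits for the user, three for the group, three for others.
    static String toSymbolic(int mode) {
        PermAction[] v = PermAction.values();
        return v[(mode >>> 6) & 7].symbol
                + v[(mode >>> 3) & 7].symbol
                + v[mode & 7].symbol;
    }

    public static void main(String[] args) {
        System.out.println(toSymbolic(0755)); // prints rwxr-xr-x
        System.out.println(toSymbolic(0640)); // prints rw-r-----
        System.out.println(PermAction.READ_WRITE.implies(PermAction.READ)); // prints true
        System.out.println(PermAction.READ.implies(PermAction.WRITE));      // prints false
    }
}
```

Because the ordinal doubles as the bit mask, a permission check reduces to a single bitwise AND.
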
Summary

When studying Hadoop FileSystem, the main goal is to learn its design ideas, which is why this overview stays brief. To understand the implementation details of the file system, you need to pick a specific area and dig deeper.

