"Original" HDFs introduction


I. Introduction to HDFS

1. HDFS Full Name

HDFS stands for Hadoop Distributed File System.

Hadoop has an abstract file system concept: it provides the abstract class org.apache.hadoop.fs.FileSystem, and HDFS is one implementation of this abstract class. Other implementations include:

File system    URI scheme    Java implementation (under org.apache.hadoop)
Local          file          fs.LocalFileSystem
HDFS           hdfs          hdfs.DistributedFileSystem
HFTP           hftp          hdfs.HftpFileSystem
HSFTP          hsftp         hdfs.HsftpFileSystem
HAR            har           fs.HarFileSystem
KFS            kfs           fs.kfs.KosmosFileSystem
FTP            ftp           fs.ftp.FTPFileSystem
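The URI scheme is what selects the implementation: FileSystem.get() looks at the scheme of the URI and returns the matching FileSystem subclass. A minimal sketch, assuming a placeholder NameNode address of namenode:9000 (not from the original text); it needs java.net.URI, org.apache.hadoop.conf.Configuration and org.apache.hadoop.fs.FileSystem:

// Obtain FileSystem instances by URI scheme; Hadoop returns the matching implementation.
public void showFileSystemImplementations() throws IOException {
    Configuration conf = new Configuration();

    // "file" scheme -> org.apache.hadoop.fs.LocalFileSystem
    FileSystem local = FileSystem.get(URI.create("file:///tmp"), conf);
    System.out.println(local.getClass().getName());

    // "hdfs" scheme -> org.apache.hadoop.hdfs.DistributedFileSystem
    // ("namenode:9000" is a placeholder host:port)
    FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode:9000/"), conf);
    System.out.println(hdfs.getClass().getName());
}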

2. HDFS Features

(1) Designed for very large files and data sets stored across a cluster.

(2) Streaming data access: files are written once and then read many times.

(3) Runs on inexpensive commodity hardware and has a good fault-tolerance mechanism.

(4) Data access has a certain latency, because HDFS is optimized for high data throughput at the cost of higher latency.

(5) HDFS cannot efficiently store large numbers of small files, because the NameNode holds the metadata of every file in memory, which limits the total number of files.

(6) HDFS does not support multiple concurrent writers, nor does it support random writes.

II. HDFS Architecture

3. Architecture diagram

4. Introduction to Architecture

(1) HDFS consists of the Client, the NameNode, the DataNodes, and the SecondaryNameNode.

(2) The client provides a calling interface for the file system.

(3) The NameNode maintains the fsimage (the HDFS metadata image file) and the editlog (the HDFS change log), and keeps the mapping between each file and its data blocks in memory. This block mapping is not stored on disk; it is rebuilt every time HDFS starts.

(4) The SecondaryNameNode has two tasks:

- Periodically merge the fsimage and the editlog and send the merged result back to the NameNode.

- Provide a backup of the NameNode's metadata.

(5) A DataNode is usually installed on each machine, and the files it stores are divided into many blocks. The block is the smallest addressable unit in HDFS, with a typical size of 64 MB. Unlike an ordinary single-machine file system, a file smaller than one block does not occupy the whole block's space.

(6) The main reason for the large block size is to reduce addressing overhead, but the block size should not be set too large either: a map task usually processes one block of data, so if the block is too large the map task has too much data to handle and efficiency drops.
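The block size and block locations of a file can be inspected through the FileSystem API. A minimal sketch, assuming a placeholder path (not from the original text):

// Inspect the block size and block locations of an HDFS file.
// The path "/user/hadoop/data/write.txt" is a placeholder.
public void showBlocks(String fileUrl) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path(fileUrl);

    FileStatus status = fs.getFileStatus(path);
    System.out.println("Block size: " + status.getBlockSize());

    // One BlockLocation per block, listing the DataNodes that hold its replicas.
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
        System.out.println("offset=" + block.getOffset()
                + " length=" + block.getLength()
                + " hosts=" + java.util.Arrays.toString(block.getHosts()));
    }
    fs.close();
}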

(7) Each DataNode periodically reports the blocks it stores to the NameNode through heartbeats.

(8) Replica placement rules of HDFS

The default replication factor is 3: the first replica is stored on the local node (the machine where the client writes), the second on another node in the same rack, and the third on a node in a different rack.

On the one hand, this reduces the network traffic of write operations and improves write efficiency; on the other hand, the failure rate of an entire rack is far lower than that of a single node, so data reliability is not affected.
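The replication factor can also be adjusted per file through the FileSystem API. A minimal sketch, assuming a placeholder path and factor (not from the original text):

// Change the replication factor of an existing HDFS file.
// The path and the factor 3 are placeholders.
public void changeReplication(String fileUrl) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Returns true if the new replication factor was accepted.
    boolean changed = fs.setReplication(new Path(fileUrl), (short) 3);
    System.out.println("Replication changed: " + changed);
    fs.close();
}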

III. HDFS Read and Write Process

1. Data reading flowchart

2. Read Process description

(1) The HDFS client calls the open() method of the DistributedFileSystem class and, through an RPC request to the NameNode, obtains the locations of the requested file's blocks and the addresses of the nearest DataNodes.

(2) DistributedFileSystem returns an FSDataInputStream object to the client.

(3) The client calls read() on the FSDataInputStream, reading from the DataNodes in order of distance, nearest first.

(4) After all DataNodes have been read, the client calls close() on the FSDataInputStream.

(5) If reading fails, the client reads another replica of the data block and reports the problem to the NameNode.
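Putting these steps together, a minimal read sketch, assuming a placeholder path; IOUtils.copyBytes simply streams the file to standard output:

// Minimal read sequence: open -> read -> close.
// The path "/user/hadoop/data/write.txt" is a placeholder.
public void streamFile(String fileUrl) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    FSDataInputStream in = null;
    try {
        // steps (1)-(2): open() contacts the NameNode and returns an FSDataInputStream
        in = fs.open(new Path(fileUrl));
        // step (3): read the data, streaming it to standard output
        IOUtils.copyBytes(in, System.out, 4096, false);
    } finally {
        // step (4): close the stream
        IOUtils.closeStream(in);
        fs.close();
    }
}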

3. File Write Flowchart

4. Write Process description

(1) The client calls the create() method of the DistributedFileSystem object, which calls the NameNode through RPC to create a new file in the namespace; at this point the file has no DataNodes associated with it yet.

(2) The create() method returns an FSDataOutputStream object that the client uses to write data.

(3) Before being written, the data is split into packets and placed in a "data queue".

(4) The NameNode allocates suitable DataNodes to store the replicas of each packet and returns a DataNode pipeline.

(5) The packets are written to each DataNode along the pipeline.

(6) After a DataNode stores a packet, it returns an acknowledgment. The packet waits in an acknowledgment queue and is removed from it only when every DataNode in the pipeline has acknowledged it.

(7) After all DataNodes in the pipeline have finished, the client calls close() on the stream to shut down the data flow.
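A minimal write sketch that follows these steps, assuming a placeholder path; the optional Progressable callback is invoked as data is written to the pipeline:

// Minimal write sequence: create -> write -> close.
// The path is a placeholder; content is whatever the caller supplies.
public void copyToHdfs(String fileUrl, byte[] content) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // steps (1)-(2): create the file in the namespace and get an FSDataOutputStream
    FSDataOutputStream out = fs.create(new Path(fileUrl), new Progressable() {
        public void progress() {
            System.out.print(".");   // called as data is written to the pipeline
        }
    });
    out.write(content);   // steps (3)-(6): data is packetized, queued, and pipelined
    out.close();          // step (7): close the data flow
    fs.close();
}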

IV. Web Interface of Hadoop

1. Interface Address

The web interface of HDFs can be accessed through http://NameNodeIP:50070.

V. The Java API for HDFS

1. Use URL to read data

// Read a file from HDFS using the URL interface.
// setURLStreamHandlerFactory can only be called once per JVM.
static {
    URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
}

public String getHdfsByUrl(String url) throws MalformedURLException, IOException {
    InputStream in = null;
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try {
        in = new URL(url).openStream();
        IOUtils.copyBytes(in, out, 4096, false);
    } finally {
        IOUtils.closeStream(in);
        IOUtils.closeStream(out);
    }
    return out.toString();
}

2. FileSystem API Read Data

// Read a file from HDFS with the FileSystem API.
// url: "/user/hadoop/data/write.txt"
public String readFile(String url) throws IOException {
    String fileContent = "";
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path(url);
    if (fs.exists(path)) {
        FSDataInputStream is = fs.open(path);
        FileStatus status = fs.getFileStatus(path);
        byte[] buffer = new byte[(int) status.getLen()];
        is.readFully(0, buffer);
        is.close();
        fs.close();
        fileContent = new String(buffer);
    }
    return fileContent;
}

  

3. FileSystem API Create Directory

// Create an HDFS directory with the FileSystem API.
// dirPath: "/user/hadoop/data/20130709"
public void makeDir(String dirPath) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path(dirPath);
    fs.mkdirs(path);   // creates the directory and any missing parents
    fs.close();
}

  

4. FileSystem API Write Data

// Write a file to HDFS with the FileSystem API.
// fileUrl: "/user/hadoop/data/write.txt"
public void writeFile(String fileUrl, String fileContent) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path(fileUrl);
    FSDataOutputStream out = fs.create(path);
    out.writeUTF(fileContent);
    out.close();
    fs.close();
}

  

5. FileSystem API Delete File

// Delete a file from HDFS with the FileSystem API.
// fileUrl: "/user/hadoop/data/word.txt"
public void deleteFile(String fileUrl) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path(fileUrl);
    fs.delete(path, true);   // true: delete recursively if the path is a directory
    fs.close();
}

  

6. Query meta-data

// Query file metadata with the FileSystem API.
public void showFileStatus(String fileUrl) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path(fileUrl);
    FileStatus stat = fs.getFileStatus(file);

    System.out.println("File path: " + stat.getPath());
    System.out.println("Is directory: " + stat.isDirectory());
    System.out.println("Is file: " + stat.isFile());
    System.out.println("Block size: " + stat.getBlockSize());
    System.out.println("File owner: " + stat.getOwner() + ":" + stat.getGroup());
    System.out.println("File permissions: " + stat.getPermission());
    System.out.println("File length: " + stat.getLen());
    System.out.println("Replication factor: " + stat.getReplication());
    System.out.println("Modification time: " + stat.getModificationTime());
    fs.close();
}
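For completeness, a hypothetical usage example that chains the helper methods above; the class name HdfsDemo, the paths, and the content string are placeholders, not part of the original article:

// Hypothetical driver that exercises the helper methods above.
public static void main(String[] args) throws IOException {
    HdfsDemo demo = new HdfsDemo();   // assumed class holding the methods above
    demo.makeDir("/user/hadoop/data/20130709");
    demo.writeFile("/user/hadoop/data/write.txt", "hello hdfs");
    // note: writeFile() uses writeUTF(), so the content is preceded by a 2-byte length prefix
    System.out.println(demo.readFile("/user/hadoop/data/write.txt"));
    demo.showFileStatus("/user/hadoop/data/write.txt");
    demo.deleteFile("/user/hadoop/data/write.txt");
}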

  

"Original" HDFs introduction
