HDFS Source Analysis, Part 1


1. HDFS Definition

HDFS is the primary distributed storage used by Hadoop applications. An HDFS cluster primarily consists of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data.

2. HDFS Architecture

HDFS follows a master/worker architecture: a single NameNode maintains the file system namespace and metadata, while DataNodes, typically one per machine in the cluster, store file data as blocks and serve read/write requests from clients.

3. HDFS Example

As a file system, reading and writing files are the core operations:

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HadoopDFSFileReadWrite {

  static void usage() {
    System.out.println("Usage: HadoopDFSFileReadWrite <inputfile> <output file>");
    System.exit(1);
  }

  static void printAndExit(String str) {
    System.err.println(str);
    System.exit(1);
  }

  public static void main(String[] argv) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    if (argv.length != 2)
      usage();

    // Hadoop DFS deals with Path
    Path inFile = new Path(argv[0]);
    Path outFile = new Path(argv[1]);

    // Check whether input/output are valid
    if (!fs.exists(inFile))
      printAndExit("Input file not found");
    if (!fs.isFile(inFile))
      printAndExit("Input should be a file");
    if (fs.exists(outFile))
      printAndExit("Output already exists");

    // Read from the input file and write to the new output file
    FSDataInputStream in = fs.open(inFile);
    FSDataOutputStream out = fs.create(outFile);
    byte[] buffer = new byte[256];
    try {
      int bytesRead = 0;
      while ((bytesRead = in.read(buffer)) > 0) {
        out.write(buffer, 0, bytesRead);
      }
    } catch (IOException e) {
      System.out.println("Error while copying file");
    } finally {
      in.close();
      out.close();
    }
  }
}

The example above copies the contents of one file into another. The steps are as follows:

Step one: create a file system instance, passing a new Configuration object to it.

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);

Step two: get the input/output file paths and validate them:

    // Hadoop DFS deals with Path
    Path inFile = new Path(argv[0]);
    Path outFile = new Path(argv[1]);

    // Check whether input/output are valid
    if (!fs.exists(inFile))
      printAndExit("Input file not found");
    if (!fs.isFile(inFile))
      printAndExit("Input should be a file");
    if (fs.exists(outFile))
      printAndExit("Output already exists");

Step three: open the input and output streams and copy the input stream into the output stream:

    // Read from the input file and write to the new output file
    FSDataInputStream in = fs.open(inFile);
    FSDataOutputStream out = fs.create(outFile);
    byte[] buffer = new byte[256];
    try {
      int bytesRead = 0;
      while ((bytesRead = in.read(buffer)) > 0) {
        out.write(buffer, 0, bytesRead);
      }
    } catch (IOException e) {
      System.out.println("Error while copying file");
    } finally {
      in.close();
      out.close();
    }

The read/write example above involves the file system (FileSystem), the configuration (Configuration), and the input/output streams (FSDataInputStream/FSDataOutputStream).

4. Basic Concept Analysis

4.1 File System

The file system classes form a hierarchy rooted at the abstract org.apache.hadoop.fs.FileSystem class, whose subclasses include DistributedFileSystem (HDFS) and LocalFileSystem, among others.

The file system has two important branches: the distributed file system and the "local" file system (mapped to locally attached disk). The local file system is suitable for small Hadoop instances and for testing. In most cases the distributed file system is used: the Hadoop distributed file system spans many machines but presents the user with a single volume, and its fault tolerance and large capacity make it very useful.
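
To make the two branches concrete, here is a minimal sketch (not from the original post; the class name and the address hdfs://localhost:9000 are placeholders, so substitute your own NameNode URI):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;

public class FileSystemBranches {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // "Local" branch: maps to locally attached disk; handy for tests.
    LocalFileSystem localFs = FileSystem.getLocal(conf);
    System.out.println("local scheme: " + localFs.getUri().getScheme()); // file

    // Distributed branch: many machines presented as a single volume.
    // hdfs://localhost:9000 is a placeholder for your NameNode's address.
    FileSystem hdfs = FileSystem.get(URI.create("hdfs://localhost:9000/"), conf);
    System.out.println("hdfs scheme: " + hdfs.getUri().getScheme()); // hdfs
  }
}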

4.2 Configuration Files

The configuration classes also form a hierarchy: org.apache.hadoop.conf.Configuration is the base class, and HdfsConfiguration extends it to pick up HDFS-specific resources.

Our focus is HdfsConfiguration, which registers hdfs-default.xml and hdfs-site.xml as default resources in its static initializer:

  static {
    addDeprecatedKeys();

    // adds the default resources
    Configuration.addDefaultResource("hdfs-default.xml");
    Configuration.addDefaultResource("hdfs-site.xml");
  }
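
As a quick illustration (a hedged sketch, not from the original post; ConfDemo is just an illustrative name): constructing an HdfsConfiguration triggers the static block above, so HDFS defaults such as dfs.replication become visible through the ordinary Configuration API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class ConfDemo {
  public static void main(String[] args) {
    // Instantiating HdfsConfiguration runs the static block above,
    // registering hdfs-default.xml and hdfs-site.xml as default resources.
    Configuration conf = new HdfsConfiguration();

    // dfs.replication comes from hdfs-default.xml (3 by default)
    // unless overridden in hdfs-site.xml.
    System.out.println("dfs.replication = " + conf.get("dfs.replication"));
  }
}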

4.3 Input/Output Streams

The input/output streams correspond to the file system. Look at the input stream first: FSDataInputStream extends java.io.DataInputStream and implements Seekable and PositionedReadable, among other interfaces.

HdfsDataInputStream is the HDFS implementation of FSDataInputStream; its constructor is:

  public HdfsDataInputStream(DFSInputStream in) throws IOException {
    super(in);
  }

The DFSInputStream hierarchy is as follows: DFSInputStream extends FSInputStream, which in turn extends java.io.InputStream and implements Seekable and PositionedReadable.
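
A hedged usage sketch (the class name and the path /tmp/demo.txt are illustrative): on HDFS, FileSystem.open() actually returns an HdfsDataInputStream, which wraps a DFSInputStream and exposes HDFS-specific details that the generic FSDataInputStream does not:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataInputStream;

public class InputStreamDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // /tmp/demo.txt is a placeholder path.
    try (FSDataInputStream in = fs.open(new Path("/tmp/demo.txt"))) {
      if (in instanceof HdfsDataInputStream) {
        HdfsDataInputStream hin = (HdfsDataInputStream) in;
        hin.read(); // read a byte so a datanode connection exists
        // HDFS-specific queries backed by the wrapped DFSInputStream:
        System.out.println("visible length:   " + hin.getVisibleLength());
        System.out.println("current datanode: " + hin.getCurrentDatanode());
      }
    }
  }
}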

Now look at the output stream: FSDataOutputStream extends java.io.DataOutputStream and implements Syncable; HdfsDataOutputStream is its HDFS-specific subclass.

The emphasis is on HdfsDataOutputStream, whose constructor is:

  public HdfsDataOutputStream(DFSOutputStream out, FileSystem.Statistics stats,
      long startPosition) throws IOException {
    super(out, stats, startPosition);
  }

The DFSOutputStream hierarchy is as follows: DFSOutputStream extends FSOutputSummer (a checksumming OutputStream) and implements Syncable.
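
A matching output-side sketch (again illustrative, with a placeholder class name and path): FileSystem.create() on HDFS hands back an HdfsDataOutputStream wrapping a DFSOutputStream, and hflush(), defined by the Syncable interface, pushes buffered bytes down the datanode pipeline so that new readers can see them:

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OutputStreamDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // /tmp/out.txt is a placeholder path.
    try (FSDataOutputStream out = fs.create(new Path("/tmp/out.txt"))) {
      out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
      // hflush() flushes client buffers to the datanode pipeline;
      // data becomes visible to readers but is not yet forced to disk
      // (hsync() would do that).
      out.hflush();
    }
  }
}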

Reference documents:

[1] http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html

[2] http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html

[3] http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample

[4] http://blog.csdn.net/gaoxingnengjisuan/article/details/11177049

