About HDFS
Put plainly, Hadoop is a cluster framework for storing and processing big data for analytics; its most important component is HDFS, the Hadoop Distributed File System.
1.
HDFS is a filesystem that stores very large files with a streaming data-access pattern (write once, read many times). It does not require high-end hardware; ordinary commodity hardware is sufficient.
Applications for which HDFS is currently not a good fit: low-latency data access, large numbers of small files, and multiple writers or arbitrary file modifications.
2.
HDFS stores data in blocks, with a default block size of 64 MB. Blocks are made this large mainly to reduce seek time: data-transfer rates keep increasing, and if HDFS had to seek frequently while processing big data, seeking would inevitably dominate the running time.
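To see how a given file is actually split into blocks, and which datanodes hold the replicas, you can run the fsck tool (the path below is illustrative):
% hadoop fsck /user/tom/test.txt -files -blocks -locations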
An HDFS cluster has two types of nodes: a single namenode and multiple datanodes, operating in a manager-worker pattern. The namenode manages the filesystem namespace: it maintains the filesystem tree and the metadata for all files and directories, and it knows the datanodes on which all the blocks for a given file are located. The datanodes store and retrieve the blocks themselves. Losing the namenode therefore paralyzes HDFS, so Hadoop offers two mechanisms to guard against this:
One is to replicate the persistent files that make up the filesystem metadata, for example by writing them to the local disk as well as to a remote NFS mount.
The other is to run a secondary namenode.
3.
HDFS provides a command-line interface for interacting with the filesystem.
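For example (the file and directory names are illustrative):
% hadoop fs -mkdir /user/tom
% hadoop fs -copyFromLocal test.txt /user/tom/test.txt
% hadoop fs -ls /user/tom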
4.
Hadoop has an abstract notion of a filesystem, of which HDFS is one concrete implementation. The Java abstract class org.apache.hadoop.fs.FileSystem represents a filesystem in Hadoop, and there are several concrete implementations of it.
Hadoop provides interfaces to many different filesystems, and it generally uses the URI scheme to determine which filesystem implementation to communicate with.
5.
Hadoop is written in Java, so the Java interface is undoubtedly the most important one. Below are some concrete uses of the Java interface.
(1) Data read:
Reading data using a URL
Java can be made to recognize Hadoop's hdfs URL scheme by calling the setURLStreamHandlerFactory method on URL with an instance of FsUrlStreamHandlerFactory.
Note: this method can be called at most once per Java virtual machine, so it is typically invoked from a static block. A consequence is that if some other part of your program (perhaps a third-party component outside your control) sets a URLStreamHandlerFactory, you can no longer use this approach to read data from Hadoop.
Code:
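A minimal sketch of the URLCat program, consistent with the run and output below (error handling kept to the essentials):

import java.io.InputStream;
import java.net.URL;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class URLCat {
    static {
        // setURLStreamHandlerFactory may only be called once per JVM,
        // hence the static initializer
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = null;
        try {
            // java.net.URL now understands the hdfs:// scheme
            in = new URL(args[0]).openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}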
Sample run:
% hadoop URLCat hdfs://localhost/user/tom/test.txt
Results:
Hello World, Hello World
Hello World
Hello World, Hello World
Reading data using the FileSystem API
The code below speaks for itself; see the comments.
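A minimal sketch of such a program (reading the same file as in the URL example):

import java.io.InputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileSystemCat {
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        // The URI scheme (hdfs://...) selects the FileSystem implementation
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        InputStream in = null;
        try {
            // open() returns an FSDataInputStream, which also supports seek()
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}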
(2) Data write
The FileSystem class has a series of methods for creating files.
public FSDataOutputStream create(Path f) throws IOException
Note that create() will create any parent directories of the file that do not already exist; if you want to avoid this, you can first call exists() to check whether the parent directory exists.
There is also an overloaded method that takes a Progressable, a callback interface through which your application is notified of the progress of the data being written to the datanodes (see the sketch at the end of this subsection):
package org.apache.hadoop.util;

public interface Progressable {
    public void progress();
}
As an alternative to creating a new file, you can open an existing file for appending using the following method:
public FSDataOutputStream append(Path f) throws IOException
This method allows data to be appended at the end of an open file.
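Putting create() and Progressable together, here is a sketch that copies a local file into HDFS and prints a dot on each progress callback (file names come from the command line):

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

public class FileCopyWithProgress {
    public static void main(String[] args) throws Exception {
        String localSrc = args[0];
        String dst = args[1];
        InputStream in = new BufferedInputStream(new FileInputStream(localSrc));
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        // progress() is called back as data is written to the datanodes
        OutputStream out = fs.create(new Path(dst), new Progressable() {
            public void progress() {
                System.out.print(".");
            }
        });
        // The final argument closes both streams when the copy completes
        IOUtils.copyBytes(in, out, 4096, true);
    }
}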
(3) Directories
FileSystem provides a method for creating a directory:
public boolean mkdirs(Path f) throws IOException
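Like java.io.File.mkdirs(), this creates any missing parent directories in one call and returns true on success. A minimal sketch, assuming a FileSystem fs obtained as in the listings above (the path is illustrative):
// Creates /user/tom/new as well if it does not exist yet
boolean ok = fs.mkdirs(new Path("/user/tom/new/dir"));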
(4) Querying the file system
The FileStatus class encapsulates filesystem metadata for files and directories, including file length, block size, replication, modification time, ownership, and permission information.
FileSystem's getFileStatus() method provides a way of getting the status object for a single file or directory.
If you only want to know whether a file exists, you can use the exists(Path f) method mentioned earlier.
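A short sketch of reading these fields, assuming a FileSystem fs as in the listings above (the path is illustrative):
FileStatus stat = fs.getFileStatus(new Path("/user/tom/test.txt"));
System.out.println(stat.getLen());              // file length in bytes
System.out.println(stat.getBlockSize());        // block size in bytes
System.out.println(stat.getReplication());      // replication factor
System.out.println(stat.getModificationTime()); // milliseconds since the epoch
System.out.println(stat.getOwner() + ":" + stat.getGroup());
System.out.println(stat.getPermission());       // e.g. rw-r--r--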
It is common to query a batch of files with a single wildcard expression, so Hadoop provides operations for expanding wildcard characters (globbing).
Hadoop supports the same set of wildcard characters as the Unix bash shell, through two FileSystem methods:
public FileStatus[] globStatus(Path pathPattern) throws IOException
public FileStatus[] globStatus(Path pathPattern, PathFilter filter) throws IOException
The wildcard characters are the familiar shell ones: * (zero or more characters), ? (a single character), [ab] (character class), [^ab] (negated class), [a-b] (range), {a,b} (alternation), and \c (escapes the metacharacter c).
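A sketch of globbing in practice, again assuming a FileSystem fs as above (the /logs/2007/12 layout is a made-up example):
// Expand the pattern into all matching paths
FileStatus[] statuses = fs.globStatus(new Path("/logs/2007/12/*"));
for (Path p : FileUtil.stat2Paths(statuses)) { // FileUtil is in org.apache.hadoop.fs
    System.out.println(p);
}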
(5) Delete data
The delete() method on FileSystem permanently removes files and directories:
public boolean delete(Path f, boolean recursive) throws IOException
If f is a file or an empty directory, the value of recursive is ignored; a non-empty directory is removed only when recursive is true (otherwise an IOException is thrown).
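For example, assuming the same fs (the path is illustrative):
// Recursively remove a whole directory tree
boolean deleted = fs.delete(new Path("/user/tom/old-logs"), true);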
Hadoop: The Definitive Guide, learning notes (3)