Good command of HDFS shell access and Java API access


HDFS is designed to handle massive amounts of data, which means it can store very large numbers of files (terabytes of data and beyond). HDFS splits these files into blocks and stores them on different DataNodes, and it provides two kinds of access interfaces for manipulating the files it holds: the shell interface and the Java API interface.

Shell interface

The HDFS commands for working with files are basically the same as the corresponding Linux commands, and they are case-sensitive. The following sections describe the commands for operating on the distributed file system.

HDFS basic commands

hadoop fs -cmd <args>

cmd: the specific operation, basically the same as the corresponding UNIX command

args: the arguments to the operation

HDFS resource URI format

scheme://authority/path

scheme: the protocol name, file or hdfs

authority: the NameNode host name (and port)

path: the file path. Example: hdfs://qq58053439:9000/middle/test.txt

Assuming you have configured fs.default.name=hdfs://xuweiwei:9000 in core-site.xml, you can write just /middle/test.txt.
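For reference, a minimal sketch of such a core-site.xml entry, using the host name from the example above (fs.default.name is the older property name; newer Hadoop releases call it fs.defaultFS):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://xuweiwei:9000</value>
  </property>
</configuration>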

Shell operations on a single HDFS cluster

1. Create a folder

The directory structure on HDFS is similar to Linux, and the root directory is represented by "/". The following command creates the weibo directory under the /middle directory.

$ hadoop fs -mkdir /middle/weibo

2. Upload weibo.txt to the weibo directory

$ hadoop fs -put weibo.txt /middle/weibo

You can also use the -copyFromLocal parameter:

$ hadoop fs -copyFromLocal weibo.txt /middle/weibo

3. View the contents of weibo.txt

$ hadoop fs -text /middle/weibo/weibo.txt

You can also view the contents of a file with the -cat and -tail parameters. However, a compressed result file can only be viewed with the -text parameter; otherwise the output is garbled.

$ hadoop fs -cat /middle/weibo/weibo.txt

$ hadoop fs -tail /middle/weibo/weibo.txt

4. Copy the weibo.txt file to the local file system

$ hadoop fs -get /middle/weibo/weibo.txt

You can also use the -copyToLocal parameter:

$ hadoop fs -copyToLocal /middle/weibo/weibo.txt

5. Delete the weibo.txt file

$ hadoop fs -rm /middle/weibo/weibo.txt

Delete the /middle/weibo folder:

$ hadoop fs -rmr /middle/weibo

6. List the files in the /middle directory

$ hadoop fs -ls /middle

Two clusters running the same version of Hadoop

Transferring data between two HDFS clusters: by default, distcp skips files that already exist under the target path.

$ hadoop distcp hdfs://qq58053439:9000/weather hdfs://qq:9000/middle

This command copies the /weather directory and its contents from the first cluster to the /middle directory of the second cluster, so the resulting directory structure on the second cluster is /middle/weather. If /middle does not exist, it is created. You can also specify multiple source paths, all of which are copied to the target path; the source paths here must be absolute paths.
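For example, a hypothetical second source directory /logs (used here only for illustration) could be copied to the same target in one run:

$ hadoop distcp hdfs://qq58053439:9000/weather hdfs://qq58053439:9000/logs hdfs://qq:9000/middle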

Transferring data between two HDFS clusters, overwriting existing files with -overwrite:

$ hadoop distcp -overwrite hdfs://qq58053439:9000/weather hdfs://xu:9000/middle

Transferring data between two HDFS clusters, updating modified files with -update:

$ hadoop distcp -update hdfs://qq58053439:9000/weather hdfs://xu:9000/middle

Two clusters running different versions of Hadoop

The RPC protocols of different Hadoop versions are incompatible, so using distcp with the hdfs protocol to copy data between them causes the replication job to fail. To work around this, you can use the read-only, HTTP-based HFTP file system to read the data from the source cluster. The job must be run on the target cluster so that it is compatible with that cluster's HDFS RPC version.

Again taking data transfer between two HDFS clusters as an example:

$ hadoop distcp hftp://xu:50070/weather hdfs://xu2:9000/middle

Note that you need to specify the NameNode's web port in the source URI. This is determined by the dfs.http.address property, whose default value is 50070. If you use the newer webhdfs protocol (instead of hftp), you can use HTTP to communicate with both the source and target clusters without any incompatibility problems.

$ hadoop distcp webhdfs://xu:50070/weather webhdfs://xu2:50070/middle

Other common shell operations for Hadoop administrators

1. View running jobs: hadoop job -list

2. Kill a running job: hadoop job -kill job_1432108212572_0001

3. Check the status of HDFS blocks and whether any are damaged: hadoop fsck /

4. Check the status of HDFS blocks and delete the damaged blocks: hadoop fsck / -delete

5. Check HDFS status, including DataNode information: hadoop dfsadmin -report

6. Put Hadoop into safe mode: hadoop dfsadmin -safemode enter

7. Take Hadoop out of safe mode: hadoop dfsadmin -safemode leave

8. Balance the files across the cluster: sbin/start-balancer.sh

Java API Interface

HDFS provides a Java API for operating on HDFS. If the following programs run on the Hadoop cluster, path can be written as a relative path, such as "/middle/weibo"; if they are tested locally in Eclipse, path must be written as an absolute path, such as "hdfs://xu:9000/middle/weibo".

1. Obtaining the HDFS file system

// Get the file system
public static FileSystem getFileSystem() throws IOException {

    // Read the configuration files
    Configuration conf = new Configuration();

    // Returns the default file system; if running on the Hadoop cluster, this call alone is enough
    // FileSystem fs = FileSystem.get(conf);

    // The file system address, specified explicitly
    URI uri = URI.create("hdfs://cloud004:9000");
    // Returns the specified file system; use this form when testing locally
    FileSystem fs = FileSystem.get(uri, conf);

    return fs;
}

2. Create a file directory

// Create a directory
public static void mkdir() throws Exception {

    // Get the file system
    FileSystem fs = getFileSystem();

    // Create the directory
    fs.mkdirs(new Path("hdfs://cloud004:9000/middle/weibo"));

    // Release resources
    fs.close();
}

3. Delete files or file directories

// Delete a file or directory
public static void rmdir() throws Exception {

    // Return the FileSystem object
    FileSystem fs = getFileSystem();

    // Delete the file or directory; the single-argument delete(Path f) has been deprecated
    fs.delete(new Path("hdfs://cloud004:9000/middle/weibo"), true);

    // Release resources
    fs.close();
}

4. Get all files under the directory

// Get all files under a directory
public static void listAllFile() throws IOException {

    // Return the FileSystem object
    FileSystem fs = getFileSystem();

    // List the directory contents
    FileStatus[] status = fs.listStatus(new Path("hdfs://cloud004:9000/middle/weibo/"));

    // Get the paths of all files under the directory
    Path[] listedPaths = FileUtil.stat2Paths(status);

    // Iterate over each file
    for (Path p : listedPaths) {
        System.out.println(p);
    }

    // Release resources
    fs.close();
}

5. Uploading files to HDFS

// Upload a file to HDFS
public static void copyToHDFS() throws IOException {

    // Return the FileSystem object
    FileSystem fs = getFileSystem();

    // The source path is a Linux path; if testing under Windows, rewrite it as a Windows path such as D://hadoop/djt/weibo.txt
    Path srcPath = new Path("/home/hadoop/djt/weibo.txt");

    // Destination path
    Path dstPath = new Path("hdfs://cloud004:9000/middle/weibo");

    // Perform the upload
    fs.copyFromLocalFile(srcPath, dstPath);

    // Release resources
    fs.close();
}

6. Download files from HDFS

// Download a file from HDFS
public static void getFile() throws IOException {

    // Return the FileSystem object
    FileSystem fs = getFileSystem();

    // Source file path
    Path srcPath = new Path("hdfs://cloud004:9000/middle/weibo/weibo.txt");

    // The destination path is a Linux path; if testing under Windows, rewrite it as a Windows path such as D://hadoop/djt/
    Path dstPath = new Path("/home/hadoop/djt/");

    // Download the file from HDFS
    fs.copyToLocalFile(srcPath, dstPath);

    // Release resources
    fs.close();
}

7. Get the HDFS cluster node information

// Get HDFS cluster node information
public static void getHDFSNodes() throws IOException {

    // Return the FileSystem object
    FileSystem fs = getFileSystem();

    // Cast to the distributed file system
    DistributedFileSystem hdfs = (DistributedFileSystem) fs;

    // Get all DataNodes
    DatanodeInfo[] dataNodeStats = hdfs.getDataNodeStats();

    // Loop over all nodes
    for (int i = 0; i < dataNodeStats.length; i++) {
        System.out.println("DataNode_" + i + "_Name: " + dataNodeStats[i].getHostName());
    }
}

8. Find the location of a file in the HDFS cluster

// Find the locations of a file in the HDFS cluster
public static void getFileLocal() throws IOException {

    // Return the FileSystem object
    FileSystem fs = getFileSystem();

    // File path
    Path path = new Path("hdfs://cloud004:9000/middle/weibo/weibo.txt");

    // Get the file status
    FileStatus fileStatus = fs.getFileStatus(path);

    // Get the list of file block locations
    BlockLocation[] blkLocations = fs.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());

    // Loop over and print the block information
    for (int i = 0; i < blkLocations.length; i++) {
        String[] hosts = blkLocations[i].getHosts();
        System.out.println("block_" + i + "_location: " + hosts[0]);
    }
}

Run the program
Once the Java API methods above have been tested locally, they can be run directly in the Hadoop environment with only minor changes to the path values. The following steps are generally required:

First step: Use Eclipse to package the Test class as test.jar. Because no third-party jar packages are involved, and the Hadoop cluster already contains the jars that Hadoop needs, we only need to package Test.java itself.

Second step: Under the root user, upload test.jar through the Xshell client to the local directory /home/hadoop/djt/ on the Hadoop server.

$ ls
test.jar
Third step: Under the root user, change the ownership of the uploaded test.jar to the hadoop user and group.

# chown -R hadoop:hadoop /home/hadoop/djt/test.jar
Fourth step: Switch to the Hadoop installation directory (/usr/java/hadoop-2.2.0-x64) to run test.jar; otherwise the jars required to execute the program cannot be found.

$ hadoop jar /home/hadoop/djt/test.jar com.dajiangtai.hadoop.middle.Test
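For reference, here is a minimal sketch of what a driver class such as Test might look like. The article does not show the actual Test.java, so this stand-in simply reuses the directory-listing logic from section 4; the package and class names are taken from the run command above, and the cluster address hdfs://cloud004:9000 from the earlier examples.

package com.dajiangtai.hadoop.middle;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class Test {
    public static void main(String[] args) throws Exception {
        // Connect to the HDFS cluster (address borrowed from the examples above)
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://cloud004:9000"), conf);

        // List the contents of /middle/weibo, as in section 4
        for (FileStatus status : fs.listStatus(new Path("/middle/weibo"))) {
            System.out.println(status.getPath());
        }

        // Release resources
        fs.close();
    }
}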
