Good command of HDFS shell access

Source: Internet
Author: User
Tags: hadoop fs

The main purpose of the HDFS design is to store massive amounts of data, which means it can hold very large numbers of files (terabytes of data can be stored). HDFS splits these files into blocks and stores the blocks on different DataNodes; which block lives on which DataNode is transparent to developers. HDFS provides two interfaces for operating on these files: the shell interface and the Java API interface.

The following describes how to operate on HDFS through the shell interface. The commands HDFS uses to work with files are basically the same as the corresponding Linux commands, and they are case-sensitive.
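If you are not sure of a command's exact options, the shell documents itself. The exact output differs between Hadoop versions, but roughly the following works (a quick sketch, not specific to any release):

$ hadoop fs -help        # list all supported file system commands
$ hadoop fs -help ls     # show the usage of a single command, here ls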

Contents

1. Shell operation of a single HDFS cluster

2. Shell operation of multiple HDFS clusters

3. Other common shell operations for Hadoop administrators

1. Shell operation of a single HDFS cluster

Here are a few examples of the commands used in common scenarios.

    • Create a folder

The directory structure on HDFS is similar to Linux, and the root directory is represented by "/".

The following command creates the weibo directory under the /middle directory (which already exists).

$ hadoop fs -mkdir /middle/weibo
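If the parent directory does not already exist, Hadoop 2.x and later also support a -p option that creates the whole path in one step, similar to the Linux mkdir -p (a sketch, not something used in the rest of this article):

$ hadoop fs -mkdir -p /middle/weibo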


    • Upload the file weibo.txt to the weibo directory.

$ hadoop fs -put weibo.txt /middle/weibo/


You can also use the -copyFromLocal option.

$ hadoop fs -copyFromLocal weibo.txt /middle/weibo/

    • View the contents of the weibo.txt file.

$ hadoop fs -text /middle/weibo/weibo.txt


You can also view the contents of a file with the -cat and -tail options. However, compressed result files can only be viewed with -text; the other two display garbled output.

$ hadoop fs -cat /middle/weibo/weibo.txt

$ hadoop fs -tail /middle/weibo/weibo.txt
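Like the Linux tail, the -tail command also accepts an -f option to keep following the file as new data is appended (a small sketch; stop it with Ctrl + C):

$ hadoop fs -tail -f /middle/weibo/weibo.txt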

    • Append content to "/middle/weibo/weibo.txt" from the terminal.

$ hadoop fs -appendToFile - /middle/weibo/weibo.txt


To finish entering content from the terminal, press Ctrl + C.
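Instead of "-" (standard input), -appendToFile can also take one or more local files as the source. The local file name weibo2.txt below is only a hypothetical example:

$ hadoop fs -appendToFile weibo2.txt /middle/weibo/weibo.txt   # weibo2.txt is a hypothetical local file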

    • Copy "/middle/weibo/weibo.txt" to "/middle"

$ hadoop fs -cp /middle/weibo/weibo.txt /middle


    • Copy the weibo.txt file to the local file system.

$ hadoop fs -get /middle/weibo/weibo.txt


You can also use the -copyToLocal option.

$ hadoop fs -copyToLocal /middle/weibo/weibo.txt
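Both -get and -copyToLocal also accept an explicit local destination path; the /tmp location below is only an example:

$ hadoop fs -get /middle/weibo/weibo.txt /tmp/weibo.txt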

    • Delete the weibo.txt file.

$ hadoop fs -rm /middle/weibo/weibo.txt


    • Delete the /middle/weibo folder.

$ hadoop fs -rm -r /middle/weibo
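If the HDFS trash feature is enabled, -rm only moves files into the trash directory. To delete immediately you can add -skipTrash (a sketch; use with care, the data cannot be recovered):

$ hadoop fs -rm -r -skipTrash /middle/weibo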


    • List the files under the /middle directory.

$ hadoop fs -ls /middle
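To list a directory tree recursively, add the -R option:

$ hadoop fs -ls -R /middle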


2. Shell operation of multiple HDFS clusters

The above covers access to a single HDFS cluster, but what if data needs to be copied between multiple Hadoop clusters? Fortunately, Hadoop ships with distcp, a useful distributed copy program implemented as a MapReduce job: it performs large copies between clusters using map tasks that run in parallel across the cluster. Below we describe how distcp is used in different scenarios.
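Because the copy is carried out by map tasks, the degree of parallelism can be tuned. distcp accepts an -m option that limits the number of maps; the value 10 below is only an illustration, using the same paths as the example later in this section:

$ hadoop distcp -m 10 /weather hdfs://hadoop2:9000/middle   # use at most 10 map tasks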

    • Two clusters running the same version of Hadoop

Make sure the two cluster versions are the same. The hadoop1 and hadoop2 clusters, which run the same Hadoop version, are used as the example here.

1) Transfer data between two HDFS clusters. By default, distcp skips files that already exist under the target path.

$ hadoop distcp /weather hdfs://hadoop2:9000/middle


This command is executed on hadoop1. It copies the /weather directory and its contents to the /middle directory of the hadoop2 cluster, so the final directory structure on hadoop2 is /middle/weather.


If /middle does not exist, it is created. You can also specify multiple source paths; all of them are copied to the target path, as sketched below.
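For instance, two source directories could be copied in one run; the /logs directory below is a hypothetical second source, not a directory created earlier in this article:

$ hadoop distcp /weather /logs hdfs://hadoop2:9000/middle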

The destination path (on hadoop2) must be an absolute path. The source path (on hadoop1) can be an absolute or a relative path, because the command is executed on hadoop1 and the HDFS protocol is used by default.

An error may occur when executing this command. This happens because hadoop2 (which corresponds to the IP 192.168.233.130) has not been added to the /etc/hosts file on hadoop1.
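A minimal fix, assuming the IP mentioned above, is to append a host entry to /etc/hosts on hadoop1 (run as root):

$ echo "192.168.233.130 hadoop2" >> /etc/hosts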

    

If the command is executed on hadoop2 instead, it can be written as follows.

$ hadoop distcp hdfs://hadoop1:9000/weather /middle


In this case the source path must be an absolute path, while the destination path can be absolute or relative, because the command is executed on hadoop2 and HDFS is the default protocol. If an error occurs, refer to the note above.

2) Transfer data between two HDFS clusters, overwriting existing files with -overwrite.

$ hadoop distcp -overwrite /weather hdfs://hadoop2:9000/middle/weather


Note that with -overwrite, only the contents of /weather are copied into "hdfs://hadoop2:9000/middle/weather", not the /weather directory itself, which is why /weather is appended to the target path.

3) Transfer data between two HDFS clusters, updating only the files that have changed with -update.

$ hadoop distcp -update /weather hdfs://hadoop2:9000/middle/weather


Note that with -update, likewise, only the contents of /weather are copied into "hdfs://hadoop2:9000/middle/weather", not the /weather directory itself, which is why /weather is appended to the target path.

    • Two clusters running different versions of Hadoop

The RPC protocols of different Hadoop versions are incompatible, so using distcp over the HDFS protocol to copy data between them causes the copy job to fail. To work around this, you can choose one of the following two approaches. The hadoop1 and hadoop3 clusters, which run different Hadoop versions, are used as the example here.

1) Transfer data between two HDFS clusters using HFTP.

$ hadoop distcp hftp://hadoop1:50070/weather /middle


There are three points to note:

1. This command must be run on the target cluster so that the HDFS RPC versions remain compatible.

2. The HFTP address is determined by the dfs.http.address property, whose port defaults to 50070 (a quick way to check the value is shown after this list).

3. The command transfers the contents of hftp://hadoop1:50070/weather into the /middle directory, not the /middle directory itself.
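A quick way to confirm the HTTP address mentioned in point 2 is to query the configuration directly. In Hadoop 2.x this can be done with hdfs getconf (a sketch; in newer releases the property may appear under the name dfs.namenode.http-address instead):

$ hdfs getconf -confKey dfs.http.address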

2) Transfer data between two HDFS clusters using WebHDFS.

If you use the newer WebHDFS protocol (instead of HFTP), HTTP can be used to communicate with both the source and the target cluster, without any incompatibility problems.

$ hadoop distcp webhdfs://hadoop1:50070/weather webhdfs://hadoop3:50070/middle


3. Other common shell operations for Hadoop administrators

Besides mastering shell access to HDFS, a Hadoop administrator also needs to master the following common commands.

    • View the jobs that are running.

$ hadoop job -list


    • Kill a running job.

$ hadoop job -kill job_1432108212572_0001


    • Check the HDFS block status to see whether any blocks are corrupted.

$ hadoop fsck /

    • Check the HDFS block status and delete corrupted blocks.

$ hadoop fsck / -delete
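fsck can also report more detail, such as the files, blocks, and DataNode locations involved (a sketch; the output can be very long on a large cluster):

$ hadoop fsck / -files -blocks -locations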

    • Check the HDFS status, including DataNode information.

$ hadoop dfsadmin -report

    • Enter HDFS safe mode.

$ hadoop dfsadmin -safemode enter
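You can check whether the NameNode is currently in safe mode with:

$ hadoop dfsadmin -safemode get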


    • Leave HDFS safe mode.

$ hadoop dfsadmin -safemode leave


    • Balance data across the cluster.

$ /usr/java/hadoop/sbin/start-balancer.sh

The start-balancer.sh script is located in the sbin directory under the Hadoop installation path.
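The balancer also accepts a threshold, expressed as a percentage of disk-usage deviation; the value 5 below is only an illustration:

$ /usr/java/hadoop/sbin/start-balancer.sh -threshold 5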

