Good command of HDFS shell access

Source: Internet
Author: User
Tags: hadoop fs

The main purpose of the HDFS design is to store massive amounts of data, which means it can hold very large numbers of files (terabytes of data can be stored). HDFS splits these files into blocks and stores the blocks on different DataNodes; which block lives on which DataNode is transparent to developers. HDFS provides two interfaces for operating on these files: the shell interface and the Java API interface.

The following describes how to operate on HDFS through the shell interface. The commands HDFS uses to work with files are basically the same as the corresponding Linux commands, and they are case-sensitive.
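If you are not sure of a command's exact options, the shell documents itself. The exact output differs between Hadoop versions, but roughly the following works (a quick sketch, not specific to any release):

$ hadoop fs -help        # list all supported file system commands
$ hadoop fs -help ls     # show the usage of a single command, here ls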

Contents

1. Shell operation of a single HDFS cluster

2. Shell operation of multiple HDFS clusters

3. Other common shell operations for Hadoop administrators

1. Shell operation of a single HDFS cluster

Here are a few examples of the commands used in common scenarios.

    • Create a folder

The directory structure on HDFS is similar to Linux, and the root directory is represented by "/".

The following command creates the weibo directory under the /middle directory (which already exists).

$ hadoop fs -mkdir /middle/weibo
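If the parent directory does not already exist, Hadoop 2.x and later also support a -p option that creates the whole path in one step, similar to the Linux mkdir -p (a sketch, not something used in the rest of this article):

$ hadoop fs -mkdir -p /middle/weibo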


    • Upload the file weibo.txt to the weibo directory.

$ hadoop fs -put weibo.txt /middle/weibo/


You can also use the -copyFromLocal option.

$ hadoop fs -copyFromLocal weibo.txt /middle/weibo/

    • View the contents of the weibo.txt file.

$ hadoop fs -text /middle/weibo/weibo.txt


You can also view the contents of a file with the -cat and -tail options. However, compressed result files can only be viewed with -text; the other two display garbled output.

$ hadoop fs -cat /middle/weibo/weibo.txt

$ hadoop fs -tail /middle/weibo/weibo.txt
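Like the Linux tail, the -tail command also accepts an -f option to keep following the file as new data is appended (a small sketch; stop it with Ctrl + C):

$ hadoop fs -tail -f /middle/weibo/weibo.txt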

    • Append content to "/middle/weibo/weibo.txt" from the terminal.

$ hadoop fs -appendToFile - /middle/weibo/weibo.txt


To finish entering content from the terminal, press Ctrl + C.
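Instead of "-" (standard input), -appendToFile can also take one or more local files as the source. The local file name weibo2.txt below is only a hypothetical example:

$ hadoop fs -appendToFile weibo2.txt /middle/weibo/weibo.txt   # weibo2.txt is a hypothetical local file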

    • Copy "/middle/weibo/weibo.txt" to "/middle"

$ hadoop fs -cp /middle/weibo/weibo.txt /middle


    • Copy the weibo.txt file to the local file system.

$ hadoop fs -get /middle/weibo/weibo.txt


You can also use the -copyToLocal option.

$ hadoop fs -copyToLocal /middle/weibo/weibo.txt
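Both -get and -copyToLocal also accept an explicit local destination path; the /tmp location below is only an example:

$ hadoop fs -get /middle/weibo/weibo.txt /tmp/weibo.txt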

    • Delete the weibo.txt file.

$ hadoop fs -rm /middle/weibo/weibo.txt


    • Delete the /middle/weibo folder.

$ hadoop fs -rm -r /middle/weibo
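If the HDFS trash feature is enabled, -rm only moves files into the trash directory. To delete immediately you can add -skipTrash (a sketch; use with care, the data cannot be recovered):

$ hadoop fs -rm -r -skipTrash /middle/weibo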


    • List the files under the /middle directory.

$ hadoop fs -ls /middle
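To list a directory tree recursively, add the -R option:

$ hadoop fs -ls -R /middle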


2. Shell operation of multiple HDFS clusters

The above covers access to a single HDFS cluster, but what if data needs to be copied between multiple Hadoop clusters? Fortunately, Hadoop ships with distcp, a useful distributed copy program implemented as a MapReduce job: it performs large copies between clusters using map tasks that run in parallel across the cluster. Below we describe how distcp is used in different scenarios.
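Because the copy is carried out by map tasks, the degree of parallelism can be tuned. distcp accepts an -m option that limits the number of maps; the value 10 below is only an illustration, using the same paths as the example later in this section:

$ hadoop distcp -m 10 /weather hdfs://hadoop2:9000/middle   # use at most 10 map tasks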

    • Two clusters running the same version of Hadoop

Make sure the two cluster versions are the same. The hadoop1 and hadoop2 clusters, which run the same Hadoop version, are used as the example here.

1) Transfer data between two HDFS clusters. By default, distcp skips files that already exist under the target path.

$ hadoop distcp /weather hdfs://hadoop2:9000/middle


This command is executed on hadoop1. It copies the /weather directory and its contents to the /middle directory of the hadoop2 cluster, so the final directory structure on hadoop2 is /middle/weather.


If /middle does not exist, it is created. You can also specify multiple source paths; all of them are copied to the target path, as sketched below.
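For instance, two source directories could be copied in one run; the /logs directory below is a hypothetical second source, not a directory created earlier in this article:

$ hadoop distcp /weather /logs hdfs://hadoop2:9000/middle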

The destination path (on hadoop2) must be an absolute path. The source path (on hadoop1) can be an absolute or a relative path, because the command is executed on hadoop1 and the HDFS protocol is used by default.

An error may occur when executing this command. This happens because hadoop2 (which corresponds to the IP 192.168.233.130) has not been added to the /etc/hosts file on hadoop1.
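A minimal fix, assuming the IP mentioned above, is to append a host entry to /etc/hosts on hadoop1 (run as root):

$ echo "192.168.233.130 hadoop2" >> /etc/hosts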

    

If the command is executed on hadoop2 instead, it can be written as follows.

$ hadoop distcp hdfs://hadoop1:9000/weather /middle


In this case the source path must be an absolute path, while the destination path can be absolute or relative, because the command is executed on hadoop2 and HDFS is the default protocol. If an error occurs, refer to the note above.

2) Transfer data between two HDFS clusters, overwriting existing files with -overwrite.

$ hadoop distcp -overwrite /weather hdfs://hadoop2:9000/middle/weather


Note that with -overwrite, only the contents of /weather are copied into "hdfs://hadoop2:9000/middle/weather", not the /weather directory itself, which is why /weather is appended to the target path.

3) Transfer data between two HDFS clusters, updating only the files that have changed with -update.

$ hadoop distcp -update /weather hdfs://hadoop2:9000/middle/weather


Note that with -update, likewise, only the contents of /weather are copied into "hdfs://hadoop2:9000/middle/weather", not the /weather directory itself, which is why /weather is appended to the target path.

    • Two clusters running different versions of Hadoop

The RPC protocols of different Hadoop versions are incompatible, so using distcp over the HDFS protocol to copy data between them causes the copy job to fail. To work around this, you can choose one of the following two approaches. The hadoop1 and hadoop3 clusters, which run different Hadoop versions, are used as the example here.

1) Transfer data between two HDFS clusters using HFTP.

$ hadoop distcp hftp://hadoop1:50070/weather /middle


There are three points to note:

1. This command must be run on the target cluster so that the HDFS RPC versions remain compatible.

2. The HFTP address is determined by the dfs.http.address property, whose port defaults to 50070 (a quick way to check the value is shown after this list).

3. The command transfers the contents of hftp://hadoop1:50070/weather into the /middle directory, not the /middle directory itself.
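A quick way to confirm the HTTP address mentioned in point 2 is to query the configuration directly. In Hadoop 2.x this can be done with hdfs getconf (a sketch; in newer releases the property may appear under the name dfs.namenode.http-address instead):

$ hdfs getconf -confKey dfs.http.address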

2) Transfer data between two HDFS clusters using WebHDFS.

If you use the newer WebHDFS protocol (instead of HFTP), HTTP can be used to communicate with both the source and the target cluster, without any incompatibility problems.

$ hadoop distcp webhdfs://hadoop1:50070/weather webhdfs://hadoop3:50070/middle


3. Other common shell operations for Hadoop administrators

Besides mastering shell access to HDFS, a Hadoop administrator also needs to master the following common commands.

    • View the jobs that are running.

$ hadoop job -list


    • Kill a running job.

$ hadoop job -kill job_1432108212572_0001


    • Check the HDFS block status to see whether any blocks are corrupted.

$ hadoop fsck /

    • Check the HDFS block status and delete corrupted blocks.

$ hadoop fsck / -delete
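fsck can also report more detail, such as the files, blocks, and DataNode locations involved (a sketch; the output can be very long on a large cluster):

$ hadoop fsck / -files -blocks -locations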

    • Check the HDFS status, including DataNode information.

$ hadoop dfsadmin -report

    • Enter HDFS safe mode.

$ hadoop dfsadmin -safemode enter
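You can check whether the NameNode is currently in safe mode with:

$ hadoop dfsadmin -safemode get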


    • Leave HDFS safe mode.

$ hadoop dfsadmin -safemode leave


    • Balance data across the cluster.

$ /usr/java/hadoop/sbin/start-balancer.sh

The start-balancer.sh script is located in the sbin directory under the Hadoop installation path.
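The balancer also accepts a threshold, expressed as a percentage of disk-usage deviation; the value 5 below is only an illustration:

$ /usr/java/hadoop/sbin/start-balancer.sh -threshold 5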

