The main purpose of the HDFS design is to store massive amounts of data: it can hold very large files and very large numbers of files (terabytes and beyond). HDFS splits each file into blocks and stores those blocks on different DataNodes, and this block placement is completely transparent to developers. HDFS provides two interfaces for operating on its files: the shell interface and the Java API.
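Although block placement is transparent in day-to-day use, you can make it visible with the fsck tool. A minimal sketch (the /some/file.txt path below is purely a placeholder; substitute a file that already exists on your cluster):
# List each block of the file and the DataNodes that hold it
$ hadoop fsck /some/file.txt -files -blocks -locations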
The following describes how to operate HDFS through the shell interface. The HDFS file commands are largely the same as their Linux counterparts, and they are case-sensitive.
Contents
1. Shell operations on a single HDFS cluster
2. Shell operations across multiple HDFS clusters
3. Other common shell operations for Hadoop administrators
1. Shell operations on a single HDFS cluster
Here are a few examples of the commands used in common scenarios.
The directory structure on HDFS is similar to Linux, and the root directory is represented by "/".
The following command creates a weibo directory under the (already existing) /middle directory.
$ hadoop fs -mkdir /middle/weibo
- Upload the weibo.txt file to the weibo directory.
$ hadoop fs -put weibo.txt /middle/weibo/
You can also use the -copyFromLocal parameter.
$ hadoop fs -copyFromLocal weibo.txt /middle/weibo/
- View the contents of the weibo.txt file.
$ hadoop fs -text /middle/weibo/weibo.txt
You can also view the contents of a file with the -cat and -tail parameters. However, compressed result files can only be viewed correctly with the -text parameter; otherwise the output is garbled.
$ hadoop fs -cat /middle/weibo/weibo.txt
$ hadoop fs -tail /middle/weibo/weibo.txt
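For example, when viewing a gzip-compressed output file (the part-r-00000.gz path here is only a hypothetical example of such a file), -text detects the codec and decompresses on the fly, whereas -cat would print the raw compressed bytes:
# View a compressed file; -text decompresses it before printing
$ hadoop fs -text /middle/weibo/part-r-00000.gz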
- Append content to "/middle/weibo/weibo.txt" from the terminal.
$ hadoop fs -appendToFile - /middle/weibo/weibo.txt
To finish entering input from the terminal, press Ctrl+C.
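Besides reading from standard input with "-", -appendToFile can also take one or more local files as sources. A small sketch (localnotes.txt is a hypothetical local file):
# Append the contents of a local file to an existing HDFS file
$ hadoop fs -appendToFile localnotes.txt /middle/weibo/weibo.txt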
- Copy "/middle/weibo/weibo.txt" to "/middle"
$ hadoop fs -cp /middle/weibo/weibo.txt /middle
- Copy the weibo.txt file to the local file system.
$ hadoop fs -get /middle/weibo/weibo.txt
You can also use the -copyToLocal parameter.
$ hadoop fs -copyToLocal /middle/weibo/weibo.txt
- Delete the weibo.txt file.
$ hadoop fs -rm /middle/weibo/weibo.txt
- Delete the /middle/weibo folder.
$ hadoop fs -rm -r /middle/weibo
- List the files under the /middle directory.
$ hadoop fs -ls /middle
2. Shell operations across multiple HDFS clusters
The above covers access to a single HDFS cluster, but what if data needs to be replicated between multiple Hadoop clusters? Fortunately, Hadoop ships with distcp, a distributed copy program implemented as a MapReduce job: it replicates large amounts of data between clusters by running many map tasks in parallel. Below we describe how distcp is used in different scenarios.
- Two clusters running the same version of Hadoop
Make sure the two clusters run the same version of Hadoop; here we take the hadoop1 and hadoop2 clusters as an example.
1) Transfer data between two HDFS clusters; by default, distcp skips files that already exist under the target path.
$ hadoop distcp /weather hdfs://hadoop2:9000/middle
This command is executed on hadoop1; it copies the /weather directory and its contents to the /middle directory of the hadoop2 cluster, so the resulting directory structure on hadoop2 is /middle/weather.
If /middle does not exist, it is created. You can also specify multiple source paths and copy them all to the target path, as in the sketch below.
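As a hedged illustration (the /logs source path is purely hypothetical), the following copies two source directories in a single job and uses the -m option to cap the number of parallel map tasks:
# Copy /weather and /logs to the hadoop2 cluster using at most 10 map tasks
$ hadoop distcp -m 10 /weather /logs hdfs://hadoop2:9000/middle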
The destination path (on hadoop2) must be an absolute path. The source path (on hadoop1) can be either absolute or relative, because the command is executed on hadoop1 and the HDFS protocol is assumed by default.
An error may occur when executing this command.
This happens when hadoop2 (whose IP is 192.168.233.130) has not been added to the /etc/hosts file on the machine running the command.
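Assuming the IP address above, a minimal /etc/hosts entry on the machine running distcp would look like this:
# /etc/hosts: map the hadoop2 hostname to its IP address
192.168.233.130 hadoop2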
If the command is executed on hadoop2 instead, it can be written as follows:
$ hadoop distcp hdfs://hadoop1:9000/weather /middle
In this case the source path must be absolute, while the destination path can be either absolute or relative, because the command is executed on hadoop2 and the HDFS protocol is assumed by default. If an error occurs, refer to the explanation above.
2) Transfer data between two HDFS clusters, overwriting existing files with -overwrite.
$ hadoop distcp -overwrite /weather hdfs://hadoop2:9000/middle/weather
Note that with -overwrite, only the contents of /weather are copied into "hdfs://hadoop2:9000/middle/weather", not the /weather directory itself, which is why /weather is appended to the destination path.
3) Transfer data between two HDFS clusters, updating only changed files with -update.
$ hadoop distcp -update /weather hdfs://hadoop2:9000/middle/weather
Note that with -update, again only the contents of /weather are copied into "hdfs://hadoop2:9000/middle/weather", not the /weather directory itself, so /weather is appended to the destination path here as well.
- Two clusters running different versions of Hadoop
The RPC protocols of different Hadoop versions are incompatible, so running distcp over the HDFS protocol between mismatched clusters can cause the replication job to fail. To work around this, you can choose one of the following two approaches; here we take the hadoop1 and hadoop3 clusters, which run different versions, as an example.
1) Transfer data between the two HDFS clusters using the HFTP protocol.
$ hadoop distcp hftp://hadoop1:50070/weather /middle
There are three points to note:
1. This command must be run on the target cluster, in order to keep the HDFS RPC versions compatible.
2. The HFTP address is determined by the dfs.http.address property, whose port defaults to 50070 (see the sketch after this list).
3. The command transfers the contents of hftp://hadoop1:50070/weather to the /middle directory, not including the /middle directory itself.
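As a quick way to confirm which HTTP address and port the source cluster actually exposes (assuming the hdfs getconf utility is available in your Hadoop version):
# Print the NameNode HTTP address configured on the source cluster
$ hdfs getconf -confKey dfs.http.address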
2) Transfer data between the two HDFS clusters using the WebHDFS protocol.
If you use the newer WebHDFS protocol (instead of HFTP), HTTP is used to communicate with both the source and target clusters, so no version incompatibility arises.
$ hadoop distcp webhdfs://hadoop1:50070/weather webhdfs://hadoop3:50070/middle
3. Other common shell operations for Hadoop administrators
Beyond knowing how to access HDFS through the shell, a Hadoop administrator also needs to master the following common commands.
- View the jobs that are running.
$ hadoop job -list
- Kill a running job.
$ hadoop job -kill job_1432108212572_0001
- Check the status of HDFS blocks to see whether any are corrupted.
$ hadoop fsck /
- Check the status of HDFS blocks and delete the corrupted files.
$ hadoop fsck / -delete
- Check the HDFS status, including DataNode information.
$ hadoop dfsadmin -report
- Enter safe mode.
$ hadoop dfsadmin -safemode enter
- Leave safe mode.
$ hadoop dfsadmin -safemode leave
- Balance files across the cluster.
$ /usr/java/hadoop/sbin/start-balancer.sh
The start-balancer.sh script is located in the sbin directory under the Hadoop installation path.
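The balancer also accepts a threshold argument, the maximum percentage by which any DataNode's disk usage may deviate from the cluster average; a hedged example:
# Rebalance until every DataNode is within 5% of the average utilization
$ /usr/java/hadoop/sbin/start-balancer.sh -threshold 5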