Common HDFS file operation commands and precautions
The HDFS file system provides a rich set of shell commands, which makes it convenient for programmers and system administrators to view and modify files on HDFS. Furthermore, most HDFS commands share the names and formats of their Unix/Linux counterparts, so the cost of learning them is low.
The basic command format of HDFS is as follows:
bin/hadoop fs -cmd <args>
Here, cmd is a specific command. Remember not to omit the hyphen "-" before cmd.
1. ls
hadoop fs -ls /
Lists the directories and files under the root directory of the HDFS file system.
hadoop fs -ls -R /
Recursively lists all directories and files in the HDFS file system.
2. put
hadoop fs -put <local file> <hdfs file>
The parent directory of the HDFS file must already exist; otherwise the command fails.
hadoop fs -put <local file or dir> ... <hdfs dir>
The HDFS directory must already exist; otherwise the command fails.
hadoop fs -put - <hdfs file>
Reads input from the keyboard into the HDFS file; press Ctrl+D to end the input. The HDFS file must not already exist; otherwise the command fails.
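A minimal put session might look like the sketch below (assuming a running HDFS cluster, `hadoop` on the PATH, and a hypothetical target directory `/user/demo`):

```shell
# Create a small local file to upload
echo "hello hdfs" > /tmp/sample.txt

# The target's parent directory must exist before the upload
hadoop fs -mkdir -p /user/demo

# Upload; this fails if /user/demo/sample.txt already exists
hadoop fs -put /tmp/sample.txt /user/demo/sample.txt

# Stream keyboard input into a new HDFS file (end with Ctrl+D)
hadoop fs -put - /user/demo/typed.txt
```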
2.1 moveFromLocal
hadoop fs -moveFromLocal <local src> ... <hdfs dst>
Similar to put, except that the source file <local src> is deleted after the command completes.
2.2 copyFromLocal
hadoop fs -copyFromLocal <local src> ... <hdfs dst>
Similar to put; input can also be read from the keyboard into an HDFS file.
3. get
hadoop fs -get <hdfs file> <local file or dir>
If a local file with the same name already exists, the system reports that the file exists; files whose names do not conflict are copied to the local machine.
hadoop fs -get <hdfs file or dir> ... <local dir>
When copying multiple files or directories to the local machine, the local target must be a directory path.
Note: if the user is not root, the local path must be under the user's own home directory; otherwise a permission error occurs.
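The two get forms can be sketched as follows (the HDFS paths and the local download directory are placeholders, assuming a running cluster):

```shell
# Copy one HDFS file into the local working directory
hadoop fs -get /user/demo/sample.txt .

# Copy several HDFS files at once; the local target must be a directory
mkdir -p ~/hdfs-downloads
hadoop fs -get /user/demo/a.txt /user/demo/b.txt ~/hdfs-downloads
```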
3.1 moveToLocal
This command has not been implemented in the current version.
3.2 copyToLocal
hadoop fs -copyToLocal <hdfs src> ... <local dst>
Similar to get.
4. rm
hadoop fs -rm <hdfs file> ...
hadoop fs -rm -r <hdfs dir> ...
Multiple files or directories can be deleted in a single command.
5. mkdir
hadoop fs -mkdir <hdfs path>
Creates only one directory level at a time; if the parent directory does not exist, this command reports an error.
hadoop fs -mkdir -p <hdfs path>
If the parent directory does not exist, it is created as well.
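The difference between the two forms can be seen with a nested path (using a hypothetical directory `/user/demo/reports` on a running cluster):

```shell
# Fails if the parent /user/demo does not already exist
hadoop fs -mkdir /user/demo/reports

# Creates /user/demo and /user/demo/reports together as needed
hadoop fs -mkdir -p /user/demo/reports
```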
6. getmerge
hadoop fs -getmerge <hdfs dir> <local file>
Sorts all files in the specified HDFS directory and merges them into the specified local file. If the local file does not exist, it is created automatically; if it exists, its content is overwritten.
hadoop fs -getmerge -nl <hdfs dir> <local file>
With -nl, a blank line is inserted between the HDFS files merged into the local file.
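For example, a directory of log fragments could be pulled down as one local file (the paths below are placeholders, assuming a running cluster):

```shell
# Merge every file under /user/demo/logs into one local file,
# with a blank line between the pieces
hadoop fs -getmerge -nl /user/demo/logs /tmp/merged.log
```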
7. cp
hadoop fs -cp <hdfs file> <hdfs file>
The target file must not exist; otherwise the command fails. This is equivalent to saving the file under a new name; the source file still exists.
hadoop fs -cp <hdfs file or dir> ... <hdfs dir>
The target folder must exist; otherwise the command fails.
8. mv
hadoop fs -mv <hdfs file> <hdfs file>
The target file must not exist; otherwise the command fails. This is equivalent to renaming the file; the source file no longer exists afterward.
hadoop fs -mv <hdfs file or dir> ... <hdfs dir>
When there are multiple source paths, the target path must be a directory and must exist.
Note: moving across file systems (from local to HDFS or vice versa) is not allowed.
9. count
hadoop fs -count <hdfs path>
Counts the number of directories, the number of files, and the total file size under the given HDFS path.
The output columns are: directory count, file count, total file size, input path.
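A quick sketch of the output layout (the path is a placeholder, assuming a running cluster):

```shell
hadoop fs -count /user/demo
# The single output line has four columns, in order:
#   DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME
```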
10. du
hadoop fs -du <hdfs path>
Displays the size of each folder and file under the given HDFS path.
hadoop fs -du -s <hdfs path>
Displays the total size of all files under the given HDFS path.
hadoop fs -du -h <hdfs path>
Displays the size of each folder and file under the given HDFS path in a human-readable format, for example 64M instead of 67108864.
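The options can also be combined; a typical sequence (with a placeholder path, assuming a running cluster) is:

```shell
# One line per child entry under the path
hadoop fs -du /user/demo
# A single summarized total for the whole path
hadoop fs -du -s /user/demo
# The same total in human-readable units (e.g. 64M rather than 67108864)
hadoop fs -du -s -h /user/demo
```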
11. text
hadoop fs -text <hdfs file>
Outputs a text file, or a non-text file in certain supported formats (such as gzip-compressed files or SequenceFiles), as plain text.
12. setrep
hadoop fs -setrep -R 3 <hdfs path>
Changes the replication factor of files in HDFS. In the command above, 3 is the new replication factor; the -R option recursively changes the replication factor of all directories and files under a directory.
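A sketch with a placeholder path (assuming a running cluster); the -w flag additionally waits until re-replication has actually finished:

```shell
# Recursively set the replication factor to 3 for everything under the path,
# and block until the new replication level is reached
hadoop fs -setrep -w -R 3 /user/demo
```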
13. stat
hadoop fs -stat [format] <hdfs path>
Returns the status information of the given path.
The optional [format] placeholders include: %b (file size), %o (block size), %n (file name), %r (replication factor), %y (last modification date and time).
For example: hadoop fs -stat "%b %o %n" <hdfs path>
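Putting the placeholders together (the file path is hypothetical, assuming a running cluster):

```shell
# Print size, block size, and name on one line for the given file
hadoop fs -stat "%b %o %n" /user/demo/sample.txt
```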
14. tail
hadoop fs -tail <hdfs file>
Prints the last 1 KB of the file to standard output.
15. archive
hadoop archive -archiveName name.har -p <parent dir> <src>* <hdfs dst dir>
In the command, the name parameter is the name of the archive file to create.
Example: hadoop archive -archiveName hadoop.har -p /user 1.txt 2.txt /des
In this example, the files 1.txt and 2.txt under the /user directory in HDFS are archived into a file named hadoop.har stored under the HDFS directory /des. If 1.txt and 2.txt are omitted, all directories and files under /user are archived into hadoop.har under /des.
To display the har file itself, run the following command:
hadoop fs -ls /des/hadoop.har
The following command shows the files packed inside the har archive:
hadoop fs -ls -R har:///des/hadoop.har
**Note:** a har file cannot be modified after it is created. To add a file to a .har, you must go back to the original files and create a new archive. The data in the original files is unchanged; the real purpose of har files is to reduce the excessive namespace overhead that large numbers of small files impose on the NameNode.
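The whole workflow above can be sketched as one session (paths match the example in this section and assume a running cluster with YARN available, since archiving runs a MapReduce job):

```shell
# Archive everything under /user into hadoop.har, stored under /des
hadoop archive -archiveName hadoop.har -p /user /des

# List the archive file itself
hadoop fs -ls /des/hadoop.har

# List the files packed inside the archive via the har:// scheme
hadoop fs -ls -R har:///des/hadoop.har
```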
16. balancer
hdfs balancer
If the administrator finds that some DataNodes store too much data and others too little, this command manually starts the internal balancing process.
17. dfsadmin
hdfs dfsadmin -help
Administrators can use dfsadmin to manage HDFS.
hdfs dfsadmin -report
Displays basic statistics of the file system.
hdfs dfsadmin -safemode <enter | leave | get | wait>
enter: enter safe mode; leave: exit safe mode; get: check whether safe mode is on; wait: wait until safe mode is exited.
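For example, a startup script might check safe mode before writing (assuming a running cluster and `hdfs` on the PATH):

```shell
# Check whether the NameNode is currently in safe mode
hdfs dfsadmin -safemode get

# Block until the NameNode has left safe mode, then proceed
hdfs dfsadmin -safemode wait
```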
18. distcp
Used to copy data between two HDFS clusters.
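A minimal invocation looks like this (the NameNode hostnames `nn1`/`nn2`, the port, and the paths are all placeholders):

```shell
# Copy /src on cluster nn1 to /dst on cluster nn2 via a MapReduce job
hadoop distcp hdfs://nn1:8020/src hdfs://nn2:8020/dst
```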
Notes
Some commands (such as mkdir) require a file/directory name as a parameter. Parameters are generally in URI format; the basic form is scheme://authority/path.
scheme identifies a specific file system: for a local file, the scheme is file; for a file on HDFS, the scheme is hdfs. authority is the machine's address and port. Just as Linux files have absolute and relative paths, parts of the URI can be omitted. When the default file system is set to hdfs://namenode:namenodeport, a path parameter of /parent/child actually refers to hdfs://namenode:namenodeport/parent/child.
Note that HDFS has no concept of a current working directory. As mentioned above, the metadata of all HDFS files is stored on the NameNode, which controls where file data is placed. A file may be split across different machines, and files in different paths may be placed on the same machine to improve efficiency. It is therefore unrealistic to provide cd and pwd operations for HDFS.
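The URI shorthand can be illustrated with ls (hostname and port are placeholders; both commands refer to the same file when the default file system points at that NameNode):

```shell
# Fully qualified URI
hadoop fs -ls hdfs://namenode:8020/parent/child

# Shorthand path, resolved against the configured default file system
hadoop fs -ls /parent/child
```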