Common HDFS file operation commands and precautions

The HDFS file system provides a considerable number of shell commands, which makes it easy for programmers and system administrators to view and modify files on HDFS. Furthermore, most HDFS commands share the names and formats of their Unix/Linux counterparts, so the cost of learning them is greatly reduced.

The basic command format of HDFS is as follows:

bin/hadoop fs -cmd <args>

Here, cmd is a specific command. Remember not to omit the hyphen "-" before cmd.

1. ls

hadoop fs -ls /

Lists the directories and files under the root directory of the HDFS file system.

hadoop fs -ls -R /

Recursively lists all directories and files in the HDFS file system.
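
For example, listing a specific directory (path and output hypothetical); each entry shows permissions, replication factor, owner, group, size, modification time, and path:

hadoop fs -ls /user/hadoop

-rw-r--r--   3 hadoop supergroup    1048576 2015-06-01 10:00 /user/hadoop/data.txt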

2. put

hadoop fs -put <local file> <hdfs file>

The parent directory of the hdfs file must already exist; otherwise, the command fails.

hadoop fs -put <local file or dir>... <hdfs dir>

The hdfs dir must already exist; otherwise, the command fails.

hadoop fs -put - <hdfs file>

Reads input from the keyboard into the hdfs file; press Ctrl+D to end the input. The hdfs file must not already exist; otherwise, the command fails.
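
A minimal usage sketch (paths hypothetical):

hadoop fs -put data.txt /user/hadoop/data.txt

Reading from standard input also works in a pipeline, for example:

echo hello | hadoop fs -put - /user/hadoop/stdin.txt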

2.1 moveFromLocal

hadoop fs -moveFromLocal <local src>... <hdfs dst>

Similar to put, except that the local source file is deleted after the command completes; input can also be read from the keyboard into an hdfs file.

2.2 copyFromLocal

hadoop fs -copyFromLocal <local src>... <hdfs dst>

Similar to put; input can also be read from the keyboard into an hdfs file.

3. get

hadoop fs -get <hdfs file> <local file or dir>

If a local file with the same name already exists, the system reports that the file exists; only files whose names do not conflict are copied to the local machine.

hadoop fs -get <hdfs file or dir>... <local dir>

When copying multiple files or directories to the local machine, the local path must be a directory.

Note: If the user is not root, the local path must be under the user's own folder; otherwise, a permission error occurs.
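
For example (paths hypothetical):

hadoop fs -get /user/hadoop/data.txt ./data.txt

hadoop fs -get /user/hadoop/dir1 /user/hadoop/dir2 ./localdir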

3.1 moveToLocal

This command has not been implemented in the current version.

3.2 copyToLocal

hadoop fs -copyToLocal <hdfs src>... <local dst>

Similar to get.

4. rm

hadoop fs -rm <hdfs file>...

hadoop fs -rm -r <hdfs dir>...

Multiple files or directories can be deleted with a single command.
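
For example (paths hypothetical):

hadoop fs -rm /user/hadoop/a.txt /user/hadoop/b.txt

hadoop fs -rm -r /user/hadoop/olddir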

5. mkdir

hadoop fs -mkdir <hdfs path>

Creates only one directory level at a time; if the parent directory does not exist, the command reports an error.

hadoop fs -mkdir -p <hdfs path>

If the parent directory does not exist, it is created as well.
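
For example, creating a nested directory in one step (path hypothetical):

hadoop fs -mkdir -p /user/hadoop/a/b/c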

6. getmerge

hadoop fs -getmerge <hdfs dir> <local file>

Sorts all files in the specified hdfs directory and merges them into the specified local file. If the local file does not exist, it is created automatically; if it exists, its content is overwritten.

hadoop fs -getmerge -nl <hdfs dir> <local file>

With -nl, a blank line is inserted between the hdfs files merged into the local file.
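
For example, merging a directory of log files into one local file with blank lines between them (paths hypothetical):

hadoop fs -getmerge -nl /user/hadoop/logs ./all-logs.txt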

7. cp

hadoop fs -cp <hdfs file> <hdfs file>

The target file must not exist; otherwise, the command fails. This is equivalent to saving the file under another name; the source file still exists.

hadoop fs -cp <hdfs file or dir>... <hdfs dir>

The target folder must exist; otherwise, the command fails.
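
For example (paths hypothetical; the directory /user/hadoop/backup must already exist):

hadoop fs -cp /user/hadoop/a.txt /user/hadoop/a-backup.txt

hadoop fs -cp /user/hadoop/a.txt /user/hadoop/b.txt /user/hadoop/backup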

8. mv

hadoop fs -mv <hdfs file> <hdfs file>

The target file must not exist; otherwise, the command fails. This is equivalent to renaming the file; the source file no longer exists afterwards.

hadoop fs -mv <hdfs file or dir>... <hdfs dir>

When there are multiple source paths, the target path must be a directory and must exist.

Note: Moving across file systems (from local to hdfs or vice versa) is not allowed.
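
For example, renaming a file within HDFS (paths hypothetical):

hadoop fs -mv /user/hadoop/old.txt /user/hadoop/new.txt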

9. count

hadoop fs -count <hdfs path>

Counts the number of directories, the number of files, and the total file size under the given hdfs path.

The output columns are: directory count, file count, total file size, input path.
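
For example (path and numbers hypothetical):

hadoop fs -count /user/hadoop

           4           10            1048576 /user/hadoop

That is: 4 directories and 10 files, totaling 1048576 bytes, under /user/hadoop.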

10. du

hadoop fs -du <hdfs path>

Displays the size of each folder and file under the given hdfs path.

hadoop fs -du -s <hdfs path>

Displays the total size of all files under the given hdfs path.

hadoop fs -du -h <hdfs path>

Displays the size of each folder and file under the given hdfs path in a human-readable format; for example, 64M is shown instead of 67108864.
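
For example (paths and sizes hypothetical; the exact output layout varies slightly between Hadoop versions):

hadoop fs -du -h /user/hadoop

64 M   /user/hadoop/data.txt
1.2 K  /user/hadoop/notes.txt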

11. text

hadoop fs -text <hdfs file>

Outputs text files, or non-text files in certain formats (such as gzip-compressed files or SequenceFiles), in text form.
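
For example, printing a gzip-compressed file as plain text (path hypothetical):

hadoop fs -text /user/hadoop/part-00000.gz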

12. setrep

hadoop fs -setrep -R 3 <hdfs path>

Changes the replication factor of files in hdfs. In the command above, the number 3 is the replication factor to set, and the -R option recursively changes the replication factor of all directories and files under a directory.
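
For example, setting the replication factor of a single file to 2, without recursion (path hypothetical):

hadoop fs -setrep 2 /user/hadoop/data.txt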

13. stat

hadoop fs -stat [format] <hdfs path>

Returns status information about the given path.

The optional [format] parameters include: %b (file size), %o (block size), %n (file name), %r (replication factor), and %y (last modification date and time).

They can be combined, for example: hadoop fs -stat "%b %o %n" <hdfs path>
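
A concrete sketch (path and output hypothetical):

hadoop fs -stat "%n %r %y" /user/hadoop/data.txt

data.txt 3 2015-06-01 10:00:00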

14. tail

hadoop fs -tail <hdfs file>

Displays the last 1 KB of the file on standard output.

15. archive

hadoop archive -archiveName name.har -p <parent dir> <src>... <hdfs dest dir>

In the command, the name parameter is the name of the archive file to create.

Example: hadoop archive -archiveName hadoop.har -p /user 1.txt 2.txt /des

In this example, the files 1.txt and 2.txt under the hdfs /user directory are archived into a file named hadoop.har, which is stored in the hdfs /des directory. If 1.txt and 2.txt are omitted, all directories and files under the /user directory are archived into hadoop.har in the hdfs /des directory.

To list the har file itself, run the following command:

hadoop fs -ls /des/hadoop.har

To list the files contained in the har archive, run the following command:

hadoop fs -ls -R har:///des/hadoop.har

Note: A har file cannot be modified after it is created. To add a file to a .har, you can only go back to the original files and create a new archive; the data in the original files is unchanged. The real purpose of har files is to reduce the excessive space waste on the NameNode and DataNode caused by large numbers of small files.

16. balancer

hdfs balancer

If the administrator finds that some DataNodes store too much data and others too little, this command can be used to manually start the internal balancing process.
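
The balancer also accepts a threshold, the maximum allowed deviation (in percent) of each DataNode's utilization from the cluster average, for example:

hdfs balancer -threshold 10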

17. dfsadmin

hdfs dfsadmin -help

Administrators can use dfsadmin to manage HDFS.

hdfs dfsadmin -report

Displays basic statistics of the file system.

hdfs dfsadmin -safemode <enter | leave | get | wait>

enter: enter safe mode; leave: exit safe mode; get: check whether safe mode is on; wait: wait until safe mode exits.

18. distcp

Used to copy data between two HDFS clusters.
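
A minimal sketch (cluster addresses and paths hypothetical):

hadoop distcp hdfs://nn1:8020/source hdfs://nn2:8020/dest

distcp is implemented as a MapReduce job, so large data sets are copied in parallel.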

Notes

Some commands (such as mkdir) take a file or directory name as the parameter. Parameters are generally in URI format, whose basic form is scheme://authority/path.

The scheme identifies the specific file system: for a local file, the scheme is file; for a file on HDFS, it is hdfs. The authority is the host address and port of the machine. Just as Linux files have absolute and relative paths, parts of the URI can be omitted. When the default file system is hdfs://namenode:namenodeport, a path parameter of /parent/child actually refers to hdfs://namenode:namenodeport/parent/child.
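
For example (host and port hypothetical), the following two commands refer to the same directory when hdfs://namenode:9000 is the default file system:

hadoop fs -ls hdfs://namenode:9000/parent/child

hadoop fs -ls /parent/child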

Note that HDFS has no concept of a current working directory. As mentioned above, the metadata of all HDFS files is stored on the NameNode, which controls where files are stored: a single file may be split across different machines, and files in different paths may be placed on the same machine to improve efficiency. Providing cd and pwd operations for HDFS is therefore not practical.
