Hadoop Distributed File System (HDFS): Shell-Based Operations
[Distributed File System] Overview
As data volume grows, it can no longer be stored on the disks managed by a single operating system, so it is spread across more disks managed by more operating systems. That arrangement is inconvenient to manage and maintain, so a system is needed to manage files across multiple machines: a distributed file system.
Features of a distributed file system:
It is a file system that lets files be shared across multiple hosts over a network, so that multiple users on multiple machines can share files and storage space.
Transparency. Files are accessed over the network, but to programs and users it looks just like accessing a local disk.
Fault tolerance. Even if some nodes in the system go offline, the system keeps running without losing data.
There are many distributed file systems, and HDFS is only one of them. HDFS suits write-once, read-many workloads. It does not support concurrent writes, and it is not well suited to large numbers of small files.
[What is HDFS? You can think of HDFS as something like the Windows file system. Windows maintains a tree of folders, and this directory hierarchy is used to store files in different folders. We routinely create folders and files, and move, copy, delete, edit, and search for files. HDFS works much the same way, so readers can think of it as a Windows-style file system.]
Shell operations of HDFS
Since HDFS is a distributed file system for storing and accessing data, operations on HDFS are the basic operations of any file system: creating, modifying, and deleting files, changing permissions, and creating, deleting, and renaming folders. The HDFS commands resemble the Linux shell commands for working with files, such as ls, mkdir, and rm.
1. HDFS operations: hadoop fs -<command>
A) hadoop fs -ls # list the contents of a directory
B) hadoop fs -lsr # recursively display the directory structure
This command option recursively displays the directory structure under the given path; it is followed by an HDFS path.
Note:
The path / in the command is the root directory of HDFS. The output format closely resembles that of the Linux command ls -l. Each line breaks down as follows:
The first character indicates a directory ("d") or a file ("-");
The next nine characters indicate permissions (as in Linux);
The next number or "-" is the replication factor: a file shows a number giving its replica count, while a directory has no replicas and shows "-";
The next field, "root", is the owner;
The next field, "supergroup", is the group;
The next values, such as "0" and "4", are file sizes in bytes;
The next field is the modification time, in year-month-day format;
The last field is the file path.
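The field layout above can be illustrated with a small Python parser. This is a sketch that assumes the whitespace-separated layout just described, with the date and the time as two separate tokens; the `LsEntry` class, the `parse_ls_line` function, and the timestamps in the sample lines are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LsEntry:
    is_dir: bool
    permissions: str
    replication: Optional[int]  # None for directories, which show "-"
    owner: str
    group: str
    size: int
    modified: str
    path: str


def parse_ls_line(line: str) -> LsEntry:
    """Split one line of `hadoop fs -ls` output into its fields."""
    # maxsplit=7 yields 8 tokens and keeps the path intact.
    mode, repl, owner, group, size, date, time, path = line.split(None, 7)
    return LsEntry(
        is_dir=mode[0] == "d",          # first character: "d" or "-"
        permissions=mode[1:10],         # next nine characters
        replication=None if repl == "-" else int(repl),
        owner=owner,
        group=group,
        size=int(size),                 # bytes
        modified=f"{date} {time}",
        path=path,
    )
```

For example, `parse_ls_line("drwxr-xr-x   - root supergroup 0 2013-05-01 10:10 /d1")` yields a directory entry with `replication=None`.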
Note:
If no path follows the command option, the /user/<current user> directory is accessed. For example, if you are logged in as root, the HDFS directory /user/root is accessed; if /user/root does not exist, an error reports that the file does not exist.
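The default-directory rule can be sketched as a path resolver. It assumes, as in standard HDFS, that relative paths are also resolved against the /user/<user> home directory; `resolve_hdfs_path` is a hypothetical helper, not part of the Hadoop API.

```python
import posixpath


def resolve_hdfs_path(path: str, user: str) -> str:
    """Return the absolute HDFS path the shell would operate on."""
    home = f"/user/{user}"
    if not path:
        # No path given: the command works on the user's home directory.
        return home
    if path.startswith("/"):
        # Absolute path: used as-is (normalized).
        return posixpath.normpath(path)
    # Relative path: resolved against the home directory.
    return posixpath.normpath(posixpath.join(home, path))
```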
A) hadoop fs -mkdir /d1 # create an empty directory /d1
B) hadoop fs -put abc /d1 # upload the local file abc into the HDFS directory /d1
If you run hadoop fs -put abc /d1 again, an error is reported:
existing files are not overwritten by default.
C) hadoop fs -put abc /d2
Because the /d2 directory does not exist, the file is uploaded under the name d2, and the listing shows:
-rw-r--r-- 1 root supergroup 37667 /d2
So d2 is a file.
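The three -put cases above (into an existing directory, onto an existing file, onto a nonexistent path) can be modeled with a toy in-memory namespace. This is a hypothetical sketch: the dict-based namespace and `put_target` are illustrative only and ignore parent directories and permissions.

```python
def put_target(namespace: dict, local_name: str, dest: str) -> str:
    """Return the HDFS path `hadoop fs -put local_name dest` would create.

    `namespace` maps existing HDFS paths to "dir" or "file" (toy model).
    """
    if namespace.get(dest) == "dir":
        # Existing directory: the file keeps its name and goes inside it.
        target = dest.rstrip("/") + "/" + local_name
    else:
        # Nonexistent destination: the uploaded file takes that path as its name.
        target = dest
    if target in namespace:
        # Existing files are not overwritten by default.
        raise FileExistsError(target)
    namespace[target] = "file"
    return target
```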
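D) hadoop fs -get <HDFS path> <local path>
Downloads a file from HDFS to the local file system.
E) hadoop fs -put <local Linux file> <HDFS path>
hadoop fs -put install.log /d1/newnamefrominstall.log
hadoop fs -ls /d1
The output is:
Found 1 items
-rw-r--r-- 1 root supergroup 37667 /d1/newnamefrominstall.log # the file was uploaded under the new name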
F) -du: show the size of each entry in a directory
Displays the size of each file under the specified path, in bytes.
G) -dus: show the total size of a directory
This command shows the total size of the files under the specified path, in bytes.
H) -count: count directories and files
Displays the number of directories, the number of files, and the total file size under the specified path.
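The difference between -du, -dus, and -count can be sketched over a toy namespace. Assumptions: `files` maps file paths to sizes and `dirs` is a set of directory paths; -du is modeled as reporting each immediate child (subdirectories aggregated, empty ones omitted), and the -count directory total includes the path itself. All function names are hypothetical.

```python
def _under(path: str, root: str) -> bool:
    """True if path is root itself or lies somewhere below it."""
    return path == root or path.startswith(root.rstrip("/") + "/")


def du(files: dict, root: str) -> dict:
    """Size of each immediate child of root, like -du."""
    base = root.rstrip("/") + "/"
    sizes = {}
    for path, size in files.items():
        if path.startswith(base):
            child = base + path[len(base):].split("/", 1)[0]
            sizes[child] = sizes.get(child, 0) + size
    return sizes


def dus(files: dict, root: str) -> int:
    """Total bytes of all files under root, like -dus."""
    return sum(size for path, size in files.items() if _under(path, root))


def count(dirs: set, files: dict, root: str) -> tuple:
    """(directory count, file count, total bytes) under root, like -count."""
    n_dirs = sum(1 for d in dirs if _under(d, root))
    sizes = [size for path, size in files.items() if _under(path, root)]
    return n_dirs, len(sizes), sum(sizes)
```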
I) -mv: move
This command option moves an HDFS file into the specified HDFS directory. It takes two paths: the first is the source file and the second is the target directory.
J) -cp: copy
Copies the specified HDFS file into the specified HDFS directory. It takes two paths: the first is the file to copy and the second is the destination.
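The only difference between -mv and -cp in the case described (target is an existing directory) is whether the source survives. A hypothetical toy model, covering single files only:

```python
import posixpath


def transfer(namespace: dict, src: str, dest_dir: str, move: bool) -> str:
    """Copy (-cp) or move (-mv) file src into existing directory dest_dir.

    `namespace` maps HDFS paths to "dir" or "file" (toy model).
    """
    if namespace.get(dest_dir) != "dir":
        raise NotADirectoryError(dest_dir)
    # The file keeps its base name inside the target directory.
    target = posixpath.join(dest_dir, posixpath.basename(src))
    if target in namespace:
        raise FileExistsError(target)
    namespace[target] = namespace[src]
    if move:
        # -mv removes the source; -cp leaves it in place.
        del namespace[src]
    return target
```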
K) -rm: delete a file or empty directory
Deletes the specified file or empty directory; a non-empty directory cannot be deleted this way.
L) -rmr: recursive delete
Recursively deletes the specified directory along with all of its subdirectories and files.
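The -rm / -rmr distinction, following the description above, can be sketched with a toy namespace. `remove` is a hypothetical helper; `recursive=True` plays the role of -rmr.

```python
def remove(namespace: dict, path: str, recursive: bool = False) -> None:
    """Delete path like -rm (recursive=False) or -rmr (recursive=True).

    `namespace` maps HDFS paths to "dir" or "file" (toy model).
    """
    if path not in namespace:
        raise FileNotFoundError(path)
    prefix = path.rstrip("/") + "/"
    children = [p for p in namespace if p.startswith(prefix)]
    if children and not recursive:
        # Plain -rm refuses a directory that still has contents.
        raise OSError(f"{path} is not empty; use -rmr")
    # -rmr removes everything below path, then path itself.
    for p in children + [path]:
        del namespace[p]
```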
M) -copyFromLocal: copy from local
Works the same way as -put.
N) -moveFromLocal: move from local
This command moves a file from Linux to HDFS; the local copy is removed.
O) -getmerge: merge to local
Merges all files in the specified HDFS directory into a single file on the local Linux file system.
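What -getmerge produces can be illustrated with a purely local stand-in: every file in a directory concatenated into one output file. The name-order merge is an assumption of this sketch, and `getmerge` here is a hypothetical function operating on the local disk, not on HDFS.

```python
import os


def getmerge(src_dir: str, dest_file: str) -> None:
    """Concatenate every regular file in src_dir into dest_file (name order)."""
    names = sorted(
        n for n in os.listdir(src_dir)
        if os.path.isfile(os.path.join(src_dir, n))
    )
    with open(dest_file, "wb") as out:
        for name in names:
            with open(os.path.join(src_dir, name), "rb") as f:
                out.write(f.read())
```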
A) -cat / -text: view file contents
B) -setrep: set the replication factor
Changes the number of replicas of a stored file: the option is followed by the replica count and then the file path.
If the replica count is raised (say, by two), HDFS automatically copies the file to create the additional replicas.
If the path is a directory, add the -R option to change the replication factor of every file in it:
hadoop fs -setrep -R 4 /d1
Another option, -w, makes the command return only after replication has finished:
hadoop fs -setrep -R -w 1 /d1
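Because every replica is a full copy, raw cluster storage for a file is its size times its replica count. The sketch below applies -setrep (with -R for a directory) to a toy namespace and reports the resulting raw bytes. `files` mapping path to (size, replicas) and the `setrep` function are hypothetical.

```python
def setrep(files: dict, path: str, n: int, recursive: bool = False) -> int:
    """Set the replica count to n and return total raw bytes stored.

    `files` maps file path -> (size_in_bytes, replicas) (toy model).
    """
    if path in files:
        targets = [path]  # a single file
    elif recursive:
        # Like -R: every file below the directory.
        prefix = path.rstrip("/") + "/"
        targets = [p for p in files if p.startswith(prefix)]
    else:
        raise ValueError(f"{path} is not a file; use -R for a directory")
    for p in targets:
        size, _ = files[p]
        files[p] = (size, n)
    # Raw storage = size x replicas, summed over all files.
    return sum(size * reps for size, reps in files.values())
```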
C) -touchz: create an empty file
D) -help: help
Displays help information; follow it with the command option you want to look up, for example:
hadoop fs -help rm
Note: The information shown by this option is not entirely accurate (the description of count, for example, is inaccurate), but the usage of every command option is listed.
1. Run hadoop with no arguments to see which commands Hadoop supports.
2. Run hadoop fs with no further arguments to see the commands supported by the HDFS shell.
Writing complete HDFS shell commands
hadoop fs -ls hdfs://hadoop:9000/
is the same as hadoop fs -ls /
Principle:
When installing Hadoop, we edited the core-site.xml file, which contains:
<name>fs.default.name</name>
<value>hdfs://hadoop:9000</value>
The value of fs.default.name is the default HDFS URI, hdfs://hadoop:9000, so a path without a scheme, such as /, is resolved against it.
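The resolution rule can be sketched by reading fs.default.name out of a core-site.xml document and prefixing it to a scheme-less absolute path. The `<configuration>`/`<property>` wrapper is the standard core-site.xml layout; note that fs.default.name is the older property name, and newer Hadoop releases use fs.defaultFS. `default_fs` and `qualify` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

CORE_SITE = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop:9000</value>
  </property>
</configuration>"""


def default_fs(xml_text: str) -> str:
    """Return the fs.default.name value from core-site.xml text."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == "fs.default.name":
            return prop.findtext("value").strip()
    raise KeyError("fs.default.name")


def qualify(path: str, fs_uri: str) -> str:
    """Prefix a scheme-less absolute path with the default file system URI."""
    if "://" in path:
        return path  # already a full URI
    return fs_uri.rstrip("/") + path
```

With this configuration, `qualify("/", default_fs(CORE_SITE))` gives `hdfs://hadoop:9000/`, which is why the two -ls commands above list the same directory.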