Multiple interfaces are available to access HDFS. The command line interface is the simplest and the most familiar method for programmers.
In this example, HDFS runs in pseudo-distributed mode to simulate a distributed file system. For more information about how to set up pseudo-distributed mode, see the configuration instructions. In core-site.xml, configure:
<property> <name>fs.default.name</name> <value>hdfs://localhost/</value></property>
This means that the default file system of Hadoop is HDFS. At the end of this section, we will see that Hadoop can also be configured to use many other file systems.
<property> <name>hadoop.tmp.dir</name> <value>/home/norris/hadoop_tmp</value> <description>A base for other temporary directories.</description></property>
For details about this setting, refer to the configuration instructions. In hdfs-site.xml, configure:
<property> <name>dfs.replication</name> <value>1</value></property>
This sets how many replicas of each block are kept. Because this is pseudo-distributed mode, only one replica can be stored. The default value is 3, so if this parameter is not set, the system will keep reporting that replication has failed.
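To make the shape of these property entries concrete, here is a minimal Python sketch that reads name/value pairs out of a configuration snippet. It assumes the property is wrapped in the `<configuration>` root element that Hadoop configuration files use. The helper name `read_properties` is made up for this illustration and is not part of Hadoop.

```python
import xml.etree.ElementTree as ET

# Sample content for illustration: the dfs.replication property from this
# article, wrapped in the <configuration> root element.
HDFS_SITE = """
<configuration>
  <property> <name>dfs.replication</name> <value>1</value></property>
</configuration>
"""


def read_properties(xml_text):
    """Return a dict mapping each property name to its value (as strings)."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.findall("property")
    }


print(read_properties(HDFS_SITE))
```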
After the configuration, start the HDFS daemon processes:
% start-dfs.sh
Then use jps to check that they are running:
% jps
The output should show the namenode, secondary namenode, and datanode processes, among others.
Basic file system operations can be performed on HDFS, as on any file system: reading files, writing files, creating directories, deleting files, and listing the files in a directory. The general syntax is:
% hadoop fs -ls /
Every command starts with hadoop fs, followed by a hyphen and the operation to perform, followed by that operation's arguments. Here, -ls is equivalent to ls -l in Linux, and / is the root directory, so the command lists all files under the root directory. To see the other available operations, run:
% hadoop fs -help
To see the usage of one specific operation, for example mkdir:
% hadoop fs -help mkdir
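The command pattern described here (hadoop fs, then a hyphenated operation, then its arguments) can be sketched in Python as a small helper that only builds the argument list. The name `fs_command` is hypothetical and chosen for this illustration. The sketch does not run Hadoop itself.

```python
def fs_command(operation, *args):
    """Build the argument list for `hadoop fs -<operation> <args...>`."""
    return ["hadoop", "fs", "-" + operation, *args]


# Print the commands as they would be typed in a shell.
print(" ".join(fs_command("ls", "/")))
print(" ".join(fs_command("help", "mkdir")))
```

On a machine where hadoop is on the PATH, the returned list could be passed to subprocess.run to execute the command.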
The following command creates the directory /user/norris/:
% hadoop fs -mkdir /user/norris/
Because my current Linux user is norris, my default current directory in HDFS is /user/norris/ (file permission issues are discussed later). That directory does not exist at first, though. If you do not create it, running
% hadoop fs -ls
to list the files in the current directory returns an error, because the current directory cannot be found. Once the directory has been created, the files in it can be listed.
Run the following command to put a file from the local file system into HDFS:
% hadoop fs -copyFromLocal /home/norris/data/hadoop/weatherdata.txt /user/norris/weatherdata.txt
This copies the local file /home/norris/data/hadoop/weatherdata.txt to /user/norris/weatherdata.txt in HDFS. Note:
1. The second argument, /user/norris/weatherdata.txt, is an absolute path. Because /user/norris/ is the current directory, you can also use a relative path and write: % hadoop fs -copyFromLocal /xxx weatherdata.txt
2. /user/norris/weatherdata.txt is a short form. Because the default file system is hdfs://localhost/, the full path is hdfs://localhost/user/norris/weatherdata.txt
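The relationship between relative paths, absolute paths, and full URIs can be sketched in a few lines of Python. This is only an illustration of the rules described in the two notes, not how Hadoop implements path resolution. The helper `qualify` is hypothetical, and the default file system and user name are assumed to match the settings used in this article.

```python
import posixpath

# Assumptions for illustration: these mirror this article's setup
# (fs.default.name = hdfs://localhost/, Linux user norris).
DEFAULT_FS = "hdfs://localhost/"
USER = "norris"


def qualify(path, default_fs=DEFAULT_FS, user=USER):
    """Return the full URI for a path (hypothetical helper, not a Hadoop API)."""
    if "://" in path:
        # Already a full URI with its own scheme: leave it untouched.
        return path
    if not path.startswith("/"):
        # Relative paths are resolved against the user's home directory.
        path = posixpath.join("/user", user, path)
    # Drop one trailing slash from the default file system before appending.
    base = default_fs[:-1] if default_fs.endswith("/") else default_fs
    return base + posixpath.normpath(path)


print(qualify("weatherdata.txt"))
print(qualify("/user/norris/weatherdata.txt"))
```

Both calls print the same full URI, which is the point of the two notes: the relative form and the absolute form name the same file on the default file system.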
Use the copyToLocal command to copy the file back to the local file system:
% hadoop fs -copyToLocal /user/norris/weatherdata.txt /home/norris/data/weatherdata1.txt
Comparing the original file with the new copy shows that the two are exactly the same.
To list all directories and files in the current directory:
% hadoop fs -ls
Found 3 items
drwxr-xr-x   - norris supergroup          0 afolder
-rw-r--r--   1 norris supergroup        545 weatherdata.txt
-rw-r--r--   1 norris supergroup        545 weatherdata1.txt
The first column is the type and permissions. As in Linux, a leading d represents a directory and a leading - represents a file. The next nine characters give the permissions for the owner, the group, and others respectively, where r, w, and x stand for read, write, and execute. Read and write permissions work as they do in Linux. The difference is x: files in HDFS cannot be executed, so the execute permission is meaningful only for directories, where it is required to access the directory's children.
The second column is the number of replicas kept for the blocks of the file. We are running in pseudo-distributed mode and set dfs.replication to 1 in hdfs-site.xml earlier, so the value here is 1. A directory is metadata only and is not stored on the datanodes, so the concept of replicas does not apply to it and a - is shown.
The third column is the user who owns the file. The fourth column is the user group, here supergroup. User groups are discussed in detail later. The fifth column is the file size in bytes. A directory has no size. Columns 6 and 7 are the last modification date and time (not shown in the listing above). Column 8 is the file or directory name.
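As a rough illustration of the column layout just described, the following Python sketch splits one listing line into its fields. The function name `parse_ls_line` is made up for this example. It assumes whitespace-separated columns and a name without spaces, and it takes the name from the last field so that it works whether or not the date and time columns are present.

```python
def parse_ls_line(line):
    """Split one line of `hadoop fs -ls` output into named fields."""
    fields = line.split()
    mode, replication, owner, group, size = fields[:5]
    return {
        "is_dir": mode[0] == "d",        # leading d = directory, - = file
        "owner_perm": mode[1:4],         # rwx for the owner
        "group_perm": mode[4:7],         # rwx for the group
        "other_perm": mode[7:10],        # rwx for everyone else
        # Directories show "-" because replicas do not apply to them.
        "replication": None if replication == "-" else int(replication),
        "owner": owner,
        "group": group,
        "size": int(size),               # size in bytes
        "name": fields[-1],              # assumes the name has no spaces
    }


entry = parse_ls_line("-rw-r--r--   1 norris supergroup        545 weatherdata.txt")
print(entry["name"], entry["size"], entry["replication"])
```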
HDFS permissions are similar to those in Linux. Permissions are discussed in more detail in the "Security" section of the "Set cluster" chapter.
Finally, Hadoop has a file system abstraction, and HDFS is only one of its implementations. In other words, Hadoop is not limited to HDFS. Take hdfs://localhost/user/norris/weatherdata.txt as an example: the part before :// is the URI scheme, here hdfs. Each file system has its own URI scheme. For example, when Hadoop uses the local file system, the URI scheme is file, and a path is written as file:///home/norris/data/hadoop/weatherdata.txt. To list the files in the root directory of the local file system:
% hadoop fs -ls file:///
Hadoop provides implementations for other file systems as well.
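As a small illustration of URI schemes, this Python sketch uses the standard library's urllib.parse to split the two URIs from this article into scheme, host, and path. The helper name `split_uri` is hypothetical. The sketch only shows how the parts of a URI are laid out, and says nothing about how Hadoop itself selects a file system implementation.

```python
from urllib.parse import urlsplit


def split_uri(uri):
    """Return (scheme, host, path) for a file system URI."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path


for uri in (
    "hdfs://localhost/user/norris/weatherdata.txt",
    "file:///home/norris/data/hadoop/weatherdata.txt",
):
    scheme, host, path = split_uri(uri)
    print(scheme, repr(host), path)
```

The file URI has an empty host, which is why it is written with three slashes.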
Although you can access any of these file systems when running a MapReduce program, we strongly recommend using a distributed file system, especially when the data is large, because data locality optimization can then be used.
Hadoop HDFS (2) HDFS command line interface