HDFS Command-Line Interface in Detail

Source: Internet
Author: User
Tags: hadoop fs

Now we'll interact with HDFS through the command line. HDFS has many other interfaces as well, but the command line is the simplest and the most familiar to many developers.

When we set up the pseudo-distributed configuration, there are two properties that deserve further explanation. The first is fs.default.name, set to hdfs://localhost/, which sets the default filesystem for Hadoop. Filesystems are specified by a URI, and here we have used an HDFS URI to make HDFS the default filesystem for Hadoop. The HDFS daemons use this property to determine the host and port of the HDFS namenode; here it will run on localhost, on the default port of 8020. HDFS clients also use this property to work out where the namenode is running, so that they can connect to it.
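For reference, this property is set in core-site.xml. A minimal sketch of a pseudo-distributed core-site.xml (only the property name and value come from the text above; the surrounding file layout may differ slightly between Hadoop versions) might look like this:

<?xml version="1.0"?>
<!-- core-site.xml (sketch): make HDFS on localhost the default filesystem -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost/</value>
  </property>
</configuration>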

The second property, dfs.replication, is set to 1 so that HDFS does not replicate filesystem blocks by the default factor of 3. When running on a single datanode, HDFS cannot replicate blocks to 3 datanodes, so it would warn continually about under-replicated blocks. This setting avoids that problem.
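Likewise, dfs.replication goes in hdfs-site.xml. A minimal sketch (again, only the property itself comes from the text above; the rest of the file is standard boilerplate) might be:

<?xml version="1.0"?>
<!-- hdfs-site.xml (sketch): keep a single copy of each block on this one-node setup -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>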

Basic Filesystem Operations

Now that the filesystem is ready, we can perform all the usual filesystem operations, such as reading files, creating directories, moving files, deleting data, listing directories, and so on. Type the hadoop fs -help command to get detailed help on every command.

First, copy a file from the local filesystem to HDFS:

% hadoop fs -copyFromLocal input/docs/quangle.txt hdfs://localhost/user/tom/quangle.txt

This command invokes Hadoop's filesystem shell command, fs, which supports a number of subcommands; here we are running -copyFromLocal. The local file quangle.txt is copied to the file /user/tom/quangle.txt on the HDFS instance running on localhost. In fact, we can omit the scheme and host of the URI and pick up the default, hdfs://localhost, as specified in core-site.xml:

% hadoop fs -copyFromLocal input/docs/quangle.txt /user/tom/quangle.txt

We can also use a relative path and copy the file to our HDFS home directory, which here is /user/tom:

% hadoop fs -copyFromLocal input/docs/quangle.txt quangle.txt

Let's copy the file back to the local filesystem and check whether it is the same:

% hadoop fs -copyToLocal quangle.txt quangle.copy.txt
% md5 input/docs/quangle.txt quangle.copy.txt
MD5 (input/docs/quangle.txt) = a16f231da6b05e2ba7a339320e7dacd9
MD5 (quangle.copy.txt) = a16f231da6b05e2ba7a339320e7dacd9

The MD5 digests are the same, showing that the file survived its round trip to HDFS and back intact.

Finally, let's look at an HDFS file listing. We'll create a directory first to see how it appears in the listing:

% hadoop fs -mkdir books
% hadoop fs -ls .
Found 2 items
drwxr-xr-x   - tom supergroup          0 2009-04-02 22:41 /user/tom/books
-rw-r--r--   1 tom supergroup        118 2009-04-02 22:29 /user/tom/quangle.txt

The information returned is very similar to the output of the Unix command ls -l, with a few minor differences. The first column shows the file mode. The second column is the replication factor of the file (something a traditional Unix filesystem does not have); since we set the default replication factor to 1 in the site-wide configuration, it is shown as 1 here too. This column is empty for directories because the concept of replication does not apply to them: directories are treated as metadata and stored by the namenode, not the datanodes. The third and fourth columns show the owning user and group of the file. The fifth column is the size of the file in bytes, or zero for directories. The sixth and seventh columns are the date and time the file was last modified. Finally, the eighth column is the absolute path of the file or directory.

File Permissions in HDFS

HDFS has a permissions model for files and directories that is very similar to the POSIX model.

There are three types of permission: read (r), write (w), and execute (x). The read permission is required to read a file or list the contents of a directory. The write permission is required to write to a file, or to create or delete files or directories within a directory. The execute permission is ignored for files, since you cannot execute a file on HDFS (unlike POSIX), but it is required on a directory in order to access its children.
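As a quick illustration, permissions can be inspected and changed with the familiar-looking -chmod and -ls subcommands (a sketch using the quangle.txt file from earlier; the owner and group in your listing will reflect your own account):

% hadoop fs -chmod 640 quangle.txt
% hadoop fs -ls quangle.txt

The first column of the -ls output now shows the new mode, -rw-r-----.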

Each file and directory has an owner, a group, and a mode. The mode is made up of the permissions for the owning user, the permissions for members of the group, and the permissions for all other users.

A client's identity is determined by the username and groups of the process it is running in. Because clients are remote, anyone could gain access simply by creating an account of the right name on a remote machine. Permissions should therefore be used only as a mechanism for sharing filesystem resources among cooperating users and for avoiding accidental data loss, not for securing resources in a hostile environment. Despite these limitations, it is worthwhile to keep permissions enabled (which is the default; see the dfs.permissions property) to prevent users or automated tools and programs from accidentally modifying or deleting important parts of the filesystem.
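For completeness, this switch is a single property in hdfs-site.xml; a sketch showing it at its default value (note that later Hadoop releases rename the property to dfs.permissions.enabled):

<property>
  <name>dfs.permissions</name>
  <value>true</value>
</property>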

When permission checking is enabled, the owner permissions are checked if the client's username matches the owner of the file; otherwise, the group permissions are checked if the client is a member of the file's group; otherwise, the permissions for other users are checked.

Finally, there is the concept of a superuser, which is the identity of the namenode process. The system does not perform any permission checks for the superuser.
