Hadoop: HDFS Permissions User Guide

Source: Internet
Author: User
Overview

The Hadoop Distributed File System (HDFS) implements a permissions model for files and directories that shares much of the POSIX model. Each file and directory is associated with one owner and one group. A file or directory has separate permissions for its owner, for other users who are members of its group, and for all other users. For a file, the r permission is required to read the file, and the w permission is required to write or append to it. For a directory, the r permission is required to list its contents, the w permission is required to create or delete files or subdirectories in it, and the x permission is required to access a child of the directory. Unlike the POSIX model, files have no sticky, setuid, or setgid bits because HDFS has no notion of executable files. For simplicity, directories have no sticky, setuid, or setgid bits either. Collectively, the permissions of a file or directory are its mode. HDFS uses the Unix conventions for representing and displaying modes, including octal numbers. When a file or directory is created, its owner is the user identity of the client process, and its group is the group of the parent directory (the BSD rule).
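
As a small illustration of the octal convention mentioned above, the following Java sketch renders a mode as the familiar rwx string. The class and method names are hypothetical and are not part of Hadoop; the sketch only covers the nine owner, group, and other bits that the HDFS model uses.

```java
// Illustrative sketch only: ModeFormat is a hypothetical helper, not a Hadoop class.
final class ModeFormat {
    /** Renders the low nine permission bits as an "rwxrwxrwx"-style string. */
    static String toSymbolic(int mode) {
        String flags = "rwx";
        StringBuilder sb = new StringBuilder(9);
        for (int bit = 8; bit >= 0; bit--) {
            boolean set = ((mode >> bit) & 1) == 1;
            // Bits 8..0 map to r, w, x for the owner, then the group, then others.
            sb.append(set ? flags.charAt((8 - bit) % 3) : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A leading 0 makes a Java integer literal octal.
        System.out.println(toSymbolic(0644)); // rw-r--r--
        System.out.println(toSymbolic(0750)); // rwxr-x---
    }
}
```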

Each user process that accesses HDFS has a two-part identity: a user name and a list of group names. Whenever a user process accesses a file or directory foo, HDFS checks permissions as follows:

If the user name matches the owner of foo, the owner permissions are tested;

otherwise, if the group of foo appears in the user's list of group names, the group permissions are tested;

otherwise, the permissions for other users are tested.

If the permission check fails, the client operation fails.
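
The check above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the NameNode's actual code: the class name, method name, and bit constants are hypothetical. Note that exactly one permission class is tested, so an owner whose owner bits deny access is refused even if the group bits would allow it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only: PermissionCheck is a hypothetical class, not Hadoop code.
final class PermissionCheck {
    static final int READ = 4, WRITE = 2, EXECUTE = 1;

    /** Returns true if the requested bits are granted by the one applicable class. */
    static boolean isAllowed(String user, Set<String> groups,
                             String owner, String group, int mode, int requested) {
        int granted;
        if (user.equals(owner)) {
            granted = (mode >> 6) & 7;      // owner bits
        } else if (groups.contains(group)) {
            granted = (mode >> 3) & 7;      // group bits
        } else {
            granted = mode & 7;             // bits for all other users
        }
        return (granted & requested) == requested;
    }

    public static void main(String[] args) {
        Set<String> staff = new HashSet<>(Arrays.asList("staff"));
        // foo is owned by alice:staff with mode 0640.
        System.out.println(isAllowed("alice", staff, "alice", "staff", 0640, WRITE)); // true
        System.out.println(isAllowed("bob", staff, "alice", "staff", 0640, READ));    // true
        System.out.println(isAllowed("bob", staff, "alice", "staff", 0640, WRITE));   // false
    }
}
```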

User Identity

In this release of Hadoop, the identity of a client process is whatever the host operating system reports. On Unix-like systems,

the user name is the output of `whoami`; the group list is the output of `bash -c groups`.

In the future there will be other ways of establishing user identity (such as Kerberos or LDAP). It is unrealistic to expect the mechanism described above to prevent one user from impersonating another. Even so, this identity mechanism combined with the permissions model allows a cooperative community to share file system resources in an organized fashion.

In any case, the user identity mechanism is external to HDFS itself. HDFS provides no way to create user identities, establish groups, or process user credentials.

Understanding the Implementation

Each file or directory operation passes the full path name to the NameNode, and the permission checks are applied along the path for each operation. The client framework implicitly associates the user identity with the connection to the NameNode, which reduces the need to change the existing client API. It has always been the case that an operation on a file can succeed once and then fail when repeated, because the file or some directory on the path no longer exists. For example, when a client first begins reading a file, it sends a request to the NameNode to discover the location of the first blocks of the file. A later request to find additional blocks may fail. On the other hand, deleting a file does not revoke access for a client that already knows the file's blocks. With the addition of permissions, a client's access to a file may be withdrawn between requests. Again, changing permissions does not revoke the access of a client that already knows the file's blocks.

The MapReduce framework delegates the user identity by passing strings, with no special concern for confidentiality. The owner and group of a file or directory are stored as strings; there is no conversion from numeric user and group IDs as is conventional in Unix.

The permissions features of this release require no change in the behavior of DataNodes. Blocks on the DataNodes have no Hadoop ownership or permissions attributes associated with them.

File System API Changes

If a permission check fails, every method that takes a path parameter may throw an AccessControlException.

New methods:

public FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException;

public boolean mkdirs(Path f, FsPermission permission) throws IOException;

public void setPermission(Path p, FsPermission permission) throws IOException;

public void setOwner(Path p, String username, String groupname) throws IOException;

public FileStatus getFileStatus(Path f) throws IOException; this method now also returns the owner, group, and mode attributes associated with the path.

The mode of a new file or directory is restricted by the umask configuration parameter. In the expressions below, ~umask denotes the bitwise complement of umask, and all numbers are octal. When the existing create(path, ...) method (without the permission parameter) is used, the mode of the new file is 666 & ~umask. When the new create(path, permission, ...) method (with the permission parameter P) is used, the mode of the new file is P & ~umask & 666. When a new directory is created with the existing mkdirs(path) method (without the permission parameter), the mode of the new directory is 777 & ~umask. When the new mkdirs(path, permission) method (with the permission parameter P) is used, the mode of the new directory is P & ~umask & 777.
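
The umask arithmetic above can be checked with a few lines of Java, where the bitwise complement is written ~umask. This is only a worked example of the formulas in the text; the class and method names are hypothetical and do not correspond to Hadoop APIs.

```java
// Illustrative sketch only: UmaskSketch is a hypothetical class, not Hadoop code.
final class UmaskSketch {
    /** Mode of a new file: P & ~umask & 666 (octal). */
    static int newFileMode(int requested, int umask) {
        return requested & ~umask & 0666;
    }

    /** Mode of a new directory: P & ~umask & 777 (octal). */
    static int newDirMode(int requested, int umask) {
        return requested & ~umask & 0777;
    }

    public static void main(String[] args) {
        int umask = 022;
        // Calls without a permission parameter behave as if P were 666 (files) or 777 (directories).
        System.out.println(Integer.toOctalString(newFileMode(0666, umask))); // 644
        System.out.println(Integer.toOctalString(newDirMode(0777, umask)));  // 755
        // Requesting 777 for a file still yields no x bits because of the final & 666.
        System.out.println(Integer.toOctalString(newFileMode(0777, umask))); // 644
    }
}
```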

Shell Command Changes

New operations:

chmod [-R] mode file ...
Only the owner of a file or the superuser may change the mode of a file.

chgrp [-R] group file ...
The user invoking chgrp must belong to the specified group and be the owner of the file, or be the superuser.

chown [-R] [owner][:[group]] file ...
The owner of a file may only be changed by the superuser.

ls file ...
lsr file ...
The output is reformatted to display the owner, group, and mode.

Super User

The superuser is the user with the same identity as the NameNode process itself. Loosely speaking, if you started the NameNode, you are the superuser. The superuser can do anything because permission checks never fail for the superuser. There is no persistent record of who was the superuser; when the NameNode starts, the identity of the process determines who is the superuser for now. The HDFS superuser does not have to be the superuser of the NameNode host, nor do all clusters need to have the same superuser. Also, an experimenter running HDFS on a personal workstation conveniently becomes the superuser of that installation without any configuration.

In addition, an administrator can designate a distinguished group with a configuration parameter. If it is set, members of this group are also superusers.
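
The rules for chmod, chgrp, and chown given in the shell command section, together with the superuser definition above, can be summarized in a short Java sketch. Everything here is hypothetical naming for illustration (OwnershipRules and its methods are not Hadoop classes), and the sketch assumes the superuser is identified either by the name of the user running the NameNode or by membership in the configured supergroup.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only: OwnershipRules is a hypothetical class, not Hadoop code.
final class OwnershipRules {
    private final String nameNodeUser; // user who started the NameNode
    private final String superGroup;   // configured supergroup name

    OwnershipRules(String nameNodeUser, String superGroup) {
        this.nameNodeUser = nameNodeUser;
        this.superGroup = superGroup;
    }

    boolean isSuperUser(String user, Set<String> groups) {
        return user.equals(nameNodeUser) || groups.contains(superGroup);
    }

    /** chmod: only the file owner or the superuser. */
    boolean mayChmod(String user, Set<String> groups, String fileOwner) {
        return isSuperUser(user, groups) || user.equals(fileOwner);
    }

    /** chgrp: the owner, if a member of the target group, or the superuser. */
    boolean mayChgrp(String user, Set<String> groups, String fileOwner, String targetGroup) {
        return isSuperUser(user, groups)
                || (user.equals(fileOwner) && groups.contains(targetGroup));
    }

    /** chown: only the superuser. */
    boolean mayChown(String user, Set<String> groups) {
        return isSuperUser(user, groups);
    }

    public static void main(String[] args) {
        OwnershipRules rules = new OwnershipRules("hdfs", "supergroup");
        Set<String> aliceGroups = new HashSet<>(Arrays.asList("staff"));
        System.out.println(rules.mayChmod("alice", aliceGroups, "alice"));          // true
        System.out.println(rules.mayChgrp("alice", aliceGroups, "alice", "admin")); // false
        System.out.println(rules.mayChown("alice", aliceGroups));                   // false
    }
}
```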

Web Server

The identity of the web server is a configuration parameter. The NameNode has no notion of the identity of the real user, so the web server behaves as if it had the identity (user name and groups) of a user chosen by the administrator. Unless the chosen identity is the superuser, parts of the namespace may be invisible to the web server.

Online Upgrade

If the cluster starts with a version 0.15 data set (fsimage), all files and directories will have owner o, group g, and mode m, where o and g are the user name and group name of the superuser, and m is a configuration parameter.

Configuration Parameters

dfs.permissions = true
If true, the permissions system described above is enabled. If false, permission checking is turned off, but all other behavior is unchanged. Switching this parameter from one value to the other does not change the mode, owner, or group of files or directories.

Regardless of whether permissions are on or off, chmod, chgrp, and chown always check permissions. These commands are only useful in the context of permission checking, so there is no backward compatibility issue. This also allows an administrator to reliably set file owners and permissions before turning on regular permission checking.

dfs.web.ugi = webuser,webgroup
The user name to be used by the web server. Setting this to the name of the superuser allows any web client to see everything. Setting it to an otherwise unused identity allows web clients to see only what is visible with the "other" permissions. Additional groups may be appended to form a comma-separated list.

dfs.permissions.supergroup = supergroup
The name of the group of superusers.

dfs.upgrade.permission = 777
The initial mode applied during upgrade. The x permission is never set for files. In configuration files, the decimal value 511 may be used.

dfs.umask = 022
The umask used when creating files and directories. In configuration files, the decimal value 18 may be used.
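
The decimal values quoted for dfs.upgrade.permission and dfs.umask are simply the octal values converted to base 10, which the following Java sketch verifies. The class name and the upgradedFileMode helper are hypothetical; the helper assumes that "the x permission is never set for files" means the execute bits are masked off the configured upgrade mode.

```java
// Illustrative sketch only: OctalConfigValues is a hypothetical class, not Hadoop code.
final class OctalConfigValues {
    /** Parses a string of octal digits, such as "777" or "022". */
    static int parseOctal(String digits) {
        return Integer.parseInt(digits, 8);
    }

    /** Assumption: files receive the upgrade mode with all x bits cleared. */
    static int upgradedFileMode(int upgradeMode) {
        return upgradeMode & 0666;
    }

    public static void main(String[] args) {
        System.out.println(parseOctal("777")); // 511
        System.out.println(parseOctal("022")); // 18
        System.out.println(Integer.toOctalString(upgradedFileMode(parseOctal("777")))); // 666
    }
}
```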