Large Data Hadoop platform: Hadoop multi-user management

Source: Internet
Author: User
Keywords: dfs, execute, multi-user

& ">nbsp; There is a recent need to implement multiuser management in the Hadoop cluster, so a lot of information is being searched online. There is a way to feel still more feasible, link: approximate way is: first create a user test1, and then put the Hadoop installation The directory copies a copy to this user test1 directory, then assigns the permission, then this user can submit the program to the cluster. Later, after a number of attempts, found that can indeed achieve multi-user management, and test1 users in the HDFS can only operate on their own directory, but after all, seniority is too shallow, can only vaguely feel this method can be used, unknown reasons. As a person with mild obsessive-compulsive disorder expecting to understand the principle and then use it, it is really not desirable. I'm also curious about how the company uses Hadoop multiuser management, there will certainly be a better solution, but they are not clear, so the existing Linux and Hadoop some knowledge to try to practice a set of solutions, the whole process is their own touch cable, all kinds of try out, It may not be particularly good thinking. If there is a better way, please advise.


I. Requirements

There is an existing Hadoop cluster with the installation directory at /home/hadoop/hadoop-2.5.1. Hadoop's superuser is hadoop and its group is also hadoop. We need to be able to add new users such as test1, test2, test3, ..., and operations for these users should stay relatively simple afterwards. The test users must be able to submit Hadoop jobs and operate on files in a designated location in HDFS; they may see and execute files in the local Hadoop installation directory, but not write or delete them; and they must not be able to start or stop the cluster, only run programs.

II. Create users

Create a new user and add it to the hadoop group:

useradd -g hadoop test1

Or add an existing user to the hadoop group:

usermod -a -G hadoop test1
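Either way, it is worth verifying the result. A minimal check, sketched here against the current user so it runs anywhere (on the cluster you would substitute test1 and hadoop from this article):

```shell
# Check whether a user belongs to a group. `id -nG` lists the user's
# primary and supplementary group names, one token per group.
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

# Demo on the current user and its primary group; on the cluster it would
# be: in_group test1 hadoop
me=$(id -un)
in_group "$me" "$(id -gn)" && echo "$me is in group $(id -gn)"
```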

III. Local directory permissions

The idea is to grant specific permissions to the hadoop group, so that every user in that group can perform all of the operations below; then, whenever a new user joins the hadoop group, it automatically gets the appropriate permissions. To achieve this you can use the ACL permission mechanism, which first requires remounting the filesystem with ACL support. (On calmer reflection: when the situation is simple, ACLs are unnecessary and plain chmod is enough to assign the permissions.)
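As a concrete illustration of the plain-chmod alternative, here is a sketch on a scratch directory (`stat -c` assumes GNU coreutils):

```shell
# 750 = owner rwx, group rx, others nothing -- the same shape as the
# "group may read and execute but not write" goal described above.
d=$(mktemp -d)
chmod 750 "$d"
stat -c '%a %U' "$d"    # octal mode and owner; the mode prints as 750
rmdir "$d"
```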

mount -o remount,acl /

Running mount now will show acl among the mount options for the filesystem, so ACLs can be used to manage permissions.

Set the permissions on the Hadoop installation directory (for example, under /home/hadoop) so that the hadoop group can read and execute but not write:

setfacl -R -m g:hadoop:rx /home/hadoop/hadoop-2.5.1

With only this, when a test user submits a job, Hadoop complains that the local tmp directory has no write permission, so the hadoop group also needs write access on tmp:

setfacl -R -m g:hadoop:rwx /home/hadoop/hadoop-2.5.1/tmp

IV. HDFS permission management

After performing the above steps, the test1 user can already execute Hadoop programs, but cannot shut down the cluster or delete files. Now it is time to consider security in HDFS. Since these are test users, I do not want them to have too many permissions, especially on certain special directories (such as /nlsde), so I do the following:

hdfs dfs -chmod 755 /

However, because clients write data to /tmp when executing programs, /tmp in HDFS must be readable and writable by the whole user group, so run:

hdfs dfs -chmod 777 /tmp
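For comparison, the usual pattern for a world-writable scratch directory on a local filesystem is 1777, i.e. 777 plus the sticky bit, so users cannot delete each other's files; HDFS supports the sticky bit as well (hdfs dfs -chmod 1777 /tmp). A quick local illustration:

```shell
# 1777: everyone may create files, but the sticky bit (the leading 1)
# means only a file's owner may delete or rename it in the directory.
d=$(mktemp -d)
chmod 1777 "$d"
stat -c '%a' "$d"    # prints 1777
rmdir "$d"
```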

Then, for confidential folders such as /nlsde, I execute:

hdfs dfs -chmod 700 /nlsde

This way, no one except the superuser and the directory's owner can access those files.

Next, I need a folder /test that all the users (test1, test2, and so on) can access; each user's working environment will also live under this directory:

hdfs dfs -chmod 777 /test

At this point the basic environment is in place, but each user should also have a private space. test1's own content should not be visible to others, so test1 needs a private folder:

hdfs dfs -mkdir /test/test1
hdfs dfs -chown test1 /test/test1

However, this is not enough: other users can still access this folder. It needs to be restricted so that only its owner can enter it:

hdfs dfs -chmod 700 /test/test1

Looking at the file information now, other users can no longer access it.

V. Adding new users

Now add a new user, test2, writing the steps out so they are easy to script:

sudo useradd -g hadoop test2
sudo passwd test2
hdfs dfs -mkdir /test/test2
hdfs dfs -chown test2 /test/test2
hdfs dfs -chmod 700 /test/test2
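These five steps can be collected into a small helper so that adding a user is one call. This is a sketch following the article's conventions (group hadoop, shared HDFS area /test); DRY_RUN defaults to 1 here so the commands are only printed, which makes it safe to inspect before running for real as root on the cluster:

```shell
# add_hadoop_user: the five new-user steps above as one function.
# With DRY_RUN=1 (the default here) each command is printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

add_hadoop_user() {
    u="$1"
    run sudo useradd -g hadoop "$u"
    run sudo passwd "$u"
    run hdfs dfs -mkdir "/test/$u"
    run hdfs dfs -chown "$u" "/test/$u"
    run hdfs dfs -chmod 700 "/test/$u"
}

# Prints the five commands that would be run for test2.
add_hadoop_user test2
```

Setting DRY_RUN=0 on the cluster executes the commands instead of printing them.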

Do some testing: use test2 to check whether it can access test1's folder.
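A sketch of that check (it needs the live cluster, so the command is only printed here; the error text is approximately what Hadoop 2.x reports):

```shell
# Listing test1's private folder as test2 should fail, since /test/test1
# is mode 700 and owned by test1. Printed only; run it on the cluster.
cmd='su - test2 -c "hdfs dfs -ls /test/test1"'
echo "$cmd"
echo '# expected result: ls: Permission denied: user=test2, ...'
```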

OK. So far, this multi-user management of Hadoop meets my needs, though there are surely many details that will only surface in practice. What I am eager to know now is how Hadoop multi-user management is handled in real operations: is it done the way I did it, or is there a more advanced approach? Through this attempt I found that I still get somewhat obsessive about problems: in solving the multi-user case, using all these permission operations to reach one goal, the line of thinking is hard to keep clear, and writing the whole flow down takes a lot of time. A blog post gave me a lot of inspiration, and looking again at the Hadoop website I found a similar approach, so the basic idea seems right: http://
