hadoop.job.ugi no longer takes effect starting with Cloudera CDH3b3!
After several days of digging, I finally found the cause. Previously the company used stock hadoop-0.20.2, and setting hadoop.job.ugi from Java to a valid Hadoop user and group was enough to access HDFS normally, including creating and deleting files.
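For context, the old approach looked roughly like this. This is a sketch of a no-longer-working configuration fragment: the user/group values are placeholders, and it assumes Hadoop 0.20.x client libraries on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Pre-CDH3b3 style: simply declare the user and group in the client config.
// From CDH3b3 onward this setting is silently ignored.
Configuration conf = new Configuration();
conf.set("hadoop.job.ugi", "hadoop,supergroup"); // "user,group" (placeholder values)
FileSystem fs = FileSystem.get(conf);            // used to act as user "hadoop"
```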
After updating to CDH3b4, it stopped working for no apparent reason. After much searching, I finally found the relevant release note:

The hadoop.job.ugi configuration no longer has any effect. Instead, please use the UserGroupInformation.doAs API to impersonate other users on a non-secured cluster. (As of CDH3b3)
Incompatible changes:
- The TaskTracker configuration parameter mapreduce.tasktracker.local.cache.numberdirectories has been renamed mapreduce.tasktracker.cache.local.numberdirectories. (As of CDH3u0)
- The job-level configuration parameters mapred.max.maps.per.node, mapred.max.reduces.per.node, mapred.running.map.limit, and mapred.running.reduce.limit have been removed. (As of CDH3b4)
- CDH3 no longer contains packages for Debian Lenny, or Ubuntu Hardy, Jaunty, or Karmic. Check the upgrade instructions if you are using an Ubuntu release past its end of life. If you are using a release for which Cloudera's Debian or RPM packages are not available, you can always use the tarballs from the CDH Download Page. (As of CDH3b4)
- The hadoop.job.ugi configuration no longer has any effect. Instead, please use the UserGroupInformation.doAs API to impersonate other users on a non-secured cluster. (As of CDH3b3)
- The UnixUserGroupInformation class has been removed. Please see the new methods in the UserGroupInformation class. (As of CDH3b3)
- The resolution of groups for a user is now performed on the server side. For a user's group membership to take effect, it must be visible on the NameNode and JobTracker machines. (As of CDH3b3)
- The mapred.tasktracker.procfsbasedprocesstree.sleeptime-before-sigkill configuration has been renamed mapred.tasktracker.tasks.sleeptime-before-sigkill. (As of CDH3b3)
- The HDFS and MapReduce daemons no longer run as a single shared hadoop user. Instead, the HDFS daemons run as hdfs and the MapReduce daemons run as mapred. See Changes in User Accounts and Groups in CDH3. (As of CDH3b3)
- Due to a change in the internal compression APIs, CDH3 is incompatible with versions of the hadoop-lzo open source project prior to 0.4.9. (As of CDH3b3)
- CDH3 changes the wire format for Hadoop's RPC mechanism. Thus, you must upgrade any existing client software at the same time the cluster is upgraded. (All versions)
- Zero values for the dfs.socket.timeout and dfs.datanode.socket.write.timeout configuration parameters are now respected. Previously, zero values for these parameters resulted in a 5-second timeout. (As of CDH3u1)
- When Hadoop's Kerberos integration is enabled, it is now required that either kinit be on the path of user accounts running the Hadoop client, or that the hadoop.kerberos.kinit.command configuration option be manually set to the absolute path of kinit. (As of CDH3u1)
Hive
- The upgrade of Hive from CDH2 to CDH3 requires several manual steps. Please be sure to follow the upgrade guide closely. See Upgrading Hive and Hue in CDH3.
Source: https://ccp.cloudera.com/display/cdhdoc/incompatible#changes

So how do you use UserGroupInformation.doAs?
For example, the superuser oozie can impersonate the user joe to access HDFS; the operations are then performed as joe, even though the connection is made by oozie.
......
UserGroupInformation ugi =
    UserGroupInformation.createProxyUser(user, UserGroupInformation.getLoginUser());
ugi.doAs(new PrivilegedExceptionAction<Void>() {
    public Void run() throws Exception {
        // Submit a job as the proxied user
        JobClient jc = new JobClient(conf);
        jc.submitJob(conf);
        // OR access HDFS
        FileSystem fs = FileSystem.get(conf);
        fs.mkdirs(someFilePath);
        return null;
    }
});
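The doAs idiom above is the standard JDK privileged-action pattern: the work is wrapped in a java.security.PrivilegedExceptionAction and executed under a target identity, just as with the JDK's own javax.security.auth.Subject.doAs. A minimal JDK-only sketch of the same shape (no Hadoop on the classpath; the class name and return value are illustrative only):

```java
import java.security.PrivilegedExceptionAction;
import javax.security.auth.Subject;

public class DoAsDemo {
    // Run an action under the given subject's security context and return its result.
    static String runAs(Subject subject) throws Exception {
        return Subject.doAs(subject, new PrivilegedExceptionAction<String>() {
            public String run() {
                // In Hadoop, the JobClient/FileSystem calls would go here.
                return "ran inside doAs";
            }
        });
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAs(new Subject()));
    }
}
```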
Configure the NameNode and JobTracker as follows:
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>group1,group2</value>
  <description>Allow the superuser oozie to impersonate any members of the groups group1 and group2</description>
</property>
<property>
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>host1,host2</value>
  <description>The superuser can connect only from host1 and host2 to impersonate a user</description>
</property>
Without this configuration, impersonation will fail.
Caveats
The superuser must have Kerberos credentials to be able to impersonate another user. It cannot use delegation tokens for this feature. It would be wrong if the superuser added its own delegation token to the proxy user's ugi, as that would allow the proxy user to connect to the service with the privileges of the superuser.
However, if the superuser does want to give a delegation token to joe, it must first impersonate joe and get a delegation token for joe, in the same way as the code example above, and add it to joe's ugi. In this way the delegation token will have joe as its owner.
For more information about secure impersonation using UserGroupInformation.doAs, see http://hadoop.apache.org/common/docs/stable/Secure_Impersonation.html.
As mentioned above, for Java code to access Hadoop and operate normally, Kerberos authentication must be implemented and configured using the UserGroupInformation.doAs method. Otherwise, the application can only operate properly when run as the hadoop user ?!