Integrating Kerberos into a Hadoop Cluster


Last week the team started looking into Kerberos for use in our large cluster, and the research task was assigned to me. This week the work was essentially completed on a test cluster. The research is still fairly rough; most of the material available online targets CDH clusters, and our cluster does not use CDH, so there are some differences in the process of integrating Kerberos.

The test environment is a cluster of 5 machines running Hadoop 2.7.2. The 5 hostnames are
RM1, RM2, Test-nn1, Test-nn2, and 10-140-60-50. RM1 was chosen as the KDC server.

1. Install the Software
Install krb5, krb5-server, and krb5-client on RM1.
Command line execution: yum install krb5-server krb5-libs krb5-auth-dialog krb5-workstation -y
The other 4 machines only need krb5-devel and krb5-workstation.
On each of those 4 machines execute: yum install krb5-devel krb5-workstation -y

2. Configuration file Modification
Three configuration files on the KDC server are involved:
/etc/krb5.conf, /var/kerberos/krb5kdc/kdc.conf, /var/kerberos/krb5kdc/kadm5.acl
The contents of the krb5.conf file are as follows:
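The original post shows the file contents as a screenshot that is not reproduced here; a minimal sketch consistent with the description below (realm EXAMPLE.COM, KDC and admin server on RM1, lifetimes raised to 10,000 days) would look roughly like this:

[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 default_realm = EXAMPLE.COM
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 10000d
 renew_lifetime = 10000d
 forwardable = true

[realms]
 EXAMPLE.COM = {
  kdc = RM1
  admin_server = RM1
 }

[domain_realm]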

ticket_lifetime and renew_lifetime are the more important parameters. Both are time values: the former is how long an acquired credential stays valid, and it defaults to 24 hours. I changed it to 10,000 days, because once the credential expires, commands such as hadoop fs -ls fail. The credentials are stored under /tmp in files named krb5cc_xxx (xxx is the numeric user ID from /etc/passwd). How to modify these lifetimes is described further below.
The default_realm parameter in [libdefaults] can also be changed; in the example it is EXAMPLE.COM, and it can be an arbitrary string, written in uppercase and ending in .COM by convention. The entries in [realms] also need to be modified; as mentioned earlier, I chose RM1 as the KDC server.
After configuration, distribute this file to /etc on all the other nodes.

The contents of the kdc.conf file are as follows:
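Again, the original shows this file as a screenshot; a minimal sketch matching the description below (only max_life and max_renewable_life changed from the CentOS defaults) would be:

[kdcdefaults]
 kdc_ports = 88
 kdc_tcp_ports = 88

[realms]
 EXAMPLE.COM = {
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  max_life = 10000d
  max_renewable_life = 10000d
  supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal
 }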

The values of max_life and max_renewable_life are also set to 10,000 days here; everything else keeps its default value.

The kadm5.acl file can be left unchanged.

The JCE (Java Cryptography Extension unlimited strength policy files) must be installed on every node of the cluster, and its version must match the Java version in use. I am using Java 1.7; the download link is:
http://www.oracle.com/technetwork/java/embedded/embedded-se/downloads/jce-7-download-432124.html
After downloading, extract the archive and copy local_policy.jar and US_export_policy.jar into $JAVA_HOME/jre/lib/security.
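A hedged sketch of that step; the archive and folder names below are taken from the JDK 7 JCE download and may differ slightly in your copy:

unzip UnlimitedJCEPolicyJDK7.zip
cp UnlimitedJCEPolicy/local_policy.jar UnlimitedJCEPolicy/US_export_policy.jar $JAVA_HOME/jre/lib/security/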

3. Create a Database
After the software is installed, you need to initialize the database. On the RM1 command line execute:
kdb5_util create -r EXAMPLE.COM -s
The realm here must match the default_realm value in krb5.conf. When the command finishes, the principal database has been created under /var/kerberos/krb5kdc/.

4. Start the service
Execute on the RM1 command line:
service krb5kdc start
service kadmin start
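To confirm that both daemons came up, the same SysV service tooling used above can be queried:

service krb5kdc status
service kadmin status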

5. Create Principals
After typing kadmin.local on the RM1 command line, enter:

addprinc -randkey hadoop/rm1@EXAMPLE.COM   (EXAMPLE.COM is the realm configured in my krb5.conf)
addprinc -randkey hadoop/rm2@EXAMPLE.COM
addprinc -randkey hadoop/test-nn1@EXAMPLE.COM
addprinc -randkey hadoop/test-nn2@EXAMPLE.COM
addprinc -randkey hadoop/10-140-60-50@EXAMPLE.COM
addprinc -randkey HTTP/rm1@EXAMPLE.COM
addprinc -randkey HTTP/rm2@EXAMPLE.COM
addprinc -randkey HTTP/test-nn1@EXAMPLE.COM
addprinc -randkey HTTP/test-nn2@EXAMPLE.COM
addprinc -randkey HTTP/10-140-60-50@EXAMPLE.COM

Because all of the services in our cluster run as the hadoop user, only hadoop principals need to be created. CDH clusters use three separate users (hdfs, yarn, and mapred), so they require principals for each.
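To double-check the result, the existing principals can be listed from the RM1 command line; listprincs is a standard kadmin query:

kadmin.local -q "listprincs"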

6. Create a keytab file
Execute on the RM1 command line:

kadmin.local -q "xst -k hadoop.keytab hadoop/rm1@EXAMPLE.COM"
kadmin.local -q "xst -k hadoop.keytab hadoop/rm2@EXAMPLE.COM"
kadmin.local -q "xst -k hadoop.keytab hadoop/test-nn1@EXAMPLE.COM"
kadmin.local -q "xst -k hadoop.keytab hadoop/test-nn2@EXAMPLE.COM"
kadmin.local -q "xst -k hadoop.keytab hadoop/10-140-60-50@EXAMPLE.COM"
kadmin.local -q "xst -k http.keytab HTTP/rm1@EXAMPLE.COM"
kadmin.local -q "xst -k http.keytab HTTP/rm2@EXAMPLE.COM"
kadmin.local -q "xst -k http.keytab HTTP/test-nn1@EXAMPLE.COM"
kadmin.local -q "xst -k http.keytab HTTP/test-nn2@EXAMPLE.COM"
kadmin.local -q "xst -k http.keytab HTTP/10-140-60-50@EXAMPLE.COM"

This generates the hadoop.keytab and http.keytab files in the /var/kerberos/krb5kdc directory.
Continue on the RM1 command line and merge them with ktutil:
ktutil
ktutil: rkt hadoop.keytab
ktutil: rkt http.keytab
ktutil: wkt hdfs.keytab
ktutil: quit
This produces the file hdfs.keytab, whose entries can be listed with klist -kt hdfs.keytab (only part of the output is shown in the original post).

7. Deploying keytab Files
Distribute the hdfs.keytab file generated on RM1 to the etc/hadoop directory of the Hadoop installation on every node (/usr/local/hadoop/etc/hadoop in this setup). For security reasons, set the file permissions to 400.
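A minimal sketch of this distribution step, assuming passwordless SSH, that hdfs.keytab was written in /var/kerberos/krb5kdc on RM1, and the Hadoop paths used elsewhere in this article:

cp /var/kerberos/krb5kdc/hdfs.keytab /usr/local/hadoop/etc/hadoop/
for host in rm2 test-nn1 test-nn2 10-140-60-50; do
    scp /var/kerberos/krb5kdc/hdfs.keytab $host:/usr/local/hadoop/etc/hadoop/
done
# on every node: make the keytab readable only by the hadoop user that runs the services
chown hadoop:hadoop /usr/local/hadoop/etc/hadoop/hdfs.keytab
chmod 400 /usr/local/hadoop/etc/hadoop/hdfs.keytab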

8. Stop all services in the cluster

9. Modify the relevant configuration file
A. core-site.xml, add:

<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>

B. hdfs-site.xml, add:

<property>
  <name>dfs.block.access.token.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.https.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.https.policy</name>
  <value>HTTPS_ONLY</value>
</property>
<property>
  <name>dfs.namenode.https-address.pin-cluster1.testnn1</name>
  <value>test-nn1:50470</value>
</property>
<property>
  <name>dfs.namenode.https-address.pin-cluster1.testnn2</name>
  <value>test-nn2:50470</value>
</property>
<property>
  <name>dfs.https.port</name>
  <value>50470</value>
</property>
<property>
  <name>dfs.namenode.keytab.file</name>
  <value>/usr/local/hadoop/etc/hadoop/hdfs.keytab</value>
</property>
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hadoop/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.namenode.kerberos.internal.spnego.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.datanode.data.dir.perm</name>
  <value>700</value>
</property>
<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:1004</value>
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:1006</value>
</property>
<property>
  <name>dfs.datanode.keytab.file</name>
  <value>/usr/local/hadoop/etc/hadoop/hdfs.keytab</value>
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>hadoop/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.journalnode.keytab.file</name>
  <value>/usr/local/hadoop/etc/hadoop/hdfs.keytab</value>
</property>
<property>
  <name>dfs.journalnode.kerberos.principal</name>
  <value>hadoop/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.journalnode.kerberos.internal.spnego.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.web.authentication.kerberos.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.web.authentication.kerberos.keytab</name>
  <value>/usr/local/hadoop/etc/hadoop/hdfs.keytab</value>
</property>

Note: dfs.https.policy must be set to HTTPS_ONLY rather than an HTTP policy, otherwise the services fail to start.

C. yarn-site.xml, add:

<property>
  <name>yarn.resourcemanager.keytab</name>
  <value>/usr/local/hadoop/etc/hadoop/hdfs.keytab</value>
</property>
<property>
  <name>yarn.resourcemanager.principal</name>
  <value>hadoop/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>yarn.nodemanager.keytab</name>
  <value>/usr/local/hadoop/etc/hadoop/hdfs.keytab</value>
</property>
<property>
  <name>yarn.nodemanager.principal</name>
  <value>hadoop/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>hadoop</value>
</property>
<property>
  <name>yarn.https.policy</name>
  <value>HTTPS_ONLY</value>
</property>
 

D. mapred-site.xml, add:

<property>
  <name>mapreduce.jobhistory.keytab</name>
  <value>/usr/local/hadoop/etc/hadoop/hdfs.keytab</value>
</property>
<property>
  <name>mapreduce.jobhistory.principal</name>
  <value>hadoop/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>mapreduce.jobhistory.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>

E. In the ZooKeeper configuration file zoo.cfg, add:
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
jaasLoginRenew=3600000
Create a new file jaas.conf in the same directory:

Server {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    keyTab="/usr/local/hadoop/etc/hadoop/hdfs.keytab"
    storeKey=true
    useTicketCache=true
    principal="hadoop/rm1@EXAMPLE.COM";
};

The principal varies from host to host (on RM2 it would be hadoop/rm2@EXAMPLE.COM, and so on).

Also create a new file java.env:

export JVMFLAGS="-Djava.security.auth.login.config=/usr/local/zookeeper/conf/jaas.conf"

F. hadoop-env.sh, add:

export HADOOP_SECURE_DN_USER=hadoop
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_HOME}/sec_pids
export HADOOP_SECURE_DN_LOG_DIR=/data/hadoop/data12/hadoop-sec-logs
export JSVC_HOME=/usr/local/jsvc

JSVC has to be installed separately; this is described below.

G. container-executor.cfg, add:

allowed.system.users=##comma separated list of system users who CAN run applications
yarn.nodemanager.local-dirs=/data/hadoop/data1/yarn/data,/data/hadoop/data2/yarn/data,/data/hadoop/data3/yarn/data,/data/hadoop/data4/yarn/data,/data/hadoop/data5/yarn/data,/data/hadoop/data6/yarn/data,/data/hadoop/data7/yarn/data,/data/hadoop/data8/yarn/data,/data/hadoop/data9/yarn/data,/data/hadoop/data10/yarn/data,/data/hadoop/data11/yarn/data,/data/hadoop/data12/yarn/data
yarn.nodemanager.linux-container-executor.group=hadoop
yarn.nodemanager.log-dirs=/data/hadoop/data1/yarn/log,/data/hadoop/data2/yarn/log,/data/hadoop/data3/yarn/log,/data/hadoop/data4/yarn/log,/data/hadoop/data5/yarn/log,/data/hadoop/data6/yarn/log,/data/hadoop/data7/yarn/log,/data/hadoop/data8/yarn/log,/data/hadoop/data9/yarn/log,/data/hadoop/data10/yarn/log,/data/hadoop/data11/yarn/log,/data/hadoop/data12/yarn/log
#banned.users=hadoop
min.user.id=1

yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs must match the values configured in yarn-site.xml. min.user.id also deserves attention: it is the smallest user ID (as found in /etc/passwd) allowed to submit tasks. If it is not configured it defaults to 1000, so a user whose ID is below 1000 gets an error when submitting a task. I set it to 1 here. The original default of 1000 is meant to keep system (super) users from running jobs on the cluster.
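As a quick sanity check on a given node, the numeric ID of the submitting user can be printed; the value 500 below is purely a hypothetical example:

id -u hadoop     # if this prints e.g. 500, the default min.user.id=1000 would reject the hadoop user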

10. Compiling the source code
The container-executor binary (under $HADOOP_HOME/bin) requires that container-executor.cfg and all of its parent directories be owned by root, otherwise NodeManager fails to start. The default path compiled into the binary is $HADOOP_HOME/etc/hadoop/container-executor.cfg, and changing the owner of all of those parent directories to root is obviously not practical. Instead, the path /etc/container-executor.cfg is compiled into the binary.

Download the hadoop-2.7.2-src source package, extract it, enter the source directory, and execute:
mvn package -Pdist,native -DskipTests -Dtar -Dcontainer-executor.conf.dir=/etc
The build takes quite a while (the company has a dedicated build server for this).
When it finishes, the newly compiled distribution is under hadoop-2.7.2-src/hadoop-dist/target. The environment required to build the source has to be set up beforehand, which is not covered here.
Replace the original container-executor on every node with the newly built one, copy the container-executor.cfg file from $HADOOP_HOME/etc/hadoop to /etc on every node, and set its ownership to root:root. Then, under the bin directory, execute

strings container-executor | grep etc

If the output shows /etc rather than the relative ../etc path, the rebuild worked. In addition, and most importantly, the container-executor binary under bin must be owned by root:hadoop with permissions 4750; if the permissions are not 4750, NodeManager fails to start with an error that it cannot provide a reasonable container-executor.cfg file.
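Putting the above together, a minimal sketch of the per-node setup, assuming HADOOP_HOME=/usr/local/hadoop as in the configuration files above:

cp /usr/local/hadoop/etc/hadoop/container-executor.cfg /etc/
chown root:root /etc/container-executor.cfg
chown root:hadoop /usr/local/hadoop/bin/container-executor
chmod 4750 /usr/local/hadoop/bin/container-executor
strings /usr/local/hadoop/bin/container-executor | grep etc    # should print the /etc path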

11. Start the Services
A. ZooKeeper starts as normal
B. JournalNode starts as normal
C. NameNode starts as normal
D. ZKFC starts as normal
E. NodeManager starts as normal
All of the above services are started as the hadoop user.
F. The DataNode has to be started as the root user, which requires JSVC to be installed.

Installing JSVC requires downloading commons-daemon-1.0.15-src.tar.gz.
After extracting it, enter commons-daemon-1.0.15-src/src/native/unix.
Then execute, step by step:
sh support/buildconf.sh
./configure     (JAVA_HOME needs to be configured in /etc/profile for this step)
make
After these 3 steps there is a jsvc binary in the current directory; configure its path into hadoop-env.sh as described earlier. You can first check that it works by running ./jsvc -help in the current directory.
Finally, start the DataNode as the root user.
(Later I found that the DataNode does not have to be started as root; adding the following to hdfs-site.xml

<property>
<name>ignore.secure.ports.for.testing</name>
<value>true</value>
</property>
is sufficient.)

12. Check
After all services have been started, the NameNode web page shows "Security is on".
Execute the following as the hadoop user:

kinit -kt /usr/local/hadoop/etc/hadoop/hdfs.keytab hadoop/rm1@EXAMPLE.COM
kinit -kt /usr/local/hadoop/etc/hadoop/hdfs.keytab hadoop/rm2@EXAMPLE.COM
kinit -kt /usr/local/hadoop/etc/hadoop/hdfs.keytab hadoop/test-nn1@EXAMPLE.COM
kinit -kt /usr/local/hadoop/etc/hadoop/hdfs.keytab hadoop/test-nn2@EXAMPLE.COM
kinit -kt /usr/local/hadoop/etc/hadoop/hdfs.keytab hadoop/10-140-60-50@EXAMPLE.COM

These 5 commands are executed on the 5 machines respectively. This generates the krb5cc_xxx file under /tmp, whose role was explained earlier.
After that, running hadoop fs -ls as the hadoop user on any node succeeds; without a valid ticket it fails with an error about not finding a TGT. This is also why ticket_lifetime was raised to 10,000 days earlier: with the default of 24 hours, hadoop fs commands start failing again once the ticket expires.
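A quick way to confirm the credential is in place before running HDFS commands (the exact error wording when it is missing varies, but it complains about failing to find a Kerberos TGT):

klist               # should show a krbtgt/EXAMPLE.COM@EXAMPLE.COM ticket cached in /tmp/krb5cc_<uid>
hadoop fs -ls /     # succeeds only while a valid ticket exists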

Here is how ticket_lifetime and renew_lifetime can be viewed and modified.
The effective values are determined by:
(1) max_life and max_renewable_life in /var/kerberos/krb5kdc/kdc.conf on the Kerberos server;
(2) the limits stored with each principal when it is created, which can be viewed and modified with admin commands (see the sketch after this list);
(3) ticket_lifetime and renew_lifetime in /etc/krb5.conf;
(4) the lifetime passed to kinit via the -l option.
The smallest of these 4 values wins.
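For point (2), the per-principal limits can be viewed and changed with kadmin.local; getprinc shows the current "Maximum ticket life" and "Maximum renewable life", and modprinc changes them. A minimal sketch using one principal from this setup:

kadmin.local
kadmin.local: getprinc hadoop/rm1@EXAMPLE.COM
kadmin.local: modprinc -maxlife "10000 days" hadoop/rm1@EXAMPLE.COM
kadmin.local: modprinc -maxrenewlife "10000 days" hadoop/rm1@EXAMPLE.COM
kadmin.local: quit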
I tried kinit -l followed by a time value; it then prompts for a password, and that password is not the one specified when the principals were added, so I have not figured out how to change the lifetime this way. It is therefore recommended to modify ticket_lifetime in the configuration files before adding principals, so that the longer lifetime actually takes effect.

After the configuration files have been changed, the KDC services need to be restarted:
service krb5kdc restart
service kadmin restart
