Hadoop HDP Cluster Kerberos Authentication Implementation


  • For security reasons, this article hides some system and service names and alters details that could leak information. Some configuration excerpts are copies of a virtual machine's configuration files, so the host names shown are for reference only.
  • This article applies to Kerberos authentication for an HDP cluster deployed through the Ambari platform; for other environments it is a reference only.

The images referenced in this document were not uploaded.
Deployment environment: Hadoop 2.7.3, HDP 2.5.3
This document draws on the official Kerberos, Hadoop, and HDP documentation; the relevant links are as follows:

  1. http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/SecureMode.html
  2. http://web.mit.edu/kerberos/krb5-current/doc/
  3. http://docs.hortonworks.com/index.html
If you find any mistakes, please do not hesitate to point them out. Thank you!




1. Version selection
The current MIT release is krb5-1.14.4, while the system ships with version 1.10. Choose the version that suits your environment; the packages below are the system's 1.10.3 builds.
2. Software installation
System environment:
You need to complete the IP and host name configuration: pair each IP with its host name in the hosts file (a Hadoop cluster will generally have done this already), and add the authentication server's IP and host name to the hosts file of every Hadoop cluster node. The corresponding entries must also be copied to the authentication server, so that hosts can reach each other by host name (ping succeeds) without DNS resolution.
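
For illustration, a minimal /etc/hosts sketch (the addresses and host names here are placeholders, not the real cluster's):

    192.168.0.10   kdc01        # authentication server (KDC)
    192.168.0.21   snamenode    # cluster node
    192.168.0.22   datanode01   # cluster node

The same entries go into /etc/hosts on the authentication server itself.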

The KDC server needs to have the following files installed:
krb5-server-1.10.3-15.1.x86_64.rpm
krb5-devel-1.10.3-15.1.x86_64.rpm
krb5-libs-1.10.3-15.1.x86_64.rpm
krb5-auth-dialog-0.13-3.x86_64.rpm

Client installation:
krb5-workstation-1.10.3-15.1.x86_64.rpm
krb5-libs-1.10.3-15.1.x86_64.rpm
krb5-auth-dialog-0.13-3.x86_64.rpm
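
If a yum repository is available, the same packages can be installed by name instead of from individual RPMs; a sketch (assuming the standard CentOS/RHEL package names):

    # on the KDC server
    yum install -y krb5-server krb5-libs krb5-devel krb5-auth-dialog

    # on every client and application server
    yum install -y krb5-workstation krb5-libs krb5-auth-dialog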

3. System planning
Plan the realm name, the ports, whether to use a slave KDC, and so on.
There are three roles in Kerberos: client, application server, and authentication server. An application server is a host that provides network services to clients, a client is an ordinary host, and the authentication server is the Kerberos KDC.
The KDC is installed on CLUSTER02. The other 93 servers act as application servers and clients according to the services and features they provide; most servers hold both roles at once.
Set the realm name to REALM.COM and use the kerberos default port to avoid conflicts with other services.
The KDC is the core server of the Kerberos service; its security and stability are the basis for the security and stability of the entire service. It is recommended to run it on a dedicated machine that hosts no other services and starts only the Kerberos service.
However, there is no spare host in this environment, so the KDC is tentatively installed on the snamenode server.

4. Environment preparation
Ambari and related components have been installed.
Install JCE and deploy the JCE policy on all servers.
Prepare the jce_policy-8.zip file and copy it to the /var/lib/ambari-server/resources directory.
unzip -o -j -q jce_policy-8.zip -d $jdkdir/jre/lib/security/
Note that the JCE version must match the JDK. After deployment, restart the Ambari service with ambari-server restart.
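
A sketch of pushing the policy to every host from the ambari-server machine (hosts.txt is an assumed file listing all cluster host names, and $jdkdir is the JDK install directory as above):

    # distribute and unpack the JCE policy on each host
    for h in $(cat hosts.txt); do
        scp /var/lib/ambari-server/resources/jce_policy-8.zip $h:/tmp/
        ssh $h "unzip -o -j -q /tmp/jce_policy-8.zip -d $jdkdir/jre/lib/security/"
    done
    # then restart Ambari
    ambari-server restart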

Set up Hadoop users:
Map principals to user names (optional).
In the core-site.xml file, add the following rules (a sketch of the enclosing property follows the rule list):

RULE:[1:$1@$0](ambari-qa-cluster@REALM.COM)s/.*/ambari-qa/
RULE:[1:$1@$0](hdfs-cluster@REALM.COM)s/.*/hdfs/
RULE:[1:$1@$0](spark-cluster@REALM.COM)s/.*/spark/
RULE:[1:$1@$0](.*@REALM.COM)s/@.*//
RULE:[2:$1@$0](amshbase@REALM.COM)s/.*/ams/
RULE:[2:$1@$0](amszk@REALM.COM)s/.*/ams/
RULE:[2:$1@$0](dn@REALM.COM)s/.*/hdfs/
RULE:[2:$1@$0](hive@REALM.COM)s/.*/hive/
RULE:[2:$1@$0](jhs@REALM.COM)s/.*/mapred/
RULE:[2:$1@$0](nm@REALM.COM)s/.*/yarn/
RULE:[2:$1@$0](nn@REALM.COM)s/.*/hdfs/
RULE:[2:$1@$0](rm@REALM.COM)s/.*/yarn/
RULE:[2:$1@$0](yarn@REALM.COM)s/.*/yarn/
DEFAULT

If wildcards are used instead, the rules reduce to:
RULE:[1:$1@$0](.*@REALM\.COM)s/@.*//
RULE:[2:$1@$0](.*@REALM\.COM)s/@.*//

DEFAULT
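
For context, these rules live in the hadoop.security.auth_to_local property of core-site.xml; a minimal sketch with the wildcard variant (the full rule list from above can be substituted into the value):

    <property>
      <name>hadoop.security.auth_to_local</name>
      <value>
        RULE:[1:$1@$0](.*@REALM\.COM)s/@.*//
        RULE:[2:$1@$0](.*@REALM\.COM)s/@.*//
        DEFAULT
      </value>
    </property>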

5. KDC configuration

Set the global environment variables (the last line must be configured on all machines).
Add them at the end of /etc/profile:

#export KRB5_CONFIG=/etc/krb5.conf
#export KRB5_KDC_PROFILE=/var/kerberos/krb5kdc/kdc.conf
#export KRB5CCNAME=/hadoop/hadoop/krb5cc_pub_$$
export KRB5CCNAME=/hadoop/ker/krb5cc_$UID

The last line sets the directory where tickets are cached; if it is not set, tickets are saved under /tmp, which is a security risk.

After saving, run source /etc/profile for the changes to take effect.

Configuration files:
There are three KDC configuration files: krb5.conf, kdc.conf, and kadm5.acl.
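
Since the original screenshots are not included, here is a minimal sketch of the three files for realm REALM.COM (the lifetimes follow the 300-day setting used later; host names and paths are illustrative):

    # /etc/krb5.conf
    [libdefaults]
        default_realm = REALM.COM
        ticket_lifetime = 300d
        renew_lifetime = 300d
    [realms]
        REALM.COM = {
            kdc = snamenode
            admin_server = snamenode
        }

    # /var/kerberos/krb5kdc/kdc.conf
    [realms]
        REALM.COM = {
            max_life = 300d
            max_renewable_life = 300d
            acl_file = /var/kerberos/krb5kdc/kadm5.acl
        }

    # /var/kerberos/krb5kdc/kadm5.acl
    */admin@REALM.COM    *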


6. About time and tgt directory issues
Ticket cache directory:
Change it from /tmp to the /Hadoop/krb directory:

Edit /etc/profile:
export KRB5_CONFIG=/yourdir/krb5.conf (optional)
export KRB5_KDC_PROFILE=/yourdir/kdc.conf (optional)
export KRB5CCNAME=/Hadoop/hadoop/krb5cc_$UID

Ticket lifetime issues:

Four things need attention:

  1. The time settings in krb5.conf
  2. The time settings in kdc.conf
  3. The time settings of krbtgt/REALM.COM@REALM.COM
  4. The time settings of the K/M master principal

Adjust them in kadmin.local:
kadmin.local: addprinc admin/admin@REALM.COM
modprinc -maxrenewlife 300days K/M
modprinc -maxlife 300days K/M
modprinc -maxrenewlife 300days krbtgt/REALM.COM
modprinc -maxlife 300days krbtgt/REALM.COM
kinit admin/admin@REALM.COM
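
To verify that the new limits took effect, the principal's attributes can be inspected (output abbreviated):

    kadmin.local -q "getprinc krbtgt/REALM.COM@REALM.COM"
    # 'Maximum ticket life' and 'Maximum renewable life' should both show 300 days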


7. Testing
On the client, run kinit admin/admin.
Add a test principal named test.
kinit test
klist then shows the test user's ticket, as in the session sketched below.
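
A sketch of the whole test sequence in one terminal session (password prompts omitted; test is a throwaway principal):

    kadmin -p admin/admin -q "addprinc test"    # create the test principal
    kinit test                                  # obtain a TGT as test
    klist                                       # should list krbtgt/REALM.COM@REALM.COM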


8. Application server side configuration
Ensure that the hosts files and user accounts are consistent across all machines in the cluster, and that their time is synchronized with the KDC.
Copy the KDC's /etc/krb5.conf file to each machine in the cluster.
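
A sketch of distributing the file (hosts.txt is the same assumed host list as in section 4):

    for h in $(cat hosts.txt); do
        scp /etc/krb5.conf $h:/etc/krb5.conf
    done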

9. Client configuration
As on the application servers, ensure that the hosts files and user accounts are consistent across the cluster and that time is synchronized with the KDC.
Copy the KDC's /etc/krb5.conf file to each machine in the cluster.
On each machine, run kinit admin/admin to complete initialization.
Then run kadmin and enter the password to enter remote administration mode.
Run klist to view the cached ticket information.

10. Hadoop (HDP) cluster configuration
User principal name and keytab definition in HDP:
With the MIT Kerberos system already installed, keep its existing configuration.
Confirm the admin user and its permissions with the KDC.
Confirm that the JCE policy has been deployed.
Confirm that all clients can reach the KDC directly and that time is synchronized.
Deploy Kerberos using the Ambari platform, which generates the principals and keytabs of the different components during deployment.

The installation follows the instructions on the Ambari pages. Note that during the process you need to fill in the relevant details of the installed KDC: admin user name and password, REALM, and so on.

Additional time parameters need to be modified.

Note: when Kerberos is deployed through Ambari, HDP deletes the YARN logs.
After deployment is complete, check the krb5 configuration and the time parameters of krbtgt in the KDC configuration:

In kadmin.local, run getprinc krbtgt/REALM.COM@REALM.COM.
If both the maximum ticket life and the renewable life are unchanged, initialize each host. If the times have changed, reset them to 300 days and then initialize.


11. User instructions after installation
Authenticate with a user that has read access to the corresponding keytab.
For example, to initialize on a datanode, use a user with read access to the datanode keytab, normally the hdfs user. As the hdfs user, enter in a terminal:
kinit -kt /etc/security/keytabs/dn.service.keytab dn/hostname
Then run klist to view the ticket information.
If it is the hive user, which has read permission on the keytabs of the hive service and its clients, enter in a terminal:
kinit -kt /etc/security/keytabs/spnego.service.keytab HTTP/hostname
See the table in section 10 above for the mapping between users and their principals and keytabs, or refer to the table below.
Once a ticket is obtained, business operations can be performed. Note that some scripts need to be modified to run on servers where authentication is deployed.
Note:
Kerberos does not encourage a proliferation of accounts, but Hadoop has many user accounts, and Kerberos strictly separates users: each user has its own tickets, and tickets cannot be shared between users. Therefore, after switching to a user for the first time, you must perform Kerberos authentication for that user; that is, run kinit to obtain the user's valid ticket before using the corresponding service. If the ticket has not been initialized or authentication has not succeeded, operations on the Hadoop cluster will fail with connection errors.
To avoid interrupting continuous operation of the Hadoop cluster, the ticket validity period is set to 300 days; once a user has obtained a ticket, no further authentication is needed within those 300 days. However, if the cached ticket is deleted, that user's authentication will fail and must be re-initialized, which may cause some automated scripts or tasks to fail.
Some scripts were written without authentication in mind. In a cluster with authentication enabled, it is recommended to retest and refine such scripts so that they suit the authenticated cluster environment (see the sketch below).
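
For example, a minimal guard at the top of an automated script (the keytab path and principal are placeholders following the datanode pattern above):

    #!/bin/bash
    # re-authenticate only if no valid ticket is cached
    if ! klist -s; then
        kinit -kt /etc/security/keytabs/dn.service.keytab dn/$(hostname -f)
    fi
    # ... business commands against the cluster follow here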
