Objective
At the beginning of 2014, we upgraded our production Hadoop 1 cluster to the stable Hadoop 2.2.0 release and deployed security authentication for Hadoop at the same time. This article describes how we implemented security authentication on Hadoop 2.2.0 and how we solved the problems encountered along the way.
Background
Cluster security was weak
Our first Hadoop cluster deployment did not take security into account. As the cluster kept growing and more teams came to depend on it, security became an important concern. Security in this context generally covers the following:
- User authentication (authentication)
Verifying the identity of a user, i.e. confirming that the user is who it claims to be. This covers the authentication of both users and services.
- User authorization (authorization)
Access control, i.e. granting or denying specific users access to specific resources. Authorization is built on top of authentication; without reliable authentication there can be no meaningful authorization.
When security authentication is not enabled, Hadoop simply trusts the user identity supplied by the client, typically the UNIX user that launches the task. Production machines usually run services under a single shared account, so when the cluster is deployed under such an account every user who runs Hadoop jobs is effectively a cluster superuser, which makes accidental damage easy. Even if the cluster is deployed under a dedicated administrator account, a malicious user can still impersonate that account from the client side, as the sketch below illustrates.
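As a minimal illustration (assuming a cluster with hadoop.security.authentication=simple and a hypothetical NameNode address), a client can open a FileSystem connection under any user name it likes, because simple authentication takes the name at face value:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SimpleAuthSpoof {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // With simple authentication, the user name passed here is never verified,
        // so a client can act as "hdfs" (or any other account) without proving anything.
        FileSystem fs = FileSystem.get(
            URI.create("hdfs://namenode.example.com:8020"), conf, "hdfs");
        System.out.println(fs.getFileStatus(new Path("/")).getOwner());
        fs.close();
    }
}
```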
Upgrading the whole cluster to Hadoop 2.0
In October 2013 Hadoop 2.2.0 was released as the GA version of Apache Hadoop 2.x, and we decided to upgrade the whole cluster to 2.2.0 and move into the YARN era. At the same time we planned to get cluster security in place during the upgrade, mainly for the following reasons:
- Like the upgrade itself, security is foundational work. Getting it done early makes our subsequent work easier; leaving it undone would turn it into the next obstacle.
- Foundational work is, by definition, the kind of work that is hard to change later: the longer we postpone it, the more things come to depend on it and the more expensive it becomes to carry out.
In short, our requirement was to deploy Hadoop security authentication while upgrading from the old Hadoop version to YARN, and then enable the appropriate access controls (HDFS permissions, queue ACLs) on top of it.
Scheme research
Before investigating specific schemes, we clarified the following principles for our security practice:
- As a back-end service platform, the main purpose of deploying security is to prevent accidents caused by user mistakes (such as accidental data deletion or other mis-operations).
- The point of doing security is to open the platform up; the precondition of openness is basic security for both the data and the platform itself.
- Guarantee security as far as possible while keeping operations and maintenance simple.
Given the problems described above, we needed to investigate the following:
- Account splitting and the corresponding management scheme
- Enabling Hadoop security authentication
- Client-side changes for security authentication
Account splitting and the corresponding management scheme
Cluster account management
Originally we ran the whole cluster under a single administrator account, which was also the shared online login account, a significant security risk. We needed dedicated accounts to run the cluster. The question was: how many operations accounts do we need?
The simplest approach is to use one dedicated operations account (such as hadoop). CDH and Apache both recommend splitting accounts by service when starting the cluster:
| User:Group | Daemons |
| --- | --- |
| hdfs:hadoop | NameNode, Secondary NameNode, Checkpoint Node, Backup Node, DataNode |
| yarn:hadoop | ResourceManager, NodeManager |
| mapred:hadoop | MapReduce JobHistory Server |
Since finer-grained control helps avoid mis-operation, we followed the official recommendation and used multiple accounts.
When migrating from a single operations account to a multi-account deployment, file ownership and permissions need to be considered, both on the local filesystem and on HDFS; we changed them when the security deployment went live.
User account management
Many teams at Meituan use Hadoop for their big-data processing needs, which requires a degree of multi-tenancy, and data and job permissions are the main concerns. HDFS itself only provides a Unix-like permission model, and its default group concept is of limited use. Given that, a simple and crude multi-user management scheme is possible:
Each team gets its own root directory and uses its own dedicated account, which has full access to the files under that directory. Data under one team's directory cannot be accessed by another team (unless permissions are changed manually). A minimal sketch of this layout follows.
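A minimal sketch of setting up such a per-team root directory with the HDFS FileSystem API, run as the HDFS superuser; the path, account, and group names are made up for illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class TeamRootSetup {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path teamRoot = new Path("/user/team_a");      // hypothetical team root directory
        fs.mkdirs(teamRoot);
        fs.setOwner(teamRoot, "team_a", "team_a");     // dedicated account and group (requires superuser)
        // rwx for the owning account and group only; other teams get no access
        fs.setPermission(teamRoot, new FsPermission((short) 0770));
        fs.close();
    }
}
```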
In a centralized data warehouse environment that also produces statistics for many departments, this scheme is not flexible enough. Cloudera's Sentry offers fine-grained, role-based access control, but because it requires heavy customization and is not convenient to use, we did not adopt it.
Enabling Hadoop security authentication
Hadoop's security authentication is based on Kerberos. Kerberos is a network authentication protocol: a user enters authentication information once and can then access multiple Kerberos-aware services with the tickets it obtains, so machine single sign-on can also be built on this protocol. Hadoop itself does not create user accounts; it uses Kerberos to authenticate users and takes the user account from the principal in the Kerberos credential, which is unrelated to the UNIX account that actually runs the process.
Here is a brief description of the MR job submission process on YARN:
- Before running a job, the user authenticates with the KDC and obtains a TGT (Ticket Granting Ticket). The KDC is the central service of Kerberos authentication and stores the authentication information of users and services.
- With the TGT, the user requests a service ticket from the KDC; the KDC generates a session key and returns it to the client.
- The client authenticates itself to the service with the service ticket and completes identity authentication.
- After authentication, the client requests a number of tokens from the services for the subsequent tasks to authenticate with (e.g. the HDFS NameNode delegation token and the YARN ResourceManager delegation token).
- The client submits the job together with the acquired tokens, and the tasks later authenticate to the services with those tokens.
As this flow shows, for performance reasons Hadoop's security design only applies Kerberos authentication to the communication between users and services and among the services themselves. After the user is authenticated, task execution, service access, data reads and writes, and so on all use access tokens issued by the specific services (NameNode, ResourceManager), and the requesting side accesses services and data with those tokens. Token delivery, validation and renewal are not discussed in depth here; a rough sketch of the token step follows.
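As an illustration of the token step (a simplified sketch, not the exact path the job client takes; the renewer name "yarn" is an assumption), a Kerberos-authenticated client can collect NameNode delegation tokens into a Credentials object like this:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

public class FetchDelegationTokens {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);   // Kerberos login must already have happened (kinit or keytab)
        Credentials creds = new Credentials();
        // Ask the NameNode for delegation tokens; "yarn" is the assumed renewer name.
        Token<?>[] tokens = fs.addDelegationTokens("yarn", creds);
        for (Token<?> t : tokens) {
            System.out.println("Got token: " + t.getKind() + " for " + t.getService());
        }
    }
}
```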
Cloudera provides detailed documentation on Hadoop security authentication. Because of our own environment and operational considerations, our final deployment differs slightly in some details.
Kerberos deployment
Hadoop security authentication requires a Kerberos cluster, and deploying Kerberos requires deploying a KDC. Since our environment already uses FreeIPA for host authentication and related access control, and FreeIPA integrates the Kerberos service, no additional deployment was needed.
Kerberos-related operations, such as adding users and services and exporting keytabs, can all be done through the FreeIPA interfaces.
The choice of container executor
User-submitted tasks run inside specific containers (Container). At first we considered using the DefaultContainerExecutor rather than the officially recommended LinuxContainerExecutor: its drawback is that it neither physically isolates tasks nor guards against malicious ones, but it is easy to deploy, whereas the LinuxContainerExecutor requires the accounts of task users to exist on every machine in the cluster.
In actual testing we found that after security authentication is enabled on Hadoop 2.2.0 the DefaultContainerExecutor can no longer be used, because of the change introduced by MAPREDUCE-5208.
Since we did not want to carry too many custom modifications in the code, we chose to use the LinuxContainerExecutor and solve the problems that come with it:
- User account creation
We need to create accounts on the cluster for every user that may launch tasks. With FreeIPA's unified user management, we only need to add the corresponding users in FreeIPA.
- Deployment of container-executor and container-executor.cfg
container-executor is YARN's container launch program, and it has a number of permission requirements:
  - be owned by root
  - be owned by a group that contains only the users running the YARN daemons
  - be setuid
  - be group readable and executable
container-executor.cfg is required to be owned by root, and so is the directory containing it. Both requirements are inconvenient for automated deployment, but since this part is relatively independent and basically never changes, we added it to the Puppet manifests that manage the cluster machines.
DataNode start-up mode
The DataNode start-up mode recommended by CDH requires the use of privileged (low-numbered) ports and the jsvc launcher, which is not very convenient to operate and maintain. Instead we set ignore.secure.ports.for.testing=true to start the DataNode and bypass these constraints.
Client-side changes for security authentication
After the cluster has security authentication enabled, the clients that depend on it (scripts and services) need corresponding changes, but the changes are essentially the same everywhere. Most services already contain the appropriate handling for Kerberos authentication and need essentially no modification.
First, the authentication methods available once security authentication is enabled:
- Password authentication
Authenticate with the user's password via kinit; the resulting TGT is stored in the local credential cache and used for subsequent service authentication. Typically used for interactive access.
- Keytab authentication
By exporting a keytab, a user can authenticate without a password; the subsequent steps are the same. Typically configured for use by applications.
A Kerberos credential (ticket) has two relevant properties, ticket_lifetime and renew_lifetime. ticket_lifetime is how long the ticket remains valid, typically 24 hours. Before it expires, a renewable ticket can be renewed, and renew_lifetime is the maximum total period it can be extended to, usually one week. After the credential expires, subsequent access to securely authenticated services fails. So the first question is how to handle credential expiration.
Credential expiration handling policy
The earliest design of Hadoop's security features made the following assumption:
A Hadoop Job would run no longer than 7 days (configurable) on a MapReduce cluster or accessing HDFS from the job would fail.
For ordinary jobs, a credential lifetime of 24 hours, or even renewed up to a week, is sufficient. So most of the time we only need to kinit before running an operation, plus a background task that renews the credential periodically.
However, these assumptions do not hold for services that need permanent access to the Hadoop cluster. In that case we can:
- Extend the ticket_lifetime and renew_lifetime limits
Extending the credential lifetimes would solve the problem, but because our Kerberos realm is also bound to the online user login authentication, this would introduce security risks and is not convenient to change.
- Periodically re-run kinit to renew the credential
Rather than merely renewing the ticket within its renewable lifetime, we can re-authenticate outright to obtain a fresh credential. In general this requires exporting a keytab so the periodic authentication can run unattended.
Hadoop encapsulates the Kerberos authentication logic so that users do not have to deal with all this complexity themselves; the relevant class is UserGroupInformation, which we look at next.
UserGroupInformation
The UserGroupInformation (UGI) class encapsulates Hadoop's user information on top of the JAAS framework; more precisely, it is a thin wrapper around a JAAS Subject.
```java
UserGroupInformation(Subject subject) {
  this.subject = subject;
  // the User principal attached to the subject
  this.user = subject.getPrincipals(User.class).iterator().next();
  // whether the subject was logged in from a keytab or from a ticket cache
  this.isKeytab = !subject.getPrivateCredentials(KerberosKey.class).isEmpty();
  this.isKrbTkt = !subject.getPrivateCredentials(KerberosTicket.class).isEmpty();
}
```
JAAS stands for the Java Authentication and Authorization Service and consists mainly of the following entities:
- Subject
Subject is a final (non-inheritable) class that represents the source of a request; it holds the associated principals and the public and private credentials.
- Principal
An identity; after successful authentication a Subject can be associated with multiple Principals.
- Credential
Credentials, divided into public credentials and private credentials.
The JAAS authentication process is as follows:
- The application instantiates a LoginContext.
- The LoginContext consults a Configuration to load all of the LoginModules configured for that application.
- The application invokes the LoginContext's login method.
- The login method invokes all of the loaded LoginModules. Each LoginModule attempts to authenticate the subject. Upon success, LoginModules associate relevant Principals and credentials with a Subject object that represents the subject being authenticated.
- The LoginContext returns the authentication status to the application.
- If authentication succeeded, the application retrieves the Subject from the LoginContext.
Code that requires authentication can be wrapped in doPrivileged; Subject.doAs can be used directly and supports nesting. A minimal JAAS sketch follows.
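A minimal, generic JAAS sketch of this flow; the login configuration name "SampleLogin" is hypothetical and would have to be defined in the JAAS configuration file:

```java
import java.security.PrivilegedExceptionAction;
import javax.security.auth.Subject;
import javax.security.auth.login.LoginContext;

public class JaasSketch {
    public static void main(String[] args) throws Exception {
        // "SampleLogin" must exist in the JAAS config (e.g. -Djava.security.auth.login.config=...)
        LoginContext lc = new LoginContext("SampleLogin");
        lc.login();                                 // drives the configured LoginModules
        final Subject subject = lc.getSubject();    // the authenticated Subject with principals/credentials

        // run privileged code as the authenticated subject
        Subject.doAs(subject, new PrivilegedExceptionAction<Void>() {
            @Override
            public Void run() {
                System.out.println("running as " + subject.getPrincipals());
                return null;
            }
        });
    }
}
```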
In secure mode, UGI supports different LoginContext configurations, which are generated dynamically by its HadoopConfiguration class:
- hadoop-user-kerberos
The configuration for logging in with the Kerberos ticket cache, with useTicketCache set to true.
- hadoop-keytab-kerberos
The configuration for logging in with a keytab, with useKeyTab set to true.
UGI has several authentication entry points. The getLoginUser method authenticates with the hadoop-user-kerberos configuration:
- Build a LoginContext from that configuration
- Call LoginContext.login to complete the login, using the credentials in the ticket cache
- Determine whether a different user identity (proxy user) is required for execution
- Add the tokens from HADOOP_TOKEN_FILE_LOCATION to the Credentials collection
- Spawn another thread to renew the credentials periodically (spawnAutoRenewalThreadForUserCreds)
The last step shows that once we have a credential we do not need to renew it periodically ourselves. A minimal usage sketch follows.
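A minimal sketch of the ticket-cache path, assuming kinit has already been run on the machine and the cluster configuration enables Kerberos:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class TicketCacheLogin {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Picks up the TGT that kinit placed in the local ticket cache
        // and starts the background renewal thread described above.
        UserGroupInformation ugi = UserGroupInformation.getLoginUser();
        System.out.println("Logged in as " + ugi.getUserName()
            + ", from keytab: " + ugi.isFromKeytab());
    }
}
```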
The loginUserFromKeytab method, by contrast, authenticates with the hadoop-keytab-kerberos configuration:
- Build a LoginContext from that configuration
- Call LoginContext.login to complete the login, using the keytab
loginUserFromKeytab does not renew the credential periodically, so how is the credential kept from expiring?
- Before accessing the cluster, the caller can invoke checkTGTAndReloginFromKeytab to refresh the credential (which actually logs in again).
- When the credential has expired, the resulting IPC failure triggers a call to reloginFromKeytab to log in again.
The relevant code in Client.java (abridged):

```java
private synchronized void handleSaslConnectionFailure(
    final int currRetries, final int maxRetries, final Exception ex,
    final Random rand, final UserGroupInformation ugi)
    throws IOException, InterruptedException {
  ugi.doAs(new PrivilegedExceptionAction<Object>() {
    @Override
    public Object run() throws IOException, InterruptedException {
      final short MAX_BACKOFF = 5000;
      closeConnection();
      disposeSasl();
      if (shouldAuthenticateOverKrb()) {
        if (currRetries < maxRetries) {
          if (LOG.isDebugEnabled()) {
            LOG.debug("Exception encountered while connecting to " + /* ... */);
          }
          // try re-login
          if (UserGroupInformation.isLoginKeytabBased()) {
            UserGroupInformation.getLoginUser().reloginFromKeytab();
          } else {
            UserGroupInformation.getLoginUser().reloginFromTicketCache();
          }
          // ...
```
As this code shows, keytab-based authentication is effectively valid indefinitely, and an IPC failure triggers a re-login attempt whether or not the login was keytab based.
Keytab-based Kerberos authentication
To spare users from having to remember passwords, we can export keytabs and hand them to the relevant users (provided the number of users is manageable, for example when keytabs are issued per virtual user).
With a keytab, a user's Hadoop task can authenticate in either of two ways:
- Run kinit with the keytab and then access the cluster directly
- Or call loginUserFromKeytab to complete the login and then wrap the code in UGI's doAs method to execute it, as sketched below
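A minimal sketch of the second option; the principal and keytab path are placeholders for illustration:

```java
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabLoginExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Password-free login from an exported keytab (placeholder principal and path)
        UserGroupInformation.loginUserFromKeytab(
            "team_a@EXAMPLE.COM", "/etc/security/keytabs/team_a.keytab");
        UserGroupInformation ugi = UserGroupInformation.getLoginUser();

        // Wrap the code that accesses the cluster in doAs
        ugi.doAs(new PrivilegedExceptionAction<Void>() {
            @Override
            public Void run() throws Exception {
                FileSystem fs = FileSystem.get(new Configuration());
                System.out.println(fs.exists(new Path("/user/team_a")));
                return null;
            }
        });
    }
}
```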
On-line deployment
Having settled on the deployment scheme, we rolled out security authentication together with the Hadoop version upgrade. We ran into a number of issues during deployment and use, which are described here.
JCE deployment
After turning on security authentication, we found that Kerberos authentication failed:
Client failed to SASL authenticate: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)]
Because our Kerberos deployment uses AES-256 encryption by default, the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files must be installed in every Hadoop environment (cluster and clients); otherwise Kerberos authentication fails. A small program, such as the gist referenced in the original post, can verify that the change has taken effect; a sketch follows. This step was also added to Puppet.
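A minimal check along the same lines, assuming an Oracle/Sun JRE where the default policy caps AES at 128 bits:

```java
import javax.crypto.Cipher;

public class JcePolicyCheck {
    public static void main(String[] args) throws Exception {
        int maxKeyLen = Cipher.getMaxAllowedKeyLength("AES");
        // With the unlimited-strength policy files installed this prints a value >= 256
        // (typically Integer.MAX_VALUE); with the default policy it prints 128.
        System.out.println("Max allowed AES key length: " + maxKeyLen);
    }
}
```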
SNN getimage returns an NPE
After security authentication was enabled, the Secondary NameNode kept exiting because getimage failed with an NPE; the relevant errors are as follows.
```
2013-12-29 23:56:19,572 DEBUG org.apache.hadoop.security.authentication.server.AuthenticationFilter:
  Request [http://XXX.com:50070/getimage?getimage=1&txid=8627&storageinfo=-47:2002718265:0:CID-3dce02cb-a1c2-4ab8-8b12-F23BBEFD7BCC] triggering authentication
2013-12-29 23:56:19,580 WARN org.apache.hadoop.security.authentication.server.AuthenticationFilter:
  Authentication exception: GSSException: Failure unspecified at GSS-API level (Mechanism level: Specified version of key is not available (44))
org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: Failure unspecified at GSS-API level (Mechanism level: Specified version of key is not available (44))
        at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:...)
        at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:349)
```
From the error message "Specified version of key is not available" we found that the same HTTP principal had been exported more than once, which invalidated the keys in the previously deployed keytab, so the keytab had to be regenerated and redeployed.
The lesson here: do not export the same principal repeatedly, or keytabs that have already been distributed will be invalidated.
Balancer running too long causes the credential to expire
After security authentication was deployed, we had to authenticate before balancing HDFS data, and then ran into the credential expiration issue described earlier.
There are two ways to solve this problem:
- Add an external timer task that re-authenticates and refreshes the credential cache, postponing credential expiration.
- Write a small wrapper around the balancer entry point org.apache.hadoop.hdfs.server.balancer.Balancer and run it inside a doAs, in the same spirit as Hue's SudoFsShell; see the sketch below.
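A rough sketch of such a wrapper, assuming a dedicated principal and keytab for the balancer (the names are placeholders) and that delegating to Balancer.main, which may call System.exit when it finishes, is acceptable:

```java
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.server.balancer.Balancer;
import org.apache.hadoop.security.UserGroupInformation;

public class SecureBalancer {
    public static void main(final String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Keytab-based login, so IPC retries can re-authenticate
        // if the balance run outlasts the ticket lifetime.
        UserGroupInformation.loginUserFromKeytab(
            "hdfs/admin@EXAMPLE.COM", "/etc/security/keytabs/balancer.keytab");

        UserGroupInformation.getLoginUser().doAs(new PrivilegedExceptionAction<Void>() {
            @Override
            public Void run() throws Exception {
                Balancer.main(args);   // delegate to the stock balancer entry point
                return null;
            }
        });
    }
}
```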
SSSD service authentication exceptions
SSSD is a low-level service we use for online login authentication. For some time it had been exiting unexpectedly, causing user logins to hang and the related tasks to fail. After Hadoop security authentication was deployed, Kerberos authentication also went through this service, which increased the probability of these abnormal exits. The SSSD problem appears to be a bug in the SSSD code shipped with our old system version; the most convenient fix is to upgrade the system or move the service to newer machines.
"KDC can ' t fulfill requested option while renewing credentials"
Application execution logs occasionally report the following error:
2014-03-12 21:30:03,593 WARN security.UserGroupInformation (UserGroupInformation.java:run(794)) - Exception encountered while running the renewal command. Aborting renew thread. org.apache.hadoop.util.Shell$ExitCodeException: kinit(v5): KDC can't fulfill requested option while renewing credentials
This means that UGI's credential renewal thread failed and exited. HADOOP-10041 tracks this issue; the root cause is that the credential cannot be renewed, and in general no special handling is needed.
"Turn from": http://tech.meituan.com/hadoop-security-practice.html
Group Reviews Technical Team
"Go" hadoop security practices