Hadoop Security Practice at Tongcheng Travel
0x01 Background
Today most large companies have adopted a shared Hadoop cluster model.
Shared Hadoop means: for storage, public and private file directories are mixed together in HDFS, and different users access different data on demand; for computing, administrators divide resources into queues by department or business line, each queue is allocated a certain amount of resources, and each user or group may only use the resources of its own queue. This model reduces maintenance costs, avoids data redundancy, and lowers hardware costs. But the biggest problem with this kind of shared storage and computing is security. At Tongcheng Travel, information security work reaches into every department, and data security is the top priority. This article shares the Hadoop security practices of the Tongcheng Travel big data architecture department.
However, Hadoop itself lacks security. The most critical issue is that a client's user name and group name can be arbitrary. By default, Hadoop takes the current user name from the HADOOP_USER_NAME environment variable; if that is empty it reads the HADOOP_USER_NAME system property, and if that is still empty it falls back to the current OS user name, filling in the user.name and group.name properties. A user can even directly claim to be root with conf.set("user.name", "root") and conf.set("group.name", "root"). Requests submitted to the cluster this way can read and write files in directories that do not belong to the user, and consume computing resources that belong to someone else.
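To make the problem concrete, here is a minimal sketch of the spoof on a cluster with default ("simple") authentication; the target path is purely illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SpoofAsRoot {
  public static void main(String[] args) throws Exception {
    // Claim another identity before first contact with the cluster; this
    // system property is read when the HADOOP_USER_NAME environment
    // variable is unset, and the NameNode trusts the claimed value as-is.
    System.setProperty("HADOOP_USER_NAME", "root");
    FileSystem fs = FileSystem.get(new Configuration());
    // Succeeds even though the caller does not own this directory.
    fs.delete(new Path("/user/someone-else/data"), true);
  }
}
```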
The Hadoop team is aware of this lack of security and added authentication and authorization mechanisms. The user's login name is passed to the server in the RPC header, and the RPC layer uses the Simple Authentication and Security Layer (SASL) to negotiate an authentication protocol (typically Kerberos) to complete RPC authentication. However, enabling Kerberos authentication on a Hadoop cluster raises the following issues:
1. Generating and configuring Kerberos credentials is very complicated, and debugging and troubleshooting are obscure; without some experience it is hard to get started.
2. Poor scalability. Adding or removing machines forces credentials to be regenerated and redistributed, causing great operational difficulty.
3. Single point of failure: the central KDC server holds all users' keys, so the whole system can hang when it goes down. Kerberos also requires strict clock synchronization, otherwise authentication fails.
For these reasons, and given the characteristics of our own Hadoop cluster (holding close to tens of petabytes of data with large daily increments, with hundreds of upstream platforms and services and tens of thousands of data-processing and computing jobs depending on it), bolting Kerberos onto Hadoop would be like changing the wheels on a car running at high speed: the system demands zero service interruption and rolling upgrades without downtime. We therefore developed our own lightweight Hadoop user authentication mechanism.
0x02 Basic idea
The most common authentication method is user name plus password, which fits our ease-of-use requirement. The idea is this: before interacting with Hadoop, the client reads a configured password associated with the user. The password is saved and then carried in every request to Hadoop for verification. We also know that before doing anything with Hadoop, a user first interacts with the NameNode to get block information, leases for reading and writing files, and so on. It is therefore natural to perform user authentication in the NameNode. The user-name-to-password mapping table can be configured there and hot-reloaded through a remote RPC.
0x03 Specific implementation
The client loads the password
User information is represented in Hadoop by the UserGroupInformation class. If the Subject is empty, or its associated principal is empty, the user has not logged in yet, and the getLoginUser() method is called. It first creates a LoginContext and calls its login() method; after that call, login.commit() is invoked. We added the logic that reads the password configuration to the commit() method, storing the password in the Subject's credentials. The password can be configured in the user's home directory or on the classpath. Because the login method is only called the first time the user starts up, repeated loading is avoided.
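A minimal sketch of that hook, as a simplified excerpt of Hadoop's HadoopLoginModule; the readPassword() helper and the storage of the raw password string are our assumptions here:

```java
// Simplified excerpt of HadoopLoginModule.commit(); readPassword() is a
// hypothetical helper that checks the user's home directory, then the classpath.
@Override
public boolean commit() throws LoginException {
  // ... existing logic that resolves the user principal from
  //     HADOOP_USER_NAME (env/system property) or the OS user ...
  String password = readPassword();
  if (password != null) {
    // Stored once at login; later attached to every RPC so the
    // NameNode can verify it.
    subject.getPrivateCredentials().add(password);
  }
  return true;
}
```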
The NameNode loads the user-name/password mapping table
We added a new RPC service that lets the NameNode reload the cluster's user-name/password mapping table. Most current Hadoop RPC calls use Google's protobuf protocol, which replaced the homegrown Writable-based protocol of the 1.0 era.
Following the protobuf conventions, we define a protocol file named RefreshCheckUserPasswordProtocol.proto. In it we define the RPC request and response messages:
```proto
message RefreshCheckUserPasswordRequestProto {
}

message RefreshCheckUserPasswordResponseProto {
}
```

The messages are empty because no parameters need to be passed. We then define the service:

```proto
service RefreshCheckUserPasswordProtocolService {
  rpc refreshCheckUserPassword(RefreshCheckUserPasswordRequestProto)
      returns (RefreshCheckUserPasswordResponseProto);
}
```

The protobuf command-line tool generates the corresponding request, response, and service Java classes. The new protocol is registered in NamenodeProtocols:

```java
public interface NamenodeProtocols
    extends ClientProtocol, DatanodeProtocol, NamenodeProtocol,
            RefreshAuthorizationPolicyProtocol, RefreshUserMappingsProtocol,
            RefreshCallQueueProtocol, GenericRefreshProtocol,
            GetUserMappingsProtocol, HAServiceProtocol,
            RefreshCheckUserPasswordProtocol {
}
```

Its concrete implementation goes into NameNodeRpcServer, where the server-side translator is wrapped into a BlockingService and registered with the RPC server:

```java
RefreshCheckUserPasswordProtocolServerSideTranslatorPB refreshCheckUserPasswordXlator =
    new RefreshCheckUserPasswordProtocolServerSideTranslatorPB(this);
BlockingService refreshCheckUserPasswordService = RefreshCheckUserPasswordProtocolService
    .newReflectiveBlockingService(refreshCheckUserPasswordXlator);
```

Finally, a dfsadmin shell command is added to call the service. Like the other dfsadmin refresh commands, it handles both HA and non-HA deployments:

```java
public int refreshCheckUserPassword() throws IOException {
  Configuration conf = getConf();
  // The server principal for this call should be the NameNode's.
  conf.set(CommonConfigurationKeys.HADOOP_SECURITY_SERVICE_USER_NAME_KEY,
      conf.get(DFSConfigKeys.DFS_NAMENODE_KERBEROS_PRINCIPAL_KEY, ""));

  DistributedFileSystem dfs = getDFS();
  URI dfsUri = dfs.getUri();

  if (HAUtil.isLogicalUri(conf, dfsUri)) {
    // HA cluster: refresh every NameNode in the nameservice.
    String nsId = dfsUri.getHost();
    List<ProxyAndInfo<RefreshCheckUserPasswordProtocol>> proxies =
        HAUtil.getProxiesForAllNameNodesInNameservice(conf, nsId,
            RefreshCheckUserPasswordProtocol.class);
    for (ProxyAndInfo<RefreshCheckUserPasswordProtocol> proxy : proxies) {
      try {
        proxy.getProxy().refreshCheckUserPassword();
        System.out.println("Refresh user/password map successful for "
            + proxy.getAddress());
      } catch (IOException e) {
        System.out.println("Refresh user/password map failed for "
            + proxy.getAddress());
        System.out.println(e.getMessage());
      }
    }
  } else {
    RefreshCheckUserPasswordProtocol refreshProtocol =
        NameNodeProxies.createProxy(conf, FileSystem.getDefaultUri(conf),
            RefreshCheckUserPasswordProtocol.class).getProxy();
    refreshProtocol.refreshCheckUserPassword();
    System.out.println("Refresh user/password map successful");
  }
  return 0;
}
```
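Assuming the subcommand is registered under the same name as the RPC (the exact flag spelling is internal to our patch), an operator would trigger a reload with:

```
hdfs dfsadmin -refreshCheckUserPassword
```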
Finally, we load the user name and password mapping table automatically when the NameNode starts up (is initialized).
The exact location is at the end of the NameNode class's void initialize(Configuration conf) throws IOException method.
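A sketch of that change; the table-loader helper name below is hypothetical:

```java
// At the end of NameNode#initialize(Configuration), simplified:
protected void initialize(Configuration conf) throws IOException {
  // ... existing initialization: FSNamesystem loading, RPC server startup ...
  // Load the user-name/password/IP mapping table once at startup so that
  // authentication works before the first explicit dfsadmin refresh.
  checkUserPasswordManager.load(conf); // hypothetical helper
}
```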
Client authentication on the NameNode
When the NameNode receives a user's request, it calls the getRemoteUser() method to assemble the user's information into a UserGroupInformation instance. At that point it is natural to extract the user name, password, and source IP of the request's initiator for authentication. The authentication logic is very simple: look the user name, password, and IP up in the mapping table loaded at startup; if they match, the request passes, otherwise it is intercepted and an error is thrown.
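Illustrative verification logic; the PasswordEntry type and field names are our assumptions, not the production code:

```java
// Compare the caller's claimed identity against the table loaded at startup.
private void verifyUser(UserGroupInformation ugi, String remoteIp)
    throws AccessControlException {
  PasswordEntry entry = userPasswordTable.get(ugi.getShortUserName());
  if (entry == null
      || !entry.getPassword().equals(passwordOf(ugi))   // password carried in the RPC
      || !entry.getAllowedIps().contains(remoteIp)) {   // source-IP restriction
    throw new AccessControlException("Authentication failed for user "
        + ugi.getShortUserName() + " from " + remoteIp);
  }
}
```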
0x04 Supporting work for going live
To bring the new security feature online smoothly, we built the following additional features.
A global switch for user verification
To guard against possible new bugs, a single switch can quickly turn the new feature off, enabling a rapid rollback.
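A minimal sketch of the switch, assuming a hypothetical configuration key name; with the switch off, the NameNode behaves exactly like stock Hadoop, which is what makes instant rollback possible:

```java
// Hypothetical key; the real name is internal to our patch.
private boolean checkUserPasswordEnabled(Configuration conf) {
  return conf.getBoolean("hadoop.security.check.user.password.enable", false);
}
```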
User whitelist
With a large number of users and groups, switching everyone over at the same time is impossible. We add users to a whitelist, configuring and testing the assigned accounts and passwords in batches, step by step bringing all accounts under security supervision.
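Illustrative gating logic combining the switch and the whitelist (names hypothetical): only whitelisted users are verified, so accounts can be migrated to the new mechanism batch by batch:

```java
// Users outside the whitelist keep the legacy (unauthenticated) behavior.
if (!checkUserPasswordEnabled(conf)
    || !whitelist.contains(ugi.getShortUserName())) {
  return;
}
verifyUser(ugi, remoteIp); // the check sketched in 0x03
```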
0x05 Rollout process
1. Recompile and package the hadoop-common and hadoop-hdfs projects.
2. Replace the corresponding jar packages on the NameNode and DataNodes, and restart them in a rolling fashion. Because the security feature is off by default, the new and old jar packages are compatible with each other.
3. Configure the user names and passwords for the projects on each node, and load the user-name/password mapping table into the NameNode via the dfsadmin command.
4. Flip the switch to turn the security feature on.
Hadoop security is now live. There was no service interruption or degradation during the upgrade, and users of the service noticed nothing. It has been running stably for several months with good performance.