Hadoop, HBase, and ZooKeeper security practices
Speaking of security, there are two main aspects: authentication and authorization.
Authentication verifies a user's identity: if you claim to be user A, authentication must ensure that you really are A and not B. Authorization handles permission control: user A may only operate on entities for which A has permission (such as HDFS files and HBase tables) and cannot touch objects without permission.
With authentication and authorization in place, things are reasonably safe in general; accidents such as user A deleting user B's data basically won't happen. In HBase/Hadoop/ZooKeeper, authentication is implemented through Kerberos, while each system has its own authorization implementation. Authentication is the more complicated of the two and hides many pitfalls, so most of this article focuses on authentication.
If you are not familiar with Kerberos, you can read [Introduction to Hadoop Kerberos Security Mechanism], which gives a clear description of how Kerberos authentication works.
The following is a summary of some pitfalls I encountered in practice.
Before starting, install the Kerberos server. Installing Kerberos is relatively simple and is not covered in this article; you can search for it directly on Google, where there are plenty of tutorials, and following them step by step generally works. Note that there are minor differences between OS distributions, such as Ubuntu and CentOS.
It should also be noted that our security practices follow one principle: simplify O&M as much as possible while ensuring security. This principle runs through our work from beginning to end; in the name of O&M convenience we also dug ourselves quite a few pitfalls and solved them one by one, as described below.
Hadoop
Our Hadoop security configuration follows the official Cloudera article [Configuring Hadoop Security in CDH4]. The configuration items are not listed again here, since the official document covers them clearly. Instead we focus on the problems we ran into during configuration and how we solved them.
Q1
In its implementation, Hadoop requires the principal of each service to contain the machine's FQDN (Fully Qualified Domain Name). I guess the original design intent is that each principal can only be used on one machine: even if someone steals a principal's keytab file from one machine, it cannot be used anywhere else, which maximizes security.
The consequence, however, is higher O&M complexity. Assume a cluster of 1000 machines running HDFS and YARN: at least 2000+ keytab files must be generated, and every newly added machine needs its own keytab generated for it. With so many keytab files, cluster deployment becomes troublesome.
Following the principle mentioned above, "simplify O&M as much as possible while ensuring security", we considered using the same principal for the same service across the whole cluster, for example one for HDFS and one for YARN. For a backend service platform, we think the main purpose of security is to prevent accidents caused by user mistakes, such as accidental data deletion and misoperation; on that basis, we want O&M to be as convenient as possible.
Let's take a look at the principal check in Hadoop (org.apache.hadoop.security.SecurityUtil):
public static String getServerPrincipal(String principalConfig,
    InetAddress addr) throws IOException {
  String[] components = getComponents(principalConfig);
  if (components == null || components.length != 3
      || !components[1].equals(HOSTNAME_PATTERN)) {
    return principalConfig;
  } else {
    if (addr == null) {
      throw new IOException("Can't replace " + HOSTNAME_PATTERN
          + " pattern since client address is null");
    }
    return replacePattern(components, addr.getCanonicalHostName());
  }
}
The main logic: the principal is split on '/' and '@'. If the split fails, if it does not yield the three segments of the conventional 'hdfs/_HOST@REALM' format, or if the second segment is not the '_HOST' pattern, the original principal from the configuration file is returned unchanged. Otherwise the '_HOST' pattern is replaced with the FQDN, which is the method recommended in the official Hadoop and Cloudera configuration. So the first thing we do is replace the '_HOST' pattern with a fixed string, giving principals of the form 'hdfs/hadoop@REALM'.
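As a concrete illustration, the relevant hdfs-site.xml entries might end up looking like the sketch below. This is only an example under our assumptions: 'hadoop' is the shared instance name we chose and EXAMPLE.COM stands for your realm; adjust the names and keytab settings for your own environment.

<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hdfs/hadoop@EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.namenode.kerberos.internal.spnego.principal</name>
  <value>HTTP/hadoop@EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>hdfs/hadoop@EXAMPLE.COM</value>
</property>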
Of course, this alone is not enough; it introduces a new pitfall, detailed in Q2.
Q2
NameNode fails when requesting the JournalNode over HTTP. After Q1 we had changed all Hadoop principals to the forms 'hdfs/hadoop@REALM', 'HTTP/hadoop@REALM', and 'yarn/hadoop@REALM', and a new problem appeared at startup.
The NameNode cannot start. At startup it sends an editlog request to the JournalNode over HTTP, and that HTTP request fails authentication.
This one was a big pitfall; it took a lot of time and many attempts before I found the answer. Let's first look at a piece of JDK code:
private void init(final String hostname, String scheme) throws GSSException {
    // here skip some unimportant code
    ...
    GSSManagerImpl manager = new GSSManagerImpl(
            GSSUtil.CALLER_HTTP_NEGOTIATE);
    String peerName = "HTTP/" + hostname;
    GSSName serverName = manager.createName(peerName, null);
    context = manager.createContext(serverName, oid, null,
            GSSContext.DEFAULT_LIFETIME);
    context.requestCredDeleg(true);
    oneToken = context.initSecContext(new byte[0], 0, 0);
}
The NegotiatorImpl class is the JDK's SPNEGO negotiation implementation. The key line is 'String peerName = "HTTP/" + hostname;': it hardcodes the peer's principal as HTTP/FQDN@REALM, so the token generated during negotiation is built for that principal.
On the server side (the JournalNode), however, the configured principal is HTTP/hadoop@REALM, so the negotiation inevitably fails.
Having found the cause, here is the solution.
We found that in sun.net.www.protocol.http.Negotiator, the NegotiatorImpl instance is created via reflection:
abstract class Negotiator {
    static Negotiator getSupported(String hostname, String scheme)
            throws Exception {
        // These lines are equivalent to
        //     return new NegotiatorImpl(hostname, scheme);
        // The current implementation will make sure NegotiatorImpl is not
        // directly referenced when compiling, thus smooth the way of building
        // the J2SE platform where HttpURLConnection is a bootstrap class.
        Class clazz = Class.forName("sun.net.www.protocol.http.NegotiatorImpl");
        java.lang.reflect.Constructor c = clazz.getConstructor(
                String.class, String.class);
        return (Negotiator) (c.newInstance(hostname, scheme));
    }

    abstract byte[] firstToken() throws Exception;

    abstract byte[] nextToken(byte[] in) throws Exception;
}
Since it is loaded via reflection, we can adjust the classpath so that our modified NegotiatorImpl is constructed instead. The specific steps are as follows:
Modify NegotiatorImpl.java:
String kerberosInstanceName = System.getProperty("kerberos.instance");
String peerName = null;
if (kerberosInstanceName == null) {
    peerName = "HTTP/" + hostname;
} else {
    peerName = "HTTP/" + kerberosInstanceName;
}
Set the boot classpath: -Xbootclasspath/p:$path_to_modified_negotiator_jar
Pass the parameter at startup: -Dkerberos.instance=hadoop (see the hadoop-env.sh sketch below)
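In a setup like ours, these two JVM options would typically be appended to the daemon start-up options, for example via hadoop-env.sh. The sketch below uses a placeholder path for the jar containing the patched class:

# hadoop-env.sh (sketch; the jar path is a placeholder)
export HADOOP_OPTS="$HADOOP_OPTS -Xbootclasspath/p:/opt/hadoop/lib/negotiator-patch.jar -Dkerberos.instance=hadoop"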
Q3
DataNode needs to be started as root
The official Hadoop and Cloudera documents recommend that a secure DataNode listen on privileged ports (<1024) and be started with jsvc.
Here we encountered two problems:
+ On Linux, a program that binds privileged ports must be started as root.
+ We built a set of release scripts to deploy Hadoop; they deploy and start everything under a normal account and do not combine well with jsvc.
These two problems prevent the DataNode from being deployed and started the same way as the other processes.
Strictly speaking we have not solved this problem; instead we use a small backdoor the DataNode provides: setting 'ignore.secure.ports.for.testing=true' means it does not have to listen on privileged ports, so root and jsvc are no longer required to start it. So far no other side effects have been found.
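For reference, the relevant hdfs-site.xml settings might look like the sketch below; the non-privileged port values are just common defaults and may differ in your environment:

<property>
  <name>ignore.secure.ports.for.testing</name>
  <value>true</value>
</property>
<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:50010</value>
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:50075</value>
</property>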
Q4
Remote clients cannot access HDFS
After solving the preceding pitfalls, HDFS with Kerberos authentication runs normally.
All the verification so far was done with the Hadoop shell on the HDFS machines themselves. Next we verified a remote client on an external machine, but the client complained that no valid credentials could be found, even though kinit had been run and klist showed a perfectly normal ticket cache.
What was going on? It turned out to be a small pitfall caused by JCE.
With AES-256 encryption, the JCE unlimited-strength policy files must be installed. On the HDFS machines we had installed them before deployment, so everything worked there; on the machine hosting the remote client, this did not occur to me at first. After installing JCE, everything was fine.
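Installing JCE is just a matter of dropping the unlimited-strength policy jars into the JRE. A rough sketch (the exact download and JAVA_HOME location depend on your JDK version):

# after downloading the JCE Unlimited Strength Jurisdiction Policy Files
# that match your JDK version
cp local_policy.jar US_export_policy.jar $JAVA_HOME/jre/lib/security/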
Q5
YARN requires LinuxContainerExecutor, and LinuxContainerExecutor requires that users who submit MR jobs be created in advance. The official Hadoop and Cloudera documents recommend LinuxContainerExecutor for a secure YARN, but it requires that the account submitting a job be pre-created on every machine running a NodeManager, which is a very troublesome task for O&M.
Personally I also find this requirement a bit unreasonable: to use the service, users must have accounts created on the physical machines hosting it, which makes little sense. So we use DefaultContainerExecutor instead (see the configuration sketch below), and no big pitfalls have been found so far.
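Switching to DefaultContainerExecutor is a single yarn-site.xml property, sketched below; note that this gives up the per-user container isolation that LinuxContainerExecutor provides:

<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor</value>
</property>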
Q6
Only the NameNode's principal can perform administrator operations on HDFS. The user that deploys the NameNode is the administrator of the entire cluster and has super permissions; after security authentication is enabled, only the principal used to deploy the NameNode has superuser permission.
The main trouble is that to manage the HDFS cluster remotely through the shell, the NameNode principal's keytab file has to be copied around, which increases the security risk to some extent.
For this reason we added a feature to HDFS: a superuser can be specified through the configuration file, and that superuser authenticates against Kerberos with a password. Changing the password at intervals keeps this basically safe.
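We have not published this patch, but conceptually the configuration looks something like the sketch below; the property name is purely hypothetical and does not exist in stock HDFS:

<property>
  <!-- hypothetical property added by our internal patch -->
  <name>dfs.cluster.super.user</name>
  <value>hdfs_admin</value>
</property>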
Q7
The client obtains the principal's credentials from the ticket cache. With Kerberos authentication enabled, we usually operate HDFS as follows:
kinit principal_name   # enter the password as prompted
./bin/hdfs dfs -ls /
kinit initializes the principal; afterwards you can inspect the ticket cache with klist. A Hadoop client with security enabled reads its credentials from that ticket cache.
There is a problem, though: tickets expire. How do we make sure long-running tasks keep working? Our solution is a cron job that regularly renews the ticket by running kinit -R.
However, in practice kinit -R reported the error 'kinit: Ticket expired while renewing credentials'. The cause of this pitfall is that the Kerberos server had no renewable lifetime configured; once it is configured, renewal works. For previously created principals, you must additionally modify them with modprinc.
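A possible sequence is sketched below; the realm, lifetimes, principal names, and cron schedule are examples, not prescriptions:

# kdc.conf: allow renewable tickets, e.g. max_renewable_life = 7d, then restart the KDC

# fix already-existing principals (the krbtgt/EXAMPLE.COM principal may need the same change)
kadmin.local -q "modprinc -maxrenewlife 7days hdfs/hadoop@EXAMPLE.COM"

# cron entry that renews the ticket cache every 6 hours
0 */6 * * * kinit -R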
Q8
HDFS user-group check fails after ACLs are enabled
This only produces an exception; the program still runs normally, but if it bothers you it can be fixed. The default recommended configuration for HDFS is 'hadoop.security.group.mapping=org.apache.hadoop.security.ShellBasedUnixGroupsMapping', which requires HDFS users to belong to an existing group on the physical machines where HDFS runs. This is inconvenient for O&M, for the same reason as the LinuxContainerExecutor issue in YARN.
You can implement a simple groups-mapping class and point the configuration at it to fix this, for example as sketched below.
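A minimal sketch of such a class, assuming it is acceptable to report every user as belonging to one static group (the package and group name are made up for illustration):

package com.example.security; // hypothetical package

import java.io.IOException;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.security.GroupMappingServiceProvider;

// Maps every user to a single static group so that no OS-level group is required.
public class StaticGroupsMapping implements GroupMappingServiceProvider {

    @Override
    public List<String> getGroups(String user) throws IOException {
        // report every user as a member of the "hadoop" group
        return Collections.singletonList("hadoop");
    }

    @Override
    public void cacheGroupsRefresh() throws IOException {
        // nothing to refresh for a static mapping
    }

    @Override
    public void cacheGroupsAdd(List<String> groups) throws IOException {
        // nothing to cache for a static mapping
    }
}

Then point core-site.xml at it with hadoop.security.group.mapping=com.example.security.StaticGroupsMapping.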
HBase
Q1
HBase's implementation also requires each service's principal to contain the FQDN
HBase's security implementation is basically the same as Hadoop's: the principal is also expected to contain the FQDN, but the code performs no additional check, so you can simply set the principal to 'hbase/hadoop@REALM' in the configuration file and it runs normally (see the example below).
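For example, in hbase-site.xml (a sketch; EXAMPLE.COM stands for your realm):

<property>
  <name>hbase.master.kerberos.principal</name>
  <value>hbase/hadoop@EXAMPLE.COM</value>
</property>
<property>
  <name>hbase.regionserver.kerberos.principal</name>
  <value>hbase/hadoop@EXAMPLE.COM</value>
</property>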
Q2
SecureRpcEngine cannot be found
We enabled Kerberos authentication for HBase and configured 'hbase.rpc.engine=org.apache.hadoop.hbase.ipc.SecureRpcEngine', but after a normal build and deployment, the SecureRpcEngine class could not be found at startup.
It turns out that a security-enabled HBase must be built with the -Psecurity profile during Maven compilation, which is a little different from Hadoop.
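The build command then becomes something like the following (a sketch; exact flags can vary by HBase version):

mvn clean package -DskipTests -Psecurity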
Q3
Administrator Problems
By default, HBase provides an 'hbase.superuser' configuration item, so you can specify a superuser without any additional code modifications.
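For example, in hbase-site.xml (the user names here are placeholders):

<property>
  <name>hbase.superuser</name>
  <value>hbase,hbase_admin</value>
</property>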
ZooKeeper
Q1
ZooKeeper's implementation also requires each service's principal to contain the FQDN
In ZooKeeper's implementation, the server reads its principal from jaas.conf on the server side, while the client hardcodes it as zookeeper/serverHost. The following is the implementation in the code (org.apache.zookeeper.ClientCnxn):
private void startConnect() throws IOException {
    state = States.CONNECTING;

    InetSocketAddress addr;
    if (rwServerAddress != null) {
        addr = rwServerAddress;
        rwServerAddress = null;
    } else {
        addr = hostProvider.next(1000);
    }

    LOG.info("Opening socket connection to server " + addr);
    setName(getName().replaceAll("\\(.*\\)",
            "(" + addr.getHostName() + ":" + addr.getPort() + ")"));
    try {
        zooKeeperSaslClient = new ZooKeeperSaslClient("zookeeper/" + addr.getHostName());
    } catch (LoginException e) {
        LOG.warn("SASL authentication failed: " + e
                + " Will continue connection to Zookeeper server without "
                + "SASL authentication, if Zookeeper server allows it.");
        eventThread.queueEvent(new WatchedEvent(
                Watcher.Event.EventType.None,
                Watcher.Event.KeeperState.AuthFailed, null));
    }
    clientCnxnSocket.connect(addr);
}
Here we need to modify the line 'zooKeeperSaslClient = new ZooKeeperSaslClient("zookeeper/" + addr.getHostName());'. The simplest change is 'zooKeeperSaslClient = new ZooKeeperSaslClient("zookeeper/hadoop");'. A better way is to make the instance name configurable; this is straightforward, and a possible sketch follows.
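A minimal sketch of the configurable variant, assuming a system property named zookeeper.server.instance (the property name is our own choice, not a stock ZooKeeper option), with the client started using -Dzookeeper.server.instance=hadoop:

// inside ClientCnxn.startConnect(), replacing the hardcoded "zookeeper/" + addr.getHostName()
String instance = System.getProperty("zookeeper.server.instance"); // hypothetical property
String serverPrincipal = "zookeeper/"
        + (instance != null ? instance : addr.getHostName());
zooKeeperSaslClient = new ZooKeeperSaslClient(serverPrincipal);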