YARN & HDFS2: Installing and Configuring Kerberos


Today, while configuring Kerberos on our Hadoop 2.x development cluster, I ran into a few problems and have recorded them here.

Set up Hadoop security

core-site.xml

        <property>
                <name>hadoop.security.authentication</name>
                <value>kerberos</value>
        </property>
        <property>
                <name>hadoop.security.authorization</name>
                <value>true</value>
        </property>

hadoop.security.authentication defaults to simple, i.e. authentication based on the Linux operating system: the client runs the whoami command and sends the result to the server over RPC, so a malicious user on another host can easily forge an identical user. Here we change it to kerberos.
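To make that risk concrete, here is a small illustration (user names and paths are examples only): with simple authentication the server trusts whatever identity the client reports, and the HADOOP_USER_NAME environment variable is enough to claim another user's identity.

# under hadoop.security.authentication=simple the server trusts the client-reported user
whoami                                               # e.g. prints "alice"
HADOOP_USER_NAME=hadoop hadoop fs -ls /user/hadoop   # acts as "hadoop" without any proof
# with kerberos enabled the identity comes from the TGT, so this only works if the
# caller actually holds hadoop's Kerberos credentials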

Set HDFS security

hdfs-site.xml

        <property>
                <name>dfs.block.access.token.enable</name>
                <value>true</value>
        </property>
        <property>
                <name>dfs.https.enable</name>
                <value>false</value>
        </property>
        <property>
                <name>dfs.namenode.https-address</name>
                <value>dev80.hadoop:50470</value>
        </property>
        <property>
                <name>dfs.https.port</name>
                <value>50470</value>
        </property>
        <property>
                <name>dfs.namenode.keytab.file</name>
                <value>/etc/hadoop.keytab</value>
        </property>
        <property>
                <name>dfs.namenode.kerberos.principal</name>
                <value>hadoop/_HOST@DIANPING.COM</value>
        </property>
        <property>
                <name>dfs.namenode.kerberos.https.principal</name>
                <value>host/_HOST@DIANPING.COM</value>
        </property>
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>dev80.hadoop:50090</value>
        </property>
        <property>
                <name>dfs.namenode.secondary.https-port</name>
                <value>50470</value>
        </property>
        <property>
                <name>dfs.namenode.secondary.keytab.file</name>
                <value>/etc/hadoop.keytab</value>
        </property>
        <property>
                <name>dfs.namenode.secondary.kerberos.principal</name>
                <value>hadoop/_HOST@DIANPING.COM</value>
        </property>
        <property>
                <name>dfs.namenode.secondary.kerberos.https.principal</name>
                <value>host/_HOST@DIANPING.COM</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir.perm</name>
                <value>700</value>
        </property>
        <property>
                <name>dfs.datanode.address</name>
                <value>0.0.0.0:1003</value>
        </property>
        <property>
                <name>dfs.datanode.http.address</name>
                <value>0.0.0.0:1007</value>
        </property>
        <property>
                <name>dfs.datanode.https.address</name>
                <value>0.0.0.0:1005</value>
        </property>
        <property>
                <name>dfs.datanode.keytab.file</name>
                <value>/etc/hadoop.keytab</value>
        </property>
        <property>
                <name>dfs.datanode.kerberos.principal</name>
                <value>hadoop/_HOST@DIANPING.COM</value>
        </property>
        <property>
                <name>dfs.datanode.kerberos.https.principal</name>
                <value>host/_HOST@DIANPING.COM</value>
        </property>
        <property>
                <name>dfs.web.authentication.kerberos.principal</name>
                <value>HTTP/_HOST@DIANPING.COM</value>
        </property>
        <property>
                <name>dfs.web.authentication.kerberos.keytab</name>
                <value>/etc/hadoop.keytab</value>
                <description>The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint.</description>
        </property>
There are several points to note in this configuration (quick shell checks follow the list):
1. dfs.datanode.address is the hostname/IP and port that the DataNode's data transceiver RPC server binds to. If security is enabled, the port number must be below 1024 (a privileged port); otherwise the DataNode fails to start with the error "Cannot start secure cluster without privileged resources".
2. The instance part of a principal can use the '_HOST' tag, which is automatically replaced with the host's fully qualified domain name.
3. With security enabled, Hadoop also checks permissions on the HDFS block data (the directories specified by dfs.data.dir), because user code could read block files directly from the local disk instead of going through the HDFS API, bypassing Kerberos and file-permission validation. Administrators can set the DataNode data directory permissions with dfs.datanode.data.dir.perm; we set it to 700.
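A few quick shell checks for the points above (the block directory path is only an example; use whatever dfs.datanode.data.dir points to on your nodes):

# 1) the secure DataNode ports configured above are privileged (< 1024)
netstat -lntp | grep -E ':(1003|1005|1007) '
# 2) _HOST is replaced with the fully qualified domain name reported by the host
hostname -f
# 3) block directories should end up mode 700, owned by the DataNode user
ls -ld /data/dfs/dn        # example path for dfs.datanode.data.dir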
The NameNode and Secondary NameNode are both started as the hadoop user, but the DataNode must be started as root through jsvc. Hadoop 2.x itself ships only a 32-bit jsvc, so you need to download the jsvc source from the Apache Commons Daemon site and compile it yourself:
wget http://mirror.esocc.com/apache//commons/daemon/binaries/commons-daemon-1.0.15-bin.tar.gz
cd src/native/unix
./configure
make
This produces a 64-bit jsvc executable. Copy it to $HADOOP_HOME/libexec, and point JSVC_HOME at that path in hadoop-env.sh; otherwise you will get the error "It looks like you're trying to start a secure DN, but $JSVC_HOME isn't set. Falling back to starting insecure DN."
[hadoop@dev80 unix]$ file jsvc
jsvc: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped
Run mvn package to build commons-daemon-1.0.15.jar, copy it to $HADOOP_HOME/share/hadoop/hdfs/lib, and delete the commons-daemon jar that ships with Hadoop.
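A minimal sketch of that jar swap (the bundled jar's exact version differs per Hadoop release, so list it first rather than guessing):

# find and remove the commons-daemon jar bundled with Hadoop
ls $HADOOP_HOME/share/hadoop/hdfs/lib/commons-daemon-*.jar
rm $HADOOP_HOME/share/hadoop/hdfs/lib/commons-daemon-<bundled-version>.jar
# then drop in the jar built by mvn package
cp commons-daemon-1.0.15.jar $HADOOP_HOME/share/hadoop/hdfs/lib/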
Modify hadoop-env.sh:
# The jsvc implementation to use. Jsvc is required to run secure datanodes.
export JSVC_HOME=/usr/local/hadoop/hadoop-2.1.0-beta/libexec

# On secure datanodes, user to run the datanode as after dropping privileges
export HADOOP_SECURE_DN_USER=hadoop

# The directory where pid files are stored. /tmp by default
export HADOOP_SECURE_DN_PID_DIR=/usr/local/hadoop

# Where log files are stored in the secure data environment.
export HADOOP_SECURE_DN_LOG_DIR=/data/logs

Distribute the configuration and jars to the entire cluster. Start the NameNode as the hadoop user, then switch to root and start the DataNodes; the NameNode web UI should now show "Security is on". The command that launches the secure DataNode is:
  exec "$JSVC" \
           -dproc_$command-outfile "$JSVC _outfile" \
           -errfile "$JSVC _errfile" \
           -pidfile "$HADOOP _ Secure_dn_pid "\
           -nodetach \
           -user" $HADOOP _secure_dn_user "\
            -cp" $CLASSPATH "\
           $JAVA _heap_max $ hadoop_opts \
           org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter "$@"
If anything goes wrong during startup, check $JSVC_OUTFILE (default: $HADOOP_LOG_DIR/jsvc.out) and $JSVC_ERRFILE (default: $HADOOP_LOG_DIR/jsvc.err) to troubleshoot the error.
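For reference, a sketch of the start-and-troubleshoot sequence on a stock Hadoop 2.x layout (standard sbin scripts; adjust paths to your installation):

# as the hadoop user
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
# as root, so jsvc can bind the privileged DataNode ports before dropping privileges
$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode
# if the DataNode exits immediately, check the jsvc output/error files
tail -n 50 $HADOOP_LOG_DIR/jsvc.out $HADOOP_LOG_DIR/jsvc.err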
Set YARN security

yarn-site.xml
        <property>
                <name>yarn.resourcemanager.keytab</name>
                <value>/etc/hadoop.keytab</value>
        </property>
        <property>
                <name>yarn.resourcemanager.principal</name>
                <value>hadoop/_HOST@DIANPING.COM</value>
        </property>
        <property>
                <name>yarn.nodemanager.keytab</name>
                <value>/etc/hadoop.keytab</value>
        </property>
        <property>
                <name>yarn.nodemanager.principal</name>
                <value>hadoop/_HOST@DIANPING.COM</value>
        </property>
        <property>
                <name>yarn.nodemanager.container-executor.class</name>
                <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
        </property>
        <property>
                <name>yarn.nodemanager.linux-container-executor.group</name>
                <value>hadoop</value>
        </property>
The default container-executor is DefaultContainerExecutor, which launches containers as the NodeManager user. Switching to LinuxContainerExecutor launches containers as the user who submitted the application, and it uses a setuid executable to start and destroy containers. That executable is bin/container-executor, but Hadoop ships a 32-bit version by default, so you need to download the Hadoop 2.x source and recompile it:

mvn package -Pdist,native -DskipTests -Dtar -Dcontainer-executor.conf.dir=/etc

Note: container-executor.conf.dir must be specified explicitly. It tells the setuid executable where to find the configuration file it depends on (container-executor.cfg); by default that is $HADOOP_HOME/etc/hadoop, but the executable requires the configuration file's parent directory and every directory above it to be owned by root, otherwise you get the error below. For convenience we set it to /etc.
Caused by: org.apache.hadoop.util.Shell$ExitCodeException: File /usr/local/hadoop/hadoop-2.1.0-beta/etc/hadoop must be owned by root, but is owned by
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:458)
        at org.apache.hadoop.util.Shell.run(Shell.java:373)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:578)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:147)

Check the default configuration search path:
[root@dev80 bin]# strings container-executor | grep etc
../etc/hadoop/container-executor.cfg
This shows that by default it loads $HADOOP_HOME/etc/hadoop/container-executor.cfg.
After setting container-executor.conf.dir=/etc and recompiling:
[hadoop@dev80 bin]$ strings container-executor | grep etc
/etc/container-executor.cfg

Set the following in container-executor.cfg:
yarn.nodemanager.linux-container-executor.group=hadoop
min.user.id=499
min.user.id is the minimum UID allowed to launch containers; a task submitted by a user with a UID below this value fails. On CentOS/RHEL, regular user accounts typically start at UID 500.
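A quick way to check whether an account clears the min.user.id threshold (user names are examples):

# UIDs below min.user.id (499 here) are rejected by container-executor
id -u hadoop        # 500 on this cluster, so containers may run as hadoop
id -u nobody        # typically 99 on CentOS/RHEL, so it would be refused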
Copy container-executor to $HADOOP_HOME/bin, then set ownership and permissions:
chown root:hadoop container-executor /etc/container-executor.cfg
chmod 4750 container-executor
chmod 400 /etc/container-executor.cfg
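To sanity-check the result, the ownership and mode bits should look roughly like this (a sketch; listing output abbreviated):

ls -l $HADOOP_HOME/bin/container-executor   # expect: -rwsr-x--- root hadoop
ls -l /etc/container-executor.cfg           # expect: -r-------- root hadoop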
Synchronize the configuration files to the entire cluster and start the ResourceManager and NodeManagers as the hadoop user. Note: to make testing easier, do not distribute the configuration to the whole cluster at first; bring up the RM and a single NM, make sure everything is okay, and then sync to all hosts.
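A sketch of the YARN daemon start commands for that single-node trial (standard Hadoop 2.x sbin scripts):

# as the hadoop user, on the RM host and on the one test NM host
$HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager
$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager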
Set JobHistory Server security

mapred-site.xml
        <property>
                <name>mapreduce.jobhistory.keytab</name>
                <value>/etc/hadoop.keytab</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.principal</name>
                <value>hadoop/_HOST@DIANPING.COM</value>
        </property>
Start the JobHistory Server: sbin/mr-jobhistory-daemon.sh start historyserver
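To confirm it came up (a quick check; 19888 is the default port of mapreduce.jobhistory.webapp.address, adjust if you override it):

jps | grep JobHistoryServer
netstat -lnt | grep 19888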

Run the kinit command to obtain a TGT (Ticket Granting Ticket):
[hadoop@dev80 hadoop]$ kinit -r 24l -k -t /home/hadoop/.keytab hadoop
[hadoop@dev80 hadoop]$ klist
Ticket cache: FILE:/tmp/krb5cc_500
Default principal: hadoop@DIANPING.COM

Valid starting     Expires            Service principal
09/11/13 15:25:34  09/12/13 15:25:34  krbtgt/DIANPING.COM@DIANPING.COM
        renew until 09/12/13 15:25:34
/tmp/krb5cc_500 is the Kerberos ticket cache. By default it is created in /tmp and named "krb5cc_" plus the UID; here 500 is the UID of the hadoop account.
[hadoop@dev80 hadoop]$ getent passwd
hadoop:x:500:500::/home/hadoop:/bin/bash
A user can also choose the ticket cache path by setting the environment variable, e.g. export KRB5CCNAME=/tmp/krb5cc_500.

You can use the kdestroy command to destroy the ticket cache when you are done with it:
[hadoop@dev80 hadoop]$ kdestroy
[hadoop@dev80 hadoop]$ klist
klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_500)

If there is no ticket cache on the local machine, you will get the following error:
13/09/11 16:21:35 ERROR security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
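The usual remedy is simply to obtain a ticket again, either interactively or from the keytab as shown earlier, and retry (a sketch):

kinit -k -t /home/hadoop/.keytab hadoop
hadoop fs -ls /        # should succeed once the cache holds a valid TGT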

The principals stored in /etc/hadoop.keytab are service principals:
[hadoop@dev80 hadoop]$ klist -k -t /etc/hadoop.keytab
Keytab name: WRFILE:/etc/hadoop.keytab
KVNO Timestamp         Principal
---- ----------------- --------------------------------------------------------
   1 06/17/12 22:01:24 hadoop/dev80.hadoop@DIANPING.COM
   1 06/17/12 22:01:24 hadoop/dev80.hadoop@DIANPING.COM
   1 06/17/12 22:01:24 hadoop/dev80.hadoop@DIANPING.COM
   1 06/17/12 22:01:24 hadoop/dev80.hadoop@DIANPING.COM
   1 06/17/12 22:01:24 hadoop/dev80.hadoop@DIANPING.COM
   1 06/17/12 22:01:24 hadoop/dev80.hadoop@DIANPING.COM
   1 06/17/12 22:01:24 host/dev80.hadoop@DIANPING.COM
   1 06/17/12 22:01:24 host/dev80.hadoop@DIANPING.COM
   1 06/17/12 22:01:24 host/dev80.hadoop@DIANPING.COM
   1 06/17/12 22:01:24 host/dev80.hadoop@DIANPING.COM
   1 06/17/12 22:01:24 host/dev80.hadoop@DIANPING.COM
   1 06/17/12 22:01:24 host/dev80.hadoop@DIANPING.COM
   1 06/17/12 22:01:24 HTTP/dev80.hadoop@DIANPING.COM
   1 06/17/12 22:01:24 HTTP/dev80.hadoop@DIANPING.COM
   1 06/17/12 22:01:24 HTTP/dev80.hadoop@DIANPING.COM
   1 06/17/12 22:01:24 HTTP/dev80.hadoop@DIANPING.COM
   1 06/17/12 22:01:24 HTTP/dev80.hadoop@DIANPING.COM
   1 06/17/12 22:01:24 HTTP/dev80.hadoop@DIANPING.COM

/home/hadoop/.keytab holds the user principal:
[hadoop@dev80 hadoop]$ klist -k -t /home/hadoop/.keytab
Keytab name: WRFILE:/home/hadoop/.keytab
KVNO Timestamp         Principal
---- ----------------- --------------------------------------------------------
   1 04/11/12 13:56:29 hadoop@DIANPING.COM
Because a keytab is effectively a permanent credential and requires no password (if the principal's password is changed in the KDC, the keytab becomes invalid), any other user with read access to the file can access Hadoop as the user the keytab specifies. Keytab files must therefore be readable only by their owner (mode 0400).
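A small sketch of locking a keytab down and using it non-interactively (paths as used above):

chmod 400 /home/hadoop/.keytab
ls -l /home/hadoop/.keytab          # expect: -r-------- hadoop hadoop
# log in from the keytab without typing a password, e.g. from a cron job
kinit -k -t /home/hadoop/.keytab hadoop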
Original article: http://blog.csdn.net/lalaguozhe/article/details/11570009. Please credit the source when reprinting.
