Hadoop 1.0 Security Authentication (Kerberos): Installation and Summary

As the saying goes, everything is hard at the beginning. In software development, environment deployment is the first hurdle. I successfully set up hadoop-2.3.0-cdh5.0.2.tar.gz in fully distributed MRv1 mode with Kerberos security authentication integrated, and the installation process gave me a much deeper understanding of Hadoop. While the details are still fresh in my mind, I am recording the process here, partly for my own future reference and partly in the hope that it helps anyone who runs into the same problems.

First, why install from the tarball at all? Cloudera provides a manager-based installation, apt-get works for the Debian family, and yum for the Red Hat family, but these methods take care of many details for us, so when problems appear later, debugging is inconvenient. Besides, tar.gz is my first choice for installing software anyway.

A word about my environment: four CentOS 6.5 servers, one master running the namenode and jobtracker, and three others running a datanode and tasktracker each. The Kerberos server also runs on the master machine. The JDK version is 1.7.0_60. For the environment requirements of version 5.0.2, see http://www.cloudera.com/content/support/en/downloads/cdh/cdh-5-0-2.html#systemrequirements.
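For concreteness, here is a hypothetical /etc/hosts matching this layout; the IPs and hostnames are placeholders (the later examples reuse master and slave1..slave3):

192.168.1.10  master   # namenode, jobtracker, Kerberos KDC
192.168.1.11  slave1   # datanode, tasktracker
192.168.1.12  slave2   # datanode, tasktracker
192.168.1.13  slave3   # datanode, tasktracker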

This article is mainly about the pitfalls I encountered while following the tutorials on the Cloudera official website, so it is best read side by side with those tutorials.

CDH 5.0.2 no longer uses a single hadoop user; it uses the mapred and hdfs users instead. You must generate an ssh public/private key pair for each of them and configure passwordless login (of course, you can also generate one pair and copy it for the other user).
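A minimal sketch for the mapred user, assuming the hostnames from the layout above; repeat the same steps for the hdfs user:

su - mapred
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa    # empty passphrase for passwordless login
ssh-copy-id mapred@slave1                   # repeat for slave2 and slave3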

1. Kerberos Installation

First, read up on the principles of Kerberos and what has to be installed (Wikipedia is a good starting point), and look at the specific commands such as kinit and kadmin. I will write about Kerberos separately later.
You can follow this article to perform the operations: Kerberos deploy guide.

Next we need to generate a principal and a keytab for the mapred and hdfs users of each node in the cluster. You must therefore be familiar with the Kerberos commands, and it is worth turning these steps into a script.
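For example, a sketch of such a script using kadmin.local on the KDC; the realm EXAMPLE.COM and the hostnames are placeholders, and the Cloudera guide additionally merges the HTTP principal into the hdfs keytab:

REALM=EXAMPLE.COM
for h in master slave1 slave2 slave3; do
  for u in hdfs mapred; do
    kadmin.local -q "addprinc -randkey $u/$h@$REALM"
    kadmin.local -q "xst -k ${u}-${h}.keytab $u/$h@$REALM"
  done
  kadmin.local -q "addprinc -randkey HTTP/$h@$REALM"    # needed for the web UIs
  kadmin.local -q "xst -k HTTP-${h}.keytab HTTP/$h@$REALM"
done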

2. Install CDH 5.0.2

2.1 Download the tar package

First, download the hadoop-2.3.0-cdh5.0.2 tarball from Cloudera's CDH download page (linked above).

2.2 Change YARN Mode to MRv1 Mode

The 5.0.2 tarball defaults to YARN mode. Since we are setting up MRv1 (classic MapReduce) mode, some modifications to the files extracted from the tar package are needed.

Extract hadoop-2.3.0-cdh5.0.2.tar.gz. Suppose you extract it to the /opt directory and rename hadoop-2.3.0-cdh5.0.2 directly to hadoop (your CDH root directory is now /opt/hadoop). Then make the following changes:

  1. Copy the files under bin-mapreduce1 into bin, directly overwriting any files with the same name.
  2. In the /opt/hadoop/share/hadoop folder, delete the soft link mapreduce and create a soft link with the same name pointing to mapreduce1:

cd /opt/hadoop/share/hadoop
rm -rf mapreduce
ln -s mapreduce1 mapreduce

After the above two steps, MRv1 mode is enabled by default. First build a fully distributed cluster without Kerberos security authentication (refer to any Hadoop distributed deployment guide); once that cluster is working, perform the following operations.
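As a starting point, a minimal sketch of the two essential MRv1 settings; the hostname master and the ports are the usual CDH defaults, and $HADOOP_CONF_DIR stands for whichever configuration directory your installation uses:

cat > $HADOOP_CONF_DIR/core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property><name>fs.default.name</name><value>hdfs://master:8020</value></property>
</configuration>
EOF

cat > $HADOOP_CONF_DIR/mapred-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property><name>mapred.job.tracker</name><value>master:8021</value></property>
</configuration>
EOF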

2.3 Configure HDFS

Next, you can follow the tutorials on the official website. The following describes the pitfalls I encountered:

  • In STEP 2, following the MRv1 cluster deployment guide, you must create /tmp in HDFS and also create the directory specified by mapred.system.dir (see the sketch at the end of this section).
  • In STEP 7, the final dfs.http.policy property does not need to be configured; otherwise an exception will be reported later when the namenode starts, complaining that the keystore file cannot be found.
  • Steps 8, 9, and 10 are optional.
  • In STEP 11, configure JSVC_HOME: extract the bigtop-jsvc-1.0.10-cdh5.0.2.tar.gz file we downloaded at the start and place it in the specified position.
  • When starting the namenode and datanode in steps 12 and 13, run the sbin commands:
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemons.sh start datanode

Both commands are executed by the root user.
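For the secure datanode to start as root and then drop privileges, hadoop-env.sh also needs entries like the following sketch; the paths are assumptions, and JSVC_HOME must point at wherever you placed the jsvc binary from STEP 11:

export HADOOP_SECURE_DN_USER=hdfs
export JSVC_HOME=/opt/hadoop/libexec/bigtop-jsvc    # assumed extraction path for the jsvc tarball
export HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop-hdfs
export HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop-hdfs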

In this process, if any logs folder cannot be written, change its permission to 777.
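The directories from the first bullet above can then be created as the hdfs superuser, for example (a sketch; the keytab name, realm, and mapred.system.dir value are placeholders that must match your configuration):

kinit -kt hdfs-master.keytab hdfs/master@EXAMPLE.COM
hadoop fs -mkdir /tmp
hadoop fs -chmod -R 1777 /tmp
hadoop fs -mkdir -p /tmp/mapred/system      # the directory set as mapred.system.dir
hadoop fs -chown mapred /tmp/mapred/system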

2.4 Configure MapReduce

Configure MapReduce by following the official tutorial. The following are the pitfalls I encountered:

  • The first problem is the taskcontroller.cfg file: hadoop looks for this file at ../../conf/ relative to the task-controller binary, so we need to create a conf directory under /opt/hadoop (that is, the root directory after extraction) and then fill it in following the official configuration. The official tutorial contains
banned.users=mapred,hdfs,bin

With this configuration, running wordcount reports an exception, so you can simply set the value to bin here.
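For reference, a sketch of a complete taskcontroller.cfg; the directory values are assumptions and must match what you configured in mapred-site.xml:

cat > /opt/hadoop/conf/taskcontroller.cfg <<'EOF'
hadoop.log.dir=/opt/hadoop/logs
mapred.local.dir=/tmp/mapred/local
mapreduce.tasktracker.group=mapred
banned.users=bin
min.user.id=500
EOF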

  • Then there is a problem with the permissions of the task-controller file. You must modify them with the following commands:
chown root:mapred task-controller
chmod 4754 task-controller

The explanation is on the official website: mode 4754 sets the setuid bit so the task-controller runs as root, while only the mapred group may execute it.

  • After all this is configured, starting the jobtracker and tasktracker still fails. The error is
2014-07-15 18:15:25,722 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.io.IOException: Secure IO is necessary to run a secure task tracker.
        at org.apache.hadoop.mapred.TaskTracker.checkSecurityRequirements(TaskTracker.java:943)
        at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:976)
        at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1780)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:4124)

This error is caused by missing native libraries, which should live in /opt/hadoop/lib/native/. Unfortunately, we have to compile them ourselves, because every version is different and we cannot simply copy Apache Hadoop's native libraries. (I took a shortcut and copied them from a colleague, but here is how to compile them from source.)
The CDH source code is stored in the src folder. After Maven is installed, you can compile directly in that folder. I compiled on CentOS 6.5; the main problems encountered were:

  1. The Maven repository often cannot be reached and operations have to be repeated; it is best to install the dependencies into a local repository first.
  2. On CentOS, install the build dependencies first, otherwise various errors will be returned: yum install -y glibc-headers gcc-c++ zlib-devel openssl-devel
  3. If some tests fail, skip them and compile with: mvn package -Pdist,native -DskipTests

If you still run into missing dependencies during compilation, a quick search will turn up the fix.
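After a successful build, copy the native libraries into place; the target path below is where a dist build usually drops them, so verify it on your machine first:

cp src/hadoop-dist/target/hadoop-2.3.0-cdh5.0.2/lib/native/* /opt/hadoop/lib/native/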

  • Starting the jobtracker and tasktracker as root produces an error message telling us not to start these two processes directly as root. Just configure the following in hadoop-env.sh:
export HADOOP_JOBTRACKER_USER=mapred
export HADOOP_TASKTRACKER_USER=mapred

The commands to start them are

sbin/hadoop-daemon.sh start jobtracker
sbin/hadoop-daemons.sh start tasktracker

These are also executed as the root user; the variables above make the daemons switch to the mapred user.
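Once everything is up, a quick wordcount smoke test, run as the mapred user; the keytab, realm, and the examples jar path are assumptions (check share/hadoop/mapreduce1 for the exact jar name), and /user/mapred must already exist in HDFS:

kinit -kt mapred-master.keytab mapred/master@EXAMPLE.COM
hadoop fs -put /etc/hosts in
hadoop jar /opt/hadoop/share/hadoop/mapreduce1/hadoop-examples-2.3.0-mr1-cdh5.0.2.jar wordcount in out
hadoop fs -cat out/part-00000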

3. Summary

The whole setup took about a week. The main trouble was permissions of every kind, which came down to my lack of knowledge about Hadoop's basic components. Hadoop's parts are cleanly separated: each folder under share/hadoop corresponds to one function. At first I tried to lump them together, which led to duplicated configuration files and conflicts between modules, so in the end the processes would not start. Going forward I need to strengthen my grasp of the basic concepts. The second lesson is to check the log files when something goes wrong; many errors can be fixed directly from the error message.

In addition, do not just follow a tutorial step by step. Read the FAQ and similar material first to get an overall picture; otherwise you end up robbing Peter to pay Paul and never fully solving the problem.

Next up is building the HA environment. This time, we must be more efficient!
