After more than a week of work, I finally got a cluster running on the latest Hadoop release, 2.2. Along the way I ran into all kinds of problems, which was painful for a beginner like me, but when wordcount finally produced its results I was thrilled. (If you spot any mistakes or have questions, please point them out so we can learn from each other.)
You are also welcome to leave a comment if you run into problems during configuration so that we can discuss them, or to share your own solutions. The comments below already cover some problems and their fixes, so please refer to them.
Part 1 Hadoop 2.2 download
Download the latest Hadoop release, 2.2, from the official Apache website. The official binaries are currently built for 32-bit Linux only, so to deploy on a 64-bit system you need to download the source code and compile it yourself (comment #10 links to a solution).
Download address: http://apache.claz.org/hadoop/common/hadoop-2.2.0/
Download the binary package, which was highlighted in red in the original screenshot. If you want to compile Hadoop yourself, download src.tar.gz.
Part 2 Cluster Environment Construction
1. Here we build a cluster composed of three machines:
192.168.0.1    hduser/passwd    cloud001    nn/snn/rm    CentOS6 64bit
192.168.0.2    hduser/passwd    cloud002    dn/nm        Ubuntu13.04 32bit
192.168.0.3    hduser/passwd    cloud003    dn/nm        Ubuntu13.04 32bit
1.1 The columns above are: IP address, user/password, hostname, role in the cluster (namenode, secondary namenode, datanode, resourcemanager, nodemanager), and operating system.
1.2 The hostname can be changed in /etc/hostname (this is the path on Ubuntu; RedHat is slightly different).
1.3 We created an account named hduser on every machine, and each account needs sudo permission. (Switch to the root account, edit the /etc/sudoers file, and add: hduser ALL=(ALL) ALL)
2. Edit the /etc/hosts file and add the mappings between the IP addresses and hostnames of the three hosts:
192.168.0.1 cloud001
192.168.0.2 cloud002
192.168.0.3 cloud003
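The mappings above can also be added with a small script. The following is only a minimal sketch and not part of the original setup: the function name add_host_entry and the HOSTS_FILE variable are made up for illustration. It skips lines that are already present, so it can be run more than once, and it defaults to a scratch file so it can be tried without touching /etc/hosts.

```shell
#!/bin/sh
# Hypothetical helper: append "IP hostname" to a hosts file unless the
# exact line already exists. HOSTS_FILE defaults to a scratch file.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

add_host_entry() {
    ip="$1"
    name="$2"
    touch "$HOSTS_FILE"
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF "$ip $name" "$HOSTS_FILE"; then
        printf '%s %s\n' "$ip" "$name" >> "$HOSTS_FILE"
    fi
}

add_host_entry 192.168.0.1 cloud001
add_host_entry 192.168.0.2 cloud002
add_host_entry 192.168.0.3 cloud003

# Show the resulting file for review
cat "$HOSTS_FILE"
```

To apply it for real, one would set HOSTS_FILE=/etc/hosts and run the script as root on each of the three machines.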
3. SSH password-less login from cloud001 to cloud002 and cloud003
3.1 Install ssh
The ssh command is usually installed by default. If it is missing or the version is old, reinstall it:
sudo apt-get install ssh
3.2 Set up password-less local login
After installation, a hidden folder named .ssh is created in the ~ directory (the current user's home directory, i.e. /home/hduser); ls -a shows hidden files. If the folder does not exist, create it yourself (mkdir .ssh).
The procedure is as follows:
1. Enter the .ssh folder
2. Run ssh-keygen -t rsa and press Enter at every prompt (this generates the key pair)
3. Append id_rsa.pub to the authorized keys (cat id_rsa.pub >> authorized_keys)
4. Restart the SSH server so the change takes effect: service sshd restart (the service is called sshd on RedHat and ssh on Ubuntu)
Now you can log in with ssh localhost without a password.
[Note]: The preceding operations must be performed on each machine.
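Steps 1 to 3 can be sketched as a script. This is a minimal sketch under stated assumptions: the function names and the SSH_DIR variable are hypothetical, an empty passphrase is used (which matches pressing Enter at every prompt), and the chmod calls are an addition not mentioned above, since sshd in its default configuration typically rejects an authorized_keys file or .ssh directory that other users can write to.

```shell
#!/bin/sh
# Hypothetical helpers; SSH_DIR defaults to the current user's ~/.ssh.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"

# Create the RSA key pair only if none exists yet (empty passphrase).
generate_key() {
    mkdir -p "$SSH_DIR" && chmod 700 "$SSH_DIR"
    [ -f "$SSH_DIR/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$SSH_DIR/id_rsa"
}

# Append a public key file to authorized_keys unless it is already listed.
authorize_key() {
    pubkey_file="$1"
    auth_file="$SSH_DIR/authorized_keys"
    mkdir -p "$SSH_DIR" && chmod 700 "$SSH_DIR"
    touch "$auth_file" && chmod 600 "$auth_file"
    grep -qxF "$(cat "$pubkey_file")" "$auth_file" || cat "$pubkey_file" >> "$auth_file"
}
```

For the local login described above, one would call generate_key followed by authorize_key "$SSH_DIR/id_rsa.pub", then restart the SSH service as in step 4.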
3.3 Set up password-less remote login
Here cloud001 is the only master. If there are several namenodes or resourcemanagers, every master node needs password-less login to all the other nodes. (Append the authorized_keys of 001 to the authorized_keys of 002 and 003.)
Enter the .ssh directory on 001:
scp authorized_keys hduser@cloud002:~/.ssh/authorized_keys_from_cloud001
Enter the .ssh directory on 002:
cat authorized_keys_from_cloud001 >> authorized_keys
You can now log in from 001 with ssh hduser@cloud002 without a password. Repeat the same steps for 003.
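Once the keys are in place, password-less login can be checked from cloud001 without getting stuck at a password prompt. The following is a minimal sketch, not part of the original procedure: the function name check_passwordless and the SSH_BIN variable are hypothetical. It relies on the ssh option BatchMode=yes, which makes ssh fail instead of asking for a password.

```shell
#!/bin/sh
# Hypothetical check; SSH_BIN can be overridden, e.g. for a dry run.
SSH_BIN="${SSH_BIN:-ssh}"

# Print one status line per host; return non-zero if any host failed.
check_passwordless() {
    failed=0
    for host in "$@"; do
        # BatchMode=yes disables password prompts, so a missing key
        # produces an error instead of blocking on input.
        if "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 \
                "hduser@$host" true </dev/null >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    return "$failed"
}
```

On cloud001 this would be called as check_passwordless cloud002 cloud003.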
4. Install the JDK (using the same JAVA_HOME path on every machine is recommended)
Note: download the JDK and install it yourself rather than installing it from the package repositories (apt-get install).
4.1 Download the JDK (http://www.oracle.com/technetwork/java/javase/downloads/index.html)
4.1.1 For 32-bit systems, download one of the two Linux x86 packages (run uname -a to check the system version)
4.1.2 For 64-bit systems, download one of the Linux x64 packages (x64.rpm or x64.tar.gz)
4.2 Install the JDK (using the .tar.gz package on a 32-bit system as an example)
For the installation method, see http://docs.oracle.com/javase/7/docs/webnotes/install/linux/linux-jdk.html
4.2.1 Choose where to install Java, for example under the /usr/ directory, and create a folder named java there (mkdir java)
4.2.2 Move the jdk-7u40-linux-i586.tar.gz file to /usr/java
4.2.3 Extract it: tar -zxvf jdk-7u40-linux-i586.tar.gz
4.2.4 Delete jdk-7u40-linux-i586.tar.gz (to save space)
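Steps 4.2.1 to 4.2.4 can be sketched as a single function. This is a minimal sketch with a hypothetical function name, install_tarball. Unlike the manual steps, it extracts directly into the target directory instead of moving the archive first, and it deletes the archive only if extraction succeeded.

```shell
#!/bin/sh
# Hypothetical helper: unpack a .tar.gz archive into a target directory
# and delete the archive only if extraction succeeded.
install_tarball() {
    archive="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    # -C switches to the target directory before extracting
    tar -zxf "$archive" -C "$dest" || return 1
    rm -f "$archive"
}
```

With the file names used above, the call would be install_tarball jdk-7u40-linux-i586.tar.gz /usr/java, run as root because /usr is not writable by hduser.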
The JDK is now installed. Next, configure the environment variables.
4.3 Open /etc/profile (vim /etc/profile)
Add the following content at the end:
JAVA_HOME=/usr/java/jdk1.7.0_40 (adjust the version number 1.7.0_40 to match the JDK you downloaded)
CLASSPATH=.:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH
4.4 Apply the changes: source /etc/profile
4.5 Verify that the installation succeeded: java -version
[Note] Perform the same steps on every machine and install Java in the same path on each (not required, but it makes the later configuration much more convenient).
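Because the same settings are needed on every machine, the profile lines from step 4.3 can be generated rather than typed by hand. The following is a minimal sketch: the function name print_java_env is hypothetical, and the default path assumes the jdk1.7.0_40 directory used in the example. It only prints to standard output, so the result can be reviewed before it is added to /etc/profile.

```shell
#!/bin/sh
# Hypothetical helper: print the profile lines for a given JDK directory.
print_java_env() {
    java_home="${1:-/usr/java/jdk1.7.0_40}"
    # Unquoted heredoc: $java_home expands now, while the escaped \$
    # references are written out literally for the login shell to expand.
    cat <<EOF
JAVA_HOME=$java_home
CLASSPATH=.:\$JAVA_HOME/lib/tools.jar
PATH=\$JAVA_HOME/bin:\$PATH
export JAVA_HOME CLASSPATH PATH
EOF
}

print_java_env /usr/java/jdk1.7.0_40
```

To apply it, the output could be appended as root with print_java_env /usr/java/jdk1.7.0_40 >> /etc/profile, followed by source /etc/profile and java -version.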
5. Disable the firewall on each machine
RedHat:
/etc/init.d/iptables stop (stops the firewall)
chkconfig iptables off (keeps the firewall from starting at boot)
Ubuntu:
ufw disable (takes effect after a restart)