* Hadoop is an open-source distributed computing framework from the Apache Software Foundation. It has been adopted by many large websites, such as Amazon, Facebook, and Yahoo. My own recent use case is log analysis on a service integration platform: the platform produces a large volume of logs, which fits the typical scenarios for distributed computing (log analysis and index building are the two major application scenarios).
Today we will set up Hadoop 2.2.0. The environment is CentOS 5.8, currently a mainstream server operating system.
I. Environment
System Version: CentOS 5.8 x86_64
JAVA version: JDK-1.7.0_25
Hadoop version: hadoop-2.2.0
192.168.149.128 namenode (acts as namenode, secondary namenode, and ResourceManager)
192.168.149.129 datanode1 (acts as datanode and nodemanager)
192.168.149.130 datanode2 (acts as datanode and nodemanager)
II. System Preparation
1. The latest Hadoop 2.2 release can be downloaded from the official Apache website. The official binaries are currently built for 32-bit Linux only; to deploy on a 64-bit system, you need to download the source code and compile it yourself. (In a real production environment, use a 64-bit Hadoop build to avoid many problems; here we use the 32-bit version.)
Hadoop download:
http://apache.claz.org/hadoop/common/hadoop-2.2.0/
Java download:
http://www.oracle.com/technetwork/java/javase/downloads/index.html
2. Here we use three CentOS servers to build the Hadoop cluster, with the roles shown above.
Step 1: set the corresponding host names in /etc/hosts on all three servers, as follows (in a real environment, internal DNS resolution can be used instead):
[root@node1 hadoop]# cat /etc/hosts
# Do not remove the following line, or various programs
# That require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
192.168.149.128 node1
192.168.149.129 node2
192.168.149.130 node3
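Once /etc/hosts is updated on every node, a quick sanity check is to confirm all three hostnames are present. A minimal sketch (it uses a throwaway sample copy of the file here; set HOSTS_FILE=/etc/hosts on the real servers):

```shell
# Sanity check: every cluster hostname should appear in the hosts file.
# Uses a temporary sample copy; point HOSTS_FILE at /etc/hosts on the real servers.
HOSTS_FILE=$(mktemp)
cat > "$HOSTS_FILE" <<'EOF'
127.0.0.1 localhost.localdomain localhost
192.168.149.128 node1
192.168.149.129 node2
192.168.149.130 node3
EOF
found=0
for h in node1 node2 node3; do
    if grep -qw "$h" "$HOSTS_FILE"; then
        echo "$h: ok"
        found=$((found+1))
    else
        echo "$h: MISSING"
    fi
done
rm -f "$HOSTS_FILE"
```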
(Note: hosts resolution must be configured on both the namenode and the datanode servers.)
Step 2: configure passwordless SSH login from the namenode to each datanode server, as follows:
Run ssh-keygen on the namenode (192.168.149.128) and press Enter at each prompt.
Then copy the public key /root/.ssh/id_rsa.pub to each datanode server, as follows:
ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.149.129
ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.149.130
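With more than a couple of datanodes, the key-copy step is worth scripting. The sketch below only prints the commands for review rather than running them (the IPs are assumed from the environment table above); drop the echo to execute them on the namenode:

```shell
# Print the passwordless-SSH setup command for each datanode (review, then run).
# Datanode IPs assumed from the environment table above.
DATANODES="192.168.149.129 192.168.149.130"
SETUP_CMDS=""
for ip in $DATANODES; do
    cmd="ssh-copy-id -i ~/.ssh/id_rsa.pub root@$ip"
    echo "$cmd"
    SETUP_CMDS="$SETUP_CMDS$cmd
"
done
```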
III. Java installation and configuration
tar -xvzf jdk-7u25-linux-x64.tar.gz && mkdir -p /usr/java/ && mv jdk1.7.0_25 /usr/java/
After installing, configure the Java environment variables by appending the following to the end of /etc/profile:
export JAVA_HOME=/usr/java/jdk1.7.0_25/
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:./
Save and exit, then run source /etc/profile to apply the changes. If java -version works on the command line, the installation succeeded:
[root@node1 ~]# java -version
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)
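Beyond java -version, a short script can confirm that the profile variables point at the expected install path. A minimal sketch (the path is the one assumed in the steps above):

```shell
# Verify the Java environment variables (install path assumed from the steps above).
export JAVA_HOME=/usr/java/jdk1.7.0_25/
export PATH=$JAVA_HOME/bin:$PATH
# Check that JAVA_HOME/bin is actually on the PATH.
case ":$PATH:" in
    *":$JAVA_HOME/bin:"*) PATH_OK=yes ;;
    *) PATH_OK=no ;;
esac
echo "JAVA_HOME=$JAVA_HOME PATH_OK=$PATH_OK"
```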
(Note: the Java JDK must be installed on both the namenode and the datanode servers.)