Hadoop, an open-source distributed computing framework from the Apache Software Foundation, is used at many of the largest web companies, such as Amazon, Facebook, and Yahoo. My recent use case is log analysis for a service integration platform: such a platform produces a large volume of logs, which fits the typical scenarios for distributed computing (log analysis and indexing are two of its major application scenarios).
Today we will build Hadoop 2.2.0 hands-on, on CentOS 5.8, a mainstream server operating system.
First, the environment
System version: CentOS 5.8 x86_64
Java version: JDK 1.7.0_25
Hadoop version: hadoop-2.2.0
192.168.149.128 namenode (namenode, secondary namenode, and ResourceManager roles)
192.168.149.129 datanode1 (datanode and NodeManager roles)
192.168.149.130 datanode2 (datanode and NodeManager roles)
Second, system preparation
1. Hadoop 2.2 can be downloaded directly from the official Apache website. The official binaries are currently built for 32-bit Linux, so if you need to deploy on a 64-bit system you have to download the src source package and compile it yourself. (For a real production environment, please use a 64-bit Hadoop build to avoid a lot of problems; here I experiment with the 32-bit version.)
Hadoop download address:
http://apache.claz.org/hadoop/common/hadoop-2.2.0/
Java download address:
http://www.oracle.com/technetwork/java/javase/downloads/index.html
2. Here we use three CentOS servers to build the Hadoop cluster, with the roles assigned as listed above.
Step 1: set the corresponding hostnames of the three servers in /etc/hosts as follows (a real environment can use internal DNS resolution instead):
[root@node1 hadoop]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
192.168.149.128 node1
192.168.149.129 node2
192.168.149.130 node3
(Note: the hosts file must be configured on all three servers, the namenode and both datanodes.)
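An optional sanity check, assuming the hosts entries above are in place, is to ping each node by hostname from the namenode; if the names resolve to the intranet addresses, the hosts file is working:
# confirm that node2 and node3 resolve and respond
ping -c 1 node2
ping -c 1 node3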
Step 2: set up passwordless SSH login from the namenode to the datanode servers. This requires the following configuration:
Run ssh-keygen on the namenode (192.168.149.128) and press Enter at each prompt to accept the defaults.
Then copy the public key /root/.ssh/id_rsa.pub to the datanode servers as follows:
ssh-copy-id -i .ssh/id_rsa.pub root@192.168.149.129
ssh-copy-id -i .ssh/id_rsa.pub root@192.168.149.130
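To verify the key exchange (an optional check), run an SSH command from the namenode; if no password prompt appears, passwordless login is working:
# should print the remote hostname without asking for a password
ssh root@192.168.149.129 hostname
ssh root@192.168.149.130 hostname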
Third, Java installation and configuration
tar -xvzf jdk-7u25-linux-x64.tar.gz && mkdir -p /usr/java/ && mv jdk1.7.0_25 /usr/java/
To configure the Java environment variables, add the following lines at the end of /etc/profile:
export JAVA_HOME=/usr/java/jdk1.7.0_25/
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:./
Save and exit, then run source /etc/profile to apply the changes. If the java -version command produces output like the following, Java is installed successfully.
[root@node1 ~]# java -version
java version "1.7.0_25"
Java (TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)
(Note: the Java JDK must be installed on all three servers, the namenode and both datanodes.)
Fourth, Hadoop installation
The official Hadoop 2.2.0 release can be used directly after extraction, with no compilation required, as follows:
Step 1: extract the archive:
tar -xzvf hadoop-2.2.0.tar.gz && mv hadoop-2.2.0 /data/hadoop/
(Note: install Hadoop on the namenode server first; the datanodes do not need to be installed now. After the configuration is finished, the modified installation will be copied to the datanodes in one step.)
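As an optional check that the extraction landed in the right place, listing the directory should show the usual Hadoop 2.2.0 layout (bin, etc, sbin, share, and so on):
# quick look at the installation directory
ls /data/hadoop/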
Step 2: configure environment variables:
Add the following lines at the end of /etc/profile and run source /etc/profile to apply them.
export HADOOP_HOME=/data/hadoop/
export PATH=$PATH:$HADOOP_HOME/bin/
export JAVA_LIBRARY_PATH=/data/hadoop/lib/native/
(Note: these Hadoop-related variables must be configured on all three servers, the namenode and both datanodes.)
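On the namenode, where Hadoop is already in place, an optional sanity check (assuming the paths above) is to ask Hadoop for its version; it should report 2.2.0:
# verify that HADOOP_HOME and PATH are picked up
hadoop version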
Fifth, Hadoop configuration
On the namenode we need to modify the following configuration files:
1. Edit /data/hadoop/etc/hadoop/core-site.xml (vi /data/hadoop/etc/hadoop/core-site.xml) as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.149.128:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
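If the xmllint tool from libxml2 is installed (an optional check, not required by the setup), it can catch typos by confirming the edited file is well-formed XML:
# exits silently on well-formed XML, prints an error otherwise
xmllint --noout /data/hadoop/etc/hadoop/core-site.xml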
2. Edit /data/hadoop/etc/hadoop/mapred-site.xml (vi /data/hadoop/etc/hadoop/mapred-site.xml) as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.149.128:9001</value>
  </property>
</configuration>
3. Edit /data/hadoop/etc/hadoop/hdfs-site.xml (vi /data/hadoop/etc/hadoop/hdfs-site.xml) as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/data/hadoop/data_name1,/data/hadoop/data_name2</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/hadoop/data_1,/data/hadoop/data_2</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
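The directories referenced by dfs.name.dir and dfs.data.dir above may not exist yet; creating them in advance is a harmless precaution (a small sketch assuming the paths used in this article):
# name directories for the namenode, data directories for the datanodes
mkdir -p /data/hadoop/data_name1 /data/hadoop/data_name2
mkdir -p /data/hadoop/data_1 /data/hadoop/data_2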
4. Add the JAVA_HOME variable at the end of the /data/hadoop/etc/hadoop/hadoop-env.sh file:
echo "export JAVA_HOME=/usr/java/jdk1.7.0_25/" >> /data/hadoop/etc/hadoop/hadoop-env.sh
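A quick optional check that the line was appended:
# show the last line of hadoop-env.sh
tail -n 1 /data/hadoop/etc/hadoop/hadoop-env.sh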
5. Edit the /data/hadoop/etc/hadoop/masters file as follows:
192.168.149.128
6. Edit the /data/hadoop/etc/hadoop/slaves file as follows:
192.168.149.129
192.168.149.130
That completes the configuration. The detailed meaning of each setting is not explained here; if anything is unclear while building, consult the relevant official documentation.
With that, the basic namenode setup is complete. Next we deploy the datanodes, which is relatively simple; just run the following operation.
for i in `seq 129 130`; do scp -r /data/hadoop/ root@192.168.149.$i:/data/; done
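After the copy finishes, you can optionally confirm that each datanode received the same configuration, for example by comparing one of the files just edited:
# each datanode should show the same slaves list as the namenode
ssh root@192.168.149.129 cat /data/hadoop/etc/hadoop/slaves
ssh root@192.168.149.130 cat /data/hadoop/etc/hadoop/slaves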
At this point the basic cluster build is complete; the next step is to start the Hadoop cluster.