Hadoop Study Notes - Installing and Configuring a Large Hadoop Cluster for a Production Environment
Installation Environment
Platform: vmware2
Operating System: Oracle Enterprise Linux 5.6
Software versions: hadoop-0.20.2, jdk-6u18
Cluster architecture: 3+ nodes; one master node (hotel01) and slave nodes (hotel02, hotel03, ...)
Host name   IP              OS version   Hadoop role   Hadoop processes
hotel01     192.168.2.111   OEL5.6       master        namenode, jobtracker
hotel02     192.168.2.112   OEL5.6       slave         datanode, tasktracker
hotel03     192.168.2.113   OEL5.6       slave         datanode, tasktracker
...
NOTE: This test setup has only three hadoop hosts, but a real production hadoop cluster may contain hundreds of hosts or more. The installation steps below are therefore written with a large cluster in mind, keeping per-server manual work to a minimum, because any step repeated on every machine quickly becomes a huge undertaking.
Installation steps
1. Download Hadoop and jdk:
http://mirror.bit.edu.cn/apache/hadoop/common/
For example: hadoop-0.20.2
2. Configure DNS to resolve host names
NOTE: In a production hadoop cluster, resolving machine names through DNS instead of per-host /etc/hosts files avoids having to maintain a hosts file on every node (there may be many servers). When a node is added, there is no need to update the hostname/IP mapping file /etc/hosts on every existing node. This reduces configuration steps and time and makes the cluster easier to manage.
See the detailed steps:
[Hadoop study notes-DNS configuration] http://www.linuxidc.com/Linux/2014-02/96519.htm
Configuration notes: the DNS server runs on the hotel01 (master) node and resolves the host names of hotel01, hotel02, and hotel03.
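Once DNS is set up, name resolution can be sanity-checked from any node before continuing. A minimal sketch (the host names come from the table above; the helper name `check_host` is illustrative; `getent` follows the hosts lookup order in nsswitch.conf, so it also covers /etc/hosts entries):

```shell
#!/bin/sh
# Check that every cluster host name resolves before installing hadoop.
check_host() {
  if getent hosts "$1" > /dev/null; then
    echo "OK $1"
  else
    echo "FAIL $1"
  fi
}
for h in hotel01.licz.com hotel02.licz.com hotel03.licz.com; do
  check_host "$h"
done
```

Any FAIL line means the DNS configuration (or the node's resolver setup) should be fixed before hadoop is started.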
3. Set up the hadoop run account
Create the hadoop run account on all nodes:
[root@gc ~]# groupadd hadoop
[root@gc ~]# useradd -g hadoop grid    -- note: the group must be specified here, or mutual trust (ssh) may fail to establish
[root@gc ~]# id grid
uid=501(grid) gid=54326(hadoop) groups=54326(hadoop)
[root@gc ~]# passwd grid
Changing password for user grid.
New UNIX password:
BAD PASSWORD: it is too short
Retype new UNIX password:
passwd: all authentication tokens updated successfully.
Note: In a large hadoop cluster environment, this step can be done once before the Linux systems are installed in bulk and then replicated along with the system image. (Not tried; reportedly tools such as Ghost can achieve this.)
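When the systems are already installed, the account creation itself can also be scripted from one machine. A dry-run sketch (assumptions: root ssh access to each node; the helper name `gen_user_cmds` and the short node list are illustrative) that only prints the commands it would run; pipe the output to `sh` or drop the `echo` to execute them:

```shell
#!/bin/sh
# Print (dry run) the per-node commands that create the hadoop group
# and the grid user on every host in one pass.
gen_user_cmds() {
  for node in "$@"; do
    echo "ssh root@$node 'groupadd hadoop && useradd -g hadoop grid'"
  done
}
gen_user_cmds hotel01 hotel02 hotel03
```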
4. Configure ssh password-free login via NFS
NOTE: When password-free ssh is configured via NFS, a newly added node no longer needs to distribute its public key to every other node separately; it only appends its public key to the shared authorized_keys file, and the other nodes simply point at the latest shared key file. This makes public keys easy to distribute and manage.
See the detailed steps:
[Hadoop study notes - NFS configuration] http://www.linuxidc.com/Linux/2014-02/96520.htm
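The core of the NFS idea can be sketched as follows (assumptions: the path passed in stands for the actual NFS export mounted on every node, the node's RSA key pair already exists in ~/.ssh, and the function name `publish_key` is illustrative): each node appends its public key once to one shared authorized_keys file and then symlinks its own authorized_keys to it.

```shell
#!/bin/sh
# Publish this node's public key into a shared authorized_keys file
# and point the local one at it. Appending is idempotent.
publish_key() {
  share=$1
  mkdir -p "$share"
  # append the key only if it is not already in the shared file
  grep -qxF "$(cat "$HOME/.ssh/id_rsa.pub")" "$share/authorized_keys" 2>/dev/null ||
    cat "$HOME/.ssh/id_rsa.pub" >> "$share/authorized_keys"
  # every node resolves ~/.ssh/authorized_keys to the shared copy
  ln -sf "$share/authorized_keys" "$HOME/.ssh/authorized_keys"
}
# demo invocation, guarded so it is a no-op when no key pair exists
if [ -f "$HOME/.ssh/id_rsa.pub" ]; then
  publish_key "${SHARE:-/tmp/nfs_share/ssh}"
fi
```

A new node then only needs to run `publish_key` once; existing nodes pick up its key automatically through the shared file.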
5. Extract the hadoop installation package
- The configuration files can be extracted and edited on a single node:
[grid@hotel01 ~]$ ll
total 43580
-rw-r--r-- 1 grid hadoop 44575568 2012-11-19 hadoop-0.20.2.tar.gz
[grid@hotel01 ~]$ tar xzvf /home/grid/hadoop-0.20.2.tar.gz
[grid@hotel01 ~]$ ll
total 43584
drwxr-xr-x 12 grid hadoop 4096 2010-02-19 hadoop-0.20.2
-rw-r--r-- 1 grid hadoop 44575568 2012-11-19 hadoop-0.20.2.tar.gz
- Install the jdk on each node:
[root@hotel01 ~]# ./jdk-6u18-linux-x64-rpm.bin
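With password-free ssh in place, the extracted hadoop tree and the jdk installer can be pushed from the master to the slaves instead of being prepared on each machine by hand. A dry-run sketch (the helper name `gen_push_cmds` is illustrative; paths mirror the listing above; remove the `echo` to actually copy):

```shell
#!/bin/sh
# Print (dry run) the scp commands that would distribute the hadoop
# tree and the jdk installer from the master to each slave node.
gen_push_cmds() {
  for node in "$@"; do
    echo "scp -r /home/grid/hadoop-0.20.2 grid@$node:/home/grid/"
    echo "scp /home/grid/jdk-6u18-linux-x64-rpm.bin grid@$node:/home/grid/"
  done
}
gen_push_cmds hotel02.licz.com hotel03.licz.com
```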
6. Configure the hadoop-related files
◆ Configure hadoop-env.sh
[root@gc conf]# pwd
/root/hadoop-0.20.2/conf
- Set the jdk installation path:
[root@gc conf]# vi hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.6.0_18
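A wrong JAVA_HOME is a common reason the hadoop daemons fail to start, so the path is worth checking before going further. A small sketch (the function name `check_java_home` is illustrative):

```shell
#!/bin/sh
# Report whether a candidate JAVA_HOME actually contains a java binary.
check_java_home() {
  if [ -x "$1/bin/java" ]; then
    echo "JAVA_HOME ok: $1"
  else
    echo "JAVA_HOME invalid: $1"
  fi
}
check_java_home /usr/java/jdk1.6.0_18
```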
◆ Configure the namenode: modify the site files
- Modify the core-site.xml file
[grid@hotel01 conf]$ vi core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<!-- In fully distributed mode, do not use localhost; use the master node's IP or host name. -->
<value>hdfs://hotel01.licz.com:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/grid/hadoop/tmp</value>
</property>
</configuration>
Note: fs.default.name specifies the NameNode's address (host name or IP) and port.
- Modify the hdfs-site.xml file
[grid@hotel01 hadoop-0.20.2]$ mkdir data
[grid@hotel01 conf]$ vi hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.data.dir</name>
<!-- Note: this directory must already exist and be readable and writable. -->
<value>/home/grid/hadoop-0.20.2/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
- Modify the mapred-site.xml file
[grid@hotel01 conf]$ vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hotel01.licz.com:9001</value>
</property>
</configuration>
◆ Configure the masters and slaves files
[grid@hotel01 conf]$ vi masters
hotel01.licz.com
[grid@hotel01 conf]$ vi slaves
hotel02.licz.com
hotel03.licz.com
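After these files are distributed to all nodes (see step 5), the cluster is typically formatted and started from the master. A sketch of the usual hadoop 0.20.x commands (not part of the original notes), run as the grid user on hotel01:

[grid@hotel01 hadoop-0.20.2]$ bin/hadoop namenode -format   -- initialize HDFS (first start only)
[grid@hotel01 hadoop-0.20.2]$ bin/start-all.sh              -- starts namenode/jobtracker here and datanode/tasktracker on the slaves
[grid@hotel01 hadoop-0.20.2]$ jps                           -- verify the expected java processes are running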