Hadoop-2.5.2 cluster installation configuration details, hadoop configuration file details
When reprinting, please indicate the source: http://blog.csdn.net/tang9140/article/details/42869531
I recently learned how to install Hadoop; the steps are described in detail below.
I. Environment
I installed it on Linux. Students who want to learn on Windows can use a virtual machine or use Cygwin to simulate a Linux environment.
There are now three servers allocated as follows:
10.0.1.100 NameNode
10.0.1.201 DataNode1
10.0.1.202 DataNode2
The NameNode can be viewed as the manager of the distributed file system. It is mainly responsible for the file system namespace, cluster configuration information, and storage block replication.
The DataNodes (slave servers) are the basic units of file storage. Each stores Blocks in its local file system along with their metadata, and periodically reports all existing Block information to the NameNode.
1. Install jdk
Find an appropriate JDK version online; I downloaded jdk-6u23-linux-i586.bin (the 32-bit Linux version). Upload the file to the /usr/local/java/jdk directory on each of the three servers (you can change this directory to suit your habits).
jdk-6u23-linux-i586.bin is a self-extracting file, so it needs the executable permission added. As follows:
chmod +x jdk-6u23-linux-i586.bin
./jdk-6u23-linux-i586.bin
Press enter to decompress the package.
Modify configurations
vi /etc/profile
Add the following environment variable configurations at the end of profile:
JAVA_HOME=/usr/local/java/jdk/jdk1.6.0_23
CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH
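To sanity-check the profile lines before relying on them, a small sketch (JDK path taken from this article) composes the same variables and prints the resulting CLASSPATH so you can confirm the expansion:

```shell
# Compose the variables exactly as the /etc/profile lines above do,
# then print the expanded CLASSPATH to verify it.
JAVA_HOME=/usr/local/java/jdk/jdk1.6.0_23
CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
echo "$CLASSPATH"
# → .:/usr/local/java/jdk/jdk1.6.0_23/lib/tools.jar:/usr/local/java/jdk/jdk1.6.0_23/lib/dt.jar
```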
Reconnect to the server through ssh and test whether JDK is installed successfully. The command is as follows:
java -version
The normal display is as follows:
java version "1.6.0_23"
Java(TM) SE Runtime Environment (build 1.6.0_23-b05)
Java HotSpot(TM) Client VM (build 19.0-b09, mixed mode, sharing)
2. Configure ssh
The ssh service is generally pre-installed on Linux. If it is not, install it yourself; the installation is not detailed here.
To run Hadoop we need to configure ssh for password-less login: the NameNode must be able to log in to each DataNode over ssh without a password.
Enter the NameNode server and enter the following command:
[root@localhost hadoop]# cd ~
[root@localhost ~]# cd .ssh/
[root@localhost .ssh]# ssh-keygen -t rsa
Press Enter at each prompt. Two new files appear in the .ssh directory:
Private Key File: id_rsa
Public Key File: id_rsa.pub
Copy the id_rsa.pub file to authorized_keys:
[root@localhost .ssh]# cp id_rsa.pub authorized_keys
Distribute the public key file authorized_keys to each DataNode node:
[root@localhost .ssh]# scp authorized_keys root@10.0.1.201:/root/.ssh/
[root@localhost .ssh]# scp authorized_keys root@10.0.1.202:/root/.ssh/
Note: If the .ssh directory does not exist in the current user's home directory, create it yourself.
Verify ssh logon without a password:
[root@localhost .ssh]# ssh root@10.0.1.201
Last login: Mon Jan 5 09:46:01 2015 from 10.0.1.100
If the above is displayed, the configuration succeeded. If you are prompted for a password, it failed.
II. Download and install Hadoop
1. Download
Go to the Hadoop official website (http://hadoop.apache.org/) and download an appropriate version. I chose the fairly recent 2.5.2 (the latest at the time was 2.6.0). The file is hadoop-2.5.2.tar.gz. Download it and upload it to /root/test (on all three servers), then switch to that directory and extract it:
tar -zvxf hadoop-2.5.2.tar.gz
2. Configuration
On the master server (10.0.1.100), enter the configuration directory: cd hadoop-2.5.2/etc/hadoop
core-site.xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://10.0.1.100:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
</configuration>
Note: the two slave servers also need the core-site.xml above; the remaining configuration files below are modified only on the master server.
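Since the two slave servers need the same core-site.xml, it can be pushed with scp. A minimal sketch (the conf path assumes hadoop was extracted under /root/test as above); it only prints the commands, so remove the leading echo to actually run them:

```shell
# Print the scp commands that would copy core-site.xml to each DataNode
# (drop the leading echo to execute them for real).
HADOOP_CONF=/root/test/hadoop-2.5.2/etc/hadoop
for slave in 10.0.1.201 10.0.1.202; do
  echo scp "$HADOOP_CONF/core-site.xml" "root@$slave:$HADOOP_CONF/"
done
```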
hdfs-site.xml
<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>hadoop-cluster1</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>10.0.1.100:50090</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>10.0.1.100:50030</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>10.0.1.100:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>10.0.1.100:19888</value>
    </property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>10.0.1.100:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>10.0.1.100:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>10.0.1.100:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>10.0.1.100:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>10.0.1.100:8088</value>
    </property>
</configuration>
slaves
10.0.1.201
10.0.1.202
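The slaves file can also be generated from the shell; a small sketch using the DataNode IPs from this article (run it inside the etc/hadoop directory):

```shell
# Write the DataNode list to a file named slaves and display it.
printf '%s\n' 10.0.1.201 10.0.1.202 > slaves
cat slaves
```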
hadoop-env.sh
export JAVA_HOME=/usr/local/java/jdk/jdk1.6.0_23
yarn-env.sh
export JAVA_HOME=/usr/local/java/jdk/jdk1.6.0_23
3. Format the file system
bin/hdfs namenode -format
Note: formatting the file system here is not disk formatting; it clears the dfs.namenode.name.dir and dfs.datanode.data.dir directories configured in the master's hdfs-site.xml.
4. Start and Stop services
Start
sbin/start-dfs.sh
sbin/start-yarn.sh
Stop
sbin/stop-dfs.sh
sbin/stop-yarn.sh
5. View started processes
jps
The output looks like this:
14140 ResourceManager
13795 NameNode
14399 Jps
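A quick way to confirm the master daemons came up is to grep the jps output; a sketch using the sample listing above (on a live node, set jps_out="$(jps)" instead of the hard-coded sample):

```shell
# Check the sample jps listing for the two master daemons; on a real node
# replace the hard-coded sample with jps_out="$(jps)".
jps_out='14140 ResourceManager
13795 NameNode
14399 Jps'
for proc in NameNode ResourceManager; do
  if echo "$jps_out" | grep -q "$proc"; then
    echo "$proc running"
  else
    echo "$proc MISSING"
  fi
done
# → NameNode running
# → ResourceManager running
```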
III. Access through a browser
http://10.0.1.100:50070/
http://10.0.1.100:8088/
---------------
Note: the slaves file on the master server is configured with IP addresses. In that case you must also add IP-to-hostname mappings to /etc/hosts on the master server, as follows:
10.0.1.201 anyname1
10.0.1.202 anyname2
Otherwise, when you run start-dfs.sh, the DataNodes may log an error like the following:
17:06:54,375 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool BP-1748412339-10.0.1.212-1420015637155 (Datanode Uuid null) service to /using:9000 Datanode denied communication with namenode because hostname cannot be resolved (ip=10.0.1.217, hostname=10.0.1.217): DatanodeRegistration(0.0.0.0, datanodeUuid=bytes, infoPort=50075, ipcPort=50020, storageInfo=lv=-55;cid=CID-4237dee9-ea5e-4994-91c2-008d9e804960;nsid=358861143;c=0)
The error means the DataNode's IP address cannot be resolved to a hostname, so the mapping must be specified in /etc/hosts.
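The mapping lines can be appended from the shell; a sketch that works on a throw-away copy first (the hostnames anyname1/anyname2 are placeholders from the note above; on the real master, point the variable at /etc/hosts and run as root):

```shell
# Append the IP-to-hostname mappings to a scratch copy of the hosts file;
# set hosts_file=/etc/hosts to apply them for real (as root).
hosts_file=$(mktemp)
printf '%s\n' '10.0.1.201 anyname1' '10.0.1.202 anyname2' >> "$hosts_file"
cat "$hosts_file"
```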
This article references http://blog.csdn.net/greensurfer/article/details/39450369