Environment:
Operating system: CentOS 6.5 64bit
Hadoop: version 1.2.1
Servers: hadoopnamenode, hadoop2ndnamenode, hadoopdatanode1, hadoopdatanode2
Note: For convenience, I operate as the root account directly on all 4 servers.
Download and environment variable settings:
On all 4 servers:
Download hadoop-1.2.1-bin.tar.gz from the Apache website, unpack it, and place it in a directory. I put it under /usr/local and, for convenience, renamed the directory from hadoop-1.2.1 to hadoop.
Modify ~/.bashrc and add the following environment variables:
export HADOOP_PREFIX=/usr/local/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin
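To apply the changes and confirm the hadoop command is on the PATH, a quick check (a sketch, assuming the layout above) is:
source ~/.bashrc
which hadoop      # should print /usr/local/hadoop/bin/hadoop
hadoop version    # should report 1.2.1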
Configure the hosts file:
Add the following in /etc/hosts:
153.65.170.11 hadoopnamenode
153.65.170.45 hadoop2ndnamenode
153.65.171.174 hadoopdatanode1
153.65.171.24 hadoopdatanode2
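As an optional sanity check that the names resolve on each machine, something like the following can be run:
for h in hadoopnamenode hadoop2ndnamenode hadoopdatanode1 hadoopdatanode2; do ping -c 1 $h; done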
Configure SSH:
Execute on hadoopnamenode:
ssh-keygen    # generate the public and private keys
ssh-copy-id -i ~/.ssh/id_rsa.pub root@hadoop2ndnamenode    # copy hadoopnamenode's public key to the other three servers
ssh-copy-id -i ~/.ssh/id_rsa.pub root@hadoopdatanode1
ssh-copy-id -i ~/.ssh/id_rsa.pub root@hadoopdatanode2
The purpose of this is to allow SSH from hadoopnamenode to the other three servers without a password. After ssh-copy-id, the public key has in fact been appended to the ~/.ssh/authorized_keys file on each of the other three servers.
For example, when logging in to hadoop2ndnamenode from hadoopnamenode, the process is roughly: hadoop2ndnamenode sends a random string to hadoopnamenode; hadoopnamenode signs it with its own private key and sends it back; hadoop2ndnamenode verifies it with the stored public key of hadoopnamenode. If verification succeeds, the user is proven trustworthy and is dropped straight into a shell without being asked for a password.
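Before continuing, a quick way to confirm that passwordless login works is to run the following on hadoopnamenode; each remote hostname should be printed without any password prompt:
for h in hadoop2ndnamenode hadoopdatanode1 hadoopdatanode2; do ssh root@$h hostname; done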
Configure Hadoop:
In general, we use the following commands to start HDFS (that is, the name node, secondary name node, and data nodes) and MapReduce:
/usr/local/hadoop/bin/start-dfs.sh
/usr/local/hadoop/bin/start-mapred.sh
The start-dfs.sh process is roughly as follows:
- The machine that executes the command automatically becomes the name node,
- All the machines listed in /usr/local/hadoop/conf/slaves are started as data nodes,
- All the machines listed in /usr/local/hadoop/conf/masters are started as secondary name nodes.
The process of start-mapred.sh is similar:
- The machine that executes the command automatically becomes the job tracker,
- All the machines listed in /usr/local/hadoop/conf/slaves are started as task trackers.
Note: The conf/masters file is often confusing; intuitively it feels like it should list the name node, but the name node never needs to be configured in conf/masters at all (it is simply whichever machine runs start-dfs.sh). Only the secondary name node needs to be listed there.
Following the description above, modify the conf/masters file on hadoopnamenode: delete its original contents and add one line:
hadoop2ndnamenode
Then modify the conf/slaves file on hadoopnamenode: delete its original contents and add two lines:
hadoopdatanode1
hadoopdatanode2
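One way to write these two files non-interactively (a sketch, assuming the /usr/local/hadoop layout used above):
echo "hadoop2ndnamenode" > /usr/local/hadoop/conf/masters
printf "hadoopdatanode1\nhadoopdatanode2\n" > /usr/local/hadoop/conf/slaves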
In addition, hadoopdatanode1 and hadoopdatanode2 need to be configured so that the data node knows where the name node is and the task tracker knows where the job tracker is. So on both hadoopdatanode1 and hadoopdatanode2, modify conf/core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoopnamenode:10001</value>
</property>
</configuration>
and conf/mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hadoopnamenode:10002</value>
</property>
</configuration>
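Rather than editing the two files by hand on each data node, one alternative (a sketch that relies on the passwordless SSH set up earlier and assumes the same /usr/local/hadoop path on every machine) is to prepare local copies of core-site.xml and mapred-site.xml with the contents above on hadoopnamenode and push them out:
for h in hadoopdatanode1 hadoopdatanode2; do
  scp core-site.xml mapred-site.xml root@$h:/usr/local/hadoop/conf/
done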
Format the name node:
Execute on hadoopnamenode:
hadoop namenode -format
Start Hadoop:
First, execute the following command on hadoopnamenode to start the name node, secondary name node, and data nodes:
start-dfs.sh
You can use the jps command to view the Java processes currently running on the 4 servers. Normally you should see:
On hadoopnamenode: NameNode
On hadoop2ndnamenode: SecondaryNameNode
On hadoopdatanode1/hadoopdatanode2: DataNode
Second, execute the following command on hadoopnamenode to start the job tracker and task trackers:
start-mapred.sh
Run jps again to view the Java processes on the 4 servers. Normally you should see:
On hadoopnamenode: NameNode, JobTracker
On hadoop2ndnamenode: SecondaryNameNode
On hadoopdatanode1/hadoopdatanode2: DataNode, TaskTracker
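Because passwordless SSH is already in place, all four machines can be checked from hadoopnamenode in one pass (a sketch, assuming jps is on the PATH for non-interactive shells; the PIDs it prints will of course differ):
for h in hadoopnamenode hadoop2ndnamenode hadoopdatanode1 hadoopdatanode2; do
  echo "== $h =="
  ssh root@$h jps
done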
Stop Hadoop:
On hadoopnamenode:
stop-mapred.sh
stop-dfs.sh
Other:
Name node admin interface: http://hadoopnamenode:50070/
Job tracker admin interface: http://hadoopnamenode:50030/
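As a simple end-to-end check (a sketch, not part of the original steps), you can copy a file into HDFS and list it from hadoopnamenode:
hadoop fs -mkdir /test
hadoop fs -put /etc/hosts /test/
hadoop fs -ls /test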