This article was originally written at the end of last November, but its publication was delayed by various complications. Although there are many Hadoop configuration articles online, few of them are truly clear, so here is a step-by-step tutorial for your use.
The Hadoop deployment environment consists of four virtualized hosts running Ubuntu Server 10.04. (XenServer 5.6's list of compatible OSes does not include Ubuntu, and converting Ubuntu to PV is itself a tough process, which this series also introduces.) The Hadoop version is 0.20.2. The Java environment is installed as shown in the previous section.
The host name and its IP address correspond to the following:
Slave & TaskTracker: dm1, IP: 192.168.0.17; (datanode)
Slave & TaskTracker: dm2, IP: 192.168.0.18; (datanode)
Slave & TaskTracker: dm3, IP: 192.168.0.9; (datanode)
Master & JobTracker: dm4, IP: 192.168.0.10; (namenode)
The Master is the management node of the Hadoop cluster, and the important configuration work is done on it. For details of its functions, see the Hadoop API documentation.
The configuration steps are as follows:
I. Modify the hostname of each node (dm1-dm4) so that it matches the table above.
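The original command listing is not preserved; on Ubuntu the change would typically be made as follows (shown here for dm1, as an assumption about the lost listing):

```shell
# Set the hostname for the current session (takes effect immediately):
hostname dm1
# Persist it across reboots by writing it to /etc/hostname:
echo "dm1" > /etc/hostname
```

Repeat on each node with its own name (dm2, dm3, dm4).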
II. Add the host names and IP addresses to the hosts file so the nodes can communicate. The Master must know all of the slaves, while each slave only needs to know the Master and itself. Accordingly, the hosts file on the Master (dm4) lists all four nodes, and the hosts file on each slave (dm1-dm3) lists only itself and the Master.
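The original screenshots are not reproduced; based on the address table above, the Master's /etc/hosts would look roughly like this (the loopback line is standard Ubuntu):

```
127.0.0.1    localhost
192.168.0.17 dm1
192.168.0.18 dm2
192.168.0.9  dm3
192.168.0.10 dm4
```

On a slave such as dm3, only its own entry and the dm4 entry are needed.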
III. Hadoop's core configuration consists of core-site.xml, hdfs-site.xml, mapred-site.xml, and hadoop-env.sh in the conf folder. For more information about these configuration files, see the Hadoop help documentation.
1. First, edit the core-site.xml file on each machine node (both master and slaves); the Hadoop folder is placed under /home:

```shell
vi /home/hadoop/conf/core-site.xml
```
The core-site.xml file should look like this:

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://dm4:9000</value>
  </property>
</configuration>
```
2. Next, edit the hdfs-site.xml file on each machine node (both master and slaves):

```shell
vi /home/hadoop/conf/hdfs-site.xml
```
The hdfs-site.xml file should look like this:

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/NameData</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```
3. Then, edit the mapred-site.xml file on each machine node (both master and slaves):

```shell
vi /home/hadoop/conf/mapred-site.xml
```
The mapred-site.xml file should look like this:

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.0.10:9001</value>
  </property>
</configuration>
```
4. Finally, edit the hadoop-env.sh file on each machine node (both master and slaves):

```shell
vi /home/hadoop/conf/hadoop-env.sh
```
Add the following lines to the file:

```shell
export HADOOP_HOME=/home/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export PATH=$PATH:$HADOOP_HOME/bin
```
IV. Configure the master-slave relationship of the cluster. On every machine node, Hadoop's conf folder contains two files: masters and slaves. Add the IP address or hostname of the Master (dm4) to the masters file, and add the IPs or hostnames of the slaves (dm1-dm3) to the slaves file. All nodes must be modified.
The masters file lists the Master, and the slaves file lists the three slaves.
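The original screenshots are not reproduced; based on the roles above, the two files would contain (one hostname per line):

```
# conf/masters (on every node)
dm4

# conf/slaves (on every node)
dm1
dm2
dm3
```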
At this point the overall installation and configuration of Hadoop is complete. The Hadoop cluster is started from the Master (NameNode) machine, which communicates with the slaves (DataNodes) over ssh, so next we need to set up password-less ssh public-key authentication for login.
V. For the principles behind SSH asymmetric keys, see the OpenSSH document in the references. First, every node must generate a key pair, as follows:
1. On all nodes, generate an RSA key pair.
When prompted, press Enter to store the key pair under /root/.ssh/id_rsa (in this article's demonstration the public key is generated as /root/viki.pub), and leave the passphrase empty.
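The key-generation command itself is not preserved in the original listing; it would presumably be the standard one:

```shell
# Generate an RSA key pair; press Enter to accept the default
# file location, and leave the passphrase empty when asked:
ssh-keygen -t rsa
```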
2. On the Master (NameNode), copy the contents of the generated public key viki.pub into the local /root/.ssh/authorized_keys file:

```shell
cp viki.pub authorized_keys
```
Then, copy the authorized_keys file to the /root/.ssh/ folder of each Slave (DataNode) machine. For example, for dm3:

```shell
scp /root/.ssh/authorized_keys dm3:/root/.ssh/
```
Finally, on all machines, set the file permissions with chmod:

```shell
chmod 644 authorized_keys
```
After the above steps, the ssh configuration is complete. Verify it with the following commands:

```shell
ssh dm3
exit
ssh dm2
exit
ssh dm1
exit
```
On the first connection to each host, enter "yes" at the prompt and then the machine's password; subsequent logins will not require a password.
VI. Start and verify the Hadoop cluster as described above, then open http://192.168.0.10:50030/jobtracker.jsp in a browser.
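The start-up commands themselves are not shown in the original; for Hadoop 0.20.2 they would typically be run on the Master as follows:

```shell
cd /home/hadoop
# Format HDFS on the first run only (destroys existing HDFS data):
bin/hadoop namenode -format
# Start the NameNode and JobTracker here, and the DataNodes and
# TaskTrackers on the slaves via ssh:
bin/start-all.sh
```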
The page should show the running Hadoop cluster.
VII. References
1. Hadoop Quickstart http://hadoop.apache.org/common/docs/r0.18.2/cn/quickstart.html
2. Common threads: OpenSSH key management http://www.ibm.com/developerworks/cn/linux/security/openssh/part1/index.html
Original article: http://www.cnblogs.com/ventlam/archive/2011/01/21/hadoopcluster.html