This article builds on the previous one, which explained in detail how to configure single-machine pseudo-distributed Hadoop; this article focuses on a fully distributed configuration. The setup described here is summarized mainly from the official documentation and some tutorials found online. This is my first build, so thank you for pointing out any errors.
Pseudo-distributed: http://blog.csdn.net/yuzhuzhong/article/details/49922845
The cluster environment used in this tutorial: two virtual machines, one as the Master with LAN IP 192.168.9.131, the other as the Slave with LAN IP 192.168.9.133. Java environment setup, Hadoop installation, SSH installation, etc. were covered in the previous article and will not be repeated here.
First, complete the pseudo-distributed installation on a virtual machine
Installation and configuration of virtual machine Ubuntu under Hadoop2.6.1 (Pseudo-distributed)
Second, cloning to get the same configuration
1. Shut down the virtual machine before cloning
2. On the left side of the image below, select the virtual machine you want to clone.
3. For the clone type, select "Create a full clone"
4. Keep the defaults for everything else
Third, network configuration
1. Hadoop must be stopped before the cluster configuration: /usr/hadoop/hadoop-2.6.1/sbin/stop-dfs.sh
2. There are two virtual machines, with IP addresses 192.168.9.131 and 192.168.9.133. Select one of them as the master (I chose 192.168.9.131), then change the machine name to Master in its /etc/hostname file, and change the other to Slave1.
3. In the /etc/hosts file, write in the host information for all machines in the cluster, as shown in the figure.
There can be only one 127.0.0.1 entry, corresponding to localhost; otherwise an error will occur.
4. Note that the network configuration must be performed on all hosts.
The above is the configuration on the Master host; on each Slave host, the /etc/hostname file (changed to Slave1, Slave2, etc.) and the /etc/hosts file (generally the same as on Master) must be modified accordingly.
5. It's a good idea to restart afterwards so the hostname change takes effect.
6. Do not forget to rename the virtual machines to match (this doesn't seem to affect anything; it just makes them easier to tell apart)
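As a concrete illustration of step 3 (using the IPs and hostnames chosen above), the /etc/hosts file on every node might look like this:

```
127.0.0.1       localhost
192.168.9.131   Master
192.168.9.133   Slave1
```

Note that there is no second 127.0.0.1 or 127.0.1.1 line mapping to the machine's own hostname.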
Fourth, test connectivity between the virtual machines with ping
I built both virtual machines on the same physical machine using NAT networking, so their IP addresses are on the same subnet and they can ping each other, as shown in the figure.
The ping runs continuously; it can be stopped with Ctrl+C.
Fifth, passwordless SSH login to the nodes
This step allows the Master to log in to the Slave nodes over SSH without a password.
Since the earlier pseudo-distributed configuration already set up passwordless login to localhost, you can try the command ssh Slave1 directly. If a prompt appears, answer "yes" to accept the host key; it may take two attempts to succeed.
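If Slave1 still asks for a password (for example because the SSH key was regenerated after cloning), the Master's public key can be appended to the Slave's authorized_keys. A minimal sketch, assuming the default key path ~/.ssh/id_rsa.pub and the same username on both machines:

```shell
# On Master: append the public key to ~/.ssh/authorized_keys on Slave1
# (ssh-copy-id will prompt for the Slave's password one last time)
ssh-copy-id Slave1

# Verify: this should now log in and run the command without a password
ssh Slave1 hostname
```

If ssh-copy-id is unavailable, the same effect can be achieved by copying id_rsa.pub to the Slave with scp and appending it to ~/.ssh/authorized_keys there.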
Sixth, cluster configuration (distributed environment)
This step requires modifying five files in the Hadoop installation's etc/hadoop directory (note: make the same changes on all hosts).
1. slaves file
Delete the original localhost line and write the hostnames of all slaves, one per line. For example, I have only one slave node, so the file contains a single line: Slave1.
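For illustration, if the cluster had a second slave (Slave2 here is hypothetical), the slaves file would simply list both hostnames, one per line:

```
Slave1
Slave2
```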
2. core-site.xml file
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://Master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/hadoop/hadoop-2.6.1/tmp</value>
<description>abase for other temporary directories.</description>
</property>
</configuration>
3. hdfs-site.xml file. Because there is only one slave, the value of dfs.replication is set to 1.
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>Master:50090</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/hadoop/hadoop-2.6.1/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/hadoop/hadoop-2.6.1/tmp/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
4. mapred-site.xml file (if the directory only contains mapred-site.xml.template, first copy it to mapred-site.xml)
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
5. yarn-site.xml file
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
The parts marked in red (the /usr/hadoop/hadoop-2.6.1 paths) need to be modified according to your own installation directory.
6. When switching Hadoop modes, delete the previous temporary files (a note for others learning from this)
When switching Hadoop between modes, whether from cluster to pseudo-distributed or from pseudo-distributed to cluster, if it fails to start properly you can delete the temporary folders on the nodes involved; the previous data will be lost, but this ensures the cluster starts correctly. Alternatively, you can configure different temporary folders for cluster mode and pseudo-distributed mode (not verified). So if the cluster could be started before but now cannot, especially if the DataNode cannot start, try deleting the tmp folder on all nodes (including the Slave nodes), re-run bin/hdfs namenode -format, and start again.
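The recovery procedure above can be sketched as the following commands, assuming the installation directory used in this tutorial (the rm step must be run on every node, including the slaves):

```shell
cd /usr/hadoop/hadoop-2.6.1

# Stop everything first (on Master)
sbin/stop-yarn.sh
sbin/stop-dfs.sh

# On EVERY node (Master and all Slaves): delete the temporary folder
# configured via hadoop.tmp.dir in core-site.xml -- this destroys all HDFS data
rm -rf /usr/hadoop/hadoop-2.6.1/tmp

# On Master only: re-format the NameNode, then start again
bin/hdfs namenode -format
sbin/start-dfs.sh
sbin/start-yarn.sh
```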
Seventh, start-up and verification
1. Start Hadoop on the master node
First go to the installation directory: cd /usr/hadoop/hadoop-2.6.1
The first run requires formatting the NameNode: bin/hdfs namenode -format
Start: sbin/start-dfs.sh
sbin/start-yarn.sh
After successful startup, you can use the jps command to view the processes on each node.
You can see that the Master node has started the NameNode, SecondaryNameNode, and ResourceManager processes.
The Slave node has started the DataNode and NodeManager processes.
The Hadoop cluster is also shut down from the Master node (in the Hadoop installation directory):
sbin/stop-dfs.sh
sbin/stop-yarn.sh
2. Verify: in a browser on the Master or Slave node, open the address http://master:50090
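Besides the web page, you can also check from the command line on the Master whether the DataNode has registered; a sketch, assuming the installation directory used above:

```shell
cd /usr/hadoop/hadoop-2.6.1

# Prints cluster capacity and the list of live DataNodes; a line such as
# "Datanodes available: 1" indicates that Slave1 has joined the cluster
bin/hdfs dfsadmin -report
```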
Analyzing the cause of a startup failure by viewing the start-up logs
Sometimes the Hadoop cluster does not start correctly, for example the NameNode process on Master fails to start. You can check the start-up logs to troubleshoot the cause, but beginners should note a few points. At start-up a message such as "Master: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-Master.out" is printed, but the useful log information is actually recorded in /usr/local/hadoop/logs/hadoop-hadoop-namenode-Master.log. Each start appends to the log file, so scroll to the end and check the record timestamps to find the latest run. The error hints are usually at the very end, where the error or Java exception is written.
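For example, to see the most recent entries on the Master (assuming the log path quoted above, with .out replaced by .log):

```shell
# Show the last lines of the NameNode log, where the latest
# error or Java exception is usually recorded
tail -n 50 /usr/local/hadoop/logs/hadoop-hadoop-namenode-Master.log
```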
You can also view the status of the DataNode and NameNode through the web page at http://master:50070/