Installation and configuration of Hadoop 2.6.1 on Ubuntu virtual machines (fully distributed)


This article builds on the previous one, which explained in detail how to configure single-machine pseudo-distributed Hadoop; this article focuses on a fully distributed configuration. The Hadoop configuration here is mainly summarized from the official website and some tutorials on the web. This is my first build, so if there are errors, thank you for pointing them out.

Pseudo-distributed: http://blog.csdn.net/yuzhuzhong/article/details/49922845


The cluster environment used in this tutorial consists of two virtual machines: one as the Master with LAN IP 192.168.9.131, and the other as the Slave with LAN IP 192.168.9.133. Java environment installation and configuration, Hadoop installation, SSH installation, and so on were described in the previous article and are not repeated here.


I. Complete the pseudo-distributed installation on one virtual machine

Installation and configuration of Hadoop 2.6.1 on an Ubuntu virtual machine (pseudo-distributed)


II. Clone to get the same configuration

1. Shut down the guest machine before cloning.

2. In the virtual machine list on the left, select the virtual machine you want to clone.

3. For the clone type, select "Create a full clone".

4. Keep the defaults for all other options.

III. Network configuration

1. Hadoop must be stopped before the cluster configuration begins: /usr/hadoop/hadoop-2.6.1/sbin/stop-dfs.sh

2. There are two virtual machines, with IP addresses 192.168.9.131 and 192.168.9.133 respectively. Select one of them as the Master (I chose 192.168.9.131), change the machine name in its /etc/hostname file to Master, and change the other to Slave1.

3. In the /etc/hosts file, write in the host information of all nodes in the cluster (a sample is given below).

There should be only one 127.0.0.1 line, and it should map to localhost; otherwise errors occur.
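
With the IP addresses used in this tutorial (yours may differ), /etc/hosts on every node would look roughly like this:

127.0.0.1       localhost
192.168.9.131   Master
192.168.9.133   Slave1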

4. Note that the network configuration must be performed on all hosts.

The above describes the configuration on the Master host; on every Slave host you must likewise modify /etc/hostname (to Slave1, Slave2, etc.) and /etc/hosts (generally the same content as on the Master). A command sketch follows at the end of this section.

5. It is best to reboot afterwards so that the hostname change takes effect.

6. Do not forget to rename the virtual machines accordingly (this does not seem to affect anything; it just makes them easier to tell apart).
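
A minimal command sketch of step 4 on the Slave machine (assuming an Ubuntu guest with sudo rights; any text editor can replace nano):

sudo nano /etc/hostname     # replace the single line with: Slave1
sudo nano /etc/hosts        # use the same Master/Slave1 entries as on the Master
sudo reboot                 # reboot so the new hostname takes effect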


IV. Ping connectivity between virtual machines

I built the two virtual machines on the same physical machine and used NAT networking, so their IP addresses are in the same subnet and they can ping each other.

The ping process keeps running; it can be stopped with Ctrl+C.
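
For example, to check connectivity in both directions (-c simply limits the number of packets so the command ends on its own):

ping -c 3 Slave1      # run on the Master
ping -c 3 Master      # run on Slave1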

V. Passwordless SSH login from the Master to the slave nodes

This step allows the Master to log in to the slave nodes over SSH without a password.

Since the earlier pseudo-distributed setup already enabled passwordless login to localhost (and the Slave is a clone with the same key files), you can try the command directly: ssh Slave1. If you are prompted, type "yes" to continue; it may take two attempts to succeed.
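
If ssh Slave1 still asks for a password (for example because the machines were not cloned from the same image), a common way to set up the key by hand is roughly as follows; the user name hadoop is only an assumption, use whichever account runs Hadoop on your machines:

ssh-keygen -t rsa              # on the Master, generate a key pair if none exists (accept the defaults)
ssh-copy-id hadoop@Slave1      # append the public key to Slave1's authorized_keys (asks for Slave1's password once)
ssh Slave1                     # should now log in without a password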


VI. Cluster configuration (distributed environment)

This time you need to modify five files under the Hadoop installation directory, /usr/hadoop/hadoop-2.6.1/etc/hadoop (note that the same changes must be made on all hosts; a copy command is sketched after the file list).

1. slaves file

Delete the original localhost line and write the hostname of every slave, one per line. For example, I have only one slave node, so the file contains a single line: Slave1.
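
A minimal sketch of that edit, using the installation path from this article:

cd /usr/hadoop/hadoop-2.6.1/etc/hadoop
echo Slave1 > slaves      # overwrites the default localhost entry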

2. core-site.xml file

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/hadoop/hadoop-2.6.1/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
</configuration>

3. hdfs-site.xml file. Because there is only one slave, the value of dfs.replication is set to 1.

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Master:50090</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/hadoop/hadoop-2.6.1/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/hadoop/hadoop-2.6.1/tmp/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

4. mapred-site.xml file
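
In the Hadoop 2.6.1 distribution this file does not exist by default; it is normally created from the shipped template before editing, roughly like this:

cd /usr/hadoop/hadoop-2.6.1/etc/hadoop
cp mapred-site.xml.template mapred-site.xml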

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

5. yarn-site.xml file

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>Master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

The paths such as /usr/hadoop/hadoop-2.6.1 in the files above need to be modified according to your own installation directory.
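
Since the same changes must be made on every host, one way to push the modified files from the Master to Slave1 is sketched below (this assumes the same installation path on both machines and that the passwordless SSH from section V works):

cd /usr/hadoop/hadoop-2.6.1/etc/hadoop
scp slaves core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml Slave1:/usr/hadoop/hadoop-2.6.1/etc/hadoop/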

6. When switching Hadoop modes, delete the previous temporary files first (a note for others' reference)

When switching the mode of Hadoop, whether from a cluster to pseudo-distributed or from pseudo-distributed to a cluster, if Hadoop does not start properly you can delete the temporary folders of the nodes involved. Although this deletes the previous data, it ensures that the cluster starts correctly. Alternatively, you can configure different temporary folders for cluster mode and pseudo-distributed mode (not verified). So if the cluster could be started before but now cannot, and especially if the DataNode cannot start, try deleting the tmp folder on all nodes (including the Slave nodes), re-running bin/hdfs namenode -format, and starting again.
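
A rough sketch of that recovery procedure with this article's paths (note that it wipes all HDFS data stored under tmp):

rm -rf /usr/hadoop/hadoop-2.6.1/tmp      # on every node, including Slave1
cd /usr/hadoop/hadoop-2.6.1              # then on the Master only:
bin/hdfs namenode -format
sbin/start-dfs.sh
sbin/start-yarn.sh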

VII. Startup and verification

1. Start Hadoop on the master node

First go to the installation directory: cd /usr/hadoop/hadoop-2.6.1

The first run needs a format: bin/hdfs namenode -format

Start: sbin/start-dfs.sh

sbin/start-yarn.sh

After successful startup, you can use the jps command to view the Java processes running on each node.

You should see that the Master node has started the NameNode, SecondaryNameNode, and ResourceManager processes.

The Slave node starts the DataNode and NodeManager processes.
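
As an extra check not mentioned in the original steps, you can ask the NameNode how many DataNodes have registered (run from the installation directory on the Master):

bin/hdfs dfsadmin -report      # the report should list one live DataNode (Slave1)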

Shutting down the Hadoop cluster is also done on the Master node (from the Hadoop installation directory, /usr/hadoop/hadoop-2.6.1):

sbin/stop-dfs.sh

sbin/stop-yarn.sh

2. Verify that the following address can be opened in a browser on the Master or Slave node: http://master:50090


Analyzing the cause of a startup failure by viewing the startup log

Sometimes the Hadoop cluster does not start correctly, for example the NameNode process on the Master does not start. You can view the startup log to troubleshoot the cause, but newcomers should be aware of a few points: the startup prompt reads something like "Master: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-Master.out", but the startup log information is actually recorded in /usr/local/hadoop/logs/hadoop-hadoop-namenode-Master.log. Each startup appends to the log file, so you have to scroll to the end and check the timestamps to find the latest run. The useful hints are usually at the end, where the error or Java exception is written.
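
A quick way to jump to the most recent entries is to tail the .log file; the exact path depends on your installation directory, so with this article's layout it would be under /usr/hadoop/hadoop-2.6.1/logs:

tail -n 100 /usr/hadoop/hadoop-2.6.1/logs/hadoop-*-namenode-*.log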

You can also view the status of the DataNode and NameNode through the web page at http://master:50070/.

