This article builds on the previous one, which explained in detail how to configure single-machine pseudo-distributed Hadoop; this article focuses on a fully distributed configuration. The setup described here is summarized mainly from the official documentation and some tutorials found online. This is my first build, so thank you for pointing out any errors.
Pseudo-distributed: http://blog.csdn.net/yuzhuzhong/article/details/49922845
The cluster environment used in this tutorial: two virtual machines, one as the Master with LAN IP 192.168.9.131, the other as the Slave with LAN IP 192.168.9.133. Java environment setup, Hadoop installation, SSH installation, etc. were covered in the previous article and will not be repeated here.
First, complete the pseudo-distributed installation on a virtual machine
Installation and configuration of virtual machine Ubuntu under Hadoop2.6.1 (Pseudo-distributed)
Second, cloning to get the same configuration
1. Shut down the virtual machine before cloning
2. On the left side of the image below, select the virtual machine you want to clone.
3. For the clone type, select "Create a full clone"
4. Keep the defaults for everything else
Third, network configuration
1. Hadoop must be stopped before the cluster configuration: /usr/hadoop/hadoop-2.6.1/sbin/stop-dfs.sh
2. There are two virtual machines, with IP addresses 192.168.9.131 and 192.168.9.133. Select one of them as the master (I chose 192.168.9.131), then change the machine name to Master in its /etc/hostname file, and change the other to Slave1.
3. In the /etc/hosts file, write in the host information for all machines in the cluster, as shown in the figure.
There can be only one 127.0.0.1 entry, corresponding to localhost; otherwise an error will occur.
4. Note that the network configuration must be performed on all hosts.
The above is the configuration on the Master host; on each Slave host, the /etc/hostname file (changed to Slave1, Slave2, etc.) and the /etc/hosts file (generally the same as on Master) must be modified accordingly.
5. It's a good idea to restart afterwards so the hostname change takes effect.
6. Do not forget to rename the virtual machines to match (this doesn't seem to affect anything; it just makes them easier to tell apart)
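As a concrete illustration of step 3 (using the IPs and hostnames chosen above), the /etc/hosts file on every node might look like this:

```
127.0.0.1       localhost
192.168.9.131   Master
192.168.9.133   Slave1
```

Note that there is no second 127.0.0.1 or 127.0.1.1 line mapping to the machine's own hostname.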
Fourth, test connectivity between the virtual machines with ping
I built both virtual machines on the same physical machine using NAT networking, so their IP addresses are on the same subnet and they can ping each other, as shown in the figure.
The ping runs continuously; it can be stopped with Ctrl+C.
Fifth, passwordless SSH login to the nodes
This step allows the Master to log in to the Slave nodes over SSH without a password.
Since the earlier pseudo-distributed configuration already set up passwordless login to localhost, you can try the command ssh Slave1 directly. If a prompt appears, answer "yes" to accept the host key; it may take two attempts to succeed.
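If Slave1 still asks for a password (for example because the SSH key was regenerated after cloning), the Master's public key can be appended to the Slave's authorized_keys. A minimal sketch, assuming the default key path ~/.ssh/id_rsa.pub and the same username on both machines:

```shell
# On Master: append the public key to ~/.ssh/authorized_keys on Slave1
# (ssh-copy-id will prompt for the Slave's password one last time)
ssh-copy-id Slave1

# Verify: this should now log in and run the command without a password
ssh Slave1 hostname
```

If ssh-copy-id is unavailable, the same effect can be achieved by copying id_rsa.pub to the Slave with scp and appending it to ~/.ssh/authorized_keys there.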
Sixth, cluster configuration (distributed environment)
This step requires modifying five files in the Hadoop installation's etc/hadoop directory (note: make the same changes on all hosts).
1. slaves file
Delete the original localhost line and write the hostnames of all slaves, one per line. For example, I have only one slave node, so the file contains a single line: Slave1.
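For illustration, if the cluster had a second slave (Slave2 here is hypothetical), the slaves file would simply list both hostnames, one per line:

```
Slave1
Slave2
```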
2. core-site.xml file
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://Master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/hadoop/hadoop-2.6.1/tmp</value>
<description>abase for other temporary directories.</description>
</property>
</configuration>
3. hdfs-site.xml file. Because there is only one slave, the value of dfs.replication is set to 1.
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>Master:50090</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/hadoop/hadoop-2.6.1/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/hadoop/hadoop-2.6.1/tmp/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
4. mapred-site.xml file (if the directory only contains mapred-site.xml.template, first copy it to mapred-site.xml)
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
5. yarn-site.xml file
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
The parts marked in red (the /usr/hadoop/hadoop-2.6.1 paths) need to be modified according to your own installation directory.
6. When switching Hadoop modes, delete the previous temporary files (a note for others learning from this)
When switching Hadoop between modes, whether from cluster to pseudo-distributed or from pseudo-distributed to cluster, if it fails to start properly you can delete the temporary folders on the nodes involved; the previous data will be lost, but this ensures the cluster starts correctly. Alternatively, you can configure different temporary folders for cluster mode and pseudo-distributed mode (not verified). So if the cluster could be started before but now cannot, especially if the DataNode cannot start, try deleting the tmp folder on all nodes (including the Slave nodes), re-run bin/hdfs namenode -format, and start again.
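The recovery procedure above can be sketched as the following commands, assuming the installation directory used in this tutorial (the rm step must be run on every node, including the slaves):

```shell
cd /usr/hadoop/hadoop-2.6.1

# Stop everything first (on Master)
sbin/stop-yarn.sh
sbin/stop-dfs.sh

# On EVERY node (Master and all Slaves): delete the temporary folder
# configured via hadoop.tmp.dir in core-site.xml -- this destroys all HDFS data
rm -rf /usr/hadoop/hadoop-2.6.1/tmp

# On Master only: re-format the NameNode, then start again
bin/hdfs namenode -format
sbin/start-dfs.sh
sbin/start-yarn.sh
```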
Seventh, start-up and verification
1. Start Hadoop on the master node
First go to the installation directory: cd /usr/hadoop/hadoop-2.6.1
The first run requires formatting the NameNode: bin/hdfs namenode -format
Start: sbin/start-dfs.sh
sbin/start-yarn.sh
After successful startup, you can use the jps command to view the processes on each node.
You can see that the Master node has started the NameNode, SecondaryNameNode, and ResourceManager processes.
The Slave node has started the DataNode and NodeManager processes.
The Hadoop cluster is also shut down from the Master node (in the Hadoop installation directory):
sbin/stop-dfs.sh
sbin/stop-yarn.sh
2. Verify: in a browser on the Master or Slave node, open the address http://master:50090
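Besides the web page, you can also check from the command line on the Master whether the DataNode has registered; a sketch, assuming the installation directory used above:

```shell
cd /usr/hadoop/hadoop-2.6.1

# Prints cluster capacity and the list of live DataNodes; a line such as
# "Datanodes available: 1" indicates that Slave1 has joined the cluster
bin/hdfs dfsadmin -report
```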
Analyzing the cause of a startup failure by viewing the start-up logs
Sometimes the Hadoop cluster does not start correctly, for example the NameNode process on Master fails to start. You can check the start-up logs to troubleshoot the cause, but beginners should note a few points. At start-up a message such as "Master: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-Master.out" is printed, but the useful log information is actually recorded in /usr/local/hadoop/logs/hadoop-hadoop-namenode-Master.log. Each start appends to the log file, so scroll to the end and check the record timestamps to find the latest run. The error hints are usually at the very end, where the error or Java exception is written.
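For example, to see the most recent entries on the Master (assuming the log path quoted above, with .out replaced by .log):

```shell
# Show the last lines of the NameNode log, where the latest
# error or Java exception is usually recorded
tail -n 50 /usr/local/hadoop/logs/hadoop-hadoop-namenode-Master.log
```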
You can also view the status of the DataNode and NameNode through the web page at http://master:50070/