Building Hadoop on Ubuntu: A Tour of the Pitfalls (Part 3)


The previous two articles described how to go from a fresh Ubuntu install to a working JDK. This article was originally going to cover building a pseudo-distributed cluster, but since a pseudo-distributed setup and a fully distributed one are almost identical, it goes straight to the fully distributed cluster.

If you want to build a pseudo-distributed cluster to play with, refer to: Install Ubuntu under VMware and deploy a Hadoop 1.2.1 distributed environment - CSDN Blog

This article mainly follows this post: Hadoop 2.6.0 installation - cluster (for steps not detailed during the build here, see the original blog)

First, the required environment and software (this is our environment, for reference only):

1. Host operating system: Windows 10 64-bit

2. Memory: 4 GB or more (4 GB is enough to build the cluster, but the virtual machines may run slowly; in that case consider dual-booting instead)

3. VMware Workstation 12: vmware-workstation-full-12.5.7-5813279.exe

4. VMware Tools: installed through VMware

5. Ubuntu: ubuntu-14.04.5-desktop-amd64.iso or ubuntu-16.04.3-desktop-amd64.iso (both versions have worked for our team, but the newer one runs more smoothly)

6. SSH: installed via a Linux command

7. JDK 1.8: jdk-8u11-linux-x64.tar.gz

8. Hadoop 2.6.0: hadoop-2.6.0.tar.gz

Second, building the cluster (we use three machines as an example: one Master and two Slaves; in the virtual machine settings, set the network adapter to bridged mode)

1. To allow the machines to reach each other by name, we need to modify the /etc/hosts file.

First we need to know the IP address of each machine; check it with ifconfig.

Use the ping command to test whether you can reach the other machines:

ping <IP address>

Press Ctrl+C to stop.

Once you know the IP of each host, modify the hosts file (do the same configuration on every host): sudo gedit /etc/hosts

Modify it to look like the following

192.168.31.61 Master

192.168.31.29 Slave1

192.168.31.34 Slave2

After you modify it, you can use ping Slave1 to test whether the machines can reach each other.
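For example, a quick connectivity check from Master (the -c option just limits the number of pings):

$ping -c 3 Slave1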

2. Configure SSH Login

SSH is configured so that the machines can log in to each other over SSH without a password.

To install SSH:

sudo apt-get install ssh

Then generate the SSH key pair and add the public key to the authorized keys:

$cd ~/.ssh                              # if this directory does not exist, run ssh localhost once to create it
$rm ./id_rsa*                           # delete any previously generated keys (if present)
$ssh-keygen -t rsa                      # press Enter at every prompt
$cat ./id_rsa.pub >> ./authorized_keys  # add the public key to the authorized keys

Upload the public key generated on master to each slave node

$scp ~/.ssh/id_rsa.pub hadoop@Slave1:/home/hadoop/
$scp ~/.ssh/id_rsa.pub hadoop@Slave2:/home/hadoop/

hadoop here refers to the user name; if the user names on the machines in your cluster are different, just change it to the corresponding user name.

After Master transmits its public key to each node, each slave adds the public key to its authorized keys, so that Master can log in to every machine over SSH without a password.

$cd ~/.ssh

$cat /home/hadoop/id_rsa.pub >> ./authorized_keys   # add Master's public key to the authorized keys; change /home/hadoop/ to your own directory

Once every slave machine has completed the steps above, Master can log in to each of them over SSH.

$ssh Slave1   # log in to Slave1; on success no password is asked for, and the prompt changes to that of the remote host

$exit         # log out

Note: if the user name on the slave machine is not the same as the one you are logged in with on Master, a plain ssh Slave1 may fail to log in. In that case use: ssh <user name>@Slave1
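For example, assuming the account on the slave is called hadoop (substitute your own user name):

$ssh hadoop@Slave1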

If all of this works, SSH is set up successfully.

3. Configure the cluster's XML files

Put Hadoop in the same directory on every machine. (When the cluster starts, Master connects to every slave over SSH and starts Hadoop from that same directory.)

There are 5 configuration files that you must configure, and one that you might want to configure.

Necessary: slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml.

Possible: hadoop-env.sh

First, the hadoop-env.sh file. Why is it only "possible"? Because you only need to touch it if the JDK path differs between machines; in that case, set the JDK path in the file:

(Screenshot omitted; the highlighted line is where you set your JDK path.)
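For example, assuming the JDK from the earlier article was unpacked to /usr/lib/jvm/jdk1.8.0_11 (substitute your actual path), the relevant line in hadoop-env.sh would be:

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_11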

The slaves file:

Write the host name of every machine that should act as a DataNode into this file, one per line. The default content is localhost, which is why, in a pseudo-distributed setup, the single node acts as both NameNode and DataNode. In a distributed setup you can keep localhost or delete it; if you delete it, the Master node is used only as the NameNode. Here we simply add Slave1 and Slave2, as shown below.
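So on Master the slaves file simply contains:

Slave1
Slave2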

The core-site.xml file:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
</configuration>

The hdfs-site.xml file:

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Master:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>

dfs.replication is the number of copies of each HDFS data block; it typically defaults to 3, but we only have two DataNodes, so we change it to 2.

The mapred-site.xml file (the file ships as mapred-site.xml.template; rename it to mapred-site.xml first), then modify the configuration as follows:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>Master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>Master:19888</value>
    </property>
</configuration>

The yarn-site.xml file:

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>Master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

OK, the configuration is complete.

4. Replicate the same configuration to each slave machine

Copy the /usr/local/hadoop folder to each slave, either with a USB flash drive or by remote copy:

$cd /usr/local
$sudo rm -r ./hadoop/tmp                    # delete the Hadoop temporary files
$sudo rm -r ./hadoop/logs/*                 # delete the log files
$tar -zcf ~/hadoop.master.tar.gz ./hadoop   # compress first, then copy
$cd ~
$scp ./hadoop.master.tar.gz Slave1:/home/hadoop   # repeat for Slave2

After the copy finishes, extract the copied archive directly on the Slave1 and Slave2 nodes (every node must end up with the same configuration as the master node):

$sudo rm -r /usr/local/hadoop                 # remove the old one (if it exists)
$sudo tar -zxf ~/hadoop.master.tar.gz -C /usr/local
$sudo chown -R hadoop /usr/local/hadoop       # hadoop is the user name; use your own

5. Start the Hadoop cluster (run these on Master):

$cd /usr/local/hadoop                           # your Hadoop folder
$hdfs namenode -format                          # format the NameNode (only needed before the first start)
$start-dfs.sh                                   # start HDFS
$start-yarn.sh                                  # start the YARN framework
$mr-jobhistory-daemon.sh start historyserver    # start the job history server
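These commands assume Hadoop's bin and sbin directories are on your PATH; if they are not, either call the scripts with their full paths or add something like the following to ~/.bashrc (a sketch, assuming the /usr/local/hadoop directory used above) and then run source ~/.bashrc:

export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin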

Then use jps to view the daemons on each node:

Master node, Slave1, and Slave2 (screenshots omitted here; the images in the original post were taken from the reference blog).
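For reference, with the configuration above jps should show roughly the following daemons on each machine (these are the process names of a standard Hadoop 2.6 cluster; the PID column will of course differ):

On Master: NameNode, SecondaryNameNode, ResourceManager, JobHistoryServer, Jps

On Slave1 and Slave2: DataNode, NodeManager, Jps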

At this point you can also check the node status in a web browser:

1. Visit http://localhost:50070 to see the number of nodes in the Hadoop cluster, the NameNode, and the status of the whole distributed file system (if Live Nodes is not 0, the cluster is up).

2. The JobTracker UI on port 50030 only exists in Hadoop 1.x; with the YARN setup above, job information (job progress, number of maps, number of reduces) is available from the ResourceManager UI on port 8088 and the job history server at http://localhost:19888.

3. Access http://localhost:8088 to view node status, etc.

6. Stop the cluster

$stop-yarn.sh
$stop-dfs.sh
$mr-jobhistory-daemon.sh stop historyserver

* Note: when you run into problems, the log files are a good place to look.

Pitfalls:

Port 9000: jps shows a DataNode, but Live Nodes is 0. The port 9000 problem really is a strange one; we were once stuck on it for a long time. After trying a lot of approaches, this is basically the solution:

Uninstall the firewall:

sudo apt-get remove iptables

Make the following changes to the /etc/hosts file:

127.0.0.1 localhost

127.0.1.1 localhost.localdomain localhost

0.0.0.0 Master

Other posts suggest something like this instead:

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4

::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

10.20.77.172 hadoop-master

10.20.77.173 hadoop-slave1

10.20.77.174 hadoop-slave2

10.20.77.175 hadoop-slave3

There are various versions of this fix online; we are still not completely sure which is correct, so try them for yourself.
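One way to see whether the change helped is to check which address the NameNode is actually listening on for port 9000 (a quick check, assuming the net-tools package is installed):

$sudo netstat -tlnp | grep 9000   # the NameNode should be listening on the Master IP or 0.0.0.0, not on 127.0.1.1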

DataNode not started:

Delete everything in the tmp folder under the Hadoop directory on all machines.
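A minimal sketch of that cleanup, assuming the hadoop.tmp.dir of /usr/local/hadoop/tmp configured above (note that re-formatting the NameNode afterwards wipes all HDFS data):

$rm -r /usr/local/hadoop/tmp/*   # run on every node after stopping the cluster
$hdfs namenode -format           # then re-format the NameNode on Master and restart the cluster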
