Hadoop 2.x (2.7.1, 2.7.2) cluster configuration and test run: NAT mode in the VM settings on an Ubuntu host causes DataNode transfer problems


Cluster setups are all much alike, so I will only briefly describe mine:

The master node runs Ubuntu 14.04 LTS x64. The other two nodes run CentOS 6.4 x64 in virtual machines.

The JDK is 1.7.0_80.

I tried both Hadoop 2.7.1 and 2.7.2.

The problem is as follows.

HDFS starts without errors and every daemon comes up. jps shows:

Master node: SecondaryNameNode and NameNode

Slave nodes: DataNode

But the command hdfs dfsadmin -report shows only 1 live DataNode, and when the report is run at different times the live node alternates: sometimes it is datanode1, sometimes datanode2.

The output looks like this:

[email protected]:modules$ hdfs dfsadmin -report
Configured Capacity: 16488800256 (15.36 GB)
Present Capacity: 13008093184 (12.11 GB)
DFS Remaining: 13008068608 (12.11 GB)
DFS Used: 24576 (24 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (1):

Name: 192.168.2.3:50010 (hadoop)
Hostname: hadoop1
Decommission Status : Normal
Configured Capacity: 16488800256 (15.36 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 3480969216 (3.24 GB)
DFS Remaining: 13007806464 (12.11 GB)
DFS Used%: 0.00%
DFS Remaining%: 78.89%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon May 17:30:08 CST 2016

Running the report again:

[email protected]:modules$ hdfs dfsadmin -report
Configured Capacity: 16488800256 (15.36 GB)
Present Capacity: 13008007168 (12.11 GB)
DFS Remaining: 13007982592 (12.11 GB)
DFS Used: 24576 (24 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (1):

Name: 192.168.2.3:50010 (hadoop)
Hostname: hadoop2
Decommission Status : Normal
Configured Capacity: 16488800256 (15.36 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 3480793088 (3.24 GB)
DFS Remaining: 13007982592 (12.11 GB)
DFS Used%: 0.00%
DFS Remaining%: 78.89%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon May 17:34:06 CST 2016

This is strange. The web UI on port 50070 likewise shows only 1 live DataNode at any one time, and when the page is refreshed the live node changes, just as in the reports above.
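One detail in the two reports is easy to miss: both list the same Name address, 192.168.2.3:50010, and differ only in the Hostname field. The sketch below is one way to make that visible by comparing several saved reports. It assumes the usual Name: and Hostname: line layout of the report; the function names live_datanodes and address_conflicts are made up for this illustration and are not part of Hadoop.

```python
import re

def live_datanodes(report_text):
    """Return (address, hostname) pairs found in `hdfs dfsadmin -report` text."""
    nodes = []
    address = None
    for raw in report_text.splitlines():
        line = raw.strip()
        name_match = re.match(r"Name:\s*(\S+)", line)
        if name_match:
            address = name_match.group(1)
            continue
        host_match = re.match(r"Hostname:\s*(\S+)", line)
        if host_match and address is not None:
            nodes.append((address, host_match.group(1)))
            address = None
    return nodes

def address_conflicts(reports):
    """Return addresses that were reported under more than one hostname."""
    seen = {}
    for text in reports:
        for address, hostname in live_datanodes(text):
            seen.setdefault(address, set()).add(hostname)
    return {addr: sorted(hosts) for addr, hosts in seen.items() if len(hosts) > 1}

# Trimmed copies of the two reports shown above.
first = """Live datanodes (1):

Name: 192.168.2.3:50010 (hadoop)
Hostname: hadoop1
"""
second = """Live datanodes (1):

Name: 192.168.2.3:50010 (hadoop)
Hostname: hadoop2
"""

print(address_conflicts([first, second]))
# prints {'192.168.2.3:50010': ['hadoop1', 'hadoop2']}
```

An address that shows up under two hostnames suggests that the NameNode cannot tell the two DataNodes apart by address, which would fit the alternating behavior.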

At first I did not notice this. I had created a directory with dfs -mkdir /test and then put a file, at which point an I/O exception occurred. The main content of the exception was:

hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException:
could only be replicated to 0 nodes, instead of 1

Puzzled, I first checked the firewall and SELinux on each machine, then had the nodes ping each other, then tested the SSH connections. None of these showed a problem.

Everything looked normal, so why the transfer exception?

Part of the DataNode log follows:

2016-05-10 01:29:54,148 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool Block pool BP-1877985316-192.168.2.3-1462786104060 (Datanode Uuid c31e3853-b15e-46d8-abd0-ac1d1ed4572b) service to hadoop/192.168.2.3:9000 successfully registered with NN
2016-05-10 01:29:54,151 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Successfully sent block report 0x44419c23fe, containing 1 storage report(s), of which we sent 1. The reports had 0 total blocks and used 1 RPC(s). This took 0 msec to generate and 2 msecs for RPC and NN processing. Got back one command: FinalizeCommand/5.
2016-05-10 01:29:54,152 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Got finalize command for block pool BP-1877985316-192.168.2.3-1462786104060
2016-05-10 01:29:57,150 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeCommand action: DNA_REGISTER from hadoop/192.168.2.3:9000 with active state
2016-05-10 01:29:57,153 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1877985316-192.168.2.3-1462786104060 (Datanode Uuid c31e3853-b15e-46d8-abd0-ac1d1ed4572b) service to hadoop/192.168.2.3:9000 beginning handshake with NN

This sequence of log entries repeats every 2-3 seconds, and the other node shows the same thing.
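A DataNode that keeps being told to register again is the visible symptom here, so it can help to measure how often that happens. The following is a minimal sketch that pulls the timestamps of the re-register commands out of a DataNode log and prints the gaps between them. It assumes each log line starts with a 23-character timestamp as in the excerpt above, and it matches the command name case-insensitively. The helper names are invented for this example, and the last line of the sample log is hypothetical, added only so that there is a gap to compute.

```python
from datetime import datetime

STAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

def reregister_times(log_text):
    """Timestamps of lines where the NameNode asked this DataNode to re-register."""
    times = []
    for line in log_text.splitlines():
        if "dna_register" in line.lower():
            # The first 23 characters hold a stamp like 2016-05-10 01:29:57,150
            times.append(datetime.strptime(line[:23], STAMP_FORMAT))
    return times

def gaps_in_seconds(times):
    """Seconds between consecutive timestamps."""
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]

sample_log = "\n".join([
    # Taken from the log excerpt above.
    "2016-05-10 01:29:57,150 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: "
    "DatanodeCommand action: DNA_REGISTER from hadoop/192.168.2.3:9000 with active state",
    "2016-05-10 01:29:57,153 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: "
    "Block pool ... beginning handshake with NN",
    # Hypothetical next cycle, 3 seconds later, for illustration only.
    "2016-05-10 01:30:00,150 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: "
    "DatanodeCommand action: DNA_REGISTER from hadoop/192.168.2.3:9000 with active state",
])

print(gaps_in_seconds(reregister_times(sample_log)))
# prints [3.0]
```

On a healthy cluster a DataNode registers once at startup, so a steady stream of short gaps like this points at the registration itself rather than at the data transfer.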

I also watched the network activity indicator in the lower right corner of the virtual machine window. It flashed roughly once a second, as if the master node and the slave nodes were talking to each other every second. At first I paid no attention and assumed it was the heartbeat.

As I understand it, in the normal case the indicator should only blink when a request is actually sent.

I then checked the official documentation and other online resources and questions, and tried everything they suggested, with no luck. I was starting to despair.

I made small changes to some of my configuration files, but as long as the basic settings are correct, small changes make little difference. These are the files I configured:

core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/modules/hadoop-2.7.2/data/tmp</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/modules/hadoop-2.7.2/data/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/modules/hadoop-2.7.2/data/dfs/data</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>

The problem only concerns the HDFS file system, so mapred-site.xml and yarn-site.xml are not listed here.

After countless rounds of formatting the NameNode and deleting the data and the files in the tmp directory, I began to suspect the Hadoop version and switched from 2.7.1 to 2.7.2. Then I began to suspect Ubuntu. I moved to my Win7 system, ran all three nodes in virtual machines there, and uploaded a file successfully. So I concluded that the problem lay on the Ubuntu side.

Since everything pointed at the connection, and SSH itself worked, I started to wonder whether the virtual machine network was at fault. I changed the DNS setting a few times, from the IP xx.xx.xx.1 of the VM NAT adapter on Ubuntu to xx.xx.xx.1, the IP of the router that gives my Ubuntu machine access to the outside network. Nothing looked wrong: each node could ping itself, the gateway, the other nodes and the outside network without trouble. So the problem was not there either.

Finally I tried one last thing: I changed the network connection mode in the VM settings from NAT to bridged mode and configured an external DNS. Then I reformatted the NameNode, started HDFS, checked the DataNode report and uploaded a test file. Everything went well. The change finally worked.


Putting a file now succeeds.

The web UI now shows both live nodes, instead of only one at a time as before.

Personal summary:

The symptom: HDFS starts and the daemons on every node come up normally, but the number of live DataNodes is wrong and putting a file fails with an I/O error.

The cause is usually one of the following:

1. A problem in the slaves file or in hdfs-site.xml.

2. A problem with SSH between the nodes.

3. A network problem: the gateway, DNS, IP configuration or the hosts file.

4. If all of the above are set correctly, test the link in a different way, for example by switching the virtual machine between NAT and bridged mode.

For a relatively rare problem like this, these are the places to start, and the only option is to try them one by one.
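Item 3 of the checklist can be partly automated. The sketch below reads the text of a hosts file and flags cluster hostnames that are missing, that map to more than one address, that map to a loopback address, or that share an address with another cluster node. The function names are made up for this example, and the sample addresses for hadoop1 are hypothetical; only the hostnames hadoop, hadoop1, hadoop2 and the address 192.168.2.3 come from the output above. Note that this check would not have caught the NAT problem described in this post; it only covers the hosts-file case.

```python
def parse_hosts(text):
    """Map each hostname in hosts-file text to the list of addresses given for it."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name, []).append(address)
    return mapping

def hosts_problems(text, cluster_names):
    """Return human-readable problems for the given cluster hostnames."""
    mapping = parse_hosts(text)
    problems = []
    owners = {}
    for name in cluster_names:
        addresses = sorted(set(mapping.get(name, [])))
        if not addresses:
            problems.append(f"{name}: no entry")
            continue
        if len(addresses) > 1:
            problems.append(f"{name}: maps to several addresses")
        for address in addresses:
            if address.startswith("127."):
                problems.append(f"{name}: maps to loopback {address}")
            owners.setdefault(address, set()).add(name)
    for address, names in sorted(owners.items()):
        if len(names) > 1:
            problems.append(f"{address}: shared by {', '.join(sorted(names))}")
    return problems

# Hypothetical hosts file with a typical Ubuntu-style 127.0.1.1 entry.
sample_hosts = """
127.0.0.1     localhost
127.0.1.1     hadoop
192.168.2.3   hadoop
192.168.2.11  hadoop1   # hypothetical address
"""

for problem in hosts_problems(sample_hosts, ["hadoop", "hadoop1", "hadoop2"]):
    print(problem)
```

Running the same check on every node, not just the master, matters because each node resolves the others through its own hosts file.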

I do not fully understand why the NAT connection of the virtual machines on Ubuntu causes this behavior, where only one node gets through at a time, and I would like to know. One more observation:

Only Hadoop's transfers were affected. Everything else in the virtual machines, such as ping, Internet access, file transfer and SSH access, worked smoothly.
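One possible explanation, offered as a guess rather than a confirmed diagnosis: in the reports both DataNodes appear under the address 192.168.2.3:50010, which is also the NameNode's own address in the log. Under NAT, connections from the guests may reach the host with a translated source address, so the NameNode could see both DataNodes as the same address and let each new registration displace the previous one. The toy model below only illustrates that idea. ToyNameNode is not Hadoop code, the class and method names are invented, and the two bridged addresses are hypothetical.

```python
class ToyNameNode:
    """Toy registry that, like the guess above, keeps one DataNode per address."""

    def __init__(self):
        self.by_address = {}   # "ip:port" -> DataNode id

    def register(self, node_id, address):
        # A new registration on an address replaces whoever held it before.
        self.by_address[address] = node_id

    def heartbeat(self, node_id, address):
        # A node that is no longer the holder of its address is told to re-register.
        if self.by_address.get(address) == node_id:
            return "ok"
        return "DNA_REGISTER"

    def live(self):
        return sorted(self.by_address.values())

def simulate(addresses):
    """Register every node, then run one heartbeat round; return the live nodes."""
    namenode = ToyNameNode()
    for node_id, address in addresses.items():
        namenode.register(node_id, address)
    for node_id, address in addresses.items():
        if namenode.heartbeat(node_id, address) == "DNA_REGISTER":
            namenode.register(node_id, address)
    return namenode.live()

# NAT: both nodes are seen under the same address (taken from the reports above).
nat = {"dn1": "192.168.2.3:50010", "dn2": "192.168.2.3:50010"}
# Bridged: each node has its own address (hypothetical values).
bridged = {"dn1": "192.168.2.11:50010", "dn2": "192.168.2.12:50010"}

print(simulate(nat))      # prints ['dn2']
print(simulate(bridged))  # prints ['dn1', 'dn2']
```

In the NAT case the two nodes keep evicting each other on every heartbeat round, which would match both the single alternating live node and the re-register entries repeating in the log. Ping and SSH would be unaffected, because they do not depend on how the NameNode identifies its peers.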
