Summary of Hadoop configuration and runtime errors


Newcomers to Hadoop are most troubled by all kinds of problems. I have sorted out the problems I ran into myself, together with their solutions, in the hope that they help you.

First, after the NameNode of the cluster is reformatted (bin/hadoop namenode -format) and the cluster is restarted, the following error appears (the problem is very obvious, there is basically no doubt about the cause):

Incompatible namespaceIDs in ...: namenode namespaceID = ...; datanode namespaceID = ...

The error occurs because formatting the NameNode creates a new namespaceID that no longer matches the one recorded on the existing DataNodes.

Workaround:

Delete the data files under the DataNode's dfs.data.dir directory (default tmp/dfs/data); or

Edit the dfs.data.dir/current/VERSION file so that its namespaceID matches the NameNode's (the expected value appears in the error message in the log); or

Re-specify a new dfs.data.dir directory.
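
A minimal sketch of the first two options, assuming a storage path of /tmp/hadoop/dfs/data (substitute your configured dfs.data.dir):

# Option 1: wipe the DataNode's storage (any blocks held on this node are lost)
rm -rf /tmp/hadoop/dfs/data
# Option 2: keep the data and just align the namespaceID with the NameNode's,
# using the value printed in the DataNode log
vi /tmp/hadoop/dfs/data/current/VERSION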

Second, when the cluster is started with start-all.sh, the slaves repeatedly fail to start the DataNode and report the error below. (It is mostly a network problem, so solutions 2 and 3 cover the majority of cases. Solution 1 is generally not recommended because data may be lost. In case 4 there is usually an explicit hint that space is insufficient.)

... could only be replicated to 0 nodes, instead of 1

Some say this means a node's identity is duplicated (personally I think that explanation is wrong). There may be other causes; try the solutions below in turn, which is how I resolved it.

Workaround:

1. Delete the dfs.data.dir and dfs.tmp.dir directories on all nodes (defaults tmp/dfs/data and tmp/dfs/tmp), then re-run hadoop namenode -format and restart;

2. If it is a port access problem, make sure the ports you use are open, e.g. hdfs://machine1:9000/, 50030, 50070 and so on. Run #iptables -I INPUT -p tcp --dport 9000 -j ACCEPT. If you then get the error "hdfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection refused", the port on the DataNode is probably unreachable; adjust iptables on the DataNode: #iptables -I INPUT -s machine1 -p tcp -j ACCEPT

3. A firewall may also be blocking communication between the cluster nodes; try shutting it down: /etc/init.d/iptables stop

4. Finally, there may simply be insufficient disk space; check with df -al.

While I was working on this problem, someone also claimed that simply starting the NameNode and DataNode daemons fixes it (it did not help me, but you can try): $ hadoop-daemon.sh start namenode; $ hadoop-daemon.sh start datanode
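
As a rough consolidated sketch of these checks (the commands restate the steps above; hadoop dfsadmin -report shows how many DataNodes actually registered with the NameNode):

# is there enough free space on each node?
df -al
# did the DataNodes register after the restart?
hadoop dfsadmin -report
# open the NameNode port if iptables is blocking it (run on the NameNode)
iptables -I INPUT -p tcp --dport 9000 -j ACCEPT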

Third, a program run fails with Error: java.lang.NullPointerException

A null pointer exception means you should make sure your Java program itself is correct: instantiate variables before using them, do not index outside array bounds, and so on. Check the program.

Fourth, when running your own program produces (various) errors, make sure of the following:

Your program compiles correctly in the first place;

In cluster mode, the data to be processed has been written into HDFS and the HDFS path is correct;

The entry (main) class of the jar being executed is specified (I do not know why it sometimes runs even without being specified).

The correct invocation looks like:

$ hadoop jar mycount.jar mycount input output
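
A minimal sketch of the whole sequence, assuming a local file data.txt and a jar/class both named mycount (these names are examples only):

# copy the input into HDFS first
hadoop fs -mkdir input
hadoop fs -put data.txt input/
# run the job with the entry class named explicitly
hadoop jar mycount.jar mycount input output
# inspect the result
hadoop fs -cat output/part-*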

Fifth, SSH between the nodes does not communicate properly. I covered this problem in detail in my cluster setup article.
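
For reference, the usual passwordless-SSH setup is roughly the following (run as the user that starts Hadoop, and repeat the key distribution for every slave):

# generate a key pair with an empty passphrase
ssh-keygen -t rsa -P ""
# authorize the public key (copy it into each slave's authorized_keys as well)
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# this should now log in without prompting for a password
ssh localhost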

Sixth, compilation problems where various packages cannot be found: make sure the jars under the Hadoop installation directory and under hadoop/lib are all on your classpath. Details are also in the setup article.
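
A minimal compile-and-package sketch; the exact jar file names vary between Hadoop versions, and MyCount.java is just an example source file:

mkdir -p classes
javac -classpath "$HADOOP_HOME/hadoop-core-1.2.1.jar:$HADOOP_HOME/lib/commons-cli-1.2.jar" \
      -d classes MyCount.java
jar cvf mycount.jar -C classes .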

Seventh, when Hadoop starts the DataNode it reports "Unrecognized option: -jvm" and "Could not create the Java virtual machine."

The reason is the following shell snippet in bin/hadoop under the Hadoop installation directory:


CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
if [[ $EUID -eq 0 ]]; then
  HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
else
  HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
fi

$EUID is the effective user ID; for root it is 0, so the unrecognized -jvm option gets passed to the JVM. So try not to operate Hadoop as the root user; this is why I said in the configuration article not to use root.
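
A minimal sketch of the workaround (the "hadoop" user name is only an example):

# 0 means you are root; anything else is a regular user
id -u
# switch to a regular user and start the daemons from there
su - hadoop
start-all.sh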

Eighth, if the terminal reports:

ERROR hdfs.DFSClient: Exception closing file /user/hadoop/musicdata.txt : java.io.IOException: All datanodes 10.210.70.82:50010 are bad. Aborting...

and the JobTracker log shows the error:

Error register getProtocolVersion

java.lang.IllegalArgumentException: Duplicate metricsName:getProtocolVersion

and possible warning messages:

WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Broken pipe

WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_3136320110992216802_1063 java.io.IOException: Connection reset by peer

WARN hdfs.DFSClient: Error Recovery for block blk_3136320110992216802_1063 bad datanode[0] 10.210.70.82:50010 put: All datanodes 10.210.70.82:50010 are bad. Aborting...

Solution:

Check whether the disk holding the path referenced by dfs.data.dir is full; if it is, free up space and then retry the hadoop fs -put.

If that disk is not full, check whether it has bad sectors.
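
A minimal sketch of those checks, assuming the storage sits at /data/dfs/data on /dev/sdb1 (both are examples; badblocks without -w is a read-only scan):

# how full is the disk that backs dfs.data.dir?
df -h /data/dfs/data
# any I/O errors reported by the kernel?
dmesg | grep -i error
# read-only surface scan of the underlying device
badblocks -sv /dev/sdb1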

Ninth, if you get an error message like the following when executing a jar with Hadoop:

java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.NullWritable, recieved org.apache.hadoop.io.LongWritable

or similar:

Status : FAILED java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

Then you need to study the basics of Hadoop's data types and the Map/Reduce model. The middle part of my first reading note introduces the data types Hadoop defines and how to write custom ones (mainly learning and understanding the Writable classes), and another note covers MapReduce types and formats; these correspond to chapter 4 of Hadoop: The Definitive Guide (Hadoop I/O) and chapter 7 (MapReduce types and formats). If you are in a hurry, I can give you the quick fix right now, but skipping the background will certainly hold back your later development:

Make sure the types are consistent:

... extends Mapper<K1, V1, K2, V2> ...

    public void map(K1 k, V1 v, OutputCollector<K2, V2> output) ...

...

... extends Reducer<K2, V2, K3, V3> ...

    public void reduce(K2 k, V2 v, OutputCollector<K3, V3> output) ...

...

job.setMapOutputKeyClass(K2.class);
job.setMapOutputValueClass(V2.class);
job.setOutputKeyClass(K3.class);
job.setOutputValueClass(V3.class);

...

Note the correspondence between the K* and V* types. My suggestion is still to read the two chapters mentioned above and understand the reasoning in detail.

Tenth, if you encounter a DataNode error like this:

ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Cannot lock storage /data1/hadoop_data. The directory is already locked.

According to the message, the storage directory is locked and cannot be used, which usually means a previous DataNode, or some other Hadoop process on that slave, is still running and holding the lock. Check with the usual Linux commands:

netstat -nap

ps aux | grep <the related process>

If a Hadoop-related process is still running, kill it with the kill command, then run start-all.sh again.
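
A minimal sketch of that cleanup (the grep pattern and the PID are placeholders):

# find a leftover DataNode still holding the storage lock
ps aux | grep -i [d]atanode
# kill it by PID; resort to kill -9 only if it refuses to exit
kill <PID>
# then bring the cluster back up
start-all.sh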

Eleventh, if the JobTracker reports the following error:

Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

Workaround: modify the hosts file on the DataNode nodes.

A brief introduction to the hosts format:

Each line has three parts: the IP address, the host name or domain name, and optional host aliases.

The detailed steps for the operation are as follows:

1. First, check the host name:

cat /proc/sys/kernel/hostname

This shows the current hostname value; change it to the node's IP address, then exit.

2. Change it with the command:

hostname ***.***.***.**

Replace the asterisks with the corresponding IP address.

3. Modify the hosts file so that it contains entries similar to:

127.0.0.1 localhost.localdomain localhost

::1 localhost6.localdomain6 localhost6

10.200.187.77 10.200.187.77 hadoop-datanode

If an IP address is now displayed, the modification succeeded; if a host name is still displayed, there is still a problem, so keep adjusting this hosts file. (The screenshot in the original post shows such a reminder, where "Chenyi" is the host name.)

In a test environment, deploying a DNS server for this feels very cumbersome to me, so simply using IP addresses directly is more convenient. If you already have a DNS server, configuring the mapping there works too.

If the shuffle error still occurs, try what other users have suggested: add the following property to hdfs-site.xml:

dfs.http.address

Set its value to *.*.*.*:50070, replacing the asterisks with the NameNode's IP but keeping port 50070 unchanged, because Hadoop transfers this information over HTTP on that port.
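
A minimal sketch of that property as it would appear in hdfs-site.xml (the asterisks again stand for the NameNode's IP):

<property>
  <name>dfs.http.address</name>
  <value>*.*.*.*:50070</value>
</property>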

Twelfth, if the JobTracker reports the following error:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code *

This is Java surfacing the exit code returned by the streaming subprocess; the meaning of each exit code is explained in the page linked from the original post.

I hit this with a streaming PHP program; the exit code was 2, "No such file or directory", meaning the file or directory could not be found. It turned out I had simply forgotten to invoke the script as "php ***" in the command. Others online report that include, require and similar statements can also cause it. Adjust according to your own situation and exit code.
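
A minimal sketch of a streaming invocation with the interpreter spelled out; the jar path and the mapper/reducer script names are examples only:

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
    -input input -output output \
    -mapper "php mapper.php" -reducer "php reducer.php" \
    -file mapper.php -file reducer.php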

Reposted from http://blog.pureisle.net/archives/1687.html

Personally, I feel that anyone with a bit of hands-on experience will run into similar problems, so I am sharing these notes with you!
