Learn about common problems with Hadoop and their solutions. Blog category: cloud computing, hadoop, jvm, eclipse
1: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out
Answer:
The job needs to open many files for processing. The system default limit is 1024 (you can check it with ulimit -a), which is enough for normal use but too low for Hadoop.
How to fix it:
Modify two files.
/etc/security/limits.conf
$ vi /etc/security/limits.conf
Add the lines:
* soft nofile 102400
* hard nofile 409600
$ cd /etc/pam.d/
$ sudo vi login
Add the line: session required /lib/security/pam_limits.so
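After logging back in, you can verify that the new limit took effect (assuming a bash shell; the value shown should match the soft limit configured above):
$ ulimit -n
102400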
2: Too many fetch-failures
Answer:
The main cause is that network connectivity between the nodes is incomplete.
1) Check /etc/hosts
The local IP must map to the local server name.
The file must contain the IP address and server name of every node in the cluster (see the example hosts file below).
2) Check ~/.ssh/authorized_keys
It must contain the public keys of all servers, including the local machine itself.
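A minimal sketch of what /etc/hosts should look like; the hostnames and addresses here (master, slave1, slave2, 192.168.1.x) are hypothetical placeholders for your own cluster:
192.168.1.10 master
192.168.1.11 slave1
192.168.1.12 slave2
The same entries should appear on every node, so that each machine can resolve every other machine's name.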
3: Processing is extremely slow: map completes quickly, but reduce is slow and repeatedly drops back to reduce=0%
Answer:
Apply the fixes from problem 2, and then
modify conf/hadoop-env.sh: export HADOOP_HEAPSIZE=4000
4: The DataNode can be started, but it can neither be accessed nor shut down cleanly
When reformatting a new distributed file system, you need to delete the dfs.name.dir directory configured on the NameNode (the local filesystem path where the NameNode persistently stores the namespace and transaction log), and also delete the dfs.data.dir directory on each DataNode (the local filesystem path where block data is stored). For example, if dfs.name.dir is /home/hadoop/namedata on the NameNode, delete that directory, and delete /home/hadoop/datanode1 and /home/hadoop/datanode2 on the DataNodes. The reason is that when Hadoop formats a new distributed file system, each stored namespace is stamped with the build time of that version (you can inspect the VERSION file under /home/hadoop/namedata/current, which records the version information). So when reformatting a new distributed file system, it is best to delete the namedata directory first, and the dfs.data.dir of every DataNode must be deleted as well, so that the version information recorded by the NameNode and the DataNodes stays consistent.
Note: deleting data is very dangerous. Do not delete anything you cannot confirm is safe to remove, and back up the files before deleting!
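A minimal sketch of the full sequence, assuming the example paths above (/home/hadoop/namedata on the NameNode, /home/hadoop/datanode1 and /home/hadoop/datanode2 on the DataNodes) and a 0.20-era layout; back everything up first:
$ bin/stop-all.sh
$ rm -rf /home/hadoop/namedata                          # on the NameNode
$ rm -rf /home/hadoop/datanode1 /home/hadoop/datanode2  # on each DataNode
$ bin/hadoop namenode -format
$ bin/start-all.sh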
5: java.io.IOException: Could not obtain block: blk_194219614024901469_1100 file=/user/hive/warehouse/src_20090724_log/src_20090724_log
Most of the time this happens because a node is down or cannot be reached over the network.
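A quick way to see which files are affected by missing or under-replicated blocks is HDFS fsck (0.20-era command line; adjust the path to the data you care about):
$ bin/hadoop fsck /user/hive/warehouse -files -blocks -locations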
6: java.lang.OutOfMemoryError: Java heap space
This exception is clearly caused by insufficient JVM memory; increase the JVM memory size on all DataNodes, for example:
java -Xms1024m -Xmx4096m
As a rule of thumb, the JVM's maximum heap should be about half of the total memory; our machines have 8 GB, so we set 4096m, which may still not be the optimal value. (In practice, about 0.8 of the real physical memory is often considered best.)
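One place to put such flags, assuming a 0.20-era conf/hadoop-env.sh (HADOOP_DATANODE_OPTS is the standard hook for DataNode-specific JVM options; the values below just mirror the example above and are not a recommendation):
export HADOOP_DATANODE_OPTS="-Xms1024m -Xmx4096m $HADOOP_DATANODE_OPTS"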
7: Map completes, but when reduce reaches about 98% the job goes directly into failed jobs
Solution:
Check whether mapred.map.tasks is set too high; setting it too high results in a large number of small files being handled.
Check whether mapred.reduce.parallel.copies is set appropriately (see the example settings below).
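A minimal sketch of where these two properties live, in conf/mapred-site.xml; the numbers below are placeholders, not recommendations, and should be tuned for your cluster:
<configuration>
  <property>
    <name>mapred.map.tasks</name>
    <value>10</value>
  </property>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>5</value>
  </property>
</configuration>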
8:
The /tmp folder under the system root must not be deleted. (jps is based on jvmstat, which needs to be able to create a memory-mapped file on the temporary file system.)
Otherwise, running bin/hadoop jps throws an exception:
Exception in thread "main" java.lang.NullPointerException
at sun.jvmstat.perfdata.monitor.protocol.local.LocalVmManager.activeVms(LocalVmManager.java:127)
at sun.jvmstat.perfdata.monitor.protocol.local.MonitoredHostProvider.activeVms(MonitoredHostProvider.java:133)
at sun.tools.jps.Jps.main(Jps.java:45)
while bin/hive fails with:
Unable to create log directory /tmp/hadoopuser
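If /tmp has already been removed, recreating it with the standard world-writable, sticky-bit permissions normally restores both jps and Hive (this assumes an ordinary Linux layout; adjust if your distribution manages /tmp differently):
$ sudo mkdir /tmp
$ sudo chmod 1777 /tmp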
9: Hadoop java.io.IOException: Cannot open filename /user/...
This error occurred while running a program written in Eclipse: Hadoop java.io.IOException: Cannot open filename /user/...
I spent half a day on it and also went through the log files. Possible causes: 1) the input file name is wrong; 2) everything under hadoop.tmp.dir needs to be deleted (on the DataNodes as well), followed by reformatting and restarting Hadoop; 3) the cluster is in safe mode; wait for it to exit automatically or leave safe mode manually.
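For case 3), safe mode can usually be left manually with the dfsadmin command (0.20-era syntax; newer releases use hdfs dfsadmin instead):
$ bin/hadoop dfsadmin -safemode leave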
10/10/25 16:45:39 INFO mapred.JobClient: map 92% reduce 30%
10/10/25 16:45:44 INFO mapred.JobClient: Task Id: attempt_201010251638_0003_m_000013_1, Status: FAILED
java.io.IOException: Cannot open filename /user/eryk/input/conf
It turned out the problem was simply a mistyped command.
Wrong command:
eryk@eryk-1520:~/tmp/hadoop$ bin/hadoop fs -put conf/ input
The resulting contents:
eryk@eryk-1520:~/tmp/hadoop$ bin/hadoop fs -lsr
drwxr-xr-x - eryk supergroup 0 2010-10-25 16:49 /user/eryk/input
-rw-r--r-- 1 eryk supergroup 3936 2010-10-25 16:48 /user/eryk/input/capacity-scheduler.xml
drwxr-xr-x - eryk supergroup 0 2010-10-25 16:49 /user/eryk/input/conf
-rw-r--r-- 1 eryk supergroup 3936 2010-10-25 16:49 /user/eryk/input/conf/capacity-scheduler.xml
-rw-r--r-- 1 eryk supergroup 535 2010-10-25 16:49 /user/eryk/input/conf/configuration.xsl
-rw-r--r-- 1 eryk supergroup 388 2010-10-25 16:49 /user/eryk/input/conf/core-site.xml
-rw-r--r-- 1 eryk supergroup 2360 2010-10-25 16:49 /user/eryk/input/conf/hadoop-env.sh
-rw-r--r-- 1 eryk supergroup 1245 2010-10-25 16:49 /user/eryk/input/conf/hadoop-metrics.properties
-rw-r--r-- 1 eryk supergroup 4190 2010-10-25 16:49 /user/eryk/input/conf/hadoop-policy.xml
-rw-r--r-- 1 eryk supergroup 258 2010-10-25 16:49 /user/eryk/input/conf/hdfs-site.xml
-rw-r--r-- 1 eryk supergroup 2815 2010-10-25 16:49 /user/eryk/input/conf/log4j.properties
-rw-r--r-- 1 eryk supergroup 274 2010-10-25 16:49 /user/eryk/input/conf/mapred-site.xml
-rw-r--r-- 1 eryk supergroup 2010-10-25 16:49 /user/eryk/input/conf/masters
-rw-r--r-- 1 eryk supergroup 2010-10-25 16:49 /user/eryk/input/conf/slaves
-rw-r--r-- 1 eryk supergroup 1243 2010-10-25 16:49 /user/eryk/input/conf/ssl-client.xml.example
-rw-r--r-- 1 eryk supergroup 1195 2010-10-25 16:49 /user/eryk/input/conf/ssl-server.xml.example
-rw-r--r-- 1 eryk supergroup 535 2010-10-25 16:48 /user/eryk/input/configuration.xsl
-rw-r--r-- 1 eryk supergroup 388 2010-10-25 16:48 /user/eryk/input/core-site.xml
-rw-r--r-- 1 eryk supergroup 2360 2010-10-25 16:48 /user/eryk/input/hadoop-env.sh
-rw-r--r-- 1 eryk supergroup 1245 2010-10-25 16:48 /user/eryk/input/hadoop-metrics.properties
-rw-r--r-- 1 eryk supergroup 4190 2010-10-25 16:48 /user/eryk/input/hadoop-policy.xml
-rw-r--r-- 1 eryk supergroup 258 2010-10-25 16:48 /user/eryk/input/hdfs-site.xml
-rw-r--r-- 1 eryk supergroup 2815 2010-10-25 16:48 /user/eryk/input/log4j.properties
-rw-r--r-- 1 eryk supergroup 274 2010-10-25 16:48 /user/eryk/input/mapred-site.xml
-rw-r--r-- 1 eryk supergroup 2010-10-25 16:48 /user/eryk/input/masters
-rw-r--r-- 1 eryk supergroup 2010-10-25 16:48 /user/eryk/input/slaves
-rw-r--r-- 1 eryk supergroup 1243 2010-10-25 16:48 /user/eryk/input/ssl-client.xml.example
-rw-r--r-- 1 eryk supergroup 1195 2010-10-25 16:48 /user/eryk/input/ssl-server.xml.example
The contents were duplicated.
Corrected command:
eryk@eryk-1520:~/tmp/hadoop$ bin/hadoop fs -put conf input
The only change was removing the "/" after conf.
The resulting contents:
eryk@eryk-1520:~/tmp/hadoop$ bin/hadoop fs -lsr
drwxr-xr-x - eryk supergroup 0 2010-10-25 16:49 /user/eryk/input
-rw-r--r-- 1 eryk supergroup 3936 2010-10-25 16:49 /user/eryk/input/capacity-scheduler.xml
-rw-r--r-- 1 eryk supergroup 535 2010-10-25 16:49 /user/eryk/input/configuration.xsl
-rw-r--r-- 1 eryk supergroup 388 2010-10-25 16:49 /user/eryk/input/core-site.xml
-rw-r--r-- 1 eryk supergroup 2360 2010-10-25 16:49 /user/eryk/input/hadoop-env.sh
-rw-r--r-- 1 eryk supergroup 1245 2010-10-25 16:49 /user/eryk/input/hadoop-metrics.properties
-rw-r--r-- 1 eryk supergroup 4190 2010-10-25 16:49 /user/eryk/input/hadoop-policy.xml
-rw-r--r-- 1 eryk supergroup 258 2010-10-25 16:49 /user/eryk/input/hdfs-site.xml
-rw-r--r-- 1 eryk supergroup 2815 2010-10-25 16:49 /user/eryk/input/log4j.properties
-rw-r--r-- 1 eryk supergroup 274 2010-10-25 16:49 /user/eryk/input/mapred-site.xml
-rw-r--r-- 1 eryk supergroup 2010-10-25 16:49 /user/eryk/input/masters
-rw-r--r-- 1 eryk supergroup 2010-10-25 16:49 /user/eryk/input/slaves
-rw-r--r-- 1 eryk supergroup 1243 2010-10-25 16:49 /user/eryk/input/ssl-client.xml.example
-rw-r--r-- 1 eryk supergroup 1195 2010-10-25 16:49 /user/eryk/input/ssl-server.xml.example