Learn about common Hadoop problems and their solutions


1: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out

Answer:

The program needs to open many files for its analysis. The system default limit is 1024 (you can check it with ulimit -a), which is enough for normal use but too low for this workload.

How to fix it:

Modify two files.

/etc/security/limits.conf

vi /etc/security/limits.conf

Add:

* soft nofile 102400
* hard nofile 409600

/etc/pam.d/login

$ cd /etc/pam.d/

$ sudo vi login

Add: session required /lib/security/pam_limits.so
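After logging in again, you can check that the new limits have taken effect. A minimal verification, assuming a bash login shell (the values shown are simply the soft and hard limits configured above):

$ ulimit -n
102400
$ ulimit -Hn
409600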

2: Too many fetch-failures

Answer:

The root cause is that connectivity between the nodes is incomplete.

1) Check /etc/hosts

The local IP address must map to the server's hostname.

The file must contain the IP address and hostname of every server in the cluster (a sample layout is sketched after this list).

2) Check .ssh/authorized_keys

It must contain the public keys of all servers, including the local machine.
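For illustration only, a hosts file for a three-node cluster might look like the sketch below; the hostnames and private IP addresses are made up and should be replaced with your own:

192.168.1.10  master
192.168.1.11  slave1
192.168.1.12  slave2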

3: Processing is extremely slow; map finishes quickly, but reduce is very slow and reduce=0% appears repeatedly

Answer:

Apply the fixes from item 2, and then

modify conf/hadoop-env.sh: export HADOOP_HEAPSIZE=4000

4: The DataNode can be started, but it cannot be accessed and cannot be shut down without errors

When reformatting a new distributed file system, you need to delete the directory configured as dfs.name.dir on the NameNode, i.e. the local file system path where the NameNode persistently stores the namespace and the transaction log. You also need to delete the dfs.data.dir directories on each DataNode, i.e. the local file system paths where the DataNodes store block data. For example, if dfs.name.dir is configured as /home/hadoop/namedata on the NameNode, delete it, and delete /home/hadoop/datanode1 and /home/hadoop/datanode2 on the DataNodes. The reason is that when Hadoop formats a new distributed file system, the stored namespace is stamped with the version created at format time (you can view the version file in the /home/hadoop/namedata/current directory, which records this information). So when reformatting a new distributed file system, it is best to delete the namedata directory first, and the dfs.data.dir directory on every DataNode must be deleted as well, so that the version information recorded by the NameNode and the DataNodes stays consistent.

Note: deleting data is a very dangerous operation. Do not delete anything unless you are sure it is safe to delete, and back up the files before you delete them!
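As a rough sketch of the sequence, using the example paths above (adjust them to your own dfs.name.dir and dfs.data.dir values, and only run this after backing up anything you still need):

# on the NameNode
$ bin/stop-all.sh
$ rm -rf /home/hadoop/namedata
# on every DataNode
$ rm -rf /home/hadoop/datanode1 /home/hadoop/datanode2
# back on the NameNode
$ bin/hadoop namenode -format
$ bin/start-all.sh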

5: java.io.IOException: Could not obtain block: blk_194219614024901469_1100 file=/user/hive/warehouse/src_20090724_log/src_20090724_log

This mostly happens when a node is down or unreachable.

6: java.lang.OutOfMemoryError: Java heap space

This exception is clearly caused by insufficient JVM memory; increase the JVM memory size on all DataNodes.

java -Xms1024m -Xmx4096m

As a rule of thumb, the JVM's maximum heap should be about half of the total memory. Our machines have 8 GB, so we set it to 4096m, which may still not be the optimal value. (In fact, about 0.8 of the real physical memory size is probably the best setting.)
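As a sketch of where to put these settings, assuming the conf/hadoop-env.sh layout of the 0.20-era releases (the heap sizes are only the example numbers used above and should be tuned to your machines):

# in conf/hadoop-env.sh
export HADOOP_HEAPSIZE=4000                  # default heap size in MB for all daemons
export HADOOP_DATANODE_OPTS="-Xms1024m -Xmx4096m $HADOOP_DATANODE_OPTS"   # extra options passed only to the DataNode JVM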

7: Map reaches 100%, but when reduce gets to about 98% the job goes straight into failed jobs

Solution:

Check whether mapred.map.tasks is set too high; setting it too high results in a large number of small files being handled.

Check whether mapred.reduce.parallel.copies is set appropriately.
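For reference, both settings live in conf/mapred-site.xml; the values below are placeholders for illustration, not recommendations:

<property>
  <name>mapred.map.tasks</name>
  <value>10</value>
</property>
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>5</value>
</property>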

8: The /tmp directory under the system root must not be deleted.

(jps is based on jvmstat, which needs to create a memory-mapped performance data file on the temporary file system.)

Otherwise bin/hadoop jps throws an exception:

Exception in thread "main" java.lang.NullPointerException
at sun.jvmstat.perfdata.monitor.protocol.local.LocalVmManager.activeVms(LocalVmManager.java:127)
at sun.jvmstat.perfdata.monitor.protocol.local.MonitoredHostProvider.activeVms(MonitoredHostProvider.java:133)
at sun.tools.jps.Jps.main(Jps.java:45)

and bin/hive reports:

Unable to create log directory /tmp/hadoopuser
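For what it's worth, the file jps depends on lives in a per-user hsperfdata directory under /tmp, so you can see what would be lost by wiping it (the user name here is just taken from the error above):

$ ls -d /tmp/hsperfdata_*
/tmp/hsperfdata_hadoopuser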


9: Hadoop java.io.IOException: Cannot open filename /user/...

This error occurred while running a program written in Eclipse: Hadoop java.io.IOException: Cannot open filename /user/...

I spent half a day on this and went through the log files. Possible causes:

1) The input file name is wrong.

2) Everything under hadoop.tmp.dir needs to be deleted (on the DataNodes too), then reformat and restart Hadoop.

3) HDFS is in safe mode; wait for it to exit automatically or turn safe mode off manually.
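If cause 3) applies, a minimal way to check and leave safe mode by hand, using the standard dfsadmin commands of this Hadoop generation, is:

$ bin/hadoop dfsadmin -safemode get
$ bin/hadoop dfsadmin -safemode leave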

10/10/25 16:45:39 INFO mapred.JobClient: map 92% reduce 30%
10/10/25 16:45:44 INFO mapred.JobClient: Task Id : attempt_201010251638_0003_m_000013_1, Status : FAILED
java.io.IOException: Cannot open filename /user/eryk/input/conf

It turned out that the problem was in the command I had typed.

Wrong command:

eryk@eryk-1520:~/tmp/hadoop$ bin/hadoop fs -put conf/ input

The resulting contents:

eryk@eryk-1520:~/tmp/hadoop$ bin/hadoop fs -lsr

drwxr-xr-x - eryk supergroup 0 2010-10-25 16:49 /user/eryk/input
-rw-r--r-- 1 eryk supergroup 3936 2010-10-25 16:48 /user/eryk/input/capacity-scheduler.xml
drwxr-xr-x - eryk supergroup 0 2010-10-25 16:49 /user/eryk/input/conf
-rw-r--r-- 1 eryk supergroup 3936 2010-10-25 16:49 /user/eryk/input/conf/capacity-scheduler.xml
-rw-r--r-- 1 eryk supergroup 535 2010-10-25 16:49 /user/eryk/input/conf/configuration.xsl
-rw-r--r-- 1 eryk supergroup 388 2010-10-25 16:49 /user/eryk/input/conf/core-site.xml
-rw-r--r-- 1 eryk supergroup 2360 2010-10-25 16:49 /user/eryk/input/conf/hadoop-env.sh
-rw-r--r-- 1 eryk supergroup 1245 2010-10-25 16:49 /user/eryk/input/conf/hadoop-metrics.properties
-rw-r--r-- 1 eryk supergroup 4190 2010-10-25 16:49 /user/eryk/input/conf/hadoop-policy.xml
-rw-r--r-- 1 eryk supergroup 258 2010-10-25 16:49 /user/eryk/input/conf/hdfs-site.xml
-rw-r--r-- 1 eryk supergroup 2815 2010-10-25 16:49 /user/eryk/input/conf/log4j.properties
-rw-r--r-- 1 eryk supergroup 274 2010-10-25 16:49 /user/eryk/input/conf/mapred-site.xml
-rw-r--r-- 1 eryk supergroup 2010-10-25 16:49 /user/eryk/input/conf/masters
-rw-r--r-- 1 eryk supergroup 2010-10-25 16:49 /user/eryk/input/conf/slaves
-rw-r--r-- 1 eryk supergroup 1243 2010-10-25 16:49 /user/eryk/input/conf/ssl-client.xml.example
-rw-r--r-- 1 eryk supergroup 1195 2010-10-25 16:49 /user/eryk/input/conf/ssl-server.xml.example
-rw-r--r-- 1 eryk supergroup 535 2010-10-25 16:48 /user/eryk/input/configuration.xsl
-rw-r--r-- 1 eryk supergroup 388 2010-10-25 16:48 /user/eryk/input/core-site.xml
-rw-r--r-- 1 eryk supergroup 2360 2010-10-25 16:48 /user/eryk/input/hadoop-env.sh
-rw-r--r-- 1 eryk supergroup 1245 2010-10-25 16:48 /user/eryk/input/hadoop-metrics.properties
-rw-r--r-- 1 eryk supergroup 4190 2010-10-25 16:48 /user/eryk/input/hadoop-policy.xml
-rw-r--r-- 1 eryk supergroup 258 2010-10-25 16:48 /user/eryk/input/hdfs-site.xml
-rw-r--r-- 1 eryk supergroup 2815 2010-10-25 16:48 /user/eryk/input/log4j.properties
-rw-r--r-- 1 eryk supergroup 274 2010-10-25 16:48 /user/eryk/input/mapred-site.xml
-rw-r--r-- 1 eryk supergroup 2010-10-25 16:48 /user/eryk/input/masters
-rw-r--r-- 1 eryk supergroup 2010-10-25 16:48 /user/eryk/input/slaves
-rw-r--r-- 1 eryk supergroup 1243 2010-10-25 16:48 /user/eryk/input/ssl-client.xml.example
-rw-r--r-- 1 eryk supergroup 1195 2010-10-25 16:48 /user/eryk/input/ssl-server.xml.example

The contents were duplicated.

Modified command:

eryk@eryk-1520:~/tmp/hadoop$ bin/hadoop fs -put conf input

The only change was removing the "/" after conf.

The resulting contents:

eryk@eryk-1520:~/tmp/hadoop$ bin/hadoop fs -lsr

drwxr-xr-x - eryk supergroup 0 2010-10-25 16:49 /user/eryk/input
-rw-r--r-- 1 eryk supergroup 3936 2010-10-25 16:49 /user/eryk/input/capacity-scheduler.xml
-rw-r--r-- 1 eryk supergroup 535 2010-10-25 16:49 /user/eryk/input/configuration.xsl
-rw-r--r-- 1 eryk supergroup 388 2010-10-25 16:49 /user/eryk/input/core-site.xml
-rw-r--r-- 1 eryk supergroup 2360 2010-10-25 16:49 /user/eryk/input/hadoop-env.sh
-rw-r--r-- 1 eryk supergroup 1245 2010-10-25 16:49 /user/eryk/input/hadoop-metrics.properties
-rw-r--r-- 1 eryk supergroup 4190 2010-10-25 16:49 /user/eryk/input/hadoop-policy.xml
-rw-r--r-- 1 eryk supergroup 258 2010-10-25 16:49 /user/eryk/input/hdfs-site.xml
-rw-r--r-- 1 eryk supergroup 2815 2010-10-25 16:49 /user/eryk/input/log4j.properties
-rw-r--r-- 1 eryk supergroup 274 2010-10-25 16:49 /user/eryk/input/mapred-site.xml
-rw-r--r-- 1 eryk supergroup 2010-10-25 16:49 /user/eryk/input/masters
-rw-r--r-- 1 eryk supergroup 2010-10-25 16:49 /user/eryk/input/slaves
-rw-r--r-- 1 eryk supergroup 1243 2010-10-25 16:49 /user/eryk/input/ssl-client.xml.example
-rw-r--r-- 1 eryk supergroup 1195 2010-10-25 16:49 /user/eryk/input/ssl-server.xml.example
