Learn about common problems with Hadoop and their solutions. Blog category: cloud computing, hadoop, jvm, eclipse
1: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out
Answer:
The job needs to open many files for processing. The system default limit is 1024 (you can check it with ulimit -a), which is enough for normal use but too low for Hadoop.
How to fix it:
Modify two files.
/etc/security/limits.conf
$ vi /etc/security/limits.conf
Add the lines:
* soft nofile 102400
* hard nofile 409600
$ cd /etc/pam.d/
$ sudo vi login
Add the line: session required /lib/security/pam_limits.so
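After logging back in, you can verify that the new limit took effect (assuming a bash shell; the value shown should match the soft limit configured above):
$ ulimit -n
102400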
2: Too many fetch-failures
Answer:
The main cause is that network connectivity between the nodes is incomplete.
1) Check /etc/hosts
The local IP must map to the local server name.
The file must contain the IP address and server name of every node in the cluster (see the example hosts file below).
2) Check ~/.ssh/authorized_keys
It must contain the public keys of all servers, including the local machine itself.
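A minimal sketch of what /etc/hosts should look like; the hostnames and addresses here (master, slave1, slave2, 192.168.1.x) are hypothetical placeholders for your own cluster:
192.168.1.10 master
192.168.1.11 slave1
192.168.1.12 slave2
The same entries should appear on every node, so that each machine can resolve every other machine's name.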
3: Processing is extremely slow: map completes quickly, but reduce is slow and repeatedly drops back to reduce=0%
Answer:
Apply the fixes from problem 2, and then
modify conf/hadoop-env.sh: export HADOOP_HEAPSIZE=4000
4: The DataNode can be started, but it can neither be accessed nor shut down cleanly
When reformatting a new distributed file system, you need to delete the dfs.name.dir directory configured on the NameNode (the local filesystem path where the NameNode persistently stores the namespace and transaction log), and also delete the dfs.data.dir directory on each DataNode (the local filesystem path where block data is stored). For example, if dfs.name.dir is /home/hadoop/namedata on the NameNode, delete that directory, and delete /home/hadoop/datanode1 and /home/hadoop/datanode2 on the DataNodes. The reason is that when Hadoop formats a new distributed file system, each stored namespace is stamped with the build time of that version (you can inspect the VERSION file under /home/hadoop/namedata/current, which records the version information). So when reformatting a new distributed file system, it is best to delete the namedata directory first, and the dfs.data.dir of every DataNode must be deleted as well, so that the version information recorded by the NameNode and the DataNodes stays consistent.
Note: deleting data is very dangerous. Do not delete anything you cannot confirm is safe to remove, and back up the files before deleting!
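A minimal sketch of the full sequence, assuming the example paths above (/home/hadoop/namedata on the NameNode, /home/hadoop/datanode1 and /home/hadoop/datanode2 on the DataNodes) and a 0.20-era layout; back everything up first:
$ bin/stop-all.sh
$ rm -rf /home/hadoop/namedata                          # on the NameNode
$ rm -rf /home/hadoop/datanode1 /home/hadoop/datanode2  # on each DataNode
$ bin/hadoop namenode -format
$ bin/start-all.sh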
5: java.io.IOException: Could not obtain block: blk_194219614024901469_1100 file=/user/hive/warehouse/src_20090724_log/src_20090724_log
Most of the time this happens because a node is down or cannot be reached over the network.
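A quick way to see which files are affected by missing or under-replicated blocks is HDFS fsck (0.20-era command line; adjust the path to the data you care about):
$ bin/hadoop fsck /user/hive/warehouse -files -blocks -locations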
6: java.lang.OutOfMemoryError: Java heap space
This exception is clearly caused by insufficient JVM memory; increase the JVM memory size on all DataNodes, for example:
java -Xms1024m -Xmx4096m
As a rule of thumb, the JVM's maximum heap should be about half of the total memory; our machines have 8 GB, so we set 4096m, which may still not be the optimal value. (In practice, about 0.8 of the real physical memory is often considered best.)
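One place to put such flags, assuming a 0.20-era conf/hadoop-env.sh (HADOOP_DATANODE_OPTS is the standard hook for DataNode-specific JVM options; the values below just mirror the example above and are not a recommendation):
export HADOOP_DATANODE_OPTS="-Xms1024m -Xmx4096m $HADOOP_DATANODE_OPTS"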
7: Map completes, but when reduce reaches about 98% the job goes directly into failed jobs
Solution:
Check whether mapred.map.tasks is set too high; setting it too high results in a large number of small files being handled.
Check whether mapred.reduce.parallel.copies is set appropriately (see the example settings below).
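A minimal sketch of where these two properties live, in conf/mapred-site.xml; the numbers below are placeholders, not recommendations, and should be tuned for your cluster:
<configuration>
  <property>
    <name>mapred.map.tasks</name>
    <value>10</value>
  </property>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>5</value>
  </property>
</configuration>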
8:
The /tmp folder under the system root must not be deleted. (jps is based on jvmstat, which needs to be able to create a memory-mapped file on the temporary file system.)
Otherwise, running bin/hadoop jps throws an exception:
Exception in thread "main" java.lang.NullPointerException
at sun.jvmstat.perfdata.monitor.protocol.local.LocalVmManager.activeVms(LocalVmManager.java:127)
at sun.jvmstat.perfdata.monitor.protocol.local.MonitoredHostProvider.activeVms(MonitoredHostProvider.java:133)
at sun.tools.jps.Jps.main(Jps.java:45)
while bin/hive fails with:
Unable to create log directory /tmp/hadoopuser
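If /tmp has already been removed, recreating it with the standard world-writable, sticky-bit permissions normally restores both jps and Hive (this assumes an ordinary Linux layout; adjust if your distribution manages /tmp differently):
$ sudo mkdir /tmp
$ sudo chmod 1777 /tmp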
9: Hadoop java.io.IOException: Cannot open filename /user/...
This error occurred while running a program written in Eclipse: Hadoop java.io.IOException: Cannot open filename /user/...
I spent half a day on it and also went through the log files. Possible causes: 1) the input file name is wrong; 2) everything under hadoop.tmp.dir needs to be deleted (on the DataNodes as well), followed by reformatting and restarting Hadoop; 3) the cluster is in safe mode; wait for it to exit automatically or leave safe mode manually.
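For case 3), safe mode can usually be left manually with the dfsadmin command (0.20-era syntax; newer releases use hdfs dfsadmin instead):
$ bin/hadoop dfsadmin -safemode leave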
10/10/25 16:45:39 INFO mapred.JobClient: map 92% reduce 30%
10/10/25 16:45:44 INFO mapred.JobClient: Task Id: attempt_201010251638_0003_m_000013_1, Status: FAILED
java.io.IOException: Cannot open filename /user/eryk/input/conf
It turned out the problem was simply a mistyped command.
Wrong command:
eryk@eryk-1520:~/tmp/hadoop$ bin/hadoop fs -put conf/ input
The resulting contents:
eryk@eryk-1520:~/tmp/hadoop$ bin/hadoop fs -lsr
drwxr-xr-x - eryk supergroup 0 2010-10-25 16:49 /user/eryk/input
-rw-r--r-- 1 eryk supergroup 3936 2010-10-25 16:48 /user/eryk/input/capacity-scheduler.xml
drwxr-xr-x - eryk supergroup 0 2010-10-25 16:49 /user/eryk/input/conf
-rw-r--r-- 1 eryk supergroup 3936 2010-10-25 16:49 /user/eryk/input/conf/capacity-scheduler.xml
-rw-r--r-- 1 eryk supergroup 535 2010-10-25 16:49 /user/eryk/input/conf/configuration.xsl
-rw-r--r-- 1 eryk supergroup 388 2010-10-25 16:49 /user/eryk/input/conf/core-site.xml
-rw-r--r-- 1 eryk supergroup 2360 2010-10-25 16:49 /user/eryk/input/conf/hadoop-env.sh
-rw-r--r-- 1 eryk supergroup 1245 2010-10-25 16:49 /user/eryk/input/conf/hadoop-metrics.properties
-rw-r--r-- 1 eryk supergroup 4190 2010-10-25 16:49 /user/eryk/input/conf/hadoop-policy.xml
-rw-r--r-- 1 eryk supergroup 258 2010-10-25 16:49 /user/eryk/input/conf/hdfs-site.xml
-rw-r--r-- 1 eryk supergroup 2815 2010-10-25 16:49 /user/eryk/input/conf/log4j.properties
-rw-r--r-- 1 eryk supergroup 274 2010-10-25 16:49 /user/eryk/input/conf/mapred-site.xml
-rw-r--r-- 1 eryk supergroup 2010-10-25 16:49 /user/eryk/input/conf/masters
-rw-r--r-- 1 eryk supergroup 2010-10-25 16:49 /user/eryk/input/conf/slaves
-rw-r--r-- 1 eryk supergroup 1243 2010-10-25 16:49 /user/eryk/input/conf/ssl-client.xml.example
-rw-r--r-- 1 eryk supergroup 1195 2010-10-25 16:49 /user/eryk/input/conf/ssl-server.xml.example
-rw-r--r-- 1 eryk supergroup 535 2010-10-25 16:48 /user/eryk/input/configuration.xsl
-rw-r--r-- 1 eryk supergroup 388 2010-10-25 16:48 /user/eryk/input/core-site.xml
-rw-r--r-- 1 eryk supergroup 2360 2010-10-25 16:48 /user/eryk/input/hadoop-env.sh
-rw-r--r-- 1 eryk supergroup 1245 2010-10-25 16:48 /user/eryk/input/hadoop-metrics.properties
-rw-r--r-- 1 eryk supergroup 4190 2010-10-25 16:48 /user/eryk/input/hadoop-policy.xml
-rw-r--r-- 1 eryk supergroup 258 2010-10-25 16:48 /user/eryk/input/hdfs-site.xml
-rw-r--r-- 1 eryk supergroup 2815 2010-10-25 16:48 /user/eryk/input/log4j.properties
-rw-r--r-- 1 eryk supergroup 274 2010-10-25 16:48 /user/eryk/input/mapred-site.xml
-rw-r--r-- 1 eryk supergroup 2010-10-25 16:48 /user/eryk/input/masters
-rw-r--r-- 1 eryk supergroup 2010-10-25 16:48 /user/eryk/input/slaves
-rw-r--r-- 1 eryk supergroup 1243 2010-10-25 16:48 /user/eryk/input/ssl-client.xml.example
-rw-r--r-- 1 eryk supergroup 1195 2010-10-25 16:48 /user/eryk/input/ssl-server.xml.example
The contents were duplicated.
Corrected command:
eryk@eryk-1520:~/tmp/hadoop$ bin/hadoop fs -put conf input
The only change was removing the "/" after conf.
The resulting contents:
eryk@eryk-1520:~/tmp/hadoop$ bin/hadoop fs -lsr
drwxr-xr-x - eryk supergroup 0 2010-10-25 16:49 /user/eryk/input
-rw-r--r-- 1 eryk supergroup 3936 2010-10-25 16:49 /user/eryk/input/capacity-scheduler.xml
-rw-r--r-- 1 eryk supergroup 535 2010-10-25 16:49 /user/eryk/input/configuration.xsl
-rw-r--r-- 1 eryk supergroup 388 2010-10-25 16:49 /user/eryk/input/core-site.xml
-rw-r--r-- 1 eryk supergroup 2360 2010-10-25 16:49 /user/eryk/input/hadoop-env.sh
-rw-r--r-- 1 eryk supergroup 1245 2010-10-25 16:49 /user/eryk/input/hadoop-metrics.properties
-rw-r--r-- 1 eryk supergroup 4190 2010-10-25 16:49 /user/eryk/input/hadoop-policy.xml
-rw-r--r-- 1 eryk supergroup 258 2010-10-25 16:49 /user/eryk/input/hdfs-site.xml
-rw-r--r-- 1 eryk supergroup 2815 2010-10-25 16:49 /user/eryk/input/log4j.properties
-rw-r--r-- 1 eryk supergroup 274 2010-10-25 16:49 /user/eryk/input/mapred-site.xml
-rw-r--r-- 1 eryk supergroup 2010-10-25 16:49 /user/eryk/input/masters
-rw-r--r-- 1 eryk supergroup 2010-10-25 16:49 /user/eryk/input/slaves
-rw-r--r-- 1 eryk supergroup 1243 2010-10-25 16:49 /user/eryk/input/ssl-client.xml.example
-rw-r--r-- 1 eryk supergroup 1195 2010-10-25 16:49 /user/eryk/input/ssl-server.xml.example