How to troubleshoot problems
For a general error, read the error output and search the web for the key phrases.
For exceptions (such as a NameNode or DataNode inexplicably hanging), check the Hadoop logs ($HADOOP_HOME/logs) or the Hive logs.
Hadoop errors
1. DataNode does not start properly
After a DataNode is added, it does not start normally: the process hangs for no obvious reason, and the NameNode log shows the following:
Text Code
2013-06-21 18:53:39,182 FATAL org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node x.x.x.x:50010 is attempting to report storage ID DS-1357535176-x.x.x.x-50010-1371808472808. Node y.y.y.y:50010 is expected to serve this storage.
Cause Analysis:
The Hadoop installation package was copied to the new node with its data and tmp folders included (see my Hadoop installation article), so the DataNode was never formatted successfully.
Workaround:
Shell Code
rm -rf /data/hadoop/hadoop-1.1.2/data
rm -rf /data/hadoop/hadoop-1.1.2/tmp
hadoop datanode -format
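The cleanup above can be wrapped in a small script that keeps a backup copy before deleting anything, in case the removal turns out to be wrong. This is only a sketch: the helper name `backup_and_remove` is invented here, and the data/tmp layout is the one used by this install.

```shell
# Hypothetical helper: back up, then remove, the stale data/ and tmp/
# directories under a Hadoop install dir before re-formatting the DataNode.
backup_and_remove() {
    hadoop_dir=$1    # e.g. /data/hadoop/hadoop-1.1.2
    backup_dir=$2    # somewhere with enough space for a copy
    mkdir -p "$backup_dir"
    for d in data tmp; do
        if [ -d "$hadoop_dir/$d" ]; then
            cp -a "$hadoop_dir/$d" "$backup_dir/$d"   # keep a copy, just in case
            rm -rf "$hadoop_dir/$d"
        fi
    done
}

# Usage on this install, followed by the re-format:
#   backup_and_remove /data/hadoop/hadoop-1.1.2 /tmp/hadoop-backup
#   hadoop datanode -format
```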
2. Safe Mode
Text Code
2013-06-20 10:35:43,758 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop cause:org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot renew lease for DFSClient_hb_rs_wdev1.corp.qihoo.net,60020,1371631589073. Name node is in safe mode.
Solution:
Shell Code
hadoop dfsadmin -safemode leave
3. Connection exceptions
Text Code
2013-06-21 19:55:05,801 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to homename/x.x.x.x:9000 failed on local exception: java.io.EOFException
Possible causes: typically a Hadoop version mismatch between this DataNode and the NameNode, or the NameNode at homename:9000 cannot be reached (not started, or blocked by hostname resolution or a firewall).
Solution: make sure every node runs the same Hadoop version, and verify from the DataNode that the NameNode's RPC port is actually reachable.
4. Incompatible namespaceIDs
Text Code
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /var/lib/hadoop-0.20/cache/hdfs/dfs/data: namenode namespaceID = 240012870; datanode namespaceID = 1462711424.
Problem: the namespaceID on the NameNode is inconsistent with the namespaceID on the DataNode.
Cause: every NameNode format creates a new namespaceID, while tmp/dfs/data still holds the ID from the previous format. Formatting empties the data under the NameNode but not the data under the DataNodes, so the namespaceID on the NameNode no longer matches the namespaceID on the DataNodes, and the DataNode fails to start.
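Before deleting anything, the mismatch can be confirmed by comparing the namespaceID recorded in the two VERSION files. The helper below is a sketch (its name is invented); pass it your actual dfs.name.dir and dfs.data.dir VERSION paths.

```shell
# Compare the namespaceID stored by the NameNode and by a DataNode.
# Typical paths (assumptions, substitute your own configuration):
#   NameNode: ${dfs.name.dir}/current/VERSION
#   DataNode: ${dfs.data.dir}/current/VERSION
compare_namespace_ids() {
    nn_version=$1
    dn_version=$2
    nn_id=$(sed -n 's/^namespaceID=//p' "$nn_version")
    dn_id=$(sed -n 's/^namespaceID=//p' "$dn_version")
    if [ "$nn_id" = "$dn_id" ]; then
        echo "match: $nn_id"
    else
        echo "MISMATCH: namenode=$nn_id datanode=$dn_id"
        return 1
    fi
}
```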
Workaround: the page http://blog.csdn.net/wh62592855/archive/2010/07/21/5752199.aspx gives two solutions; we used the first one:
(1) Stop the cluster services.
(2) Delete the data directory on the problem DataNode. The data directory is the dfs.data.dir directory configured in hdfs-site.xml; on this machine it is /var/lib/hadoop-0.20/cache/hdfs/dfs/data/. (Note: we performed this step on all the DataNode and NameNode nodes at the time. In case the deletion goes wrong, you can save a copy of the data directory first.)
(3) Format the NameNode.
(4) Restart the cluster.
This solved the problem.
One side effect of this approach is that all data on HDFS is lost. If important data is stored on HDFS, do not use this approach lightly; consider the second method described at the URL above instead.
5. Directory Permissions
start-dfs.sh executes without error and reports that the DataNode is starting, but afterwards no DataNode process exists. The log on the DataNode machine shows that the permissions of the dfs.data.dir directory are wrong:
Text Code
Expected: drwxr-xr-x, current: drwxrwxr-x
Workaround:
Check the directory configured as dfs.data.dir and fix its permissions.
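A minimal sketch of the fix, assuming the log above: the DataNode expects drwxr-xr-x, i.e. mode 755, so drop the group write bit. The function name is invented for this sketch; point it at your dfs.data.dir.

```shell
# Make a dfs.data.dir match the drwxr-xr-x (755) the DataNode expects.
fix_data_dir_perms() {
    dir=$1
    before=$(stat -c %a "$dir")
    chmod 755 "$dir"    # remove group/other write; leaves rwxr-xr-x
    echo "$dir: $before -> $(stat -c %a "$dir")"
}

# e.g. fix_data_dir_perms /data/hadoop/hadoop-1.1.2/data
```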
Hive errors
1. NoClassDefFoundError
java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.io.HBaseObjectWritable
Add the protobuf-***.jar to the auxiliary jars path:
XML code
<!-- $HIVE_HOME/conf/hive-site.xml -->
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///data/hadoop/hive-0.10.0/lib/hive-hbase-handler-0.10.0.jar,file:///data/hadoop/hive-0.10.0/lib/hbase-0.94.8.jar,file:///data/hadoop/hive-0.10.0/lib/zookeeper-3.4.5.jar,file:///data/hadoop/hive-0.10.0/lib/guava-r09.jar,file:///data/hadoop/hive-0.10.0/lib/hive-contrib-0.10.0.jar,file:///data/hadoop/hive-0.10.0/lib/protobuf-java-2.4.0a.jar</value>
</property>
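Because a typo in any single file:// URI quietly brings the NoClassDefFoundError back, it can be worth checking that every jar listed in the hive.aux.jars.path value actually exists on disk. This helper is a hedged sketch, not part of Hive; pass it the comma-separated property value.

```shell
# Check that each file:// URI in a hive.aux.jars.path value exists on disk.
# Returns non-zero and prints the missing paths if any jar is absent.
check_aux_jars() {
    missing=0
    for uri in $(printf '%s' "$1" | tr ',' ' '); do
        path=${uri#file://}          # strip the file:// scheme
        if [ ! -f "$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done
    return $missing
}
```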
2. Hive dynamic partition exception
[Fatal Error] Operator FS_2 (id=2): Number of dynamic partitions exceeded hive.exec.max.dynamic.partitions.pernode
Shell Code
hive> set hive.exec.max.dynamic.partitions.pernode=10000;
3. MapReduce process exceeds the memory limit (Hadoop "Java heap space")
Edit mapred-site.xml and add:
XML code
<!-- mapred-site.xml -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>
Shell Code
# $HADOOP_HOME/conf/hadoop-env.sh
export HADOOP_HEAPSIZE=5000
4. Hive file limit
[Fatal Error] Total number of created files is 100086, which exceeds 100000
Shell Code
hive> set hive.exec.max.created.files=655350;
5. Metastore connection timeout
Text Code
FAILED: SemanticException org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
Solution:
Shell Code
hive> set hive.metastore.client.socket.timeout=500;
6. java.io.IOException: error=7, Argument list too long
Text Code
Task with the most failures (5):
-----
Task ID:
task_201306241630_0189_r_000009
URL:
http://namenode.godlovesdog.com:50030/taskdetails.jsp?jobid=job_201306241630_0189&tipid=task_201306241630_0189_r_000009
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"164058872","reducesinkkey1":"djh,s1","reducesinkkey2":"20130117170703","reducesinkkey3":"xxx"},"value":{"_col0":"1","_col1":"xxx","_col2":"20130117170703","_col3":"164058872","_col4":"xxx,s1"},"alias":0}
  at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:270)
  at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:520)
  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
  at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"164058872","reducesinkkey1":"xxx,s1","reducesinkkey2":"20130117170703","reducesinkkey3":"xxx"},"value":{"_col0":"1","_col1":"xxx","_col2":"20130117170703","_col3":"164058872","_col4":"djh,s1"},"alias":0}
  at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:258)
  ... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20000]: Unable to initialize custom script.
  at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:354)
  at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
  at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
  at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
  at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
  at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
  at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:249)
  ... 7 more
Caused by: java.io.IOException: Cannot run program "/usr/bin/python2.7": error=7, Argument list too long
  at java.lang.ProcessBuilder.start(ProcessBuilder.java:1042)
  at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:313)
  ... more
Caused by: java.io.IOException: error=7, Argument list too long
  at java.lang.UNIXProcess.forkAndExec(Native Method)
  at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
  at java.lang.ProcessImpl.start(ProcessImpl.java:130)
  at java.lang.ProcessBuilder.start(ProcessBuilder.java:1023)
  ... more
FAILED: Execution Error, return code 20000 from org.apache.hadoop.hive.ql.exec.MapRedTask. Unable to initialize custom script.
Solution:
Upgrade the kernel (2.6.23 and later raised the argument-size limit) or reduce the number of partitions: https://issues.apache.org/jira/browse/HIVE-2372
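For background: the kernel rejects the exec() of the transform script when the combined size of argv plus the environment exceeds its limit (a small fixed limit before kernel 2.6.23, larger afterwards), and HIVE-2372 traces this error to Hive passing an oversized environment to the child process when there are many dynamic partitions. The limit and the current environment size on a given box can be inspected like this (a diagnostic sketch, not a fix):

```shell
# Show the kernel's exec() size limit and the size of the current environment.
arg_max=$(getconf ARG_MAX)
env_bytes=$(env | wc -c)
echo "ARG_MAX=$arg_max bytes, current environment=$env_bytes bytes"
```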
7. Runtime error
Shell Code
hive> show tables;
FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Troubleshooting:
Shell Code
hive -hiveconf hive.root.logger=DEBUG,console
Text Code
13/07/15 16:29:24 INFO hive.metastore: Trying to connect to metastore with URI thrift://xxx.xxx.xxx.xxx:9083
13/07/15 16:29:24 WARN hive.metastore: Failed to connect to the MetaStore Server...
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
...
MetaException(message: Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
The client tries to connect to port 9083, but netstat shows that nothing is actually listening on that port. The first suspicion is that HiveServer did not start properly; however, the HiveServer process is there, just listening on port 10000.
Checking hive-site.xml reveals the root cause: the Hive client connects to port 9083, while HiveServer listens on port 10000 by default.
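The mismatch can be spotted mechanically by pulling the port out of the hive.metastore.uris value, assuming the usual thrift://host:port form. The helper name below is invented for this sketch.

```shell
# Extract the port from a hive.metastore.uris value like thrift://host:9083
metastore_port() {
    printf '%s\n' "$1" | sed -n 's/.*:\([0-9][0-9]*\)$/\1/p'
}

# e.g. metastore_port thrift://xxx.xxx.xxx.xxx:9083  prints 9083;
# compare it with the port HiveServer is actually listening on (netstat).
```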
Workaround:
Shell Code
hive --service hiveserver -p 9083
Or modify the hive.metastore.uris setting in $HIVE_HOME/conf/hive-site.xml, changing the port to 10000.
This article is from the "Longan" blog; please keep this source: http://xulongping.blog.51cto.com/5676840/1606450