Summary of issues encountered in Hadoop + Hive usage


How to troubleshoot problems

    • For a general error, read the error output and Google the key phrases

    • For abnormal failures (e.g. the NameNode or DataNode inexplicably hangs): check the Hadoop logs ($HADOOP_HOME/logs) or the Hive logs, as in the sketch below
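
For example, a minimal log-checking sketch (assuming a Hadoop 1.x tarball layout; the exact file names vary with hostname and user, and /tmp/$USER/hive.log is only Hive's default log location):

Shell Code

# NameNode and DataNode logs live under $HADOOP_HOME/logs
tail -n 100 $HADOOP_HOME/logs/hadoop-*-namenode-*.log
tail -n 100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log
# Hive CLI logs default to /tmp/<user>/hive.log (see hive-log4j.properties)
tail -n 100 /tmp/$USER/hive.log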


Hadoop errors
1. DataNode does not start properly
After adding a DataNode, it does not start normally: the process inexplicably dies. The NameNode log shows the following:

Text Code

2013-06-21 18:53:39,182 FATAL org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node x.x.x.x:50010 is attempting to report storage ID DS-1357535176-x.x.x.x-50010-1371808472808. Node y.y.y.y:50010 is expected to serve this storage.

Cause analysis:
The Hadoop installation package was copied to the new node with the data and tmp folders still inside (see my Hadoop installation article), so the DataNode was never formatted successfully.
Workaround:

Shell Code

rm -rf /data/hadoop/hadoop-1.1.2/data

rm -rf /data/hadoop/hadoop-1.1.2/tmp

hadoop datanode -format
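
After restarting the DataNode, a quick verification sketch (both commands are standard Hadoop 1.x tools):

Shell Code

# the DataNode process should show up on the new node
jps | grep DataNode
# the NameNode should now count it among the live nodes
hadoop dfsadmin -report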

2. Safe Mode

Text Code

2013-06-20 10:35:43,758 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop cause:org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot renew lease for DFSClient_hb_rs_wdev1.corp.qihoo.net,60020,1371631589073. Name node is in safe mode.

Solution:

Shell Code

hadoop dfsadmin -safemode leave
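
Before forcing the NameNode out of safe mode, it is worth checking whether it is simply still starting up (a sketch; these are standard dfsadmin subcommands):

Shell Code

# report whether the NameNode is currently in safe mode
hadoop dfsadmin -safemode get
# or block until it leaves safe mode on its own
hadoop dfsadmin -safemode wait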

3. Connection exceptions

Text Code

2013-06-21 19:55:05,801 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to homename/x.x.x.x:9000 failed on local exception: java.io.EOFException

Possible causes:

    • The NameNode is listening on 127.0.0.1:9000, not on 0.0.0.0:9000 or an externally reachable IP at port 9000

    • iptables is blocking the port


Solution:

    • Check the /etc/hosts configuration so that the hostname is bound to a non-127.0.0.1 IP

    • Open the port in iptables (see the sketch below)
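
A minimal diagnosis sketch (port 9000 is taken from the error above; the iptables rule is only an example and must be persisted per your distribution):

Shell Code

# which address is the NameNode actually listening on?
netstat -tlnp | grep 9000
# the hostname must not resolve to 127.0.0.1
grep "$(hostname)" /etc/hosts
# open port 9000 in iptables
iptables -I INPUT -p tcp --dport 9000 -j ACCEPT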



4. Incompatible namespaceIDs

Text Code

ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /var/lib/hadoop-0.20/cache/hdfs/dfs/data: namenode namespaceID = 240012870; datanode namespaceID = 1462711424.

Problem: the namespaceID on the NameNode does not match the namespaceID on the DataNode.

Cause: every NameNode format creates a new namespaceID, and tmp/dfs/data still holds the ID from the previous format. Formatting clears the data under the NameNode but does not clear the data under the DataNodes, so the namespaceID on the NameNode diverges from the namespaceID on the DataNodes and the DataNode fails to start.

Workaround: the page http://blog.csdn.net/wh62592855/archive/2010/07/21/5752199.aspx gives two solutions; we used the first one:

(1) Stop the cluster services.

(2) On the problem DataNode, delete the data directory, i.e. the dfs.data.dir directory configured in hdfs-site.xml; on this machine it was /var/lib/hadoop-0.20/cache/hdfs/dfs/data/. (Note: we actually performed this step on all DataNode and NameNode nodes. In case the deletion goes wrong, save a copy of the data directory first.)

(3) Reformat the NameNode.

(4) Restart the cluster.

This solved the problem.
One side effect of this approach is that all data on HDFS is lost. If important data is stored on HDFS, this method is not recommended; try the second method from the URL above instead.
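
The same four steps as a shell sketch (Hadoop 1.x scripts; the data path is the one from the log above and will differ on your nodes):

Shell Code

stop-all.sh                      # (1) stop the cluster
# (2) back up, then remove, the dfs.data.dir on each affected node
mv /var/lib/hadoop-0.20/cache/hdfs/dfs/data /var/lib/hadoop-0.20/cache/hdfs/dfs/data.bak
hadoop namenode -format          # (3) reformat the NameNode
start-all.sh                     # (4) restart the cluster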

5. Directory permissions
start-dfs.sh runs without errors and reports that it is starting the DataNode, but no DataNode process exists afterwards. The log on the DataNode machine shows that the permissions on the dfs.data.dir directory are wrong:

Text Code

expected: drwxr-xr-x, current: drwxrwxr-x

Workaround:
Check the directory configured as dfs.data.dir and fix its permissions, as in the sketch below.
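
A one-line fix sketch (the path is only an example; substitute your own dfs.data.dir value):

Shell Code

# the DataNode expects dfs.data.dir to be 755 (drwxr-xr-x)
chmod 755 /data/hadoop/hadoop-1.1.2/data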

Hive errors
1. NoClassDefFoundError
java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.io.HbaseObjectWritable
Add protobuf-***.jar to the jar path via hive.aux.jars.path:

XML code

<!-- $HIVE_HOME/conf/hive-site.xml -->
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///data/hadoop/hive-0.10.0/lib/hive-hbase-handler-0.10.0.jar,file:///data/hadoop/hive-0.10.0/lib/hbase-0.94.8.jar,file:///data/hadoop/hive-0.10.0/lib/zookeeper-3.4.5.jar,file:///data/hadoop/hive-0.10.0/lib/guava-r09.jar,file:///data/hadoop/hive-0.10.0/lib/hive-contrib-0.10.0.jar,file:///data/hadoop/hive-0.10.0/lib/protobuf-java-2.4.0a.jar</value>
</property>
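
Alternatively, the same jars can be supplied per invocation with the Hive CLI's --auxpath option (a sketch; the list is abbreviated to two of the jars above):

Shell Code

hive --auxpath /data/hadoop/hive-0.10.0/lib/hive-hbase-handler-0.10.0.jar,/data/hadoop/hive-0.10.0/lib/protobuf-java-2.4.0a.jar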

2. Hive dynamic partition exception
[Fatal Error] Operator FS_2 (id=2): Number of dynamic partitions exceeded hive.exec.max.dynamic.partitions.pernode

Shell Code

hive> set hive.exec.max.dynamic.partitions.pernode=10000;
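
Two related settings commonly accompany this one: the global partition cap and, if needed, nonstrict mode (a sketch; all three are standard Hive settings):

Shell Code

hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> set hive.exec.max.dynamic.partitions=100000;
hive> set hive.exec.max.dynamic.partitions.pernode=10000;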

3. MapReduce process exceeds the memory limit: "hadoop Java heap space"
Edit mapred-site.xml and add:

XML code

<!-- mapred-site.xml -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>

Shell Code

# $HADOOP_HOME/conf/hadoop-env.sh
export HADOOP_HEAPSIZE=5000
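
The child JVM heap can also be raised for a single Hive session rather than cluster-wide, since mapred.child.java.opts is an ordinary job property (a sketch):

Shell Code

# applies only to jobs launched from this Hive session
hive> set mapred.child.java.opts=-Xmx2048m;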

4. Hive created-files limit
[Fatal Error] Total number of created files are 100086, which exceeds 100000

Shell Code

hive> set hive.exec.max.created.files=655350;

5. Metastore connection timeout

Text Code

FAILED: SemanticException org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out

Solution:

Shell Code

hive> set hive.metastore.client.socket.timeout=500;

6. java.io.IOException: error=7, Argument list too long

Text Code


Task with the most failures (5):
-----
Task ID:
task_201306241630_0189_r_000009

URL:
http://namenode.godlovesdog.com:50030/taskdetails.jsp?jobid=job_201306241630_0189&tipid=task_201306241630_0189_r_000009
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"164058872","reducesinkkey1":"djh,s1","reducesinkkey2":"20130117170703","reducesinkkey3":"xxx"},"value":{"_col0":"1","_col1":"xxx","_col2":"20130117170703","_col3":"164058872","_col4":"xxx,s1"},"alias":0}
	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:270)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:520)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"164058872","reducesinkkey1":"xxx,s1","reducesinkkey2":"20130117170703","reducesinkkey3":"xxx"},"value":{"_col0":"1","_col1":"xxx","_col2":"20130117170703","_col3":"164058872","_col4":"djh,s1"},"alias":0}
	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:258)
	... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20000]: Unable to initialize custom script.
	at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:354)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
	at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:249)
	... 7 more
Caused by: java.io.IOException: Cannot run program "/usr/bin/python2.7": error=7, Argument list too long
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1042)
	at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:313)
	... more
Caused by: java.io.IOException: error=7, Argument list too long
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
	at java.lang.ProcessImpl.start(ProcessImpl.java:130)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1023)
	... more

FAILED: Execution Error, return code 20000 from org.apache.hadoop.hive.ql.exec.MapRedTask. Unable to initialize custom script.

Solution:
Upgrade the kernel or reduce the number of partitions; see https://issues.apache.org/jira/browse/HIVE-2372.
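
The limit being hit is the kernel's cap on the combined size of arguments plus environment passed to exec(); Hive's script operator exports job metadata into the child process's environment, which is how a large partition count can overflow it. To inspect the cap (standard on Linux):

Shell Code

# maximum combined size of argv + environment for a new process, in bytes
getconf ARG_MAX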
7. Runtime error

Shell Code

hive> show tables;

FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

To troubleshoot, start Hive with debug logging on the console:

Shell Code

hive -hiveconf hive.root.logger=DEBUG,console

Text Code

13/07/15 16:29:24 INFO hive.metastore: Trying to connect to metastore with URI thrift://xxx.xxx.xxx.xxx:9083

13/07/15 16:29:24 WARN hive.metastore: Failed to connect to the MetaStore Server...

org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused

...

MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused

The client was trying to connect to port 9083, but netstat showed nothing actually listening on it, so the first suspicion was that HiveServer had not started properly. Yet the HiveServer process was there, just listening on port 10000.
Checking the hive-site.xml configuration showed the Hive client set to connect to port 9083 while HiveServer listens on port 10000 by default, which is the root cause of the problem.
Workaround:

Shell Code

hive --service hiveserver -p 9083

Or modify the hive.metastore.uris entry in $HIVE_HOME/conf/hive-site.xml, changing the port to 10000, as in the sketch below.
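
The latter fix as a configuration sketch (the host is a placeholder; hive.metastore.uris is the standard property):

XML code

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://xxx.xxx.xxx.xxx:10000</value>
</property>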


This article is from the "Longan" blog; please be sure to keep this source: http://xulongping.blog.51cto.com/5676840/1606450
