An anti-spam RD reported an HQL job that exits with an error during execution: java.io.IOException: Broken pipe. The HQL feeds data to a Python script via TRANSFORM, and neither the HQL nor the Python script had been changed recently. The job ran normally on Oct 1, but from Oct 4 onward it always failed with the same error, which occurred in the Stage-2 phase. The error message on the gateway:
2014-10-10 15:05:32,724 Stage-2 map = 100%, reduce = 100%
Ended Job = job_201406171104_4019895 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
Job error message on the JobTracker page:
2014-10-10 15:00:29,614 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"1000390355","reducesinkkey1":"14"},"value":{"_col0":"1000390355","_col1":25,"_col2":"Infinity","_col3":"14","_col4":17},"alias":0}
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:268)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:518)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:419)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1061)
at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"1000390355","reducesinkkey1":"14"},"value":{"_col0":"1000390355","_col1":25,"_col2":"Infinity","_col3":"14","_col4":17},"alias":0}
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:256)
... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Broken pipe
at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:348)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:247)
... 7 more
Caused by: java.io.IOException: Broken pipe
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:260)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at org.apache.hadoop.hive.ql.exec.TextRecordWriter.write(TextRecordWriter.java:43)
at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:331)
... 15 more
STDERR logs:
Traceback (most recent call last):
File "/data10/hadoop/local/taskTracker/liangjun/jobcache/job_201406171104_4019895/attempt_201406171104_4019895_r_000000_0/work/./pranalysis.py", line 86, in <module>
pranalysis(cols[0],pr,cols[1],cols[4],prnum)
File "/data10/hadoop/local/taskTracker/liangjun/jobcache/job_201406171104_4019895/attempt_201406171104_4019895_r_000000_0/work/./pranalysis.py", line 60, in pranalysis
print '%s\t%d\t%d\t%d'%(uid,v[14]-20,type,rank)
TypeError: %d format: a number is required, not float
From the job error messages above, the preliminary judgment is that a data problem introduced after Oct 1 caused the Python script to exit while processing, which closed the data pipe. The ExecReducer.reduce() method, not knowing that the channel it uses to write data to Python had been closed by the exception, kept writing to it, and the java.io.IOException: Broken pipe exception appeared.
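The mechanism is easy to reproduce outside Hadoop. Below is a minimal sketch (an illustration, not part of the job): the parent keeps writing rows to a child process that dies after reading one line, so the parent's writes eventually fail with errno 32, which is what ExecReducer surfaces as java.io.IOException: Broken pipe.

#!/usr/bin/python
# minimal broken-pipe demo: the reader dies mid-stream, the writer gets EPIPE
import subprocess

# the child stands in for pranalysis.py: read one line, then exit abnormally
child = subprocess.Popen(
    ['python', '-c', 'import sys; sys.stdin.readline(); sys.exit(1)'],
    stdin=subprocess.PIPE)

try:
    for i in range(1000000):
        # like ExecReducer.reduce(), keep forwarding rows without checking
        # whether the reader on the other end of the pipe is still alive
        child.stdin.write('row-%d\n' % i)
except IOError as e:
    print 'writer got: %s' % e    # IOError: [Errno 32] Broken pipe
finally:
    child.wait()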
Here is the analysis process:
1. HQL and Python
The HQL is as follows:
add file /usr/home/wbdata_anti/shell/sass_offline/pranalysis.py;
select transform(BS.*) using 'pranalysis.py' as uid,prvalue,trend,prlevel
from
(
select B1.uid,B1.flws,B1.pr,iter,B2.alivefans from tmp_anti_user_pagerank1 B1
join
mds_anti_user_flwpr B2
on B1.uid=B2.uid
where iter>'00' and iter<='14' and dt='lowrlfans20141001'
distribute by uid sort by uid,iter
)BS;
The Python script reads as follows:
#!/usr/bin/python
#coding=utf-8
import sys,time
import re,math
from optparse import OptionParser
import ConfigParser
reload(sys)
sys.setdefaultencoding('utf-8')
parser = OptionParser(usage="usage:%prog [options] filepath")
parser.add_option("-i", "--iter", action="store", type='string', dest="iter", default='14',
                  help="how many iterators")
(options, args) = parser.parse_args()

def pranalysis(uid,prs,flw,fans,prnum):
    tasc=tdesc=0
    try:
        v=[float(pr)*100000000000 for pr in prs]
        fans=int(fans)
        interval=fans/100
    except:
        #rst=sys.exc_info()
        #sys.excepthook(rst[0],rst[1],rst[2])
        return
    for i in range(1,prnum-1):
        if i==1:
            # note: "v>fans" compares the whole list with an int (always True
            # in Python 2); the intent was likely v[i]>fans
            if v[i+1]-v[i]>interval and v>fans: tasc += 1
            elif v[i]-v[i+1]>interval and v[i+1]<fans: tdesc += 1
            continue
        if v[i+1]-v[i]>interval: tasc += 1
        elif v[i]-v[i+1]>interval: tdesc += 1
    # rank reflects the ratio between pr and fans; a higher rank (bigger
    # number) means a more likely negative user
    rate=v[prnum-1]/fans
    rank=4
    if rate>3.0: rank=0
    elif rate>2.0: rank=1
    elif rate>1.3: rank=2
    elif rate>0.7: rank=3
    elif rate>0.5: rank=4
    elif rate>0.3: rank=5
    elif rate>0.2: rank=6
    else: rank=7
    # trend type: 0 for stable trend, 1 for round trend, 2 for positive user,
    # 3 for negative user
    type=0
    if tasc>0 and tdesc>0:
        type=1
    elif tasc>0:
        type=2
    elif tdesc>0:
        type=3
    else: # tasc==0 and tdesc==0
        type=0
    #if fans<60:
    #    type=0
    print '%s\t%d\t%d\t%d'%(uid,v[14]-20,type,rank)

#format sort by uid, iter
#uid follow pr iter fans
#1642909335 919 0.00070398898 04 68399779
prnum=int(options.iter)+1
pr=[0]*prnum
idx=1
lastiter='00'
lastuid=''
for line in sys.stdin:
    line=line.rstrip('\n')
    cols=line.split('\t')
    if len(cols)<5: continue
    if cols[3]>options.iter or cols[3]=='00': continue
    if cols[3]<=lastiter:
        print '%s\t%d\t%d\t%d'%(lastuid,2,0,7)
        pr=[0]*prnum
        idx=1
    lastiter=cols[3]
    lastuid=cols[0]
    pr[idx]=cols[2]
    idx+=1
    if cols[3]==options.iter:
        pranalysis(cols[0],pr,cols[1],cols[4],prnum)
        pr=[0]*prnum
        lastiter='00'
        idx=1
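A transform script like this can be tested outside Hive by piping tab-separated rows into its stdin. The sketch below (a local test harness, not part of the job, assuming pranalysis.py is in the current directory) drives the script with synthetic rows for one uid over iterations 01 to 14, using the column order documented in the script's comments (uid, follow, pr, iter, fans):

# run pranalysis.py locally against well-formed rows; with identical pr values
# the trend is stable, so on Python 2.6/2.7 this prints one line per uid:
# 1642909335    70398878    0    3
import subprocess

rows = ['1642909335\t919\t0.00070398898\t%02d\t68399779\n' % it
        for it in range(1, 15)]
p = subprocess.Popen(['python', 'pranalysis.py'], stdin=subprocess.PIPE)
p.stdin.writelines(rows)
p.stdin.close()
p.wait()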
2. Execution plan for the Stage-2 reduce phase:
Reduce Operator Tree:
  Extract
    Select Operator
      expressions:
            expr: _col0
            type: string
            expr: _col1
            type: bigint
            expr: _col2
            type: string
            expr: _col3
            type: string
            expr: _col4
            type: bigint
      outputColumnNames: _col0, _col1, _col2, _col3, _col4
      Transform Operator
        command: pranalysis.py
        output info:
            input format: org.apache.hadoop.mapred.TextInputFormat
            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
        File Output Operator
          compressed: false
          GlobalTableId: 0
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
The execution plan shows that the Stage-2 reduce phase is very simple: the rows produced by the map phase are processed with the pranalysis.py script, which converts the 5 input columns into 4 output columns. The script's output is expected to follow this format:
print '%s\t%d\t%d\t%d'%(uid,v[14]-20,type,rank)
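For reference, the reduce row dumped in the Hive error above corresponds to this script input line (uid, flws, pr, iter, alivefans); note that the pr field is the literal string Infinity:
1000390355 25 Infinity 14 17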
Combining the results of the execution plan with the job's stderr logs:
Traceback (most recent call last):
File "/data10/hadoop/local/taskTracker/liangjun/jobcache/job_201406171104_4019895/attempt_201406171104_4019895_r_000000_0/work/./pranalysis.py", line 86, in <module>
pranalysis(cols[0],pr,cols[1],cols[4],prnum)
File "/data10/hadoop/local/taskTracker/liangjun/jobcache/job_201406171104_4019895/attempt_201406171104_4019895_r_000000_0/work/./pranalysis.py", line 60, in pranalysis
print '%s\t%d\t%d\t%d'%(uid,v[14]-20,type,rank)
TypeError: %d format: a number is required, not float
As can be seen, the HQL hit a data anomaly while executing the Python script: after computation, one value was of float type where the %d format expected an integer. The row dumped in the Hive error shows _col2 (the pr column) as "Infinity"; float('Infinity')*100000000000 evaluates to inf, so v[14]-20 is inf and the %d conversion in the print at line 60 raises the TypeError. The Python script therefore exited abnormally, and the data pipe was closed on exit, but the ExecReducer.reduce() method did not know that the channel it writes data to Python through had been closed by the exception and continued to write data to it, at which point the java.io.IOException: Broken pipe exception occurred.
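The root cause is the bad pr value upstream, but the script can also be hardened so that a single bad record is dropped instead of killing the whole pipe. Note that the existing try/except in pranalysis() does not help here: float('Infinity') parses successfully, so the bad value slips through to the print. A possible guard (my sketch, not a fix from the original post; emit() is a hypothetical helper that would replace the print at the end of pranalysis()):

# skip rows whose pr overflowed to inf/nan, and coerce explicitly to int so
# the %d format never sees a float
import math

def emit(uid, v14, type_, rank):
    if math.isinf(v14) or math.isnan(v14):   # e.g. float('Infinity')*100000000000
        return                               # drop the bad row instead of dying
    print '%s\t%d\t%d\t%d' % (uid, int(v14) - 20, type_, rank)

With a guard like this the script keeps consuming its stdin, so ExecReducer never writes into a closed pipe because of one bad record.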
Reference:
http://fgh2011.iteye.com/blog/1684544
http://blog.csdn.net/churylin/article/details/11969925