There is a Hive table whose column delimiter is a colon (:). One of its columns, utime, holds a timestamp; the task is to derive the weekday from utime and save the result to a new table.
A pipeline transform written in Python does the job, and the weekday.py code is simple:

    import sys
    import datetime

    for line in sys.stdin:
        line = line.strip()
        uid, mid, rating, utime = line.split(':')
        weekday = datetime.datetime.fromtimestamp(float(utime)).isoweekday()
        print '\t'.join([uid, mid, rating, str(weekday)])

The HQL query is just as simple:

    SELECT TRANSFORM (uid, mid, rating, utime)
    USING 'python weekday.py'
    AS (uid, mid, rating, weekday)
    FROM rating;
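A script like this can be smoke-tested locally before it ever touches Hive. A minimal sketch (Python 2, same dialect as the script above; the sample values are taken from the row that surfaces in the error log further down):

    # smoke_test.py -- run one colon-delimited sample row through the same logic as weekday.py
    import datetime

    sample = '1:2791:4:978903186'
    uid, mid, rating, utime = sample.strip().split(':')
    # isoweekday() maps Monday..Sunday to 1..7
    weekday = datetime.datetime.fromtimestamp(float(utime)).isoweekday()
    print '\t'.join([uid, mid, rating, str(weekday)])

Of course, a test like this only exercises the parsing logic on hand-made input; as the rest of this post shows, it says nothing about what Hive actually feeds the script.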
Stage-1 ended with an error! The troubleshooting process:
1. The log Hive gives is uninformative. Hive log:

INFO exec.Task: 2015-07-07 16:34:57,938 Stage-1 map = 0%, reduce = 0%
INFO exec.Task: 2015-07-07 16:35:30,262 Stage-1 map = 100%, reduce = 0%
ERROR exec.Task: Ended Job = job_1431587697935_0210 with errors
ERROR operation.Operation: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 20001 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. An error occurred when reading or writing to your custom script. It may have crashed with an error.
    at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:315)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:156)
    at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
    at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2. Don't forget: turn on Hive debug logging (e.g. hive --hiveconf hive.root.logger=DEBUG,console) and read the log again. Because I had been connecting to Hive through beeline, all I got was the log from step 1; no progress. Then it occurred to me: why not look from the Hive CLI instead? That finally produced a somewhat meaningful log:

Task with the most failures(4):
-----
Task ID:
task_1431587697935_0210_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"uid":1,"mid":2791,"rating":4,"utime":"978903186"}
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"uid":1,"mid":2791,"rating":4,"utime":"978903186"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
    ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20001]: An error occurred while reading or writing to your custom script. It may have crashed with an error.
    at org.apache.hadoop.hive.ql.exec.ScriptOperator.process(ScriptOperator.java:456)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
Caused by: java.io.IOException: Broken pipe
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:345)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at org.apache.hadoop.hive.ql.exec.TextRecordWriter.write(TextRecordWriter.java:53)
    at org.apache.hadoop.hive.ql.exec.ScriptOperator.process(ScriptOperator.java:425)
3. Based on the suspicious row shown in the previous step, I suspected bad data was causing a processing error. I copied the offending row into a separate table by itself and processed it: no problem at all. Fine, keep going.
4. Before going further I needed one thing clear: what is java.io.IOException: Broken pipe? It is raised on the writing end of a pipe when the reading end has broken off or exited, so the data in the pipe can no longer be taken out. Here that means: while streaming was feeding input data to weekday.py, weekday.py terminated abnormally; when streaming had more data ready to hand over, there was no weekday.py left to receive it, hence the broken pipe.
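The failure mode is easy to reproduce outside Hive. A minimal sketch (Python 2, same dialect as the post's scripts; the child command is just a stand-in for a transform script that crashes): the parent keeps writing into a pipe whose consumer has already died, which is the same shape of failure Hive's ScriptOperator hits above.

    # broken_pipe_demo.py -- reproduce IOError: [Errno 32] Broken pipe
    import subprocess
    import time

    # The child plays the role of a transform script that crashes immediately.
    child = subprocess.Popen(['python', '-c', 'import sys; sys.exit(1)'],
                             stdin=subprocess.PIPE)
    time.sleep(1)  # give the child time to exit before we start writing

    try:
        while True:
            child.stdin.write('uid:mid:rating:utime\n')  # writer side of the dead pipe
    except IOError, ex:
        print 'writer got: %s' % ex   # IOError: [Errno 32] Broken pipe
    finally:
        child.wait()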
5. Knowing what a broken pipe is, combined with the earlier error message, pointed the problem squarely at weekday.py. Next, since this was a MapReduce error, the place to look was YARN's stderr. Via the ResourceManager I viewed the stderr in the logs of the corresponding application, and found:

    Traceback (most recent call last):
      File "weekday_mapper.py", line 5, in <module>
        uid,mid,rating,utime=line.split(':')
    ValueError: need more than 1 value to unpack
6. From the Python error it can be inferred that the delimiter (:) of some data row is abnormal, so split cannot return the 4 values (uid, mid, rating, utime). I checked the data format in every way I could think of; everything was fine. So I added exception handling to the script. With the exception handling in place the error was gone, but the SELECT output 0 rows of data:

    import sys
    import datetime

    for line in sys.stdin:
        try:
            line = line.strip()
            uid, mid, rating, utime = line.split(':')
            weekday = datetime.datetime.fromtimestamp(float(utime)).isoweekday()
            print '\t'.join([uid, mid, rating, str(weekday)])
        except Exception, ex:
            pass
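In hindsight, a bare except/pass was the wrong instinct: it hid the one clue that mattered. A more diagnostic variant (a sketch, not from the original debugging session) would echo each offending line to stderr, which lands in exactly the YARN task logs inspected in step 5; the %r formatting would have printed the tab characters explicitly:

    import sys
    import datetime

    for line in sys.stdin:
        try:
            line = line.strip()
            uid, mid, rating, utime = line.split(':')
            weekday = datetime.datetime.fromtimestamp(float(utime)).isoweekday()
            print '\t'.join([uid, mid, rating, str(weekday)])
        except Exception, ex:
            # repr() makes invisible characters visible, e.g. '1\t2791\t4\t978903186'
            print >> sys.stderr, 'bad line %r: %s' % (line, ex)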
7. The problem was now pinned down to: the script's handling of the data. Fetching the table's data file straight from HDFS and running it through the script worked fine:

    hdfs dfs -cat /user/hive/warehouse/test.db/t/000000_0 | python /tmp/weekday_mapper.py

Finally, I began to suspect that the format in which TRANSFORM feeds the script differs from the format the table is defined with, and checked the official documentation:
By default, columns will be transformed to STRING and delimited by TAB before feeding to the user script.
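That one sentence explains everything: the script had been tested against the colon-delimited file on HDFS, but TRANSFORM hands it TAB-delimited strings. A small sketch of the mismatch (sample values reused from the failing row):

    raw_row = '1:2791:4:978903186'       # as stored in the table's file on HDFS
    fed_row = '1\t2791\t4\t978903186'    # as TRANSFORM actually streams it to the script
    print raw_row.split(':')   # ['1', '2791', '4', '978903186'] -> unpacks into 4 names
    print fed_row.split(':')   # ['1\t2791\t4\t978903186'] -> 1 element,
                               # hence "need more than 1 value to unpack"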
So uid,mid,rating,utime = line.split(':') in the script was changed to uid,mid,rating,utime = line.split('\t'). Try again: success!
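For completeness, this is what the corrected weekday.py looks like with that fix applied (reassembled from the snippets above):

    import sys
    import datetime

    for line in sys.stdin:
        line = line.strip()
        # TRANSFORM feeds TAB-delimited STRING columns,
        # regardless of the table's own delimiter
        uid, mid, rating, utime = line.split('\t')
        weekday = datetime.datetime.fromtimestamp(float(utime)).isoweekday()
        print '\t'.join([uid, mid, rating, str(weekday)])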
Summary
1. It took about 2 days to troubleshoot this problem. Writing this summary, honestly, my heart is broken; there are ten thousand deer galloping through it....
2. Basic knowledge is very important; only when it has formed a system in your own mind is it enough to draw on. The road of learning never ends!
3. Sometimes experienced "guessing" helps a lot, and sometimes cleverness defeats itself. So pay close attention to the logs, and use them as the basis for reproducing what happened.
Troubleshooting a Broken pipe error when implementing Hive TRANSFORM with Python