After upgrading to Hive 0.13, the "rows loaded" information is no longer printed once a job has run.
In Hive 0.11, the rows loaded information is printed by the printRowCount method of the HiveHistory class. HiveHistory is mainly used to record information about job runs, including task counters. Its default directory is /tmp/$user.
In Hive 0.11, the HiveHistory object is initialized in the start method of SessionState:
if (startSs.hiveHist == null) {
  startSs.hiveHist = new HiveHistory(startSs);
}
In Hive 0.13, HiveHistory is an interface whose implementation lives in the HiveHistoryImpl class. When the history object is initialized, an extra check reads the hive.session.history.enabled setting (default: false), so by default HiveHistoryImpl is never instantiated and a no-op proxy is used instead:
if (startSs.hiveHist == null) {
  if (startSs.getConf().getBoolVar(HiveConf.ConfVars.HIVE_SESSION_HISTORY_ENABLED)) {
    startSs.hiveHist = new HiveHistoryImpl(startSs);
  } else {
    // Hive history is disabled, create a no-op proxy
    startSs.hiveHist = HiveHistoryProxyHandler.getNoOpHiveHistoryProxy();
  }
}
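To illustrate why nothing is printed while the setting is off, here is a minimal sketch of the same gating pattern built on java.lang.reflect.Proxy. The names History, RecordingHistory, and HistoryFactory are hypothetical stand-ins made up for this example; they are not Hive classes, and this is not Hive's actual implementation.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a history interface; not Hive's real HiveHistory. */
interface History {
    void printRowCount(String queryId);
}

/** Hypothetical implementation that actually records something. */
final class RecordingHistory implements History {
    final List<String> lines = new ArrayList<>();

    @Override
    public void printRowCount(String queryId) {
        lines.add("row counts for " + queryId);
    }
}

final class HistoryFactory {
    private HistoryFactory() {}

    /**
     * Returns a proxy that ignores every call. Returning null is only safe
     * for void or object-returning methods, which is all History declares.
     */
    static History noOpHistory() {
        InvocationHandler ignoreEverything = (proxy, method, args) -> null;
        return (History) Proxy.newProxyInstance(
                History.class.getClassLoader(),
                new Class<?>[] {History.class},
                ignoreEverything);
    }

    /** Mirrors the gating logic: a real object only when the flag is on. */
    static History create(boolean historyEnabled) {
        return historyEnabled ? new RecordingHistory() : noOpHistory();
    }

    public static void main(String[] args) {
        History disabled = create(false);
        disabled.printRowCount("query-1"); // silently ignored by the proxy

        History enabled = create(true);
        enabled.printRowCount("query-1"); // recorded by the real object
        System.out.println(((RecordingHistory) enabled).lines);
    }
}
```

With the flag off, callers still hold a valid History reference, so no null checks are needed, but every call is dropped. That is consistent with the row count output disappearing without any error.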
Even after hive.session.history.enabled is set to true, the rows loaded information is still not printed.
The printRowCount method is implemented as follows:
public void printRowCount(String queryId) {
  QueryInfo ji = queryInfoMap.get(queryId);
  if (ji == null) {
    // if ji is null, return immediately
    return;
  }
  for (String tab : ji.rowCountMap.keySet()) {
    // read the row count for each table from the HashMap
    console.printInfo(ji.rowCountMap.get(tab) + " Rows loaded to " + tab);
  }
}
In Hive 0.13, the ji object obtained here is null.
Tracing further shows that the task counters contain no TABLE_ID_(\d+)_ROWCOUNT entry, so nothing matches the ROW_COUNT_PATTERN regular expression and the row count cannot be obtained.
The getRowCountTableName method, which extracts the rows loaded information from the task counters, is as follows:
private static final String ROW_COUNT_PATTERN = "TABLE_ID_(\\d+)_ROWCOUNT";
private static final Pattern rowCountPattern = Pattern.compile(ROW_COUNT_PATTERN);
......
String getRowCountTableName(String name) {
  if (idToTableMap == null) {
    return null;
  }
  Matcher m = rowCountPattern.matcher(name);
  // In Hive 0.13 this never matches: no counter named
  // TABLE_ID_(\d+)_ROWCOUNT is emitted.
  if (m.find()) {
    String tuple = m.group(1);
    return idToTableMap.get(tuple);
  }
  return null;
}
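The matching step can be tried in isolation. The sketch below uses the same regular expression with java.util.regex; the class name RowCountCounterDemo, the table name default.target_table, and the counter name RECORDS_OUT are assumptions made up for illustration, not values taken from Hive.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RowCountCounterDemo {
    // Same expression as ROW_COUNT_PATTERN in the snippet above.
    private static final Pattern ROW_COUNT = Pattern.compile("TABLE_ID_(\\d+)_ROWCOUNT");

    private RowCountCounterDemo() {}

    /** Returns the table mapped to the id embedded in the counter name, or null. */
    static String tableFor(String counterName, Map<String, String> idToTable) {
        if (idToTable == null) {
            return null;
        }
        Matcher m = ROW_COUNT.matcher(counterName);
        // find() also matches when the pattern is only a substring of the name.
        return m.find() ? idToTable.get(m.group(1)) : null;
    }

    public static void main(String[] args) {
        Map<String, String> idToTable = new HashMap<>();
        idToTable.put("1", "default.target_table"); // hypothetical table name

        // A counter written by the sink resolves to a table name.
        System.out.println(tableFor("TABLE_ID_1_ROWCOUNT", idToTable));
        // Any other counter (hypothetical name) yields null, so nothing is recorded.
        System.out.println(tableFor("RECORDS_OUT", idToTable));
    }
}
```

If no counter with a matching name exists at all, every lookup returns null and the row count map stays empty, which is the situation described for Hive 0.13.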
The TABLE_ID_(\d+)_ROWCOUNT counter is written by the FileSinkOperator class. The relevant code in Hive 0.11 is as follows:
protected void initializeOp(Configuration hconf) throws HiveException {
  ..........
  int id = conf.getDestTableId();
  if ((id != 0) && (id <= TableIdEnum.values().length)) {
    String enumName = "TABLE_ID_" + String.valueOf(id) + "_ROWCOUNT";
    tabIdEnum = TableIdEnum.valueOf(enumName);
    row_count = new LongWritable();
    statsMap.put(tabIdEnum, row_count);
  }
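The way the counter name is derived from the destination table id can be sketched on its own. TableIdCounterDemo and its three-entry TableId enum are hypothetical and shortened for illustration. Unlike the Hive snippet, which only excludes id 0, this sketch also rejects negative ids so that valueOf never throws.

```java
import java.util.EnumMap;
import java.util.Map;

final class TableIdCounterDemo {
    /** Hypothetical enum, shortened to three entries for illustration. */
    enum TableId { TABLE_ID_1_ROWCOUNT, TABLE_ID_2_ROWCOUNT, TABLE_ID_3_ROWCOUNT }

    private TableIdCounterDemo() {}

    /** Returns the counter for a destination table id, or null if the id is out of range. */
    static TableId counterFor(int destTableId) {
        if (destTableId <= 0 || destTableId > TableId.values().length) {
            return null; // no counter is registered for this sink
        }
        return TableId.valueOf("TABLE_ID_" + destTableId + "_ROWCOUNT");
    }

    public static void main(String[] args) {
        Map<TableId, Long> stats = new EnumMap<>(TableId.class);

        // Simulate one row written to the table with destination id 2.
        TableId counter = counterFor(2);
        if (counter != null) {
            stats.merge(counter, 1L, Long::sum);
        }
        System.out.println(stats);
    }
}
```

Because the counter name is built from the enum constant, the number of distinct destination tables that can be counted is bounded by the number of enum entries.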
In Hive 0.13 this code was removed. The fix is relatively simple: add the counter back.
The patch is as follows:
diff --git a/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java b/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
index 1dde78e..96860f7 100644
--- a/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
@@ -68,13 +68,16 @@
 import org.apache.hadoop.util.ReflectionUtils;
 import com.google.common.collect.Lists;
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+
 /**
  * File Sink operator implementation.
  **/
 public class FileSinkOperator extends TerminalOperator<FileSinkDesc> implements
     Serializable {
-
+  public static Log LOG = LogFactory.getLog("FileSinkOperator.class");
   protected transient HashMap<String, FSPaths> valToPaths;
   protected transient int numDynParts;
   protected transient List<String> dpColNames;
@@ -214,6 +217,7 @@ public Stat getStat() {
   protected transient FileSystem fs;
   protected transient Serializer serializer;
   protected transient LongWritable row_count;
+  protected transient TableIdEnum tabIdEnum = null;
   private transient boolean isNativeTable = true;
   /**
@@ -241,6 +245,23 @@ public Stat getStat() {
   protected transient JobConf jc;
   Class<? extends Writable> outputClass;
   String taskId;
+  public static enum TableIdEnum {
+    TABLE_ID_1_ROWCOUNT,
+    TABLE_ID_2_ROWCOUNT,
+    TABLE_ID_3_ROWCOUNT,
+    TABLE_ID_4_ROWCOUNT,
+    TABLE_ID_5_ROWCOUNT,
+    TABLE_ID_6_ROWCOUNT,
+    TABLE_ID_7_ROWCOUNT,
+    TABLE_ID_8_ROWCOUNT,
+    TABLE_ID_9_ROWCOUNT,
+    TABLE_ID_10_ROWCOUNT,
+    TABLE_ID_11_ROWCOUNT,
+    TABLE_ID_12_ROWCOUNT,
+    TABLE_ID_13_ROWCOUNT,
+    TABLE_ID_14_ROWCOUNT,
+    TABLE_ID_15_ROWCOUNT;
+  }
   protected boolean filesCreated = false;
@@ -317,7 +338,15 @@ protected void initializeOp(Configuration hconf) throws HiveException {
         prtner = (HivePartitioner<HiveKey, Object>) ReflectionUtils.newInstance(
             jc.getPartitionerClass(), null);
       }
-      row_count = new LongWritable();
+      //row_count = new LongWritable();
+      int id = conf.getDestTableId();
+      if ((id != 0) && (id <= TableIdEnum.values().length)) {
+        String enumName = "TABLE_ID_" + String.valueOf(id) + "_ROWCOUNT";
+        tabIdEnum = TableIdEnum.valueOf(enumName);
+        row_count = new LongWritable();
+        statsMap.put(tabIdEnum, row_count);
+      }
+
       if (dpCtx != null) {
         dpSetup();
       }
After applying the patch, rebuild the package, replace the deployed hive-exec-xxx.jar with the new build, and test: the rows loaded output is back.
This article is from the "Food light blog" blog. Please keep this source link when reposting: http://caiguangguang.blog.51cto.com/1652935/1528516
Hive 0.13 "rows loaded" output is empty: source code analysis and fix