Hive 0.13 "rows loaded" is empty: source code analysis and fix

Source: Internet
Author: User
Tags: getstat, rowcount


After upgrading to Hive 0.13, the "rows loaded" line is no longer printed after a job runs.

In Hive 0.11, the rows loaded information is printed by the printRowCount method of the HiveHistory class. HiveHistory records job run information, including the counters of each task; by default the history files go under /tmp/$user.

In Hive 0.11, the HiveHistory object is initialized in the start method of SessionState:

  if (startSs.hiveHist == null) {
    startSs.hiveHist = new HiveHistory(startSs);
  }

In Hive 0.13, HiveHistory is an abstract class whose implementation lives in HiveHistoryImpl. Before instantiating HiveHistoryImpl, SessionState now checks the hive.session.history.enabled setting (default false), so by default HiveHistoryImpl is never instantiated:

  if (startSs.hiveHist == null) {
    if (startSs.getConf().getBoolVar(HiveConf.ConfVars.HIVE_SESSION_HISTORY_ENABLED)) {
      startSs.hiveHist = new HiveHistoryImpl(startSs);
    } else {
      // Hive history is disabled, create a no-op proxy
      startSs.hiveHist = HiveHistoryProxyHandler.getNoOpHiveHistoryProxy();
    }
  }
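Given the check above, the first step is to turn the session history back on. A minimal hive-site.xml fragment for that (the property name comes from the HiveConf variable referenced above) might look like:

```xml
<property>
  <name>hive.session.history.enabled</name>
  <value>true</value>
</property>
```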

However, even after enabling this setting, the rows loaded information is still missing.

The printRowCount method is implemented as follows:

  public void printRowCount(String queryId) {
    QueryInfo ji = queryInfoMap.get(queryId);
    if (ji == null) {
      // if ji is null, return directly
      return;
    }
    for (String tab : ji.rowCountMap.keySet()) {
      // read the row count from the HashMap
      console.printInfo(ji.rowCountMap.get(tab) + " rows loaded to " + tab);
    }
  }

In Hive 0.13, the ji object obtained here is null.

Tracing one step further, the task counters no longer contain TABLE_ID_(\d+)_ROWCOUNT, so nothing matches the ROW_COUNT_PATTERN regex and the row count can never be extracted.

The getRowCountTableName method, which maps a task counter name back to the loaded table, is as follows:

  private static final String ROW_COUNT_PATTERN = "TABLE_ID_(\\d+)_ROWCOUNT";
  private static final Pattern rowCountPattern = Pattern.compile(ROW_COUNT_PATTERN);
  ......
  String getRowCountTableName(String name) {
    if (idToTableMap == null) {
      return null;
    }
    Matcher m = rowCountPattern.matcher(name);
    if (m.find()) {
      // in 0.13 no counter matches TABLE_ID_xxx_ROWCOUNT, so this branch is never taken
      String tuple = m.group(1);
      return idToTableMap.get(tuple);
    }
    return null;
  }
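The mismatch is easy to reproduce in isolation. The following is a minimal, self-contained sketch of the pattern-matching step (class and method names are hypothetical, only the regex is taken from the source above): a 0.11-style counter name matches and yields a table id, while a counter name without the TABLE_ID prefix yields nothing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch of HiveHistory's counter-name matching, assuming the
// same ROW_COUNT_PATTERN as Hive 0.11. Names here are illustrative.
public class RowCountPatternDemo {
    private static final Pattern ROW_COUNT_PATTERN =
        Pattern.compile("TABLE_ID_(\\d+)_ROWCOUNT");

    // Returns the table id embedded in the counter name, or null if the
    // counter name does not follow the TABLE_ID_<n>_ROWCOUNT convention.
    static String extractTableId(String counterName) {
        Matcher m = ROW_COUNT_PATTERN.matcher(counterName);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // A 0.11-style FileSinkOperator counter matches:
        System.out.println(extractTableId("TABLE_ID_1_ROWCOUNT")); // 1
        // A counter without the TABLE_ID prefix does not:
        System.out.println(extractTableId("RECORDS_OUT_1"));       // null
    }
}
```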

The TABLE_ID_(\d+)_ROWCOUNT counter is written by the FileSinkOperator class. The related code in Hive 0.11 is as follows:

  protected void initializeOp(Configuration hconf) throws HiveException {
    ..........
    int id = conf.getDestTableId();
    if ((id != 0) && (id <= TableIdEnum.values().length)) {
      String enumName = "TABLE_ID_" + String.valueOf(id) + "_ROWCOUNT";
      tabIdEnum = TableIdEnum.valueOf(enumName);
      row_count = new LongWritable();
      statsMap.put(tabIdEnum, row_count);
    }
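The fixed enum exists because the old Hadoop counter API keys counters by enum constants, so each destination table id needs a pre-declared constant. The lookup logic can be sketched in isolation like this (a simplified TableIdEnum and a plain long standing in for LongWritable; class names are illustrative):

```java
import java.util.EnumMap;
import java.util.Map;

// Sketch of FileSinkOperator's enum-keyed counter bookkeeping, assuming a
// trimmed-down TableIdEnum. Only table ids with a matching enum constant
// get a row-count counter; id 0 or out-of-range ids are silently skipped.
public class TableIdCounterDemo {
    enum TableIdEnum { TABLE_ID_1_ROWCOUNT, TABLE_ID_2_ROWCOUNT, TABLE_ID_3_ROWCOUNT }

    static Map<TableIdEnum, Long> statsMap = new EnumMap<>(TableIdEnum.class);

    static void initCounter(int id) {
        // Mirror of the 0.11 guard: the id must be non-zero and have a
        // corresponding TABLE_ID_<id>_ROWCOUNT constant.
        if (id != 0 && id <= TableIdEnum.values().length) {
            TableIdEnum e = TableIdEnum.valueOf("TABLE_ID_" + id + "_ROWCOUNT");
            statsMap.put(e, 0L);
        }
    }

    public static void main(String[] args) {
        initCounter(2);
        System.out.println(statsMap); // {TABLE_ID_2_ROWCOUNT=0}
    }
}
```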

In Hive 0.13, this part of the code was removed, so the fix is relatively simple: add the counter back.

The patch is as follows:

  diff --git a/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java b/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
  index 1dde78e..96860f7 100644
  --- a/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
  +++ b/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
  @@ -68,13 +68,16 @@
   import org.apache.hadoop.util.ReflectionUtils;
  
   import com.google.common.collect.Lists;
  
  +import org.apache.commons.logging.Log;
  +import org.apache.commons.logging.LogFactory;
  +
   /**
    * File Sink operator implementation.
    **/
   public class FileSinkOperator extends TerminalOperator<FileSinkDesc> implements
       Serializable {
  -
  +  public static Log LOG = LogFactory.getLog("FileSinkOperator.class");
     protected transient HashMap<String, FSPaths> valToPaths;
     protected transient int numDynParts;
     protected transient List<String> dpColNames;
  @@ -214,6 +217,7 @@ public Stat getStat() {
     protected transient FileSystem fs;
     protected transient Serializer serializer;
     protected transient LongWritable row_count;
  +  protected transient TableIdEnum tabIdEnum = null;
     private transient boolean isNativeTable = true;
  
     /**
  @@ -241,6 +245,23 @@ public Stat getStat() {
     protected transient JobConf jc;
     Class<? extends Writable> outputClass;
     String taskId;
  +  public static enum TableIdEnum {
  +       TABLE_ID_1_ROWCOUNT,
  +       TABLE_ID_2_ROWCOUNT,
  +       TABLE_ID_3_ROWCOUNT,
  +       TABLE_ID_4_ROWCOUNT,
  +       TABLE_ID_5_ROWCOUNT,
  +       TABLE_ID_6_ROWCOUNT,
  +       TABLE_ID_7_ROWCOUNT,
  +       TABLE_ID_8_ROWCOUNT,
  +       TABLE_ID_9_ROWCOUNT,
  +       TABLE_ID_10_ROWCOUNT,
  +       TABLE_ID_11_ROWCOUNT,
  +       TABLE_ID_12_ROWCOUNT,
  +       TABLE_ID_13_ROWCOUNT,
  +       TABLE_ID_14_ROWCOUNT,
  +       TABLE_ID_15_ROWCOUNT;
  +  }
  
     protected boolean filesCreated = false;
  @@ -317,7 +338,15 @@ protected void initializeOp(Configuration hconf) throws HiveException {
           prtner = (HivePartitioner<HiveKey, Object>) ReflectionUtils.newInstance(
               jc.getPartitionerClass(), null);
         }
  -      row_count = new LongWritable();
  +      //row_count = new LongWritable();
  +         int id = conf.getDestTableId();
  +         if ((id != 0) && (id <= TableIdEnum.values().length)) {
  +               String enumName = "TABLE_ID_" + String.valueOf(id) + "_ROWCOUNT";
  +               tabIdEnum = TableIdEnum.valueOf(enumName);
  +               row_count = new LongWritable();
  +               statsMap.put(tabIdEnum, row_count);
  +         }
  +
         if (dpCtx != null) {
           dpSetup();
         }

After applying the patch, re-packaging, replacing the online hive-exec-xxx.jar, and testing, the rows loaded output is back.

This article is from the "Food light blog"; please keep this source: http://caiguangguang.blog.51cto.com/1652935/1528516
