Using HiveSemanticAnalyzerHook to flag Hive queries without partition conditions

Source: Internet
Author: User
Tags: filter, getText, table name

Nearly 80% of the jobs in our Hadoop cluster are submitted through Hive. Because HiveQL is easy to write and we provide a Hive web client, it is used very widely: BAs, PMs, POs, and sales staff all run ad-hoc queries with Hive. But while Hive lowers the barrier to entry, it also lets users write unreasonably expensive statements that spawn many MapReduce jobs and occupy a large number of slots. The most typical case is querying a partitioned table without specifying a partition condition: Hive cannot apply the partition pruner optimization, so it reads the entire table, consuming large amounts of I/O and compute resources.
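For illustration, with a hypothetical access log table partitioned by date (the table and column names here are invented for the example), the difference looks like this:

```sql
-- Hypothetical table: access_log PARTITIONED BY (dt STRING)

-- No partition condition: Hive cannot prune, so every partition is scanned.
SELECT COUNT(*) FROM access_log;

-- Partition condition present: the pruner reads only the matching partition.
SELECT COUNT(*) FROM access_log WHERE dt = '2013-01-01';
```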

To avoid this situation as far as possible, we can use Hive's hook mechanism and implement statement checks inside a hook. In the first phase we do not block offending statements outright; instead we record the problematic statements and issue a warning to the user.

The approach is to implement the HiveSemanticAnalyzerHook interface, whose preAnalyze and postAnalyze methods are executed before and after the compile function. We only need to implement preAnalyze: walk the ASTNode abstract syntax tree, taking the table name of the FROM clause from the left subtree and the key values of the WHERE condition from the right subtree. If the FROM table is partitioned, fetch all of its partition key names through the metastore client. If any partition key appears in the user-specified WHERE condition, the statement passes the check; otherwise a warning is printed to standard error, and the user name and the statement are recorded in a background log. At regular intervals these cases are published to the hive-user group mailbox, in the hope that this serves as a mutual warning and a learning opportunity.

The compile function instantiates the hook class via reflection, based on the hive.semantic.analyzer.hook property specified in HiveConf. Here AbstractSemanticAnalyzerHook is implemented by DPSemanticAnalyzerHook:
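For example, the hook can be registered in hive-site.xml (the value assumes the fully qualified class name from the listing below; adjust it to your own package):

```xml
<!-- hive-site.xml: register the semantic analyzer hook -->
<property>
  <name>hive.semantic.analyzer.hook</name>
  <value>org.apache.hadoop.hive.ql.parse.DPSemanticAnalyzerHook</value>
</property>
```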

package org.apache.hadoop.hive.ql.parse;

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.ql.exec.Task;
import org.apache.hadoop.hive.ql.hooks.ReadEntity;
import org.apache.hadoop.hive.ql.metadata.Hive;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.metadata.Table;
import org.apache.hadoop.hive.ql.session.SessionState;
import org.apache.hadoop.hive.ql.session.SessionState.LogHelper;
import org.apache.hadoop.hive.shims.ShimLoader;

public class DPSemanticAnalyzerHook extends AbstractSemanticAnalyzerHook {
  private static final String NO_PARTITION_WARNING =
      "WARNING: HQL is not efficient, please specify a partition condition! hql: %s; username: %s";

  private final SessionState ss = SessionState.get();
  private final LogHelper console = SessionState.getConsole();
  private Hive hive = null;
  private String username;
  private String currentDatabase = "default";
  private String hql;
  private String whereHql;
  private String tableAlias;
  private String tableName;
  private String tableDatabaseName;
  private boolean needCheckPartition = false;

  @Override
  public ASTNode preAnalyze(HiveSemanticAnalyzerHookContext context, ASTNode ast)
      throws SemanticException {
    try {
      // Normalize the raw statement: lowercase and fold newlines into spaces.
      hql = ss.getCmd().toLowerCase();
      hql = StringUtils.replaceChars(hql, '\n', ' ');
      if (hql.contains("where")) {
        whereHql = hql.substring(hql.indexOf("where"));
      }
      username = ShimLoader.getHadoopShims().getUserName(context.getConf());

      if (ast.getToken().getType() == HiveParser.TOK_QUERY) {
        try {
          hive = context.getHive();
          currentDatabase = hive.getCurrentDatabase();
        } catch (HiveException e) {
          throw new SemanticException(e);
        }

        extractFromClause((ASTNode) ast.getChild(0));

        if (needCheckPartition && !StringUtils.isBlank(tableName)) {
          String dbName = StringUtils.isEmpty(tableDatabaseName) ? currentDatabase
              : tableDatabaseName;
          String tbName = tableName;
          // Note: "." must be escaped, since String.split takes a regex.
          String[] parts = tableName.split("\\.");
          if (parts.length == 2) {
            dbName = parts[0];
            tbName = parts[1];
          }
          Table t = hive.getTable(dbName, tbName);
          if (t.isPartitioned()) {
            if (StringUtils.isBlank(whereHql)) {
              // Partitioned table queried with no WHERE clause at all.
              console.printError(String.format(NO_PARTITION_WARNING, hql, username));
            } else {
              List<FieldSchema> partitionKeys = t.getPartitionKeys();
              List<String> partitionNames = new ArrayList<String>();
              for (int i = 0; i < partitionKeys.size(); i++) {
                partitionNames.add(partitionKeys.get(i).getName().toLowerCase());
              }
              if (!containsPartCond(partitionNames, whereHql, tableAlias)) {
                console.printError(String.format(NO_PARTITION_WARNING, hql, username));
              }
            }
          }
        }
      }
    } catch (Exception ex) {
      ex.printStackTrace();
    }
    return ast;
  }

  private boolean containsPartCond(List<String> partitionKeys, String sql, String alias) {
    for (String pk : partitionKeys) {
      if (sql.contains(pk)) {
        return true;
      }
      if (!StringUtils.isEmpty(alias) && sql.contains(alias + "." + pk)) {
        return true;
      }
    }
    return false;
  }

  private void extractFromClause(ASTNode ast) {
    if (HiveParser.TOK_FROM == ast.getToken().getType()) {
      ASTNode refNode = (ASTNode) ast.getChild(0);
      // Only handle the simple single-table case (no joins or subqueries).
      if (refNode.getToken().getType() == HiveParser.TOK_TABREF
          && ast.getChildCount() == 1) {
        ASTNode tabNameNode = (ASTNode) refNode.getChild(0);
        int refNodeChildCount = refNode.getChildCount();
        if (tabNameNode.getToken().getType() == HiveParser.TOK_TABNAME) {
          if (tabNameNode.getChildCount() == 2) {
            tableDatabaseName = tabNameNode.getChild(0).getText().toLowerCase();
            tableName = BaseSemanticAnalyzer
                .getUnescapedName((ASTNode) tabNameNode.getChild(1)).toLowerCase();
          } else if (tabNameNode.getChildCount() == 1) {
            tableName = BaseSemanticAnalyzer
                .getUnescapedName((ASTNode) tabNameNode.getChild(0)).toLowerCase();
          } else {
            return;
          }
          if (refNodeChildCount == 2) {
            tableAlias = BaseSemanticAnalyzer
                .unescapeIdentifier(refNode.getChild(1).getText()).toLowerCase();
          }
          needCheckPartition = true;
        }
      }
    }
  }

  @Override
  public void postAnalyze(HiveSemanticAnalyzerHookContext context,
      List<Task<? extends Serializable>> rootTasks) throws SemanticException {
    Set<ReadEntity> readEntitys = context.getInputs();
    console.printInfo("Total Read Entity Size: " + readEntitys.size());
    for (ReadEntity readEntity : readEntitys) {
      // Partition p = readEntity.getPartition();
      // Table t = readEntity.getTable();
    }
  }
}
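The core substring check can be exercised on its own. Below is a standalone sketch of the same containsPartCond logic (the class name PartCondCheck is my own, not from the article); it also shows a caveat of naive substring matching: a short key such as dt matches inside unrelated identifiers like width, producing a false positive.

```java
import java.util.Arrays;
import java.util.List;

public class PartCondCheck {
    // Same logic as containsPartCond in the hook: scan the lowercased
    // WHERE clause for any partition key name (optionally alias-qualified).
    static boolean containsPartCond(List<String> partitionKeys, String sql, String alias) {
        for (String pk : partitionKeys) {
            if (sql.contains(pk)) {
                return true;
            }
            if (alias != null && !alias.isEmpty() && sql.contains(alias + "." + pk)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("dt", "hour");
        System.out.println(containsPartCond(keys, "where dt = '2013-01-01'", null)); // true
        System.out.println(containsPartCond(keys, "where user_id = 42", null));      // false
        // Caveat: contains() also finds "dt" inside "width",
        // so the check can report a spurious pass.
        System.out.println(containsPartCond(keys, "where width > 5", null));         // true
    }
}
```

A stricter implementation would tokenize the WHERE clause or match partition keys with word boundaries, but for a warn-only first phase the simple substring test errs on the lenient side, which is the safer direction.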
