Hive SQL Compilation process


Transferred from: http://www.open-open.com/lib/view/open1400644430159.html

Hive and Impala are both in common use at companies and in research systems; the former is more stable and executes queries as MapReduce jobs. While using Hue I ran into some problems with Chinese characters in GROUP BY, and I noticed that long SQL statements often spawn a lot of jobs, so I wanted to understand how Hive translates SQL into MapReduce jobs; once you know that, you have a rough idea of how to optimize the SQL you write. I found an excellent article on this (from the Meituan tech blog) and have copied it here for reference:

Hive is a data warehouse system built on Hadoop and is widely used at major companies. The Meituan data warehouse is also based on Hive, running Hive ETL jobs nearly every day and storing and analyzing hundreds of GB of data daily. The stability and performance of Hive are therefore critical to our data analysis.

Over the course of several Hive upgrades we ran into a number of problems, large and small. With help from the community and through our own efforts, we gained a deep understanding of how Hive compiles SQL into MapReduce while resolving those issues. Understanding this process not only helped us fix some Hive bugs; it also helps us optimize Hive SQL, improves our control over Hive, and lets us customize the features we need.

1. How MapReduce implements basic SQL operations

Before explaining how Hive translates SQL into MapReduce, let's first look at how the MapReduce framework implements basic SQL operations.

1.1 How join is implemented

select u.name, o.orderid from order o join user u on o.uid = u.uid;

In the map output value, tag the rows coming from the different tables; in the reduce phase, use the tag to determine which table each row came from. The MapReduce process is shown below (this is only the most basic join implementation; there are others).
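
To make the tagging concrete, here is a minimal Hadoop MapReduce sketch of this reduce-side join. The file-path test used to pick the tag and the tab-separated column layout are assumptions for illustration, not Hive's actual generated code:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class ReduceSideJoin {

      // Emits (uid, tag + row). The tag marks which table the row came from.
      public static class JoinMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text row, Context ctx)
            throws IOException, InterruptedException {
          String path = ((FileSplit) ctx.getInputSplit()).getPath().toString();
          String[] cols = row.toString().split("\t");
          if (path.contains("user")) {          // user file: uid, name
            ctx.write(new Text(cols[0]), new Text("u\t" + cols[1]));
          } else {                              // order file: orderid, uid
            ctx.write(new Text(cols[1]), new Text("o\t" + cols[0]));
          }
        }
      }

      // All rows with the same uid meet in one reduce() call; the tag tells
      // the two tables apart, and the join result is their cross product.
      public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text uid, Iterable<Text> values, Context ctx)
            throws IOException, InterruptedException {
          List<String> names = new ArrayList<>();
          List<String> orderids = new ArrayList<>();
          for (Text v : values) {
            String[] parts = v.toString().split("\t", 2);
            if ("u".equals(parts[0])) names.add(parts[1]);
            else orderids.add(parts[1]);
          }
          for (String name : names)
            for (String orderid : orderids)
              ctx.write(new Text(name), new Text(orderid));
        }
      }
    }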

1.2 How group by is implemented

select rank, isonline, count(*) from city group by rank, isonline;

Concatenate the group-by columns into the map output key, rely on MapReduce's sorting to bring identical keys together, and keep a LastKey in the reduce phase to tell one key apart from the next. The MapReduce process is shown below (this is, of course, the reduce-side aggregation path, without map-side hash aggregation).
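
As a concrete illustration, here is a minimal MapReduce sketch of this group-by count(*). The tab-separated layout and the positions of rank and isonline are assumptions for illustration:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class GroupByCount {

      // The group-by columns are concatenated into the map output key, so
      // the shuffle sort brings identical (rank, isonline) pairs together.
      public static class GroupByMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        @Override
        protected void map(LongWritable offset, Text row, Context ctx)
            throws IOException, InterruptedException {
          String[] cols = row.toString().split("\t");   // assume rank = col 0, isonline = col 1
          ctx.write(new Text(cols[0] + "\t" + cols[1]), ONE);
        }
      }

      // Each reduce() call sees one distinct key; summing the 1s gives count(*).
      public static class GroupByReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> ones, Context ctx)
            throws IOException, InterruptedException {
          long count = 0;
          for (LongWritable one : ones) count += one.get();
          ctx.write(key, new LongWritable(count));
        }
      }
    }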

1.3 How distinct is implemented

select dealid, count(distinct uid) num from order group by dealid;

When there is only one distinct column (ignoring map-side hash aggregation for the moment), simply concatenate the group-by column and the distinct column into the map output key, let MapReduce's sorting do the work, use the group-by column alone as the reduce key, and keep a LastKey in the reduce phase to complete the deduplication.
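
The "sort by (dealid, uid) but reduce by dealid" trick corresponds to the secondary-sort pattern in Hadoop. Below is a minimal sketch, assuming a hypothetical job configured with this partitioner and grouping comparator and a composite "dealid\tuid" map output key:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;
    import org.apache.hadoop.mapreduce.Partitioner;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SingleDistinct {

      // Partition by dealid only, so every uid of one dealid lands in the
      // same reducer, sorted by uid within the composite key.
      public static class DealidPartitioner extends Partitioner<Text, NullWritable> {
        @Override
        public int getPartition(Text key, NullWritable val, int numPartitions) {
          String dealid = key.toString().split("\t")[0];
          return (dealid.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
      }

      // Group by dealid only, so one reduce() call covers a whole dealid.
      public static class DealidGroupingComparator extends WritableComparator {
        protected DealidGroupingComparator() { super(Text.class, true); }
        @Override
        public int compare(WritableComparable a, WritableComparable b) {
          return a.toString().split("\t")[0].compareTo(b.toString().split("\t")[0]);
        }
      }

      // Within one reduce() call the composite key object is updated as the
      // values are iterated, so uids arrive in sorted order and a LastKey
      // comparison counts each distinct uid exactly once.
      public static class DistinctReducer
          extends Reducer<Text, NullWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<NullWritable> vals, Context ctx)
            throws IOException, InterruptedException {
          String dealid = key.toString().split("\t")[0];
          String lastUid = null;
          long distinct = 0;
          for (NullWritable ignored : vals) {
            String uid = key.toString().split("\t")[1];
            if (!uid.equals(lastUid)) { distinct++; lastUid = uid; }
          }
          ctx.write(new Text(dealid), new LongWritable(distinct));
        }
      }
    }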

If there are multiple distinct columns, as in the following SQL:

select dealid, count(distinct uid), count(distinct date) from order group by dealid;

There are two ways of implementing this:

(1) If we still follow the single-distinct-column approach above, the map output key cannot be sorted by both uid and date at once, so LastKey cannot be used for deduplication; the reduce phase still has to deduplicate in memory with a hash table.

(2) The second implementation numbers all the distinct columns and generates n rows of map output per input row, one per distinct column. The values of the same column are then sorted together, and the reduce phase only needs to track a LastKey to deduplicate.

This implementation makes good use of MapReduce's sorting and saves the memory the reduce phase would otherwise spend on deduplication, at the cost of a larger shuffle volume.

It is important to note that when generating the reduce value, only the row for the first distinct column needs to retain its value; the value of the rows for the remaining distinct columns can be left empty.
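
Here is a minimal sketch of the map-side row expansion in implementation (2), assuming hypothetical columns dealid, uid, date at positions 0, 1, 2. Each input row becomes two map output rows, one per distinct column, and the column number embedded in the key keeps the uid run and the date run sorted separately within each dealid:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class MultiDistinctMapper extends Mapper<LongWritable, Text, Text, Text> {
      @Override
      protected void map(LongWritable offset, Text row, Context ctx)
          throws IOException, InterruptedException {
        String[] cols = row.toString().split("\t");
        String dealid = cols[0], uid = cols[1], date = cols[2];
        // The row for the first distinct column (uid) retains its value.
        ctx.write(new Text(dealid + "\t1\t" + uid), row);
        // Rows for the remaining distinct columns carry an empty value.
        ctx.write(new Text(dealid + "\t2\t" + date), new Text());
      }
    }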

2. The process of converting SQL to MapReduce

Now that we have seen how MapReduce implements basic SQL operations, let's look at how Hive transforms SQL into MapReduce tasks. The entire compilation process is divided into six phases:

    1. ANTLR defines the SQL grammar rules; lexical and syntactic analysis transform the SQL into an abstract syntax tree (AST Tree)
    2. Traverse the AST Tree and abstract out the basic unit of a query, the QueryBlock
    3. Traverse the QueryBlocks and translate them into an operator tree (OperatorTree)
    4. The logical-layer optimizer rewrites the OperatorTree, merging unnecessary ReduceSinkOperators to reduce the amount of shuffled data
    5. Traverse the OperatorTree and translate it into MapReduce tasks
    6. The physical-layer optimizer rewrites the MapReduce tasks to generate the final execution plan

Each of these six phases is described below.

2.1 Phase 1: SQL lexical and syntax parsing

2.1.1 ANTLR

Hive uses ANTLR to implement lexical and syntactic parsing of SQL. ANTLR is a language-recognition tool that can be used to construct domain-specific languages.
ANTLR is not covered in detail here; it is enough to know that building a language with ANTLR only requires writing a grammar file that defines the lexical and syntax rewrite rules, while ANTLR handles lexical analysis, parsing, semantic analysis, and intermediate-code generation.

Before version 0.10, the grammar rules of Hive SQL were defined in a single file, Hive.g. As the grammar grew more complex, the Java parser class generated from it could exceed the maximum size of a Java class file, so version 0.11 split Hive.g into five files: the lexical rules HiveLexer.g and four grammar-rule files, SelectClauseParser.g, FromClauseParser.g, IdentifiersParser.g, and HiveParser.g.

2.1.2 The abstract syntax tree (AST Tree)

After lexical and syntactic parsing, if further processing of an expression is needed, ANTLR's abstract-syntax-tree grammar is used to describe the tree structure, so that the input statement is converted into an abstract syntax tree during parsing; later passes then traverse the tree to complete the processing.

The following is the grammar rule for SelectStatement in Hive SQL; it shows that a SelectStatement consists of select, from, where, group-by, having, and order-by clauses.
(In the rule below, the arrow indicates a rewrite of the original statement; the rewrite inserts special tokens that mark particular syntax, for example TOK_QUERY marks a query block.)

    selectStatement
       :
       selectClause
       fromClause
       whereClause?
       groupByClause?
       havingClause?
       orderByClause?
       clusterByClause?
       distributeByClause?
       sortByClause?
       limitClause?
       -> ^(TOK_QUERY fromClause
            ^(TOK_INSERT ^(TOK_DESTINATION ^(TOK_DIR TOK_TMP_FILE))
              selectClause whereClause? groupByClause? havingClause? orderByClause?
              clusterByClause? distributeByClause? sortByClause? limitClause?))
       ;
2.1.3 Sample SQL

To describe the SQL-to-MapReduce translation in detail, here is a simple SQL example. It contains a subquery, and the final result is inserted into a table.

    FROM
    (
      SELECT
        p.datekey datekey,
        p.userid userid,
        c.clienttype
      FROM
        detail.usersequence_client c
        JOIN fact.orderpayment p ON p.orderid = c.orderid
        JOIN default.user du ON du.userid = p.userid
      WHERE p.datekey = 20131118
    ) base
    INSERT OVERWRITE TABLE `test`.`customer_kpi`
    SELECT
      base.datekey,
      base.clienttype,
      count(distinct base.userid) buyer_count
    GROUP BY base.datekey, base.clienttype
2.1.4 Generating the AST Tree from the SQL

The ANTLR-based parsing code for Hive SQL is shown below. HiveLexerX and HiveParser are the lexer and parser classes that ANTLR generates automatically from the grammar file Hive.g; they perform the complex parsing work.

    HiveLexerX lexer = new HiveLexerX(new ANTLRNoCaseStringStream(command));   // lexical analysis, ignoring keyword case
    TokenRewriteStream tokens = new TokenRewriteStream(lexer);
    if (ctx != null) {
      ctx.setTokenRewriteStream(tokens);
    }
    HiveParser parser = new HiveParser(tokens);                                // syntax analysis
    parser.setTreeAdaptor(adaptor);
    HiveParser.statement_return r = null;
    try {
      r = parser.statement();                                                  // convert to AST Tree
    } catch (RecognitionException e) {
      e.printStackTrace();
      throw new ParseException(parser.errors);
    }

The resulting AST Tree is shown in the figure (drawn with ANTLRWorks, the grammar-file editor provided by ANTLR); only a few skeleton nodes are expanded, not the full tree.
Subqueries 1 and 2 correspond to parts 1 and 2 of the figure, respectively.

Note here that the inner subquery also generates a TOK_DESTINATION node. Looking back at the SelectStatement grammar rule above, this node was deliberately added by the syntax rewrite. The reason is that in Hive all query data, whether an intermediate subquery result or the final query result, is stored in temporary HDFS files, and the INSERT statement ultimately writes the data into the HDFS directory where the target table lives.

In detail, after the FROM clause of the innermost subquery is expanded, the AST Tree below is obtained: each table generates a TOK_TABREF node, and the join condition generates an "=" node. The other parts of the SQL are similar and are not covered in detail.

2.2 Phase 2: QueryBlock, the basic unit of SQL

The AST Tree is still very complex and unstructured, and it is not convenient to translate it directly into a MapReduce program, so the AST Tree is first transformed into QueryBlocks, an abstraction of SQL.

2.2.1 QueryBlock

A QueryBlock is the most basic unit of SQL and consists of three parts: an input source, a computation, and an output. Put simply, a QueryBlock is a subquery.

The figure shows the class diagram of the QueryBlock-related objects in Hive; several important attributes in the diagram are explained below:

    • QB#aliasToSubq (i.e. the aliasToSubq attribute of the QB class) saves the QB objects of subqueries; its keys are the subquery aliases
    • QB#qbp, i.e. QBParseInfo, saves the AST Tree structure of one basic SQL unit. Its HashMap QBParseInfo#nameToDest saves the outputs of the query unit: the keys take the form inclause-i (Hive supports multi-insert statements, so there may be several outputs), and the value is the corresponding ASTNode, i.e. the TOK_DESTINATION node. The remaining HashMap attributes of QBParseInfo save the correspondence between each output and the ASTNodes of the individual operations
    • QBParseInfo#joinExpr saves the TOK_JOIN node. QB#QBJoinTree is the structured form of the join syntax tree
    • QB#qbm saves the metadata of every input table, such as the table's path on HDFS and the file format of the table data
    • QBExpr exists to represent union operations

2.2.2 Generating QueryBlocks from the AST Tree

Generating QueryBlocks from the AST Tree is a recursive, pre-order traversal of the tree; the different token nodes encountered along the way are saved into the corresponding attributes. The main steps are listed below (a simplified sketch follows the list):

    • TOK_QUERY => create a QB object, recurse into the child nodes
    • TOK_FROM => save the table-name syntax in the aliasToTabs attribute of the QB object
    • TOK_INSERT => recurse into the child nodes
    • TOK_DESTINATION => save the output-target syntax in the nameToDest attribute of the QBParseInfo object
    • TOK_SELECT => save the query-expression syntax in the destToSelExpr, destToAggregationExprs, and destToDistinctFuncExprs attributes, respectively
    • TOK_WHERE => save the where-clause syntax in the destToWhereExpr attribute of the QBParseInfo object
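
Here is a minimal, self-contained sketch of that token dispatch. The ASTNode, QB, and QBParseInfo classes below are simplified stand-ins loosely modeled on Hive's, not its actual code:

    import java.util.List;

    // Hypothetical, simplified stand-ins just to show the pre-order dispatch.
    enum TokenType { TOK_QUERY, TOK_FROM, TOK_INSERT, TOK_DESTINATION, TOK_SELECT, TOK_WHERE, OTHER }

    class ASTNode {
      TokenType type;
      List<ASTNode> children;
    }

    class QBWalker {
      void walk(ASTNode ast, QB qb) {
        switch (ast.type) {
          case TOK_FROM:        qb.saveTableAlias(ast); break;        // -> aliasToTabs
          case TOK_DESTINATION: qb.parseInfo().saveDest(ast); break;  // -> nameToDest
          case TOK_SELECT:      qb.parseInfo().saveSelect(ast); break;// -> destToSelExpr etc.
          case TOK_WHERE:       qb.parseInfo().saveWhere(ast); break; // -> destToWhereExpr
          default: break;  // TOK_QUERY / TOK_INSERT just recurse
        }
        for (ASTNode child : ast.children) walk(child, qb);  // pre-order recursion
      }
    }

    class QB {
      private final QBParseInfo qbp = new QBParseInfo();
      QBParseInfo parseInfo() { return qbp; }
      void saveTableAlias(ASTNode n) { /* record alias -> table name */ }
    }

    class QBParseInfo {
      void saveDest(ASTNode n) { /* record output target */ }
      void saveSelect(ASTNode n) { /* record select expressions */ }
      void saveWhere(ASTNode n) { /* record where expression */ }
    }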

In the end, the sample SQL generates two QB objects, related as shown below, where QB1 is the outer query and QB2 is the subquery:

    QB1
     └── QB2
2.3 Phase 3: The logical operator (Operator)

2.3.1 Operator

In the MapReduce tasks that Hive ultimately generates, the map phase and the reduce phase are each composed of an OperatorTree. A logical operator is a single, specific operation performed within the map or reduce phase.

The basic operators include TableScanOperator, SelectOperator, FilterOperator, JoinOperator, GroupByOperator, and ReduceSinkOperator.

Each operator's function can be guessed from its name. TableScanOperator reads the data of the original input table from the map interface of the MapReduce framework, controls the number of rows scanned, and marks the data as coming from the base table. JoinOperator performs the join operation. FilterOperator performs the filtering operation.

ReduceSinkOperator serializes the combination of map-side fields into the reduce key/value and partition key. It can only appear in the map phase, and in the MapReduce programs Hive generates it marks the end of the map phase.

Within the map and reduce phases, operators pass data in a streaming fashion: after finishing one row, each operator passes it on to its child operator for further processing (see the sketch below).
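
Here is a simplified, hypothetical sketch of that streaming pass-through; Hive's real Operator base class has far more members, but the process-then-forward shape is the same:

    import java.util.ArrayList;
    import java.util.List;

    abstract class Operator {
      protected final List<Operator> childOperators = new ArrayList<>();

      // Called once per row by the parent operator (or by the framework
      // for the root TableScanOperator).
      public abstract void process(Object[] row);

      // After processing a row, pass it on to every child operator.
      protected void forward(Object[] row) {
        for (Operator child : childOperators) {
          child.process(row);
        }
      }
    }

    // Example: a filter operator forwards only rows whose predicate holds.
    class SimpleFilterOperator extends Operator {
      @Override
      public void process(Object[] row) {
        if (evaluatePredicate(row)) {
          forward(row);
        }
      }
      private boolean evaluatePredicate(Object[] row) {
        return row.length > 0;  // stand-in predicate for illustration
      }
    }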

The main attributes and methods of the Operator class are as follows:

    • RowSchema describes the operator's output fields
    • InputObjInspector and outputObjInspector parse the input and output fields
    • processOp receives the data passed from the parent operator; forward passes the processed data on to the child operator
    • After each row passes through an operator, its fields are renumbered; colExprMap records the mapping of each expression's name before and after the current operator, so that field names can be traced back during the logical-optimization phase that follows
    • Because the MapReduce program Hive produces is dynamic, i.e. it is not known in advance what operations a MapReduce job will perform (a join or a group-by), each operator saves all the parameters it needs at run time in OperatorDesc. OperatorDesc is serialized to HDFS before the task is submitted, and it is read back and deserialized from HDFS before the MapReduce task executes. The map-phase OperatorTree is located on HDFS at job.getConf("hive.exec.plan") + "/map.xml"

2.3.2 Generating the OperatorTree from QueryBlocks

Generating the OperatorTree is done by traversing the syntax saved in the QB and QBParseInfo objects produced in the previous phase; it essentially includes the following steps:

    • QB#aliasToSubq => contains a subquery, recurse
    • QB#aliasToTabs => TableScanOperator
    • QBParseInfo#joinExpr => QBJoinTree => ReduceSinkOperator + JoinOperator
    • QBParseInfo#destToWhereExpr => FilterOperator
    • QBParseInfo#destToGroupby => ReduceSinkOperator + GroupByOperator
    • QBParseInfo#destToOrderby => ReduceSinkOperator + ExtractOperator

Since join/group-by/order-by all need to be executed in the reduce phase, a ReduceSinkOperator is generated immediately before the operator for each of these operations, combining the relevant fields and serializing them into the reduce key/value and partition key.

Next, let's analyze in detail how the sample SQL generates its OperatorTree.

The QB objects generated in the previous phase are traversed in pre-order.

    1. First, from the subquery QueryBlock's QB2#aliasToTabs {du=dim.user, c=detail.usersequence_client, p=fact.orderpayment}, generate the TableScanOperators

      TableScanOperator("dim.user") TS[0]
      TableScanOperator("detail.usersequence_client") TS[1]
      TableScanOperator("fact.orderpayment") TS[2]
    2. Then a pre-order traversal of QBParseInfo#joinExpr generates the QBJoinTree. The class QBJoinTree is also a tree structure that saves the ASTNodes of the left and right tables and the alias of the current query; the resulting query tree is as follows

              base
             /    \
            p      du
           / \
          c   p
    3. A pre-order traversal of the QBJoinTree first generates the join operator tree for detail.usersequence_client and fact.orderpayment

In the figure: TS = TableScanOperator, RS = ReduceSinkOperator, JOIN = JoinOperator.

    4. Then the join operator tree for the intermediate table and dim.user is generated

    5. The FilterOperator is generated from QB2's QBParseInfo#destToWhereExpr. At this point the traversal of QB2 is complete.

In some scenarios, the SelectOperator decides, based on certain conditions, whether a field needs to be parsed.

In the figure: FIL = FilterOperator, SEL = SelectOperator.

    6. ReduceSinkOperator + GroupByOperator are generated from QB1's QBParseInfo#destToGroupby

In the figure: GBY = GroupByOperator.
GBY[12] performs hash aggregation, i.e. it aggregates in memory using a hash table.

    7. Finally, after parsing completes, a FileSinkOperator is generated to write the data to HDFS

In the figure: FS = FileSinkOperator.

2.4 Phase 4: The logical-layer optimizers

Most logical-layer optimizers transform the OperatorTree by merging operators, with the goal of reducing the number of MapReduce jobs and the amount of shuffled data.

    Name                        Role
    ②SimpleFetchOptimizer       Optimizes aggregate queries without a group-by expression
    ②MapJoinProcessor           MapJoin; requires a hint in the SQL; no longer used as of version 0.11
    ②BucketMapJoinOptimizer     BucketMapJoin
    ②GroupByOptimizer           Map-side aggregation
    ①ReduceSinkDeDuplication    Merges linear OperatorTrees that partition/sort on the same keys into a single reduce
    ①PredicatePushDown          Predicate pushdown
    ①CorrelationOptimizer       Exploits correlations in the query to merge correlated jobs (HIVE-2206)
    ColumnPruner                Column pruning
The optimizers marked ① in the table try to make a single job do as much work as possible through merging; those marked ② try to reduce the amount of shuffled data, or even skip the reduce phase altogether.

The CorrelationOptimizer is rather complex; it exploits correlations within a query to merge correlated jobs. See the Hive Correlation Optimizer design doc for details.

For the sample SQL, two optimizers apply to it. Their roles are described below, together with a third optimizer, ReduceSinkDeDuplication, for completeness.

2.4.1 The PredicatePushDown optimizer

The predicate-pushdown optimizer moves FilterOperators in the OperatorTree forward so that filtering happens right after the TableScanOperator.

2.4.2 The NonBlockingOpDeDupProc optimizer

The NonBlockingOpDeDupProc optimizer merges SEL-SEL or FIL-FIL pairs into a single operator.

2.4.3 The ReduceSinkDeDuplication optimizer

ReduceSinkDeDuplication can merge two linearly connected RS operators. CorrelationOptimizer is actually a superset of ReduceSinkDeDuplication, able to merge RS operators in both linear and nonlinear operator trees, but ReduceSinkDeDuplication was implemented in Hive first.

For example, take the following SQL statement:

from (select key, value from src group by key, value) s select s.key group by s.key;

After the earlier phases, the following OperatorTree is generated; the two trees are actually connected, just not drawn together here.

Traversing the OperatorTree at this point, we find that the keys and partition keys output by the two RS operators are as follows:

                 Key          PartitionKey
    childRS      key          key
    parentRS     key,value    key,value

The ReduceSinkDeDuplication optimizer detects that: 1. pRS's key fully contains cRS's key, with the same sort order; 2. pRS's partition key fully contains cRS's partition key. The optimization conditions are satisfied, so the execution plan is optimized.

ReduceSinkDeDuplication deletes childRS and the operators between parentRS and childRS. The key of the retained RS becomes the key,value columns and its partition key becomes the key column. The merged OperatorTree is as follows:

2.5 Phase 5: Generating MapReduce jobs from the OperatorTree

The process of transforming the OperatorTree into MapReduce jobs is divided into the following stages:

    1. Generate a MoveTask for the output table
    2. Do a depth-first traversal downward from one of the root nodes of the OperatorTree
    3. ReduceSinkOperator marks the map/reduce boundary and the boundary between multiple jobs
    4. Traverse the other root nodes; when a JoinOperator is encountered, merge the MapReduceTasks
    5. Generate a StatsTask to update the metadata
    6. Cut the operator relationships between map and reduce
2.5.1 Generate a MoveTask for the output table

The preceding phases generated only a FileSinkOperator for the output, so a MoveTask is generated directly to move the resulting temporary HDFS files into the target table's directory.

    MoveTask[Stage-0]
      Move Operator
2.5.2 Start the traversal

All the root nodes of the OperatorTree are saved in an array toWalk, and its elements are taken out in a loop (QB1 is omitted and not drawn).

The last element TS[p] is taken out and pushed onto the stack: opStack = {TS[p]}.

2.5.3 Rule #1: TS% generates a MapReduceTask object and determines the MapWork

The elements in the stack now match the following rule R1 (expressed here as simplified Python):

"". Join ([t + "%" for T in opstack]) = = "ts%"

A MapReduceTask[Stage-1] object is generated, and its MapWork attribute saves a reference to the root operator. Because operators in the OperatorTree keep parent-child references, MapReduceTask[Stage-1] at this point contains all operators rooted at TS[p].

2.5.4 Rule #2: TS%.*RS% determines the ReduceWork

The traversal continues with TS[p]'s child operators, pushing each onto opStack.
When the first RS is pushed, i.e. when opStack = {TS[p], FIL[18], RS[4]}, the following rule R2 is matched:

"". Join ([t + "%" for T in opstack]) = = "ts%.*rs%"

At this point the ReduceWork attribute of the MapReduceTask[Stage-1] object saves a reference to JOIN[5].

2.5.5 Rule #3: RS%.*RS% generates a new MapReduceTask object and splits the MapReduceTasks

The traversal continues with JOIN[5]'s child operators, pushing each onto opStack.

When the second RS is pushed, i.e. when opStack = {TS[p], FIL[18], RS[4], JOIN[5], RS[6]}, the following rule R3 is matched:

"". Join ([t + "%" for T in opstack]) = = "rs%.*rs%"//iterate through each suffix array of opstack

A new MapReduceTask[Stage-2] object is created, and the OperatorTree is cut between JOIN[5] and RS[6]: a FileSinkOperator FS[19] is generated for JOIN[5] and a TableScanOperator TS[20] for RS[6]. The MapWork attribute of the MapReduceTask[Stage-2] object saves a reference to TS[20].

The newly generated FS[19] writes the intermediate data to a temporary HDFS file.

The traversal continues with RS[6]'s child operators, pushing each onto opStack.

When opStack = {TS[p], FIL[18], RS[4], JOIN[5], RS[6], JOIN[8], SEL[10], GBY[12], RS[13]}, rule R3 is matched again.

In the same way, a MapReduceTask[Stage-3] object is generated, and the OperatorTree is cut between Stage-2 and Stage-3.

2.5.6 Rule #4: FS% connects the MapReduceTask and the MoveTask

When all child operators have finally been pushed, opStack = {TS[p], FIL[18], RS[4], JOIN[5], RS[6], JOIN[8], SEL[10], GBY[12], RS[13], GBY[14], SEL[15], FS[17]} matches rule R4:

"". Join ([t + "%" for T in opstack]) = = "fs%"

At this point the MoveTask is connected to MapReduceTask[Stage-3], and a StatsTask that updates the table's metadata is generated.
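
The four rules can be pictured together with the following sketch of the stack-driven, depth-first walk. The types are hypothetical; Hive's real GraphWalker/Dispatcher machinery is more elaborate, and this only shows the stack-plus-pattern idea:

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.List;
    import java.util.regex.Pattern;

    class OperatorWalker {
      static class Op {
        final String name;              // "TS", "FIL", "RS", "JOIN", "FS", ...
        final List<Op> children;
        Op(String name, List<Op> children) { this.name = name; this.children = children; }
      }

      private final Deque<String> opStack = new ArrayDeque<>();
      private static final Pattern R2 = Pattern.compile("TS%.*RS%");
      private static final Pattern R3 = Pattern.compile("RS%.*RS%");

      void walk(Op op) {
        opStack.addLast(op.name);                      // push on the way down
        String joined = join();                        // e.g. "TS%FIL%RS%"
        if (joined.equals("TS%")) {
          // R1: create a MapReduceTask, its MapWork points at this TS
        }
        if (R2.matcher(joined).matches()) {
          // R2: the operator under the first RS becomes the ReduceWork
        }
        if (R3.matcher(joined).find()) {               // some suffix matches RS%.*RS%
          // R3: cut the tree here and start a new MapReduceTask
        }
        if (joined.endsWith("FS%")) {
          // R4: connect the MoveTask and generate the StatsTask
        }
        for (Op child : op.children) walk(child);      // depth-first
        opStack.removeLast();                          // pop on the way back up
      }

      private String join() {
        StringBuilder sb = new StringBuilder();
        for (String s : opStack) sb.append(s).append('%');
        return sb.toString();
      }
    }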

2.5.7 Merging stages

The traversal is not finished at this point: two root nodes have not been visited yet.

opStack is cleared and the second element of toWalk is pushed: opStack = {TS[du]}. This again matches rule R1 (TS%), generating MapReduceTask[Stage-5].

Traversal continues from TS[du]; when opStack = {TS[du], RS[7]}, rule R2 (TS%.*RS%) is matched.

When JOIN[8] is about to be saved as the ReduceWork of MapReduceTask[Stage-5], it is found in a Map<Operator, MapReduceWork> object, which records which MapReduceWork each operator already belongs to, that JOIN[8] already exists. MapReduceTask[Stage-2] and MapReduceTask[Stage-5] are therefore merged into a single MapReduceTask.

Similarly, starting from the last root node TS[c], the MapReduceTasks are merged as well.

2.5.8 Splitting the map and reduce phases

In the final stage, the OperatorTrees inside MapWork and ReduceWork are cut apart, with RS as the boundary.

2.5.9 The full picture of generating MapReduceTasks from the OperatorTree

In the end, three MapReduceTasks are generated in total, as shown in the figure.

2.6 Phase 6: The physical-layer optimizers

The principle of each physical-layer optimizer is not described in detail here; only the MapJoin optimizer is introduced separately.

    Name                                   Role
    Vectorizer                             HIVE-4160, to be released in 0.13
    SortMergeJoinResolver                  Works with bucketed tables, similar to a merge sort
    SamplingOptimizer                      Parallel order-by optimizer, released in 0.12
    CommonJoinResolver + MapJoinResolver   The MapJoin optimizers

2.6.1 How MapJoin works

Put simply, MapJoin reads the small table into memory in the map phase and scans the large table sequentially to complete the join.

The figure is a schematic of Hive MapJoin, taken from a slide deck on join optimization by Facebook engineer Liyin Tang. It shows that MapJoin is divided into two phases:

    1. A MapReduce local task reads the small table into memory, generates HashTableFiles, and uploads them to the distributed cache; the HashTableFiles are compressed.

    2. In the map phase of the MapReduce job, each mapper reads the HashTableFiles from the distributed cache into memory, scans the large table sequentially, completes the join directly in the map phase, and passes the data on to the next MapReduce task (a minimal sketch of this map-phase probe follows).
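
Below is a minimal sketch of the map-phase probe in step 2, assuming the small table has already been materialized as an in-memory hash map; in Hive this map is rebuilt from the HashTableFiles in the distributed cache, and the loading step here is left as a hypothetical stub:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class MapJoinMapper extends Mapper<LongWritable, Text, Text, Text> {
      private final Map<String, String> smallTable = new HashMap<>();

      @Override
      protected void setup(Context ctx) {
        // Hypothetical loading step: read the cached HashTableFiles here
        // (e.g. via ctx.getCacheFiles()) and fill smallTable with key -> row.
      }

      @Override
      protected void map(LongWritable offset, Text row, Context ctx)
          throws IOException, InterruptedException {
        String[] cols = row.toString().split("\t");      // large-table row
        String matched = smallTable.get(cols[0]);        // probe on the join key
        if (matched != null) {                           // inner-join semantics
          ctx.write(new Text(cols[0]), new Text(cols[1] + "\t" + matched));
        }
      }
    }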

If one of the two joined tables is a temporary table, a ConditionalTask is generated to decide at run time whether to use MapJoin.

2.6.2 The CommonJoinResolver optimizer

The CommonJoinResolver optimizer converts a CommonJoin into a MapJoin. The conversion process is as follows:

    1. Do a depth-first traversal of the task tree
    2. Find the JoinOperator and estimate the sizes of the left and right tables
    3. For small table + large table => generate a MapJoinTask; for small/large table + intermediate table => generate a ConditionalTask

Traversing the MapReduce tasks generated in the previous phase, the optimizer finds that one of the tables joined by JOIN[8] in MapReduceTask[Stage-2] is a temporary table. It deep-copies Stage-2 (the original execution plan must be kept as a backup plan, hence the copy), generates a MapJoinOperator to replace the JoinOperator, and generates a MapReduceLocalWork that reads the small table, produces the HashTableFiles, and uploads them to the DistributedCache.

The transformed MapReduceTask execution plan is as shown in the figure.

2.6.3 The MapJoinResolver optimizer

The MapJoinResolver optimizer traverses the task tree and splits every MapReduceTask that has local work into two tasks.

After the final MapJoinResolver pass, the execution plan is as shown in the figure.

3. The design of the Hive SQL compilation process

Looking back over the whole SQL compilation process, several aspects of the design are worth learning from and borrowing:

    • Using the open-source ANTLR to define the grammar rules greatly simplifies lexical and syntactic parsing; only a grammar file needs to be maintained.
    • The overall flow is clear, and the phased design keeps the compilation code easy to maintain and makes subsequent optimizations easy to plug in and out; for example, Hive 0.13's newest features, vectorization and Tez-engine support, are both pluggable.
    • Each operator performs only a single function, which simplifies the generated MapReduce program.
4. The direction of community development

Hive is still developing rapidly. To improve Hive's performance, the Stinger initiative led by Hortonworks has proposed a series of improvements to Hive, the more important of which are:

    • Vectorization - lets Hive move from processing single rows to processing batches, greatly improving instruction pipelining and cache utilization
    • Hive on Tez - replaces MapReduce, the computation framework underneath Hive, with the Tez framework. Tez not only supports tasks with multiple reduce phases (MRR) but also submits the whole execution plan at once, so it can allocate resources better
    • Cost-Based Optimizer - lets Hive automatically choose the optimal join order, improving query speed
    • Implement insert, update, and delete in Hive with full ACID support - supports incremental table updates by primary key

We will keep following the community's progress and, combined with our own business needs, continue to improve the performance of our Hive-based ETL pipelines.

5. References

ANTLR: http://www.antlr.org/
Wikipedia on ANTLR: http://en.wikipedia.org/wiki/ANTLR
Hive wiki: https://cwiki.apache.org/confluence/display/hive/home
Hive SQL compilation process: http://www.slideshare.net/recruitcojp/internal-hive
Join optimization in Hive: Join Strategies in Hive, from the Hadoop Summit (Liyin Tang, Namit Jain)
Hive design docs: https://cwiki.apache.org/confluence/display/hive/designdocs

Original address: http://tech.meituan.com/hive-sql-to-mapreduce.html
