Hive SQL Execution Plan

Source: Internet
Author: User

Hive provides an explain command that shows the execution plan for a query. The syntax for this statement is as follows:

EXPLAIN [EXTENDED] query

Hive> explain select a. Bar, count (*) from invites a where a. Foo> 0 group by A. bar;

OK
Abstract syntax tree:
(Tok_query (tok_from (tok_tabref (tok_tabname invites) A) (tok_insert (tok_destination (tok_dir tok_tmp_file) (tok_select (tok_selexpr (. (tok_table_or_col A) bar) (tok_selexpr (tok_functionstar count) (tok_where (> (. (tok_table_or_col A) Foo) 0) (tok_groupby
(. (Tok_table_or_col A) bar ))))

Stage dependencies:
Stage-1 is a root stage
Stage-0 is a root stage

Stage plans:
Stage: stage-1
Map reduce
Alias-> map operator tree:
A
Tablescan
Alias:
Filter operator
Predicate:
Expr: (FOO> 0)
Type: Boolean
Filter operator
Predicate:
Expr: (FOO> 0)
Type: Boolean
Select Operator
Expressions:
Expr: bar
Type: String
Outputcolumnnames: bar
Group by operator
Aggregations:
Expr: Count ()
Bucketgroup: false
Keys:
Expr: bar
Type: String
Mode: Hash
Outputcolumnnames: _ col0, _ col1
Reduce output Operator
Key Expressions:
Expr: _ col0
Type: String
Sort order: +
Map-Reduce partition columns:
Expr: _ col0
Type: String
Tag:-1
Value expressions:
Expr: _ col1
Type: bigint
Reduce operator tree:
Group by operator
Aggregations:
Expr: Count (value. _ col0)
Bucketgroup: false
Keys:
Expr: Key. _ col0
Type: String
Mode: mergepartial
Outputcolumnnames: _ col0, _ col1
Select Operator
Expressions:
Expr: _ col0
Type: String
Expr: _ col1
Type: bigint
Outputcolumnnames: _ col0, _ col1
File output Operator
Compressed: false
Globaltableid: 0
Table:
Input Format: org. Apache. hadoop. mapred. textinputformat
Output Format: org. Apache. hadoop. hive. QL. Io. hiveignorekeytextoutputformat

Stage: stage-0
Fetch Operator
Limit:-1

Time taken: 0.133 seconds

Hive> explain insert overwrite table LPX select t1.bar, t1.foo, t2.foo from pokes T1 join invites T2 on (t1.bar = t2.bar );
OK
Abstract syntax tree:
(Tok_query (tok_from (tok_join (tok_tabref (tok_tabname pokes) T1) (tok_tabref (tok_tabname invites) T2) (= (. (tok_table_or_col T1) bar )(. (tok_table_or_col T2) bar) (tok_insert (tok_destination (tok_tab (tok_tabname LPX) (tok_select (tok_selexpr (.
(Tok_table_or_col T1) bar) (tok_selexpr (. (tok_table_or_col T1) Foo) (tok_selexpr (. (tok_table_or_col T2) Foo )))))

Stage dependencies:
Stage-1 is a root stage
Stage-0 depends on stages: stage-1
Stage-2 depends on stages: stage-0

Stage plans:
Stage: stage-1
Map reduce
Alias-> map operator tree:
T1
Tablescan
Alias: T1
Reduce output Operator
Key Expressions:
Expr: bar
Type: String
Sort order: +
Map-Reduce partition columns:
Expr: bar
Type: String
Tag: 0
Value expressions:
Expr: foo
Type: int
Expr: bar
Type: String
T2
Tablescan
Alias: t2
Reduce output Operator
Key Expressions:
Expr: bar
Type: String
Sort order: +
Map-Reduce partition columns:
Expr: bar
Type: String
Tag: 1
Value expressions:
Expr: foo
Type: int
Reduce operator tree:
Join Operator
Condition map:
Inner join 0 to 1
Condition expressions:
0 {value. _ col0} {value. _ col1}
1 {value. _ col0}
Handleskewjoin: false
Outputcolumnnames: _ col0, _ col1, _ col5
Select Operator
Expressions:
Expr: _ col1
Type: String
Expr: _ col0
Type: int
Expr: _ col5
Type: int
Outputcolumnnames: _ col0, _ col1, _ col2
File output Operator
Compressed: false
Globaltableid: 1
Table:
Input Format: org. Apache. hadoop. mapred. textinputformat
Output Format: org. Apache. hadoop. hive. QL. Io. hiveignorekeytextoutputformat
Serde: org. Apache. hadoop. hive. serde2.lazy. lazysimpleserde
Name: Default. LPX

Stage: stage-0
Move Operator
Tables:
Replace: True
Table:
Input Format: org. Apache. hadoop. mapred. textinputformat
Output Format: org. Apache. hadoop. hive. QL. Io. hiveignorekeytextoutputformat
Serde: org. Apache. hadoop. hive. serde2.lazy. lazysimpleserde
Name: Default. LPX

Stage: stage-2
Stats-aggr Operator

Note:
Abstract syntax tree is an abstract syntax tree.

Slave information header:
Stage dependencies:
Stage-1 is a root stage
Stage-0 depends on stages: stage-1
Stage-2 depends on stages: stage-0
It can be seen from this that the job structure of the plan is divided into three jobs for execution,
The first job is composed of stage-1;
The second job processing is composed of stage-0. The processing of stage-0 must depend on the results of stage-1;
The third job processing is composed of stage-2. The processing of stage-2 must depend on the results of stage-0.

The following describes stage-1 and stage-0 respectively. The SQL Execution can be divided into two steps:
(1) Select t1.bar, t1.foo, t2.foo from pokes T1 join invites T2 on (t1.bar = t2.bar );
(2) Insert overwrite table LPX;
Stage: stage-1 corresponds to a complete map reduce task, including map operator tree and reduce operator tree. Map operator tree corresponds to map task and reduce operator tree corresponds to reduce task.
From the map operator tree stage, we can see two parallel operations T1 and T2, select t1.bar, t1.foo from T1; and select t2.foo from T2; the two map tasks generate the input [reduce output operator] In the reduce stage respectively.
From the analysis of reduce operator tree, we can see the following information, connect map outputs with conditions, and generate data in the storage format that complies with default. LPX through predefined output format to be stored in HDFS. Create an LPX table
The storage format of the table is not specified. The storage format is text by default, and the input and output are read and written using textinputformat and textoutputformat:
Table:
Input Format: org. Apache. hadoop. mapred. textinputformat
Output Format: org. Apache. hadoop. hive. QL. Io. hiveignorekeytextoutputformat
Serde: org. Apache. hadoop. hive. serde2.lazy. lazysimpleserde
Name: Default. LPX
The input format value corresponds to org. Apache. hadoop. mapred. textinputformat,
This is because the temporary output files generated in the initial map stage are saved in the textoutputformat format. The data read by natural reduce is processed and read by the textinputformat format. These are
And hive only needs to specify the processing format.
The serde value is Org. apache. hadoop. hive. serde2.lazy. lazysimpleserde class. The saved value of this object is _ col0, _ col1, _ col2, which is the t1.bar, t1.foo, t2.foo, the specific value should be the column delimiter set by _ col0 + Table LPX + _ col1 + Table LPX + _ col2. Outputformat: org. Apache. hadoop. hive. QL. Io. hiveignorekeytextoutputformat
This class is used for processing.
Stage-0 corresponds to the second operation mentioned above. In this case, the temporary processing file generated by stage-1, such as TMP, needs to be processed to the LPX table through stage-0 operations. Move operator indicates that this is not
For mapreduce tasks, you only need to call movetask for processing. Before processing, the system checks whether the input file conforms to the storage format of the LPX table.

Ref: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.