Impala source code analysis (2)-SQL parsing and execution plan generation

Impala's SQL parsing and execution plan generation are implemented in the impala-frontend (Java), which listens on port 21000. The user submits a query through the Beeswax interface BeeswaxService.query(); on the impalad side, the processing logic is handled by the function void ImpalaServer::query(QueryHandle& query_handle, const Query& query) (implemented in impala-beeswax-server.cc).

An SQL statement in Impala goes through BeeswaxService.Query -> TClientRequest -> TExecRequest, and the TExecRequest is finally submitted to the impala-coordinator for execution on multiple backends. This article describes, step by step, how an SQL statement becomes a TExecRequest.

The following uses an SQL statement as an example:

    select jobinfo.dt, user,
           max(taskinfo.finish_time - taskinfo.start_time),
           max(jobinfo.finish_time - jobinfo.submit_time)
    from taskinfo join jobinfo on jobinfo.jobid = taskinfo.jobid
    where jobinfo.job_status = 'SUCCESS' and taskinfo.task_status = 'SUCCESS'
    group by jobinfo.dt, user

The function Status ImpalaServer::GetExecRequest(const TClientRequest& request, TExecRequest* result) is called to convert the TClientRequest into a TExecRequest.

Inside this function, frontend.createExecRequest() is called through the JNI interface to generate the TExecRequest. It first calls AnalysisContext.analyze(String stmt) to analyze the submitted SQL statement.

Note: the Analyzer object is a knowledge base that stores all the information involved in the SQL statement (Tables, conjuncts, slots, slotRefMap, eqJoinConjuncts, and so on); everything related to the statement is registered in the Analyzer object.
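To make the knowledge-base idea concrete, here is a minimal, hypothetical sketch of an Analyzer-like registry. The member names mirror those mentioned above, but the types are heavily simplified stand-ins, not Impala's actual code:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical simplification of Impala's Analyzer: a central registry
    // that every analyze() call writes into while walking the statement.
    class Analyzer {
        // All predicates (conjuncts) collected from on/where/having clauses.
        final List<String> conjuncts = new ArrayList<>();
        // TupleId -> conjuncts that reference that tuple (table).
        final Map<Integer, List<String>> tuplePredicates = new HashMap<>();
        // SlotId -> conjuncts that reference that slot (column).
        final Map<Integer, List<String>> slotPredicates = new HashMap<>();

        void registerConjunct(String conjunct, int tupleId, int slotId) {
            conjuncts.add(conjunct);
            tuplePredicates.computeIfAbsent(tupleId, k -> new ArrayList<>()).add(conjunct);
            slotPredicates.computeIfAbsent(slotId, k -> new ArrayList<>()).add(conjunct);
        }
    }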

1. SQL lexical analysis and syntax analysis

AnalysisContext.analyze(String stmt) calls the SelectStmt.analyze() function, which analyzes the SQL statement and registers everything it finds with the Analyzer, the central knowledge base. (A condensed sketch of the whole flow follows the list below.)

(1) Process the tables involved in the SQL statement (the TableRefs), which are extracted from the from clause (covering the keywords from, join, and on/using). Note that a JOIN operation and its on/using condition are stored and analyzed in the TableRef of the table on the right-hand side of the JOIN. Each TableRef is analyzed in turn and registered with the Analyzer via registerBaseTableRef() (filling in a TupleDescriptor). If a TableRef involves a JOIN, analyzeJoin() is also called. In analyzeJoin(), Analyzer.registerConjunct() fills in several member variables of the Analyzer: conjuncts, tuplePredicates (a mapping from TupleId to conjuncts), slotPredicates (a mapping from SlotId to conjuncts), eqJoinConjuncts, and so on. In our example, the on clause is a BinaryPredicate, and onClause.analyze(analyzer) recursively analyzes each component of the on clause.

(2) Process the select clause (covering the keyword select and aggregate functions such as MAX() and AVG()): determine which items the statement selects; each item is an object of an Expr subclass. These items are filled into the resultExprs array and colLabels. Each Expr in resultExprs is then analyzed recursively, down to the leaves of the expression tree, where SlotRefs are registered with the Analyzer.

(3) Process the where clause (keyword where): first recursively analyze the tree of Exprs that makes up the clause, then fill in several Analyzer member variables via Analyzer.registerConjunct() (the same ones as in step (1); whereClauseConjuncts is also filled in).

(4) Process sorting information (keyword order by): first resolve aliases and ordinals, then extract the Exprs from the order by clause and fill them into orderingExprs, recursively analyze the Expr tree of the clause, and finally create a SortInfo object.

(5) Process aggregation information (keywords such as group by and having, and functions such as avg and max): first recursively analyze the Exprs in the group by clause; the having clause is handled like the where clause: its Expr tree is analyzed first, and the result is then registered via Analyzer.registerConjunct().

(6) Process InlineViews.
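Putting steps (1) through (6) together, the overall shape of SelectStmt.analyze() can be sketched as follows. This is a hypothetical condensation that reuses the Analyzer sketch above; the real method carries far more state and error handling:

    import java.util.List;

    // Hypothetical interfaces standing in for Impala's expression and
    // table-reference classes.
    interface Expr { void analyze(Analyzer analyzer); }
    interface TableRef { void analyze(Analyzer analyzer); }

    class SelectStmt {
        List<TableRef> tableRefs;   // from clause, including join ... on/using
        List<Expr> resultExprs;     // select-list items
        Expr whereClause;           // where predicate tree
        List<Expr> orderingExprs;   // order by expressions
        List<Expr> groupingExprs;   // group by expressions
        Expr havingClause;          // having predicate tree

        void analyze(Analyzer analyzer) {
            // (1) from/join/on: register each table, analyze join conditions.
            for (TableRef ref : tableRefs) ref.analyze(analyzer);
            // (2) select list: recurse down each Expr tree to its SlotRefs.
            for (Expr e : resultExprs) e.analyze(analyzer);
            // (3) where: analyze the predicate tree, register its conjuncts.
            if (whereClause != null) whereClause.analyze(analyzer);
            // (4) order by: analyze ordering exprs, then build a SortInfo.
            if (orderingExprs != null) for (Expr e : orderingExprs) e.analyze(analyzer);
            // (5) group by/having: analyze grouping exprs; having is treated
            //     like where and registered as conjuncts.
            if (groupingExprs != null) for (Expr e : groupingExprs) e.analyze(analyzer);
            if (havingClause != null) havingClause.analyze(analyzer);
            // (6) inline views would be analyzed here as nested statements.
        }
    }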

(Figure: the data structures involved in SQL parsing; diagram not reproduced.)

At this point the analysis of the statement is finished; the process works a bit like a small compiler front end. Now let's return to the frontend.createExecRequest() function. After AnalysisContext.analyze() has been called, the member variables of the TExecRequest are filled in.

(1) For a DDL command (use, show tables, show databases, describe), createDdlExecRequest() is called.

(2) Otherwise it is a Query or DML command, and a TQueryExecRequest must be created and filled in. A rough sketch of this dispatch follows.
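Here is a hypothetical, self-contained sketch of the branch. The Thrift type names and createDdlExecRequest() come from the article, but the fields and the isDdl flag are simplified stand-ins, not Impala's real API:

    // Hypothetical stand-ins for the Thrift structures named above.
    class TDdlExecRequest { }
    class TQueryExecRequest { }
    class TExecRequest {
        TDdlExecRequest ddlExecRequest;     // set for DDL commands
        TQueryExecRequest queryExecRequest; // set for queries and DML
    }

    class Frontend {
        TExecRequest createExecRequest(boolean isDdl) {
            TExecRequest result = new TExecRequest();
            if (isDdl) {
                // (1) use / show tables / show databases / describe:
                // no plan is needed, so only a DDL request is filled in.
                result.ddlExecRequest = new TDdlExecRequest();
            } else {
                // (2) Query or DML: a TQueryExecRequest is created and later
                // filled with the serialized plan fragments (section 2).
                result.queryExecRequest = new TQueryExecRequest();
            }
            return result;
        }
    }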

2. Generate execution plans based on the SQL syntax tree (generation of PlanNode and PlanFragment)

The following describes how the Planner converts the parsed SQL syntax tree into PlanFragments, which can then be executed on each backend.

    Planner planner = new Planner();
    ArrayList<PlanFragment> fragments =
        planner.createPlanFragments(analysisResult, request.queryOptions);

This createPlanFragments() function is the most important function in the frontend: it generates the execution plan from the SQL analysis result and the query options passed in by the client. The execution plan is represented as an array of PlanFragments, serialized into TQueryExecRequest.fragments, and then sent to the backend coordinator for scheduling and distribution.

Next, let's step into the Planner.createPlanFragments() function to see how the execution plan is generated.

First, we need to clarify two concepts: PlanNode and PlanFragment.

A PlanNode is a logical operator node parsed from the SQL statement, while a PlanFragment is a unit of the actual execution plan.
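A minimal sketch of the two abstractions may help. The fields here are simplified, hypothetical stand-ins for the real classes:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical simplification: a PlanNode is one operator in the logical
    // tree; a PlanFragment is a subtree that a single backend can execute,
    // plus a description of where its output rows are sent.
    abstract class PlanNode {
        final List<PlanNode> children = new ArrayList<>();
    }

    class DataStreamSink {
        PlanFragment targetFragment; // really: an ExchangeNode in that fragment
    }

    class PlanFragment {
        PlanNode planRoot;        // root operator of this fragment's subtree
        PlanFragment destination; // fragment that consumes this one's output
        DataStreamSink sink;      // how rows leave this fragment (set in finalize)
    }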

2.1 Create the PlanNode tree

    PlanNode singleNodePlan =
        createQueryPlan(queryStmt, analyzer, queryOptions.getDefault_order_by_limit());

(1) This function first creates a PlanNode for the first TableRef in the from clause, generally a ScanNode (HdfsScanNode or HBaseScanNode). The ScanNode is associated with an array of ValueRanges (made up of the value ranges of the clustering columns) that describes the range of the Table to be read, and with the relevant conjuncts (from the where clause).

(2) Create HashJoinNodes for the remaining TableRefs in the statement. In Planner.createHashJoinNode(): first create a ScanNode for the table (as above), then call getHashLookupJoinConjuncts() to obtain the eqJoinConjuncts and eqJoinPredicates of the JOIN between the two tables, and use them to create the HashJoinNode. HashJoinNodes themselves form a tree with child nodes; in our two-table JOIN example, the children are the ScanNodes of the two tables. (Note: at the time of writing, Impala only supports two-table JOIN operations, assuming by default that the left table is large and the right table is small; the JOIN is implemented by distributing the small right-hand table into the memory of each node that holds a range of the large left-hand table.)

(3) If a group by clause exists, create an AggregationNode and set the HashJoinNode as its child. (DISTINCT aggregation is not considered here.)

(4) If an order by ... limit clause exists, create a SortNode.
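Steps (1) through (4) can be condensed into the following hypothetical sketch, reusing the PlanNode types introduced above; it shows only the shape of the tree that createQueryPlan() builds bottom-up:

    // Hypothetical concrete node types for the sketch above.
    class ScanNode extends PlanNode { }
    class HashJoinNode extends PlanNode { }
    class AggregationNode extends PlanNode { }
    class SortNode extends PlanNode { }

    class SingleNodePlanner {
        // Mirrors steps (1)-(4): scan -> joins -> aggregation -> sort.
        PlanNode createQueryPlan(int numTableRefs, boolean hasGroupBy, boolean hasOrderBy) {
            PlanNode root = new ScanNode();             // (1) first TableRef
            for (int i = 1; i < numTableRefs; i++) {    // (2) remaining tables
                HashJoinNode join = new HashJoinNode();
                join.children.add(root);                // left child: plan so far
                join.children.add(new ScanNode());      // right child: new table
                root = join;
            }
            if (hasGroupBy) {                           // (3) group by present
                AggregationNode agg = new AggregationNode();
                agg.children.add(root);
                root = agg;
            }
            if (hasOrderBy) {                           // (4) order by ... limit
                SortNode sort = new SortNode();
                sort.children.add(root);
                root = sort;
            }
            return root;
        }
    }

For the example query, this yields an AggregationNode on top of a HashJoinNode whose two children are the ScanNodes of taskinfo and jobinfo.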

In this way, after createQueryPlan() has executed, the execution tree of PlanNodes is complete. (Figure: the single-node PlanNode execution tree; diagram not reproduced.)

2.2 Create the PlanFragments

Next, the number of Impala backend nodes is checked. If there is only one node, the entire execution tree runs on that single impalad; otherwise, createPlanFragments(singleNodePlan, isPartitioned, false, fragments) is called to convert the PlanNode execution tree into an execution plan composed of PlanFragments.

Stepping into the createPlanFragments() function:

This is a recursive function: it walks down the PlanNode execution tree and creates the corresponding PlanFragments along the way.

(1) For a ScanNode, create a new PlanFragment (its root is the ScanNode, and it contains only that one PlanNode).

(2) For a HashJoinNode, no new PlanFragment is created. Instead, leftChildFragment (whose root is a ScanNode) is modified so that the HashJoinNode becomes its root. A HashJoinNode generally has two ScanNode children, and by the time it is processed both have already been converted into their PlanFragments. The PlanFragment rooted at the HashJoinNode is then produced by Planner.createHashJoinFragment(): first, the current HashJoinNode is made the root of the HashJoinFragment; the root PlanNode of leftChildFragment (the ScanNode of the left table in the JOIN) becomes the left child of the HashJoinNode; then Planner.connectChildFragment() sets the right child of the HashJoinNode to an ExchangeNode (which represents a 1:n data stream receiver), and the destination of rightChildFragment (rooted at the other ScanNode) is set to that ExchangeNode.

(3) For an AggregationNode, aggregation is more involved. In our example, the AggregationNode is not the second phase of a DISTINCT aggregation (its child is the HashJoinNode rather than another AggregationNode), so: first, the root of the PlanFragment currently rooted at the HashJoinNode is changed to the AggregationNode, with the old root (the HashJoinNode) becoming the new root's child; then Planner.createParentFragment() creates a new PlanFragment whose root is an ExchangeNode, and the destination of the child PlanFragment is set to that ExchangeNode; finally, a new AggregationNode is created in the new PlanFragment as its root, with the ExchangeNode as its child.
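Cases (1) through (3) can be sketched as one recursion, again with the simplified, hypothetical types from section 2.1; the real function handles many more node types and options:

    import java.util.List;

    // Hypothetical 1:n receiver that stitches fragments together.
    class ExchangeNode extends PlanNode { }

    class DistributedPlanner {
        // Walks the PlanNode tree bottom-up and returns the fragment whose
        // root is 'node'; every fragment created along the way is collected
        // in 'fragments'.
        PlanFragment createPlanFragments(PlanNode node, List<PlanFragment> fragments) {
            if (node instanceof ScanNode) {
                // (1) A scan starts a new fragment containing just that node.
                PlanFragment f = new PlanFragment();
                f.planRoot = node;
                fragments.add(f);
                return f;
            }
            if (node instanceof HashJoinNode) {
                // (2) Reuse the left child's fragment as the join fragment;
                // the right child's fragment streams into an ExchangeNode.
                PlanFragment left = createPlanFragments(node.children.get(0), fragments);
                PlanFragment right = createPlanFragments(node.children.get(1), fragments);
                node.children.set(1, new ExchangeNode());
                right.destination = left;   // really: sink -> that ExchangeNode
                left.planRoot = node;
                return left;
            }
            // (3) AggregationNode (simplified): the pre-aggregation stays in
            // the child fragment; the separate merge fragment fed through an
            // ExchangeNode, built by createParentFragment(), is omitted here.
            PlanFragment child = createPlanFragments(node.children.get(0), fragments);
            child.planRoot = node;
            return child;
        }
    }

The key design point is that fragment boundaries appear exactly where data must be re-partitioned or broadcast, i.e. at the ExchangeNodes.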

At this point the createPlanFragments() call is complete, and three PlanFragments have been generated. (Figure: the three generated PlanFragments; diagram not reproduced.)

Calling createPlanFragments(singleNodePlan, isPartitioned, false, fragments) thus yields the array of PlanFragments that makes up the execution plan; the last element of the array is the root PlanFragment. Then PlanFragment.finalize() is called to finalize the execution plan, assigning a DataStreamSink to each PlanFragment (and recursively finalizing each PlanNode).
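A hypothetical sketch of that finalize pass over the fragment array, continuing the simplified types above:

    import java.util.List;

    class PlanFinalizer {
        // Give every non-root fragment a DataStreamSink aimed at its
        // destination fragment; the root fragment returns rows to the client.
        void finalizeFragments(List<PlanFragment> fragments) {
            for (PlanFragment f : fragments) {
                if (f.destination == null) continue; // root fragment: no sink
                f.sink = new DataStreamSink();
                f.sink.targetFragment = f.destination;
                // The real PlanFragment.finalize() also recurses into each
                // PlanNode to finalize its expressions and scan ranges.
            }
        }
    }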

Back in frontend.createExecRequest(): the ArrayList returned by Planner.createPlanFragments() is the complete execution plan. PlanFragment.toThrift() serializes it into TQueryExecRequest, and the related fields are filled in: dest_fragment_idx, per_node_scan_ranges, query_globals, result_set_metadata, and so on. Finally, the TExecRequest object is returned for backend execution.

The impala-backend (C++ code) receives this TExecRequest object, and the coordinator distributes it to the backends for execution. That is the subject of the next article.

An aside: you can still see the shadow of MapReduce here... Each PlanFragment has a DataStreamSink that points to an ExchangeNode in another PlanFragment, a 1-to-N relationship... So the bottleneck of this kind of distributed system is the data shuffle, whether in the MapReduce model or in Impala. This also suggests that the Tez/Stinger Initiative's optimization of Hive is worth looking forward to.

References: http://blog.csdn.net/wind5shy/article/details/8563355

