Impala source code analysis (3)-backend query execution process

This article describes how impala-backend executes an SQL Query.
In Impala, the SQL Query entry function is:
void ImpalaServer::query(QueryHandle& query_handle, const Query& query)

  • Generate a QueryExecState whose lifecycle matches the execution of the SQL statement, representing the statement being executed;
  • Call the Execute function to start the execution process (a sketch of this flow follows the list);
  • Start a Wait thread that waits for the result.
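As a rough, self-contained sketch of this entry flow (all class bodies below are simplified stand-ins for illustration, not the real Impala code):

    #include <memory>
    #include <string>
    #include <thread>

    // Simplified stand-ins; the real types are Thrift structs / Impala classes.
    struct Query {};
    struct QueryHandle { std::string id; };
    struct QueryExecState { /* lives as long as one executing SQL statement */ };

    struct ImpalaServerSketch {
      void query(QueryHandle& handle, const Query& q) {
        // 1. Create the QueryExecState that represents the running statement.
        auto exec_state = std::make_shared<QueryExecState>();
        // 2. Execute(): ask impala-fe (via JNI) for the plan, then start backend execution.
        Execute(exec_state);
        // 3. Detach a Wait thread that blocks for results, so query() can return a handle now.
        std::thread([exec_state] { /* Wait(): block until results are ready */ }).detach();
        handle.id = "<query id>";
      }
      void Execute(const std::shared_ptr<QueryExecState>& exec_state) { /* ... */ }
    };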

The Execute() function first requests SQL parsing and execution-plan generation from impala-fe through JNI (a process described in the previous article) to obtain the TExecRequest object corresponding to the Query, which is then executed by impala-backend.
Backend execution starts from the following function, which also starts fragment status reporting:
Status ImpalaServer::QueryExecState::Exec(TExecRequest* exec_request)
In Impala, a Query is distributed to multiple nodes for execution. The component responsible for distributing and coordinating the Query's execution is called the Coordinator; each node participating in the Query's execution is called a backend instance, and one or more PlanFragments are executed on each backend instance. Each Query therefore corresponds to one Coordinator object and multiple backend instances, and the query_profile_ variable in the Coordinator collects the profile of the entire query execution.
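Pictured as (much simplified) ownership, the relationship looks roughly like this; the member names are the ones mentioned in this article, everything else is illustrative:

    #include <vector>
    #include <boost/scoped_ptr.hpp>

    class BackendExecState;   // per (PlanFragment, backend instance) execution state
    class RuntimeProfile;     // tree of runtime counters

    // One Query has one Coordinator; the Coordinator tracks every fragment
    // instance it started and aggregates the query-wide profile.
    class Coordinator {
      std::vector<BackendExecState*> backend_exec_states_;
      boost::scoped_ptr<RuntimeProfile> query_profile_;
    };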

Coordinator

A Coordinator is created to coordinate the execution of the Query, and then
Status Coordinator::Exec(
    const TUniqueId& query_id, TQueryExecRequest* request,
    const TQueryOptions& query_options)
starts the asynchronous execution process. To put it bluntly, the Coordinator is the boss: after handing out all the work (PlanFragments) to each subordinate (backend instance), it sends them off and goes off duty without waiting for the subordinates to finish, because the boss has already arranged for his secretary (ImpalaServer::Wait()) to keep an eye on the results.
The two most important steps in this function are as follows:

  • ComputeScanRangeAssignment(*request);
  • ComputeFragmentExecParams(*request);

ComputeScanRangeAssignment(const TQueryExecRequest& exec_request) is used to fill std::vector<FragmentScanRangeAssignment> scan_range_assignment_, an array indexed by PlanFragment.
Each FragmentScanRangeAssignment is a boost::unordered_map representing the mapping from a backend instance of that PlanFragment to its corresponding PerNodeScanRanges; PerNodeScanRanges in turn represents the mapping from every PlanNode involved in the PlanFragment to its ScanRanges.
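Read as code, the two mappings look roughly like this (a sketch only; the exact key type for a backend instance, a plain host:port string below, is an assumption and differs from the real Thrift definitions):

    #include <map>
    #include <string>
    #include <vector>
    #include <boost/unordered_map.hpp>

    typedef int TPlanNodeId;
    struct TScanRangeParams { std::string file; long offset; long length; };

    // All scan ranges of one PlanFragment, grouped by the ScanNode (plan node id)
    // that will read them.
    typedef std::map<TPlanNodeId, std::vector<TScanRangeParams>> PerNodeScanRanges;

    // For one PlanFragment: which backend instance ("host:port" here) reads
    // which scan ranges.
    typedef boost::unordered_map<std::string, PerNodeScanRanges>
        FragmentScanRangeAssignment;

    // scan_range_assignment_ is then a std::vector<FragmentScanRangeAssignment>,
    // indexed by PlanFragment.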

Another function, ComputeFragmentExecParams(const TQueryExecRequest& exec_request), is used to fill std::vector<FragmentExecParams> fragment_exec_params_; each FragmentExecParams holds the parameters used to execute one PlanFragment.

  • Status Coordinator::ComputeFragmentHosts(const TQueryExecRequest& exec_request): locates the backend instances for each PlanFragment. If a PlanFragment is UNPARTITIONED, it runs on the host where the Coordinator is located. If a PlanFragment contains a ScanNode, the PlanFragment is scheduled onto the DataNodes holding the HDFS/HBase data blocks; that is, those DataNodes become the backend instances that execute the Query.
  • Computes the hosts on which each PlanFragment in TQueryExecRequest.fragments will be executed and fills them into fragment_exec_params_.
  • Assigns an instance_id, in sequence, to each host on which a PlanFragment executes.
  • Fills in each FragmentExecParams's destinations (the destination PlanFragments of its DataSink) and per_exch_num_senders (how many PlanFragments' data this ExchangeNode will receive); a sketch of these fields follows the list.
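Putting those pieces together, a FragmentExecParams entry can be pictured roughly as below; the field names follow the description above, while the concrete types are assumptions for illustration:

    #include <map>
    #include <string>
    #include <vector>

    struct TUniqueId { long hi; long lo; };
    struct TNetworkAddress { std::string hostname; int port; };
    struct TPlanFragmentDestination {   // where this fragment's DataSink sends rows
      TUniqueId fragment_instance_id;
      TNetworkAddress server;
    };

    // Per-PlanFragment execution parameters, one entry per fragment in
    // fragment_exec_params_ (a sketch of the fields discussed above).
    struct FragmentExecParams {
      std::vector<TNetworkAddress> hosts;        // backend instances running this fragment
      std::vector<TUniqueId> instance_ids;       // one instance_id per host, assigned in order
      std::vector<TPlanFragmentDestination> destinations;  // targets of the DataSink
      std::map<int, int> per_exch_num_senders;   // ExchangeNode id -> number of senders
    };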

Returning to the Coordinator::Exec() function, the next step is to hand out all the PlanFragment jobs.

  • If there is a PlanFragment that runs on the Coordinator itself, a new PlanFragmentExecutor() is created for that PlanFragment, and the corresponding TExecPlanFragmentParams is filled in.
  • Then comes a double loop: the outer loop traverses the PlanFragments and the inner loop traverses the backend instances, generating the BackendExecState associated with each instance (mainly the parameters the Coordinator uses to interact with that backend instance) and appending it to the backend_exec_states_ list, through which the Coordinator manages the execution status of all backend instances. An RPC request is then issued to each instance to start execution; the request protocol is ImpalaInternalService::ExecPlanFragment(TExecPlanFragmentParams). A schematic of this fan-out follows the code below.

Status fragments_exec_status = ParallelExecutor::Exec(
    bind<Status>(mem_fn(&Coordinator::ExecRemoteFragment), this, _1),
    reinterpret_cast<void**>(&backend_exec_states_[backend_num - num_hosts]),
    num_hosts);
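The double loop itself can be sketched as follows; this is a self-contained schematic of the fan-out, not the actual Coordinator code, and the stand-in types are simplified:

    #include <string>
    #include <vector>

    struct TNetworkAddress { std::string hostname; int port; };
    struct FragmentExecParams { std::vector<TNetworkAddress> hosts; };
    struct BackendExecState { int fragment_idx; int instance_idx; };

    // Outer loop over PlanFragments, inner loop over the backend instances chosen
    // for each fragment; one BackendExecState per (fragment, instance) pair.
    void StartFragments(const std::vector<FragmentExecParams>& fragment_exec_params,
                        std::vector<BackendExecState*>* backend_exec_states) {
      for (size_t f = 0; f < fragment_exec_params.size(); ++f) {
        const FragmentExecParams& params = fragment_exec_params[f];
        for (size_t i = 0; i < params.hosts.size(); ++i) {
          backend_exec_states->push_back(
              new BackendExecState{static_cast<int>(f), static_cast<int>(i)});
          // For each new state an ImpalaInternalService::ExecPlanFragment RPC is
          // issued (in the real code via ParallelExecutor::Exec +
          // Coordinator::ExecRemoteFragment, as shown above).
        }
      }
    }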

Each Coordinator, PlanFragmentExecutor, and ExecNode has a RuntimeProfile, and all RuntimeProfiles form a tree structure recording information about every execution node.
The Coordinator has a member variable boost::scoped_ptr<RuntimeProfile> query_profile_, which holds all the profile information for the query.
Each Coordinator also has an aggregate_profile_ dedicated to aggregation-related profiling.

PlanFragmentExecutor and ExecNode

A PlanFragment, whether it executes on the Coordinator side or on a backend instance, is controlled by a PlanFragmentExecutor. Next, let's look at how a PlanFragment is executed on a backend instance.
ImpalaServer::ExecPlanFragment() -> ImpalaServer::StartPlanFragmentExecution()
The FragmentExecState generated there contains a PlanFragmentExecutor. The following describes how the PlanFragmentExecutor controls Query execution.

  • FragmentExecState::Prepare() calls PlanFragmentExecutor::Prepare().
  • FragmentExecState::Exec() calls PlanFragmentExecutor::Open(), which is the main loop of PlanFragment execution and blocks until this PlanFragment finishes executing.

What really controls the execution of a PlanFragment is the PlanFragmentExecutor, chiefly through its Prepare()/Open()/GetNext()/Close() functions.
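How those four calls are typically driven, end to end, can be sketched as a simplified, self-contained analogue (not the actual FragmentExecState/PlanFragmentExecutor code):

    // Minimal stand-ins for the real PlanFragmentExecutor, RowBatch and Status.
    struct RowBatch {};
    struct Status {
      bool ok() const { return true; }
      static Status OK() { return Status(); }
    };

    class FragmentExecutorSketch {
     public:
      Status Prepare() { /* build the ExecNode tree, descriptors, codegen */ return Status::OK(); }
      Status Open()    { /* open the tree, start scans, drive the sink    */ return Status::OK(); }
      Status GetNext(RowBatch** batch) { *batch = nullptr; return Status::OK(); }
      void   Close()   { /* release memory, close the tree and the sink   */ }
    };

    // Prepare -> Open -> GetNext loop -> Close.
    Status RunFragment(FragmentExecutorSketch* exec) {
      Status s = exec->Prepare();
      if (s.ok()) s = exec->Open();       // in the real code this blocks until done
      RowBatch* batch = nullptr;
      while (s.ok()) {
        s = exec->GetNext(&batch);        // pull result batches from the root node
        if (batch == nullptr) break;      // no more data
      }
      exec->Close();
      return s;
    }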

1. PlanFragmentExecutor::Prepare(TExecPlanFragmentParams): the main process is as follows:

  • Set the memory limit (mem_limit) that this query may use;
  • DescriptorTbl::Create(): initialize the descriptor table;
  • ExecNode::CreateTree(): build the structure of the execution tree (the parent-child relationships). The execution tree is composed of ExecNodes, and each ExecNode also provides Prepare(), Open(), and GetNext() functions; ExecNode::Prepare/Open/GetNext/EvalConjuncts/Close are applied recursively along this tree structure. After initialization, PlanFragmentExecutor::plan_ points to the root node of the execution tree. In this tree, the root node is executed last and the leaf nodes are executed first;
  • Set the number of senders whose data the ExchangeNode of this PlanFragment will receive;
  • Call plan_->Prepare(): recursively initialize the execution tree starting from the root node (see the sketch after this list); this mainly sets up statistics such as the runtime_profile and LLVM native code generation for the conjuncts (adding functions to the LlvmCodeGen object);
  • If native code generation is used, call runtime_state_->llvm_codegen()->OptimizedModule() for optimization;
  • Map the scan ranges corresponding to all ScanNodes to file/offset/length;
  • DataSink::CreateDataSink();
  • Set up the profile counters;
  • Create the RowBatch that will hold the results.
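A self-contained sketch of what "recursively initialize the execution tree from the root node" amounts to (simplified; the real ExecNode::Prepare also wires up runtime profiles, conjunct codegen, and so on):

    #include <memory>
    #include <vector>

    struct RuntimeState {};                       // stand-in for the real RuntimeState
    struct Status { static Status OK() { return Status(); } };

    class ExecNode {
     public:
      virtual ~ExecNode() {}
      // Prepare this node, then recurse into its children; leaves end the recursion.
      virtual Status Prepare(RuntimeState* state) {
        // ... per-node setup: runtime_profile counters, conjunct codegen, ...
        for (auto& child : children_) {
          Status s = child->Prepare(state);       // in real code: RETURN_IF_ERROR(...)
          (void)s;
        }
        return Status::OK();
      }
     protected:
      std::vector<std::unique_ptr<ExecNode>> children_;
    };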

2. PlanFragmentExecutor::Open()

Start the profile-reporting thread, then call OpenInternal().

(1) Call plan_->Open(), which calls ExecNode::Open() in sequence along the generated ExecNode execution tree.
The following uses HdfsScanNode::Open() as an example:

  • Call DiskIoMgr::RegisterReader to initialize hdfs_connection_;
  • Add the files and splits to the HdfsScanNode's queued_ranges_ queue;
  • HdfsScanNode::DiskThread drives HdfsScanNode::StartNewScannerThread() -> HdfsScanNode::ScannerThread -> HdfsScanner::ProcessSplit() to read the data (currently one scan range is read by one worker thread);
  • Call IssueQueuedRanges() to send the ranges previously added to queued_ranges_ to the DiskIoMgr. Since the disk thread was started in the previous step, the data can now be read (a schematic analogue of this producer/consumer split follows the list).
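As a generic picture of that disk-thread/scanner-thread split (a schematic analogue only; the real DiskIoMgr and HdfsScanner machinery is far more involved):

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <string>

    struct ScanRange { std::string file; long offset; long length; };

    std::queue<ScanRange> queued_ranges;   // analogue of queued_ranges_
    std::mutex mu;
    std::condition_variable cv;
    bool done = false;

    // Disk-thread side: hand a ready range to the scanner threads.
    void IssueRange(const ScanRange& range) {
      { std::lock_guard<std::mutex> lock(mu); queued_ranges.push(range); }
      cv.notify_one();
    }

    // Scanner-thread side: take one range at a time and materialize row batches
    // from it (one scan range is handled by one worker thread at a time).
    void ScannerThread() {
      while (true) {
        std::unique_lock<std::mutex> lock(mu);
        cv.wait(lock, [] { return !queued_ranges.empty() || done; });
        if (queued_ranges.empty()) return;   // done and nothing left to scan
        ScanRange range = queued_ranges.front();
        queued_ranges.pop();
        lock.unlock();
        // ... read file[offset, offset + length) and produce RowBatches ...
      }
    }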

(2) If the current PlanFragment has a sink, the PlanFragment's data needs to be sent to other PlanFragment instances. Before sending, PlanFragmentExecutor::GetNextInternal() is called, which recursively calls ExecNode::GetNext() down the execution tree to obtain the execution results.
The logic of ExecNode::Open() differs between types of ExecNode, and the same is true for GetNext(). For details, look at HdfsScanNode::GetNext() or HashJoinNode::GetNext() to see how the query results are obtained.

3. PlanFragmentExecutor::GetNext(RowBatch** batch)

This explicitly triggers the ExecNode::GetNext() calls of the execution tree to obtain the query results. When it sets PlanFragmentExecutor::done_ = true, all data has been processed and the PlanFragmentExecutor can exit.
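This is the classic pull-based (Volcano-style) iteration; a self-contained sketch of the eos/done_ convention (not the actual Impala code):

    struct RowBatch { /* rows ... */ };
    struct Status { static Status OK() { return Status(); } };

    class ExecNode {
     public:
      virtual ~ExecNode() {}
      // Fill 'batch' with the next chunk of rows; set *eos when no data remains.
      virtual Status GetNext(RowBatch* batch, bool* eos) = 0;
    };

    // Analogue of PlanFragmentExecutor::GetNext(): pull batches from the root of
    // the execution tree until it reports end-of-stream, then mark done_.
    Status GetFragmentResults(ExecNode* root, bool* done_) {
      bool eos = false;
      while (!eos) {
        RowBatch batch;
        Status s = root->GetNext(&batch, &eos);   // recursion happens inside the tree
        (void)s;
        // ... hand 'batch' to the DataSink or back to the Coordinator/client ...
      }
      *done_ = true;   // all data processed; the executor can exit
      return Status::OK();
    }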

At this point, the analysis of impala-backend is complete. In general, the differences between Impala and MapReduce/Hive during execution can be summarized as one push and one pull.

  • Push: in MapReduce, the Map output waits to be pulled by Reduce, whereas in Impala, after each PlanFragment finishes producing data, the DataSink pushes it to other PlanFragments. This uses bandwidth more effectively and speeds up job execution.
  • Pull: in Hive, logically upstream nodes push their data to downstream nodes, whereas in Impala, downstream nodes pull data from upstream nodes by recursively calling GetNext().

Original article address: Impala source code analysis (3)-backend query execution process, thanks to the original author for sharing.
