SQL Optimizer Fundamentals-query optimizer overview

Source: Internet
Author: User

Absrtact: This paper is mainly a review of database query optimizer, including query optimizer classification, query optimizer execution process and CBO framework calcite.

This is one of a series of articles on the principles of the SQL Optimizer Maxcompute. We will be rolling out additional articles from the SQL optimizer on optimization rules and frameworks. Add nail Group "Relational Algebra Optimization Technology" (group number 11719083) to get the latest article release dynamics.

This article is a summary of the database query optimizer, including:

Query optimizer definition, classification
query optimizer execution Process
Introduction to the CBO framework calcite
1. What is the query optimizer?
The database consists of three main components, the parser, the optimizer, and the execution engine, as shown in the following:

The optimizer is the core component of the database used to transform relational expressions into execution plans, which largely determines the performance of a system.

2. Query Optimizer classification
The query optimizer is divided into two categories: the rule-based optimizer (rule-based Optimizer,rbo) and the cost-based optimizer (cost-based OPTIMIZER,CBO):

Rule-based optimizer (rule-based Optimizer,rbo)
The conversion of a relational expression according to the optimization rules, where the transformation is that a relational expression is optimized to become another relational expression, and the original expression is clipped off, and a series of transformations result in a final execution plan.

The Rbo contains a set of strict order optimization rules, and the same SQL, regardless of the data in the Read table, the resulting execution plan is the same. At the same time, the differences in SQL notation in Rbo are likely to affect the final execution plan, which can affect script performance.

Cost-based optimizer (cost-based OPTIMIZER,CBO)
The conversion of a relational expression according to the optimization rules, where the transformation is that a relational expression is optimized to generate another relational expression, and the original expression is preserved, after a series of transformations, a number of execution plans are generated, and the CBO is based on the statistical information and cost model Calculates the cost of each execution plan from which to select the least cost execution plan. From the above, there are two dependencies in the CBO: statistical information and cost models. Whether the statistic information is accurate or not, the reasonable cost model will affect the CBO to choose the best plan.

From the above description, the CBO is better than Rbo, because Rbo is a rule-based, insensitive to the data of the inflexible optimizer, and in the actual process, the data is often changed, the execution plan generated by Rbo is probably not optimal.

In fact, major databases and big data computing engines tend to use the CBO, for example, starting with Oracle 10g, Oracle has completely abandoned RBO to use the CBO, and Hive introduced the CBO in version 0.14.

3. Query optimizer execution process
Both the Rbo and the CBO include a series of optimization rules that allow equivalent conversions of relational expressions, and common optimization rules include:

predicate push-down
Column cropping
Constant folding
Other
On the basis of these optimization rules, the equivalent conversion of the relational expression can be made to generate the execution plan. The following is a description of the RBO and CBO two optimizer execution process.

RBO
The RBO process is simple and consists of two steps:
1) Transformation
Iterate over the relationship expression, as long as the pattern satisfies the specific optimization rule to be converted.
2) Build Physical Plan
A logical execution plan is generated after STEP1, but this is only logical, and the logical execution plan needs to be built into a physical execution plan that determines the specific implementation of each operator. The specific implementation of the Join operator chooses Broadcasthashjoin or Sortmergejoin.

Cbo
The CBO query optimization consists of three main steps:
1) Exploration
The equivalent transformation is made according to the optimization rules, and the expression of equivalence relation is generated, and the original relational expression is preserved.
2) Build Physical Plan
Determine the specific implementation of each operator.
3) Find Best Plan
Calculates the cost of each execution plan based on statistics and selects the execution plan with the lowest cost.

The CBO implementation has two models, the volcano model [1] and the Cascades model [2], where calcite uses the volcano model, and Orca[3] uses the Cascades model. The idea of these two models is basically the same, the difference is that the Cascades model is not the first explore, after the build, but the edge of the explore side build, thus further cutting off some of the execution plan. This is not the start, interested students can read the relevant papers.

4.CBO Frame Calcite Introduction
Apache calcite is a storage and execution-independent SQL optimization engine that is widely used in open-source big data computing engines such as Flink, Drill, Hive, Kylin, and more. In addition, Maxcompute also uses calcite as the optimizer framework. The architecture of the calcite is as follows:

Where operator Expressions refers to a relational expression, a relational expression is represented as relnode in calcite, and often the root node represents the entire query tree. There are two methods of generating relnode in calcite:

Direct parse generation via parser
As you can see from the architecture diagram above, calcite also provides parser for SQL parsing, and the Relnode Tree can be obtained directly using parser.

Generated by Expressions Builder conversion
Different system syntax differs, so parser may also be different. In this case, calcite provides expressions builder to convert the abstract syntax tree (or other data structure) to the Relnode tree. This is the method used by hive (a data processing System).

Query Optimizer performs a series of equivalent conversions to operator expressions based on the optimization rules (pluggable), generates different execution plans, and finally chooses the least expensive execution plan, where the cost is calculated using metadata Statistical information provided by providers.

In fact, calcite provides two optimization methods for Rbo and CBO, which correspond to Hepplanner and Volcanoplanner respectively. To this, this article also does not carry on the development, later has the time to introduce the concrete realization of calcite in detail.

5. Summary
This paper is a review of the query optimizer, introduces the classification of the query optimizer, the execution process, and the general framework of the optimizer calcite.
Read the original? Please add a link description

This article is the original content of the cloud-Habitat community and cannot be reproduced without permission.

SQL Optimizer Fundamentals-query optimizer overview

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.