What exactly does the SQL optimizer do for you?

Source: Internet
Author: User

One of the big advantages of a relational database is that the user doesn't have to care about how the data is accessed, because the optimizer is doing it for us, but when it comes to SQL query optimization, I have to focus on it because it involves query performance issues.

Experienced programmers are familiar with some SQL optimizations, such as the leftmost matching principle we often call, non-BT predicate evasion, and so on, so how does the optimizer determine these? And why do you have to be the leftmost match, what is the principle of the leftmost match, do you have a deeper understanding?

In this article, we use some examples to analyze what the optimizer does, so that we can better optimize the SQL query.

This article you can know:

  1. What is the access path to SQL

  2. How the optimizer determines the optimal access path

  3. What is the principle of the leftmost match?

  4. How to efficiently evaluate the number of SQL hit rows

Example table:

CREATE TABLE Test (?  ID Int (one) not NULL auto_increment,?  User_name varchar (+) DEFAULT NULL,?  sex int (one) DEFAULT NULL,?  Age int (one) DEFAULT NULL,?  c_date datetime DEFAULT NULL,?  PRIMARY KEY (ID),?  # index?  KEY Id_name_sex (id,user_name,sex),?  KEY name_sex_age (user_name,sex,age)?) Engine=innodb auto_increment=12 DEFAULT Charset=utf8;

  

?

One, Access path

Before the SQL statement can be really executed, the optimizer must first determine how to access the data. This includes: Which index should be used, how the index is accessed, whether an auxiliary random read is required, and so on.

From a SQL, to optimizer optimization, to the engine for data query, landing to the data storage page, this is an access path determination process.

Second, predicate

Predicates are one or more of the search parameters in the WHERE clause that we often say. predicate expressions are the primary point of entry for index design, and if an index satisfies all the predicate expressions of a SELECT query statement, the optimizer may establish an efficient access path.

SELECT * FROM test where ID =1 and user_name like ' test% '

  

For example, in the above query, the search parameters, IDs, and user_name that follow the where are predicates.

Third, index tablets

The index slice represents the range of ranges determined by the predicate expression, and the cost of the access path is largely dependent on the thickness of the index slice.

The thicker the index slice, the more index pages you need to scan, the more index records you need to process, and the maximum overhead you need to read the tag synchronously. Conversely, a narrower index slice significantly reduces the portion of the cost of index access, while there are fewer tables to read synchronously.

Synchronous reading is a random IO operation that consumes about 10ms of time in a single read. We have a description of this in the previous article.

Like what:

Will match 5 data? Sql1:select * from Test where sex=1;? Match to 2 data? Sql2:select * from Test where sex=1 and <10;

  

Therefore, we need to use predicates to determine the thickness of the index slice, the less the range of filtered domain, the narrower the index slice thickness. So predicate must be able to match to the index, or what is the matching rule?

Iv. matching columns & filtering columns

Predicates are not necessarily all matched to the index, and can match on what we call a matching column. At this point it can participate in the definition of the index slice.

Only matching columns and filter columns can participate in the definition and filtering of the index slices, others are not.

Let's take a look at the definition of predicate match:

Check the index columns, checking the index columns from beginning to end to see the following rules:

  1. In the WHERE clause, does the column have at least one predicate that is simple enough to correspond to? If so, then this column is the matching column. If not, then the column and its subsequent index columns are non-matching columns.

  2. Whether the predicate is a range predicate, and if so, the remaining index columns are non-matching columns.

  3. For an indexed column after the last matching column, the column filters columns if you have a predicate that is simple enough to correspond to it.

1. Example
SELECT * FROM test where user_name= ' test1 ' and sex>0 and age =10

  

Discovery Indexid_name_sex

  1. Check its index column row by line ( id , user_name , sex )

  2. First check id to see if the predicate behind the where does not correspond, then the index column and subsequent index columns are non-matching columns

  3. id_name_sexend of index match, no matching column

Discovery Indexname_sex_age

  1. Check its index column row by line ( user_name , sex , age )

  2. First check user_name to see if the predicate behind the where user_name corresponds to the column that is identified as matching

  3. Check the index field sex , and find that there is a predicate corresponding to the where, which determines that this column matches the sex columns, because the predicate sex is a range predicate, the remaining index is a non-matching column.

  4. The index column age is after the last matching column sex , and there is a predicate age corresponding to it, so this column filters the columns,

With this example, we finalize the following:

  • Match index:name_sex_age

  • Matching columns: user_name ,sex

  • Filter Columns:age

We look at the next explain, and we analyze the correspondence.

2. What is the use of matching columns?

After determining the matching columns, we can know which indexes the current query will use, and which columns to match to the index, and can eventually lock in the access range of the data and save read pressure for reading the data.

Relative to the query without matching to the index, there are matching columns of the query, conditional filtering is the predecessor, and there is no matching to the index of the query, conditional filtering is back, that is, after the full table scan, and then filter the results, so the disk IO pressure is too large.

In addition, the "leftmost matching" principle is also based on matching column rules, why is the leftmost match, in addition to the principle of B-tree, there is an important reason, when checking the matching column, is to check the index column from beginning to end.

Therefore, the order of the predicates after the where is not important for the ability to match the index, but the order of the indexed columns is important.

Like what:

SELECT * FROM test where user_name= ' test1 ' and sex>0 and the age =10?select * from Test where sex>0 and User_name= ' test1 ' and ' =10?select * from Test where age =10 and User_name= ' test1 ' and sex>0

  

Can all be matched to the name_sex_age index

3. Complex predicate like predicate

If the value is%xx, then the full index scan will be selected, not participate in the index match, if it is xx%, this will participate in the index match, select the index slice scan.

Or operator

Even simple predicates, if they are manipulated by or between other predicates, are extremely difficult for the optimizer, except in the case of multi-index access, it is possible to participate in the definition of an index slice and try not to use it.

This type of predicate is very difficult for the optimizer, assuming that a predicate evaluates to false, and when no other predicate is checked to exclude a row of records from being determined.

BT predicate

For example, only the AND operator, then all simple predicates can be called the BT predicate, which is a good predicate, unless the access path is a multi-index scan, otherwise only the BT predicate can participate in the definition of the index slice.

predicate value is not determined

For example, the value of the predicate is a function, or participate in the calculation, the optimizer in the static SQL binding, each time it needs to recalculate the choice, cannot cache, consumes a lot of CPU, and can not participate in index column matching.

Five, Filter factor

The matching column determines which index columns are used, but the thickness of the index slices (that is, how many rows are expected to be accessed) is not estimated. This needs to be determined by filtering factors.

The selectivity of the predicate described by the filter factor, which is the proportion of the number of record rows in the table that satisfy the predicate condition, depends on the distribution of the column values.

1. Filter factor for a single predicate

For example, our test table has 10,000 records, the predicate user_name matches an index column, its filter factor is 0.2% (1/different user_name quantity =user_name has 500 different values in the ratio), it means that the query results will contain 20 rows of records.

SELECT * FROM test where user_name= ' test '

  

2, the filter factor of the combination predicate

When more than one predicate matches the matching column, we can deduce the combined filter factor by the filter factor of the single predicate. The general formula is:

Combination filter factor = predicate 1 Filter factor * Predicate 2 filter factor ....

such as the following query

SELECT * FROM test where user_name= ' test ' and sex=1 and age =10

Contains 3 predicates, user_name, sex, age, where user_name has 500 different values, sex has 2 different values, and age has 40 different values.

The filter factor for each predicate:

FF (user_name) =1/500*100 =0.2%

FF (Sex) =1/2*100=50%

FF (age) =1/40*100=2.5%

Combination filter Factor =0.2%*50%*2.5%=0.0025%

Through the combination of the above filter factor, the final result set can be deduced =10000*0.0025%=0.25 ~=1

After evaluating the filtering factors above, we can see that the result set that needs to be found only needs to fetch 1 rows, which has a high performance improvement to the database disk access.

It is also important for the optimizer to evaluate the filter factor before evaluating the optional access path.

Six, sort

Materialized result sets mean that a result set is built by performing the necessary database access. In the best case, only one record needs to be returned, and in the worst case, multiple records need to be returned, and a large number of disk reads are required. and sorting is one of them.

In the following cases, a fetch call only needs to materialize one record, otherwise the entire result set will need to be materialized when the result is sorted.

  • There are no sorting requirements, such as order By,group by.

  • Although it is necessary to sort the following two conditions:

  1. <!--There is an index that satisfies the ordering requirements of the result set, such as the above (Id_name_sex) or (name_sex_age)-

  2. The <!--optimizer decides to use this index in a traditional way, by accessing the first index row that satisfies the condition and reading the corresponding table row, then accessing the second index row that satisfies the condition and reading the corresponding table row, and so on. -

  3. <!--For example, when using an index (name_sex_age), select * from Test where user_name= ' test ' ORDER by sex, at this point in the index, the result set based on sex itself is ordered--

Seven, the last

The SQL Optimizer does more than just work for you, but the estimate of the size of the index slice, and the determination of the access path is the most important work of its kind, and we will continue to introduce it later.

Reprint Source: https://my.oschina.net/u/1859679/blog/1586098

Copyright statement: The content source network, the copyright belongs to the original person. Unless it is impossible to confirm, we will mark the author and source, and if there is any infringement please inform us that we shall immediately delete and apologize. Thank you.

Small series accumulated years of dry documents free of charge, including front end and testing, system architecture, high concurrency processing, optimization, etc.

What exactly does the SQL optimizer do for you?

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.