Source: Drill down into the MySQL kernel
1 Problem Description
How does a SQL work in a database? I believe a lot of people will be interested in this problem. Of course, to fully describe a SQL in the database life cycle, this is a very big problem, covering the SQL lexical parsing, syntax parsing, permission checking, query optimization, SQL execution, and so on a series of steps, short space is absolutely powerless. Therefore, this article selected some of the content, but also I have always wanted to write a content, do focus on the introduction:
given a line SQL , how to extract the where conditions? What is the role of each sub-condition in the where condition in the process of SQL execution?
Through the introduction of this article, I hope that readers can better understand the impact of query conditions on SQL statements, and write better SQL statements, to understand some of the terminology, such as: MySQL 5.6 in an important optimization--index Condition pushdown, exactly push What's down?
The following contents of this article are arranged as follows:
The organization of data in relational database is briefly introduced.
Given a SQL, how to extract the Where condition;
Finally make a small summary;
2 Data organization in a relational database
In a relational database, the data organization involves two of the most basic structures: tables and indexes. The table stores the complete record, typically two forms of organization: The heap table (all records are stored out of order), or the Clustered index table (all records, sorted by the primary key of the record). The index stores a subset of the complete record, which accelerates the query speed of the record, the organization of the index, and generally the B + tree structure.
With this basic knowledge, let's create a test table, add several indexes to the table, insert a few records, and finally look at the full data organization and storage structure of the table. (Note: The following example uses the structure of the table as a heap table, which is also the form of tables used by databases such as Oracle/db2/postgresql, rather than the clustered index table used by the InnoDB engine.) In fact, the form of the table structure is not important, the most important is to understand the core of the following chapters, in any table structure is applicable)
CREATE TABLE t1 (a int primary key, b int, c int, d int, e varchar (20));
CREATE index IDX_T1_BCD on T1 (b, C, D);
INSERT into T1 values (4,3,1,1, ' d ');
INSERT into T1 values (1,1,1,1, ' a ');
INSERT into T1 values (8,8,8,8, ' H '):
INSERT into T1 values (2,2,2,2, ' B ');
INSERT into T1 values (5,2,3,5, ' e ');
INSERT into T1 values (3,3,2,2, ' C ');
INSERT into T1 values (7,4,5,5, ' G ');
INSERT into T1 values (6,6,4,4, ' f ');
The storage structure of the T1 table is shown (only the IDX_T1_BCD index and the T1 table structure are drawn, and the primary key index of the T1 table is not included):
For a simple analysis, there are [B,c,d] three fields on the IDX_T1_BCD index (note: If the InnoDB class's Clustered Index table, IDX_T1_BCD also includes the primary key a field), not including the [A,e] field. IDX_T1_BCD indexes, sorted by the B field First, the B field is the same, sorted by the C field, and so on. Records are sorted by [b,c,d] in the index, but are unordered on the heap table and are not sorted by any field.
3 Where Condition extraction for SQL
With the T1 table above, you can then make a SQL query on this table to get the data you want. For example, consider one of the following SQL:
SELECT * from t1 where B >= 2 and B < 8 and C > 1 and D! = 4 and E! = ' a ';
A relatively simple SQL, you can see at a glance where the condition is used to [b,c,d,e] four fields, and the T1 table IDX_T1_BCD Index, just use the [B,c,d] three fields, then walk IDX_T1_BCD index for conditional filtering, Should be a good choice. Next, let's abandon the idea of the database and directly consider several key questions of this SQL:
l This SQL, overwrite which range on index IDX_T1_BCD?
Start range: Record [2,2,2] is the first index entry to check. The index start lookup range is determined by B >= 2,c > 1.
End range: Record [8,8,8] is the first record that does not need to be checked, and the previous record needs to be judged. The end lookup range of the index is determined by B < 8;
2 What conditions in SQL can be filtered using index IDX_T1_BCD after determining the start and end range of the query?
Depending on SQL, the query scope [(2,2,2), (8,8,8)) of the index is fixed, and not every record in this index range satisfies the where query condition. For example: (3,1,1) does not meet the constraints of C > 1; (6,4,4) does not meet the constraints of d! = 4. The C,d column, in the index IDX_T1_BCD, can filter out the index records that do not meet the criteria.
Therefore, SQL can also use the C > 1 and D! = 4 criteria to filter the indexed records.
3 after determining the criteria in the index that can eventually be filtered out, what are the criteria that cannot be filtered by the index?
The answer to this question is obvious, E! = ' A ' is a query condition that cannot be filtered on index IDX_T1_BCD because the index does not contain the E column. The e column exists only on the heap table, in order to filter this query condition, you must return the records that already satisfy the index query criteria back to the table, fetch the e column in the table, and then use the query condition E! = ' A ' for the e column to make the final filter.
On the basis of understanding the above questions, do an abstraction that summarizes a set of extraction rules for where query conditions that are placed in all SQL statements:
All SQL of the where conditions, can be summarized as 3 Big class: Index Key (first key & last key) , Index Filter , Table Filter .
Next, let's take a detailed analysis of how the 3 categories are defined, and how to extract them.
L Index Key
The query condition used to determine the contiguous range (start range + END range) of an SQL query in the index is called the index Key. Because a range contains at least one start and one termination, index key is also split into index first key and index last key, respectively, to locate the start of the index lookup, and the termination criteria for the index query.
Index First Key
Used to determine the starting range of an index query. Extraction rule: Start with the first key value of the index, check whether it exists in the Where condition, if present and the condition is =, >=, then add the corresponding condition to the index key, continue to read the next value of the indexes, use the same extraction rule; if present and condition is > , the corresponding condition is added to index first key, and the extraction of index first key is terminated, and the extraction of index first key is terminated if it does not exist.
For the above SQL, this extraction rule is applied to extract the index first key (b >= 2, C > 1). Due to the condition of C, the extraction ends, excluding d.
Index last Key
The function of index last key is the opposite of index first key and is used to determine the end range of an index query. Extraction rule: Start with the first key value of the index, check whether it exists in the Where condition, if present and the condition is =, <=, then add the corresponding condition to index last key, continue to fetch the next key value of the index, use the same extraction rule, if it exists and the condition is <, The condition is added to index last key, and the extraction is terminated, and if not, the index last key is also terminated.
For the above SQL, apply this extraction rule, the extracted index last key is (b < 8), because it is the < symbol, so the end after extracting B.
2 Index Filter
After the extraction of the index key has been completed, we have fixed the query scope for the indexed by the Where condition, but the items in this range are not all items that satisfy the query criteria. In the SQL use case above, (3,1,1), (6,4,4) belong to the scope, but neither satisfies the query condition of SQL.
Extraction Rule for index filter: Also check that it exists in the where condition, starting with the first column of the index column: if present and where condition is only =, skip the first column to continue checking the next column of the index, and the next index column takes the same extraction rule as the first column in the index; if the Where condition is >=, >, <, <= Several of them, skip the first column of the index and add all the index related columns from the remaining where conditions to the index filter; if the where condition of the first column of the index contains =, >=, >, <, <= , all indexed related columns in this condition and the remaining where conditions are added to the index filter, and all index-related conditions are added to the index filter if the first column does not contain a query condition.
For the use case SQL above, the first column of the index contains only >=, < two conditions, so the first column can be skipped, and the remaining C, d two columns are added to the index filter. So the index filter obtained is C > 1 and D! = 4.
3 Table Filter
The Table filter is the simplest, most understandable, and the most convenient to extract. Extraction rule: All query conditions that are not part of an indexed column are categorized as table filter.
Similarly, the sql,table filter for the use case above is E! = ' a '.
3.1 Index key/index filter/table Filter Summary
The Where condition in the SQL statement, using the extraction rules above, will eventually be extracted to index key (first key & last key), Index filter and table filter.
Index first Key is used only to locate the starting range of the index, so it can be used only once when the index is in the search Path (along the root of the index B + tree until the correct leaf node position is indexed).
Index last key is used to locate the end range of the index, so for each index record that is read after the starting range, it is necessary to determine whether the range of the last key of index is exceeded, and if it exceeds, the current query ends;
The index filter is used to filter records in the index query scope that do not meet the query criteria, so each record in the index range needs to be compared with the index filter, and if the index filter is not satisfied, it is discarded and continues to read the next record in the index;
The Table filter, which is the last line of the Where condition, is used to filter records passed through the previous index, at which point the record satisfies the range of index first key and index last key, and satisfies the condition of index Filter. The return table reads the complete record, determines whether the complete record satisfies the query condition in the table filter, and, if not satisfied, skips the current record, resumes reading the next record of the index, and, if satisfied, returns the record, which satisfies all the conditions in where and can be returned to the front-end user.
4 Conclusion
After reading and understanding the above, we have a deep understanding of how the database extracts the query condition in where, and how to extract the query condition in where to Index Key,index filter,table Filter. Later in the writing of SQL statements, you can define the table, try to extract the corresponding where conditions, and the final SQL execution Plan contrast, gradually strengthen their understanding.
At the same time, we can also answer the article began to ask a question: MySQL 5.6 Introduced in the index Condition pushdown, what is the push down to the index level to filter it? By the right, the answer is index Filter. Prior to MySQL 5.6, the index filter and the table filter were not distinguished, and all indexes were recorded in the range indexed first key and index last key, the full record is read back to the table, and then returned to the MySQL server layer for filtering. After MySQL 5.6, the index filter is separated from the table filter, and the index filter is filtered to the innodb level, reducing the cost of the record interaction between the back table and the return MySQL server layer, which improves the efficiency of SQL execution.
The Where condition in SQL, extraction and application in database