1. Problem Description
How is an SQL statement executed in a database? I believe many people are interested in this question. Describing the full lifecycle of an SQL statement in a database is a huge topic: it covers lexical analysis, syntax parsing, permission checks, query optimization, and execution, and a short article cannot do it justice. This article therefore picks one part of the topic that I have long wanted to write about, and focuses on the following:
How are the where conditions of a given SQL statement extracted? What role does each sub-condition of the where clause play during SQL execution?
I hope this article helps readers better understand how query conditions affect the execution of SQL statements, write better SQL, and make sense of certain terms. For example: Index Condition Pushdown is an important optimization in MySQL 5.6. What exactly is pushed down?
The rest of this article is organized as follows:
- A brief introduction to how data is organized in a relational database;
- How to extract the where conditions of a given SQL statement;
- A short summary.
2. Data Organization in a relational database
In a relational database, data is organized in two basic structures: tables and indexes. A table stores complete records and generally takes one of two forms: a heap table (all records stored in no particular order) or a clustered index table (all records stored in primary key order). An index stores a subset of the columns of each record and is used to speed up queries. Indexes are generally organized as B+ trees.
With this background, let's create a test table, add an index to it, insert several records, and then look at the complete data organization and storage structure of the table. (Note: the following example uses a heap table, which is the table organization used by databases such as Oracle, DB2, and PostgreSQL, rather than the clustered index table used by the InnoDB engine. The form of the table is actually not important. What matters is understanding the core ideas of the following section, which apply to any table structure.)
create table t1 (a int primary key, b int, c int, d int, e varchar(20));
create index idx_t1_bcd on t1 (b, c, d);
insert into t1 values (4, 3, 1, 1, 'd');
insert into t1 values (1, 1, 1, 1, 'a');
insert into t1 values (8, 8, 8, 8, 'h');
insert into t1 values (2, 2, 2, 2, 'b');
insert into t1 values (5, 2, 3, 5, 'e');
insert into t1 values (3, 3, 2, 2, 'c');
insert into t1 values (7, 4, 5, 5, 'g');
insert into t1 values (6, 6, 4, 4, 'f');
The storage structure of table t1 is shown in the figure (only the idx_t1_bcd index and table t1 are drawn; the primary key index of t1 is not included):
A brief analysis: the idx_t1_bcd index contains three fields, [b, c, d], and does not contain the fields [a, e]. (Note: in an InnoDB-style clustered index table, idx_t1_bcd would also contain the primary key field a.) The index is sorted first by b; entries with the same b are sorted by c, and so on. Records are therefore ordered by [b, c, d] in the index, but in the heap table they are unordered and not sorted by any field.
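The ordering described above can be sketched in a few lines of Python. This is only an illustration, not how any database stores data: the row values are the eight test rows as reconstructed for this article, the list position stands in for a heap-table location, and `build_index` is a hypothetical helper name.

```python
# Rows of t1 as (a, b, c, d, e), in insertion (heap) order.
# The values are assumed to be the eight test rows of this article.
ROWS = [
    (4, 3, 1, 1, "d"), (1, 1, 1, 1, "a"), (8, 8, 8, 8, "h"), (2, 2, 2, 2, "b"),
    (5, 2, 3, 5, "e"), (3, 3, 2, 2, "c"), (7, 4, 5, 5, "g"), (6, 6, 4, 4, "f"),
]

def build_index(rows):
    """Return idx_t1_bcd entries as ((b, c, d), heap_slot), sorted by key."""
    entries = [((b, c, d), slot) for slot, (a, b, c, d, e) in enumerate(rows)]
    return sorted(entries)  # tuples sort by b, then c, then d

index = build_index(ROWS)
for key, slot in index:
    print(key, "-> heap slot", slot)
```

The index entries come out ordered by [b, c, d], while the heap slots they point to appear in no particular order, which is the property the rest of the article relies on.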
3. SQL where condition extraction
With table t1 in place, we can run SQL queries against it to retrieve the data we want. For example, consider the following SQL statement:
select * from t1 where b >= 2 and b < 8 and c > 1 and d != 4 and e != 'a';
This is a simple SQL statement. Its where clause uses four fields, [b, c, d, e], while the idx_t1_bcd index of table t1 covers exactly three of them, [b, c, d], so using idx_t1_bcd to filter on these conditions should be a good choice. Next, let's set aside how a database thinks and work through several key questions about this statement directly:
1. Which range of the index idx_t1_bcd does this SQL statement cover?
Start of range: record [2, 2, 2] is the first index entry that needs to be checked. The start of the index search range is determined by b >= 2 and c > 1.
End of range: record [8, 8, 8] is the first entry that does not need to be checked, while every entry before it must be examined. The end of the index search range is determined by b < 8.
2. Once the start and end of the query range are determined, which other conditions in the SQL statement can be used to filter on the index idx_t1_bcd?
After the index range [(2, 2, 2), (8, 8, 8)) is fixed, not every entry in that range satisfies the where conditions. For example, (3, 1, 1) does not satisfy c > 1, and (6, 4, 4) does not satisfy d != 4. Since the index contains columns c and d, the conditions on them can be used to filter out non-matching entries inside idx_t1_bcd.
Therefore, the conditions c > 1 and d != 4 in the SQL statement can be used to filter index entries.
3. Once all conditions that can be filtered on the index are determined, which conditions cannot be filtered by the index?
The answer is obvious: the condition e != 'a' cannot be filtered on the index idx_t1_bcd, because the index does not contain column e. Column e exists only in the heap table. To apply this condition, the database must go back to the table for each entry that passed the index conditions, fetch column e from the full record, and then apply e != 'a' as the final filter.
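To check the expected result of the example query, it can be run against an in-memory SQLite database from Python. This is only a convenience sketch: SQLite is used because it ships with Python, its storage layout and planner differ from the heap-table model discussed here, and the row values are the test rows as assumed in this article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1 (a int primary key, b int, c int, d int, e varchar(20));
    create index idx_t1_bcd on t1 (b, c, d);
    insert into t1 values (4, 3, 1, 1, 'd');
    insert into t1 values (1, 1, 1, 1, 'a');
    insert into t1 values (8, 8, 8, 8, 'h');
    insert into t1 values (2, 2, 2, 2, 'b');
    insert into t1 values (5, 2, 3, 5, 'e');
    insert into t1 values (3, 3, 2, 2, 'c');
    insert into t1 values (7, 4, 5, 5, 'g');
    insert into t1 values (6, 6, 4, 4, 'f');
""")

# The example query, with an explicit order so the output is stable.
query = """
    select a, b, c, d, e from t1
    where b >= 2 and b < 8 and c > 1 and d != 4 and e != 'a'
    order by a
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```

Four rows survive all conditions, those with a = 2, 3, 5, and 7. The rows with index entries (3, 1, 1) and (6, 4, 4) fall inside the scanned range but are rejected by c > 1 and d != 4 respectively.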
Abstracting from the answers to these questions, we can summarize a set of extraction rules for where conditions that applies to any SQL statement:
All where conditions of an SQL statement can be classified into three categories: Index Key (First Key & Last Key), Index Filter, and Table Filter.
Next, let's look at how each of the three categories is defined and how it is extracted.
1. Index Key
The query conditions used to determine the contiguous range (start + end) of the index that the SQL query scans are called the Index Key. Because a range has a start and an end, the Index Key is split into the Index First Key and the Index Last Key, which locate the start of the index scan and define the condition for ending it, respectively.
Index First Key
Used to determine the start of the index scan range. Extraction rule: starting from the first key column of the index, check whether the column appears in the where conditions. If it does and the condition is = or >=, add the condition to the Index First Key, move on to the next key column of the index, and apply the same rule. If it does and the condition is >, add the condition to the Index First Key and stop extraction. If the column does not appear, stop extraction as well.
Applying this rule to the SQL statement above, the extracted Index First Key is (b >= 2, c > 1). Because the condition on c is >, extraction stops there and d is not included.
Index Last Key
The Index Last Key is the counterpart of the Index First Key: it determines the end of the index scan range. Extraction rule: starting from the first key column of the index, check whether the column appears in the where conditions. If it does and the condition is = or <=, add the condition to the Index Last Key, move on to the next key column of the index, and apply the same rule. If it does and the condition is <, add the condition to the Index Last Key and stop extraction. If the column does not appear, stop extraction as well.
Applying this rule to the SQL statement above, the extracted Index Last Key is (b < 8). Because the operator is <, extraction stops right after b.
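The two Index Key rules are symmetric, so they can be sketched with one helper. This is a simplified illustration under stated assumptions: the where clause is modelled as a list of (column, operator, value) triples joined by AND, at most one lower-bound and one upper-bound condition exists per column, and the names `extract_key`, `index_first_key`, and `index_last_key` are hypothetical.

```python
INDEX_COLUMNS = ["b", "c", "d"]  # column order of idx_t1_bcd

# where clause of the example query as (column, operator, value) triples
WHERE = [("b", ">=", 2), ("b", "<", 8), ("c", ">", 1), ("d", "!=", 4), ("e", "!=", "a")]

def extract_key(where, index_columns, continue_ops, stop_op):
    """Walk the index columns left to right and collect range-bounding conditions.

    continue_ops: operators that are added and let extraction move to the next column.
    stop_op: operator that is added and then ends extraction.
    """
    key = []
    for col in index_columns:
        conds = [cond for cond in where if cond[0] == col]
        continuing = [cond for cond in conds if cond[1] in continue_ops]
        stopping = [cond for cond in conds if cond[1] == stop_op]
        if continuing:
            key.append(continuing[0])
            continue
        if stopping:
            key.append(stopping[0])
        break  # strict bound, or no usable condition on this column
    return key

def index_first_key(where, index_columns):
    return extract_key(where, index_columns, {"=", ">="}, ">")

def index_last_key(where, index_columns):
    return extract_key(where, index_columns, {"=", "<="}, "<")

first_key = index_first_key(WHERE, INDEX_COLUMNS)
last_key = index_last_key(WHERE, INDEX_COLUMNS)
print("Index First Key:", first_key)
print("Index Last Key:", last_key)
```

For the example query this yields (b >= 2, c > 1) as the First Key and (b < 8) as the Last Key, matching the results derived by hand above.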
2. Index Filter
After the Index Key is extracted, the index scan range is fixed by the where conditions, but not every entry in this range satisfies the query conditions. In the SQL example above, (3, 1, 1) and (6, 4, 4) are both in the range, but neither satisfies the query conditions.
Index Filter extraction rule: starting from the first column of the index, check whether the column appears in the where conditions. If it appears and the only condition on it is =, skip the column and check the next index column, applying the same rule as for the first column. If the conditions on it are some of >=, >, <, <=, skip the column and add all remaining where conditions on index columns to the Index Filter. If the conditions on it include an operator other than =, >=, >, <, <=, add that condition and all remaining where conditions on index columns to the Index Filter. If the column has no query condition, add all conditions on index columns to the Index Filter.
For the SQL example above, the first index column has only >= and < conditions, so it is skipped and the remaining conditions on c and d are added to the Index Filter. The resulting Index Filter is c > 1 and d != 4.
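The Index Filter rule can be sketched the same way. As before, this is an illustrative model rather than any database's actual code: conditions are (column, operator, value) triples joined by AND, and `extract_index_filter` is a hypothetical name.

```python
INDEX_COLUMNS = ["b", "c", "d"]  # column order of idx_t1_bcd
KEY_OPS = {"=", ">=", ">", "<", "<="}  # operators the Index Key can absorb

WHERE = [("b", ">=", 2), ("b", "<", 8), ("c", ">", 1), ("d", "!=", 4), ("e", "!=", "a")]

def extract_index_filter(where, index_columns):
    """Collect the conditions that must be re-checked on each index entry."""
    for i, col in enumerate(index_columns):
        ops = {op for c, op, _ in where if c == col}
        if ops == {"="}:
            continue  # equality only: skip this column, apply the rule to the next
        later_columns = set(index_columns[i + 1:])
        # Conditions on later index columns always go to the filter; conditions
        # on this column go there only if the operator is not a range operator.
        return [
            cond for cond in where
            if cond[0] in later_columns
            or (cond[0] == col and cond[1] not in KEY_OPS)
        ]
    return []  # every index column had an equality condition

index_filter = extract_index_filter(WHERE, INDEX_COLUMNS)
print("Index Filter:", index_filter)
```

For the example query, column b has only range conditions, so it is skipped and the conditions on c and d form the Index Filter.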
3. Table Filter
The Table Filter is the simplest to extract. Extraction rule: every query condition on a column that is not part of the index is classified as Table Filter.
For the SQL example above, the Table Filter is e != 'a'.
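In the same simplified model, where the where clause is a list of (column, operator, value) triples joined by AND, the Table Filter rule is a one-line selection. The name `extract_table_filter` is hypothetical.

```python
INDEX_COLUMNS = ["b", "c", "d"]  # column order of idx_t1_bcd

WHERE = [("b", ">=", 2), ("b", "<", 8), ("c", ">", 1), ("d", "!=", 4), ("e", "!=", "a")]

def extract_table_filter(where, index_columns):
    """Conditions on columns the index does not contain need the full record."""
    return [cond for cond in where if cond[0] not in index_columns]

table_filter = extract_table_filter(WHERE, INDEX_COLUMNS)
print("Table Filter:", table_filter)
```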
3.1 Index Key/Index Filter/Table Filter Summary
Using the extraction rules above, the where conditions of an SQL statement are ultimately split into Index Key (First Key & Last Key), Index Filter, and Table Filter.
The Index First Key is only used to locate the start of the index range. It is therefore used only for the initial search path in the index (traversing from the root of the index B+ tree down to the correct leaf position) and is evaluated only once;
The Index Last Key is used to locate the end of the index range. Therefore, every index entry read after the start position must be checked against the Index Last Key. Once an entry falls outside the range, the query ends;
The Index Filter is used to filter out entries within the index range that do not satisfy the query conditions. Every entry in the range must therefore be compared against the Index Filter. If it does not match, it is discarded immediately and the next index entry is read;
The Table Filter is the last line of defense of the where conditions. It filters the records that passed all the index-level checks. At this point the entry lies within the range defined by the Index First Key and Index Last Key and satisfies the Index Filter, so the database goes back to the table, reads the full record, and checks whether it satisfies the Table Filter. If not, the record is skipped and the next index entry is read. If so, the record satisfies every where condition and can be returned to the client.
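The four roles can be put together in a small simulation of the example query. This is a toy model under stated assumptions, not a real execution engine: a sorted Python list stands in for the B+ tree leaf level, `bisect` stands in for the root-to-leaf descent, list positions stand in for heap locations, and the row values are the test rows as assumed in this article.

```python
import bisect

# Rows of t1 as (a, b, c, d, e), in heap order (assumed test data).
ROWS = [
    (4, 3, 1, 1, "d"), (1, 1, 1, 1, "a"), (8, 8, 8, 8, "h"), (2, 2, 2, 2, "b"),
    (5, 2, 3, 5, "e"), (3, 3, 2, 2, "c"), (7, 4, 5, 5, "g"), (6, 6, 4, 4, "f"),
]
# idx_t1_bcd: ((b, c, d), heap_slot) entries sorted by key.
INDEX = sorted(((b, c, d), slot) for slot, (a, b, c, d, e) in enumerate(ROWS))
KEYS = [key for key, _ in INDEX]

def execute():
    """Run: select a from t1 where b >= 2 and b < 8 and c > 1 and d != 4 and e != 'a'."""
    result = []
    # Index First Key (b >= 2, c > 1): used once, to find the start position,
    # i.e. the first entry whose (b, c) prefix is greater than (2, 1).
    start = bisect.bisect_right(KEYS, (2, 1, float("inf")))
    for (b, c, d), slot in INDEX[start:]:
        # Index Last Key (b < 8): checked on every entry; ends the scan.
        if not b < 8:
            break
        # Index Filter (c > 1, d != 4): checked on the index entry itself.
        if not (c > 1 and d != 4):
            continue
        # Table Filter (e != 'a'): requires fetching the full record.
        a, _, _, _, e = ROWS[slot]
        if e != "a":
            result.append(a)
    return result

result = execute()
print(result)  # [2, 5, 3, 7] with the rows above, in index order
```

Note that the First Key appears only once, before the loop, while the Last Key, Index Filter, and Table Filter are evaluated per entry, in that order.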
4. Conclusion
Having read and understood the above, you should now have a solid understanding of how a database extracts the query conditions in a where clause and splits them into Index Key, Index Filter, and Table Filter. From now on, when writing SQL statements, you can take the table definition, try to extract the where conditions yourself, compare the result with the actual execution plan, and gradually deepen your understanding.
We can now also answer the question raised at the beginning of the article: with Index Condition Pushdown, introduced in MySQL 5.6, what exactly is pushed down to the index level for filtering? The answer is the Index Filter. Before MySQL 5.6, Index Filter and Table Filter were not distinguished: every index entry in the range between the Index First Key and the Index Last Key was looked up in the table, and the full record was returned to the MySQL Server layer for filtering. From MySQL 5.6 on, Index Filter and Table Filter are separated, and the Index Filter is pushed down to the InnoDB index layer. This reduces the cost of table lookups and of passing records to the MySQL Server layer, and improves SQL execution efficiency.
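The saving from pushing the Index Filter down can be illustrated by counting table lookups for the example query with and without it. This is a toy model, not MySQL's implementation: a sorted list stands in for the index, list positions stand in for table locations, the `pushdown` flag and the function name `table_lookups` are hypothetical, and the row values are the test rows as assumed in this article.

```python
# Rows of t1 as (a, b, c, d, e), in heap order (assumed test data).
ROWS = [
    (4, 3, 1, 1, "d"), (1, 1, 1, 1, "a"), (8, 8, 8, 8, "h"), (2, 2, 2, 2, "b"),
    (5, 2, 3, 5, "e"), (3, 3, 2, 2, "c"), (7, 4, 5, 5, "g"), (6, 6, 4, 4, "f"),
]
INDEX = sorted(((b, c, d), slot) for slot, (a, b, c, d, e) in enumerate(ROWS))

def table_lookups(pushdown):
    """Return (number of full-record fetches, matching values of a)."""
    lookups, matched = 0, []
    for (b, c, d), slot in INDEX:
        if (b, c) <= (2, 1):      # before the Index First Key position
            continue
        if not b < 8:             # Index Last Key ends the scan
            break
        if pushdown and not (c > 1 and d != 4):
            continue              # Index Filter applied on the index entry
        lookups += 1              # fetch the full record from the table
        a, _, row_c, row_d, e = ROWS[slot]
        # Remaining conditions are checked on the full record (server layer).
        if row_c > 1 and row_d != 4 and e != "a":
            matched.append(a)
    return lookups, matched

without_icp = table_lookups(pushdown=False)
with_icp = table_lookups(pushdown=True)
print("without pushdown:", without_icp)
print("with pushdown:   ", with_icp)
```

Both variants return the same rows, but on this data the version without pushdown fetches six full records while the version with pushdown fetches four, because the entries (3, 1, 1) and (6, 4, 4) are rejected before any table access.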