Developers basically know that our data exists in the database (most of MySQL and Oracle, because the author is better at MySQL, so the default database is MySQL), the server through the SQL statement to the query data request to the MySQL database. After the database gets the SQL statement. What has been done? Here to introduce my personal understanding, you are welcome to comment on the district criticism.Process Overview
After MySQL gets the SQL statement, the approximate flow is as follows:
1.SQL Parser: Responsible for parsing and forwarding SQL
2. Preprocessor: Validating the parsed SQL tree
3. Query optimizer: Get an execution plan
4. Query execution engine: Get data result set
5. Put the data back to the caller.
The flowchart is as follows:Dispatcher and Cache phase
First, if the caching function of the system is turned on, SQL will determine whether it is the SELECT keyword after the SQL statement enters MySQL. If it is, then the query cache to query, if in the query cache can hit the SQL statement, then directly return the query cache query statement corresponding value value (in the cache, the query statement is a hash operation, the result as a key value, the result set of the query is value).
If the cache is hit, the query speed is fairly fast. But the query cache also has its corresponding drawbacks.
First, the server consumes a lot of memory when the cache is turned on, and secondly, the cache does not work at all, and finally, in some cases, the cache is not written to the corresponding SQL statement.scenarios where the cache is not applicable:
The cache locks are much more aggressive and have insufficient support for dynamic SQL.
The cache is the table-level lock that is made when the data is updated, and all caches related to the updated content are deleted after the update is complete. Therefore, if the table writes more, the cache is a waste of performance. If the write is particularly numerous, the cache may cause MySQL to slow down instead.When the cache is not queried:
1. Query conditions have uncertain data: such as now, Current_time and so on.
2. Cache is case sensitive, such as SELECT * from Test and select* from test will not resolve to the same SQLadditional overhead for queries:
1. Before you start, check that the cache is hit.
2. When the result is output, additional data cache operations are required.
3. When writing data, MySQL sets all caches of the corresponding table to be invalidated. When the cache memory is large, it can result in a large system consumption.Parser and preprocessing for SQL
The SQL parser resolves the corresponding SQL statement to the SQL parse tree after the command is distributed. The SQL parse tree is the syntax rule and parsing query inside MySQL itself. Verify that you are using the wrong keyword, that the SQL syntax is correct, and so on. (grammar-level errors)
After the parsing is completed, the query statement preprocessor is checked and the parse tree is valid according to MySQL rules. (whether the table exists, alias is ambiguous, etc.)Query optimizer
The query optimizer obtains the execution plan and the corresponding action is performed by the query execution engine. The query optimizer, a core module of database L, is divided into two types of CBO and Rbo.
Among them, Rbo is a rule-based optimizer. (Rbo is used in earlier versions of Oracle and is now retained, but the default is CBO.) MySQL does not have a Rbo optimizer)
These rules are hard-coded in the code of the database. The RBO will act as an execution plan based on the highest-priority rule that the input SQL statement can match to. For example, there is a rule in Rbo: Use an index in the case of an index. Then all the indexed tables will go through the index when they are executed. The biggest problem with RBO is that it determines the execution plan by fixing the rules. The number and distribution of objects involved in SQL are not considered. It is possible that the chosen rule is not an optimal execution plan.
The CBO is a cost-based optimizer (based on statistics) that chooses a minimal execution path from the target's many execution paths as the execution plan. Cost means that MySQL calculates an estimate of the consumption of the SQL statement corresponding to the IO,CPU, based on the relevant statistics. The calculation process involves indexes, tables, rows and other data, the process is more complex.
1. The query optimizer uses statistics to select an execution plan for SQL.
2.mysql has no histogram of data and cannot delete statistics manually. (Oracle has)
3. There was a query optimizer on the server, but no data and index statistics were saved. Statistical information is implemented by the storage engine, and different storage engines store different statistics.
4. Statistical information is divided into indexed statistical information and table statistics.To view statistical information
Index statistical information
Show index from table or Information_schema.statistics tables
Table statistical Information
Show table status like or Information_schema.tables tablesQuery execution engine + return data to client
After getting the execution plan, according to the existing execution plan, the query execution engine, the SQL layer of MySQL, call the interface of the storage engine layer layer, obtain the corresponding result set from the MySQL storage engine, and then return to the user.
When execution is complete, the result is returned to the client, and if it is a query statement and the cache is turned on, MySQL will put the result set in the query cache at the same time. It then returns the result set that is found. If the operation is deleted or modified, the number of rows affected after the execution of the statement is returned.