First, query optimizer module
The task of the query optimizer is to find the best solution for executing SQL queries. Most query optimizers are either rule-based or cost-based.
Most query optimizers, including
MySQL's query optimizer, always more or less search for the best solution among all possible query evaluation solutions.
The
MySQL Query Optimizer in MySQL is the core of the optimizer. When MySQL data gets a Query statement, it will be handed over to the Query Optimizer to analyze and generate an optimal execution plan (this is considered by the Optimizer to be optimal, but not necessarily The truly optimal is the same as the Oracle database will estimate the wrong rows).
Then the database executes the query statement according to the execution plan.
In the entire execution of SQL statements, Optimizer is the most time-consuming, but there are also third-party tools that bypass MySQL's Query Optimizer module in order to improve performance, such as handlersocket.
For multi-table associative queries, the number of possible solutions that the MySQL optimizer can query increases exponentially with the number of tables referenced in the query. For small numbers of tables, this is not a problem.
However, when the submitted query requires a large result set, the time spent on query optimization can easily become the bottleneck of server performance.
A more flexible solution for query optimization allows the user to control the optimizer to search for the best query evaluation solution in detail. The general idea is that the fewer options are investigated, the less time it takes to compile a query.
In addition, because the optimizer skips some solutions, it may miss an optimal solution. The optimizer's behavior regarding the evaluation of the number of solutions can be controlled by two system variables:
The optimizer_prune_level variable tells the optimizer to skip some schemes based on estimates of the number of rows accessed in each table. Our experiments show that this type of "educated guess" rarely misses the best solution, and can greatly reduce the number of query edits.
This is why the option is on (optimizer_prune_level=1) by default.
However, if you think that the optimizer missed a better query plan, this option can be turned off (optimizer_prune_level=0). The risk is that query editing takes longer.
Please note that even with this heuristic, the optimizer can still detect an exponential number of solutions.
The timizer_search_depth variable tells the optimizer how deep it should look for each unfinished "future" solution to assess whether it should be expanded further.
A smaller optimizer_search_depth value will greatly reduce the number of query edits.
For example, if optimizer_search_depth is close to the number of tables in the query, queries on 12, 13 or more tables are likely to take hours or even days to compile.
At the same time, if you edit with optimizer_search_depth equal to 3 or 4, the compiler compilation time can be less than 1 minute for the same query.
If a reasonable optimizer_search_depth value cannot be determined, this variable can be set to 0 to tell the optimizer to automatically determine this value.
Second, the basic idea of query optimization
No matter whether you are doing project design or product design, you need to have ideas before you can avoid some problems.
Of course, MySQL query optimization also requires R&D or the DBA to have some ideas. Only ideas to guide writing will be more reasonable.
1. Optimize Query statements that need to be optimized
Query statements with high concurrency should be optimized to avoid high concurrency. The application is stuck due to SQL, such as a large amount of waiting for php-fpm, and a highly concurrent Query statement, if the execution plan is wrong, originally only need to scan a few hundred As a result, scanning several million rows may have catastrophic consequences and even cause business lag, especially high-concurrency Query statements in the core business.
2. View the execution plan and adjust the Query statement
According to explain extended SQL analysis of the query statement, you can view the execution plan. At this time, you need to pay attention to some elements in the execution plan:
id: serialization of the query
select type
depent subquery: indicates that the query is the first Select in the subquery, depending on the result set of the external query
PRIMARY: The outermost query of the subquery, note that it is not the primary key query
simple: queries other than subqueries or UNION
table: The name of the access data table. The person who writes SQL needs to know whether this table is a core table, whether it is a large data scale, etc.
type scanning method
all: full table scan
const: Read constants, and at most only one record matches. Because it is a constant, you only need to read it once
index: full index scan
eq_ref: There is only one matching result at most, accessed through the primary key and unique index
range: index range scan
possible_keys: What indexes can be used by the query
key: The optimizer module chooses which index to use. If there is an index, it may not be used. Only by looking at the execution plan can we know which index is used.
key_len: index length
rows: number of rows returned
extra: additional information, such as using filesort---> indicates that the sorting algorithm is used
filtered: The column gives a percentage value. This percentage value and the value of the rows column can be used together to estimate the number of rows that will be connected to the previous table in QEP. The previous table refers to the table whose id column value is smaller than the id of the current table. This column only appears in the EXPLAIN EXTENDED statement.
3. Learn to view performance loss (cpu consumption, io consumption)
When you find a slow Query statement, you need to locate which is slow, CPU or IO, etc.:
mysql>set profiling=1;mysql>show profiles;mysql>show profile cpu,block io for query n
Third, the basic principles of inquiry
1. Always use small result sets to drive large result sets
When doing a join query, the driving table must be a table with fewer records after the conditions are limited.
MySQL join has only one algorithm, nested loop, which is the for loop in the program. It is implemented through nested loops. The larger the driven result set, the more loops and the more the driven table is accessed. Reduce IO while reducing CPU.
2. Query only the required columns
Only query the required columns can reduce IO, and the columns are also related to the sorting algorithm.
3. Use only the most effective filter conditions
The premise is to use a condition to query the result. Use b to query the result. A and b both use the query to get the result. The results are the same for these three times.
Whether to use a condition or b condition, or both conditions is limited, only depends on the execution plan.
4. Try to avoid complicated joins and subqueries
5. Try to sort and query on the index column
Sorting on the index column: The index column is sorted, and no additional sorting algorithm is required to reduce CPU consumption.
Query on the index column: reduced IO consumption
Create an index, the optimizer module may not be used, but you can add force index to the SQL (force that index)
Fourth. Index pros and cons and index classification
Everything has pros and cons. The appearance of one thing, for example, will be good or bad in different scenarios, it depends on how to weigh it.
benefit:
Querying data through index columns can improve the efficiency of data retrieval and reduce the IO cost of the database.
Sorting data by index column reduces the cost of data sorting and reduces CPU consumption.
harm:
Assuming that table a has column column ca, create index indxaca for it:
Every time you update ca, you will adjust the index information after the key value changes brought about by the update, which will increase IO loss. The index column also takes up space. As the data in column a increases, the space occupied by the indxaca index is also Will continue to grow. Therefore, the index will also bring about the consumption of storage space resources.