Optimization objectives
Reduce IO times
Io is always the easiest place to bottleneck the database, which is determined by the responsibility of the database, most of the database operations over 90% of the time is occupied by IO operations, reducing IO times is the first priority in SQL optimization, of course, is the most obvious optimization method.
Reduce CPU Computing
In addition to IO bottlenecks, SQL optimization needs to be considered in the optimization of CPU operations. Order BY, group By,distinct ... are large CPU-consuming (these operations are essentially CPU-processed data comparison operations in memory). When our IO optimization is at a certain stage, reducing CPU computing becomes an important goal of our SQL optimization
Optimization method
Change the SQL execution plan
After defining the goal of optimization, we need to identify ways to achieve our goals. For SQL statements, there is only one way to achieve the above 2 goals, that is, to change the execution plan of SQL, let him try "less detours", as far as possible through a variety of "shortcuts" to find the data we need to achieve "reduce IO times" and "reduce CPU calculation" goal
Common Mistakes
Count (1) and COUNT (Primary_key) are better than count (*)
Many people use COUNT (1) and COUNT (Primary_key) instead of Count (*) in order to count the number of records, and they think this is a better performance, but it's a myth. For some scenarios, this is a much less likely possibility, and some special optimizations should be made for the count (*) counting operation of the database.
Count (column) and COUNT (*) are the same
This myth is even prevalent in many senior engineers or DBAs, and many people will take it for granted. In fact, Count (column) and COUNT (*) are completely different operations and represent a completely different meaning.
Count (column) is a record that indicates how many column fields in the result set are not empty
COUNT (*) is the number of records that represent the entire result set
Select A,b from ... than select A,b,c from ... Allows the database to access less data
This myth is mainly in a large number of developers, mainly because of the storage principle of the database is not too understanding.
In fact, most relational databases are stored in rows (row), and data access operations are in a fixed sized IO unit (called block or page), typically 4kb,8kb ... Most of the time, multiple rows are stored in each IO cell, and each row is all the fields that store the row (except for special type fields such as LOBs).
So, whether we take a field or multiple fields, the amount of data that the database needs to access in the table is actually the same.
Of course, there are exceptions, and that is, our query can be done in the index, which means that when we take only two fields, we do not need to return the table, and the C field is not used in the index, the a,b need to return to the table to obtain its data. In this case, the IO volume of the two will vary considerably.
Ordering by must require sort operations
We know that the index data is actually orderly, if the data we need is the same as the order of an index, and our query is executed by this index, then the database will normally omit the sort operation and return the data directly, because the database knows that the data already satisfies our sorting requirements.
In fact, using indexes to optimize SQL with sorted requirements is a very important optimization method.
Extended reading: MySQL order by implementation analysis, MySQL in the GROUP by the basic principle of implementation and MySQL DISTINCT Basic principles of implementation of these 3 articles have a more in-depth analysis, especially the first
Disk files are sorted by Filesort in the execution plan
There is no reason to blame us for this myth, but for MySQL developers in terms of the problem. Filesort is the information that we might see in the "Extra" column when we use the explain command to view a SQL execution plan.
In fact, a "Using filesort" is displayed whenever an SQL statement needs to be sorted, which does not mean that there will be a file sort operation.
Extended reading: Understanding the Filesort in the MySQL Explain command output, I have a more detailed description here
Basic principles
As little as possible join
The advantage of MySQL is simplicity, but in some ways it is a disadvantage. The MySQL optimizer is highly efficient, but because of its limited statistics, the optimizer's work process is more likely to deviate. For complex multiple table joins, on the one hand because of its optimizer limited, and also in the Join this aspect of the effort is not enough, so performance from Oracle and other relational database predecessors still have a certain distance. However, if it is a simple single table query, this gap will be very small even in some scenarios to better than these database predecessors.
Sort as little as possible
The sort operation consumes more CPU resources, so reducing the sort can significantly affect the response time of the SQL in scenarios where the cache hit ratio is high IO capability is sufficient.
For MySQL, there are several ways to reduce sorting, such as:
The above myths mentioned in the use of the index to sort by the way to optimize
Reduce the number of record bars involved in sorting
It's not necessary to sort the data
...
Try to avoid a select *
Many people see this point after find it more difficult to understand, it is not in the misunderstanding that the number of fields in the SELECT clause does not affect the read data?
Yes, most of the time it doesn't affect the IO amount, but when we still have an order by operation, the number of fields in the SELECT clause affects our sorting efficiency to a large extent, which is more detailed in my previous article about the implementation analysis of the MySQL orders by. A fine introduction.
In addition, the above misunderstanding is not also said, but most of the time will not affect IO, when our query results only need to be found in the index, or will greatly reduce the IO volume.
Try to use a join instead of a subquery
While the Join performance is not good, it has a very high performance advantage over the MySQL subquery. MySQL's subquery execution plan has always been a big problem, and although it has been around for years, it has been widespread in all stable versions that have been released and has not improved much. Although the authorities have acknowledged the issue very early and promised to resolve it as soon as possible, at least we haven't seen a better solution to the problem so far.
As little as possible or
When there are multiple conditions in the WHERE clause to "or" coexist, the MySQL optimizer does not solve its execution plan optimization problem, coupled with MySQL-specific SQL and Storage layered architecture, resulting in low performance, many times using union ALL or U The nion (the necessary) way to replace "or" will get better results.
Try to use UNION all instead of union.
The difference between Union and union all is that the former need to merge two (or more) result sets and then perform a unique filtering operation, which involves sorting, adding a lot of CPU operations, increasing resource consumption and latency. So when we can confirm that there is no way to duplicate the result set or do not care about repeating the result set, try to use union all instead of union.
Filter as early as possible
This optimization strategy is most commonly found in the optimal design of the index (putting more filter-better fields forward).
This principle can also be used in SQL authoring to optimize some Join sql. For example, when we are querying for paging data in multiple tables, we had better be able to filter the data on a table first, and then use the result set of the split page with another table Join, so that as much as possible to reduce unnecessary IO operations, greatly reducing the time spent on IO operations.
Avoid type conversions
"Type conversion" here refers to a type conversion that occurs when the type of column field in the WHERE clause is inconsistent with the incoming parameter type:
The conversion function is column_name on the human
directly causing MySQL (in fact, other databases will have the same problem) cannot use the index, and if you do not want to convert, you should convert it on the parameters passed in
Conversion by the database itself
If we pass in inconsistent data types and field types, at the same time, we do not do any type conversion processing, MySQL may itself to our data conversion operations, or may not be processed by the storage engine to deal with, so that the index can not use the situation caused by the execution plan problems.
Prioritize high concurrency SQL rather than perform low frequency for certain "big" sql
For a disruptive, high concurrency SQL will always be larger than the low frequency, because the high concurrency of SQL will not even give us any respite when it comes to the problem. For some SQL, which consumes a lot of IO and is slow to respond to, because of the low frequency, even if encountered, the most is to allow the entire system to slow down, but at least it may be a while, let us have the opportunity to buffer.
Optimization from a global perspective rather than a one-sided adjustment
SQL optimization cannot be done individually for one, but should take full account of all the SQL in the system, especially when optimizing the execution plan for SQL by tuning the index.
Explain every SQL that runs in the database as much as possible
To optimize SQL, you need to be aware of the SQL execution plan to determine whether there is room for optimization to determine whether there is an execution plan problem. After a period of optimization of the SQL running in the database, it is clear that the problem SQL may have been very small, most of which need to be explored, this time requires a large number of explain operations to collect execution plans and determine whether the need for optimization.
Above we are talking about the most is the sort, column selection, join, union and other such queries are also our common statements, just these seemingly simple and most likely to cause our MySQL server performance degradation is likely to be suspended animation.