MySQL SQL optimization

Source: Internet
Author: User

Objective
Some readers felt that earlier articles were too theoretical and short on practical detail, so this article focuses on more hands-on content.
Note: This article uses MySQL as its background, but much of it also applies to other relational databases. Some knowledge of indexes is assumed.

Optimization goals
1. Reduce the number of IO operations
IO is always the most likely bottleneck in a database, a consequence of what a database is responsible for. In most database operations, more than 90% of the time is spent on IO, so reducing the number of IO operations is the first priority in SQL optimization and also the most visibly effective technique.

2. Reduce CPU computation
Besides the IO bottleneck, CPU usage is the next thing to consider in SQL optimization. ORDER BY, GROUP BY, DISTINCT and similar operations are heavy CPU consumers (they are essentially in-memory comparisons carried out by the CPU). Once IO optimization has reached a certain stage, reducing CPU computation becomes an important goal of SQL optimization.

Optimization method
Change the SQL execution plan
Once the goals are clear, we need a way to reach them. For a SQL statement there is really only one way to achieve both goals: change its execution plan so that it takes as few detours as possible and finds the data we need through whatever shortcuts are available, thereby reducing both the number of IO operations and the CPU computation.

Basic principles
1. Join as little as possible
MySQL's strength is its simplicity, which in some respects is also its weakness. The MySQL optimizer is efficient, but because it works with a limited amount of statistics, it is more likely to choose a poor plan. For complex multi-table joins, both the optimizer's limitations and the limited effort invested in join processing mean that performance still lags behind Oracle and other long-established relational databases. For simple single-table queries, however, the gap is very small, and in some scenarios MySQL even outperforms them.

2. Sort as little as possible
Sorting consumes a lot of CPU, so when the cache hit ratio is high and IO capacity is sufficient, reducing sorting can noticeably improve SQL response time.
In MySQL there are several ways to reduce sorting, for example:
Use an index to satisfy the ordering, as described under Common pitfalls
Reduce the number of rows that take part in the sort
Do not sort data that does not need to be sorted
...

3. Try to avoid SELECT *
Many people find this hard to accept: doesn't the Common pitfalls section say that the number of fields in the SELECT clause does not affect the amount of data read?
Yes, most of the time it does not affect the IO volume, but when the query has an ORDER BY, the number of fields in the SELECT clause strongly affects sorting efficiency.

In addition, that pitfall only says that the IO volume is unaffected most of the time. When the query can be answered entirely from an index, selecting only the indexed fields greatly reduces the IO volume.

4. Prefer joins to subqueries
Although join performance is not great, joins still have a significant performance advantage over MySQL subqueries. The execution plans MySQL produces for subqueries have long been a serious problem. The problem has existed for many years and is present in all released stable versions, with little improvement. The MySQL developers acknowledged it early on and promised to fix it as soon as possible, but at least so far we have not seen a version that solves it well.

5. Use OR as little as possible
When multiple conditions in the WHERE clause are combined with OR, the MySQL optimizer does not optimize the execution plan well. Combined with MySQL's layered architecture, which separates the SQL layer from the storage layer, this results in poor performance. Replacing OR with UNION ALL (or UNION, when necessary) often works better.
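The rewrite described above can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, and the users table, its columns, and its data are hypothetical. It shows that the two forms return the same rows, not how MySQL plans them.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
# sqlite3 is a stand-in here; the article itself is about MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        city TEXT    NOT NULL,
        age  INTEGER NOT NULL
    );
    CREATE INDEX idx_city ON users (city);
    CREATE INDEX idx_age  ON users (age);
    INSERT INTO users (id, city, age) VALUES
        (1, 'Hangzhou', 30),
        (2, 'Beijing',  25),
        (3, 'Hangzhou', 25),
        (4, 'Shanghai', 40);
""")

# Original form: two conditions on different columns joined by OR.
or_query = "SELECT id FROM users WHERE city = 'Hangzhou' OR age = 25"

# Rewritten form: each branch filters on a single indexed column.
# The second branch excludes rows the first branch already returned,
# so UNION ALL does not produce duplicates (both columns are NOT NULL).
union_query = """
    SELECT id FROM users WHERE city = 'Hangzhou'
    UNION ALL
    SELECT id FROM users WHERE age = 25 AND city <> 'Hangzhou'
"""

or_ids = sorted(row[0] for row in conn.execute(or_query))
union_ids = sorted(row[0] for row in conn.execute(union_query))
print(or_ids, union_ids)
```

If the branches cannot be made mutually exclusive this way, plain UNION is the safer choice, at the cost of the duplicate-removal step.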

6. Prefer UNION ALL to UNION
The difference between UNION and UNION ALL is that UNION must merge the two (or more) result sets and then remove duplicates, which involves sorting, adds a lot of CPU work, and increases resource consumption and latency. So when we know the result sets cannot contain duplicates, or we do not care about duplicates, use UNION ALL instead of UNION.
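A small sketch of the difference in results, again using Python's sqlite3 module as a stand-in for MySQL. The two orders tables and their contents are hypothetical.

```python
import sqlite3

# Hypothetical tables; sqlite3 stands in for MySQL for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_2023 (customer TEXT);
    CREATE TABLE orders_2024 (customer TEXT);
    INSERT INTO orders_2023 VALUES ('alice'), ('bob');
    INSERT INTO orders_2024 VALUES ('bob'), ('carol');
""")

# UNION ALL simply concatenates the two result sets.
union_all_rows = conn.execute("""
    SELECT customer FROM orders_2023
    UNION ALL
    SELECT customer FROM orders_2024
""").fetchall()

# UNION additionally removes duplicate rows, which costs extra work.
union_rows = conn.execute("""
    SELECT customer FROM orders_2023
    UNION
    SELECT customer FROM orders_2024
""").fetchall()

print(len(union_all_rows), len(union_rows))
```

Here 'bob' appears in both tables, so UNION ALL keeps both copies while UNION keeps one.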

7. Filter as early as possible
This strategy is most often seen in index design: the fields that filter best are placed first.
The same principle can be applied when writing SQL to optimize joins. For example, when paging through data from several tables, it is best to filter and page the data on a single table first, and then join the resulting page to the other tables. This cuts unnecessary IO as much as possible and greatly reduces the time spent on IO.
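The paging example above might look like the following sketch. It runs on Python's sqlite3 module as a stand-in for MySQL, and the posts and authors tables are hypothetical. The two queries return the same page only because every post has exactly one matching author, so the join neither drops nor duplicates rows.

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for MySQL for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id        INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL,
        title     TEXT    NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO authors (id, name) VALUES (?, ?)",
    [(i, f"author{i}") for i in range(1, 6)],
)
conn.executemany(
    "INSERT INTO posts (id, author_id, title) VALUES (?, ?, ?)",
    [(i, (i % 5) + 1, f"post{i}") for i in range(1, 101)],
)

# Join first, then page: the join is written against the whole posts table.
join_then_page = """
    SELECT p.id, p.title, a.name
    FROM posts p
    JOIN authors a ON a.id = p.author_id
    ORDER BY p.id
    LIMIT 10 OFFSET 20
"""

# Page first on the single table, then join only the 10 rows of that page.
page_then_join = """
    SELECT p.id, p.title, a.name
    FROM (
        SELECT id, author_id, title
        FROM posts
        ORDER BY id
        LIMIT 10 OFFSET 20
    ) p
    JOIN authors a ON a.id = p.author_id
    ORDER BY p.id
"""

rows_join_first = conn.execute(join_then_page).fetchall()
rows_page_first = conn.execute(page_then_join).fetchall()
print(rows_join_first == rows_page_first)
```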

8. Avoid type conversions
"Type conversion" here means the conversion that occurs when the type of a column in the WHERE clause does not match the type of the parameter passed in:
Manual conversion of column_name through a conversion function
This directly prevents MySQL (and, in fact, other databases too) from using the index. If a conversion is unavoidable, apply it to the parameter being passed in, not to the column.
Conversion by the database itself
If the type of the data passed in does not match the type of the field and we do no conversion ourselves, MySQL may either convert the data itself or hand it unconverted to the storage engine. Either way, the index may become unusable, which leads to a poor execution plan.
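The first case, converting the parameter instead of the column, can be sketched as below. This uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in, so the plan text is SQLite's, not MySQL's EXPLAIN output. The orders table, its index, and the plan helper are hypothetical.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical schema; sqlite3 stands in for MySQL for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id         INTEGER PRIMARY KEY,
        created_at TEXT    NOT NULL,   -- stored as 'YYYY-MM-DD HH:MM:SS'
        amount     INTEGER NOT NULL
    );
    CREATE INDEX idx_created_at ON orders (created_at);
""")

def plan(sql, params=()):
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

day = date(2024, 1, 15)

# Conversion applied to the column: the index on created_at cannot be
# used to look up matching rows, so every row has to be examined.
on_column = plan(
    "SELECT amount FROM orders WHERE date(created_at) = ?",
    (day.isoformat(),),
)

# Conversion applied to the parameter: the day is turned into a range
# in application code, and the bare column is compared against it.
start = day.isoformat() + " 00:00:00"
end = (day + timedelta(days=1)).isoformat() + " 00:00:00"
on_param = plan(
    "SELECT amount FROM orders WHERE created_at >= ? AND created_at < ?",
    (start, end),
)

print(on_column)
print(on_param)
```

In SQLite the first plan is a full scan and the second is an index search. The same principle is what the text above describes for MySQL, although the exact plan output there looks different.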

9. Optimize high-concurrency SQL first, rather than a few "big" statements that run infrequently
High-concurrency SQL is always more destructive than low-frequency SQL: once a high-concurrency statement goes wrong, the system can be overwhelmed before we have any chance to react. A statement that consumes a lot of IO and responds slowly but runs infrequently will, at worst, slow the whole system down. The system can still hold out for a while, which gives us some time to respond.

10. Optimize from a global perspective, not piecemeal
SQL optimization cannot be done one statement at a time in isolation. It has to take all of the SQL in the system into account. Especially when an execution plan is tuned by adjusting indexes, do not fix one statement at the expense of others and end up penny wise and pound foolish.

11. Run EXPLAIN on every SQL statement that runs in the database whenever possible
To optimize a SQL statement, you need to know its execution plan before you can judge whether there is room for optimization or whether the plan has a problem. After a period of optimizing the SQL that runs in a database, obviously problematic statements become rare, and most remaining problems have to be dug out. At that point a large number of EXPLAIN operations are needed to collect execution plans and decide whether each one needs optimizing.

Common pitfalls
1. COUNT(1) and COUNT(primary_key) are better than COUNT(*)
Many people count records with COUNT(1) or COUNT(primary_key) instead of COUNT(*) because they believe it performs better. This is a misconception. In some scenarios it may even perform worse, because databases apply special optimizations to COUNT(*).

2. COUNT(column) and COUNT(*) are the same
This misconception is common even among senior engineers and DBAs, many of whom take it for granted. In fact, COUNT(column) and COUNT(*) are entirely different operations with entirely different meanings.
COUNT(column) returns the number of records in the result set whose column field is not NULL
COUNT(*) returns the number of records in the entire result set
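The difference is easy to see on a column that contains NULLs. The sketch below uses Python's sqlite3 module as a stand-in for MySQL, with a hypothetical members table. The COUNT semantics shown are standard SQL.

```python
import sqlite3

# Hypothetical table; sqlite3 stands in for MySQL for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (id INTEGER PRIMARY KEY, email TEXT);
    INSERT INTO members (id, email) VALUES
        (1, 'a@example.com'),
        (2, NULL),
        (3, 'c@example.com'),
        (4, NULL);
""")

# COUNT(*) counts rows; COUNT(email) counts rows where email is not NULL.
count_all, count_email = conn.execute(
    "SELECT COUNT(*), COUNT(email) FROM members"
).fetchone()

print(count_all, count_email)
```

With two of the four rows holding NULL in email, the two counts differ.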

3. SELECT a, b FROM ... lets the database access less data than SELECT a, b, c FROM ...
This misconception is widespread among developers, mainly because they are not familiar with how databases store data.
In fact, most relational databases store data by row, and data is accessed in fixed-size IO units (called blocks or pages), typically 4 KB, 8 KB ... Most of the time each IO unit holds several rows, and each row stores all of its fields (except for special types such as LOBs).
So whether we fetch one field or several, the amount of data the database has to access in the table is actually the same.
There are exceptions, of course, namely when the query can be answered entirely from an index. If only a and b are selected and both are in the index, there is no need to go back to the table. Field c is not in the index, so selecting it requires going back to the table to fetch its data. In that case the IO volume of the two queries differs considerably.
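The exception can be observed in a query plan. The sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in, so the wording of the plan is SQLite's rather than MySQL's. The table t, the index idx_a_b, and the plan helper are hypothetical.

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for MySQL for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c TEXT);
    CREATE INDEX idx_a_b ON t (a, b);
""")

def plan(sql):
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# a and b are both in the index, so the table itself is never read.
covered = plan("SELECT a, b FROM t WHERE a = 1")

# c is not in the index, so each match requires a lookup in the table.
not_covered = plan("SELECT a, b, c FROM t WHERE a = 1")

print(covered)
print(not_covered)
```

SQLite reports the first query as using a covering index and the second as using the index plus table lookups.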

4. ORDER BY always involves a sort operation
Index data is stored in order. If the data we need is in the same order as an index, and the query is executed through that index, the database generally skips the sort and returns the data directly, because it knows the data already satisfies the required order.
Using indexes to optimize SQL that requires ordered results is, in fact, a very important optimization technique.
Read more: the three articles "MySQL ORDER BY implementation analysis", "MySQL GROUP BY basic implementation principles" and "MySQL DISTINCT basic implementation principles" analyze this in more depth, especially the first.
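As a rough illustration of a sort being skipped, the sketch below compares two query plans. It uses Python's sqlite3 module as a stand-in, where an explicit sort step appears in the plan as a temporary B-tree. MySQL reports this differently. The events table, its index, and the plan helper are hypothetical.

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for MySQL for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (
        id          INTEGER PRIMARY KEY,
        happened_at TEXT NOT NULL,
        label       TEXT NOT NULL
    );
    CREATE INDEX idx_happened_at ON events (happened_at);
""")

def plan(sql):
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# Rows are read through idx_happened_at, which is already in the
# requested order, so no separate sort step is needed.
indexed_order = plan(
    "SELECT label FROM events "
    "WHERE happened_at >= '2024-01-01' ORDER BY happened_at"
)

# label has no index, so the rows must be sorted after they are read.
unindexed_order = plan(
    "SELECT label FROM events "
    "WHERE happened_at >= '2024-01-01' ORDER BY label"
)

print(indexed_order)
print(unindexed_order)
```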

5. "Using filesort" in the execution plan means a sort on disk files
This misconception is not really our fault. It comes from the wording chosen by the MySQL developers. "Using filesort" is a message we may see in the "Extra" column when we view a SQL statement's execution plan with the EXPLAIN command.
In fact, "Using filesort" is shown whenever a SQL statement requires a sort operation. It does not mean that a file-based sort takes place.

Transferred from: http://blog.csdn.net/beckren/article/details/48424765
