Author: Sky.Jian (copyright notice)
Link: http://isky000.com/database/mysql-performance-tuning-sql
Following the article on index optimization, this is the fourth article in the MySQL database performance optimization series: MySQL Database Performance Optimization: SQL Optimization.
Some readers have said that the previous articles were too theoretical and short on hands-on detail, so this one is more practical.
Note: This article is based on MySQL, although much of it also applies to other relational databases. It assumes some knowledge of indexes.
- Optimization objectives
- Reduce the number of I/O operations
I/O is almost always the first place a database hits a bottleneck, which follows from what a database is for. In most database operations, more than 90% of the time is spent on I/O, so reducing the number of I/O operations is the first priority in SQL optimization. It is also the most effective one.
- Reduce CPU computation
Besides the I/O bottleneck, SQL optimization must also consider CPU work. order by, group by, and distinct are all heavy CPU consumers (these operations are essentially data comparisons that the CPU performs in memory). Once I/O optimization has reached a certain stage, reducing CPU computation becomes an important goal of SQL optimization.
- Optimization method
- Change the SQL execution plan
Once the objectives are clear, we need a method for reaching them. For a SQL statement there is really only one way to achieve both goals: change its execution plan so that it "avoids detours" as far as possible and finds the data we need through "shortcuts", which reduces both the number of I/O operations and the amount of CPU computation.
- Common misconceptions
- count(1) and count(primary_key) are faster than count(*)
Many people use count(1) or count(primary_key) instead of count(*) to count records because they believe it performs better. This is a misconception. In some scenarios the performance may actually be worse, because databases typically optimize the count(*) operation specifically.
- count(column) and count(*) are the same
This misconception is common even among senior engineers and DBAs, many of whom take it for granted. In fact, count(column) and count(*) are different operations with different meanings.
count(column) returns the number of rows in the result set in which column is not NULL.
count(*) returns the number of rows in the entire result set.
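The difference is easy to see on a column that contains NULLs. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL (the counting semantics shown are standard SQL); the `users` table and its data are made up for illustration.

```python
import sqlite3

def count_variants():
    """Return (count(*), count(1), count(id), count(email)) for a tiny table."""
    # Assumption: SQLite stands in for MySQL; table and data are hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO users (id, email) VALUES (?, ?)",
        [(1, "a@example.com"), (2, None), (3, "c@example.com"), (4, None)],
    )
    row = conn.execute(
        "SELECT COUNT(*), COUNT(1), COUNT(id), COUNT(email) FROM users"
    ).fetchone()
    conn.close()
    return row

print(count_variants())  # (4, 4, 4, 2): only count(email) skips the NULL rows
```

count(1) and count(id) agree with count(*) here only because neither expression can be NULL; count(email) answers a different question.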
- select a, b from ... lets the database access less data than select a, b, c from ...
This misconception is widespread among developers, mainly because they are unfamiliar with how databases store data.
In fact, most relational databases store data by row, and data access works in fixed I/O units (called blocks or pages, usually 4 KB, 8 KB, and so on). In most cases each I/O unit holds multiple rows, and each row holds all of its fields (except for special types such as LOBs).
Therefore, whether we retrieve one field or several, the amount of data the database has to access in the table is the same.
There is an exception: when the query can be completed entirely within an index. If a and b are both in the index, a query for only a and b needs no lookup back to the table. If c is not in the index, a query that also asks for c must fetch it from the table. In that case the I/O volume of the two queries differs significantly.
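The exception can be observed in an execution plan. This is a sketch only, assuming SQLite (through Python's sqlite3 module) as a stand-in for MySQL; the table `t` and index `idx_a_b` are hypothetical. SQLite marks an index-only read as "COVERING INDEX"; in MySQL the analogous signal is "Using index" in the Extra column of explain.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the plan description lines SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description

def covering_index_demo():
    # Assumption: SQLite stands in for MySQL; names and data are made up.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c TEXT)"
    )
    conn.execute("CREATE INDEX idx_a_b ON t (a, b)")
    conn.executemany(
        "INSERT INTO t (a, b, c) VALUES (?, ?, ?)",
        [(i % 10, i, "x" * 50) for i in range(1000)],
    )
    # a and b are both in the index, so no table lookup is needed.
    two_cols = query_plan(conn, "SELECT a, b FROM t WHERE a = 3")
    # c is not in the index, so each matching row must be read from the table.
    three_cols = query_plan(conn, "SELECT a, b, c FROM t WHERE a = 3")
    conn.close()
    return two_cols, three_cols

for plan in covering_index_demo():
    print(plan)
```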
- order by always requires a sort operation
Index data is stored in order. If the order we need matches the order of an index and the query is executed through that index, the database will generally skip the sort and return the data directly, because it knows the data already satisfies our ordering requirement.
In fact, using indexes to optimize SQL statements that need sorting is a very important optimization technique.
Further reading: "Implementation Analysis of MySQL ORDER BY", "Basic Implementation Principles of GROUP BY in MySQL", and "Basic Implementation Principles of MySQL DISTINCT".
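To illustrate the point about index order, here is a small sketch that checks whether a plan contains a separate sort step. It assumes SQLite (via Python's sqlite3) in place of MySQL, and the `posts` table is hypothetical. SQLite reports an explicit sort as "USE TEMP B-TREE FOR ORDER BY"; the MySQL counterpart is "Using filesort" in the Extra column.

```python
import sqlite3

def needs_sort(conn, sql):
    """True if SQLite's plan includes a separate sort step for ORDER BY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return any("USE TEMP B-TREE FOR ORDER BY" in row[-1] for row in rows)

def order_by_demo():
    # Assumption: SQLite stands in for MySQL; table and index are made up.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, created INTEGER, score INTEGER)"
    )
    conn.execute("CREATE INDEX idx_created ON posts (created)")
    # The index on created already delivers rows in the requested order.
    indexed = needs_sort(conn, "SELECT id, created FROM posts ORDER BY created")
    # No index on score, so the rows must be sorted after they are read.
    unindexed = needs_sort(conn, "SELECT id, score FROM posts ORDER BY score")
    conn.close()
    return indexed, unindexed

print(order_by_demo())  # (False, True)
```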
- filesort in the execution plan means that disk files are sorted
We are not really to blame for this misconception; the wording chosen by the MySQL developers is at fault. "Using filesort" is what appears in the "Extra" column when we use the explain command to view a SQL execution plan.
In fact, "Using filesort" is displayed whenever a SQL statement requires a sort operation. It does not mean that files are sorted on disk.
Further reading: "Understanding filesort in MySQL EXPLAIN Output", which gives a more detailed explanation.
- Basic principles
- Join as little as possible
MySQL's strength is its simplicity, which is also a weakness in some respects. The MySQL optimizer is efficient, but because it works with a limited amount of statistics, it is more likely to go astray. For complex multi-table joins, the optimizer is limited on the one hand, and on the other hand too little effort has gone into join itself, so performance still lags somewhat behind Oracle and other established relational databases. For simple single-table queries, however, the gap is very small, and in some scenarios MySQL even outperforms those older databases.
- Sort as little as possible
Sorting consumes a lot of CPU, so in scenarios where the cache hit rate is high and I/O capacity is sufficient, reducing sorting can noticeably improve SQL response time.
For MySQL, there are several ways to reduce sorting, such as:
- Use the index-based ordering described in the misconceptions above
- Reduce the number of records involved in sorting
- Do not sort data unless necessary
- Avoid select *
Many people may find this hard to accept. Didn't the misconceptions above say that the number of fields in the select clause does not affect how much data is read?
Yes, most of the time it does not affect the I/O volume. But when the query also has an order by, the number of fields in the select clause greatly affects sorting efficiency. My earlier article on the implementation of MySQL ORDER BY describes this in detail.
Besides, the misconception above only said that the I/O volume is unaffected in most cases, not in all of them. When the query can be answered from the index alone, selecting only the needed fields greatly reduces the I/O volume.
- Use join instead of subqueries where possible
Although join performance is not MySQL's strong point, it still has a large advantage over MySQL subqueries. The execution plan for subqueries has long been a major problem in MySQL. The problem has existed for years, is present in every stable release published so far, and has not improved significantly. The MySQL team acknowledged it long ago and promised to fix it as soon as possible, but at the time of writing no released version has solved it.
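A typical rewrite turns an `IN (subquery)` filter into a join. The sketch below only shows that the two forms return the same rows; it runs on SQLite through Python's sqlite3 as a stand-in for MySQL, so it says nothing about MySQL's timings. The `customers` and `orders` tables are hypothetical.

```python
import sqlite3

SUBQUERY_SQL = """
SELECT c.id, c.name FROM customers c
WHERE c.id IN (SELECT o.customer_id FROM orders o WHERE o.amount > 100)
ORDER BY c.id
"""

# DISTINCT is needed because a customer with several matching orders
# would otherwise appear once per order.
JOIN_SQL = """
SELECT DISTINCT c.id, c.name FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.amount > 100
ORDER BY c.id
"""

def subquery_vs_join():
    # Assumption: SQLite stands in for MySQL; schema and data are made up.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
        INSERT INTO customers VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
        INSERT INTO orders VALUES (1, 1, 150), (2, 1, 300), (3, 2, 50), (4, 3, 120);
    """)
    sub_rows = conn.execute(SUBQUERY_SQL).fetchall()
    join_rows = conn.execute(JOIN_SQL).fetchall()
    conn.close()
    return sub_rows, join_rows

print(subquery_vs_join())
```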
- Use or as little as possible
When multiple conditions in the where clause are combined with "or", the MySQL optimizer does not handle the execution plan very well. Together with MySQL's distinctive layered architecture (a SQL layer on top of a storage engine layer), this makes performance relatively poor. In many cases it is more effective to replace "or" with union all, or with union when necessary.
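One way to write such a replacement is sketched below, assuming SQLite (Python's sqlite3) as a stand-in for MySQL and a hypothetical `users` table. The second branch excludes rows already returned by the first, so union all yields no duplicates; this relies on both columns being NOT NULL. When that cannot be guaranteed, plain union is the safer choice.

```python
import sqlite3

OR_SQL = """
SELECT id FROM users WHERE city = 'Hangzhou' OR age = 30
ORDER BY id
"""

# Each branch can use its own single-column index. The extra predicate in
# the second branch keeps rows matching both conditions from appearing twice.
UNION_ALL_SQL = """
SELECT id FROM users WHERE city = 'Hangzhou'
UNION ALL
SELECT id FROM users WHERE age = 30 AND city <> 'Hangzhou'
ORDER BY id
"""

def or_vs_union_all():
    # Assumption: SQLite stands in for MySQL; schema and data are made up.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            city TEXT NOT NULL,
            age INTEGER NOT NULL
        );
        CREATE INDEX idx_city ON users (city);
        CREATE INDEX idx_age ON users (age);
        INSERT INTO users VALUES
            (1, 'Hangzhou', 25), (2, 'Beijing', 30),
            (3, 'Hangzhou', 30), (4, 'Shanghai', 41);
    """)
    or_rows = conn.execute(OR_SQL).fetchall()
    union_rows = conn.execute(UNION_ALL_SQL).fetchall()
    conn.close()
    return or_rows, union_rows

print(or_vs_union_all())
```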
- Use union all instead of union whenever possible
The main difference between union and union all is that union must combine two or more result sets and then remove duplicates. This involves sorting and adds a lot of CPU work, which increases resource consumption and latency. So when we can confirm that the result sets cannot contain duplicates, or when we do not care about duplicates, we should use union all instead of union.
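The semantic difference is shown in the short sketch below, which assumes SQLite through Python's sqlite3 as a stand-in for MySQL; tables `a` and `b` are made up. It shows only which rows come back, not the cost of the duplicate-removal step.

```python
import sqlite3

def union_demo():
    # Assumption: SQLite stands in for MySQL; tables and data are made up.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (v INTEGER);
        CREATE TABLE b (v INTEGER);
        INSERT INTO a VALUES (1), (2), (3);
        INSERT INTO b VALUES (3), (4);
    """)
    # union removes the duplicate 3, which costs an extra deduplication pass.
    deduplicated = conn.execute(
        "SELECT v FROM a UNION SELECT v FROM b ORDER BY v"
    ).fetchall()
    # union all simply concatenates the two result sets.
    concatenated = conn.execute(
        "SELECT v FROM a UNION ALL SELECT v FROM b ORDER BY v"
    ).fetchall()
    conn.close()
    return deduplicated, concatenated

print(union_demo())
```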
- Filter as early as possible
This strategy is most commonly seen in index design, where the most selective fields are placed first.
The same principle can be applied when writing SQL to optimize some join statements. For example, when we query a page of data across multiple tables, it is best to filter and paginate within one table first, and then join the paginated result set with the other tables. This cuts unnecessary I/O as much as possible and greatly reduces the time spent on I/O.
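The pagination example can be sketched as follows, assuming SQLite (Python's sqlite3) as a stand-in for MySQL and hypothetical `orders` and `customers` tables. The two queries return the same page only because every order has exactly one matching customer, so the inner join neither drops nor duplicates rows.

```python
import sqlite3

# Join first, paginate afterwards: the join is evaluated for rows
# that the OFFSET then throws away.
JOIN_THEN_PAGE_SQL = """
SELECT o.id, o.total, c.name
FROM orders o JOIN customers c ON c.id = o.customer_id
ORDER BY o.id LIMIT 5 OFFSET 10
"""

# Paginate inside one table first, then join only the 5 rows of the page.
PAGE_THEN_JOIN_SQL = """
SELECT p.id, p.total, c.name
FROM (SELECT id, total, customer_id FROM orders
      ORDER BY id LIMIT 5 OFFSET 10) p
JOIN customers c ON c.id = p.customer_id
ORDER BY p.id
"""

def paging_demo():
    # Assumption: SQLite stands in for MySQL; schema and data are made up.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "ann"), (2, "bob"), (3, "cy")]
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, (i % 3) + 1, i * 10) for i in range(1, 31)],
    )
    late_paging = conn.execute(JOIN_THEN_PAGE_SQL).fetchall()
    early_paging = conn.execute(PAGE_THEN_JOIN_SQL).fetchall()
    conn.close()
    return late_paging, early_paging

print(paging_demo())
```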
- Avoid type conversion
The "type conversion" here means the conversion that occurs when the type of a column in the where clause does not match the type of the input parameter:
- Explicit conversion of column_name with a conversion function
This prevents MySQL (and, in fact, other databases as well) from using the index. If a conversion is required, it should be applied to the input parameter instead.
- Implicit conversion by the database itself
If the type of the input value does not match the field type and we do no conversion ourselves, MySQL may convert the type on its own, or it may leave the value for the storage engine to deal with. Either way the index may become unusable, which causes execution plan problems.
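The first case, a function wrapped around the column, can be reproduced in a few lines. This is a sketch under the assumption that SQLite (via Python's sqlite3) behaves like MySQL in this respect; the `events` table is hypothetical. Implicit conversion rules differ between the two databases, so that case is not shown. In SQLite's plan output, "SCAN" means every row is read and "SEARCH" means an index lookup.

```python
import sqlite3

def plan_text(conn, sql, params):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

# The function is applied to the column, so the index on created is useless.
WRAPPED_SQL = "SELECT payload FROM events WHERE date(created) = ?"
# The conversion is moved to the input side: the day becomes a range.
RANGE_SQL = "SELECT payload FROM events WHERE created >= ? AND created < ?"

def conversion_demo():
    # Assumption: SQLite stands in for MySQL; schema and data are made up.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, created TEXT, payload TEXT)"
    )
    conn.execute("CREATE INDEX idx_created ON events (created)")
    conn.executemany(
        "INSERT INTO events (created, payload) VALUES (?, ?)",
        [("2023-12-31 23:59:59", "old"),
         ("2024-01-01 10:00:00", "hit"),
         ("2024-01-02 00:00:00", "new")],
    )
    wrapped_args = ("2024-01-01",)
    range_args = ("2024-01-01", "2024-01-02")
    result = {
        "wrapped_plan": plan_text(conn, WRAPPED_SQL, wrapped_args),
        "range_plan": plan_text(conn, RANGE_SQL, range_args),
        "wrapped_rows": conn.execute(WRAPPED_SQL, wrapped_args).fetchall(),
        "range_rows": conn.execute(RANGE_SQL, range_args).fetchall(),
    }
    conn.close()
    return result

print(conversion_demo())
```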
- Optimize high-concurrency SQL first, rather than "large" SQL that runs infrequently
In terms of destructive potential, high-concurrency SQL is always more dangerous than low-frequency SQL. If a high-concurrency statement has a problem, the system is overwhelmed before we even get a chance to catch our breath. A statement that consumes a lot of I/O and responds slowly but runs rarely will, at worst, slow the whole system down. The system can still hold out for a while, which gives us some breathing room.
- Optimize from a global perspective rather than making one-sided adjustments
SQL optimization should not focus on a single statement in isolation; all the SQL in the system should be considered together. This matters most when adjusting indexes to change an execution plan: never fix one statement at the expense of the others, and never lose the big picture for the sake of a small gain.
- Run explain on every SQL statement running in the database where possible
To optimize SQL, you need to know its execution plan in order to judge whether there is room for optimization and whether the plan has problems. After optimizing the SQL running in a database for some time, the obviously problematic statements may be few, and most of the remaining ones have to be dug out. At that point a large number of explain runs are needed to collect execution plans and decide whether optimization is required.
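Collecting plans in bulk is easy to script. The sketch below is one possible shape for such a helper, not an established tool. It assumes SQLite (Python's sqlite3) in place of MySQL, and the helper names `collect_plans` and `full_scans` are made up. Against MySQL the same loop would issue explain and look for signals such as type = ALL.

```python
import sqlite3

def collect_plans(conn, statements):
    """Map each statement to the plan lines SQLite reports for it."""
    plans = {}
    for sql in statements:
        rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
        plans[sql] = [row[-1] for row in rows]
    return plans

def full_scans(plans):
    """Return the statements whose plan contains a full scan step."""
    return [
        sql for sql, lines in plans.items()
        if any(line.startswith("SCAN") for line in lines)
    ]

def explain_demo():
    # Assumption: SQLite stands in for MySQL; schema and statements are made up.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
    conn.execute("CREATE INDEX idx_a ON t (a)")
    statements = [
        "SELECT id, b FROM t WHERE a = 1",  # can use idx_a
        "SELECT id, a FROM t WHERE b = 1",  # no index on b
    ]
    flagged = full_scans(collect_plans(conn, statements))
    conn.close()
    return flagged

print(explain_demo())
```

The flagged statements are only candidates for review; a full scan on a small table may be perfectly acceptable.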