Uncover SQL optimization techniques to improve database performance

Source: Internet
Author: User
This article is based on MySQL, although much of the content also applies to other relational databases. It assumes some knowledge of indexing and focuses on how to optimize SQL to improve database performance.

Optimization objectives

1. Reduce the number of I/O operations

I/O is always the most likely bottleneck for a database, which follows from what a database is for. In most database operations, more than 90% of the time is spent on I/O, so reducing the number of I/O operations is the first priority in SQL optimization. It is also the most effective optimization.

2. Reduce CPU computation

Beyond the I/O bottleneck, SQL optimization must also consider the CPU workload. order by, group by, and distinct are all heavy CPU consumers (these operations essentially consist of the CPU comparing data held in memory). Once I/O optimization has reached a certain stage, reducing CPU computation becomes an important goal of SQL optimization.

Optimization Method

1. Change the SQL Execution Plan

With the objectives clear, we need a method for achieving them. For a SQL statement there is really only one way to reach both goals: change its execution plan so that it avoids detours as far as possible and finds the data we need through the available shortcuts, thereby reducing both the number of I/O operations and the CPU computation.

Common Mistakes

1. count(1) and count(primary_key) are better than count(*)

Many people use count(1) or count(primary_key) instead of count(*) to count records, believing that this performs better. That is a misconception. In some scenarios the performance may even be worse, because databases apply specific optimizations to count(*).

2. count(column) and count(*) are the same

This misconception is common even among senior engineers and DBAs, many of whom take it for granted. In fact, count(column) and count(*) are different operations with different meanings.

count(column) returns the number of records in the result set in which the column is not NULL.

count(*) returns the number of records in the entire result set.
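The difference is easy to see on a table that contains NULLs. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (the NULL-handling of count is standard SQL); the users table and its data are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "c@example.com"), (4, None)],
)

# COUNT(*) counts every row; COUNT(email) skips rows where email IS NULL.
count_all, count_email = conn.execute(
    "SELECT COUNT(*), COUNT(email) FROM users"
).fetchone()

print(count_all, count_email)  # prints "4 2"
```

Two of the four rows have a NULL email, so the two counts differ.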

3. select a, b from ... lets the database access less data than select a, b, c from ...

This misconception is widespread among developers, mainly because they are unfamiliar with how databases store data.

In fact, most relational databases store data by row, and data access works on a fixed I/O unit (called a block or page, usually 4 KB, 8 KB, ...). In most cases each I/O unit holds multiple rows, and each row stores all of its fields (except for special fields such as LOBs).

Therefore, whether we retrieve one field or several, the amount of data the database has to access in the table is the same.

There are exceptions, of course: when the query can be completed within an index. For example, if only fields a and b are selected and both are in the index, there is no need to go back to the table, whereas field c is not in the index and has to be fetched from the table. In that case the I/O volume of the two queries differs significantly.
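The index-only exception can be observed in an execution plan. This is a rough sketch that uses Python's built-in sqlite3 module as a stand-in: SQLite reports "COVERING INDEX" where MySQL's explain would show "Using index" in the Extra column. The table t and the index idx_t_a_b are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c TEXT)")
# Hypothetical composite index that contains a and b but not c.
conn.execute("CREATE INDEX idx_t_a_b ON t (a, b)")


def plan(sql):
    """Return the plan steps SQLite reports for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Both columns live in the index, so the table itself is never touched.
plan_ab = plan("SELECT a, b FROM t WHERE a = 3")
# Column c is only in the table, so each match needs a lookup back into it.
plan_abc = plan("SELECT a, b, c FROM t WHERE a = 3")

print(plan_ab)
print(plan_abc)
```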

4. order by always requires a sort operation

Index data is stored in order. If the order we need matches the order of an index, and the query is executed through that index, the database will generally skip the sort and return the data directly, because it knows the data already satisfies the requested order.

In fact, using indexes to optimize SQL statements that require sorting is a very important optimization technique.

For more information, see "Implementation Analysis of MySQL order by", "Basic implementation principles of group by in MySQL", and "Basic implementation principles of MySQL DISTINCT".
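As a sketch of how index order can replace a sort, the example below again uses Python's built-in sqlite3 module as a stand-in for MySQL. SQLite reports an explicit sort as "USE TEMP B-TREE FOR ORDER BY", where MySQL would show "Using filesort". The orders table and its index are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, created_at INTEGER, amount INTEGER)"
)
# Hypothetical index: rows for one customer are stored ordered by created_at.
conn.execute(
    "CREATE INDEX idx_orders_customer_created ON orders (customer_id, created_at)"
)


def plan(sql):
    """Return the plan steps SQLite reports for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# The requested order matches the index order, so no separate sort step appears.
plan_index_order = plan(
    "SELECT id, amount FROM orders WHERE customer_id = 7 ORDER BY created_at"
)
# amount is not part of the index, so the matching rows must be sorted explicitly.
plan_extra_sort = plan(
    "SELECT id, amount FROM orders WHERE customer_id = 7 ORDER BY amount"
)

print(plan_index_order)
print(plan_extra_sort)
```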

5. In the execution plan, filesort means that files are sorted on disk

We are not to blame for this misconception; it comes from an unfortunate choice of wording by the MySQL developers. "Using filesort" is the information shown in the "Extra" column when we use the explain command to view a SQL execution plan.

In fact, "Using filesort" is displayed whenever a SQL statement needs an actual sort operation; it does not mean that files will be sorted on disk. The sort runs in memory when the data fits in the sort buffer, and temporary files are used only when it does not.

Read more: "Understand filesort in MySQL Explain command output", where I give a more detailed introduction.

Basic Principles

1. Join as little as possible

MySQL's strength lies in its simplicity, which in some respects is also its weakness. The MySQL optimizer is efficient, but because it works with a limited amount of statistics, it is more prone to misjudgment. For complex multi-table joins, the optimizer is limited on the one hand and, on the other, not enough effort has gone into join execution, so performance still lags somewhat behind Oracle and other relational databases. For simple single-table queries, however, the gap is very small, and in some scenarios MySQL even outperforms its database predecessors.

2. Sort as little as possible

Sorting consumes a large amount of CPU, so when the cache hit rate is high and I/O capacity is sufficient, reducing sorting can significantly improve SQL response time.

For MySQL, there are several ways to reduce sorting, such as:

Use the index-based sorting described in the misconceptions above

Reduce the number of records involved in sorting

Do not sort data unless necessary

...

3. Avoid select *

Many people may find this hard to understand. Didn't the misconceptions above say that the number of fields in the select clause does not affect the amount of data read?

Yes, most of the time it does not affect the I/O volume. But when the query also has an order by, the number of fields in the select clause greatly affects sorting efficiency. This is described in detail in my earlier article analyzing the MySQL order by implementation.

In addition, the misconception above only says that the I/O volume is unaffected in most cases, not in all of them. When the query can be answered entirely from an index, selecting only the indexed fields greatly reduces the I/O volume.

4. Try to use join instead of subquery

Although join performance is not great, it still has a large advantage over MySQL subqueries. The execution plans MySQL produces for subqueries have long been a serious problem. The problem has existed for many years and was common to all the stable versions released when this was written, without significant improvement. The official team acknowledged the issue long ago and promised to solve it as soon as possible, but at the time of writing no released version had done so. MySQL 5.6 and later added semi-join and materialization strategies for subqueries, so check the execution plan on the version you run.
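A typical rewrite turns an IN subquery into a join. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL and only demonstrates that the two forms return the same rows; it says nothing about their relative speed on a given MySQL version. The customers and orders tables are hypothetical, and the rewrite is only equivalent because customers.id is unique, so the join cannot duplicate orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO customers (id, country) VALUES (?, ?)",
    [(1, "DE"), (2, "FR"), (3, "DE")],
)
conn.executemany(
    "INSERT INTO orders (id, customer_id) VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 3), (4, 1), (5, 2)],
)

# Subquery form: orders whose customer is in the set of German customers.
with_subquery = conn.execute(
    """
    SELECT o.id FROM orders o
    WHERE o.customer_id IN (SELECT c.id FROM customers c WHERE c.country = 'DE')
    ORDER BY o.id
    """
).fetchall()

# Join form: same rows, because each order matches at most one customer.
with_join = conn.execute(
    """
    SELECT o.id FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE c.country = 'DE'
    ORDER BY o.id
    """
).fetchall()

print(with_subquery == with_join)
```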

5. Minimize the use of or

When multiple conditions in the where clause are combined with "or", the MySQL optimizer does not optimize the execution plan very well. Coupled with MySQL's distinctive layered architecture of SQL layer and storage layer, performance is relatively poor. In many cases it is more effective to replace "or" with union all (or union, when necessary).
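One way to perform this rewrite is sketched below, using Python's built-in sqlite3 module as a stand-in for MySQL. The items table, its indexes, and the data are hypothetical. The second branch excludes rows already returned by the first, which is what makes union all safe here; that exclusion assumes category is never NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    "id INTEGER PRIMARY KEY, category INTEGER NOT NULL, owner INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX idx_items_category ON items (category)")
conn.execute("CREATE INDEX idx_items_owner ON items (owner)")
conn.executemany(
    "INSERT INTO items (id, category, owner) VALUES (?, ?, ?)",
    [(i, i % 5, i % 7) for i in range(1, 101)],
)

# Original form: two conditions combined with OR.
with_or = conn.execute(
    "SELECT id FROM items WHERE category = 1 OR owner = 2 ORDER BY id"
).fetchall()

# Rewritten form: each branch can use its own index; the extra predicate in
# the second branch keeps rows that match both conditions from appearing twice.
with_union_all = conn.execute(
    """
    SELECT id FROM items WHERE category = 1
    UNION ALL
    SELECT id FROM items WHERE owner = 2 AND category <> 1
    ORDER BY id
    """
).fetchall()

print(with_or == with_union_all)
```

If the exclusion predicate is awkward to write, plain union gives the same rows at the cost of a duplicate-removal step.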

6. Try to use union all instead of union

The main difference between union and union all is that union has to combine the two or more result sets and then remove duplicates. This involves sorting and a large amount of CPU work, which increases resource consumption and latency. So when we can confirm that the result sets contain no duplicates, or we do not care about duplicates, we should use union all instead of union.
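The semantic difference can be checked with a small example. This sketch uses Python's built-in sqlite3 module as a stand-in for MySQL (the union semantics are standard SQL); both tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE current_orders (customer_id INTEGER)")
conn.execute("CREATE TABLE archived_orders (customer_id INTEGER)")
conn.executemany("INSERT INTO current_orders VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO archived_orders VALUES (?)", [(3,), (4,)])

# UNION removes the duplicate customer 3, which costs extra comparison work.
union_rows = conn.execute(
    "SELECT customer_id FROM current_orders "
    "UNION SELECT customer_id FROM archived_orders"
).fetchall()

# UNION ALL simply concatenates the two result sets.
union_all_rows = conn.execute(
    "SELECT customer_id FROM current_orders "
    "UNION ALL SELECT customer_id FROM archived_orders"
).fetchall()

print(len(union_rows), len(union_all_rows))  # prints "4 5"
```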

7. Filter as early as possible

This strategy is most familiar from index design, where the columns that filter out the most rows are placed first.

The same principle can be applied when writing SQL, to optimize some join statements. For example, when querying paginated data across multiple tables, it is better to filter and paginate the data in one table first and then join the paginated result set to the other tables. This avoids as many unnecessary I/O operations as possible and greatly reduces the time spent on I/O.
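The paginate-then-join idea might look like the following sketch, written with Python's built-in sqlite3 module as a stand-in for MySQL. The orders and customers tables are hypothetical, and the two queries are equivalent only because every order is assumed to have a matching customer, so the join never drops rows from a page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER)"
)
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(i, "customer-%d" % i) for i in range(1, 6)],
)
conn.executemany(
    "INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
    [(i, (i % 5) + 1, i * 10) for i in range(1, 51)],
)

# Join first, paginate last: every order is joined before the page is cut.
join_then_page = conn.execute(
    """
    SELECT o.id, o.total, c.name
    FROM orders o JOIN customers c ON c.id = o.customer_id
    ORDER BY o.id LIMIT 5 OFFSET 10
    """
).fetchall()

# Paginate first, join last: only the five rows of the page are joined.
page_then_join = conn.execute(
    """
    SELECT p.id, p.total, c.name
    FROM (SELECT id, customer_id, total FROM orders
          ORDER BY id LIMIT 5 OFFSET 10) AS p
    JOIN customers c ON c.id = p.customer_id
    ORDER BY p.id
    """
).fetchall()

print(join_then_page == page_then_join)
```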

8. Avoid type conversion

The "type conversion" mentioned here is the conversion that occurs when the type of a column in the where clause does not match the type of the input parameter:

Manual conversion of column_name with a conversion function

This prevents MySQL (and other databases may have the same issue) from using the index. If a conversion is required, convert the input parameter instead.

Conversion performed by the database itself

If the input data type and the field type are inconsistent and no explicit conversion is performed, MySQL may convert the data itself, or it may pass the value unconverted to the storage engine for processing. Either way, the index may become unusable, causing execution plan problems.
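The effect of wrapping an indexed column in a conversion function can be seen in an execution plan. This sketch uses Python's built-in sqlite3 module as a stand-in and covers only the explicit-function case; implicit conversion rules differ between SQLite and MySQL and are not shown. The accounts table and its index are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, phone TEXT NOT NULL, name TEXT)"
)
conn.execute("CREATE INDEX idx_accounts_phone ON accounts (phone)")


def plan(sql):
    """Return the plan steps SQLite reports for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Converting the column hides it from the index: every row must be examined.
plan_converted_column = plan(
    "SELECT name FROM accounts WHERE CAST(phone AS INTEGER) = 13800000000"
)
# Converting the parameter instead (here, passing a string) keeps the index usable.
plan_converted_param = plan(
    "SELECT name FROM accounts WHERE phone = '13800000000'"
)

print(plan_converted_column)
print(plan_converted_param)
```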

9. Optimize high-concurrency SQL statements first, rather than statements that run infrequently

In terms of potential damage, high-concurrency SQL statements always outweigh low-frequency ones. If a high-concurrency statement goes wrong, it gives us no chance to catch our breath before the system is overwhelmed. A statement that consumes a lot of I/O and responds slowly but runs infrequently will at worst slow the whole system down, and the system can still hold out for a while, which gives us some time to react.

10. Optimize from a global perspective rather than making one-sided adjustments

SQL optimization cannot be done for one SQL statement in isolation; all the SQL in the system has to be considered. Especially when adjusting indexes to improve a statement's execution plan, never sacrifice the bigger picture for a small gain.

11. Run explain on as many of the SQL statements running in the database as possible

To optimize SQL, you must know its execution plan in order to judge whether there is room for optimization and whether the plan has problems. After the SQL running in a database has been optimized for a while, statements with obvious problems become rare, and most of the remaining ones have to be dug out. At that point, a large number of explain operations are needed to collect execution plans and decide whether optimization is required.
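Collecting plans in bulk can be automated. The helper below is a rough sketch that uses Python's built-in sqlite3 module as a stand-in: it runs SQLite's EXPLAIN QUERY PLAN on each statement and flags those that scan a table without an index. Against MySQL, the analogous check would run explain on each statement and look for type ALL. The function find_full_scans, the users table, and the sample statements are all hypothetical.

```python
import sqlite3


def find_full_scans(conn, statements):
    """Return the statements whose plan reads a whole table without an index."""
    flagged = []
    for sql in statements:
        details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
        # In SQLite, a "SCAN" step with no index mentioned is a full table scan.
        if any(d.startswith("SCAN") and "INDEX" not in d for d in details):
            flagged.append(sql)
    return flagged


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")

statements = [
    "SELECT name FROM users WHERE email = 'a@example.com'",  # served by the index
    "SELECT email FROM users WHERE name = 'Ann'",  # no index on name
]
flagged = find_full_scans(conn, statements)
print(flagged)
```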

Via IT168 Technology
