A summary of SQL optimization

Source: Internet
Author: User

Original article: http://www.taobaodba.com/html/851_sql%E4%BC%98%E5%8C%96%E7%9A%84%E4%B8%80%E4%BA%9B%E6%80%BB%E7%BB%93.html

SQL optimization is an indispensable part of a DBA's daily work. Back in my student days, I came across a post on ITPUB in which the author introduced SQL optimization with a formula that summed up the principle he followed:

T = S / V (T stands for time, S for distance, V for speed)

Here S is the total amount of data the SQL statement needs to access, V is the amount of data it can access per unit of time, and T is then the time the statement takes to execute. To make a SQL statement run as fast as possible, we can work backwards from the formula:

    1. With S unchanged, raise V to reduce T: with appropriate index adjustments, a large number of slow random I/Os can be converted into faster sequential I/Os; adding memory to the server lets more data stay in memory, which is far faster than reading it from disk; and SSDs, which store and read data on electronic media, break through the performance bottleneck of traditional mechanical disks. In short, better-configured hardware is one way to raise V.
    2. With V unchanged, reduce S to lower T: this is the core of SQL optimization, and there is a lot a DBA can do here. Usually an appropriate index on the query conditions avoids a full table scan; sometimes the SQL can be rewritten, or suitable hints added, to change the execution plan so that the query completes with the smallest possible scan path. And when all of these methods are exhausted, are there any other options left? One item in the DBA job description at Alibaba is that the DBA must understand the business in depth. A DBA who deeply understands the business can optimize from the business's standpoint as well as the database's, and that sometimes yields a multiplier effect.

Case one: Reduce T by lowering S

Principle:
We know that the values in a B+-tree index's leaf nodes are sorted by the index key. For example, with an index on the two fields (nick, appkey), the entries are stored in ascending order of (nick, appkey). Now suppose we have this SQL:
select count(distinct nick) from xxxx_nickapp_09_29;
It is used to compute a day's UV statistics from a log table. The optimizer chooses to complete the query on the index ind_nick_appkey (nick, appkey): it scans down from nick_1 all the way to the last nick_n, so the scan touches a large number of duplicate nick values in between (the ordinary tight scan on the left of the figure). If we could skip those duplicate nick values, performance would improve considerably (the loose scan on the right):
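The difference between the two scan types can be sketched outside of MySQL. The following Python snippet (hypothetical data, not the author's code) models the index as a list of (nick, appkey) tuples already sorted in index-key order: the tight scan touches every entry, while the loose scan touches only the first entry of each distinct nick:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical index leaf entries, already sorted on (nick, appkey),
# just as a B+-tree index on (nick, appkey) would store them.
index_entries = [
    ("alice", 1), ("alice", 2), ("alice", 7),
    ("bob", 1), ("bob", 3),
    ("carol", 2),
]

# Tight (full) index scan: every entry is visited.
tight_scan_cost = len(index_entries)

# Loose index scan: visit only the first entry of each nick group,
# skipping the duplicates in between.
loose_groups = [nick for nick, _ in groupby(index_entries, key=itemgetter(0))]
loose_scan_cost = len(loose_groups)

print(tight_scan_cost, loose_scan_cost)  # entries touched: 6 vs 3
print(len(set(loose_groups)))            # distinct nick count: 3
```

Both scans yield the same distinct-nick count, but the loose scan touches one entry per nick instead of one entry per row, which is exactly the reduction of S the formula calls for.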

From the above we can draw a conclusion:

If this UV-counting SQL could use the loose index scan on the right, it would greatly reduce the S discussed above. MySQL's optimizer cannot apply a loose index scan to count(distinct column) directly, so we need to rewrite the SQL to achieve a pseudo loose index scan:

mysql 09:41:30> select count(*) from (select distinct(nick) from xxxx_nickapp_09_29) t;
+----------+
| count(*) |
+----------+
|   806934 |
+----------+
The inner query selects the distinct nick values, and the outer count(*) then gives the total number of distinct nicks.
The key part is the subquery select distinct(nick), which achieves the pseudo loose index scan: the execution plan now shows Using index for group-by, meaning MySQL has optimized the DISTINCT into a GROUP BY. It groups by the index first, and while scanning the index it reads only one record per nick.
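As a quick sanity check that the rewrite is result-equivalent, here is a sketch using Python's built-in sqlite3 as a stand-in engine (SQLite does not implement MySQL's loose index scan, so this only demonstrates that both forms return the same count; the table and data are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (nick TEXT, appkey INTEGER)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?)",
    [("alice", 1), ("alice", 2), ("bob", 1), ("carol", 3), ("bob", 2)],
)

# Original form: COUNT(DISTINCT ...) directly.
(direct,) = conn.execute("SELECT COUNT(DISTINCT nick) FROM log").fetchone()

# Rewritten form: DISTINCT in a subquery, COUNT(*) on top.
(rewritten,) = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT nick FROM log) t"
).fetchone()

print(direct, rewritten)  # both count the 3 distinct nicks
assert direct == rewritten
```

The two statements are interchangeable in result; the difference in MySQL is only in the access path the optimizer is able to choose.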

Real Case:

This case comes from one of our online production systems. The system stores a large volume of log data every day, with single tables running between 10 GB and 50 GB, which are then summarized and analyzed; computing the UV is one of those calculations. The SQL is as follows:

select count(distinct nick) from xxxx_nickapp_09_29;

Even though the table has an index on nick, the execution plan shows a full index scan. Because the single table holds so much data, running this SQL causes jitter across the whole server, so the original SQL needs to be rewritten to use a loose index scan:

Before optimization:

mysql 09:41:30> select count(distinct nick) from xxxx_nickapp_09_29;
+----------------------+
| count(distinct nick) |
+----------------------+
|               806934 |
+----------------------+
1 row in set (52.78 sec)

One execution of the SQL takes 52.78 s.

After optimization:

mysql 09:41:30> select count(*) from (select distinct(nick) from xxxx_nickapp_09_29) t;
+----------+
| count(*) |
+----------+
|   806934 |
+----------+
1 row in set (5.81 sec)

From 52.78 seconds down to 5.81 seconds: almost 10 times faster.

Looking at the execution plans for the two forms:

Optimized form:

mysql 09:41:30> explain select count(*) from (select distinct(nick) from xxxx_nickapp_09_29) t;

+----+-------------+--------------------+-------+---------------+-----------------+---------+------+---------+--------------------------+
| id | select_type | table              | type  | possible_keys | key             | key_len | ref  | rows    | Extra                    |
+----+-------------+--------------------+-------+---------------+-----------------+---------+------+---------+--------------------------+
|  1 | SIMPLE      | xxxx_nickapp_09_29 | range | NULL          | ind_nick_appkey | 67      | NULL | 2124695 | Using index for group-by |
+----+-------------+--------------------+-------+---------------+-----------------+---------+------+---------+--------------------------+
Original form:

mysql 09:41:50> explain select count(distinct nick) from xxxx_nickapp_09_29;

+----+-------------+--------------------+-------+---------------+-----------------+---------+------+----------+-------------+
| id | select_type | table              | type  | possible_keys | key             | key_len | ref  | rows     | Extra       |
+----+-------------+--------------------+-------+---------------+-----------------+---------+------+----------+-------------+
|  1 | SIMPLE      | xxxx_nickapp_09_29 | index | NULL          | ind_nick_appkey | 177     | NULL | 19546123 | Using index |
+----+-------------+--------------------+-------+---------------+-----------------+---------+------+----------+-------------+

We can see that the number of rows scanned drops from 19,546,123 to 2,124,695, a reduction of more than 9x. ^_^

Case two: Cleverly optimizing a count(*) statistic by exploiting the append-only write pattern of the business

Sometimes the best way to optimize a SQL statement is to remove it from the system altogether. This, of course, rests on understanding the business well enough that you can drive the upgrade or retirement of a business feature; can you be that kind of DBA?

Here is a case: every day the application counts the total number of rows inserted into a table: select count(*) from xx_01_01;
As the single table grows (to around 20 GB), each count gets slower and slower and has to scan more and more data pages, causing performance jitter across the whole database. Analysis of the business showed that every table is populated via auto-increment IDs and no rows are ever deleted, so the total count can be computed without scanning the whole table:

So this SQL: select count(*) from xx_01_01;
can be replaced with: select max(id) - min(id) + 1 from xx_01_01;
and the execution speed takes a qualitative leap. ^_^
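The trick and its main caveat can be sketched with Python's built-in sqlite3 (the table name and data are illustrative). The shortcut matches count(*) exactly only while the auto-increment sequence has no holes; deleting a row silently breaks the equivalence, which is why the no-delete property of the business matters:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xx_01_01 (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO xx_01_01 (payload) VALUES (?)",
    [("row%d" % i,) for i in range(1000)],
)

# Append-only table with auto-increment ids: the two counts agree.
(full_count,) = conn.execute("SELECT COUNT(*) FROM xx_01_01").fetchone()
(fast_count,) = conn.execute(
    "SELECT MAX(id) - MIN(id) + 1 FROM xx_01_01"
).fetchone()
assert full_count == fast_count == 1000

# Caveat: the shortcut silently over-counts once a row is deleted.
conn.execute("DELETE FROM xx_01_01 WHERE id = 500")
(full_count,) = conn.execute("SELECT COUNT(*) FROM xx_01_01").fetchone()
(fast_count,) = conn.execute(
    "SELECT MAX(id) - MIN(id) + 1 FROM xx_01_01"
).fetchone()
print(full_count, fast_count)  # they now diverge: 999 vs 1000
```

The max/min form reads only the two ends of the primary-key index instead of scanning every row, which is what makes it so much faster on a 20 GB table.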

Case three: Reduce T by raising V (random I/O vs. sequential I/O)

As mentioned earlier, raising V can usually be done by upgrading server hardware, but for many small and medium-sized companies the cost is prohibitive, and without mature operational experience the upgrade can even backfire. And in general, no matter how powerful your server hardware is, badly written SQL and poorly chosen indexes will still hold it back.

A real online case: one of our core product libraries carries a very heavy random-read load; call it library A. One day the load on the read replica spiked. The slow log showed one SQL statement appearing frequently; its query conditions were complex, and the table had many similar indexes, so at first a wrong index choice was suspected. Examining the SQL with EXPLAIN, the execution plan contained Using where, indicating that the query had to go back to the table for its rows, and because the table held a large number of rows, this produced a great deal of random I/O:

So we simply added the is_detail field redundantly to the original index, letting a covering index serve the SQL: the query no longer has to go back to the table, the random I/O caused by the table lookups is replaced by sequential index I/O, and the SQL's execution speed improved greatly.
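The effect of a covering index can be demonstrated with Python's built-in sqlite3, whose EXPLAIN QUERY PLAN reports when a query is served entirely from an index (the table and column names here, including is_detail, are illustrative, not the production schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (buyer_id INTEGER, status INTEGER, is_detail INTEGER)"
)
conn.execute("CREATE INDEX ind_buyer ON orders (buyer_id)")

# With the narrow index, every index hit still requires a lookup
# back into the table for is_detail (random I/O in MySQL terms).
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT is_detail FROM orders WHERE buyer_id = 1"
).fetchall()
narrow = plan[0][3]

# Make the index covering by appending the queried column to it.
conn.execute("CREATE INDEX ind_buyer_detail ON orders (buyer_id, is_detail)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT is_detail FROM orders WHERE buyer_id = 1"
).fetchall()
covering = plan[0][3]

print(narrow)    # e.g. SEARCH orders USING INDEX ind_buyer (buyer_id=?)
print(covering)  # e.g. SEARCH ... USING COVERING INDEX ind_buyer_detail ...
assert "COVERING INDEX" in covering
```

Once the plan says COVERING INDEX, the engine answers the query from the index pages alone, which is the same back-to-table elimination described above.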

Summary: SQL optimization is a very interesting thing. In daily work we can optimize along the lines of T = S/V. It may feel strange the first time you use it, but with practice and a habit of summarizing, you will find the patterns, and it is wonderful. It is equally important that your SQL optimization never loses touch with the actual business: you might spend an hour optimizing a SQL statement, only to be told by the developers, when you discuss the result with them, that the SQL could simply have been taken offline. That really leaves you not knowing whether to laugh or cry. ^_^

