A 20-second SQL slow query optimization experience and processing scheme

Source: Internet
Author: User

Background

A few days ago in the project on-line process, found that a page does not get the data correctly, after troubleshooting the interface call time-out, and the last discovery is because the SQL query length of more than 20 seconds to cause the problem occurs.

Here, there is no advanced theory or technology, just remember to go through and read some of the ideological misunderstanding.


Composition of complex SQL statements

There is not much to describe the business function, but in order to highlight the problem, the analogy statement is used to describe the scene at that time. A complex SQL statement can be expressed as follows:

SELECT * from A_table as a left JOIN b_table as B on A.id=b.id WHERE a.id in (SELECT DISTINCT ID from a_table wher E user_id in (100,102,103) GROUP by USER_ID have count (ID) > 3)
Correlation Query

From the simplified SQL statement above, you can see that the first thing to do is to correlate the query.

Sub-query

Second, it is a nested subquery. This subquery is to find out the group IDs that are common to multiple users. So the "100,102,103" in the statement is based on the scene and needs to correspond to the number of "count (ID) > 3" later. Simply put, it is the group ID of the user intersection.


Where's the time?

Let's assume that the A_table table has a data volume of 20W, while the b_table data is 2000W. Can you think of the main time-consuming part of the query, or the sub-query section?


(think space ....) )


(think space ....) )


(think space ....) )


Problem Locator

For the principle of SQL underlying and advanced theory, I temporarily mastered not deep enough. But I know that it is possible to use analogies and simple tests to verify which part of the problem is wrong.


Preliminary determination

First, for a single user ID, I'll simplify the above statement to:

SELECT * from A_table as a left JOIN b_table as B on A.id=b.id WHERE user_id in (100)


Therefore, the initial assertion should be that the subset of nested subqueries takes up most of the time.


Further verification

Now that you have the problem of nested subquery statements, it is divided into two areas to be cleared: is the subquery itself time consuming, or is it nesting and causing slow queries?


It is easy to see that when I execute the query alone in DB, it is very fast. So ruled out.

The rest of it goes without saying that a slow query of20 seconds is caused by nesting .


But because of the urgency of the line, to make sure, I quickly verified my conclusion:

1, the ID of the subquery is executed separately, and the resulting sequence is manually spelled into an ID, such as: 1,2,3,4, ..., 999

2, manually replace the above-obtained sequence ID with the original SQL statement

3, execution, Discovery, soon! it only takes about a few Ms


Well done! Ready to repair on-line!


Solution Solutions

Online problems, a lot of time is in the location of the problem and analysis of the reasons, since the problem has been found, the reasons are found, the solution is self-evident. The code can be easily processed.


Another point to be aware of

Currently, the actual SQL statement is more complex than this, but it is enough to express the problem. But in the early days, the author also made some SQL code.

Because b_table is larger than a_table, so the first b_table left associated a_table, very slow, about 1 seconds more , and the amount of data is very small, but if in turn, a_table left associated b_table, then quickly, It's about 100 milliseconds .

So, another interesting phenomenon is found:

Large table Left association small table, very slow, small table left associative right table, soon.


Of course, these we all know theoretically, but the actual development will be forgotten. Or when both tables are empty at first, and without taking into account the speed at which the two tables grow later, the pits will be buried in the future.


Summarize

First, nested subqueries are slow.

Reasons, I have not studied carefully, but on the way to work with my colleagues, he said that once read the relevant books, is that every time the subquery will produce an SQL statement, so it is N times query. Another senior QA colleague told me that it should be a m*n problem.


Second, when I started using nested subqueries, there was a misconception that I thought it would be more efficient to give these operations to MySQL itself, after all, there would be a good mechanism within DB to execute these queries.

Then, the actual confession, I was wrong. Because this is not a simple merge MC batch query.


When we decide to use some underlying technology, we can use it more appropriately only when we understand it thoroughly. And because ignorance determines the tools, the framework, the bottom of the omnipotent, often will be in the strokes.

A 20-second SQL slow query optimization experience and processing scheme

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.