"Open source Framework those things 27" leisurely strum: A SQL-induced performance crisis and its hidden design behind the lack of

Last Update:2015-08-13 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

There is a classmate, said that the system has a performance problem, said to let me help diagnose. Originally did not want to spend this time, the result can not withstand each other's stalker dozen, as long as promised to help see. The background of the story is that when the file is uploaded, there is time for someone to upload the file, but in the end it does not use the uploaded file, which will produce some junk files.
The original software author would like to write a background timer task program to clear these junk files?
Because the author is determined not to let me send her SQL statement (this I also understand, so ugly sql), so here does not originate code, hair Pseudo-code.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

void deletemisslinkfile{
List filelist=getfilelist ();
List deletefilelist=new ArrayList ();
for (file:filelist) {
int Count1=execute (SELECT COUNT (*) from ...);
int Count2=execute (SELECT COUNT (*) from ...);
int Count3=execute (SELECT COUNT (*) from ...);
int Count4=execute (SELECT COUNT (*) from ...);
int Count5=execute (SELECT COUNT (*) from ...);
if (count1==0&&count2==0&&count3==0&&count4==0&&count5==0) {
Deletefilelist.add (file);
}
}
Delete (deletefilelist);
}

Of course, I have done some processing here, making it look a lot more beautiful, in fact, um, it is ugly. What are the performance conditions at this time? There are more than 500 data in the table, but the execution time is more than 100 seconds, but actually the actual application scenario is far more than this order of magnitude, and as the data increases, performance will decrease exponentially.
I said you go to add 100,000 records to test, make sure you don't come out overnight.
Okay, cut the crap, and then look at how to optimize the program.
Before we start, we can assume that there are n files, there are m file reference tables, and assume that all the file references are the same as the number of record bars in a table.
It is clear that the original implementation method was executed: 1 times the number of files query +n*m statistics operations
the stupidest method of optimizationFirst, use the lowest cost method to optimize a piece:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18

void deletemisslinkfile{
List filelist=getfilelist ();
List deletefilelist=new ArrayList ();
for (file:filelist) {
int Count1=execute (SELECT COUNT (*) from ...);
if (count1>0) continue;
int Count2=execute (SELECT COUNT (*) from ...);
if (count2>0) continue;
int Count3=execute (SELECT COUNT (*) from ...);
if (count3>0) continue;
int Count4=execute (SELECT COUNT (*) from ...);
if (count4>0) continue;
int Count5=execute (SELECT COUNT (*) from ...);
if (count1>0) continue;
Deletefilelist.add (file);
}
Delete (deletefilelist);
}

Well, with the refactoring above, the performance can be boosted by a few times. Ugly is a little ugly, but 1 time times is not a small increase oh. Reason, the original is to calculate all the statistical values, and then to judge, through the above reconstruction, average as long as the half can be checked out, so the performance will be 1 time times the promotion.
1 times File count query +N*M/2 statistics operation
general method of optimizationI reminded her that you can change the inside and outside, performance will improve a lot, the result is not understand,.
In fact, the logic is this, because the statistical operation is very low execution efficiency, and the query with the primary key is very fast, that is, the logic from: Traverse all the files to see how many citations, change to remove all the referenced files from all file lists, the rest is to delete the junk file.

1
2
3
4
5
6
7
8
9
10
11
12
13

void deletemisslinkfile{
List filelist=getfilelist ();
List reflist1=execute (select File from Tb1 ...)
for (Ref:reflist1) {
Filelist.remove (ref)
}
List reflist2=execute (select File from TB2 ...)
for (REF:REFLIST2) {
Filelist.remove (ref)
}
......
Delete (deletefilelist);
}

Through the above optimization, the SQL statements that need to be executed are: 1+m SQL statement, the other is a large amount of memory data pair, relatively speaking, performance will be much higher, through a certain skill to do some optimization, there will be a greater increase.
This way, I Mao Yan than the original way, can be increased by two orders of magnitude above.
Why is it that two or so orders of magnitude have been raised or more stupid?
Because this method has a significant improvement over the original method, there are still serious design problems.
First of all, when the amount of data is small (this is compared to the data in the Internet application), do a full traversal is not a problem, but when the amount of data is relatively large, with a SQL to traverse all the data, there is a very big problem. This time to introduce a series of complex problems to solve, such as: the single-machine computing into cluster computing, the entire calculation into a segmented time, no matter what, is a very complex process.
the method of inactionThe quickest, easiest and most efficient method is to be introduced below.
In fact, as long as the algorithm is the optimization of space and room, so generally speaking I rarely say full. This time I used the "most" word, that is used to indicate that the future has no optimization of space, then what kind of algorithm can not optimize the space? The answer is: do nothing.
Of course, in fact, it is impossible to do nothing, the problem is where, you do not do how can be good?
is actually the task of a certain decomposition. Through the rational analysis and design of the architecture, all the files uploaded and deleted into public methods (or services), in the need to deal with the document, when dealing with the document, do the following:

File Upload: Add a piece of data to the file upload data, such as: file-related information, unique identification, the number of citations is 0
File association: When data is associated with a file, the number of changes to the reference is +1
File unlink: When the data is associated with the file (usually when the deletion or edit is empty or replaced by another one), the number of references is 1

Since then, when it is time to clean up the rubbish, it is very simple, as long as:
Select ... where ref_times=0
And then do the corresponding cleanup work just fine.
At this time, the processing mode is optimized, and the maintenance of file reference data is decomposed into the process of business work, which can greatly improve the processing efficiency of garbage disposal. Of course some people say: if this will make my business process slow, then what? In fact, it doesn't matter, you can turn this into an asynchronous message, notify the file reference processing to do this thing, so it won't affect your business processing efficiency.
SummaryThrough the above analysis, we have optimized the garbage cleaning process during the file uploading process, analyzed the original problem, and the following 3 kinds of optimization methods and their advantages and disadvantages.
Of course, many friends will have a better solution, welcome to participate in the discussion, and criticize correct.
If, you like my blog, pleaseconcernMe, in order to receive my latest news.
If you are interested in my open source framework, you canhereGet to the latest code, or you can accessTiny official websiteGet more messages, or go toTiny CommunityFor instant communication.

"Open source Framework those things 27" leisurely strum: A SQL-induced performance crisis and its hidden design behind the lack of

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

"Open source Framework those things 27" leisurely strum: A SQL-induced performance crisis and its hidden design behind the lack of

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

"Open source Framework those things 27" leisurely strum: A SQL-induced performance crisis and its hidden design behind the lack of

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support