Complex SQL Split optimization
Splitting SQL is one of the most effective performance optimization techniques: break a complex SQL statement down into simpler statements along some logical boundary, then execute the equivalent logic with the help of temporary tables, achieving efficient execution. I have long wanted to write a post about optimization by splitting SQL, and recently I happened to run into a real, fairly representative case, which I now share. Through this case we will analyze why splitting a statement can improve SQL execution efficiency and, more importantly, figure out why it was slow before the split and why it is fast after it. These are my humble observations; readers, please bear with me.
First, look at the data volumes of the tables involved: the large table has more than 59 million rows, and the small table more than 1.6 million. (Disclaimer: I have never believed that 50 million rows makes a table "big", or that having a very big table proves the business or the database is impressive, never. Being able to split big tables into small ones, so that a monster table never appears while the business requirements are still satisfied: that is what deserves admiration.)
Below is the SQL statement to be optimized. The SQL itself is not complex, but the tables involved are on the large side and carry fairly complicated business logic. After several rounds of index-level tuning by previous "experts", it could still run normally. Over time, however, the data volume kept growing; like the proverbial frog in slowly warming water, the SQL got slower and slower, and eventually the performance problem could no longer be covered up. Frankly, a paged query on this SQL ran for more than a minute (the server was stable and unloaded; before testing I habitually rebuilt all the indexes).
The causes of such problems are varied: business, history, data, and so on. There is no need to sneer at it; every family has its own hard-to-read scripture. A former colleague who left for a well-known company once exclaimed that the system there was not as good as the one back at our company. At my current company there are thousands of SQL Server database servers, and queries that run for a minute or two still exist; that is exactly why I have been optimizing SQL for so long. Having encountered too many statements like this one, I can say the past developers and DBAs were not incompetent, but a qualitative change simply cannot be achieved by indexing alone. It is not that I dismiss indexes or do not understand them; if a SQL statement can be fixed merely by adding an index, even an index that makes it dozens of times faster, that only proves one thing: the problem itself was too easy.
The first time I saw how slowly this SQL executed, after looking at the data volumes of the related tables, my first instinct was that it could be split: reduce the number of join conditions and the complexity of the query conditions, then join the intermediate result with the remaining table to produce the final result set.
But how do you split it? Which tables should be split out first? How do you combine them afterwards? That is the essence of the problem. Take the following query as an example: four tables are joined, with multiple query conditions, join conditions, and so on.
SELECT A.colname, B.colname, C.colname, D.colname
FROM TableA A
INNER JOIN TableB B ON A.id = B.id AND A.type = B.type /* and other conditions */
INNER JOIN TableC C ON B.code = C.code                 /* and other conditions */
INNER JOIN TableD D ON C.businessid = D.businessid     /* and other conditions */
WHERE A.businessdate >= @date1
  AND A.businessdate <= @date2
  AND A.businessstatus = '...'
  AND B.businessdate = '...'
  /* and other query conditions */
ORDER BY Col1, Col2, Col3
OFFSET @m ROWS FETCH NEXT @n ROWS ONLY;
If we split out three tables and join the result with the fourth, there are several alternatives: join A, B, and C with the corresponding query conditions into a temporary table, then join it with D; or join A, B, and D into a temporary table, then join with C; or join A, C, and D, then join with B; or join B, C, and D, then join with A. Here is a small trick: observe the size of the result set each three-table join produces under its query conditions. If, say, the ABC join yields a 3,000-row result set while joining those 3,000 rows with table D produces 200,000 rows, then exclude table D first: join A, B, and C with the corresponding query conditions, generate a temporary table, build a reasonable index on it, and only then join with D. In other words, exclude the joins that produce large result sets, join the remaining tables to obtain a relatively small temporary result set, index that temporary result set, and then use it to join the other tables.
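The strategy above can be sketched in T-SQL. Table and column names follow the schematic query earlier; the temp-table and index names are mine, for illustration only:

```sql
-- Step 1: join the three tables whose combined result set is small (ABC, ~3,000 rows),
-- applying the selective query conditions, and materialize the result in a temp table.
SELECT A.colname AS a_col, B.colname AS b_col, C.colname AS c_col,
       C.businessid                        -- keep the key needed to join TableD later
INTO #tmpABC
FROM TableA A
INNER JOIN TableB B ON A.id = B.id AND A.type = B.type
INNER JOIN TableC C ON B.code = C.code
WHERE A.businessdate >= @date1
  AND A.businessdate <= @date2;            -- plus the other query conditions

-- Step 2: index the temp table on the join key of the excluded large table.
CREATE INDEX ix_tmpABC_businessid ON #tmpABC (businessid);

-- Step 3: join the small, indexed temporary result set with the remaining big table.
SELECT T.a_col, T.b_col, T.c_col, D.colname
FROM #tmpABC T
INNER JOIN TableD D ON T.businessid = D.businessid
ORDER BY T.a_col, T.b_col, T.c_col
OFFSET @m ROWS FETCH NEXT @n ROWS ONLY;
```

Because `#tmpABC` holds only a few thousand rows, the final join touches table D through an index seek per row rather than dragging all four large tables through one plan.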
After this kind of split, the most important step is to add appropriate indexes on the temporary table to optimize the joins between it and the physical tables. If the data volume is small, splitting is counterproductive and completely unnecessary; the larger the data volume, the more obvious the effect. So how obvious was the effect here? Remember the earlier timing? 1 minute 3 seconds, that is, 63 seconds; after the split it is 2 seconds. I use this splitting method often, but to be honest, even I did not expect this much of an improvement.
Actually, it's just the beginning.
Why was it so slow before the split, and why did it become so fast afterwards? It is not that the execution plan looked unreasonable: as noted above, the query was not missing any indexes, and the indexes were in fact used. But using an index does not mean everything is fine. Because the query conditions are complex, the tables involved carry composite indexes, and when talking about indexes one must talk about statistics. For a composite index, that is, an index on two or more columns, the statistics object maintains a histogram only for the first column. (I have written about statistics before; you can refer to that post.) This means SQL Server can misestimate the data volume: for some query conditions, even when the composite index is used, SQL Server cannot accurately estimate the number of rows. If SQL Server wrongly predicts early on that the expected data volume is very small, then every subsequent step inherits that error; the mistake in the first step distorts every step after it. The plan that follows is often built on loop joins, which is fine for a small result set but very inefficient when the real result set is large. (I have seen plenty of slow SQL with many joined tables and very complex query conditions where SQL Server processed the tables with loop joins.)
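You can see this leading-column limitation for yourself with `DBCC SHOW_STATISTICS`; the table and index names below are placeholders for illustration:

```sql
-- Show the statistics object behind a composite index.
-- The HISTOGRAM section of the output has steps only for the leading column
-- (businessdate in this hypothetical index); for the trailing columns SQL Server
-- keeps only density ("All density") values, so predicates that combine the
-- trailing columns are estimated from averages, not from actual distributions.
DBCC SHOW_STATISTICS ('dbo.TableA', 'ix_TableA_businessdate_status');
```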
So we first split out a small result set and store it in a temporary table. In this first step of the split, even if SQL Server cannot correctly estimate the row counts in some cases, the result set is small enough that handling it with loop joins is not a problem. Once the temporary table exists, we add a reasonable index to it and then join it with the other big table; because the SQL has become simple, and is indexed, it is usually executed in an efficient way, and performance goes up. But what does "efficient" mean here? Is that subjective speculation? No; for example, the engine can request a larger memory grant because it can estimate the row count of each step more accurately, and it can use parallel execution (I have written about parallelism as well; you can refer to that), thereby obtaining more resources to improve execution efficiency. In fact, that is precisely what happened to the example SQL after the split: it executed efficiently because it was granted more memory and ran in parallel.
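If you want to verify the memory-grant side of this on your own system, SQL Server exposes it through a dynamic management view; run this sketch in a second session while the query is executing:

```sql
-- Memory requested vs. memory actually granted, and the degree of
-- parallelism, for each query currently holding or waiting on a grant.
SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,
       dop                    -- degree of parallelism used by the query
FROM sys.dm_exec_query_memory_grants;
```

Comparing the numbers before and after the split shows whether the simplified statements really received larger grants and a parallel plan.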
To summarize:
This article achieved its optimization by splitting a SQL statement, which shows that, under certain circumstances, splitting SQL is a viable optimization option. Of course, this does not mean that every complex SQL statement performs slowly and must be split; a join of multiple large tables with simple logic may still execute quickly. But for SQL that joins multiple large tables, especially when the join conditions, query conditions, and index information are complex, consider splitting the SQL to optimize its execution efficiency if a performance problem appears. Put differently: for slow SQL, after concrete analysis, generating temporary tables by splitting the statement can resolve the performance problem. But more important than the fix itself is to understand it: when it is slow, why is it slow; when it is fast, why is it fast. Without that clarity, similar problems will occasionally leave you confused. Understand the essence, and you can handle SQL Server with ease and confidence.