Original: A note on how much indexes matter when optimizing a T-SQL query
Overview
While tuning the performance of a project component, I found that SQL design really matters, so I am writing this blog post to summarize what I learned.
Environment Introduction
This project component is a Windows service that polls an event table, processes each event, and converts it into the corresponding task. The performance problem is that, by my measurements, the service can only handle about 12 events per second. That is very poor.
The SQL version I used is SQL Server 2012; the machine has an i7-2670 CPU, 16 GB of memory, and an SSD.
One table in this database holds about 300,000 rows, which is not many. No indexes had been created in advance; there is only the primary key index.
So here's a very simple query statement:
SELECT TOP 1 * FROM sms_shortno_assign WHERE APP_CODE = 'Smsnotice' AND is_dynamic_assign = 'N' AND sms_type_code = 'Mas'
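Performance comparison: with and without matching data
In the query above, is_dynamic_assign = 'N' matches no rows, while is_dynamic_assign = 'Y' does. Comparing the two, the query is very slow when nothing matches, but behaves quite differently when there is matching data. This is presumably because TOP 1 lets the scan stop at the first matching row, whereas with no match every row has to be read.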
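One way to compare the two cases is to turn on SQL Server's time and I/O statistics and run both variants back to back. This is only a minimal sketch: the post does not say which tool produced its timings, so using SET STATISTICS here is my assumption.

```sql
-- Report CPU/elapsed time and page reads for each statement
-- (shown in the Messages tab of SQL Server Management Studio).
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Variant that matches rows.
SELECT TOP 1 *
FROM sms_shortno_assign
WHERE APP_CODE = 'Smsnotice'
  AND is_dynamic_assign = 'Y'
  AND sms_type_code = 'Mas';

-- Variant that matches nothing.
SELECT TOP 1 *
FROM sms_shortno_assign
WHERE APP_CODE = 'Smsnotice'
  AND is_dynamic_assign = 'N'
  AND sms_type_code = 'Mas';

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Comparing the logical reads of the two statements should make the difference visible even when the elapsed times fluctuate.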
Let's start by looking at what this SQL query plan looks like:
The following is a clearer view of the query plan:
As you can see, with no usable index a table scan is performed.
Take a look at the respective execution times:
When the query finds data:
When the query finds no data:
As you can see, the query takes 89 ms in total when it finds no data. 89 ms may look small, less than 0.1 s, but recall that the service processes only about 12 records per second, and this 89 ms clearly has a lot to do with it. Even if the service did nothing but execute this SQL, it could run it only about 11 or 12 times per second (1000 ms / 89 ms ≈ 11.2).
So let's optimize this SQL statement. The SQL itself is already as simple as it gets and cannot be simplified further, so the only thing left to work on is the indexes.
Clustered indexes and nonclustered indexes
What is the difference between the two? See another blogger's post, "Clustered and nonclustered indexes (a summary)".
At first, like most people who have not studied indexes in depth, I simply built an index on the fields that appear in the WHERE clause.
Creating a nonclustered index on the WHERE-clause fields
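A minimal sketch of such an index, assuming a single composite index over the three WHERE columns. The post does not show the exact definition, so the index name and the column order here are made up for illustration.

```sql
-- IX_assign_where is a hypothetical name; the column order is an assumption.
CREATE NONCLUSTERED INDEX IX_assign_where
ON sms_shortno_assign (APP_CODE, sms_type_code, is_dynamic_assign);
```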
After creating it, let's look at the query plan for the statement above:
Why does it still use a table scan instead of the index?
Here is a relevant blog post: "Is it true that SELECT * never uses an index?" It analyzes the relationship between SELECT * and the various kinds of index, but its conclusions differ from what I observed, and I left a comment under that post. I then found another post, "The truth about SELECT *: index coverage", which explains the behavior I am seeing. I have not studied SQL in depth, so I do not know the exact reason; if you do, please tell me. For covering indexes you can also refer to the post "SQL Server query performance optimization: covering indexes (II)".
Once I changed SELECT * to an explicit list of fields, the index was indeed used.
You can see that if the fields in the SELECT list are all contained in the index, the index will be used.
But changing the query that way would change what my original program does, which is unacceptable. Is there another way to solve it? At this point I thought of the clustered index.
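As a rough illustration, assuming a nonclustered index on the three WHERE columns, the query below asks only for columns that the index contains, so the index alone can answer it. The second statement sketches a covering index with an INCLUDE clause; the extra column shortno and the index name are hypothetical, since the post does not list the table's columns.

```sql
-- Only indexed columns are selected, so no lookup into the table is needed.
SELECT TOP 1 APP_CODE, is_dynamic_assign, sms_type_code
FROM sms_shortno_assign
WHERE APP_CODE = 'Smsnotice'
  AND is_dynamic_assign = 'N'
  AND sms_type_code = 'Mas';

-- Hypothetical covering index: shortno stands in for whatever extra
-- column the program really needs; it is stored at the index leaf level.
CREATE NONCLUSTERED INDEX IX_assign_where_covering
ON sms_shortno_assign (APP_CODE, sms_type_code, is_dynamic_assign)
INCLUDE (shortno);
```

Covering every column of a wide table this way effectively duplicates the table, so it only makes sense when the program needs a small subset of columns.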
Create a clustered index
When you create a table with the Table Designer, a clustered index is created on the primary key by default. A clustered index on the primary key is not necessarily the best choice. For more on clustered indexes, see the post "Index optimization (2): clustered index". I looked at my table structure and built a clustered index on the two most frequently used columns, both of which appear in the WHERE clause of the statement above. Neither of these fields is the primary key.
Once it is created, let's look at the query plan and the query time:
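A sketch of what such a clustered index might look like. The post does not say which two WHERE columns were chosen, so the column pair and the index name below are assumptions.

```sql
-- Hypothetical column pair and index name.
-- A table can have only one clustered index, so if the primary key is
-- already clustered it has to be recreated as nonclustered first.
CREATE CLUSTERED INDEX CIX_assign_app_type
ON sms_shortno_assign (APP_CODE, sms_type_code);
```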
Query time:
You can see that the query time is now 0 ms, which is very fast. In fact, the optimization of this SQL statement could end here.
The clustered index is important, and a table can have only one. It should not be chosen based on the WHERE clause of a single statement; you have to weigh all the different WHERE conditions to decide whether a given clustered index is the best choice. After building the clustered index on these two fields, I ran queries with other WHERE conditions and they were also very fast, so in the end I settled on a clustered index over these two fields.
Of course, many other statements in my project can still be optimized, and so can the C# code itself. After my optimization, the service can process about 130 events per second.
Off topic
=========================== Off topic =======================
While working on this optimization I had some other ideas and questions, which I also record here.
The problem above was solved by creating a clustered index. Nonclustered indexes were created too, but they were not used, so does that mean nonclustered indexes are useless? Not at all; nonclustered indexes play a big part in SQL optimization.
As seen earlier, the nonclustered index was used when the SELECT list contained only indexed columns. Here is another example that illustrates what a nonclustered index is good for.
We drop the nonclustered index previously built on the three fields and use an aggregate function to count the matching rows:
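The original statements were not preserved, so the following is only a sketch of the experiment. The index name is hypothetical, and the WHERE conditions are assumed to be the same ones used in the earlier query.

```sql
-- IX_assign_where is a hypothetical name for the three-column index.
DROP INDEX IX_assign_where ON sms_shortno_assign;

-- Count the rows that satisfy the (assumed) conditions.
SELECT COUNT(*)
FROM sms_shortno_assign
WHERE APP_CODE = 'Smsnotice'
  AND is_dynamic_assign = 'Y'
  AND sms_type_code = 'Mas';
```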
Query time:
You can see it still takes fairly long: 28 ms. Do not worry about whether COUNT(*) should be replaced by COUNT(1) or COUNT(primary key); this is discussed a lot online, and when I switched among the three forms I saw no essential difference.
Now we add the previously dropped nonclustered index back and look at the query plan and time again:
You can see that the query plan prefers the nonclustered index, and the count is faster than with the clustered index.
Questions (asking for help)
I have another SQL statement, also very simple, that uses a LEFT JOIN, and its query performance is poor. How can it be optimized? I tried NOT EXISTS and various subqueries as replacements, but none of them reduced the query time.
The business scenario is this: the SQL is the same as before, and the sms_shortno_locked table stores records taken from the sms_shortno_assign table. Locking adds a record and unlocking deletes it, so a LEFT JOIN is used to pick a record that is not locked.
Here are the query statement, its query plan, and the response time:
Most of this 26 ms is spent on the shortno_locked IS NULL check. If, instead of IS NULL, the check is written as shortno_locked = 1 or = 0, the query is very fast.
So here is my question. I believe many people use a LEFT JOIN and then IS NULL to check that the other table has no matching row, but the performance is not great. Is there a way to solve this LEFT JOIN problem, or another approach I could switch to? I tried many things without any improvement.
So I wonder whether, when designing tables in the future, I should use INNER JOIN as far as possible and test a field against a specific value, so that the field can be optimized with an index, unlike the case above where the IS NULL check cannot use one.
I hope an expert can offer some guidance. Thank you.
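For readers who want to reproduce the pattern, here is a sketch of the two usual ways to write this kind of "not locked" query. The join column shortno, the filter values, and the index name are all hypothetical, because the original statement is not preserved.

```sql
-- Hypothetical join column: shortno is assumed to exist in both tables.

-- LEFT JOIN, then keep only rows with no match in the lock table.
SELECT TOP 1 a.*
FROM sms_shortno_assign AS a
LEFT JOIN sms_shortno_locked AS l
       ON l.shortno = a.shortno
WHERE a.APP_CODE = 'Smsnotice'
  AND a.is_dynamic_assign = 'Y'
  AND a.sms_type_code = 'Mas'
  AND l.shortno IS NULL;

-- The same anti-join written with NOT EXISTS.
SELECT TOP 1 a.*
FROM sms_shortno_assign AS a
WHERE a.APP_CODE = 'Smsnotice'
  AND a.is_dynamic_assign = 'Y'
  AND a.sms_type_code = 'Mas'
  AND NOT EXISTS (SELECT 1
                  FROM sms_shortno_locked AS l
                  WHERE l.shortno = a.shortno);

-- An index on the lock table's join column (hypothetical name) gives the
-- optimizer the option of seeking instead of scanning that table.
CREATE NONCLUSTERED INDEX IX_locked_shortno
ON sms_shortno_locked (shortno);
```

Whether either form is actually faster depends on the plan the optimizer picks, so it is worth comparing the plans of both before and after adding the index.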