In SQL Server, will SQL Where 1 = 1 and affect performance?

Source: Internet
Author: User
Tags xml xpath

A friend recently discussed with me whether statements in the Where 1 = 1 and format will affect the performance. The final conclusion is no effect.

Although the conclusion is correct, the understanding of the problem is far from the root of the problem. In fact, in the process of writing T-SQL statements often make a mistake is to draw a very narrow conclusion, and then teach the Bible, for the T-SQL field, on the Internet can often see the so-called optimization code, I searched for some excerpts on the Internet as follows:

Do not JOIN more than five tables)
Consider using temporary tables or table variables to store intermediate results
Use less subqueries
View Nesting should not be too deep. Generally, it is recommended that there be no more than two nested views.
Add an index to the fields in the where clause.
Avoid using functions or calculations on index columns. In the where clause, if the index is a part of the function, the optimizer will not use the index and use full table scanning.
Add a condition to insert and update a dimension table to filter existing records in the dimension table.
If in or is used, the specified index is explicitly stated.
EXISTS is far more efficient than IN.
..........

Where is the problem?

Although the above guidance does not seem to be a problem, it cannot be completely incorrect, but there are actually two major problems:

Out-of-context: Many reasons can only take effect within a context range, and it is meaningless if it is out of the context scope. For example, someone often tells you that you are a little kidney-deficient. I think your first reaction must be to defend the dignity of a man, but if you go to the hospital to check the doctor, then you may be eager to explain how to add :-). Let's take an example of the extracted statement: 1) use less subqueries, if SQL Server filters XML XPATH by node attributes, it will be faster to convert it into a subquery. 2) If IN OR is used, the query is not indexed, explicitly declare the specified index. In this case, the query analyzer does not go through the index,

Not to explain the essential reason: The Buddhist saying has the cloud: "All vertices are imaginary. If we see all vertices, we can see them as we see them ". See the following story:

It is said that two government officials once came to see a doctor together, one called Xiao Xun and the other called Li Yan, both of which have the same symptoms. They both have headaches and fever. Maybe they all have a cold. Hua Han said: "Xiao Xun should use the following method for governance, and Li Yan should use the perspiration method for governance (seeking the present, delaying the sweating )." Others think it is strange. You must also think it is strange. Why are there different treatments for the same disease and symptoms? Hua Yu explained, he said: "The tracing is an external reality, and the vertical delay is an internal reality, so different methods are used ." Sure enough, they both got well the next day.

In fact, it can be seen that the same symptoms can be completely different, and vice versa, the same reason can also form a completely different "phase ". If we take stress measures only to see the "phase", the results will often be unsatisfactory.

Think Like Query Optimizer

There are rules in each field. In the simplest way, if you do not conform to the C # specification for programming, such as incorrect keyword usage, an error will be reported during compilation. Of course, there are some hidden rules in each field, and some people may say that they are "hidden rules". These rules are often not clear. For example, if you do not conform to the best practices to write a program, compilation will not report an error, but the performance or security problems caused by this are that you need to follow the "hidden rules" of best practices to avoid this.

In the field of SQL Server, the return of the T-SQL statement to the query result requires a complete cycle, as shown in Figure 1:



Figure 1. T-SQL lifecycle

Therefore, in the field of relational databases, the writing of SQL statements is just an abstract logic, rather than implementing them as directly as programming languages. For example, to access a row of data, if it is implemented by a programming language, you need to specify the method for connecting data, open the data, retrieve the data in a certain way, and finally close the connection. In SQL Server, t-SQL only defines how to get the data you need without considering implementation details.

As shown in Figure 1, from the T-SQL to the specific returned data goes through multiple steps, each step has a large number of rules. Therefore, the performance problems caused by Where 1 = 1 and mentioned in this article need to be considered according to the query analyzer rules, which is also Think like query optimizer.

In SQL Server, the T-SQL needs to be compiled into an execution plan for execution. During the compilation process, Query Optimizer needs to consider a lot of metadata, for example, the index, data distribution, estimated number of rows, some parameter configurations, and hardware environment of a table, among which the most important thing is to estimate the number of rows. SQL Server needs to estimate the number of rows to estimate the cost.

Why is the Where 1 = 1 and syntax not slow?

Because the query analyzer filters out 1 = 1 directly in the algebra tree optimization stage. This function is called "Constant Folding" in the query optimizer ". Let's assume that the query analyzer does not directly filter out where 1 = 1 in the algebra tree optimization phase.
For example, if the select * from table where a = 1 and B = 2 statement is used, the number of rows estimated by SQL Server is: column a's selection rate * Column B's selection rate * Total number of sampled rows in the table. Therefore, when Where 1 = 1 and a = 1, the result is changed to 1 * column a's selection rate * Total number of sampled rows in the table = selection rate of column a * Total number of sampled rows in the table

Therefore, no matter whether there is 1 = 1 and, the query analyzer will estimate the same number of rows, so it has the same execution plan, so it does not affect performance.

When we understand how the query analyzer estimates the number of rows for the expression A and B, we can calculate the situations where A and B may cause inaccurate execution plans. According to the formula, SQL Server considers column A and column B to be unrelated. If column A and column B are highly correlated, the estimated number of rows will be very inaccurate.

For example, if the table contains 1 million rows of data, where a = 1 has 10 thousand rows, and where B = 1 has 10 thousand rows, then the selectivity of A And B is 1/100 = 0.01. In the Where clause, the estimated number of rows associated with A And B is 0.01*0.01 = 0.0001*1 million = 100 rows, if the data filtered by where a = 1 and B = 1 is the same 10 thousand rows, the estimated number of rows is 100, and the actual number of rows is 10 thousand, which may lead to inaccurate execution plans, this causes performance problems. Of course, this situation is indeed a minority, but it often has a certain impact on the performance. Therefore, SQL Server 2014 uses an exponential concession algorithm to estimate the number of new rows, in this case, it will be estimated to be 1000 rows, which leads to a lower possibility of performance problems. The 2014 index bounce algorithm is not the focus of this article, so we will not talk about it much.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.