Why not using a subquery in SQL Server is not recommended

Source: Internet
Author: User

In SQL Server, subqueries can be divided into correlated subqueries and unrelated subqueries, but not-in clauses are common for unrelated subqueries, but the not-in clause leads to the following two types of problems:

    • Inaccurate results
    • Poor query performance

Let's take a look at why we try not to use the NOT in clause.

The problem of inaccurate results

In SQL Server, a null value is not a value, but rather a specific meaning, the meaning of which is "Unknow", which can be understood as undefined or unknown, so that any two-dollar operator result that is compared to a null value must be NULL, including the null value itself. In SQL Server, however, the meaning of the null value is converted to type bool with the result of false. Let's look at a simple example, shown in 1.

Figure 1. Null value compared to any value results are null

SQL Server provides a comparison of the "is" operator with a null value to measure whether a value is null.

So what's the problem with not in, 2?

Figure 2. Not in produces inaccurate values

In Figure 2, condition 3 does not belong to any of the list in the back of not, and the query does not return any values, unlike the expected result, the reason is that the non in clause is handling null values in SQL Server, the not as shown in Figure 2 The IN clause can actually be converted to a query as shown in 3.

Figure 3: Equivalent conversions are possible for the NOT in clause

In Figure 3, you can see that not in can be converted to a condition for each value of the unequal pair, and with logic and connection, and the previous mentioned null value compared with any other value, the result is always null, in the Where condition is false, so 3<> Null causes no rows to be returned, resulting in a not-in clause that produces unexpected results.

Therefore, the NOT in clause if it comes from a table or a list is long, and even if a null value exists in a large number of values, it can cause the final result to return no data.

Solution?

The workaround is to use NOT exists as an alternative instead of not. The EXISTS operator does not return NULL, and only returns TRUE or false based on each row in the subquery, and when a null value is encountered, only false is returned, not a null value that causes the entire subquery expression to be null. For the query shown in Figure 2, we can rewrite it as a subquery, as shown in 4.

Figure 4. Not exists can return results correctly

Low query performance due to not in

As we can see earlier, the main problem with not in is because of the handling of null values, so why is the processing of null values causing performance problems? Let's take a look at the example in Figure 5. In Figure 5, we used the Adventurework sample database and, for demonstration purposes, changed the definition of the ProductID of the SalesOrderDetail table from NOT NULL to NULL, at which point we made a simple not-in query. As shown in 5.

Figure 5. Execution plan for not in

In Figure 5, we see a row Count spool operator, which is used to confirm that there are null values in the ProductID column (the procedure is to compare total rows and non-null rows, not to wait for null values, although we know that there are no null values in the column. However, because the column definition allows null, SQL Server must make additional acknowledgments, which takes up nearly half of the query cost. So we compare the not exists,6.

Figure 6. Not in Vs not Exists

As can be seen from Figure 6, the execution cost of not in is almost 3 times times that of not exists, simply because SQL Server needs to confirm that NULL is present in the Allow null column. According to the equivalent form in Figure 3, not in, we can completely convert the not into the equivalent not exist form, 7 is shown.

Figure 7. Not in conversion to not Exists

Let's compare the cost of Figure 7 and its equivalent not in query, as shown in 8.

Figure 8: Completely equivalent in cost

So we can see that not in requires extra steps to process the null value, which is only defined in the ProductID column in the SalesOrderDetail table to allow NULL, If we also define SalesOrderHeader's SalesOrderID column to allow NULL, we find that SQL Server also requires an additional cost to confirm that there are null values on the column. As shown in 9.

Figure 9. SQL Server solves the problem of nullable columns by joining the left Anti Semi join operator

The not-in corresponding equivalent not-exist form is now shown in code Listing 1.

SELECT  *
 from    Sales.SalesOrderHeader A
WHERE    not EXISTS SELECT *
                      from   Sales.SalesOrderDetail b
                     WHERE  A.salesorderid = B.productid)
         and  not EXISTS SELECT   *
                            from     Sales.SalesOrderDetail b
                           WHERE     is NULL
                         ) )
         and  not EXISTS SELECT 1
                          from   SELECT    *
                                   from      Sales.SalesOrderHeader
                                 as C
                         WHERE   is NULL )

Code Listing 1. When NULL is allowed for a JOIN column two column definition, the not-in equivalent of not EXISTS form

At this point we are simply comparing the IO cases of not and not exists, as shown in 10.

Figure 10. Not in eats very high IO

Summary

This paper expounds the implementation principle of not in and the data inconsistency and performance problems, so as to avoid using not in while writing query, and converting to the not exists equivalent form provided by this paper will reduce a lot of trouble.

Why not using a subquery in SQL Server is not recommended

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.