Source: http://blog.csdn.net/dba_huangzj/article/details/31374037 reproduced Please reference
I have seen a discussion on in/exists in the forum. I am too lazy to find the original post. Here I will introduce the recent study summary:
Not in and not eixts may have certain risks when querying columns that are allowed to be null. IN particular, not in. If a subquery contains at least one NULL value, unexpected results will appear. The following is a demonstration.
IF OBJECT_ID('ShipmentItems', 'U') IS NOT NULL DROP TABLE dbo.ShipmentItems; GO CREATE TABLE dbo.ShipmentItems ( ShipmentBarcode VARCHAR(30) NOT NULL , Description VARCHAR(100) NULL , Barcode VARCHAR(30) NOT NULL ); GO INSERT INTO dbo.ShipmentItems ( ShipmentBarcode , Barcode , Description ) SELECT '123456' , '1010203' , 'Some cool widget' UNION ALL SELECT '123654' , '1010203' , 'Some cool widget' UNION ALL SELECT '123654' , '1010204' , 'Some cool stuff for some gadget'; GO -- retrieve all the items from shipment 123654 -- that are not shipped in shipment 123456 SELECT Barcode FROM dbo.ShipmentItems WHERE ShipmentBarcode = '123654' AND Barcode NOT IN ( SELECT Barcode FROM dbo.ShipmentItems WHERE ShipmentBarcode = '123456' ); /* Barcode ------------------------------ 1010204*/
We can see the expected results. Let's take a look at modifying the table structure and allow the column to be null:
ALTER TABLE dbo.ShipmentItems ALTER COLUMN Barcode VARCHAR(30) NULL; INSERT INTO dbo.ShipmentItems ( ShipmentBarcode , Barcode , Description ) SELECT '123456' , NULL , 'Users manual for some gadget'; GO SELECT Barcode FROM dbo.ShipmentItems WHERE ShipmentBarcode = '123654' AND Barcode NOT IN ( SELECT Barcode FROM dbo.ShipmentItems WHERE ShipmentBarcode = '123456' ); /* Barcode ------------------------------*/
Many people think this is a bug. Sometimes data can be found, but sometimes data cannot be found. But it is actually NOT a bug. When the not in Clause returns at least one NULL value, the query returns NULL. The following statement can better illustrate this idea:
SELECT CASE WHEN 1 NOT IN ( 2, 3 ) THEN 'True' ELSE 'Unknown or False' END , CASE WHEN 1 NOT IN ( 2, 3, NULL ) THEN 'True' ELSE 'Unknown or False' END; /* ---- ---------------- True Unknown or False */
IN fact, because the essence of IN is the OR operation, so:
SELECT CASE WHEN 1 IN ( 1, 2, NULL ) THEN 'True' ELSE 'Unknown or False' END ;
In, 1 in 1, that is, TRUE, so true is returned. The logic of this statement is actually:
SELECT CASE WHEN ( 1 = 1 ) OR ( 1 = 2 ) OR ( 1 = NULL ) THEN 'True' ELSE 'Unknown or False' END ;
When not in is used, the following statement is used:
SELECT CASE WHEN 1 NOT IN ( 1, 2, NULL ) THEN 'True' ELSE 'Unknown or False' END ;
Will be converted:
SELECT CASE WHEN NOT ( ( 1 = 1 ) OR ( 1 = 2 ) OR ( 1 = NULL ) ) THEN 'True' ELSE 'Unknown or False' END ;
According to the concept of discrete mathematics, it can be converted:
SELECT CASE WHEN ( ( 1 <> 1 ) AND ( 1 <> 2 ) AND ( 1 <> NULL ) ) THEN 'True' ELSE 'Unknown or False' END ;
The predicate has the short-circuit feature, that is, in the AND condition, if one condition is false, the entire condition is false, AND 1 <> 1 is false, therefore, the latter part does not need to be determined, and the else part is returned directly. Even if it is 1 <> null, according to the characteristics of the set theory, the comparison between NULL and actual data always returns unknown, so it is also false. If you have to use not in, make sure that the subquery will never return NULL. Or you need to remove NULL with additional processing, for example:
SELECT Barcode FROM dbo.ShipmentItems WHERE ShipmentBarcode = '123654' AND Barcode NOT IN ( SELECT Barcode FROM dbo.ShipmentItems WHERE ShipmentBarcode = '123456' AND Barcode IS NOT NULL ) ;
Another method is to rewrite the statement and replace it with not exists:
SELECT i.Barcode FROM dbo.ShipmentItems AS i WHERE i.ShipmentBarcode = '123654' AND NOT EXISTS ( SELECT * FROM dbo.ShipmentItems AS i1 WHERE i1.ShipmentBarcode = '123456' AND i1.Barcode = i.Barcode ); /* Barcode ------------------------------ 1010204*/
IN addition, based on the SARG requirements, it is generally NOT recommended to use reverse scanning such as not in/not exists to avoid affecting performance. There is also a choice to use IN/EXISTS key point, is the problem of multi-column match, IN the T-SQL, multi-column match to use EXISTS at the same time, while single column match can use EXISTS/IN. You may use other methods to implement IN Multi-column matching, but I usually choose EXISTS to match multiple columns.
Original article from: CSDN blog: Huang zhaoji's blog