原文出處: http://blog.csdn.net/dba_huangzj/article/details/31374037 轉載請引用
之前在論壇中見到一個針對in/exists的討論,原帖懶得找了,這裡介紹一下最近的學習小結:
NOT IN和NOT EIXTS在對允許為null的列查詢時會有一定的風險。特別是NOT IN,如果子查詢包含了最少一個NULL,會出現非預期的結果。下面做一個示範。
IF OBJECT_ID('ShipmentItems', 'U') IS NOT NULL DROP TABLE dbo.ShipmentItems; GO CREATE TABLE dbo.ShipmentItems ( ShipmentBarcode VARCHAR(30) NOT NULL , Description VARCHAR(100) NULL , Barcode VARCHAR(30) NOT NULL ); GO INSERT INTO dbo.ShipmentItems ( ShipmentBarcode , Barcode , Description ) SELECT '123456' , '1010203' , 'Some cool widget' UNION ALL SELECT '123654' , '1010203' , 'Some cool widget' UNION ALL SELECT '123654' , '1010204' , 'Some cool stuff for some gadget'; GO -- retrieve all the items from shipment 123654 -- that are not shipped in shipment 123456 SELECT Barcode FROM dbo.ShipmentItems WHERE ShipmentBarcode = '123654' AND Barcode NOT IN ( SELECT Barcode FROM dbo.ShipmentItems WHERE ShipmentBarcode = '123456' ); /* Barcode ------------------------------ 1010204*/
可以看出得到了期待結果。下面看看修改表結構,允許列為null的情況:
ALTER TABLE dbo.ShipmentItems ALTER COLUMN Barcode VARCHAR(30) NULL; INSERT INTO dbo.ShipmentItems ( ShipmentBarcode , Barcode , Description ) SELECT '123456' , NULL , 'Users manual for some gadget'; GO SELECT Barcode FROM dbo.ShipmentItems WHERE ShipmentBarcode = '123654' AND Barcode NOT IN ( SELECT Barcode FROM dbo.ShipmentItems WHERE ShipmentBarcode = '123456' ); /* Barcode ------------------------------*/
很多人會覺得這是一個bug,有時候能查出資料,有時候卻不能。但是實際上不是bug,當NOT IN子句返回最少一個NULL時,查詢會返回空,下面的語句能更好地說明這個想法:
SELECT CASE WHEN 1 NOT IN ( 2, 3 ) THEN 'True' ELSE 'Unknown or False' END , CASE WHEN 1 NOT IN ( 2, 3, NULL ) THEN 'True' ELSE 'Unknown or False' END; /* ---- ---------------- True Unknown or False */
實際上,由於IN的本質是OR操作,所以:
SELECT CASE WHEN 1 IN ( 1, 2, NULL ) THEN 'True' ELSE 'Unknown or False' END ;
中,1 in 1,也就是為TRUE,所以返回true,這個語句的邏輯實際上是:
SELECT CASE WHEN ( 1 = 1 ) OR ( 1 = 2 ) OR ( 1 = NULL ) THEN 'True' ELSE 'Unknown or False' END ;
當使用NOT IN 時,如下面的語句:
SELECT CASE WHEN 1 NOT IN ( 1, 2, NULL ) THEN 'True' ELSE 'Unknown or False' END ;
會轉變成:
SELECT CASE WHEN NOT ( ( 1 = 1 ) OR ( 1 = 2 ) OR ( 1 = NULL ) ) THEN 'True' ELSE 'Unknown or False' END ;
根據離散數學的概念,可以轉換為:
SELECT CASE WHEN ( ( 1 <> 1 ) AND ( 1 <> 2 ) AND ( 1 <> NULL ) ) THEN 'True' ELSE 'Unknown or False' END ;
謂詞有短路特性,即在AND條件中,只要有一個條件為false,整個條件都為false,而1<>1是為false,所以後面的也不需要判斷了,直接返回else部分。即使是1<>null,根據集合論的特性,NULL和實際資料的對比總是返回unknown,所以也是為false。如果你非要用NOT IN ,請確保子查詢永遠不會有NULL返回。或者需要額外處理去除NULL,比如:
SELECT Barcode FROM dbo.ShipmentItems WHERE ShipmentBarcode = '123654' AND Barcode NOT IN ( SELECT Barcode FROM dbo.ShipmentItems WHERE ShipmentBarcode = '123456' AND Barcode IS NOT NULL ) ;
還有一種方法就是改寫語句,用NOT EXISTS來等價替換:
SELECT i.Barcode FROM dbo.ShipmentItems AS i WHERE i.ShipmentBarcode = '123654' AND NOT EXISTS ( SELECT * FROM dbo.ShipmentItems AS i1 WHERE i1.ShipmentBarcode = '123456' AND i1.Barcode = i.Barcode ); /* Barcode ------------------------------ 1010204*/
另外,基於SARG要求,一般不建議用NOT IN/NOT EXISTS這種反向掃描,避免影響效能。還有一個選擇使用IN/EXISTS的要點,就是多列匹配的問題,在T-SQL中,多列同時匹配要用EXISTS,而單列匹配可以用EXISTS/IN。可能可以用其他寫法來實現IN的多列匹配,但是一般我個人會選擇使用EXISTS來匹配多列。
原文出自:CSDN部落格:黃釗吉的部落格