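Subquery Optimization
A good rule of thumb is to replace subqueries with joins wherever possible. The optimizer can sometimes "flatten" a subquery automatically and replace it with a regular or outer join, but it does not always succeed. An explicit join gives the optimizer more options for choosing the table order and finding the best possible plan. When you tune a particular query, check whether getting rid of the subquery makes a significant difference.
Example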
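The queries below list every store in the pubs database that has rows in big_sales, together with the total number of books it sold. Both queries return the same result set, but the first computes the total with a correlated subquery while the second uses a join. Compare the statistics and query plans that Microsoft SQL Server produces for each.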
SUBQUERY SOLUTION
----------------------
SELECT st.stor_name AS 'Store',
       ISNULL((SELECT SUM(bs.qty)
               FROM big_sales AS bs
               WHERE bs.stor_id = st.stor_id), 0)
       AS 'Books Sold'
FROM stores AS st
WHERE st.stor_id IN
      (SELECT DISTINCT stor_id
       FROM big_sales)
JOIN SOLUTION
----------------------
SELECT st.stor_name AS 'Store',
SUM(bs.qty) AS 'Books Sold'
FROM stores AS st
JOIN big_sales AS bs
ON bs.stor_id = st.stor_id
WHERE st.stor_id IN
(SELECT DISTINCT stor_id
FROM big_sales)
GROUP BY st.stor_name
SUBQUERY SOLUTION
----------------------
SQL Server parse and compile time:
CPU time = 28 ms
elapsed time = 28 ms
SQL Server Execution Times:
CPU time = 145 ms
elapsed time = 145 ms
Table 'big_sales'. Scan count 14, logical reads
1884, physical reads 0, read-ahead reads 0.
Table 'stores'. Scan count 12, logical reads 24,
physical reads 0, read-ahead reads 0.
JOIN SOLUTION
----------------------
SQL Server parse and compile time:
CPU time = 50 ms
elapsed time = 54 ms
SQL Server Execution Times:
CPU time = 109 ms
elapsed time = 109 ms
Table 'big_sales'. Scan count 14, logical reads
966, physical reads 0, read-ahead reads 0.
Table 'stores'. Scan count 12, logical reads 24,
physical reads 0, read-ahead reads 0.
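Without digging any deeper, we can see that the join executes faster in both CPU and total elapsed time, and needs only about half the logical reads of the subquery solution (966 versus 1,884 on big_sales). Both solutions return the same result set, although the sort order differs. That is because the plan for the join query sorts on stor_name to satisfy its GROUP BY clause, which acts here as an implicit ORDER BY: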
Store Books Sold
-------------------------------------------------
Barnum's 154125
Bookbeat 518080
Doc-U-Mat: Quality Laundry and Books 581130
Eric the Read Books 76931
Fricative Bookshop 259060
News & Brews 161090
(6 row(s) affected)
Store Books Sold
-------------------------------------------------
Eric the Read Books 76931
Barnum's 154125
News & Brews 161090
Doc-U-Mat: Quality Laundry and Books 581130
Fricative Bookshop 259060
Bookbeat 518080
(6 row(s) affected)
Here is the query plan for the subquery solution:
|--Compute Scalar(DEFINE:([Expr1006]=isnull([Expr1004], 0)))
|--Nested Loops(Left Outer Join, OUTER REFERENCES:([st].[stor_id]))
|--Nested Loops(Inner Join, OUTER REFERENCES:([big_sales].[stor_id]))
| |--Stream Aggregate(GROUP BY:([big_sales].[stor_id]))
| | |--Clustered Index Scan(OBJECT:([pubs].[dbo].[big_sales].
[UPKCL_big_sales]), ORDERED FORWARD)
| |--Clustered Index Seek(OBJECT:([pubs].[dbo].[stores].[UPK_storeid]
AS [st]),
SEEK:([st].[stor_id]=[big_sales].[stor_id]) ORDERED FORWARD)
|--Stream Aggregate(DEFINE:([Expr1004]=SUM([bs].[qty])))
|--Clustered Index Seek(OBJECT:([pubs].[dbo].[big_sales].
[UPKCL_big_sales] AS [bs]),
SEEK:([bs].[stor_id]=[st].[stor_id]) ORDERED FORWARD)
By contrast, the join solution produces this plan:
|--Stream Aggregate(GROUP BY:([st].[stor_name])
DEFINE:([Expr1004]=SUM([partialagg1005])))
|--Sort(ORDER BY:([st].[stor_name] ASC))
|--Nested Loops(Left Semi Join, OUTER REFERENCES:([st].[stor_id]))
|--Nested Loops(Inner Join, OUTER REFERENCES:([bs].[stor_id]))
| |--Stream Aggregate(GROUP BY:([bs].[stor_id])
DEFINE:([partialagg1005]=SUM([bs].[qty])))
| | |--Clustered Index Scan(OBJECT:([pubs].[dbo].[big_sales].
[UPKCL_big_sales] AS [bs]), ORDERED FORWARD)
| |--Clustered Index Seek(OBJECT:([pubs].[dbo].[stores].
[UPK_storeid] AS [st]),
SEEK:([st].[stor_id]=[bs].[stor_id]) ORDERED FORWARD)
|--Clustered Index Seek(OBJECT:([pubs].[dbo].[big_sales].
[UPKCL_big_sales]),
SEEK:([big_sales].[stor_id]=[st].[stor_id]) ORDERED FORWARD)
The join is the more efficient solution. It does not need the extra stream aggregate that the subquery requires to sum the big_sales.qty column.
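The timings, read counts, and plan text quoted in this section are the kind of output SQL Server emits when the corresponding session options are switched on. A minimal sketch of how one might capture them, assuming a query window connected to pubs and an existing big_sales table (the figures you get will differ from those shown here):

```sql
-- Report parse/compile and execution times, plus per-table read counts.
SET STATISTICS TIME ON
SET STATISTICS IO ON
GO

-- Query under test: the join solution from this section.
SELECT st.stor_name AS 'Store',
       SUM(bs.qty)  AS 'Books Sold'
FROM stores AS st
JOIN big_sales AS bs
  ON bs.stor_id = st.stor_id
WHERE st.stor_id IN
      (SELECT DISTINCT stor_id
       FROM big_sales)
GROUP BY st.stor_name
GO

SET STATISTICS TIME OFF
SET STATISTICS IO OFF
GO

-- To get the plan as text without running the query, turn on SHOWPLAN_TEXT.
-- It must be the only statement in its batch, hence the GO separators.
SET SHOWPLAN_TEXT ON
GO
SELECT st.stor_name AS 'Store',
       SUM(bs.qty)  AS 'Books Sold'
FROM stores AS st
JOIN big_sales AS bs
  ON bs.stor_id = st.stor_id
GROUP BY st.stor_name
GO
SET SHOWPLAN_TEXT OFF
GO
```

Running each candidate query between the same ON/OFF pair makes the two sets of numbers directly comparable.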
UNION vs UNION ALL
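Use UNION ALL instead of UNION whenever possible. The difference is that UNION has the side effect of eliminating duplicate rows and sorting the results, which UNION ALL does not do. Selecting a distinct result requires building a temporary worktable, storing all the rows in it, and sorting them before output. (Displaying the query plan for a SELECT DISTINCT query reveals a stream aggregate that consumes more than 30 percent of the resources used to process the query.) Use UNION when you know that is exactly what you need. But if you expect no duplicate rows in the result set, use UNION ALL. It simply selects from one table or join, then selects from another, and appends those rows to the bottom of the first result set. UNION ALL needs no worktable and no sort (unless something else in the query causes one). In most cases UNION ALL is more efficient. One potentially dangerous problem with UNION is that it can flood tempdb with huge temporary worktables. This can happen when you expect a large result set from a UNION query.
Example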
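The following queries select the IDs of all stores in the sales table of the pubs database, plus the IDs of all stores in big_sales, a table to which we added more than 70,000 rows. The only difference between the two solutions is the use of UNION versus UNION ALL, yet adding the ALL keyword changes the plan considerably. The first solution must stream-aggregate and sort the results before returning them to the client. The second query is more efficient, especially on large tables. In this example both queries return the same set of store IDs, although UNION ALL keeps the duplicate values that UNION removes and the order differs. Our test involved two temporary tables; your results may differ slightly.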
UNION SOLUTION
-----------------------
SELECT stor_id FROM big_sales
UNION
SELECT stor_id FROM sales
|--Merge Join(Union)
     |--Stream Aggregate(GROUP BY:([big_sales].[stor_id]))
     |    |--Clustered Index Scan(OBJECT:([pubs].[dbo].[big_sales].[UPKCL_big_sales]), ORDERED FORWARD)
     |--Stream Aggregate(GROUP BY:([sales].[stor_id]))
          |--Clustered Index Scan(OBJECT:([pubs].[dbo].[sales].[UPKCL_sales]), ORDERED FORWARD)
UNION ALL SOLUTION
-----------------------
SELECT stor_id FROM big_sales
UNION ALL
SELECT stor_id FROM sales
|--Concatenation
     |--Index Scan(OBJECT:([pubs].[dbo].[big_sales].[ndx_sales_ttlID]))
     |--Index Scan(OBJECT:([pubs].[dbo].[sales].[titleidind]))
UNION SOLUTION
-----------------------
Table 'sales'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.
Table 'big_sales'. Scan count 1, logical reads 463, physical reads 0, read-ahead reads 0.
UNION ALL SOLUTION
-----------------------
Table 'sales'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0.
Table 'big_sales'. Scan count 1, logical reads 224, physical reads 0, read-ahead reads 0.
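Although the result sets in this example are treated as interchangeable, you can see that the UNION ALL statement used about half the resources of the UNION statement (224 versus 463 logical reads on big_sales). So anticipate your result set, and use UNION ALL when you are sure it contains no duplicates.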
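Before swapping UNION for UNION ALL, it can help to measure how many rows the duplicate elimination actually removes. The following is one possible check, not part of the original tests; the aliases ua, u, rows_union_all, and rows_union are illustrative names only:

```sql
-- Compare the row counts produced by the two operators on the same inputs.
SELECT
    (SELECT COUNT(*)
     FROM (SELECT stor_id FROM big_sales
           UNION ALL
           SELECT stor_id FROM sales) AS ua) AS rows_union_all,
    (SELECT COUNT(*)
     FROM (SELECT stor_id FROM big_sales
           UNION
           SELECT stor_id FROM sales) AS u)  AS rows_union
```

If the two counts match, the inputs contain no duplicates and UNION ALL returns exactly the same rows. If they differ, UNION ALL is a safe substitute only when the caller can tolerate the repeated values.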
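Functions and Expressions That Suppress Indexes
When you apply a built-in function or an expression to an indexed column, the optimizer cannot use the index on that column. Try to rewrite such conditions so that the indexed column stands alone, outside any expression.
Example
You should help SQL Server by removing any expression wrapped around an indexed numeric column. The following query selects a single row from the jobs table by a unique key value of its unique clustered index. If you apply an expression to that column, the index is not used. But once you change the condition 'job_id - 2 = 0' to 'job_id = 2', the optimizer performs a seek on the clustered index.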
QUERY WITH SUPPRESSED INDEX
-----------------------
SELECT *
FROM jobs
WHERE (job_id-2) = 0
|--Clustered Index Scan(OBJECT:([pubs].[dbo].[jobs].[PK__jobs__117F9D94]),
     WHERE:(Convert([jobs].[job_id])-2=0))
OPTIMIZED QUERY USING INDEX
-----------------------
SELECT *
FROM jobs
WHERE job_id = 2
|--Clustered Index Seek(OBJECT:([pubs].[dbo].[jobs].[PK__jobs__117F9D94]),
     SEEK:([jobs].[job_id]=Convert([@1])) ORDERED FORWARD)
Note that a SEEK is much better than a SCAN,
as in the previous query.
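The table below lists several kinds of queries in which the index on a column is suppressed, each followed by a rewrite that gives better performance.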
QUERY WITH SUPPRESSED INDEX
---------------------------------------
DECLARE @job_id VARCHAR(5)
SELECT @job_id = '2'
SELECT *
FROM jobs
WHERE CONVERT( VARCHAR(5), job_id ) = @job_id
OPTIMIZED QUERY USING INDEX
---------------------------------------
DECLARE @job_id VARCHAR(5)
SELECT @job_id = '2'
SELECT *
FROM jobs
WHERE job_id = CONVERT( SMALLINT, @job_id )
QUERY WITH SUPPRESSED INDEX
---------------------------------------
SELECT *
FROM authors
WHERE au_fname + ' ' + au_lname = 'Johnson White'
OPTIMIZED QUERY USING INDEX
---------------------------------------
SELECT *
FROM authors
WHERE au_fname = 'Johnson'
  AND au_lname = 'White'
QUERY WITH SUPPRESSED INDEX
---------------------------------------
SELECT *
FROM authors
WHERE SUBSTRING( au_lname, 1, 2 ) = 'Wh'
OPTIMIZED QUERY USING INDEX
---------------------------------------
SELECT *
FROM authors
WHERE au_lname LIKE 'Wh%'
QUERY WITH SUPPRESSED INDEX
---------------------------------------
CREATE INDEX employee_hire_date
ON employee ( hire_date )
GO
-- Get all employees hired
-- in the 1st quarter of 1990:
SELECT *
FROM employee
WHERE DATEPART( year, hire_date ) = 1990
  AND DATEPART( quarter, hire_date ) = 1
OPTIMIZED QUERY USING INDEX
---------------------------------------
CREATE INDEX employee_hire_date
ON employee ( hire_date )
GO
-- Get all employees hired
-- in the 1st quarter of 1990:
SELECT *
FROM employee
WHERE hire_date >= '1/1/1990'
  AND hire_date < '4/1/1990'
QUERY WITH SUPPRESSED INDEX
---------------------------------------
-- Suppose that hire_date may
-- contain time other than 12AM
-- Who was hired on 2/21/1990?
-- Style 101 yields mm/dd/yyyy, so the literal needs the leading zero.
SELECT *
FROM employee
WHERE CONVERT( CHAR(10), hire_date, 101 ) = '02/21/1990'
OPTIMIZED QUERY USING INDEX
---------------------------------------
-- Suppose that hire_date may
-- contain time other than 12AM
-- Who was hired on 2/21/1990?
SELECT *
FROM employee
WHERE hire_date >= '2/21/1990'
  AND hire_date < '2/22/1990'
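The last pair generalizes to any day passed in as a parameter. As a rough sketch (the variable name @day is an assumption, not something from the original table), the upper bound can be computed from the parameter so that hire_date itself is never wrapped in a function:

```sql
DECLARE @day DATETIME
SET @day = '19900221'   -- unseparated yyyymmdd is read the same way under any language setting

-- The expression is applied to the variable, not to the indexed column,
-- so the index on hire_date remains usable for a range seek.
SELECT *
FROM employee
WHERE hire_date >= @day
  AND hire_date <  DATEADD(day, 1, @day)
```

The half-open range (>= start, < next day) also picks up rows whose hire_date carries a time of day.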
SET NOCOUNT ON
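The speedup that SET NOCOUNT ON gives T-SQL code surprises many SQL Server developers and database administrators. You may have noticed that every successful query returns a system message reporting the number of rows affected. In many cases you do not need this information. The SET NOCOUNT ON command suppresses these messages for all subsequent statements in your session until you issue SET NOCOUNT OFF.
This option has more than a cosmetic effect on the output. It reduces the amount of information passed from the server to the client, so it helps lower network traffic and improves the overall response time of your transactions. The time needed to pass a single message is negligible, but consider a script that executes queries in a loop and sends several kilobytes of useless information to the user.
As an example, the following T-SQL batch inserts 9,999 rows into the big_sales table.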
-- Assumes the existence of a table called BIG_SALES, a copy of pubs..sales
SET NOCOUNT ON
DECLARE @separator VARCHAR(25),
@message VARCHAR(25),
@counter INT,
@ord_nbr VARCHAR(20),
@order_date DATETIME,
@store_nbr INT,
@qty_sold INT,
@terms VARCHAR(12),
@title CHAR(6),
@starttime DATETIME
SET @STARTTIME = GETDATE()
SELECT @counter = 0,
@separator = REPLICATE( '-', 25 )
WHILE @counter < 9999
BEGIN
SET @counter = @counter + 1
SET @ord_nbr = 'Y' + CAST(@counter AS VARCHAR(5))
SET @order_date = DATEADD(hour, (@counter * 8), 'Jan 01 1999')
SET @store_nbr =
CASE WHEN @counter < 999 THEN '6380'
WHEN @counter BETWEEN 1000 AND 2999 THEN '7066'
WHEN @counter BETWEEN 3000 AND 3999 THEN '7067'
WHEN @counter BETWEEN 4000 AND 6999 THEN '7131'
WHEN @counter BETWEEN 7000 AND 7999 THEN '7896'
WHEN @counter BETWEEN 8000 AND 9999 THEN '8042'
ELSE '6380'
END
SET @qty_sold =
CASE WHEN @counter BETWEEN 0 AND 2999 THEN 11
WHEN @counter BETWEEN 3000 AND 5999 THEN 23
ELSE 37
END
SET @terms =
CASE WHEN @counter BETWEEN 0 AND 2999 THEN 'Net 30'
WHEN @counter BETWEEN 3000 AND 5999 THEN 'Net 60'
ELSE 'On Invoice'
END
-- SET @title = (SELECT title_id FROM big_sales WHERE qty = (SELECT MAX(qty) FROM big_sales))
SET @title =
CASE WHEN @counter < 999 THEN 'MC2222'
WHEN @counter BETWEEN 1000 AND 1999 THEN 'MC2222'
WHEN @counter BETWEEN 2000 AND 3999 THEN 'MC3026'
WHEN @counter BETWEEN 4000 AND 5999 THEN 'PS2106'
WHEN @counter BETWEEN 6000 AND 6999 THEN 'PS7777'
WHEN @counter BETWEEN 7000 AND 7999 THEN 'TC3218'
ELSE 'PS1372'
END
-- PRINT @separator
-- SELECT @message = STR( @counter, 10 ) -- + STR( SQRT( CONVERT( FLOAT, @counter ) ), 10, 4 )
-- PRINT @message
BEGIN TRAN
INSERT INTO [pubs].[dbo].[big_sales]([stor_id], [ord_num], [ord_date],
[qty], [payterms], [title_id])
VALUES(@store_nbr, CAST(@ord_nbr AS CHAR(5)), @order_date, @qty_sold,
@terms, @title)
COMMIT TRAN
END
SET @message = CAST(DATEDIFF(ms, @starttime, GETDATE()) AS VARCHAR(20))
PRINT @message
/*
TRUNCATE table big_sales
INSERT INTO big_sales
SELECT * FROM sales
SELECT title_id, sum(qty)
FROM big_sales
group by title_id
order by sum(qty)
SELECT * FROM big_sales
*/
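Run with SET NOCOUNT OFF, the batch took 5,176 milliseconds of elapsed time. Run with SET NOCOUNT ON, it took 1,620 milliseconds. If you do not need the row counts in the output, consider adding SET NOCOUNT ON at the start of every stored procedure and script.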
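For stored procedures, that advice might look like the sketch below. The procedure name usp_add_big_sale and the parameter types are assumptions chosen to match the columns used in the batch above, not definitions taken from the original:

```sql
CREATE PROCEDURE dbo.usp_add_big_sale   -- hypothetical procedure name
    @stor_id   CHAR(4),                 -- parameter types are assumed
    @ord_num   VARCHAR(20),
    @ord_date  DATETIME,
    @qty       SMALLINT,
    @payterms  VARCHAR(12),
    @title_id  VARCHAR(6)
AS
BEGIN
    -- Suppress the "(1 row(s) affected)" message for every statement below.
    SET NOCOUNT ON

    INSERT INTO dbo.big_sales (stor_id, ord_num, ord_date, qty, payterms, title_id)
    VALUES (@stor_id, @ord_num, @ord_date, @qty, @payterms, @title_id)
END
```

Code that needs the count can still read @@ROWCOUNT right after the statement, because that function is updated even while SET NOCOUNT is ON.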
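TOP and SET ROWCOUNT
The TOP clause of a SELECT statement limits the number of rows returned by a single query, whereas SET ROWCOUNT limits the number of rows affected by all subsequent queries. Both commands are highly efficient for many programming tasks.
SET ROWCOUNT sets the maximum number of rows that a SELECT, INSERT, UPDATE, or DELETE statement can affect. The setting takes effect as soon as the command executes and applies only to the current session. To remove the limit, execute SET ROWCOUNT 0.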
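Because SET ROWCOUNT also caps data modification statements, one common use is to break a large delete into small batches. The sketch below assumes the big_sales table used earlier; the batch size and cutoff date are arbitrary illustrative values. Note that Microsoft's documentation flags the use of SET ROWCOUNT with INSERT, UPDATE, and DELETE as deprecated in later releases, where TOP on the statement itself is the suggested replacement.

```sql
SET ROWCOUNT 1000                   -- each DELETE below removes at most 1000 rows

WHILE 1 = 1
BEGIN
    DELETE FROM big_sales
    WHERE ord_date < '19930101'     -- illustrative cutoff date

    -- @@ROWCOUNT holds the number of rows the DELETE just removed.
    IF @@ROWCOUNT = 0 BREAK
END

SET ROWCOUNT 0                      -- remove the limit for the rest of the session
```

Resetting to 0 at the end matters: otherwise the limit silently applies to every later statement in the session.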
Some practical tasks are more efficient to program with TOP or SET ROWCOUNT than with standard SQL commands. Let's demonstrate with a few examples:
TOP n
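One of the most popular queries in almost any database is a request for the top N items in a list. In the pubs database, we can look for the five best-selling titles. Compare three solutions: pure ANSI SQL, SET ROWCOUNT, and TOP.
Pure ANSI SQL: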
Select title,ytd_sales
From titles a
Where (select count(*)
From titles b
Where b.ytd_sales>a.ytd_sales
)<5
Order by ytd_sales DESC
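This pure ANSI SQL solution executes a correlated subquery that can be very inefficient, especially in this example, where no index on ytd_sales supports it. In addition, this pure standard SQL command does not filter out NULL values in ytd_sales, nor does it distinguish the case where several titles are tied.
Using SET ROWCOUNT: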
SET ROWCOUNT 5
SELECT title, ytd_sales
FROM titles
ORDER BY ytd_sales DESC
SET ROWCOUNT 0
Using TOP n:
SELECT TOP 5 title, ytd_sales
FROM titles
ORDER BY ytd_sales DESC
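The second solution uses SET ROWCOUNT to stop the SELECT query, and the third uses TOP n to stop once it has found the first five rows. In this case the ORDER BY clause also forces a sort of the whole table before any results are returned. The query plans of the two queries are practically the same. However, the key advantage of TOP over SET ROWCOUNT is that SET has to process the worktable required by the ORDER BY clause, while TOP does not.
On a large table, we can create an index on ytd_sales to avoid the sort. The query then uses the index to find the top five rows and stops. Compare that with the first solution, which scans the whole table and executes a correlated subquery for every row. On a small table the performance difference is slight. But on a large table the first solution may take hours to process, while the other two take seconds.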
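A minimal sketch of that index, with an assumed index name (ndx_titles_ytd_sales is not from the original). The query also shows two optional refinements that address the NULL and tie issues noted for the ANSI version; whether the optimizer actually picks the index depends on table size and statistics:

```sql
-- Hypothetical index name; DESC matches the ORDER BY direction of the query.
CREATE INDEX ndx_titles_ytd_sales
    ON titles (ytd_sales DESC)
GO

-- WITH TIES also returns any rows that tie with the fifth row on ytd_sales.
SELECT TOP 5 WITH TIES title, ytd_sales
FROM titles
WHERE ytd_sales IS NOT NULL      -- skip titles with no sales figure
ORDER BY ytd_sales DESC
```

Dropping WITH TIES returns exactly five rows, in which case the choice among tied titles is arbitrary.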
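When you work out what a query needs, consider whether only a few of its rows are actually required. If so, the TOP clause can save a great deal of time.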