SQL Server Performance Tuning 3 Index Maintenance
Preface
The previous article introduced how to improve database query performance by creating indexes, which is just the beginning. If you do not need proper maintenance in the future, the indexes you have previously created may even become a drag-and-drop attack and a helper for database performance degradation.
Search for fragments
Removing Fragments may be the most common task of index maintenance. Microsoft officially recommends that you use REORGANIZE to "REORGANIZE" indexes when the fragmentation level is between 5% and 30%, if it reaches 30% or more, use REBUILD to "recreate" the index. Many factors may need to be considered to determine the method and operation time. The following four factors must be taken into account:
- Backup plan
- Server Load
- Remaining disk space
- Recovery Model
PS: Although fragments are closely related to performance, they can be ignored in certain situations. For example, if you have a table with clustered indexes, almost all the processing for this table is to retrieve a data record based on the primary key. In this case, the effect of fragments is negligible.
So how can we determine the fragmentation of an index? Use the system function sys. dm_db_index_physical_stats and the system directory sys. Indexes. The sample script is as follows:
--Gets information about all indexes on the specified table (example: orddemo)
SELECT
sysin.name as IndexName
,sysIn.index_id
,func.avg_fragmentation_in_percent
,func.index_type_desc as IndexType
,func.page_count
FROM
sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(N'ordDemo'), NULL, NULL, NULL) AS func
JOIN
sys.indexes AS sysIn
ON
func.object_id = sysIn.object_id AND func.index_id = sysIn.index_id
--Index "ID of clustered index is 1
--The nonclustered index is index "ID > 1
--The following script is filtered with the where clause (tables without indexes are excluded)
--This script returns all indexes of the database, which may take a long time!
SELECT
sysin.name as IndexName
,sysIn.index_id
,func.avg_fragmentation_in_percent
,func.index_type_desc as IndexType
,func.page_count
FROM
sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL) AS func
JOIN
sys.indexes AS sysIn
ON
func.object_id = sysIn.object_id AND func.index_id = sysIn.index_id
WHERE sysIn.index_id>0;
Output:
The fragmentation of the sample database is 0, because the fragmentation is generated during addition, deletion, and modification, and our database has not performed similar operations.
Fill Factor
As mentioned above, data is stored in the database on 8 KB data pages. Suppose you have a table with a clustered index. when data is inserted, the database finds the Insert Location (data page) based on the primary key and writes the information. If the data page is full or there is not enough space to store new data, the database will create a new 8 KB data page, and this new process will cause I/O consumption.
Fill factor is used to reduce this situation. If you set the fill factor to 10, your data initially only uses 10% of the 8 KB data page, when you insert a new record, you don't have to worry about excessive I/O consumption because 90% of the data page is reserved.
The fill factor is also a double-edged sword. It increases the write operation performance while reducing the read operation performance.
[Fill factor only works when you create an index or rebuildi index. It is invalid for general DML operations (data pages are always filled to 100% )]
The following script helps you understand the index fill factor value:
SELECT
OBJECT_NAME(OBJECT_ID) AS TableName
,Name as IndexName
Type_Desc
Fill_Factor
FROM
Sys.indexes
WHERE
--Here, only clustered index and nonclustered index are represented by where filter
type_desc<>'HEAP'
You can also view the default fill factor value on the Data Server:
SELECT
Description
,Value_in_use
FROM
sys.configurations
WHERE
Name ='fill factor (%)'
PS: 0 indicates that no reserved space is retained.
Use the following script to set the fill factor value:
ALTER INDEX [idx_refno] ON [ordDemo]
REBUILD WITH (FILLFACTOR= 80)
GO
--If you want to set the default values on the server, use the following script
Sp_configure 'show advanced options', 1
GO
RECONFIGURE
GO
sp_configure 'fill factor', 90
GO
RECONFIGURE
GO
We recommend that you use a large fill factor (more than 90%) for a static table (occasionally updated), and a lower fill factor (70%-80%) for tables with frequent reading and writing ). In particular, when your clustered index is built on an auto-increment field, it is no problem to set the fill factor to 100%, because the newly inserted data is always at the end of all the data, there will be no insert between record and record.
REBUILD index to Improve Index Efficiency
As the name suggests, the function of re-indexing includes the elimination of fragments, statistics Update, and the alignment of the physical sorting order on the data page. In addition, it will compress the data page based on the fill factor and (if necessary) Add a data page. This operation consumes a lot of resources and takes a long time. If you decide to start re-indexing, you also need to know that there are two working modes:
Offline mode: This is the default index reconstruction mode. It will lock the table until the reconstruction is completed. If the table is large, the user (for several hours) cannot use the table. Compared with the online mode, the offline mode works faster and consumes less space for TempDb.
Online mode: if the objective condition does not allow you to lock the table, you can only select the online mode, which consumes more time and server resources. It is worth mentioning that if your table contains varchar (max), nvarchar (max), and text fields, indexes cannot be re-created in this mode.
[Note: this mode is only supported by the developer/Enterprise Edition. The offline mode is used by default for other versions !]
The following is an example script for re-indexing:
--Index reconstruction in online mode
ALTER INDEX [idx_refno] ON [ordDemo]
REBUILD WITH (FILLFACTOR=80, ONLINE=ON)
GO
--Rebuild index idx? Refno in offline mode
ALTER INDEX [idx_refno] ON [ordDemo]
REBUILD WITH (FILLFACTOR=80, ONLINE=OFF)
GO
--Rebuild all indexes on the orderdemo table
ALTER INDEX ALL ON [ordDemo]
REBUILD WITH (FILLFACTOR=80, ONLINE=OFF)
GO
--Rebuild index idx? Reno (drop? Existing = on)
CREATE CLUSTERED INDEX [idx_refno] ON [ordDemo](refno)
WITH
(
DROP_EXISTING = ON,
FILLFACTOR = 70,
ONLINE = ON
)
GO
--Rebuild all indexes on the orddemo table using DBCC dbreindex
DBCC DBREINDEX ('ordDemo')
GO
--Using DBCC dbreindex to rebuild an index on the orddemo table
DBCC DBREINDEX ('ordDemo','idx_refno',90)
GO
[Dbcc dbreindex will be deprecated in later versions]
Based on the author's personal experience, it is better to use bulk-logged recovery (bulk-logged recovery) or simple recovery (simple recovery) to recreate a large data table, this prevents the log file from being too large. However, you need to be reminded that the backup chain of the database will be interrupted when the recovery mode is switched. If you were using the full recovery mode before, remember to switch back after reconstruction.
Be patient during reconstruction. It may take one day for a long time. It is very dangerous to interrupt him (the database may enter the recovery mode ).
You must be the owner of the table, sysadmin of the server, or db_owner/db_ddladmin of the database.
REORGANIZE indexes to Improve Index Efficiency
Reorganization does not lock any object. It is an optimization of the current B-Tree, and organizes data page processing and fragment. The example script for reindexing is as follows:
--Reorganize the "IDX [u refno" index on the "orddemo" table
ALTER INDEX [idx_refno] ON [ordDemo]
REORGANIZE
GO
--Reorganize all indexes on the orderdemo table
ALTER INDEX ALL ON [ordDemo]
REORGANIZE
GO
--Reorganize all indexes on orddemo table in AdventureWorks 2012 database
DBCC INDEXDEFRAG ('AdventureWorks2012','ordDemo')
GO
--Reorganize the index idx? Refno on the orddemo table in AdventureWorks 2012 database
DBCC INDEXDEFRAG ('AdventureWorks2012','ordDemo','idx_refno')
GO
Note: The user performing this operation must be the owner of the table, the sysadmin Member of the server, or the db_owner/db_ddladmin of the database.
Missing indexes found
Now you know the performance improvements brought about by indexes, but in practice it is difficult to set up correct and necessary indexes at the very beginning. How can we determine which tables need indexes, which indexes are created incorrectly?
In general, SQL Server uses existing indexes to execute the query script. If no index is found, it will automatically generate one and store it in the DMV (dynamic management view. This information is cleared whenever the SQL Server service is restarted. Therefore, it is best to keep the SQL Server service running during retrieval of missing indexes until all the business logic is completed.
For more information, see the following link:
- Sys. dm_db_missing_index_details
- Sys. dm_db_missing_index_group_stats
- Sys. dm_db_missing_index_groups
- Sys. dm_db_missing_index_columns (Index_Handle)
Provide a ready-made script:
SELECT
avg_total_user_cost * avg_user_impact * (user_seeks + user_scans)
AS PossibleImprovement
,last_user_seek
,last_user_scan
,statement AS Object
,'CREATE INDEX [IDX_' + CONVERT(VARCHAR,GS.Group_Handle) + '_' +
CONVERT(VARCHAR,D.Index_Handle) + '_'
+ REPLACE(REPLACE(REPLACE([statement],']',''),'[',''),'.','') +
']'
+' ON '
+ [statement]
+ ' (' + ISNULL (equality_columns,'')
+ CASE WHEN equality_columns IS NOT NULL AND inequality_columns IS
NOT NULL THEN ',' ELSE '' END
+ ISNULL (inequality_columns, '')
+ ')'
+ ISNULL (' INCLUDE (' + included_columns + ')', '')
AS Create_Index_Syntax
FROM
sys.dm_db_missing_index_groups AS G
INNER JOIN
sys.dm_db_missing_index_group_stats AS GS
ON
GS.group_handle = G.index_group_handle
INNER JOIN
sys.dm_db_missing_index_details AS D
ON
G.index_handle = D.index_handle
Order By PossibleImprovement DESC
PS: the information you obtain is a list of proposals. The final decision lies in you. In addition, DMV stores up to 500 indexes.
We have created indexes to Improve the Performance of unused indexes. However, if the indexes are not used, they become cumbersome.
Keep the SQL Server service running for the same reason as the previous section until all business logic is completed. Run the script:
SELECT
ind.Index_id,
obj.Name as TableName,
ind.Name as IndexName,
ind.Type_Desc,
indUsage.user_seeks,
indUsage.user_scans,
indUsage.user_lookups,
indUsage.user_updates,
indUsage.last_user_seek,
indUsage.last_user_scan,
'drop index [' + ind.name + '] ON [' + obj.name + ']' as
DropIndexCommand
FROM
Sys.Indexes as ind
JOIN
Sys.Objects as obj
ON
ind.object_id=obj.Object_ID
LEFT JOIN
sys.dm_db_index_usage_stats indUsage
ON
ind.object_id = indUsage.object_id
AND
ind.Index_id=indUsage.Index_id
WHERE
ind.type_desc<>'HEAP' and obj.type<>'S'
AND
objectproperty(obj.object_id,'isusertable') = 1
AND
(isnull(indUsage.user_seeks,0) = 0
AND
isnull(indUsage.user_scans,0) = 0
AND
isnull(indUsage.user_lookups,0) = 0)
ORDER BY
obj.name,ind.Name
GO
It is up to you to decide what action to take after obtaining the information. Note the following when you decide to delete an index:
- If the current index is a primary key or a unique key, it can ensure data integrity.
- A unique index, even if it is not used, can provide information to the optimizer to help it generate a better execution plan.
Create an index view to improve performance
A view is a stored query, which is like a table. It has two main advantages:
- Restrict users to access only specific fields and data in a certain table
- Allows developers to customize the original information to form a user-oriented logical view.
When the index view is created, the query statements are parsed/optimized, and relevant information is physically stored in the database. Before deciding to use the index view, consider the following suggestions:
- View should not refer to other views
- Try to refer to any original table
- The field name must be explicitly defined with a proper alias
In addition, if the query for this object is rarely updated, or the original table is a frequently updated table, it is not appropriate to use the index view.
If you have a query that contains a large amount of aggregate/join operations and the table data volume is large, you can consider using the index view. To use the index view, you must set the following parameters (NUMERIC_ROUNDABORT is OFF, and others are ON)
- ARITHABORT
- CONCAT_NULL_YIELDS_NULL
- QUOTED_IDENTIFIER
- ANSI_WARNINGS
- ANSI_NULLS
- ANSI_PADDING
- NUMERIC_ROUNDABORT
Sample script:
CREATE VIEW POView
WITH SCHEMABINDING
AS
SELECT
POH.PurchaseOrderID
,POH.OrderDate
EMP.LoginID
,V.Name AS VendorName
,SUM(POD.OrderQty) AS OrderQty
,SUM(POD.OrderQty*POD.UnitPrice) AS Amount
,COUNT_BIG(*) AS Count
FROM
[Purchasing].[PurchaseOrderHeader] AS POH
JOIN
[Purchasing].[PurchaseOrderDetail] AS POD
ON
POH.PurchaseOrderID = POD.PurchaseOrderID
JOIN
[HumanResources].[Employee] AS EMP
ON
POH.EmployeeID=EMP.BusinessEntityID
JOIN
[Purchasing].[Vendor] AS V
ON
POH.VendorID=V.BusinessEntityID
GROUP BY
POH.PurchaseOrderID
,POH.OrderDate
EMP.LoginID
V.Name
GO
--Create a clustered index on the view to make it an indexed view
CREATE UNIQUE CLUSTERED INDEX IndexPOView ON POView
(PurchaseOrderID)
GO
You can compare the query statement with the execution plan of the query index view. The index view provides better query performance:
The SQL Server Query Optimizer always tries to find the best execution plan. Sometimes, although you have created an index view, the optimizer still uses the index of the original table, in this case, you can use with noexpand to force the index on the index view (instead of the index on the original table ).
The index view is supported in all versions of SQL Server 2012. In the development or Enterprise Edition, the query processor can even optimize the query that matches the index view.
When creating an index view, you must include with schemabinding to ensure that the fields used are not modified.
If the index view contains a group by clause, the SELECT clause must contain COUNT_BIG (*), and HAVING, CUBE, and ROLLUP cannot be specified.
Use indexes on calculated fields (Computed Columns) to improve performance
First, we will introduce Computed Columns, which uses an expression to reference other fields in the same table and then calculate a result. The value of this field will be re-calculated every time it is called, unless you have added the PERSISTED mark when creating the field.
Before deciding whether to create an index on a calculated field, consider the following:
- When the calculated field is Image, Text, or ntext, it can only be used as a non-key column of a non-clustered index)
- The calculated field expression cannot be of the REAL or FLOAT type.
- The calculated field should be accurate (?)
- The calculated field should be determined (the same value is input and the same result is output)
- If the calculated field uses a function, whether it is a user function or a system function, the owner of the table and function must be the same
- Functions (such as SUM and AVG) for multi-row records cannot be used in calculated fields.
- Adding, deleting, modifying, and changing the index value on the calculated field. Therefore, you must set the following six parameters.
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET NUMERIC_ROUNDABORT OFF
Here is a complete example:
1. Set system variables and create our test data table
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET NUMERIC_ROUNDABORT OFF
SELECT
[SalesOrderID]
,[SalesOrderDetailID]
,[CarrierTrackingNumber]
,[OrderQty]
,[ProductID]
,[SpecialOfferID]
,[UnitPrice]
INTO
SalesOrderDetailDemo
FROM
[AdventureWorks2012].[Sales].[SalesOrderDetail]
GO
2. Create a user-defined function, create a calculated field, and use this function.
CREATE FUNCTION
[dbo].[UDFTotalAmount] (@TotalPrice numeric(10,3), @Freight
TINYINT)
RETURNS Numeric(10,3)
WITH SCHEMABINDING
AS
BEGIN
DECLARE @NetPrice Numeric(10,3)
SET @NetPrice = @TotalPrice + (@TotalPrice*@Freight/100)
RETURN @NetPrice
END
GO
--adding computed column SalesOrderDetailDemo table
ALTER TABLE SalesOrderDetailDemo
ADD [NetPrice] AS [dbo].[UDFTotalAmount] ( OrderQty*UnitPrice,5)
GO
3. Create a clustered index, turn on the performance indicator switch, and execute a query (note that we have not yet created an index on the calculated field !)
CREATE Clustered Index idx_SalesOrderID_SalesOrderDetailID_
SalesOrderDetailDemo
ON SalesOrderDetailDemo(SalesOrderID,SalesOrderDetailID)
GO
--checking SalesOrderDetailDemo with statistics option ON to
--measure performance
SET STATISTICS IO ON
SET STATISTICS TIME ON
GO
--checking SELECT statement without having Index on Computed
Column
SELECT * FROM SalesOrderDetailDemo WHERE NetPrice>5000
GO
The output performance is as follows:
SQL Server parse and compile time: CPU time = 650 ms, elapsed time = 650 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms. (3864 row (s) affected) Table 'salesorderdetaildemo'. Scan count 1, logical reads 757, Physical reads 0, read-ahead reads 0, lob logical reads 0, lob Physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 562 ms, elapsed time = 678 ms. |
4. Before creating an index on a calculated field, you can use the following script to determine whether the index meets the previously mentioned creation requirements: (Return Value: 0 does not meet, 1 Does)
SELECT
COLUMNPROPERTY( OBJECT_ID('SalesOrderDetailDemo'),'NetPrice','IsIndexable') AS 'Indexable?'
,COLUMNPROPERTY( OBJECT_ID('SalesOrderDetailDemo'),'NetPrice','IsDeterministic') AS 'Deterministic?'
,OBJECTPROPERTY(OBJECT_ID('UDFTotalAmount'),'IsDeterministic')'UDFDeterministic?'
,COLUMNPROPERTY(OBJECT_ID('SalesOrderDetailDemo'),'NetPrice','IsPrecise') AS 'Precise?'
5. Create an index when the requirements are met, and execute the previous query statement again
CREATE INDEX idx_SalesOrderDetailDemo_NetPrice
ON SalesOrderDetailDemo
(
NetPrice
)
GO
SELECT * FROM SalesOrderDetailDemo WHERE NetPrice>5000
GO
The performance result is as follows:
SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms. (3864 row (s) affected) Table 'salesorderdetaildemo'. Scan count 1, logical reads 757, Physical reads 0, read-ahead reads 0, lob logical reads 0, lob Physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 546 ms, elapsed time = 622 ms. |
Confirm disk space occupied by the index
SELECT
CASE index_id
WHEN 0 THEN 'HEAP'
WHEN 1 THEN 'Clustered Index'
ELSE 'Non-Clustered Index'
END AS Index_Type,
SUM(CASE
WHEN FilledPage > PageToDeduct THEN (FilledPage-PageToDeduct)
ELSE
0
END )* 8 Index_Size
FROM
(
SELECT
partition_id,
index_id,
SUM (used_page_count) AS FilledPage,
SUM (
CASE
WHEN (index_id < 2) THEN (in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count)
ELSE
lob_used_page_count + row_overflow_used_page_count
END
) AS PageToDeduct
FROM
sys.dm_db_partition_stats
GROUP BY
partition_id,index_id
) AS InnerTable
GROUP BY
index_id
GO
PS: the unit of output result is KB.