The previous article, SQL Server Query Performance Optimization: Index Creation Principles (I), introduced the general principles. This article covers the main principles in more detail and shows how to check the indexes you have created.
III. Indexing Principles
In general, whether to build an index depends on how the data is used: which SQL statements commonly access the data, and whether those statements run inefficiently because an index is missing (or because there are too many indexes). Not every SQL statement needs its own index; if you create an index for every statement, you will end up with far too many.
I once encountered a table receiving a large number of new records per second. Although the table had only a clustered index, the key values of newly inserted rows did not increase sequentially relative to the existing rows, so each insert forced the data rows to be physically rearranged. After the clustered index was removed, performance improved by about 20%. I have also encountered a table with 20 indexes, where inserting a single new record took four seconds.
In another case, in a POS system, sales data changes had to be written to multiple tables at the same time. Each table had more than 100,000 records, but the columns used in the WHERE clause had no usable index; besides the slow lookups, a large number of locks were generated during the updates. After the corresponding indexes were added to each table, the multi-table update, which had previously taken dozens of seconds, ran quickly.
In addition, you can create indexes temporarily, based on when the database is actually used. For example, before generating an annual or quarterly report that queries a large amount of data, create the indexes the report needs; after the report is generated, drop those indexes so that insert, update, and delete operations stay efficient.
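As a sketch, the lifecycle of such a temporary report index might look like this (the table, column, and index names here are hypothetical, not from the original system):

```sql
-- Create the index only for the reporting window
CREATE NONCLUSTERED INDEX idx_Report_Temp
    ON dbo.SalesDetail (ReportDate) INCLUDE (Amount);

-- ... run the annual/quarterly report queries here ...

-- Drop it afterwards so inserts/updates/deletes are not slowed down
DROP INDEX idx_Report_Temp ON dbo.SalesDetail;
```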
To judge whether an index is worth creating for a given SQL statement or view, consider the following aspects.
1. Selectivity
Selectivity is the fraction of records that satisfy the query condition out of the total number of records, that is:
Selectivity = number of records matching the query condition / total number of records
The smaller this value, the higher the selectivity, and the more suitable an index is. For example:
SELECT * FROM WBK_Goods_Info WHERE COP_G_NO = '201312'
Only one record in the WBK_Goods_Info table meets this condition, and the whole table has 100,000 records, so the selectivity is 1/100,000, which is very high; an index can find the data very efficiently. On the other hand, suppose your statement is:
SELECT * FROM WBK_Goods_Info WITH (INDEX (idx_cop_g_no)) WHERE COP_G_NO > '123'
In this case, 82,000 records match, so the selectivity is 82,000/100,000, which is very low. Unless the index keyed on COP_G_NO is a clustered index, accessing the data through a non-clustered index requires reading up to 82,000 data pages, because for each record SQL Server reads the entire data page the record sits on (remember, the page is SQL Server's basic unit of I/O) and then extracts the target record from it. Even if two matching records are on the same data page, that page may be read twice. So when selectivity is very low, non-clustered index access is very inefficient, and a full table scan is often faster.
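To measure the selectivity of a concrete predicate, you can compute it directly; a minimal sketch using the article's table and the COP_G_NO > '123' condition:

```sql
-- Selectivity = matching rows / total rows; smaller is more selective
SELECT
    CAST(SUM(CASE WHEN COP_G_NO > '123' THEN 1 ELSE 0 END) AS float)
        / COUNT(*) AS selectivity
FROM WBK_Goods_Info;
```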
2. Data Density
Data density is the reciprocal of the number of unique key values, that is:
Data density = 1 / number of unique key values
You can verify this with the following statements: create an index on the COP_G_NO column of the WBK_PDE_LIST_ORG_HISTROY table, query the statistics SQL Server stores for that index with DBCC SHOW_STATISTICS, and then apply the formula above to check that it matches the All density value stored by the system:
-- Create an index
CREATE NONCLUSTERED INDEX [idx_WBK_PDE_LIST_COP_G_NO] ON [dbo].[WBK_PDE_LIST_ORG_HISTROY]
(
[COP_G_NO] ASC
)
INCLUDE ([WBOOK_NO], [G_NO], [CODE_T], [UNIT_1], [TRADE_TOTAL], [GROSS_WT])
-- Return the value of the All density field
DBCC SHOW_STATISTICS ('WBK_PDE_LIST_ORG_HISTROY', 'idx_WBK_PDE_LIST_COP_G_NO')
-- Calculate the value of the All density field manually
SELECT 1.0 / (SELECT COUNT(DISTINCT COP_G_NO) FROM WBK_PDE_LIST_ORG_HISTROY) AS [All Density]
The smaller the data density (that is, the higher the uniqueness), the more suitable the column is for indexing, because the total number of rows multiplied by the density gives, roughly, the average number of rows returned per key value.
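Putting the two formulas together, the density and the implied average number of rows per key value can be computed in one query; a sketch against the article's table:

```sql
-- Density = 1 / distinct keys; total rows * density = average rows per key
SELECT
    1.0 / COUNT(DISTINCT COP_G_NO)              AS [All Density],
    COUNT(*) * (1.0 / COUNT(DISTINCT COP_G_NO)) AS AvgRowsPerKey
FROM WBK_PDE_LIST_ORG_HISTROY;
```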
3. Data Distribution
Data distribution describes how the records are composed across key values, a concept related to density: it indicates whether records are spread evenly across the whole key range or concentrated in certain blocks. Consider the following cases.
In our example, each goods number is self-incrementing and unique, and the goods information table (WBK_Goods_Info) holds 100,000 kinds of goods, so if you split the key range at multiples of 2,000, each range contains an equal number of records. This is called a uniform distribution. When data (or a data type) is distributed this way, it is easy and accurate to calculate whether a query condition is highly selective.
If the data is unevenly distributed, with many records in some ranges and few in others (in this example, some goods have many sales records while others have almost none), the index needs more detailed statistics that record the approximate number of records in each range. The query optimizer can then accurately judge whether the column is selective enough for a given query and decide which index to use.
You can observe this in the DBCC SHOW_STATISTICS output. If the histogram contains many steps, the key values are unevenly distributed across the records, so per-range record counts are needed to accurately estimate how many records meet a condition. If there are only three or four steps, the distribution is close to uniform, and a few steps are enough to describe it.
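To inspect those per-range statistics yourself, DBCC SHOW_STATISTICS can output just the histogram; each step shows a boundary key value and the row counts around it:

```sql
-- Show only the histogram steps for the index statistics
DBCC SHOW_STATISTICS ('WBK_PDE_LIST_ORG_HISTROY', 'idx_WBK_PDE_LIST_COP_G_NO')
    WITH HISTOGRAM;
-- Key columns: RANGE_HI_KEY (upper bound key of the step),
-- EQ_ROWS (rows equal to RANGE_HI_KEY), RANGE_ROWS (rows strictly inside the step)
```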
4. Index Validity
After creating indexes based on the three principles above, let's look at how to check whether the query statement has become more efficient and whether the indexes themselves are being used effectively.
-- Without an index
SET STATISTICS IO ON
SELECT [WBOOK_NO], [COP_G_NO], [G_NO]
, [CONTR_ITEM], [CODE_S], [CODE_T]
, [G_NAME], [G_MODEL], [G_QTY], [G_UNIT], [QTY_1], [UNIT_1], [TRADE_CURR], [DECL_PRICE], [TRADE_TOTAL], [GROSS_WT], [NET_WT]
FROM WBK_PDE_LIST_ORG_HISTROY c
WHERE c.WBOOK_NO = 'be4049000050001' AND c.COP_G_NO = '000000' AND QTY_1 > 15
SELECT * FROM sys.dm_db_missing_index_groups
SELECT * FROM sys.dm_db_missing_index_group_stats
SELECT * FROM sys.dm_db_missing_index_details
SELECT mig.*, statement AS table_name, column_id, column_name, column_usage
FROM sys.dm_db_missing_index_details AS mid
CROSS APPLY sys.dm_db_missing_index_columns (mid.index_handle)
INNER JOIN sys.dm_db_missing_index_groups AS mig ON mig.index_handle = mid.index_handle
ORDER BY mig.index_group_handle, mig.index_handle, column_id
-- After the index is created, execute the preceding statements again and compare.
Next, use sys.dm_db_index_usage_stats to check whether too many indexes have been created.
-- Inserting data affects the indexes
INSERT WBK_PDE_LIST_ORG_HISTROY
SELECT 'be404934751001', '123', '123', '123', '92', '123'
, NULL, 'bry', NULL, '2017011', 25, '011', 25, '011', NULL, 215, 10.82, NULL, NULL, NULL, 5, 3.8
-- Table 'WBK_PDE_LIST_ORG_HISTROY'. Scan count 0, logical reads 17, physical reads 5, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- Updating data affects the indexes
-- (indexes on the table: PK_WBK_PDE_LIST_ORG_HISTROY,
-- idx_WBK_PDE_LIST_QTY1, idx_WBK_PDE_LIST_COP_G_NO);
-- the WBOOK_NO = 'be404934751001' rows are located by an index seek
UPDATE WBK_PDE_LIST_ORG_HISTROY SET QTY_1 = 50000
WHERE WBOOK_NO = 'be404934751001'
-- Table 'WBK_PDE_LIST_ORG_HISTROY'. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SELECT * FROM sys.dm_db_index_usage_stats WHERE object_id = OBJECT_ID('WBK_PDE_LIST_ORG_HISTROY')
Figure 1 index usage distribution chart
Figure 2 Index name
As the sys.dm_db_index_usage_stats system view shows, each seek, scan, lookup, or update that a query performs against an index counts as one use of that index and increments the corresponding counter by 1. The view accumulates counts separately for queries submitted by users and for operations generated internally by the system (for example, scans performed to gather statistics). Because the preceding INSERT statement affects all of the indexes created earlier, the user_updates field of the rows with index_id 1, 6, and 10 is 1 (see Figure 2). The UPDATE statement modifies the QTY_1 field but not the COP_G_NO field, so only the rows with index_id 1 and 6 are affected, and their user_updates field becomes 2 (see Figure 1). The WHERE clause of the UPDATE uses the index with index_id 1, so that row's user_seeks value is 1 (see Figure 1).
The user_updates field records how many index updates were caused by INSERT, UPDATE, or DELETE operations on the underlying table or view. This data helps determine whether an index is rarely used by the application: if an index's update count (user_updates) is large, its maintenance cost is high; if, in addition, its seek count (user_seeks) and bookmark-lookup count (user_lookups) are both small, you can consider deleting the index.
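Building on these fields, a query along the following lines can flag candidate indexes for removal; the thresholds here are arbitrary placeholders that should be tuned to your workload:

```sql
-- Indexes with high maintenance cost but little read use in the current database
SELECT o.name AS table_name, i.name AS index_name,
       s.user_updates, s.user_seeks, s.user_scans, s.user_lookups
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.objects AS o ON o.object_id = s.object_id
JOIN sys.indexes AS i ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
  AND s.user_updates > 1000                                  -- heavily maintained
  AND (s.user_seeks + s.user_scans + s.user_lookups) < 10    -- rarely read
ORDER BY s.user_updates DESC;
```

Remember that these counters reset on a service restart, so judge an index only after the server has run a representative workload.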
When the SQL Server service is restarted, all counters in the sys.dm_db_index_usage_stats system view are reset to empty. In addition, whenever a database is detached or shut down (for example, because AUTO_CLOSE is set to ON), all rows associated with that database are removed. When an index is then used for the first time, it is added back to the statistics, and sys.dm_db_index_usage_stats gets a new row representing that index, with every counter starting at zero.
Finally, let me reiterate: an index, like water, "can carry the boat but can also capsize it." Indexing helps search performance, but too many or inappropriate indexes reduce system efficiency, because every index added to a table gives the database more work to do on each modification. Too many indexes can even lead to index fragmentation.
Therefore, we need to build an "appropriate" set of indexes, being especially careful with the clustered index, so that the database can deliver high performance.
Because a non-clustered index lookup must traverse the index's B-tree to find each row's pointer and then fetch the row from the table where it resides, performance suffers considerably when many rows are involved; sometimes it is better not to add a non-clustered index at all.