SQL Server Query Performance optimization--Creating index Principles (ii)

Last Update:2017-05-16 Source: Internet

Author: User

Tags sql server query

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Three: Principles of index establishment

In general, indexing depends on the scenario in which data is used, and in other words, the SQL statements that access the data are commonly used, and whether the statements are inefficient because of missing indexes (and possibly too many indexes). But not all SQL statements are indexed, and if all the SQL statements are indexed, it can result in too many indexes being built up.

I've come across thousands of new records per second, although the data table has only a clustered index, but because the values of the existing key-value field and the value of the new data-key field are not incremented sequentially, each time a new record is added, the overall data row is definitely rearranged. Performance increases by about 20% after the clustered index is removed. I've seen 20 indexes on a single data sheet, and it takes 4 seconds for a new record to complete.

In another case, the sales data changes in the POS system, to update multiple data tables at the same time, each table has hundreds of thousands of records above, but the where field used does not have a valid index, in addition to the slow lookup, resulting in a large number of locks on the update. After each data table plus the proper index, it will take a few 10 seconds for the update to be completed in less than a second.

In addition, depending on the time of use of the database, it is possible to set up an index first and then delete it. For example, when you are making annual and quarterly reports that require a large amount of data to be queried, consider establishing a relevant index before generating the report. However, after the report is generated, in order to ensure that the usual new, modified and deleted operation efficiency, and then delete in order to generate the report related indexes.

There are a few things you can refer to if the SQL statement or view is worth indexing.

First, selectivity

Optionally, the percentage of records that match your query criteria, i.e.

selectivity = number of records/Total records that meet the query criteria

The smaller the better, the smaller the higher the selection, and the more suitable the index. For example:

SELECT * from Wbk_goods_info where cop_g_no= ' 00078027 '

There are only 1 records in the Wbk_goods_info table that meet this condition, and the entire data table has 100,000 records, so the selectivity of this query is 1/100000, which represents very high selectivity, so it is efficient to find data by index. Conversely, or your statement is as follows:

Select * from Wbk_goods_info with (index (IDX_COP_G_NO)) where cop_g_no> ' 00018000 '

At this time the matching query records up to 82,000, the selectivity to 82000/100000, indicating that the selectivity is very low, unless the index established by the COP_G_NO character Chewei value is a clustered index, otherwise, if the nonclustered index is accessed, it becomes necessary to read at least 82,000 data pages, because the SQL When the server reads each record, it reads the entire data page first (remember, this is the basic unit of SQL Server reading data) and then the target record is fetched from it. Even if two records exist on the same data page, read the data page two times. Therefore, when selectivity is low, access through nonclustered indexes is a very inefficient way to access, rather than a direct full table scan.

Second, data density

Data density is one of the only record bars of a key value, which means

Data density =1/number of records with unique key value

Test by using the following statement: Index the Cop_g_no field in the Wbk_pde_list_org_histroy data table, and then use the DBCC SHOW_STATISTICS statement to query the records of the statistics on that index in the storage system. The above formula is then applied to test whether the value of the all density field stored in the system is the same:

--Create an index

CREATE nonclustered INDEX [idx_wbk_pde_list_cop_g_no] on [dbo]. [Wbk_pde_list_org_histroy]

(

[Cop_g_no] Asc

)

INCLUDE ([WBOOK_NO],[G_NO],[CODE_T],[UNIT_1],[TRADE_TOTAL],[GROSS_WT])

--Returns the value of the All desity field

DBCC show_statistics (' Wbk_pde_list_org_histroy ', ' idx_wbk_pde_list_cop_g_no ')

--Calculates the value of the All desity field

Select 1.0/(select COUNT (distinct cop_g_no) from Wbk_pde_list_org_histroy) [All Density]

When the data density is smaller, that is, the uniqueness is higher, the more appropriate to index the field, because when the total number of data is multiplied by the density value, it is the average number of records that are generally queried.

Third, data distribution

The distribution of data represents the way in which multiple data records are composed, which are related to the concept of density. It represents whether the data records are evenly scattered over a certain range, or are concentrated in some chunks. Its distribution is as follows.

For our example, the cargo number of each cargo is self-increasing and unique, that is, the Cargo information table (WBK_GOODS_INFO) has 100000 kinds of goods, in multiples of 2000 is the range of the segmentation point, the number of records in the data ranges are equal, this distribution is called the average distribution. or data type, it is quite easy and precise to calculate whether a query condition is very selective.

If the data is a standard distribution, that is, the data in some range, and some in the range of less, in this case, that is, some goods have a lot of sales records, some goods may basically have no sales records, the index needs to have more detailed statistics to record a range of data roughly how many records, When the query optimizer determines whether an index is applicable to a query, it is possible to determine exactly whether the field is highly selective to determine the index to use.

This is also when you observe the DBCC SHOW_STATISTICS (such as), if the distribution record of the presentation has a number of bars, indicating that the key value in the entire record is a standard distribution, so need to record the number of sections to more accurately determine the number of records that meet the criteria, if only a few three or four strokes, Represents a near-average distribution, only the state of the average distribution needs to be described.

Iv. Validity of the index

After establishing the corresponding index based on the above three principles, we will look at how to observe whether the query becomes more efficient and the index is used more efficiently after indexing.

--No index in the case

Set Statistics IO on

Select [Wbook_no], [Cop_g_no], [G_no]

, [Contr_item], [code_s], [code_t]

, [G_name], [G_model], [G_qty], [G_unit], [qty_1], [unit_1], [Trade_curr], [Decl_price], [T Rade_total], [GROSS_WT], [NET_WT] from Wbk_pde_list_org_histroy C

Where c.wbook_no= ' BE404942450001 ' and c.cop_g_no= ' 60196928 ' and qty_1>15

Select * from sys.dm_db_missing_index_groups

Select * from Sys.dm_db_missing_index_group_stats

Select * from Sys.dm_db_missing_index_details

Select Mig.*,statement as Table_name,column_id,column_name,column_usage

From Sys.dm_db_missing_index_details as mid

Cross apply Sys.dm_db_missing_index_columns (Mid.index_handle)

Inner join Sys.dm_db_missing_index_groups as MIG on Mig.index_handle=mid.index_handle

Order by mig.index_group_handle,mig.index_handle,column_id

---After the index is established, execute the above statement again.

Next, the sys_dm_db_index_usage_stats can be used to see if too many indexes have been generated.

--inserting data affects the index

Insert Wbk_pde_list_org_histroy

Select ' BE404942451001 ', ' 60196928 ', ' 11427 ', ' 305 ', ' 92 ', ' 52083200 '

, NULL, ' cloth ', null, ' 215 ', ' 011 ', ' 011 ', null,null,null,10.82,270.5,null,null,null,5,3.8

Table ' Wbk_pde_list_org_histroy '. Scan count 0, logical read 17 times, physical read 5 times, read 0 times, LOB logic read 0 times, lob physical read 0 times, lob read 0 times.

--Updating the data will affect the index

--by Pk_wbk_pde_list_org_histroy

--idx_wbk_pde_list_qty1

--idx_wbk_pde_list_cop_g_no Index Scan wbook_no= ' BE404942451001 ' records

Update Wbk_pde_list_org_histroy Set qty_1=50000

where wbook_no= ' BE404942451001 '

--table ' Wbk_pde_list_org_histroy '. Scan count 1, logical read 9 times, physical read 0 times, read 0 times, LOB logic read 0 times, lob physical read 0 times, lob read 0 times.

Select * from Sys.dm_db_index_usage_stats where object_id=object_id (' Wbk_pde_list_org_histroy ')

Figure 1, Index usage distribution map

Figure 2, Index name

As you can see from 1, the sys.dm_db_index_usage_stats system view refers to a query that uses the index to find, scan, find, or update operations that are counted as one use of the index, and each use adds 1 to the corresponding counter in the view. It separates the accumulated information for actions caused by user-submitted queries, as well as actions caused by queries generated internally by the system (for example, scanning to collect statistical data). Because the INSERT statement above affects all previously established indexes, the User_updates field for index_id equals 1, 6, 10 is 1 (see Figure 1, 2). The UPDATE statement updates the Qty_1 field in the datasheet, but does not update the Cop_g_no field, so only the index_id equals 1 and 6 of the row, and the User_updates field of the two lines is 2 (see Figure 1, 3). The Where condition of the UPDATE statement takes advantage of an index of index_id equal to 1, as shown in the User_seeks value of 1 (see Figure 1, 3).

The User_updates field refers to the number of updates caused by an INSERT, update, or delete operation on the underlying data table or view. This data can be used to determine if an index is rarely used by an application. If the number of updates to the index (user_updates) value is large, then the amount of maintenance generated is greater, see the number of searches (user_seeks) and Bookmark Lookup Operations (user_lookups), if these two values are small, you might consider dropping the index.

When you restart the SQL Server service, the various counters in the Sys.dm_db_index_usage_stats system view are initialized to null values. In addition, whenever data is detached or closed (for example, because Auto_Close is set to ON), all data rows associated with the database are deleted. When an index is first used, it is added to the system's statistics, and Sys.dm_db_index_usage_stats then sees the data rows that represent the index, at which point the initial set value of the counters is zero.

Finally, again, "Water can carry boats, can also be overturn it", the index is the same. Indexes can help improve retrieval performance, but too many or improper indexes can cause system inefficiencies. Because the user adds an index to the table, the database will do more work. Too many indexes can even cause index fragmentation.

So, we want to build an "appropriate" index system, especially for the creation of clustered indexes, should be better, so that your database can get high performance.

Because nonclustered indexes need to find a pointer to each row in the B-Tree of a nonclustered index, and then go to the table where the data is found, performance can be compromised, and sometimes not even a nonclustered index.

SQL Server Query Performance optimization--Creating index Principles (ii)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More