In application systems, especially online transaction processing (OLTP) systems, query and data processing speed has become a key measure of the system's quality.
Using indexes to speed up data processing is the optimization method that most database users have come to accept.
Given a sound database design, efficient use of indexes is the foundation of SQL Server's high performance. SQL Server uses a cost-based optimization model to decide, for each query submitted against a table, whether to use an index and which one. Because most of the cost of query execution is disk I/O, one of the primary goals of indexing is to avoid full table scans: a full table scan has to read every data page of the table from disk, whereas if an index points to the data value, the query needs only a few disk reads. So with a well-chosen index in place, the optimizer can use it to speed up the query.
However, indexes do not always improve system performance. Every index adds work to insert, delete, and update operations, so tuning a poorly performing SQL Server application means both adding appropriate indexes and removing suboptimal ones. Practice shows that sound index design rests on analyzing and predicting the queries the application will run; the best result comes only when the indexes and the program fit together correctly. This article analyzes the performance issues of SQL Server indexes and reports practical experience with them.
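One way to see how much disk I/O a query costs, and so whether it is scanning the whole table, is to turn on I/O statistics before running it. The following is a minimal sketch only: the table dbo.orders and its columns are hypothetical names chosen for illustration, and the batch separator GO assumes a client such as sqlcmd or Management Studio.

```sql
-- Hypothetical table used only for illustration.
IF OBJECT_ID('dbo.orders', 'U') IS NULL
    CREATE TABLE dbo.orders (order_id INT NOT NULL, pri_order INT NOT NULL,
                             customer VARCHAR(50) NOT NULL, amount DECIMAL(10, 2) NOT NULL);
GO

-- Report the page reads performed by each statement that follows.
SET STATISTICS IO ON;
GO

-- With no index on pri_order, every data page of the table has to be read.
SELECT order_id, amount
FROM dbo.orders
WHERE pri_order = 150;
GO

SET STATISTICS IO OFF;
GO
```

Comparing the reported logical reads before and after adding an index on the filtered column gives a rough indication of whether the optimizer is using that index.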
I. Use of clustered indexes
A clustered index physically reorders the actual data on disk by the values of one or more specified columns. Because the index pages of a clustered index point directly to the data pages, finding data through a clustered index is almost always faster than through a nonclustered index.
Only one clustered index can be built per table, and building it requires free space equal to at least 120% of the table's size, to hold a copy of the table and the intermediate pages of the index. Guidelines for building a clustered index:
1. Most tables should have a clustered index, or use partitioning, to reduce contention at the end of the table. In a high-transaction environment, locking on the last page severely limits system throughput.
2. Under a clustered index, data is physically stored in sorted order on the data pages, and duplicate values are grouped together. For a query that contains a range check (BETWEEN, <, <=, >, >=) or a GROUP BY or ORDER BY, once the row with the first key value in the range is found, the rows with subsequent key values are guaranteed to be physically adjacent, so no further searching is needed. This avoids large scans and greatly improves query speed.
3. When building a clustered index on a table with frequent inserts, do not build it on a monotonically increasing column (such as an identity column), or the inserts will frequently run into lock contention.
4. Do not include frequently modified columns in the clustered index, because a data row must be moved to a new location whenever its key value changes.
5. Choose the clustered index key based on the WHERE clauses and join operations the queries use.
The candidate columns for the clustered index are:
1. Columns that are accessed by range, such as pri_order > 100 and pri_order < 200.
2. Columns used in GROUP BY or ORDER BY.
3. Columns used in joins.
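As an illustration of a range-accessed candidate column, the sketch below builds a clustered index on a range column and runs a range query against it. The table dbo.orders, its columns other than the range column, and the index name cix_orders_pri_order are hypothetical; only the range condition itself comes from the list above.

```sql
-- Hypothetical table used only for illustration.
IF OBJECT_ID('dbo.orders', 'U') IS NULL
    CREATE TABLE dbo.orders (order_id INT NOT NULL, pri_order INT NOT NULL,
                             customer VARCHAR(50) NOT NULL, amount DECIMAL(10, 2) NOT NULL);
GO

-- Physically order the table by pri_order (only one clustered index per table).
CREATE CLUSTERED INDEX cix_orders_pri_order
    ON dbo.orders (pri_order);
GO

-- Range query: the qualifying rows are stored next to each other,
-- and they are already in the order requested by ORDER BY.
SELECT order_id, pri_order, amount
FROM dbo.orders
WHERE pri_order > 100 AND pri_order < 200
ORDER BY pri_order;
GO
```

Whether pri_order is really the best clustering key depends on the workload; a column that rises monotonically on insert or is modified often would conflict with guidelines 3 and 4.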
II. Use of nonclustered indexes
By default, the indexes SQL Server creates are nonclustered. A nonclustered index does not reorganize the data in the table; instead, it stores the indexed column values for each row together with a pointer to the page where the row is located. In other words, a nonclustered index has an extra level between the index structure and the data itself. A table without a clustered index can have up to 250 nonclustered indexes (the exact limit depends on the SQL Server version), each providing a different sort order for accessing the data. When building a nonclustered index, weigh the query speedup it brings against the slowdown it causes for data modifications. Also consider these questions:
1. How much space will the index use?
2. Are the candidate columns stable?
3. How is the index key chosen, and how does it affect scans?
4. Are there many duplicate values?
For frequently updated tables, nonclustered indexes carry more overhead than either a clustered index or no index at all. For each row that is moved to a new page, the leaf-level row of every nonclustered index that points to it must be updated, and sometimes index pages have to be split as well. Deleting data from a page has a similar overhead, and the delete must also shift the remaining rows toward the top of the page to keep the data contiguous. Nonclustered indexes should therefore be created with care. They are typically used in the following situations:
1. A column is often used in aggregate functions such as SUM().
2. A column is often used in JOIN, ORDER BY, or GROUP BY.
3. The query returns no more than 20% of the rows in the table.
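The situations above can be sketched with a nonclustered index on a column that is used for grouping and filtering. This is an illustration under assumptions: the table dbo.orders, the column customer, the index name ix_orders_customer, and the literal 'Contoso' are all hypothetical.

```sql
-- Hypothetical table used only for illustration.
IF OBJECT_ID('dbo.orders', 'U') IS NULL
    CREATE TABLE dbo.orders (order_id INT NOT NULL, pri_order INT NOT NULL,
                             customer VARCHAR(50) NOT NULL, amount DECIMAL(10, 2) NOT NULL);
GO

-- A separate index structure; the table's data pages are not reordered.
CREATE NONCLUSTERED INDEX ix_orders_customer
    ON dbo.orders (customer);
GO

-- A selective filter plus an aggregate: a good fit when one customer's rows
-- are only a small fraction of the table.
SELECT customer, SUM(amount) AS total_amount
FROM dbo.orders
WHERE customer = 'Contoso'
GROUP BY customer;
GO
```

If a single customer accounted for a large share of the rows, the optimizer might well prefer a table scan over this index.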
III. Index selection techniques
1. The primary key is often used as a condition in WHERE clauses, so a clustered index should be built on the table's primary key, especially when the key is frequently used in joins.
2. Consider a clustered index on columns that have many duplicate values and are frequently used in range queries, sorting, or grouping, or on columns that are accessed very frequently.
3. If you know that all values of the index key are unique, be sure to define the index as a unique index.
4. When indexing a table with frequent inserts, use FILLFACTOR (the fill factor) to reduce page splits; this also improves concurrency and reduces deadlocks. For an index on a read-only table, FILLFACTOR can be set to 100.
5. When choosing an index key, prefer columns with small data types, so that each index page holds as many keys and pointers as possible; this minimizes the number of index pages a query has to traverse. Also use integer keys wherever possible, because integers offer faster access than any other data type.
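Points 3 to 5 can be combined in a single statement, sketched here under assumptions: the table dbo.orders and the index name uix_orders_order_id are hypothetical, the fill factor of 80 is an arbitrary illustrative value rather than a recommendation, and the parenthesized WITH (...) option syntax assumes SQL Server 2005 or later.

```sql
-- Hypothetical table used only for illustration.
IF OBJECT_ID('dbo.orders', 'U') IS NULL
    CREATE TABLE dbo.orders (order_id INT NOT NULL, pri_order INT NOT NULL,
                             customer VARCHAR(50) NOT NULL, amount DECIMAL(10, 2) NOT NULL);
GO

-- Unique index on a small integer key; leaf pages are filled to 80%
-- so that later inserts cause fewer page splits.
CREATE UNIQUE NONCLUSTERED INDEX uix_orders_order_id
    ON dbo.orders (order_id)
    WITH (FILLFACTOR = 80);
GO
```

For an index on a read-only table, the same statement with FILLFACTOR = 100 packs the pages completely.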
IV. Index maintenance
As mentioned above, unsuitable indexes hurt SQL Server performance. As the application runs, the data keeps changing, and once it has changed enough, the effectiveness of the indexes suffers. Users therefore need to maintain the indexes themselves. Index maintenance includes:
1. Rebuilding indexes
As rows are inserted and deleted and data pages split, some index pages may end up holding only a few rows. In addition, for queries that use large-block I/O, rebuilding a nonclustered index reduces fragmentation and keeps large I/O efficient. Rebuilding an index is in effect a reorganization of the B-tree space. An index needs to be rebuilt under the following conditions:
(1) The data and usage patterns have changed significantly.
(2) The sort order has changed.
(3) A large number of insert operations are about to be performed or have just been completed.
(4) Queries that use large I/O perform more disk reads than expected.
(5) A large number of data modifications have left data pages and index pages only partly used, so that space usage exceeds estimates.
(6) DBCC reports a problem with the index.
When the clustered index is rebuilt, all nonclustered indexes on this table will be rebuilt.
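A rebuild might look like the following sketch. It assumes SQL Server 2005 or later for sys.dm_db_index_physical_stats and ALTER INDEX; the table dbo.orders and the fill factor of 80 are hypothetical. On older versions, DBCC DBREINDEX serves the same purpose.

```sql
-- Hypothetical table used only for illustration.
IF OBJECT_ID('dbo.orders', 'U') IS NULL
    CREATE TABLE dbo.orders (order_id INT NOT NULL, pri_order INT NOT NULL,
                             customer VARCHAR(50) NOT NULL, amount DECIMAL(10, 2) NOT NULL);
GO

-- Check how fragmented each index on the table is before deciding to rebuild.
SELECT i.name AS index_name,
       s.avg_fragmentation_in_percent,
       s.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.orders'), NULL, NULL, 'LIMITED') AS s
JOIN sys.indexes AS i
    ON i.object_id = s.object_id AND i.index_id = s.index_id;
GO

-- Rebuild every index on the table, leaving 20% free space on leaf pages.
ALTER INDEX ALL ON dbo.orders REBUILD WITH (FILLFACTOR = 80);
GO

-- Older equivalent: DBCC DBREINDEX ('dbo.orders', '', 80);
```

Rebuilding takes locks on the table and consumes I/O, so it is usually scheduled for periods of low activity.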
2. Updating index statistics
When an index is created on a table that contains data, SQL Server creates a distribution page holding two statistical structures for the index: the distribution table and the density table. The optimizer uses this page to judge whether the index is useful for a particular query. These statistics are not recalculated dynamically, however, so after the table's data changes they may be out of date, which undermines the optimizer's ability to find the best plan. Therefore, run the UPDATE STATISTICS command in the following situations:
(1) Row inserts and deletes have changed the distribution of the data.
(2) Rows are added to a table that was emptied with TRUNCATE TABLE.
(3) The values of indexed columns have been modified.
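A minimal sketch of refreshing and inspecting statistics follows. The table dbo.orders and the index ix_orders_customer are hypothetical names, and the sys.indexes catalog view used in the existence check assumes SQL Server 2005 or later.

```sql
-- Hypothetical table and index used only for illustration.
IF OBJECT_ID('dbo.orders', 'U') IS NULL
    CREATE TABLE dbo.orders (order_id INT NOT NULL, pri_order INT NOT NULL,
                             customer VARCHAR(50) NOT NULL, amount DECIMAL(10, 2) NOT NULL);
GO
IF NOT EXISTS (SELECT 1 FROM sys.indexes
               WHERE object_id = OBJECT_ID('dbo.orders')
                 AND name = 'ix_orders_customer')
    CREATE NONCLUSTERED INDEX ix_orders_customer ON dbo.orders (customer);
GO

-- Refresh the statistics of every index on the table.
UPDATE STATISTICS dbo.orders;
GO

-- Refresh one index's statistics by reading all rows instead of a sample.
UPDATE STATISTICS dbo.orders ix_orders_customer WITH FULLSCAN;
GO

-- Show the distribution and density information the optimizer will use.
DBCC SHOW_STATISTICS ('dbo.orders', ix_orders_customer);
GO
```

FULLSCAN gives the most accurate statistics but reads the whole table; the default sampled update is cheaper on large tables.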