In application systems, especially online transaction processing (OLTP) systems, the speed of data queries and processing has become a standard measure of success or failure. Using indexes to speed up data processing is an optimization method that most database users have accepted.
Built on a good database design, efficient use of indexes is the foundation of high performance in SQL Server. SQL Server uses a cost-based optimization model: for every table referenced in a submitted query, it decides whether to use an index and which index to use. Because most of the cost of executing a query is disk I/O, one of the primary goals of using an index is to avoid full table scans. A full table scan has to read every page of the table from disk, whereas if an index points to the data value, the query needs only a few disk reads. So if reasonable indexes exist, the optimizer can use them to speed up data queries. However, indexes do not always improve system performance. Inserts, deletes, and updates on an indexed table carry extra work to maintain the indexes. Adding the right indexes in the right places, and removing suboptimal indexes from the wrong places, therefore helps tune poorly performing SQL Server applications. Practice shows that sound index design rests on analyzing and predicting the various queries; only when the indexes match the program correctly does an optimal scheme result. This article analyzes and tests several aspects of SQL Server index performance.
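The difference between a full table scan and an index lookup can be observed directly in a query plan. The sketch below is only an illustration: it uses SQLite (through Python's standard sqlite3 module) as a stand-in for SQL Server so that it runs anywhere, and the table layout, sample rows, and the index name idx_op_date are assumptions made for the example.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified version of the p_detail table.
conn.execute("CREATE TABLE p_detail (id INTEGER, op_date TEXT, pri_surplus1 REAL)")
conn.executemany(
    "INSERT INTO p_detail VALUES (?, ?, ?)",
    [(i, f"1999{(i % 12) + 1:02d}15", float(i)) for i in range(1000)],
)

sql = ("SELECT COUNT(*) FROM p_detail "
       "WHERE op_date > '19990101' AND op_date < '19990301'")

plan_without_index = query_plan(conn, sql)   # planner has to scan the table
conn.execute("CREATE INDEX idx_op_date ON p_detail (op_date)")
plan_with_index = query_plan(conn, sql)      # planner can search idx_op_date

print(plan_without_index)
print(plan_with_index)
```

The exact plan wording differs between SQLite versions and differs again from SQL Server's own plan output; the point is only that the same query moves from a scan to an index search once a suitable index exists.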
I. Using clustered indexes
A clustered index physically stores the table's rows on disk in the order of the values in one or more specified columns. Because the index page pointers of a clustered index point directly to data pages, finding data through a clustered index is almost always faster than through a nonclustered index. A table can have only one clustered index, and building it requires free space equal to at least 120% of the table's size to hold a copy of the table and the intermediate index pages. Guidelines for creating a clustered index:
1. Most tables should have a clustered index, or use partitioning, to reduce contention for the end of the table. In a high-transaction environment, locking on the last page seriously limits system throughput.
2. Under a clustered index the rows are physically ordered on the data pages and duplicate values are grouped together. For queries with range checks (between, <, <=, >, >=) or with group by or order by, once the row with the first key value in the range is found, the rows with subsequent key values are guaranteed to be physically adjacent, so no further searching is needed. This avoids large scans and greatly improves query speed.
3. On a table with frequent inserts, do not build the clustered index on a monotonically increasing column (such as an identity column); otherwise lock conflicts will occur often.
4. Do not include frequently modified columns in the clustered index, because a row must be moved to a new location whenever its key value changes.
5. Choose the clustered index based on the WHERE clauses and the types of join operations.
Candidate columns for the clustered index are:
1. The primary key column, if it is used in WHERE clauses and rows are inserted in random key order.
2. Columns accessed by range, such as pri_order > … and pri_order < 200.
3. Columns used in group by or order by.
4. Columns that are not frequently modified.
5. Columns used in join operations.
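Why range queries favor a clustered index can be sketched with a sorted list standing in for physically ordered rows. This is a toy model, not how SQL Server stores pages: the key values and the bounds 100 and 200 are illustrative assumptions, and the function name is hypothetical.

```python
import bisect

def clustered_range_scan(sorted_keys, low, high):
    """Return slice bounds of the keys with low < key < high in a sorted list."""
    first = bisect.bisect_right(sorted_keys, low)   # position of the first key > low
    stop = bisect.bisect_left(sorted_keys, high)    # position of the first key >= high
    return first, stop

# Keys kept in order, as rows would be under a clustered index on pri_order.
pri_order = [10, 50, 50, 120, 150, 199, 200, 300]

first, stop = clustered_range_scan(pri_order, 100, 200)
matching = pri_order[first:stop]   # one contiguous run, no further searching
print(matching)                    # [120, 150, 199]
```

Once the first qualifying key is located, every other qualifying key sits next to it, which is the property the guidelines above rely on for between, <, >, group by, and order by.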
II. Using nonclustered indexes
The indexes SQL Server creates by default are nonclustered. A nonclustered index does not rearrange the data in the table; instead it stores the indexed column values for each row together with a pointer to the page where the row resides. In other words, a nonclustered index has one extra level between the index structure and the data itself. A table can have 250 nonclustered indexes if it does not have a clustered index. Each nonclustered index offers a different sort order for accessing the data. When creating a nonclustered index, weigh the speedup it gives queries against the slowdown it causes for modifications. Also consider the following:
1. How much space the index needs.
2. Whether the candidate columns are stable.
3. How to choose the index key so that scans are most effective.
4. Whether there are many duplicate values.
On frequently updated tables, nonclustered indexes cost more to maintain than a clustered index or no index at all. For every row moved to a new page, the leaf-level row of each nonclustered index that points to it must also be updated, which sometimes requires splitting index pages. Deleting data from a page carries similar overhead, and the delete must also shift the remaining rows toward the top of the page to keep the data contiguous. Nonclustered indexes should therefore be created with care. They are typically used when:
1. A column is often used in aggregate functions (such as sum, ...).
2. A column is often used in join, order by, or group by.
3. The rows returned make up no more than 20% of the table's data.
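The 20% rule of thumb in the last item can be checked against sample data before an index is created. The following is a minimal sketch under stated assumptions: the helper names, the sample columns, and the idea of testing one search value at a time are all hypothetical and are not part of SQL Server.

```python
from collections import Counter

def selectivity(column_values, wanted):
    """Fraction of rows whose column equals `wanted`."""
    counts = Counter(column_values)
    return counts[wanted] / len(column_values)

def nonclustered_index_worthwhile(column_values, wanted, threshold=0.20):
    """Apply the rule of thumb: the query should return at most 20% of the rows."""
    return selectivity(column_values, wanted) <= threshold

# Hypothetical sample columns: one with two values, one with ten.
gender = ["M", "F"] * 50
branch = [f"B{i % 10}" for i in range(100)]

print(nonclustered_index_worthwhile(gender, "M"))    # half the rows match
print(nonclustered_index_worthwhile(branch, "B3"))   # a tenth of the rows match
```

A column where each value matches half the table fails the test, while a column where each value matches a tenth of the table passes it.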
III. Using covering indexes
A covering index is a nonclustered index whose entries contain all the information a query needs. Such queries are faster because the index pages hold the required data and the data pages never have to be read. If a nonclustered index contains the result data, querying through it is faster than through a clustered index.
However, because the entries of a covering index carry more columns, the index takes more space, and updates to those columns also change the index values. So if the query that would be covered is rarely run or not critical, adding a covering index degrades overall performance.
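Whether an index covers a query depends on whether every column the query touches is in the index key. The sketch below illustrates this with SQLite standing in for SQL Server (an assumption made so the example is runnable); the table layout and the index name idx_demo are hypothetical.

```python
import sqlite3

QUERY = (
    "SELECT COUNT(*) FROM p_detail "
    "WHERE op_date > '19990101' AND op_date < '19991231' AND pri_surplus1 > 300"
)

def explain_with_index(index_columns):
    """Build a fresh table with one index and return the plan for QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE p_detail "
        "(id INTEGER, op_date TEXT, pay_month TEXT, pri_surplus1 REAL)"
    )
    conn.execute(f"CREATE INDEX idx_demo ON p_detail ({index_columns})")
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    conn.close()
    return " | ".join(row[-1] for row in rows)

# Index on op_date alone: pri_surplus1 must still be read from the table rows.
plan_partial = explain_with_index("op_date")
# Index holding both referenced columns: the table rows are never touched.
plan_covering = explain_with_index("op_date, pri_surplus1")

print(plan_partial)
print(plan_covering)
```

SQLite labels the second plan as using a covering index; SQL Server reports the same situation differently, but the underlying condition (all referenced columns present in the index) is the one described above.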
IV. Index selection techniques
p_detail is the table that records personal details in a housing provident fund management system; it has 890,000 rows. The effect of different indexes on the two queries below was measured in a client/server environment: the client was an IBM PII350 (64 MB of memory), the server a DEC Alpha 1000A (128 MB of memory), and the database Sybase 11.0.3.
1. select count(*) from p_detail where op_date > '19990101' and op_date < '19991231' and pri_surplus1 > 300
2. select count(*), sum(pri_surplus1) from p_detail where op_date > '19990101' and pay_month between '…' and '199912'

Index                                           Query 1         Query 2
No index                                        1 min 15 sec    1 min 7 sec
Nonclustered index on op_date                   57 sec          57 sec
Clustered index on op_date                      < 1 sec         52 sec
Index on pay_month, op_date, pri_surplus1       34 sec          < 1 sec
Index on op_date, pay_month, pri_surplus1       < 1 sec         < 1 sec
The results above show that whether an index exists, and how it is built, leads to very different query performance. Which index to choose depends on the conditions users query the data with, and those conditions appear in the WHERE clauses and join expressions. In general, the guidelines for creating indexes are:
(1) The primary key is often used in WHERE clauses, so build the clustered index on the table's primary key column, especially when that column is often used in joins.
(2) Consider a clustered index on a column that has many duplicate values and is often used in range queries, sorting, or grouping, or on a column that is accessed very frequently.
(3) When several columns are often accessed together and each contains duplicate values, consider a composite index that covers one query or a group of queries, with the most frequently referenced column as the leading column. Where possible, make critical queries covered queries.
(4) If you know that all values of the index key are unique, be sure to define the index as a unique index.
(5) When indexing a table that receives frequent inserts, use FILLFACTOR (fill factor) to reduce page splits; this also improves concurrency and reduces deadlocks. For an index on a read-only table, FILLFACTOR can be set to 100.
(6) When choosing index keys, prefer columns with small data types so that each index page holds as many keys and pointers as possible; this minimizes the number of index pages a query must traverse. Also use integers as key values whenever possible, because they give faster access than any other data type.
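The effect of key size in guideline (6) can be estimated with simple arithmetic. The sketch below is a rough model only: the usable page size of 8096 bytes, the 8-byte pointer per entry, and the assumption of completely full pages are illustrative choices, not figures taken from the test system above, and the function name is hypothetical.

```python
import math

def index_levels(row_count, key_bytes, page_bytes=8096, pointer_bytes=8):
    """Estimate entries per index page and the number of B-tree levels.

    Assumes every page is full and every entry is key + pointer.
    """
    entries_per_page = page_bytes // (key_bytes + pointer_bytes)
    levels = 1
    pages = math.ceil(row_count / entries_per_page)   # pages at the leaf level
    while pages > 1:
        # Each higher level needs one entry per page of the level below.
        pages = math.ceil(pages / entries_per_page)
        levels += 1
    return entries_per_page, levels

rows = 890_000
print(index_levels(rows, key_bytes=4))     # small integer key
print(index_levels(rows, key_bytes=200))   # wide character key
```

Under these assumptions a 4-byte key fits 674 entries on a page and needs 3 levels for 890,000 rows, while a 200-byte key fits only 38 entries and needs 4 levels, so each lookup reads one more page.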
V. Index maintenance
As noted above, inappropriate indexes hurt SQL Server performance. As the application runs, the data keeps changing, and once it has changed enough, the effectiveness of the indexes suffers. Users then have to maintain the indexes themselves. Index maintenance includes:
1. Rebuilding indexes
As rows are inserted and deleted and data pages split, some index pages may end up holding only a few rows. In addition, when large-block I/O is in use, rebuilding nonclustered indexes reduces fragmentation and keeps large I/O efficient. Rebuilding an index in effect reorganizes the space of its B-tree. Rebuild an index in the following situations:
(1) The data and usage patterns have changed substantially.
(2) The sort order has changed.
(3) A large number of inserts is about to be performed or has just completed.
(4) Queries that use large-block I/O perform more disk reads than expected.
(5) Heavy data modification has left data pages and index pages underfilled, so that space usage exceeds estimates.
(6) DBCC reports a problem with the index.
When a clustered index is rebuilt, all nonclustered indexes on the table are rebuilt as well.
2. Updating index statistics
When an index is created on a table that contains data, SQL Server creates a distribution page that holds two kinds of statistics about the index: the distribution table and the density table. The optimizer uses this page to decide whether the index is useful for a particular query. These statistics are not recalculated dynamically, however. When the table's data changes, the statistics may become outdated, which hampers the optimizer's effort to choose the best access path. Run the UPDATE STATISTICS command in the following situations:
(1) Row inserts and deletes have changed the distribution of the data.
(2) Rows are added to a table that was emptied with TRUNCATE TABLE.
(3) Values of the indexed column have been modified.
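How statistics go stale can be seen in any engine that collects them on demand. The sketch below uses SQLite as a stand-in, which is an assumption made so the example is runnable: its ANALYZE command and sqlite_stat1 table play a role loosely comparable to UPDATE STATISTICS and the distribution page, but they are not the same mechanism. The table layout and the index name idx_op_date are hypothetical.

```python
import sqlite3

def index_stats(conn):
    """Read the stored statistics string for idx_op_date."""
    row = conn.execute(
        "SELECT stat FROM sqlite_stat1 WHERE idx = 'idx_op_date'"
    ).fetchone()
    return row[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE p_detail (id INTEGER, op_date TEXT)")
conn.execute("CREATE INDEX idx_op_date ON p_detail (op_date)")
conn.executemany(
    "INSERT INTO p_detail VALUES (?, ?)",
    [(i, f"1999{(i % 12) + 1:02d}01") for i in range(120)],
)

conn.execute("ANALYZE")                 # collect statistics once
stats_initial = index_stats(conn)

# Change the data distribution heavily: many rows sharing one new date.
conn.executemany(
    "INSERT INTO p_detail VALUES (?, ?)",
    [(1000 + i, "20000101") for i in range(5000)],
)
stats_stale = index_stats(conn)         # still describes the old data

conn.execute("ANALYZE")                 # refresh the statistics
stats_fresh = index_stats(conn)

print(stats_initial, "|", stats_stale, "|", stats_fresh)
```

Until the statistics are refreshed, the planner keeps working from a description of the table as it was before the inserts, which is the situation the three cases above are meant to catch.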
VI. Concluding remarks
Practice shows that improper indexes not only fail to help but actually reduce system performance, because with a large number of indexes, inserts, updates, and deletes cost more system time than with no indexes at all. For example, indexes created in the following cases are inappropriate:
1. Columns that are rarely or never referenced in queries gain nothing from an index, because rows seldom or never need to be searched by those columns.
2. Columns with only two or three distinct values, such as male/female or yes/no, never benefit from an index.
In addition, an index speeds up queries but slows down data updates. To offset this, you can build the table on one segment and create its nonclustered indexes on another segment, with the two segments on separate physical devices, which improves operation performance.