Improving SQL Server Data Processing Efficiency with Indexes
Given a sound database design, effective use of indexes is the foundation of high performance in SQL Server. SQL Server uses a cost-based optimization model: for each query submitted, it examines the tables involved and decides whether to use an index and, if so, which one.
Most of the cost of executing a query is disk I/O, and a major reason for using an index is to avoid a full table scan. A full table scan must read every data page of the table from disk, whereas if an index points to the data values, the query needs only a few disk reads. With well-chosen indexes, therefore, the optimizer can use them to speed up queries. However, indexes do not always improve system performance: for insert, delete, and update operations, every index adds work. Adding appropriate indexes where they help and dropping poorly chosen ones where they do not will therefore improve SQL Server applications that perform badly. Practice shows that sound index design rests on analyzing and predicting the queries that will be run; only by matching the indexes to the program can the best optimization be achieved. This article analyzes the performance of SQL Server indexes and reports practical results.
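To make the difference between a full scan and an index lookup concrete, here is a minimal sketch in Python. It uses the standard library's sqlite3 module as a stand-in for SQL Server (an assumption made only so the example runs anywhere; the two planners differ in detail), borrows the p_detail table and op_date column used later in this article, and fills the table with made-up dates. It asks the planner how it would run the same query before and after an index is created.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server.
# The rows and date values below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE p_detail (id INTEGER PRIMARY KEY, op_date TEXT, pri_surplus1 REAL)"
)
conn.executemany(
    "INSERT INTO p_detail (op_date, pri_surplus1) VALUES (?, ?)",
    [("1999%02d%02d" % (m, d), 100.0 * m + d)
     for m in range(1, 13) for d in range(1, 29)],
)

QUERY = ("SELECT count(*) FROM p_detail "
         "WHERE op_date > '19990601' AND op_date < '19990701'")

def query_plan(connection, sql):
    """Return the planner's own description of how it would run `sql`."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the text

plan_without_index = query_plan(conn, QUERY)   # no index: whole table is read
conn.execute("CREATE INDEX idx_op_date ON p_detail (op_date)")
plan_with_index = query_plan(conn, QUERY)      # index: only the key range is read
matches = conn.execute(QUERY).fetchone()[0]    # result is the same either way

print(plan_without_index)
print(plan_with_index)
print(matches)
```

The first plan describes a scan of the whole table and the second a search through idx_op_date; the exact wording of the plan text varies between SQLite versions.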
I. Use of Clustered Indexes
A clustered index physically sorts the table's data on disk by the values of one or more specified columns. Because the index page pointers of a clustered index point directly to data pages, searching with a clustered index is almost always faster than searching with a non-clustered index. A table can have only one clustered index, and creating it requires additional space of at least 120% of the table's size to hold a copy of the table and the intermediate index pages. Guidelines for building a clustered index:
1. Most tables should have a clustered index, or use partitioning, to reduce contention on the last page of the table. In a high-transaction environment, blocking on the last page seriously reduces system throughput.
2. In a clustered index, rows are physically stored on the data pages in key order, and duplicate values are stored together. For queries that contain range checks (between, <, <=, >, >=) or that use group by or order by, once the row with the first key value in the range is found, the rows with the following key values are guaranteed to be physically adjacent. No further searching is needed, which avoids large scans and greatly improves query speed.
3. On a table with frequent inserts, do not build the clustered index on a monotonically increasing column (such as an identity column); otherwise blocking conflicts will occur often.
4. Do not include frequently modified columns in the clustered index, because when a key value changes, the data row must be moved to a new location.
5. Choose the clustered index based on the where clauses and the types of join operations.
Candidate columns for the clustered index are:
1. The primary key column, when it is used in where clauses and its values are inserted in random order.
2. Columns accessed by range, such as pri_order > 100 and pri_order < 200.
3. Columns used in group by or order by.
4. Columns that are not frequently modified.
5. Columns used in join operations.
II. Use of Non-clustered Indexes
By default, SQL Server creates indexes as non-clustered. A non-clustered index does not reorganize the data in the table; instead, it stores the index column values for each row together with a pointer to the page where the row is located. In other words, a non-clustered index has an extra level between the index structure and the data itself. A table without a clustered index can have up to 250 non-clustered indexes, each providing a different sort order for accessing the data. When creating a non-clustered index, weigh the gain in query speed against the loss in modification speed. In addition, consider the following:
1. How much space the index needs.
2. Whether the candidate columns are stable.
3. How the index key is chosen, and whether a scan would perform better.
4. Whether there are many duplicate values.
For frequently updated tables, a non-clustered index incurs more overhead than a clustered index or no index at all. For each row that is moved to a new page, the leaf-level row of every non-clustered index that points to it must also be updated, and sometimes index pages must be split. Deleting data from a page has similar overhead, and the delete must also move the remaining data toward the top of the page to keep it contiguous. Non-clustered indexes should therefore be created with great care. They are commonly used in the following cases:
1. A column is often used in aggregate functions (such as sum, ...).
2. A column is often used in join, order by, or group by.
3. The query retrieves no more than 20% of the table's data.
III. Use of Covering Indexes
A covering index is a non-clustered index whose entries contain all the information a query needs. Such an index is faster because the index pages already hold the required data, so the data pages do not have to be accessed. If a non-clustered index contains the result data, the query runs faster than it would with a clustered index.
However, a covering index with many entries takes up a large amount of space, and update operations that change the indexed values must also change the index. Therefore, if the query that would be covered is not run often or is not critical, adding a covering index will reduce performance.
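The idea of coverage can be sketched in Python with the standard library's sqlite3 module, again only as a stand-in for SQL Server (an assumption; the planners differ). The helper plan_for, the index name idx_demo, the remark column, and the sample rows are all hypothetical. The sketch builds the same table twice, once with an index on op_date alone and once with an index that also carries pri_surplus1, and asks the planner how it would answer a sum query.

```python
import sqlite3

def plan_for(index_columns):
    """Build a small p_detail table with one index and return the query plan."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in
    conn.execute(
        "CREATE TABLE p_detail ("
        "id INTEGER PRIMARY KEY, op_date TEXT, pri_surplus1 REAL, remark TEXT)"
    )
    conn.executemany(
        "INSERT INTO p_detail (op_date, pri_surplus1, remark) VALUES (?, ?, ?)",
        [("1999%02d%02d" % (m, d), 10.0 * d, "n/a")
         for m in range(1, 13) for d in range(1, 29)],
    )
    conn.execute("CREATE INDEX idx_demo ON p_detail (%s)" % ", ".join(index_columns))
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT sum(pri_surplus1) FROM p_detail WHERE op_date > '19990601'"
    ).fetchall()
    conn.close()
    return " | ".join(row[-1] for row in rows)

# Index on op_date only: pri_surplus1 must still be fetched from the table rows.
plain_plan = plan_for(["op_date"])
# Index holding both columns: the query can be answered from the index alone.
covering_plan = plan_for(["op_date", "pri_surplus1"])

print(plain_plan)
print(covering_plan)
```

Only the second plan is reported as using a covering index. The price is a wider index that must be maintained whenever pri_surplus1 changes, which is the trade-off described above.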
IV. Index Selection Techniques
p_detail is the table that records personal details in a housing provident fund management system; it has 890000 rows. The queries below were run under different indexes and timed in a client/server environment: the client was an IBM PII350 (64 MB memory), the server was a DEC Alpha 1000A (128 MB memory), and the database was Sybase 11.0.3.
1. select count(*) from p_detail
where op_date > '2013' and op_date < '19991231' and pri_surplus1 > 300
2. select count(*), sum(pri_surplus1) from p_detail
where op_date > '123' and pay_month between '20140901' and '20160301'
Test results:
No index: query 1, 1 minute 15 seconds; query 2, 1 minute 7 seconds.
Non-clustered index on op_date: query 1, 57 seconds; query 2, 57 seconds.
Clustered index on op_date: query 1, <1 second; query 2, 52 seconds.
Composite index on pay_month, op_date, pri_surplus1: query 1, 34 seconds; query 2, <1 second.
Composite index on op_date, pay_month, pri_surplus1: query 1, <1 second; query 2, <1 second.
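The two composite indexes in these results contain the same columns and differ only in order, yet query 1 behaves very differently under them. One plausible reading is that query 1 filters on op_date but not on pay_month, so it can seek into the index only when op_date is the leading column. The sketch below illustrates that effect in Python with sqlite3 as a stand-in (an assumption; it is not the Sybase optimizer used in the test). The helper plan_with_index, the index name idx_demo, and the date literals are hypothetical.

```python
import sqlite3

# Same shape as query 1 in the article; the date literals are invented.
QUERY_1 = ("SELECT count(*) FROM p_detail "
           "WHERE op_date > '19990601' AND op_date < '19990701' "
           "AND pri_surplus1 > 300")

def plan_with_index(columns):
    """Create p_detail with one composite index and return the plan for query 1."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in
    conn.execute(
        "CREATE TABLE p_detail (op_date TEXT, pay_month TEXT, pri_surplus1 REAL)"
    )
    conn.execute("CREATE INDEX idx_demo ON p_detail (%s)" % ", ".join(columns))
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY_1).fetchall()
    conn.close()
    return " | ".join(row[-1] for row in rows)

# pay_month leads: the op_date range cannot be used to position in the index,
# so every index entry has to be examined.
pay_month_first = plan_with_index(["pay_month", "op_date", "pri_surplus1"])
# op_date leads: the planner can jump straight to the requested date range.
op_date_first = plan_with_index(["op_date", "pay_month", "pri_surplus1"])

print(pay_month_first)
print(op_date_first)
```

In this sketch only the op_date-first index is searched by key range; the pay_month-first index is read from end to end.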
These results show that whether an index exists, and how it is built, leads to very different query times. Which index to choose depends on the conditions users place on the data, which appear in the where clause and in join expressions. In general, the guidelines for creating indexes are:
(1) The primary key is often used as a condition in the where clause. Create a clustered index on the table's primary key column, especially when it is frequently used in joins.
(2) Create a clustered index on a column that has many duplicate values and is often used in range queries, sorting, or grouping, or on a column that is accessed very frequently.
(3) When several columns are often accessed together and each contains duplicate values, consider a composite index that covers one or more queries. Make the most frequently referenced column the leading column and, if possible, have the index cover the key queries.
(4) If you know that all values of the index key are unique, make sure the index is defined as a unique index.
(5) When creating an index on a table with frequent inserts, use fillfactor to reduce page splitting; this also improves concurrency and reduces deadlocks. For an index on a read-only table, fillfactor can be set to 100.
(6) When choosing the index key, prefer columns with small data types so that each index page holds as many keys and pointers as possible, which minimizes the number of index pages a query must traverse. In addition, use integer keys where possible, because they provide faster access than any other data type.
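The effect of key size in guideline (6) can be estimated with simple arithmetic. The sketch below is a rough model, not a description of SQL Server's actual page layout: the page size, the pointer size, and the two key sizes are assumptions chosen for illustration, and only the row count of 890000 comes from this article. It counts how many index entries fit on a page and how many B-tree levels a lookup would pass through.

```python
import math

PAGE_BYTES = 2048     # assumption: usable bytes per index page
POINTER_BYTES = 6     # assumption: bytes of pointer stored with each key
ROWS = 890_000        # row count of p_detail quoted in this article

def index_levels(key_bytes, rows=ROWS):
    """Estimate entries per page and the number of B-tree levels for a key size."""
    fanout = PAGE_BYTES // (key_bytes + POINTER_BYTES)  # entries per index page
    pages = math.ceil(rows / fanout)                    # pages in the leaf level
    levels = 1
    while pages > 1:                                    # add parent levels up to the root
        pages = math.ceil(pages / fanout)
        levels += 1
    return fanout, levels

int_fanout, int_levels = index_levels(key_bytes=4)      # e.g. an integer key
char_fanout, char_levels = index_levels(key_bytes=40)   # e.g. a 40-byte character key

print(int_fanout, int_levels)    # 204 3
print(char_fanout, char_levels)  # 44 4
```

Under these assumed sizes the 4-byte key fits 204 entries per page and needs 3 levels, while the 40-byte key fits 44 entries per page and needs 4 levels, so every lookup reads one more page.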
V. Index Maintenance
As noted above, inappropriate indexes hurt SQL Server performance. As the application runs, the data keeps changing, and once it has changed enough, the effectiveness of the indexes suffers. The indexes then need to be maintained by the user. Index maintenance includes:
1. Rebuilding indexes
As data rows are inserted, deleted, and split across pages, some index pages may end up holding only a few rows. In addition, when the application performs large I/O, rebuilding a non-clustered index reduces fragmentation and keeps large I/O efficient. Rebuilding an index in effect reorganizes the space in its B-tree. An index needs to be rebuilt in the following cases:
(1) The data and its usage patterns have changed significantly.
(2) The sort order has changed.
(3) A large number of inserts are about to be performed or have just been completed.
(4) Queries that use large I/O perform more disk reads than expected.
(5) Heavy data modification has left data pages and index pages underfilled, so space usage exceeds estimates.
(6) DBCC reports a problem with the index.
When a clustered index is rebuilt, all non-clustered indexes on the table are rebuilt as well.
2. Updating index statistics
When an index is created on a table that already contains data, SQL Server creates a distribution page that stores two kinds of statistics about the index: the distribution table and the density table. The optimizer uses this page to judge whether the index is useful for a particular query. These statistics are not recalculated dynamically, however. As the table's data changes, the statistics can become outdated, which impairs the optimizer's ability to choose the best plan. Therefore, run the update statistics command in the following cases:
(1) Row inserts and deletes have changed the data distribution.
(2) Rows have been added to a table that was emptied with truncate table.
(3) Values in the index columns have been modified.
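The way stored statistics fall behind the data can be observed in a small Python sketch. It uses sqlite3 as a stand-in, where the ANALYZE command plays a role comparable to update statistics and the sqlite_stat1 table holds the stored figures; this is an assumption for illustration and not how SQL Server or Sybase store their distribution pages. The helpers add_rows and stored_stats and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in
conn.execute("CREATE TABLE p_detail (op_date TEXT, pri_surplus1 REAL)")
conn.execute("CREATE INDEX idx_op_date ON p_detail (op_date)")

def add_rows(count):
    """Insert `count` identical rows (invented data)."""
    conn.executemany("INSERT INTO p_detail VALUES (?, ?)",
                     [("19990101", 1.0)] * count)

def stored_stats():
    """Return the statistics string currently stored for idx_op_date."""
    row = conn.execute(
        "SELECT stat FROM sqlite_stat1 WHERE idx = 'idx_op_date'"
    ).fetchone()
    return row[0]

add_rows(100)
conn.execute("ANALYZE")       # gather statistics for the first time
fresh = stored_stats()

add_rows(900)                 # the table is now ten times larger
stale = stored_stats()        # stored statistics have not moved

conn.execute("ANALYZE")       # refresh, comparable to update statistics
refreshed = stored_stats()

print(fresh, "|", stale, "|", refreshed)
```

The stored figures stay the same after the extra inserts and change only when the statistics are gathered again, which is why a refresh is needed after large changes to the data.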
VI. Conclusion
Practice shows that poorly chosen indexes do not help and in fact degrade the system's performance, because with many indexes, inserts, updates, and deletes take more time than they would with no index. For example, indexes on the following columns are inappropriate:
1. Columns that are rarely or never referenced in queries gain nothing from an index, because the system rarely or never has to search for rows by those columns.
2. Columns with only two or three distinct values, such as male and female (or yes and no), will never benefit from an index.
Also bear in mind that indexes speed up queries but slow down data updates. You can place a table on one segment and its non-clustered indexes on another segment that resides on a separate physical device, which improves performance.