Tips for indexing SQL Server databases

Last Update:2018-12-08 Source: Internet

Author: User

I. What is an index?

One of the best ways to reduce disk I/O and logical reads is to use an index.
An index allows SQL Server to find data in a table without scanning the entire table.

1.1 Benefits of indexes:

When a table does not have a clustered index, it is called a heap (or heap table).
A heap is an unordered pile of data that uses row identifiers (RIDs) as pointers to storage locations. Its data is neither ordered nor searchable except by traversing it row by row, a process called a scan. When a clustered index exists, the non-clustered index pointers are made up of the clustered index key values, which is why the choice of clustered index is so important.
Because the page size is fixed, the fewer columns a row has, the more rows fit on a page. Since a non-clustered index usually does not contain all of a table's columns, a page typically holds more non-clustered index rows than table rows. Therefore SQL Server can read more values from one non-clustered index page than from one page of the table containing the same column.
Another benefit of non-clustered indexes is that they are independent of the table's storage structure, so they can be placed in different filegroups to spread I/O across different disks.
Indexes use a B-tree as their storage structure, which minimizes the number of operations needed to find a specific row.
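A minimal Python sketch of why the B-tree keeps lookups cheap: it counts the pages a full scan touches versus the pages a single-key seek touches (one per B-tree level). This is a simplified model, not SQL Server's storage engine; the 100 rows per page and the fan-out of 500 child pointers per non-leaf page are illustrative assumptions.

```python
import math

def pages_for_rows(row_count: int, rows_per_page: int) -> int:
    """Leaf (data) pages needed to hold row_count rows."""
    return max(1, math.ceil(row_count / rows_per_page))

def btree_depth(leaf_pages: int, fanout: int) -> int:
    """Levels from root to leaf when each non-leaf page points to `fanout` children."""
    depth, pages = 1, leaf_pages
    while pages > 1:
        pages = math.ceil(pages / fanout)  # pages needed on the level above
        depth += 1
    return depth

def scan_reads(row_count: int, rows_per_page: int) -> int:
    # A scan touches every page of the table.
    return pages_for_rows(row_count, rows_per_page)

def seek_reads(row_count: int, rows_per_page: int, fanout: int) -> int:
    # A single-key seek reads one page per B-tree level.
    return btree_depth(pages_for_rows(row_count, rows_per_page), fanout)

print(scan_reads(1_000_000, 100))       # 10000
print(seek_reads(1_000_000, 100, 500))  # 3
```

Under these assumptions a million-row table needs 10,000 page reads to scan but only 3 to seek; for a table that fits on a single page, both paths read one page.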

1.2 Index overhead:

Too many indexes make data modifications (INSERT/UPDATE/DELETE, the CUD part of CRUD) take longer, because every index must be maintained along with the table.
When designing indexes, keep the following in mind:
For an existing production system, measure the overall impact of the indexes and make sure the performance benefit outweighs the additional processing cost. You can use Profiler to capture and analyze the overall workload.
You can use DMVs to see the immediate benefit of an index:
sys.dm_db_index_operational_stats or sys.dm_db_index_usage_stats
sys.dm_db_index_operational_stats: shows low-level activity on an index, such as I/O and locking.
sys.dm_db_index_usage_stats: shows cumulative counts of the different types of operations performed on an index over time.
Although indexes add overhead to DML statements, SQL Server must find a row before it can update or delete it, so an index can help UPDATE and DELETE statements that use complex WHERE clauses.

II. Index design suggestions

The index design recommendations are as follows:
- Examine the WHERE clause and join condition columns;
- Use narrow indexes;
- Examine column uniqueness;
- Examine the column data types;
- Consider column order;
- Consider the index type (clustered vs. non-clustered).

2.1 Examine the WHERE clause and join condition columns:
When a query is submitted to SQL Server, the optimizer performs the following steps:
1) It identifies the columns referenced in the WHERE clause and join conditions.
2) It checks for indexes on those columns.
3) It evaluates the usefulness of each index by determining the selectivity of the clause (that is, how many rows it returns) from the statistics maintained on the index.
4) Finally, based on the information gathered in the previous steps, it estimates the cost of each way of reading the qualifying rows.
If there is no suitable index on the WHERE and join columns, the optimizer performs a full table scan.
Suggestion: create indexes on columns that are frequently used in WHERE clauses or join conditions to avoid table scans. However, when a table is so small that all of its data fits on a single 8 KB page, a table scan may perform better than an index seek.
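Steps 3 and 4, and the small-table caveat, can be illustrated with a deliberately crude cost model. This is a hypothetical sketch, not SQL Server's optimizer: it assumes a seek costs one page read per index level plus one lookup read per matching row, and a scan costs one read per data page. The parameter `estimated_matches` stands in for the row estimate the optimizer derives from statistics.

```python
import math

def choose_access_path(total_rows, estimated_matches, rows_per_page, index_depth):
    """Toy cost model (hypothetical, not SQL Server's real costing).

    scan: read every data page once.
    seek: walk the index (one read per level), then one lookup read per matching row.
    Returns (plan, estimated page reads)."""
    scan_cost = max(1, math.ceil(total_rows / rows_per_page))
    seek_cost = index_depth + estimated_matches
    if seek_cost < scan_cost:
        return ("seek", seek_cost)
    return ("scan", scan_cost)

# Highly selective predicate on a large table: the index wins.
print(choose_access_path(1_000_000, 10, 100, 3))      # ('seek', 13)
# Low selectivity (5% of rows): reading every page beats 50,000 lookups.
print(choose_access_path(1_000_000, 50_000, 100, 3))  # ('scan', 10000)
# Tiny table that fits on one page: the scan is already as cheap as it gets.
print(choose_access_path(40, 1, 100, 1))              # ('scan', 1)
```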

2.2 Use narrow indexes:
For the best performance, use as few columns as possible in an index, and avoid wide data columns.
A narrow index fits more rows on an 8 KB index page than a wide index, which has the following effects:
- Fewer I/O operations (fewer 8 KB pages to read);
- More effective database caching, because the index occupies fewer pages, so SQL Server can cache it more easily and needs fewer logical reads on index pages in memory;
- Less database storage space.
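Back-of-the-envelope arithmetic for the narrow-index point, as a Python sketch. It assumes an 8,192-byte page with a 96-byte header and a 2-byte slot entry per row, and it ignores the other per-row overhead real index rows carry, so the absolute numbers are illustrative; the ratio between narrow and wide rows is what matters.

```python
import math

PAGE_BYTES = 8192
HEADER_BYTES = 96   # assumed page header size
SLOT_BYTES = 2      # assumed per-row slot-array entry
USABLE_BYTES = PAGE_BYTES - HEADER_BYTES

def rows_per_page(row_bytes: int) -> int:
    """How many index rows of the given width fit on one page."""
    return USABLE_BYTES // (row_bytes + SLOT_BYTES)

def pages_needed(row_count: int, row_bytes: int) -> int:
    """Leaf pages needed to index row_count rows."""
    return math.ceil(row_count / rows_per_page(row_bytes))

print(rows_per_page(10), pages_needed(1_000_000, 10))    # 674 1484
print(rows_per_page(200), pages_needed(1_000_000, 200))  # 40 25000
```

Under these assumptions a 10-byte index row fits 674 to a page versus 40 for a 200-byte row, so indexing a million rows takes about 1,484 pages instead of 25,000.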

2.3 Examine column uniqueness:
Creating an index on a column with a very small range of possible values (such as gender) does little for performance, because the optimizer cannot use the index to reduce the number of rows returned effectively; such a column may still lead to a full table scan or a clustered index scan. The preferred approach is always to filter in the WHERE clause on columns with a large number of unique values (high selectivity), which limits the number of rows accessed, and to create indexes on those columns to help retrieve small result sets.
In addition, for an index on multiple columns, the column order matters. In some cases, putting the most selective column first makes the index more effective.
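A simple way to quantify "uniqueness" is the ratio of distinct values to rows. The Python sketch below uses that ratio as a rough selectivity measure (an illustrative definition, not the exact density figure SQL Server stores in its statistics) on two hypothetical columns of a 1,000-row table.

```python
def selectivity(values):
    """Distinct values / total rows: near 1.0 is highly selective, near 0 is not."""
    return len(set(values)) / len(values)

def avg_rows_per_value(values):
    """How many rows an equality predicate returns on average."""
    return len(values) / len(set(values))

gender = ["M", "F"] * 500          # hypothetical low-selectivity column
customer_id = list(range(1000))    # hypothetical unique column

print(selectivity(gender), avg_rows_per_value(gender))            # 0.002 500.0
print(selectivity(customer_id), avg_rows_per_value(customer_id))  # 1.0 1.0
```

An equality predicate on `gender` still returns half the table, so an index on it cannot narrow the search; on `customer_id` it returns a single row.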

2.4 Examine the column data type:
Indexes on numeric columns perform well, because numeric values are small and easy to compare. Indexes on string columns are usually more costly, because the values are larger and must be compared character by character.

2.5 Consider column order:
In a composite index, column order is an important factor in index efficiency. Take the following into account:
- Column uniqueness;
- Column width;
- Column data type.
Queries use the leading column(s) of an index to perform seek operations. Put the most effective (most selective) column first so that data is filtered as early as possible, reducing the volume of data to process.
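To see why the leading column matters, the Python sketch below models a composite index on hypothetical (last_name, first_name) columns as a sorted list of tuples. An equality search on the leading column can binary-search straight to the matching entries, while a search on the second column alone gets no help from the sort order and must examine every entry. The separate `leading` list stands in for the B-tree's upper levels; it is a modeling shortcut, not how SQL Server stores an index.

```python
from bisect import bisect_left, bisect_right

def build_index(rows):
    """Composite index on (last_name, first_name): entries sorted by both columns."""
    entries = sorted(rows)
    leading = [entry[0] for entry in entries]  # navigation keys for the leading column
    return entries, leading

def seek_leading(index, last_name):
    """Equality on the leading column: binary search, touch only the matches."""
    entries, leading = index
    lo, hi = bisect_left(leading, last_name), bisect_right(leading, last_name)
    return entries[lo:hi], hi - lo           # (matches, entries examined)

def scan_for_second(index, first_name):
    """Equality on the second column only: the sort order does not help."""
    entries, _ = index
    matches = [entry for entry in entries if entry[1] == first_name]
    return matches, len(entries)             # every entry had to be examined

people = [("Smith", "Anna"), ("Jones", "Bob"), ("Smith", "Carl"),
          ("Brown", "Anna"), ("Jones", "Anna")]
ix = build_index(people)
print(seek_leading(ix, "Jones"))    # ([('Jones', 'Anna'), ('Jones', 'Bob')], 2)
print(scan_for_second(ix, "Anna"))  # 3 matches, 5 entries examined
```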

2.6 Consider the index type:
Both clustered and non-clustered indexes store data in a B-tree. They are described in detail below.

III. Clustered index

The leaf pages of a clustered index are the data pages of the table. Therefore, the table's rows are physically ordered by the clustered index columns, and because data can be stored in only one physical order, a table can have only one clustered index.

3.1 Heap table:
A table without a clustered index is called a heap table. Its rows are not stored in any particular order, and its pages are not linked to adjacent pages. Compared with a table that has a clustered index, this organization increases access overhead.
3.2 Relationship with non-clustered indexes:
Each row of a non-clustered index contains a pointer to the corresponding data row of the table, called the row locator. Its value depends on whether the data pages are stored as a heap or as a clustered index. For a heap, the row locator is the RID pointer to the data row. For a clustered index, the row locator is the clustered index key value. When data rows are moved, non-clustered index rows may have to be relocated and pages may split, affecting performance.
3.3 Clustered index suggestions:
1) Create the clustered index first:
Because every non-clustered index stores the clustered index key in its index rows, the order of creation matters: creating the clustered index later forces all existing non-clustered indexes to be rebuilt. For the best performance, create the clustered index before any non-clustered index.
2) Keep the index narrow:
Keep the total length of the clustered index key as small as possible. A wide clustered key makes every non-clustered index wider too, so it affects not only the clustered index's own size but also all the non-clustered indexes on the table, increasing the number of index pages, logical reads, and disk I/O.
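The effect of a wide clustered key can be estimated with simple arithmetic. In this Python sketch (a simplified model that ignores per-row overhead; the key widths are hypothetical), every non-clustered index row carries the clustered key as its row locator, so widening the clustered key from 4 to 40 bytes more than triples the key bytes stored across three non-clustered indexes.

```python
def nonclustered_row_bytes(nc_key_bytes, clustered_key_bytes):
    # Each non-clustered index row stores its own key plus the clustered key as row locator.
    return nc_key_bytes + clustered_key_bytes

def total_nonclustered_bytes(row_count, nc_key_sizes, clustered_key_bytes):
    """Key bytes across all non-clustered indexes (row overhead ignored)."""
    return sum(row_count * nonclustered_row_bytes(k, clustered_key_bytes)
               for k in nc_key_sizes)

rows = 1_000_000
nc_keys = [4, 8, 20]  # hypothetical non-clustered key widths in bytes
print(total_nonclustered_bytes(rows, nc_keys, 4))   # 4-byte INT clustered key: 44000000
print(total_nonclustered_bytes(rows, nc_keys, 40))  # 40-byte composite key: 152000000
```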
3) Rebuild the clustered index in one step:
Because non-clustered indexes depend on the clustered index, rebuilding it with DROP INDEX followed by CREATE INDEX rebuilds the non-clustered indexes twice. Instead, use the DROP_EXISTING clause of the CREATE INDEX statement to recreate the clustered index in a single atomic step. DROP_EXISTING can be used with non-clustered indexes as well.
4) When to use a clustered index:
A) Retrieving a range of data:
Because the clustered index stores rows in physical order, using it for range retrieval reduces disk head movement and the amount of physical I/O.
B) Reading pre-sorted data:
A clustered index is very effective when data must be returned in sorted order, because it removes the cost of sorting the data after it is read.
For queries that read a wide range of rows and/or sort their output, a clustered index is generally more effective than a non-clustered index.
5) When not to use a clustered index:
In some cases it is best not to use a clustered index:
A) Frequently updated columns:
If the clustered index columns are updated frequently, the row locators in the non-clustered indexes must be updated as well, increasing the cost of those operations. During that time, other queries that reference the same rows and non-clustered indexes are blocked, which hurts concurrency.
B) Wide keys: for the reasons described earlier.
C) Too many concurrent sequential inserts:
If new rows are inserted concurrently, it is better to spread them across multiple pages. With a clustered index on a column whose values keep increasing, all inserts are concentrated on the last page, creating a large "hot spot". To distribute the inserts randomly across the table, you can create the clustered index on a different column, one whose order does not follow the order in which new rows arrive. This problem occurs only with a large number of concurrent inserts. If the disk hot spot becomes a performance bottleneck, you can lower the table's fill factor to leave room on intermediate pages. On the other hand, a hot page tends to stay in memory, which can also benefit performance.
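The hot-spot effect can be seen in a small Python model (hypothetical, and it ignores page splits): leaf pages cover fixed key ranges, and each new key goes to the page whose range contains it. Ever-increasing keys, such as identity values, all land on the last page, while randomly distributed keys, such as GUID-like values, spread across the table.

```python
import random
from bisect import bisect_right

def target_page(page_lower_bounds, key):
    """Index of the leaf page whose key range contains `key`
    (keys past the end of the table go to the last page)."""
    return max(0, bisect_right(page_lower_bounds, key) - 1)

def distinct_pages_hit(page_lower_bounds, keys):
    return len({target_page(page_lower_bounds, k) for k in keys})

# Hypothetical table: 100 leaf pages, each covering 100 consecutive key values (0..9999).
bounds = [p * 100 for p in range(100)]
sequential = range(10_000, 10_500)                           # identity-style new keys
rng = random.Random(42)
scattered = [rng.randrange(0, 10_000) for _ in range(500)]   # randomly distributed keys

print(distinct_pages_hit(bounds, sequential))  # 1: every insert lands on the last page
print(distinct_pages_hit(bounds, scattered))   # many different pages
```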

IV. Non-clustered index

A non-clustered index does not affect the order of data on the table's pages. For a heap, the row locator is the RID pointer to the data row. For a table with a clustered index, the row locator is the clustered index key.

4.1 Non-clustered index maintenance:
To reduce maintenance overhead, when a row in a heap has to move to another page, SQL Server leaves a pointer on the old page that points to the row's new location, instead of updating the row locators in all related non-clustered indexes. Using the clustered index key as the row locator also reduces the maintenance overhead of non-clustered indexes, because the locator does not change when rows move.
4.2 Defining a bookmark lookup:
When a query needs columns that are not part of the non-clustered index chosen by the optimizer, a lookup is required: a key lookup against the clustered index, or a RID lookup against a heap. This is called a bookmark lookup.
The lookup reads the corresponding data row from the table using the row locator stored in the index row, so in addition to the logical reads on the index pages, logical reads on the data pages are also required. However, if every column the query needs is in the index, the data pages do not need to be accessed at all; this is called a covering index. The cost of these bookmark lookups is the main reason to use a clustered index for large result sets: a clustered index never needs a bookmark lookup, because its leaf pages are the data pages.
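Rough logical-read arithmetic for a bookmark lookup versus a covering index, as a Python sketch. It assumes one read per B-tree level, that the matching index entries are contiguous at the leaf level, and that every lookup walks the clustered index from root to leaf; the depths and rows-per-page figures are hypothetical.

```python
import math

def covering_plan_reads(matching_rows, nc_depth, nc_rows_per_page):
    """Seek to the first match, then read the contiguous leaf pages holding the matches."""
    leaf_pages = math.ceil(matching_rows / nc_rows_per_page)
    return nc_depth + max(0, leaf_pages - 1)

def lookup_plan_reads(matching_rows, nc_depth, clustered_depth, nc_rows_per_page):
    """Same index seek, plus one clustered-index lookup (root to leaf) per matching row."""
    seek = covering_plan_reads(matching_rows, nc_depth, nc_rows_per_page)
    return seek + matching_rows * clustered_depth

print(lookup_plan_reads(1, 3, 3, 200), covering_plan_reads(1, 3, 200))        # 6 3
print(lookup_plan_reads(1000, 3, 3, 200), covering_plan_reads(1000, 3, 200))  # 3007 7
```

For a single row the lookup only doubles the cost, but with 1,000 matching rows it dominates, which is why bookmark lookups are acceptable only for small result sets.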
4.3 Non-clustered index suggestions:
1. When to use a non-clustered index:
A non-clustered index is most effective for reading a small number of rows from a large table. As the number of rows grows, the cost of the bookmark lookups grows proportionally, so the index columns should be highly selective.
Some indexing requirements are not suitable for a clustered index and are better met by a non-clustered index:
- Columns that are updated frequently;
- Wide keys.
2. When not to use a non-clustered index:
Non-clustered indexes are not suitable for queries that return a large number of rows. A clustered index is better in that case, because it does not need bookmark lookups to reach the data rows. When reading a large result set from a table, a non-clustered index on the filter and join columns does not help unless it is a covering index.

V. Clustered index vs. non-clustered index

Main considerations when choosing between a clustered and a non-clustered index:
- Number of rows to be retrieved;
- Data ordering requirements;
- Index key width;
- Column update frequency;
- Bookmark lookup overhead;
- Any disk hot spots.

5.1 Advantages of clustered indexes over non-clustered indexes:
When choosing the index type for a table that has no indexes, a clustered index is usually the first choice.
Reading a small result set through a highly selective column is a good reason to create a non-clustered index on that column. However, a clustered index on that column may be just as good or even better.
Note: although a clustered index outperforms a non-clustered index in many data lookups, a table can have only one clustered index. Therefore, reserve the clustered index for the situation where it is most beneficial.
5.2 Advantages of non-clustered indexes over clustered indexes:
Non-clustered indexes are preferable to clustered indexes in the following situations:
- The index key is large;
- You want to avoid the overhead that rebuilding a clustered index imposes on all the non-clustered indexes;
- Readers work on the non-clustered index pages while writers modify other columns (not in the non-clustered index) on the data pages, so blocking is avoided;
- All the columns a query references (from one table) can comfortably fit in a non-clustered index.
When a query does not need to jump to the data row, a non-clustered index performs as well as a clustered index, or even better. This is possible when the non-clustered index key contains all the columns the query requires from the table.

VI. Advanced indexing techniques

- Covering index: an index that contains every column a query needs;
- Index intersection: multiple non-clustered indexes are used together to satisfy all of a query's column requirements (on one table);
- Index join: combines index intersection with covering-index techniques to avoid touching the base table;
- Filtered index: to index columns with sparse or unevenly distributed data, you can apply a filter to an index so that it indexes only part of the data;
- Indexed view: the view's result set is materialized on disk.

6.1 Covering index:
A covering index is a non-clustered index on all the columns a SQL query needs, so the query never has to reach the base table. If a query can be satisfied from an index without referencing the underlying table at all, the index covers that query. Use the INCLUDE clause to build a covering index: included columns are stored with the index data without changing the structure of the index key itself.
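SQL Server's INCLUDE clause cannot be run here, so as a stand-in the Python sketch below uses the standard-library sqlite3 module. SQLite has no INCLUDE clause, but it applies the same idea: when an index contains every column a query touches, EXPLAIN QUERY PLAN reports a covering index and the table itself is not read. The table and index names are hypothetical, and because the exact plan wording varies between SQLite versions, the checks look only for the phrase COVERING INDEX and the index name.

```python
import sqlite3

def plan(conn, sql):
    """Return the detail strings from SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
             "status TEXT, note TEXT)")
# Key column customer_id, plus status playing the role of an included column.
conn.execute("CREATE INDEX ix_orders_customer_status ON orders (customer_id, status)")

covered = plan(conn, "SELECT status FROM orders WHERE customer_id = 7")
not_covered = plan(conn, "SELECT note FROM orders WHERE customer_id = 7")
print(covered)      # mentions COVERING INDEX: answered from the index alone
print(not_covered)  # uses the index, then reads the table row to get `note`
```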
Covering indexes are a useful technique for reducing logical reads. Included columns are most useful in the following scenarios:
- You do not want to increase the size of the index key but still want to cover the query;
- You want to index a data type that cannot be used as an index key column (text, ntext, and image still cannot be included);
- You have exceeded the maximum number of key columns for an index (although it is best to avoid this).
1. Pseudo clustered index:
Physically, the columns of an index are stored in index order. From an I/O perspective, a covering index with included columns acts like a clustered index for every query whose columns it fully covers. If the query's result set needs to be sorted, a covering index can keep the column data physically in the order the result set requires.
2. Suggestions:
When creating a covering index, pay attention to the column list in the SELECT statement, and use as few columns as possible to keep the covering index key small. A covering index is effective when the total byte size of its columns is smaller than a single data row of the table and the queries that use it run frequently.
Before creating many covering indexes, consider how SQL Server can effectively and automatically use index intersection to build a covering index for a query on the fly.

6.2 Index intersection:
If a table has several indexes, SQL Server can use more than one of them to execute a single query: it selects a small subset of the data from each index and then intersects the subsets, returning only the rows that meet all the conditions.
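A Python sketch of the intersection step, using hypothetical single-column indexes modeled as dictionaries from key value to the set of row locators. Each index narrows the candidates for one condition, and the set intersection keeps only the rows that satisfy both, before any table rows are fetched.

```python
from collections import defaultdict

def build_index(rows, column):
    """Hypothetical non-clustered index: key value -> set of row locators."""
    index = defaultdict(set)
    for rid, row in rows.items():
        index[row[column]].add(rid)
    return index

rows = {
    1: {"city": "Paris", "status": "open"},
    2: {"city": "Paris", "status": "closed"},
    3: {"city": "Lyon", "status": "open"},
    4: {"city": "Paris", "status": "open"},
}
ix_city = build_index(rows, "city")
ix_status = build_index(rows, "status")

# WHERE city = 'Paris' AND status = 'open': seek each index, then intersect the locators.
matches = ix_city["Paris"] & ix_status["open"]
print(sorted(matches))  # [1, 4]
```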
In practice, however, the following issues must be considered before modifying existing indexes to cover a query:
- The existing indexes may not be modifiable for various reasons;
- The existing non-clustered index keys may already be quite wide;
- Modifying an existing index may affect the cost of other queries that use it.
Because SQL Server can use multiple indexes on a table to improve a query's performance, it is often better to create several narrow indexes than one wide index key.
Sometimes you may have to create a separate non-clustered index for the following reasons:
- Rearranging the columns of an existing index is not allowed;
- Some of the columns needed to cover the query cannot be added to the existing non-clustered index;
- Two existing non-clustered indexes together may already contain most of the columns the query needs.
In these cases, you can create a non-clustered index on the remaining columns.

6.3 Index join:
An index join is a variation of index intersection that applies the covering-index technique to it. If no single index covers a query, several indexes together may cover it, and SQL Server can use an index join to satisfy the query completely without going to the base table.
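The same idea, sketched in Python: two hypothetical indexes each hold one column plus the row locator, and neither covers `SELECT city, status ... WHERE city = 'Paris'` on its own. Joining their entries on the row locator (a hash join in this sketch; SQL Server may choose a different join method) produces every requested column without touching the base table.

```python
# Hypothetical index entries: (key, row locator), each sorted by key.
ix_city = [("Lyon", 3), ("Paris", 1), ("Paris", 2), ("Paris", 4)]
ix_status = [("closed", 2), ("open", 1), ("open", 3), ("open", 4)]

def index_join(ix_city, ix_status, city):
    # Qualifying entries from the first index: row locator -> city.
    wanted = {rid: c for c, rid in ix_city if c == city}
    # Hash-join with the second index on the row locator; the base table is never read.
    return sorted((rid, wanted[rid], s) for s, rid in ix_status if rid in wanted)

print(index_join(ix_city, ix_status, "Paris"))
# [(1, 'Paris', 'open'), (2, 'Paris', 'closed'), (4, 'Paris', 'open')]
```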

6.4 Filtered index:
A filtered index is a non-clustered index with a filter, which is essentially a WHERE clause. It creates a highly selective set of keys on one or more columns that may not be highly selective on their own, which makes it well suited to columns containing many NULL values.
Filtered indexes pay off in several ways:
- A smaller index improves query efficiency;
- A smaller index reduces storage overhead;
- A smaller index costs less to maintain.
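A Python sketch of the size benefit: a hypothetical filtered index on a mostly-NULL `discontinued_on` column keeps entries only for rows that pass the filter (the equivalent of `WHERE discontinued_on IS NOT NULL`), whereas an unfiltered index would hold one entry per row. Day-of-month numbers stand in for dates.

```python
def build_filtered_index(rows, column, predicate):
    """Entries (key, row locator) only for rows that satisfy the filter, sorted by key."""
    return sorted((row[column], rid) for rid, row in rows.items() if predicate(row))

# Hypothetical 1,000-row table in which only every tenth product is discontinued.
rows = {rid: {"discontinued_on": (rid % 28 + 1) if rid % 10 == 0 else None}
        for rid in range(1000)}

filtered = build_filtered_index(rows, "discontinued_on",
                                lambda row: row["discontinued_on"] is not None)
print(len(rows), len(filtered))  # 1000 100: an unfiltered index would need 1000 entries
```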
The following ANSI settings are required when creating or accessing a filtered index:
ON: ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER
OFF: NUMERIC_ROUNDABORT

6.5 Indexed view:
SQL Server can materialize a view on disk by creating a unique clustered index on it. Such a view is called an indexed view or materialized view. Once the clustered index exists, you can also create non-clustered indexes on the view.
1. Benefits:
- Aggregations can be precomputed and stored in the indexed view, minimizing expensive computation during query execution;
- Tables can be joined in advance and the result set materialized;
- Combinations of joins and aggregations can be materialized.
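A Python analogy for the trade-off (not SQL Server's implementation), using a hypothetical sales-by-region aggregate: the stored totals turn the query into a single lookup, but every write to the base table must also update them, which is the maintenance cost an indexed view carries.

```python
from collections import defaultdict

class SalesByRegionView:
    """Hypothetical stand-in for an indexed view over
    SELECT region, SUM(amount), COUNT(*) FROM sales GROUP BY region.
    The aggregate is stored and kept current on every base-table write."""

    def __init__(self):
        self.totals = defaultdict(lambda: [0, 0])  # region -> [sum, count]

    def on_insert(self, region, amount):
        # Extra work paid by every INSERT into the base table.
        entry = self.totals[region]
        entry[0] += amount
        entry[1] += 1

    def on_delete(self, region, amount):
        # Assumes the deleted row was previously inserted.
        entry = self.totals[region]
        entry[0] -= amount
        entry[1] -= 1
        if entry[1] == 0:
            del self.totals[region]

    def total(self, region):
        # Query time: one lookup instead of scanning and summing the base table.
        return self.totals.get(region, [0, 0])[0]

sales = [("east", 10), ("west", 5), ("east", 7)]
view = SalesByRegionView()
for region, amount in sales:
    view.on_insert(region, amount)

print(view.total("east"))  # 17, same as summing the base rows for 'east'
view.on_delete("west", 5)
print(view.total("west"))  # 0
```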

2. Overhead:
- Any change to a base table must be reflected in the indexed view by executing the view's SELECT statement as part of the modifying transaction;
- Any modification to a base table referenced in the indexed view may trigger modifications to the view's non-clustered indexes, and if the clustering key is updated, the clustered index must be updated as well;
- Indexed views increase database maintenance overhead;
- Indexed views require more storage in the database.
The following restrictions apply to creating an indexed view:
- The first index on the view must be a unique clustered index;
- Non-clustered indexes on an indexed view can be created only after the unique clustered index exists;
- The view definition must be deterministic, that is, it can return only one possible result for a given query;
- An indexed view can reference only base tables in the same database, not other views;
- An indexed view can contain floating-point columns, but such columns cannot be part of the clustered index key;
- An indexed view must be schema-bound to the tables it references, so that their schema cannot be modified;
- The view definition syntax has many other restrictions;
- The following SET options must be set:
ON: ARITHABORT, CONCAT_NULL_YIELDS_NULL, ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, QUOTED_IDENTIFIER
OFF: NUMERIC_ROUNDABORT

3. Environment:
OLAP workloads can benefit from indexed views; OLTP workloads are less likely to benefit.

6.6 Index compression:
Index compression was introduced in SQL Server 2008. Compressing indexes can significantly improve performance, but it also adds CPU and memory overhead, so it is not suitable for every index.
By default, indexes are not compressed; compression must be specified explicitly when the index is created. There are two levels: row compression and page compression. Non-leaf pages of an index do not receive page compression.

VII. Special index types

7.1 Full-text index:
Indexes on text fields.
7.2 Spatial index:
Indexes on spatial data.
7.3 XML index:
Indexes on XML data, available since XML support was introduced in SQL Server 2005.

VIII. Additional index features

8.1 Different column sort orders:
Each column in an index can be sorted in ascending or descending order.
8.2 Index on a computed column:
You can create an index on a computed column as long as the computed column's expression meets certain restrictions, for example, the expression must be deterministic.
8.3 Index on a BIT column:
An index on a BIT column alone offers little benefit, but a BIT column can be useful as part of a covering index.
8.4 CREATE INDEX statements are processed as queries:

8.5 Parallel index creation:
You can control the number of processors used by a CREATE INDEX statement with the max degree of parallelism setting, configured through exec sp_configure 'max degree of parallelism'.
8.6 Online index creation:
Creating an index online reduces the chance of blocking other operations while the index is built.
8.7 Database Engine Tuning Advisor

IX. Summary

To determine the index key columns for a particular query, evaluate the query's WHERE clause and join conditions, considering factors such as column selectivity, width, data type, and column order. Because indexes are mainly used to retrieve a small number of rows, the index columns must be highly selective.
For better performance, try to use a covering index so that the query is satisfied entirely from the index.

Tips for optimizing indexes in SQL Server databases

Common knowledge about indexes: indexes are the biggest factor affecting database performance. Because the subject is complex, I can only touch on it briefly here, but there are several good books you can refer to. I will discuss only two types of SQL Server index: clustered and nonclustered. When choosing an index type, consider the data type and the column that stores the data. Likewise, consider the types of queries the database is likely to run and which of them run most frequently.

Index type
If a column stores highly correlated data that is frequently accessed in sequence, a clustered index is the best choice. With a clustered index, SQL Server physically sorts the rows in ascending (the default) or descending order, so it can quickly find the queried data. Similarly, when searches are restricted to a range of values, a clustered index on those columns works best. Because the data is physically rearranged, each table can have only one clustered index.

Conversely, if a column's data is poorly correlated, you can use a nonclustered index. A table can have up to 249 nonclustered indexes (SQL Server 2008 and later raise the limit to 999), although I cannot imagine that many being used in a practical application.

When a table has a primary key, SQL Server by default automatically creates a unique clustered index on the key column(s). Creating a unique index on these column(s) is what guarantees that the primary key is unique. If you plan to use a foreign key relationship frequently, creating a nonclustered index on the foreign key column is a good practice. If a table has a clustered index, its data pages are maintained in a linked list. Conversely, if the table does not have a clustered index, SQL Server stores the data pages in a heap.

Data Page
When an index is created, SQL Server builds index pages that act as pointers to speed up searches. A fill factor is also set when the index is created; it specifies the percentage of each index page to fill with data. Over time, database updates consume the remaining free space, which causes pages to split. Page splits degrade index performance, because queries that use the index then read fragmented data. Since the fill factor is applied only when an index is created, it is not maintained dynamically.
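A small Python simulation of how fill factor delays page splits. It is a simplified model, not SQL Server's page format: pages hold a hypothetical 10 keys, are initially filled to the given fill factor, and split in half when an insert overflows them. The same 200 random keys are inserted into a layout built at 100% and one built at 70%.

```python
import random
from bisect import bisect_right, insort

def build_pages(keys, capacity, fill_factor):
    """Lay out sorted keys on pages filled to fill_factor percent of capacity."""
    per_page = max(1, capacity * fill_factor // 100)
    keys = sorted(keys)
    return [keys[i:i + per_page] for i in range(0, len(keys), per_page)]

def insert_key(pages, key, capacity):
    """Insert key into the page whose range covers it; split that page in half
    if it overflows. Returns 1 for a page split, otherwise 0."""
    firsts = [page[0] for page in pages]
    i = max(0, bisect_right(firsts, key) - 1)
    insort(pages[i], key)
    if len(pages[i]) > capacity:
        half = len(pages[i]) // 2
        pages[i:i + 1] = [pages[i][:half], pages[i][half:]]
        return 1
    return 0

def count_splits(fill_factor, capacity=10, seed=1):
    pages = build_pages(range(0, 2000, 2), capacity, fill_factor)  # 1,000 existing even keys
    rng = random.Random(seed)
    new_keys = rng.sample(range(1, 2000, 2), 200)                  # 200 new odd keys
    return sum(insert_key(pages, k, capacity) for k in new_keys)

print(count_splits(100), count_splits(70))  # far more splits at 100% than at 70%
```

At 100% every page is already full, so the first insert into any page splits it; at 70% each page has room for three more keys before it splits.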

To reset the fill factor on the pages, you can drop the old index, re-create it, and specify the fill factor again (note: this affects the running database, so use it with caution on important systems). DBCC INDEXDEFRAG and DBCC DBREINDEX are two commands for removing fragmentation from clustered and nonclustered indexes. INDEXDEFRAG is an online operation (it does not block other activity on the table, such as queries), while DBREINDEX physically rebuilds the index. In most cases, rebuilding removes fragmentation more thoroughly, but at the cost of blocking other activity on the table whose index is being rebuilt. On a large, heavily fragmented index, INDEXDEFRAG can take a long time, because it runs in small transactional blocks.

Fill Factor
Whichever of the above measures you apply, the database engine can then return indexed data more efficiently. A full discussion of fill factor is beyond the scope of this article, but do pay attention to it on tables where you plan to create an index with a fill factor.

When executing a query, SQL Server dynamically selects which index to use, based on the statistics it keeps on the distribution of key values in each index. Note that after routine database activity (inserts, deletes, and updates), these statistics may become outdated and need to be updated. You can run DBCC SHOW_STATISTICS to view the statistics (DBCC SHOWCONTIG, by contrast, reports fragmentation). When you think the statistics are outdated, run UPDATE STATISTICS on the table so that SQL Server refreshes its information about the index.

Create a database maintenance plan
SQL Server provides a tool that simplifies and automates database maintenance. The Database Maintenance Plan Wizard (DMPW) also covers index optimization. When you run the wizard, you can have the index statistics in the database updated regularly on a schedule, which reduces the work of rebuilding indexes manually. If you do not want index statistics refreshed automatically, you can instead choose in DMPW to reorganize the data and index pages, which drops the old indexes and re-creates them with a specified fill factor.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
