Tips for SQL Server database indexing

I. What is an index

One of the best ways to reduce disk I/O and logical reads is to use an index.
An index allows SQL Server to look up data in a table without having to scan the entire table.

1.1. Advantages of indexes:

A table that has no clustered index is called a heap, or heap table.
A heap is a collection of unordered data, with a row identifier (RID) as the pointer to each row's storage location. The table data is not in any order and cannot be searched except by reading it sequentially. This process is called a scan. When a clustered index exists, the row pointers stored in nonclustered indexes consist of the clustered index key values, so the choice of clustered index becomes very important.
Because the page size is fixed, the fewer columns a row has, the more rows fit on a page. Because a nonclustered index typically does not contain all of the table's columns, a page typically holds more nonclustered index rows than table rows. SQL Server can therefore read more values of a column from one nonclustered index page than from one page of the table that contains the column.
Another benefit of nonclustered indexes: they are a structure separate from the data table and can be placed in a different filegroup that uses a different I/O path.
An index uses a B-tree as its storage structure, so the number of operations required to find a particular row is kept to a minimum.
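To get a feel for why a B-tree keeps lookups cheap, here is a rough sketch that estimates how many levels a B-tree needs for a given number of rows. It assumes every page holds the same number of entries (the figure of 400 entries per page is purely illustrative, not a SQL Server constant), which real indexes only approximate.

```python
def btree_levels(row_count, entries_per_page):
    """Estimate how many B-tree levels are needed to index row_count rows.

    Assumes every page holds exactly entries_per_page entries.
    """
    if row_count < 1 or entries_per_page < 2:
        raise ValueError("need at least one row and a fan-out of two")
    levels = 1
    pages = -(-row_count // entries_per_page)  # ceiling division: leaf pages
    while pages > 1:
        pages = -(-pages // entries_per_page)  # pages on the next level up
        levels += 1
    return levels


# Illustrative fan-out of 400 entries per page (an assumption).
for rows in (1_000, 1_000_000, 1_000_000_000):
    print(rows, "rows ->", btree_levels(rows, 400), "levels")
```

Under these assumptions a billion rows need only four levels, so finding one row touches about four pages instead of millions.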

1.2. Indexing overhead:

Too many indexes make data modifications (INSERT/UPDATE/DELETE, the CUD part of CRUD) take longer.
When you design an index, look at it from two angles:
For an existing production system, measure the overall impact of the index to make sure the performance benefit outweighs the additional processing cost. You can use the Profiler tool to optimize the overall workload.
To focus on the immediate effect of an index, you can query these DMVs:
sys.dm_db_index_operational_stats or sys.dm_db_index_usage_stats
sys.dm_db_index_operational_stats: shows low-level activity for an index in use, such as I/O and locking.
sys.dm_db_index_usage_stats: shows counts of the different types of operations performed on an index and when each type was last performed.
Although DML statements pay the cost of maintaining indexes, SQL Server must first find a row before it can update or delete it, so an index can still help UPDATE and DELETE statements that have a complex WHERE clause.
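One way to act on the usage DMV is to compare how often an index is read with how often it is maintained. The sketch below works on rows shaped like the output of sys.dm_db_index_usage_stats (the columns user_seeks, user_scans, user_lookups and user_updates exist in that view). The index names, the sample numbers and the "more writes than reads" rule are assumptions made for illustration, not a rule from the text.

```python
from typing import NamedTuple


class IndexUsage(NamedTuple):
    index_name: str      # hypothetical name, e.g. joined in from sys.indexes
    user_seeks: int
    user_scans: int
    user_lookups: int
    user_updates: int


def maintenance_heavy(rows):
    """Return names of indexes that were written more often than read."""
    flagged = []
    for r in rows:
        reads = r.user_seeks + r.user_scans + r.user_lookups
        if r.user_updates > reads:
            flagged.append(r.index_name)
    return flagged


# Made-up sample figures.
sample = [
    IndexUsage("IX_Orders_CustomerID", 5200, 14, 0, 800),
    IndexUsage("IX_Orders_Note", 0, 2, 0, 9100),
]
print(maintenance_heavy(sample))
```

An index flagged this way is only a candidate for review. The counters reset when the instance restarts, so a short observation window can be misleading.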

II. Index design recommendations

The index design recommendations are as follows:
• Check the WHERE clause and join condition columns;
• Use narrow indexes;
• Check the uniqueness of the column;
• Check the data type of the column;
• Consider column order;
• Consider the index type (clustered index vs. nonclustered index).

2.1. Check the WHERE clause and join condition columns:
When a query is submitted to SQL Server, the optimizer goes through the following steps:
1) The optimizer identifies the columns used in the WHERE clause and the join conditions.
2) The optimizer then checks the indexes on these columns.
3) The optimizer assesses the usefulness of each index by determining the selectivity of the clause (that is, how many rows will be returned) from the statistics maintained on the index.
4) Finally, based on the information gathered in the previous steps, the optimizer estimates the least expensive way to read the rows.
When there is no suitable index on the WHERE clause and join columns, the optimizer performs a full table scan.
Recommendation: create indexes on columns that are frequently used in WHERE clauses or join conditions, to avoid table scans. When a table is so small that all of its data fits on a single page (8KB), a table scan may work better than an index seek.

2.2. Use narrow indexes:
For best performance, use as few columns in the index as possible. Also avoid columns with wide data types.
A narrow index fits more rows into an 8KB index page than a wide index, which has the following effects:
• Reduces I/O (fewer 8KB pages are read);
• Uses the database cache more efficiently, because SQL Server has fewer index pages to cache, which reduces the logical reads needed against index pages in memory;
• Reduces database storage space.
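The effect of key width on page count can be sketched with simple arithmetic. The example below ignores page headers, per-row overhead and non-leaf levels, so the numbers are only indicative. The entry sizes of 8 and 200 bytes are assumptions chosen to contrast a narrow key with a wide one.

```python
PAGE_BYTES = 8 * 1024  # the 8KB page size the text refers to


def index_pages(row_count, entry_bytes):
    """Leaf pages needed if each entry takes entry_bytes, ignoring overhead."""
    if not 0 < entry_bytes <= PAGE_BYTES:
        raise ValueError("entry must fit on one page")
    entries_per_page = PAGE_BYTES // entry_bytes
    return -(-row_count // entries_per_page)  # ceiling division


narrow = index_pages(1_000_000, 8)    # e.g. one 8-byte integer key
wide = index_pages(1_000_000, 200)    # e.g. several wide character columns
print(narrow, wide)
```

With these assumed sizes the wide index needs roughly 25 times as many pages as the narrow one for the same million rows, which translates directly into more reads and more cache pressure.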

2.3. Check the uniqueness of the column:
Creating an index on a column with a very small range of possible values, such as gender, does not help performance, because the optimizer cannot use the index to narrow down the rows returned effectively. A small range of values tends to lead to a full table scan or a clustered index scan. The preferred approach is for the columns in the WHERE clause to have a large number of unique values (high selectivity), which limits the number of rows accessed. Create indexes on such columns to help access small result sets.
In addition, column order matters when an index is created on multiple columns. In some cases, putting the most selective column first makes the index more efficient.
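One common way to put a number on "uniqueness" is the ratio of distinct values to total rows. The sketch below uses that ratio. It is one possible definition for illustration, not the optimizer's own statistic, and the sample columns are invented.

```python
def selectivity(values):
    """Ratio of distinct values to total rows: 1.0 means every value is unique."""
    values = list(values)
    if not values:
        raise ValueError("no rows")
    return len(set(values)) / len(values)


gender = ["F", "M"] * 500        # 2 distinct values in 1000 rows
order_id = list(range(1000))     # 1000 distinct values in 1000 rows
print(selectivity(gender), selectivity(order_id))
```

A ratio close to 1.0 suggests an index can narrow a search to very few rows, while a ratio near 0 (as for the gender column) suggests each value still matches a large share of the table.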

2.4. Check the column data type:
Indexes on numeric values are fast, because numeric values are small and cheap to compare arithmetically. Character data is larger and requires string matching, which is usually more expensive.

2.5. Consider column order:
In a composite index, column order is an important factor in index efficiency. When choosing the order, consider:
• Column uniqueness;
• Column width;
• Column data type.
A query uses the leading edge of the index (its leftmost columns) to perform a seek and retrieve data. Put the most effective column first so that rows are filtered out as early as possible and the amount of data is reduced.
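The role of the leading column can be illustrated with a sorted list standing in for a composite index. This is a toy model under stated assumptions (made-up data, a plain Python list instead of a B-tree), but the ordering argument is the same: entries are sorted by the first column, so only a search on that column can jump straight to its matches.

```python
from bisect import bisect_left, bisect_right

# A composite index on (city, age) modelled as a sorted list of key tuples.
# Hypothetical data; the third element stands in for a row locator.
index = sorted([
    ("Oslo", 31, "r1"), ("Oslo", 45, "r2"), ("Paris", 28, "r3"),
    ("Paris", 31, "r4"), ("Rome", 31, "r5"),
])


def seek_by_city(city):
    """Leading column: binary search finds the matching slice directly."""
    lo = bisect_left(index, (city,))
    hi = bisect_right(index, (city, float("inf")))
    return [loc for _, _, loc in index[lo:hi]]


def scan_by_age(age):
    """Non-leading column: matches are scattered, so every entry is checked."""
    return [loc for _, a, loc in index if a == age]


print(seek_by_city("Paris"))
print(scan_by_age(31))
```

Searching by city touches only the contiguous slice for that city, while searching by age alone has to walk the whole list, which corresponds to an index scan rather than a seek.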

2.6. Consider the index type:
Both clustered and nonclustered indexes store data in a B-tree. The sections that follow describe each type in detail.


III. Clustered index

The leaf pages of a clustered index are the data pages of the table, so the table rows are physically sorted by the clustered index columns. Because data can have only one physical order, a table can have only one clustered index.

3.1. Heap table:
A table with no clustered index is called a heap table. Its data rows are not stored in any particular order and are not linked to the adjacent pages of the table. This unstructured layout makes access more expensive than access to a non-heap table.
3.2. Relationship with nonclustered indexes:
An index row of a nonclustered index contains a pointer to the corresponding data row of the table. This pointer is called the row locator. Its value depends on whether the data pages are stored in a heap or in a clustered index. For a heap, the row locator is a pointer to the RID of the data row. For a table with a clustered index, the row locator is the clustered index key value. When new rows are inserted, they can cause page splits and row relocation that also touch the nonclustered indexes, which affects performance.
3.3. Clustered index recommendations:
1) Create the clustered index first:
The order of creation matters because every nonclustered index stores the clustered index key value in its index rows. For best performance, create the clustered index before you create any nonclustered index.
2) Keep the index narrow:
Keep the overall width of the clustered index key as small as possible, because a wide clustered key makes every nonclustered index wider as well. A large clustered key therefore affects not only its own width but also enlarges all nonclustered indexes on the table, increasing the number of index pages, the logical reads and the disk I/O.
3) Rebuild the clustered index in one step:
Because clustered and nonclustered indexes are related, rebuilding a clustered index with DROP INDEX followed by CREATE INDEX causes the nonclustered indexes to be rebuilt twice. Instead, use the DROP_EXISTING clause of the CREATE INDEX statement to rebuild the clustered index in a single atomic step. The same clause can also be used for nonclustered indexes.
4) When to use a clustered index:
a) To retrieve a range of data:
Because the rows of a clustered index are stored in key order, reading a range reduces disk head movement and the amount of physical I/O.
b) To read data in a predetermined order:
For data that must be returned sorted, a clustered index is very effective because it removes the cost of sorting after the data is read.
For queries that read large ranges of rows, need sorted output, or both, a clustered index is often a more efficient choice than a nonclustered index.
5) When not to use a clustered index:
In some cases it is best not to use a clustered index:
a) Frequently updated columns:
If the clustered index columns are updated frequently, the row locators in all nonclustered indexes must be updated as well, which increases the cost of the statements involved. Other queries that reference the same part of the table and its nonclustered indexes are blocked during that time, which hurts concurrency.
b) Wide keys: the reason is explained above.
c) Too many concurrent sequential inserts:
If you insert many new rows concurrently, it is better to spread them across multiple pages. With a clustered index whose order matches the arrival order of the rows, all inserts are concentrated on the last page, which creates a large "hotspot". You can distribute the inserts across the table by creating the clustered index on another column that does not order the rows in the same sequence as the new rows arrive. This problem occurs only when there is a large number of simultaneous inserts. If a disk hotspot becomes a performance bottleneck, you can make room for additional rows in the intermediate pages by lowering the table's fill factor. The hot page will also tend to stay in memory, which is good for performance.
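Recommendation 3 above (rebuilding the clustered index in one step) can be made concrete by looking at the statement text. The helper below only formats a T-SQL string. It does not connect to a server, and the index, table and column names are hypothetical. The bracket quoting is deliberately naive and does not escape a closing bracket inside a name.

```python
def rebuild_clustered_sql(index_name, table, columns):
    """Build a T-SQL statement that rebuilds a clustered index in one step."""
    cols = ", ".join(f"[{c}]" for c in columns)
    return (
        f"CREATE CLUSTERED INDEX [{index_name}] ON {table} ({cols}) "
        "WITH (DROP_EXISTING = ON);"
    )


# Hypothetical names, for illustration only.
print(rebuild_clustered_sql("CIX_Orders", "dbo.Orders", ["OrderID"]))
```

The point of the WITH (DROP_EXISTING = ON) option is that the existing index of the same name is replaced within a single statement, rather than by a separate DROP INDEX and CREATE INDEX pair.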

IV. Nonclustered index

A nonclustered index does not affect the order of the data in the table pages. For a heap table, the row locator is a pointer to the RID of the data row. For a non-heap table, the row locator is the clustered index key.

4.1. Nonclustered index maintenance:
To reduce maintenance overhead, after a page split SQL Server adds a pointer on the old data page that points to the new data page, rather than updating the row locators in all the affected nonclustered indexes. When the clustered index key serves as the row locator, this maintenance cost for nonclustered indexes is reduced.
4.2. Bookmark lookup defined:
When a query requests columns that are not part of the nonclustered index chosen by the optimizer, a lookup is required: a key lookup against a clustered index, or an RID lookup against a heap table. This is called a bookmark lookup.
The lookup reads the corresponding data row from the table using the row locator stored in the index row, so it costs a logical read on the data page in addition to the logical read on the index page. However, if the index contains all the columns the query requires, the data page does not need to be accessed at all. Such an index is called a covering index. The lookup cost is also why large result sets are best served by a clustered index: a clustered index needs no bookmark lookup because its leaf pages and the data pages are the same.
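The difference between a covered query and one that needs a bookmark lookup can be sketched with two dictionaries: one standing in for the table (keyed by the clustered key) and one for a nonclustered index. All table, column and value names here are made up, and counting "lookups" is a simplification of counting logical reads.

```python
# Hypothetical table keyed by its clustered index key (OrderID).
table = {
    1: {"OrderID": 1, "CustomerID": 7, "Total": 20.0, "Note": "gift"},
    2: {"OrderID": 2, "CustomerID": 9, "Total": 35.5, "Note": ""},
    3: {"OrderID": 3, "CustomerID": 7, "Total": 12.0, "Note": "rush"},
}

# Nonclustered index on CustomerID: each entry carries the index key, the
# row locator (the clustered key) and one extra column stored in the index.
nc_index = {
    7: [{"CustomerID": 7, "OrderID": 1, "Total": 20.0},
        {"CustomerID": 7, "OrderID": 3, "Total": 12.0}],
    9: [{"CustomerID": 9, "OrderID": 2, "Total": 35.5}],
}


def query(customer_id, columns):
    """Return (rows, lookups); lookups counts trips to the base table."""
    rows, lookups = [], 0
    for entry in nc_index.get(customer_id, []):
        if all(c in entry for c in columns):
            row = entry                     # the index covers the query
        else:
            row = table[entry["OrderID"]]   # key lookup via the row locator
            lookups += 1
        rows.append({c: row[c] for c in columns})
    return rows, lookups


print(query(7, ["OrderID", "Total"]))
print(query(7, ["OrderID", "Note"]))
```

The first query is answered from the index alone with no lookups. The second asks for a column the index does not hold, so each matching row costs one extra trip to the table, and that cost grows with the number of rows returned.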
4.3. Nonclustered index recommendations:
1) When to use a nonclustered index:
A nonclustered index is most efficient for reading a small number of rows from a large table. As the number of rows grows, the cost of bookmark lookups grows proportionally. The indexed columns should have high selectivity.
Some index requirements are not suitable for a clustered index and are better served by a nonclustered index:
• Frequently updated columns
• Wide keys
2) When not to use a nonclustered index:
A nonclustered index is not suitable for queries that retrieve a large number of rows. A clustered index is better in that case, because no separate bookmark lookup is needed to retrieve the data rows. If you need to read a large result set from a table, a nonclustered index on the filter and join columns does not help unless it covers the query.


V. Clustered index vs. nonclustered index

The main factors to consider when choosing between a clustered and a nonclustered index:
• The number of rows retrieved;
• Data ordering requirements;
• Index key width;
• Column update frequency;
• Bookmark lookup overhead;
• Any disk hotspots.

5.1. Advantages of a clustered index over a nonclustered index:
When you choose the index type for a table that has no indexes, a clustered index is usually the preferred choice.
Reading a small result set through a highly selective column is a good reason to create a nonclustered index on that column, but a clustered index on the same column can be just as beneficial or even better.
Note: although a clustered index beats a nonclustered index for many data retrieval patterns, a table can have only one clustered index, so reserve it for the case where it helps the most.
5.2. Advantages of a nonclustered index over a clustered index:
A nonclustered index is preferable to a clustered index in the following situations:
• The index key is very large.
• You want to avoid the cost of rebuilding all nonclustered indexes that comes with rebuilding the clustered index.
• You want to avoid blocking: database readers work on the nonclustered index pages while writers modify other columns (not part of the nonclustered index) on the data pages.
• All the columns a query references from the table can safely be held in the nonclustered index.
When no jump to the data row is needed, a nonclustered index should perform as well as a clustered index, or even better. This is possible when the nonclustered index key contains all the columns required from the table.

VI. Advanced indexing techniques

• Covering index: a nonclustered index that contains all the columns a query requires.
• Index intersection: uses multiple nonclustered indexes to satisfy all the column requirements of a query (from one table).
• Index join: combines index intersection and covering index techniques to avoid touching the base table.
• Filtered index: to index columns with an uneven data distribution or sparse columns, you can apply a filter to the index so that it indexes only part of the data.
• Indexed view: materializes the output of a view on disk.

6.1. Covering index:
A covering index is a nonclustered index built on all the columns required to satisfy a SQL query without going to the base table. If a query can be answered from an index without referring to the underlying data table at all, that index can be considered a covering index. You can use the INCLUDE clause to turn an index into a covering index. This stores the extra column data with the index without changing the structure of the index key itself.
A covering index is a useful technique for reducing logical reads. INCLUDE is best used in the following situations:
• You do not want to increase the size of the index key but still want a covering index;
• You want to index a data type that cannot be an index key (except text, ntext and image);
• You have already exceeded the maximum number of key columns for an index (although it is best to avoid this problem).
1) Pseudoclustered index:
A covering index physically organizes the data of all its indexed columns in sequential order. From an I/O perspective, a covering index that does not use included columns therefore acts as a clustered index for every query that is satisfied completely by the columns in the covering index. If the query result set must be sorted, the covering index can be used to physically maintain the column data in the order the result set requires.
2) Suggestions:
When you use covering indexes, pay attention to the column list of the SELECT statement. Use as few columns as possible to keep the covering index small. A covering index is worthwhile when the number of bytes in its index row is small compared with a single data row of the table, and when you are sure that the queries using it run frequently.
Before creating many covering indexes, consider how SQL Server can automatically use index intersection to build a covering index for a query on the fly.

6.2. Index intersection:
If a table has several indexes, SQL Server can use more than one of them to execute a query. It selects a small subset of data through each index and then intersects the two subsets (that is, it returns only the rows that meet all of the criteria).
In the real world, however, the following issues need to be considered before you modify an existing index instead:
• Modifying an existing index may not be allowed, for a variety of reasons;
• The existing nonclustered index key may already be quite wide;
• The cost of queries that use the existing index will be affected by the modification.
To improve query performance, SQL Server can use multiple indexes on a table, so consider creating several narrow indexes instead of one wide index key.
Sometimes you may have to create a separate nonclustered index for the following reasons:
• Reordering the columns in an existing index is not allowed;
• Some of the columns required by the covering index cannot be included in an existing nonclustered index;
• The total number of columns in the two existing nonclustered indexes may be more than the number of columns required to cover the query.
In these cases, you can create a nonclustered index on the remaining columns.
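The intersection step described in this section can be pictured as a set intersection over row locators. The sketch below is a simplified model with invented index contents. It is not how the engine implements the operation internally.

```python
# Two hypothetical single-column indexes, each mapping a key value
# to the set of row locators that hold it.
ix_city = {"Oslo": {1, 2, 5}, "Paris": {3, 4}}
ix_status = {"open": {2, 3, 5}, "closed": {1, 4}}


def intersect(city, status):
    """Row locators that satisfy both predicates, using only the indexes."""
    return ix_city.get(city, set()) & ix_status.get(status, set())


print(sorted(intersect("Oslo", "open")))
```

Each index narrows the candidates on its own column, and only the locators present in both subsets survive, so two narrow indexes can together do the work of one wide one.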

6.3. Index join:
An index join is a variant of index intersection that applies the covering index technique to index intersection. If no single index covers a query but several indexes together do, SQL Server can use an index join to satisfy the query completely without going to the base table.

6.4. Filtered index:
A filtered index is a nonclustered index that uses a filter, essentially a WHERE clause, to create a highly selective set of keys on one or more columns that may otherwise have poor selectivity. It is especially suitable for columns with a large number of NULL values.
A filtered index pays off in several ways:
• It improves query efficiency by reducing the index size;
• It reduces storage overhead because the index is smaller;
• It reduces the cost of index maintenance because of the reduced size.
A filtered index requires a specific set of ANSI settings when it is accessed or created:
ON: ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER
OFF: NUMERIC_ROUNDABORT
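The size benefit of a filtered index on a mostly-NULL column can be sketched by counting index entries with and without a filter. This is a toy model with made-up data (95 NULLs out of 100 rows). In T-SQL the filter is written as a WHERE clause on the CREATE INDEX statement, for example WHERE SomeColumn IS NOT NULL, where SomeColumn is a hypothetical column name.

```python
# Hypothetical column: 95 rows hold NULL (None), 5 rows hold a value.
rows = [(rid, None) for rid in range(95)] + \
       [(rid, 1000 + rid) for rid in range(95, 100)]


def index_entries(rows, keep=lambda value: True):
    """One (value, rid) entry per row whose value passes the filter."""
    return [(value, rid) for rid, value in rows if keep(value)]


full = index_entries(rows)                                          # ordinary index
filtered = index_entries(rows, keep=lambda value: value is not None)  # filtered index
print(len(full), len(filtered))
```

In this model the filtered index holds 5 entries instead of 100, which is where the smaller size and cheaper maintenance listed above come from. Rows that fail the filter never enter the index, so changes to them cost nothing in index maintenance.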

6.5. Indexed view:
SQL Server can create a unique clustered index on a view, which materializes the view on disk. Such a view is called an indexed view or materialized view. After the clustered index exists, you can create nonclustered indexes on the view.
1) Benefits:
• Aggregations can be computed in advance and stored in the indexed view, minimizing expensive computation during query execution;
• Tables can be joined in advance and the result set materialized;
• Combinations of joins and aggregations can be materialized.

2) Costs:
• Any change to a base table must be reflected in the indexed view by executing the view's SELECT statement;
• Any modification to a base table on which the indexed view is defined may trigger modifications in the nonclustered indexes of the indexed view, and if the clustered key is updated, the clustered index must be updated as well;
• An indexed view increases the maintenance cost of the database;
• More storage is required in the database.
Creating an indexed view is subject to the following restrictions:
• The first index on the view must be a unique clustered index;
• Nonclustered indexes on an indexed view can be created only after the unique clustered index has been created;
• The view definition must be deterministic, that is, it can return only one possible result for a given query;
• The indexed view must reference only base tables in the same database, not other views;
• The indexed view can contain floating-point columns, but such columns cannot be part of the clustered index key;
• The indexed view must be schema-bound to the tables it references, to prevent changes to the table schema;
• There are many restrictions on the syntax of the view definition.
The following SET options must be fixed:
ON: ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER, ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS
OFF: NUMERIC_ROUNDABORT

3) Usage environment:
OLAP workloads can benefit from indexed views. It is harder for OLTP workloads to benefit.


6.6. Index compression:
Index compression was introduced in SQL Server 2008. Compressing an index can bring significant performance improvements, but it also costs extra CPU and memory, so it is not the right choice for every index.
By default, indexes are not compressed. You must explicitly request compression when you create the index. Compression comes in two forms: row-level and page-level. The non-leaf pages of an index are not compressed under page compression.

VII. Special index types

7.1. Full-text index:
Indexes columns that hold text data.
7.2. Spatial index:
Indexes spatial data types.
7.3. XML index:
Indexes the XML data type, which was introduced in SQL Server 2005.

VIII. Additional index characteristics

8.1. Different column sort orders:
You can specify ascending or descending order separately for each column in an index.
8.2. Index on a computed column:
You can create an index on a computed column as long as the computed column's expression meets certain restrictions, for example that it references only columns from its own table.
8.3. Index on a BIT data type column:
An index on a BIT column alone offers little benefit, but a BIT column can be useful as part of a covering index.
8.4. CREATE INDEX statement processed as a query:

8.5. Parallel index creation:
You can control the number of processors used to create an index either in the CREATE INDEX statement itself (the MAXDOP option) or server-wide through the max degree of parallelism configuration parameter, which is set with exec sp_configure 'max degree of parallelism'.
8.6. Online index creation:
Creating an index online reduces locking while the index is being built.
8.7. Consider the Database Engine Tuning Advisor

IX. Summary

To determine the index key columns for a particular query, evaluate the query's WHERE clause and join conditions, and weigh factors such as column selectivity, width, data type and column order. Because indexes are mainly intended to retrieve a small number of rows, index selectivity must be very high.
For better performance, try to cover the query completely with a covering index.

Tips for optimizing SQL Server database indexes

Indexing basics: indexes are the biggest factor affecting database performance. Because the subject is complex, I can only cover it briefly here, but there are several good books on it that you can consult. I discuss only two types of SQL Server index here: the clustered index and the nonclustered index. When deciding what type of index to create, consider the data type and the columns that hold the data. You must also consider the kinds of queries the database is likely to run and which kinds run most often.

Types of Indexes
If a column holds highly correlated data that is often accessed sequentially, a clustered index is the best choice, because with a clustered index SQL Server physically rearranges the data rows in ascending (the default) or descending order, which lets it find the queried data quickly. A clustered index is likewise the best choice for columns that are searched by range. Because the data is physically rearranged, each table can have only one clustered index.

Conversely, if a column holds data with little correlation, you can use a nonclustered index. You can create up to 249 nonclustered indexes on a table (999 from SQL Server 2008 onward), although I can't imagine a practical application using that many.

When a table has a primary key, SQL Server by default automatically creates a unique clustered index on the column or columns that make up the key. Creating a unique index on these columns enforces the uniqueness of the primary key. When you define a foreign key relationship that you intend to use frequently, it is a good idea to create a nonclustered index on the foreign key column. If the table has a clustered index, the data pages are linked together in a linked list. If the table does not have a clustered index, SQL Server stores the data pages in a heap.

Data Pages
When an index is created, SQL Server creates pages that hold the pointers used to speed up searches. The index's fill factor is also set at that time. The fill factor specifies the percentage of each index page that is filled with data. Over time, updates to the database use up the free space, which causes pages to split. Page splits fragment the stored data and degrade the performance of queries that use the index. Because the fill factor is applied when the index is created, it is not maintained dynamically afterwards.
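The effect of a fill factor can be sketched as simple arithmetic: it decides how many row slots on each leaf page are used when the index is built and how many are left free for later inserts. The page capacity of 100 rows is a hypothetical figure, and the model ignores row size differences.

```python
def rows_per_page(page_capacity, fill_factor):
    """Rows written per leaf page at build time, and slots left free.

    page_capacity is how many rows fit on a completely full page
    (a hypothetical figure); fill_factor is a percentage from 1 to 100.
    """
    if not 1 <= fill_factor <= 100:
        raise ValueError("fill factor must be between 1 and 100")
    used = max(1, page_capacity * fill_factor // 100)
    return used, page_capacity - used


print(rows_per_page(100, 100))  # full pages: the next insert forces a split
print(rows_per_page(100, 80))   # 20 free slots per page before a split
```

A lower fill factor delays page splits at the price of more pages to store and read. Because the free space is consumed as rows arrive, the benefit lasts only until the index is rebuilt again.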

To change the fill factor of the data pages, you can drop the old index, re-create it and set a new fill factor (note: this affects current database operations, so use it sparingly on important systems). DBCC INDEXDEFRAG and DBCC DBREINDEX are two commands for removing fragmentation from clustered and nonclustered indexes. INDEXDEFRAG is an online operation (that is, it does not block other activity on the table, such as queries), while DBREINDEX physically rebuilds the index. In most cases rebuilding the index removes fragmentation more thoroughly, but that advantage comes at the cost of blocking other activity on the table that holds the index. When an index is heavily fragmented, INDEXDEFRAG takes a long time, because the command runs as a series of small transactions.

Fill Factor
After you take any of these measures, the database engine can return indexed data more efficiently. The topic of fill factor (FILLFACTOR) is beyond the scope of this article, but remember to pay attention to the fill factor on the tables you plan to index.

When executing a query, SQL Server chooses dynamically which index to use. It bases that choice on the statistics kept for each index about the distribution of its key values. Note that after routine database activity, such as inserts, deletes and updates, the statistics SQL Server relies on may be out of date and need to be refreshed. You can inspect statistics with DBCC SHOW_STATISTICS and fragmentation with DBCC SHOWCONTIG. When you believe the statistics are out of date, run the UPDATE STATISTICS command on the table so that SQL Server refreshes its information about the indexes.

Establish a database maintenance plan
SQL Server provides a tool that simplifies and automates database maintenance. This tool, the Database Maintenance Plan Wizard (DMPW), also covers index optimization. If you run the wizard, you will see statistics about the indexes in the database. These are recorded like a log and refreshed periodically, which reduces the effort of rebuilding indexes by hand. If you do not want index statistics to be refreshed automatically on a schedule, you can instead choose in DMPW to reorganize the data and data pages, which drops the old indexes and rebuilds them with a specified fill factor.