SQL Optimization-Index

Source: Internet
Author: User
Tags repetition

SYS. dm_exec_requests is a powerful tool to check the performance bottleneck of SQL Server.

Update statistics t_goods with fullscan --- Update table statistics
The index leaves of clustered indexes are the data itself.
The leaf node of the non-clustered index contains the bookmarks pointing to the data (that is, the data row number or the key of the clustered index)

Index optimization principles:
This mainly reduces the number of logical reads. Logical reads include the number of pages accessed from the memory data cache and the number of pages read from the physical disk.
Physical reads indicate the number of data pages that do not reside in the memory buffer and need to be read from the disk.
Optimization focuses on the number of logical reads.

When executing a query, SQL Server dynamically selects which index to use. Therefore, SQL Server determines which index to use based on the statistics distributed on this keyword on each index. It is worth noting that, after daily database activities (such as inserting, deleting, and updating tables), these statistics used by SQL Server may have expired and need to be updated. You can run DBCC showcontig to view the statistics status. When you think that the statistic has expired, you can execute the update statistics command of the table, so that SQL Server refreshes the information about the index.

Determines which fields need to be indexed Based on the frequency of use, and selects fields that are frequently used as join conditions, filtering conditions, aggregate queries, and sorting fields as index candidate fields.
Combine fields that often appear together to form a composite index. The field order of the composite index is the same as that of the primary key. You also need to put the most commonly used fields in front, put the fields with low repetition rate in front.

The selection of primary keys is also important. Generally, the selection of keys with a small length is faster, and a small key can reduce the B-tree hierarchy of the primary key.
When selecting a primary key, pay attention to the field order of the composite primary key. For composite primary keys, the performance of primary keys with different field order may vary greatly, generally, you should select a field with a low repetition rate, a single field, or a combination of fields with a high probability of query.

When using an index field as a condition, if the index is a composite index, you must use the first field in the index as the condition to ensure that the system uses the index, otherwise, the index will not be used, and the field order should be consistent with the index order as much as possible.

. If a parameter is used in the WHERE clause, a full table scan is performed. Because SQL parses local variables only at runtime, the optimizer cannot postpone the selection of the access plan to runtime; it must be selected at compilation. However, if an access plan is created during compilation, the value of the variable is still unknown and thus cannot be used as an input for index selection.

You can change it to force query to use the index:
Select ID from usertable with (index name) where code = @ code or select * From usertable with (Index = index_name) wherecode = @ code -- force search by index


Update the clustered index data column should be avoided as much as possible, because the order of the clustered index data column is the physical storage order of the table records. Once the column value changes, the order of the entire table record will be adjusted, it will consume a considerable amount of resources. If the application system needs to frequently update the clustered index data column, consider whether to create the index as a clustered index.

Use of Cluster Indexes

A clustered index sorts the actual data on a disk by the values of one or more specified columns. Because the index page pointer of the clustered index points to the data page, using the clustered index to search for data is almost always faster than using a non-clustered index. Each table can only create a clustered index, and creating a clustered index requires at least 120% additional space for the table to store copies of the table and the intermediate index page. The leaf page of the clustered index is the same as the data page of the table. The order of clustering index columns is not only the order of clustering index rows but also the physical order of data rows. Because a table can only store physical order. Therefore, a table can have only one clustered index. Clustering indexes do not need to jump from the index row to the basic row like non-clustering indexes.

Under the clustered index, data is physically arranged on the data page in order, and duplicate values are also arranged together, therefore, when the queries that contain range checks (between, <, <=,>, & gt; =) or use group by or order, once a row with the first key value in the range is found, the row with the subsequent index value is physically contiguous without further searching, avoiding large-scale scanning, this greatly improves the query speed.

When you create a clustered index on a table with frequent insertion operations, do not create a column with monotonous appreciation (such as the auto-increment column of identity). Otherwise, blocking conflicts may occur frequently.
Do not include columns that are frequently modified in the clustered index, because after the code value is modified, the data row must be moved to a new location.
Select the cluster index based on the where clause and connection operation type. The optional columns of the clustered index are:

● Primary Key column, which is used in the WHERE clause and inserted randomly.

● Columns accessed by range, such as p_order> 100 and p_order <1000.

● Columns used in group by or order.

● Columns that are not frequently modified.

● Columns Used in connection operations.

Note: Because the clustered index is not dependent on the clustered index, rebuilding the clustered index with a separate drop index and create index will cause all non-clustered indexes to be created twice. You can use create clustered index ix_tb (CODE) with (drop_existing = on) in one step.
Similarly, you can also use create nonclustered index ix_tb (CODE) with (drop_existing = on) on a non-clustered index)

Create nonclustered index ix_tb on table (C1, code)
With (data_compression = row) ---- specify index compression when creating an index (row compression or page compression)
Create nonclustered index ix_tb on table (C1, code)
With (data_compression = page) ---- specify index compression when creating an index (row compression or page compression)

Use of nonclustered Indexes

The non-clustered index stores the clustered index key on its index row (The table contains the clustered index ). Non-clustered indexes use the clustered index key value as its row positioner. A table that does not contain a clustered index is called a heap table. A non-clustered index stores a pointer (row Locator) That locates data rows from the index row.

The index created by SQL Server by default is a non-clustered index, because the non-clustered index does not re-organize the data in the table, instead, store index column values for each row and point them to the page where the data is located with a pointer. In other words, non-clustered indexes have an extra level between the index structure and the data itself. If a table does not have a clustered index, there are 250 non-clustered indexes. Each non-clustered Index provides different sorting orders for data access. When creating a non-clustered index, you must weigh the advantages and disadvantages of the index between the speed of query and the speed of modification. In addition, we need to consider these issues:

● How much space is required for the index.

● Whether the appropriate columns are stable.

● How to select the index key and whether the scan effect is better.

● Whether there are many repeated values.

For tables with frequent updates, non-clustered indexes require more additional costs than clustered indexes and no indexes at all. For each row to be moved to a new page, the page-level rows that point to the data of each non-clustered index must also be updated, and sometimes the index page splitting is required. The process of deleting data from a page has similar overhead. In addition, the deletion process must move the data to the top of the page to ensure data continuity. Therefore, it is very careful to create a non-clustered index. Non-clustered indexes are often used in the following scenarios:

● A column is often used in set functions (such as sum ,....).

● A column is often used for join, order by, and group.

● The retrieved data cannot exceed 20% of the table's data volume.

The pointer type included in the page-level pages of a non-clustered index depends on whether the non-clustered index is built on the heap or clustered index. ,
When a non-clustered index is created on the stack, the pointer is composed of the file ID, page ID, and the page Slot Number of the data. That is, the first file (1: 85691: 10) contains 85691 records on page 1.
When a non-clustered index is created on the clustered index, the pointer value is the clustered index key value of the data row. That is, you can use the clustered key navigation to retrieve the columns required by the index.

Overwrite the use of the index (covering indexes)

Covering indexes is a non-clustered index created on all columns that meet the requirements of SQL queries and do not need to reach the basic table. That is, if the query encounters an index and does not need to reference the underlying basic table at all, the index overwrites the index.
It can also be understood that the SQL query does not contain any columns that jump from the non-clustered index page to the data page of the basic table. That is to say, the non-clustered index page contains all the data to be queried, you do not have to search for data on the data page. If a non-clustered index contains result data, the query speed is faster than that of the clustered index.

However, it takes up a large amount of space to cover a large number of index items. In addition, the update operation will change the index value. Therefore, if the potential overwrite query is not commonly used or is not critical, increasing the index Overwrite will reduce the performance.

You can also understand overwriting indexes as follows: the non-clustered index page contains the clustered index key value and the value of other columns (for example, the column to be queried. This query is faster than querying with a foreign key index (or index. Because it will retrieve these column values directly from the non-clustered index page.
Actually, two steps are reduced (my personal understanding ):
1. Find the index page based on the index, and the index page contains the clustered index key. Then, based on the clustered index key (each primary key), find the page location of the actual row.
2. Then retrieve the drug query column from the corresponding location on the page.
Examples of covering indexes are as follows:
Create index fugai_index on mytable (productid) include (column1, column2)
The value of overwriting index columns is stored in the memory. If overwriting index contains too many columns, memory will be consumed. This is a deficiency.

Covering indexes: overwriting indexes allow the statement to obtain all required data without having to access the table and only access the index.
Because the clustered index leaf node is data, it doesn't matter whether it is covered or not. Therefore, the covered index is mainly for non-clustered indexes.
Covering indexes requires that the select clause and the where clause be included for all fields. The fields in the where statement are located at the beginning and those in the SELECT statement are located at the end.


Filter (filter) Indexes

For example, crate nonclustered index ix_where_qty on production. Product (qty) Where qty is null; -- where is required. Standard format for creating a filter index.

To index fields or sparse columns with scattered data distribution, You can reference and filter indexes to search for only some data. That is, non-clustered indexes using filters are used to create a highly selective keyword group on one or more columns that may not be highly selective.
For example, a column has many null values. To increase the query efficiency, you can create an index on the column to make the query overwrite the index. Although scanning is performed, logical reads, CPU time, And I/O competition are reduced. This is also a way to improve query performance.
The prerequisite for creating a filter index is:
Ansi_nulls = on, ansi_padding = on, ansi_warnings = on, arithabort = on, concat_null_yields_null = on, quoted_identifier = on
Numeric_roundabort = off

Due to the excessive insert, modify, and delete operations on the table, the index page is divided into multiple parts to form index fragmentation. If the index fragmentation is serious, the time for scanning the index will become longer, the index may even be unavailable, so the data retrieval operation slows down.

What is index fragmentation?

There are two types of index fragmentation: internal fragmentation and external fragmentation.

Internal fragmentation: In order to effectively use the memory and make the memory produce less fragments, the memory should be used in pages and pages. The last page is often not satisfied, therefore, internal fragments are formed.

External fragments: External fragments are formed when the segments are swapped in and out for sharing purposes. For example, after a 5 K segment is swapped out, a 4 K segment is placed in the original 5 k segment, therefore, 1 K external fragments are formed.

How do I know if index fragmentation has occurred?
Select object_name (Dt. object_id) tablename, Si. Name as indexname,
DT. avg_fragmentation_in_percent as externalfragmentation,
DT. avg_page_space_used_in_percent as internalfragmentation
(Select object_id, index_id, avg_fragmentation_in_percent,
From SYS. dm_db_index_physical_stats (db_id (N 'test'), null)
Where index_id <> 0) as dt
Inner join SYS. Indexes Si ON Si. object_id = DT. object_id
And Si. index_id = DT. index_id and DT. avg_fragmentation_in_percent> 10
And DT. avg_page_space_used_in_percent <75
Order by avg_fragmentation_in_percent DESC

The value of externalfragmentation is greater than 10, indicating that the corresponding index has undergone external fragmentation.
The value of internalfragmentation <75 indicates that the corresponding index has internal fragmentation.

How to sort index fragments?

There are two ways to sort index fragments:

1) reorganize the index with fragments: execute the following command

Alter index all on tablename reorganize

2) re-indexing: execute the following command

Alter index all on tablename rebuild with (fillfactor = 100, online = on)

You can also use the index name to replace the "all" keyword here to restructure or recreate a single index, or you can use the SQL server console to sort index fragments. Select the corresponding index, right-click and choose "regenerate" or "reorganize. DBCC indexdefrag and DBCC dbreindex are two commands used to clear tered and nonculstered index fragments. Indexdefrag is an online operation (that is, it does not block other table actions, such as queries), while dbreindex physically reconstructs the index. In most cases, re-indexing can better eliminate fragmentation, but this advantage is to block other actions on the table where the index is currently located at the cost. When a large fragmented index occurs, indexdefrag takes a long time because the command is run based on a small interactive block (transactional
Block ).

When Will restructuring and reconstruction be used?

When the external fragment value of the corresponding index is between 10-15 and the internal fragment value is between 60-75, re-build should be used in other cases.

It is worth noting that during index reconstruction, the table corresponding to the index will be locked, but the table will not be locked for reorganization. Therefore, in the production system, you should be careful when rebuilding the index of a large table, it may take several hours to create an index on a large table. Fortunately, since SQL Server 2005, Microsoft has proposed a solution, set the online option to on to ensure that the table can still be used normally when the index is re-created.

Although indexes can increase the query speed, if your database is a transactional database, most of the time it is an update operation, updating data means updating the index, at this time, both query and update operations should be taken into account, because creating too many indexes in the OLTP database table will reduce the overall database performance.

If your database is transactional, each table cannot have more than five indexes. If your database is Data Warehouse type, it is okay to create 10 indexes for each table on average.

Index selection

The indexes selected are based on the user's query conditions for data. These conditions are embodied in the WHERE clause and join expressions. In general, the idea of creating an index is

(1) The primary key is often used as a condition for the WHERE clause. You should create a clustered index on the primary key column of the table, especially when using it as a connection.
(2) create a clustered index for columns with a large number of duplicate values and frequent range queries and sorting and grouping, or columns that are frequently accessed.
(3) multiple columns are frequently accessed at the same time, and each column contains duplicate values. You can create a composite index to overwrite one or more queries, and use the column with the most frequently referenced queries as the leading column, if possible, key queries may be overwritten.
(4) If you know that all values of the index key are unique, make sure that the index is defined as a unique index.
(5) When creating an index on a table with frequent insert operations, use fillfactor to reduce page splitting and increase concurrency to reduce the occurrence of deadlocks. If you create an index on a read-only table, you can set fillfactor to 100.
(6) When selecting the index key, try to select the columns that use the small data type as the key to make each cable

The index page can contain as many index keys and pointers as possible. In this way, you can minimize the number of index pages that must be traversed by a query. In addition, try to use an integer as the key value, because it can provide faster access than any data type.


With the insertion, deletion, and split of data rows, some index pages may contain only a few pages of data. In addition, when the application executes a large I/O, rebuilding a non-clustered index can reduce sharding and maintain the efficiency of large I/O blocks. Re-indexing is actually re-organizing the B-tree space. Re-indexing is required in the following cases:

(1) Significant changes in data and usage modes.
(2) The order of sorting changes.
(3) A large number of insert operations must be performed or have been completed.
(4) The number of disk reads to use a large I/O query is more than expected.
(5) due to a large amount of data modification, the insufficient use of data pages and index pages leads to overestimation of space usage.
(6) DBCC checks whether the index is faulty.

When the clustered index is rebuilt, all non-clustered indexes in this table will be duplicated.

Update index statistics

When creating an index on a table that contains data, SQL Server creates a distribution data page to store the two statistical information about the index: distribution table and density table. The optimizer uses this page to determine whether the index is useful for a specific query. However, the statistics are not dynamically recalculated. This means that when the table data changes, the statistical information may be outdated, thus affecting the optimizer's pursuit of the most useful goal. Therefore, run the update statistics command in the following cases:

(1) Data row insertion and deletion modify the data distribution.
(2) add data rows to tables that use truncate table to delete data.
(3) modify the index column value.

Use the primary key generated by the system as the clustered index or the column that needs to be used as the frequently used query condition as the clustered index Column

Do not set the key value (index key) for small tables)

When you use an aggregate index for a period of time, the search time decreases proportionally according to the percentage of data in the entire data table, regardless of the number of times the Aggregate Index uses

Using an aggregate index is faster than using an ordinary primary key for order by, especially when the data volume is small.

Using an aggregate index is faster than using a primary key that is not an aggregate index

Add all fields that need to increase the query speed to the clustered index to increase the query speed.

Create an aggregate index on a date column.

It is best not to create more than five indexes for each table.

Index the columns specified in the order by or group by clause.

Do not create indexes for frequently changed columns, because this increases the workload of maintaining sqlserver and the disk space.

Clustering indexes or non-clustered indexes should be created for columns that are frequently sorted by group, foreign key columns, primary key columns, columns involved in join operations, and data columns that return within a certain range.

When the inserted data is more than 10% of the number of records in the data table, you must first Delete the index of the table to improve data insertion efficiency. After the data is inserted, an index is created.

Avoid using functions or calculations on index columns. In the WHERE clause, if the index is a part of the function, the optimizer will not use the index and use full table scanning. For example:
Inefficient: Select * from Dept where Sal * 12> 2500;
Efficient: Select * from Dept where SAL> 2500/12;

Avoid using not and "! = ", The index can only tell what exists in the table, but cannot tell what does not exist in the table, when the database encounters not and"! = ", The full table scan will stop using the index.

Index column> =>
Inefficient: Select * From T_A where deptno> 3
Efficient: Select * From T_A where deptno> = 4
The difference between the two lies in that the former DBMS will directly jump to the first record with deptno equal to 4, while the latter will first locate the record with deptno equal to 3 and scan forward to the first record with deptno greater than 3.

If no index is used: not in, not between, and like (except for the first character, such as name like 'zhao % '), <>, is null/is not, null, query field addition Function

Columns with a large number of duplicate values and frequent range queries (between, >,<,>=, <=), order by, and group by are considered to create a clustered index;

Multiple columns are frequently accessed at the same time, and each column contains duplicate values. You can consider creating a composite index;

The index does not contain one or more columns to be sorted;
The order of columns in the group by or order by clause is different from that of the index;

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.