SQL Server Performance Tuning 2 indexing (index) establishment

Source: Internet
Author: User
Tags abs filegroup

Objective

An index is one of the most important objects in a relational database and can significantly reduce the consumption of disk I/O and logical reads to improve the lookup performance of a SELECT statement. But it is a double-edged sword, and improper use can affect performance: He needs extra controls to hold the index information, and when the data is updated, it takes some extra overhead to keep the index synchronized.

Image of the index is like a dictionary directory, you want to find a word can be based on its gestures/pinyin first in the directory to find the corresponding page number range, and then found in the range of the word. If you do not have this directory (index), you may need to go through the whole dictionary to find the word you are looking for.

The indexes in SQL Server are stored as b-tree, such as:


Build a clustered index (clustered index) to improve performance

As the data grows, the RDBMS faces a decline in query performance, which is specifically designed to solve this problem. A clustered index is the basis for all indexes, without which the data table is a heap. A clustered index determines the physical storage pattern of the data, so there can be only one clustered index on a single table. Information for all clustered indexes is recorded in the sys.partitions system view of SQL Server (their index_id is 1).

A clustered index can contain multiple fields (columns), and you should typically pick the filter fields that are frequently involved in most query statements, and you should look at the following items when you actually select them:

    • The field should contain a large number of distinct values. Example: Social Security Number
    • By default, the primary key field will automatically establish a clustered index, but this is not required and you can manually modify it to a nonclustered index (non-clustered)
    • Fields are often involved in filtering, i.e.: often in where, JOIN, ORDER by, and GROUP by statements
    • Fields are often involved in comparisons, i.e.: regular participation;, <, >=, <=, between, in operations
    • The shorter the field length, the better.

In addition, it is recommended to implement the following rules for clustered indexes where possible:

    • The contained fields are set to unique (unique) and are not empty (not NULL)
    • The shorter the length of the containing field, the better it is, and the fewer fields it contains.
    • Each table has a clustered index, and the field that is used frequently in WHERE is the field of the clustered index
    • Finesse Avoid establishing a clustered index on a varchar column

Let's do a performance comparison of 10w data (test data generation SQL please refer to appendix):

SELECT orderdate,amount,refno from Orddemo WHERE refno<3

Execution plan prior to index establishment:


CREATE CLUSTERED INDEX idx_refno on Orddemo (REFNO) go--Execute the same query statement again select Orderdate,amount,refno from Orddemo WHERE refno& Lt;3go
To build an indexed execution plan:


By contrast we can find that I/O consumption is reduced from 0.379421 to 0.0571991 and from Table Scan processing to Index Seek.

Set up nonclustered indexes (non-clustered index) to improve performance

As mentioned above, indexes can improve query performance effectively, but because a table can have only one clustered index, and a clustered index usually cannot contain all the necessary columns, SQL Server allows us to create a nonclustered index to achieve this requirement.

"To SQL Server 2005 allows 249 nonclustered indexes to be established. SQL Server 2008 and SQL Server 2012 allow 999 nonclustered Indexes "

Usually when you create a unique key on a field, a nonclustered index is automatically created on that column. Sys.partitions Information about nonclustered indexes (index_id>1) is stored in the system tables.

Before you set up a nonclustered index for a table, confirm the two point: does the table really require a nonclustered index? Does the table have the appropriate fields to establish a nonclustered index?

This is because indexing is not good enough not only to improve performance, but also to spend extra space to store indexes and generate extra I/O operations!

The following rules should be followed when establishing a nonclustered index selection field:

    • The field should contain a large number of distinct values.
    • Fields often participate in equivalent (=) operations
    • Fields are often involved in filtering, that is, often used in JOIN, ORDER by, and GROUP by statements

Proceed to the previous test to see the speed increase of the nonclustered indexes:

SELECT OrderDate from Orddemowhere orderdate= ' 2011-11-28 20:29:00.000 ' GO
Execution plans such as:


Create a nonclustered index and execute the query again:

CREATE nonclustered INDEX Idx_orderdateon Orddemo (OrderDate) goselect OrderDate from Orddemowhere orderdate= ' 2011-11-28 20:29:00.000 ' GO

The result of the comparison is very obvious, the I/O cost after the nonclustered index is established, the CPU costs, the consumption of Operator and so on drops sharply.

In our example, because the OrderDate field is not in the clustered index, the previous query is interpreted as an index scan. When we set up a nonclustered index on OrderDate, the query takes advantage of the index and interprets it as index seek.

As the table becomes more and more data, the space used to store the nonclustered indexes will become larger and progressively affect performance. In this case, a nonclustered index can be built into a separate database file or filegroup (filegroup), reducing the I/O operating pressure on the same file.

Reasonable index coverage to improve performance

Execute the following test SQL

SELECT Orderdate,orderid from Orddemowhere orderdate= ' 2011-11-28 20:29:00.000 ' GO
After observing the execution plan, you will find that the query is parsed as index scan instead of the previous index seek? This is because none of the two indexes we have built contain the OrderId field.

We deleted the non-clustered Index and re-built it.

--Include OrderId in the Create nonclustered INDEX Idx_orderdate_orderidon Orddemo (OrderDate desc,orderid ASC) GO
Execute the query again and execute the plan as


The query was resolved again to index seek without expectation.

Attention:

There is a maximum of 16 fields in an index, and the length of these fields must be less than three bytes.

The following types cannot be indexed as key fields (text, ntext, image, nvarchar (max), varchar (max), varbinary (max))

Adjust the index's included fields (including columns) to improve performance

The concept of an indexed containing field originates from SQL Server 2005,sql Server 2008 and 2012. It allows you to include non-key value (Non-key) fields in a nonclustered index that are not counted in the index (so that we are less likely to promote the upper limit of the indexed fields mentioned above). In addition, the types of these fields can be any type except text, ntext, or image.

Or use the previous test case: OrderId is not a key field because he doesn't filter in the WHERE clause, so it's not appropriate to use it as the key field for the index, and now we're using include to build it as a containing field:

--Delete Index of previous article Drop index Idx_orderdate_orderid on orddemogo--rebuild index create nonclustered index Idx_orderdate_includedon Orddemo (OrderDate DESC) INCLUDE (OrderID) go--query select Orderdate,orderid from Orddemowhere orderdate= ' 2011-11-28 20:29:00.000 ' GO
Execution plans such as:

The performance of this section is almost identical to that of the previous section, but with the Include field index (include column indexes), you are less constrained and more efficient to query with fewer indexes on index key fields.

Summarize the basic principles of distinguishing the key fields from the index and containing the fields:

    • Where, ORDER by, GROUP by, fields used in join-on apply to key fields
    • SELECT, the Used field in the having is applicable to the containing field

Use filtered index (filtered index) to improve performance

Filter cables are caused by SQL Server 2012, which you can think of as a nonclustered index with a WHERE clause. Use it appropriately to reduce the storage size and maintenance consumption of the index while improving query performance.

A regular index is the index of every piece of data in the entire table, and the filtered index only indexes records that satisfy a particular condition, which is defined by a WHERE clause when a filtered index is established.

You might consider using a filtered index like this:

A giant table with years of data, but only for the current year data.

A table that records product categories that contain many categories that are no longer used by expiration.

An order table that contains the Orderstartdate and Orderenddate fields. The orderenddate is updated when the order is completed, and other cases are null. You can create a filter index on the orderenddate so that you can use it when you need to query which orders are not completed.

Some settings need to be made when setting up a filtered index:

    • ARITHABORT = On
    • Concat_null_yields_null = On
    • QUOTED_IDENTIFIER = On
    • Ansi_warnings = On
    • ANSI_NULLS = On
    • Ansi_padding = On
    • Numeric_roundabort = OFF
Consider the example:

SET ansi_nulls ONSET ansi_padding ONSET ansi_warnings ONSET ARITHABORT ONSET concat_null_yields_null ONSET quoted_identif IER ONSET numeric_roundabort offgocreate nonclustered INDEX Idx_orderdate_filteredon Orddemo (OrderDate DESC) INCLUDE ( ORDERID) where OrderDate = ' 2011-11-28 20:29:00.000 ' goselect orderdate,orderid from Orddemo where orderdate= ' 2011-11-28 20:29:00.000 ' GO

I/O consumption is reduced from 0.0078751 in the previous section to 0.003125, and the optimization effect is significant.

Use Columnstore Index (Columnstore index) to improve performance

All we've discussed so far is the row store index (Rowstore index), where SQL Server 2012 starts supporting Columnstore indexes.

The row store index holds the data row in the data page, and the Columnstore index saves the data column in the data page. Suppose we have a table (Tblemployee), including EmpId, FirstName, and LastName three columns. The row store index/Columnstore index is represented in the following storage format:


Obviously when you need to find a filter for a column value, the Columnstore index has fewer data pages to access, which reduces I/o overhead and thus improves execution efficiency. Before you decide to take a Columnstore index, it is recommended that you confirm 3 points:

    • Whether your data table can be set to read-only (read-only)
    • Is your data sheet very large (millions or above)
    • If your database is OLTP, will it allow you to switch (on/off) the Columnstore index

If the answer to the above 3 points is OK, then you can start using the Columnstore index, but you will also be subject to the following restrictions:

    • You cannot include more than 1024 fields
    • The field type can only be as follows:

Int

‰‰big int

‰‰small int

‰‰tiny int

‰‰money

‰‰smallmoney

‰‰bit

‰‰float

‰‰real

‰‰char (N)

‰‰varchar (N)

‰‰nchar (N)

‰‰nvarchar (N)

‰‰date

‰‰datetime

‰‰datetime2

‰‰small datetime

‰‰time

‰‰datetimeoffset (Precision <=2)

‰‰decimal or numeric (precision <=18)


Okay, let's experiment with one of the following storage indexes. Execute the following code, and he will take advantage of our previously established clustered index:

SELECT  refno  , sum (Amount) as Sumamt  , avg (Amount) as Avgamtfrom  orddemowhere  Refno>3group by  Refnoorder  by Refnogo


We then delete the existing row store index and create the Columnstore index:

DROP INDEX idx_refno on orddemocreate nonclustered COLUMNSTORE indexidx_columnstore_refnoon Orddemo (AMOUNT,REFNO)
Execute the same query statement again, executing the plan as follows:



By comparison, we can see a significant decrease in I/O consumption:)

Note : Because the Columnstore index is established, the table is now read-only and if you want to revert to a writable state you must delete this Columnstore index!

Appendix

Generate SQL for test data

--Build Tables CREATE TABLE Orddemo (OrderID int IDENTITY, OrderDate datetime,amount money, refno int) go--Insert 100,000 test data insert into Orddemo (OrderDate, Amount, refno)  SELECT TOP 100000    DATEADD (minute, ABS (a.object_id% 50000), CAST (' 2011-11-04 ' as DATETIME), ABS (a.object_id%), CAST (ABS (A.OBJECT_ID) as VARCHAR) from  sys.all_objects ACROSS J OIN sys.all_objects BGO


Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.