SQL Server set function COUNT optimization analysis, sqlservercount

Source: Internet
Author: User

SQL Server set function COUNT optimization analysis, sqlservercount

Current version:

Microsoft SQL Server 2008 R2 (RTM)-10.50.1600.1 (Intel X86) Apr 2 2010 15:53:02 Copyright (c) Microsoft Corporation Enterprise Edition on Windows NT 5.2 <X86> (Build 3790: service Pack 2) (Hypervisor)


-- Create test table -- drop table tb_CountTestcreate table tb_CountTest ([uniqueidentifier] [uniqueidentifier] not null, [bigint] [bigint] not null, [tinyint] [tinyint] not null, [int] [int] not null, [int0] [int] null) go -- uniqueidentifier (16 bytes), bigint (8 bytes), int (4 bytes ), smallint (2 bytes), tinyint (1 byte) -- insert 2000 rows of test data into tb_CountTest ([uniqueidentifier], [bigint], [tinyint], [int], [int0]) select NEWID (), number * 3-1, number * 2% 256, number, case when number % 6 = 0 then null else number endfrom (select distinct number from master. dbo. spt_values where number between 1 and 2000) tabgo -- create a clustered index ([uniqueidentifier]) -- drop index partition on tb_CountTestcreate clustered index partition on tb_CountTest ([uniqueidentifier]) go -- create a non-clustered index ([int]) -- drop index ix_tb_CountTest_int on tb_CountTestcreate index ix_tb_CountTest_int on tb_CountTest ([int]) go


-- Run the following statement to view the execution plan. the result is as follows: select count (*) from dbo. tb_counttestselect count (1) from dbo. tb_counttestselect count ([uniqueidentifier]) from dbo. tb_counttestselect count ([bigint]) from dbo. tb_counttestselect count ([tinyint]) from dbo. tb_counttestselect count ([int]) from dbo. tb_counttest

We can see that the statistics are consistent. All of the above query statistics are 2000, all of which are scanned using non-clustered indexes (ix_tb_CountTest_int). The overhead of the statistics in the above 6 methods is the same. The following statistics use clustered index scan (ix_tb_CountTest_uniqueidentifier). The result is 1667 rows and the null value is filtered.

select count([int0]) from dbo.tb_counttest



Two problems:

Q1. Why are non-clustered index scans used?

Q2. why does count ([int0]) use clustered indexes?

 

A1. Why are non-clustered index scans used?

Because less data pages are returned using non-clustered indexes. All indexes are used. You can find the number of data pages returned when querying by index.

For example:

Dbcc traceon (3604,-1)

Dbcc ind (TestDB, tb_counttest,-1)

Dbcc page (TestDB, 1,590, 3) -- clustered index (root node)



Dbcc page (TestDB, 1,959, 3) -- Non-clustered index (root node)


As you can see above, the clustered index looks for 11 pages of data. With the 2-page IAM page, the total number of pages read by I/O is 13.

Instead of clustering index pages, the slave node has six index pages. With two IAM pages added, the total number of I/O read pages is eight.


It is reasonable to say that a non-clustered index includes key columns of the clustered index. However, if an index exists, you only need to access the page at the upper level to find the data, no data page (clustered index) or index page (non-clustered index) is actually accessed to the slave node ). Therefore, the non-clustered index (ix_tb_CountTest_int) is used for statistics using count ([uniqueidentifier. The database engine is automatically optimized.

When we use [set statistics io on] to view, in the first 6 cases of count (*), we read the data page 8, while count ([int0]) read 13 pages.


A2. Why does count ([int0]) use clustered indexes?

Because the column [int0] has a null value (null), when we execute the following, we find that the performance is very bad.

Force use of non-clustered index! (The result is that the null value is excluded)

Select count ([int0]) fromdbo. tb_counttestwith (index (ix_tb_CountTest_int ))


I/O reads 4008 pages, that is, first reads 2000 rows of data from the non-clustered index leaf for index scanning, and then reads the clustered index leaf 2000 traveling row key for search, add 2 IAM pages and 2 Index intermediate node pages read each time. There are 4008 pages in total. And the execution plan is not good. Therefore, count ([int0]) uses a clustered index.


-- Create another index now: -- create a non-clustered index ([tinyint]) -- drop index ix_tb_CountTest_tinyint on tb_CountTestcreate index ix_tb_CountTest_tinyint on tb_CountTest ([tinyint]) go

There are only 5 (index page) + 2 (IAM page) = 7 pages in total, so the performance is better!


This non-clustered index (ix_tb_CountTest_tinyint) scan is used when count statistics are also executed. This time, one page is missing from the index of the INT used above. In fact, the column type length in the index is reduced, so that more data can be stored in a page of data. When querying, fewer data pages are retrieved, and IO is better. The database engine automatically chooses statistics.


Appendix:

The COUNT function is similar to the COUNT_BIG function. The only difference between the two functions is their return values. COUNT always returns the int data type value. COUNT_BIG always returns the bigint data type value.


Summary:

1. Count statistics. As long as the column does not contain null values, the statistical performance is the same. The system selects the Optimal Index by default.

2. If the table has a smaller field for indexing, the statistics will use it for faster statistics.

Therefore, when the data in the count statistical table is slow, in addition to adding nolock to the table, you can also find a field with a small length to create an index.




Reference: http://bbs.csdn.net/topics/390635419 (Personal and author analysis is different)


Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.