SQL Server insert efficiency (heap table vs clustered index table)

Source: Internet
Author: User
SQL Server insert efficiency (heap table vs clustered index table)

"Which of the following is more efficient for SQL Server insert operations in heap tables or clustered index tables? Why is it high ?"

Some colleagues have asked me this question before. To ensure the record efficiency of the logstore, I did a simple test. I should first emphasize the following concepts:

 

Heap table:For a table without clustered indexes, the iam page and PFS page are used to determine which page has free space.

Clustered index table:For tables with clustered indexes, records are maintained in the logical order of the key values on the page where the clustered key values are located.

 

Demo:

Five concurrent threads are performed on the heap table and the clustered table respectively, and each thread inserts 10000 loops each time.

1. Heap Table Test

--1. Create a heap tableCreate TableInsert_test (IDInt Identity, NameChar(200))Go

Heap table insert time: 34.127 seconds

2. Clustered index table test

Create TableInsert_test2 (IDInt Identity Primary Key Clustered, NameChar(200))Go

Insert of clustered index table: 22.885 seconds

Result: The insert speed of the clustered index is more than 10 seconds faster than that of the heap table.(The personal machine configuration is different, the time difference will be high or low, and my notebook performance is low)

 

Analysis

 

Heap table insert:

Each insert operation is always inserted into any available space of the table. You can use Iam to find the range in the file that belongs to the target table, find out which pages in these intervals have available space on the PFS page. If there is no available space on the page, search for the available zones of a table to be allocated on the GAM page and SGAM page.

Clustered index:

Since my clustering key is an auto-incremental ID column, every insertion will be concentrated on the last data page.

In general, because the target location of the rows inserted in the heap table is not defined, it is generally less efficient to determine where to place rows in the heap table than to place rows in a table with clustered indexes.

 

Disadvantages of insert for clustered index tables

According to the above analysis, when the clustered index is an auto-incrementing column, the final data page will become the target page of the centralized insert, so it will become a hot spot. when it passes through, SQL server uses the locks, therefore, it is predicted that a large concurrent insert operation will cause resource blocking on the final page. The actual measurement is true:

(200 concurrent threads, 100 insert operations per Thread)

View the waiting resources during execution

 

SelectWait_type,Count(*)AsNum_waiting_tasks,Sum(Wait_duration_ms)AsTotal_wait_time_msFromSYS. dm_ OS _waiting_tasksWhereSession_id>50Group ByWait_typeOrder ByWait_type

 

As predicted, 98 requests are waiting for the locked resources.

It is inferred that if GUID is used as the primary key, the data pages will be scattered during insertion, And the hotspot pages will be tiled. This is indeed effective, but the split page costs will be quite high, split pages is also very damaging.

 

Let's continue to add a situation. If you need to perform a large number of insert operations for a long time, it would be better to use batch, and the effect would be faster. Change the above script to the following:

 Declare   @ I   Int   Set   @ I   =   1  While    @ I   <=  10000 Begin      If   @ I   %  5000   =   0      Begin            If ( @ Trancount  >  0  )  Begin                Commit   Tran               Begin   Tran            End      End       Insert   Into Insert_test2 Select   '  Aaa  '      Set   @ I    =  @ I   +   1  End If ( @ Trancount  >  0  )  Commit   Tran 

A single execution is reduced from 8 seconds to 3 seconds. If you are interested, you can test it on your own.

To put it simply, the insert operation requires the log pre-writing step. Each separate insert operation requires the LDF file to be written once. This performance is very low, if every 5000 inserts are included in a transaction and then committed,It puts a lot of smallTransactionMerge into a large suitableTransactionTo reduce disk write operations and greatly improve performance.Batch SizeHow big is the best? This depends on your machine and needs to be tested by yourself.

Http://www.cnblogs.com/SQLServer2012/archive/2013/01/30/2882815.html#2610890

"Which of the following is more efficient for SQL Server insert operations in heap tables or clustered index tables? Why is it high ?"

Some colleagues have asked me this question before. To ensure the record efficiency of the logstore, I did a simple test. I should first emphasize the following concepts:

 

Heap table:For a table without clustered indexes, the iam page and PFS page are used to determine which page has free space.

Clustered index table:For tables with clustered indexes, records are maintained in the logical order of the key values on the page where the clustered key values are located.

 

Demo:

Five concurrent threads are performed on the heap table and the clustered table respectively, and each thread inserts 10000 loops each time.

1. Heap Table Test

--1. Create a heap tableCreate TableInsert_test (IDInt Identity, NameChar(200))Go

Heap table insert time: 34.127 seconds

2. Clustered index table test

Create TableInsert_test2 (IDInt Identity Primary Key Clustered, NameChar(200))Go

Insert of clustered index table: 22.885 seconds

Result: The insert speed of the clustered index is more than 10 seconds faster than that of the heap table.(The personal machine configuration is different, the time difference will be high or low, and my notebook performance is low)

 

Analysis

 

Heap table insert:

Each insert operation is always inserted into any available space of the table. You can use Iam to find the range in the file that belongs to the target table, find out which pages in these intervals have available space on the PFS page. If there is no available space on the page, search for the available zones of a table to be allocated on the GAM page and SGAM page.

Clustered index:

Since my clustering key is an auto-incremental ID column, every insertion will be concentrated on the last data page.

In general, because the target location of the rows inserted in the heap table is not defined, it is generally less efficient to determine where to place rows in the heap table than to place rows in a table with clustered indexes.

 

Disadvantages of insert for clustered index tables

According to the above analysis, when the clustered index is an auto-incrementing column, the final data page will become the target page of the centralized insert, so it will become a hot spot. when it passes through, SQL server uses the locks, therefore, it is predicted that a large concurrent insert operation will cause resource blocking on the final page. The actual measurement is true:

(200 concurrent threads, 100 insert operations per Thread)

View the waiting resources during execution

 

SelectWait_type,Count(*)AsNum_waiting_tasks,Sum(Wait_duration_ms)AsTotal_wait_time_msFromSYS. dm_ OS _waiting_tasksWhereSession_id>50Group ByWait_typeOrder ByWait_type

 

As predicted, 98 requests are waiting for the locked resources.

It is inferred that if GUID is used as the primary key, the data pages will be scattered during insertion, And the hotspot pages will be tiled. This is indeed effective, but the split page costs will be quite high, split pages is also very damaging.

 

Let's continue to add a situation. If you need to perform a large number of insert operations for a long time, it would be better to use batch, and the effect would be faster. Change the above script to the following:

 Declare   @ I   Int   Set   @ I   =   1  While    @ I   <=  10000 Begin      If   @ I   %  5000   =   0      Begin            If ( @ Trancount  >  0  )  Begin                Commit   Tran               Begin   Tran            End      End       Insert   Into Insert_test2 Select   '  Aaa  '      Set   @ I    =  @ I   +   1  End If ( @ Trancount  >  0  )  Commit   Tran 

A single execution is reduced from 8 seconds to 3 seconds. If you are interested, you can test it on your own.

To put it simply, the insert operation requires the log pre-writing step. Each separate insert operation requires the LDF file to be written once. This performance is very low, if every 5000 inserts are included in a transaction and then committed,It puts a lot of smallTransactionMerge into a large suitableTransactionTo reduce disk write operations and greatly improve performance.Batch SizeHow big is the best? This depends on your machine and needs to be tested by yourself.

Http://www.cnblogs.com/SQLServer2012/archive/2013/01/30/2882815.html#2610890

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.