SQL Server insert efficiency (heap table vs clustered index table)
"Which of the following is more efficient for SQL Server insert operations in heap tables or clustered index tables? Why is it high ?"
Some colleagues have asked me this question before. To ensure the record efficiency of the logstore, I did a simple test. I should first emphasize the following concepts:
Heap table:For a table without clustered indexes, the iam page and PFS page are used to determine which page has free space.
Clustered index table:For tables with clustered indexes, records are maintained in the logical order of the key values on the page where the clustered key values are located.
Demo:
Five concurrent threads are performed on the heap table and the clustered table respectively, and each thread inserts 10000 loops each time.
1. Heap Table Test
--1. Create a heap tableCreate TableInsert_test (IDInt Identity, NameChar(200))Go
Heap table insert time: 34.127 seconds
2. Clustered index table test
Create TableInsert_test2 (IDInt Identity Primary Key Clustered, NameChar(200))Go
Insert of clustered index table: 22.885 seconds
Result: The insert speed of the clustered index is more than 10 seconds faster than that of the heap table.(The personal machine configuration is different, the time difference will be high or low, and my notebook performance is low)
Analysis
Heap table insert:
Each insert operation is always inserted into any available space of the table. You can use Iam to find the range in the file that belongs to the target table, find out which pages in these intervals have available space on the PFS page. If there is no available space on the page, search for the available zones of a table to be allocated on the GAM page and SGAM page.
Clustered index:
Since my clustering key is an auto-incremental ID column, every insertion will be concentrated on the last data page.
In general, because the target location of the rows inserted in the heap table is not defined, it is generally less efficient to determine where to place rows in the heap table than to place rows in a table with clustered indexes.
Disadvantages of insert for clustered index tables
According to the above analysis, when the clustered index is an auto-incrementing column, the final data page will become the target page of the centralized insert, so it will become a hot spot. when it passes through, SQL server uses the locks, therefore, it is predicted that a large concurrent insert operation will cause resource blocking on the final page. The actual measurement is true:
(200 concurrent threads, 100 insert operations per Thread)
View the waiting resources during execution
SelectWait_type,Count(*)AsNum_waiting_tasks,Sum(Wait_duration_ms)AsTotal_wait_time_msFromSYS. dm_ OS _waiting_tasksWhereSession_id>50Group ByWait_typeOrder ByWait_type
As predicted, 98 requests are waiting for the locked resources.
It is inferred that if GUID is used as the primary key, the data pages will be scattered during insertion, And the hotspot pages will be tiled. This is indeed effective, but the split page costs will be quite high, split pages is also very damaging.
Let's continue to add a situation. If you need to perform a large number of insert operations for a long time, it would be better to use batch, and the effect would be faster. Change the above script to the following:
Declare @ I Int Set @ I = 1 While @ I <= 10000 Begin If @ I % 5000 = 0 Begin If ( @ Trancount > 0 ) Begin Commit Tran Begin Tran End End Insert Into Insert_test2 Select ' Aaa ' Set @ I = @ I + 1 End If ( @ Trancount > 0 ) Commit Tran
A single execution is reduced from 8 seconds to 3 seconds. If you are interested, you can test it on your own.
To put it simply, the insert operation requires the log pre-writing step. Each separate insert operation requires the LDF file to be written once. This performance is very low, if every 5000 inserts are included in a transaction and then committed,It puts a lot of smallTransactionMerge into a large suitableTransactionTo reduce disk write operations and greatly improve performance.Batch SizeHow big is the best? This depends on your machine and needs to be tested by yourself.
Http://www.cnblogs.com/SQLServer2012/archive/2013/01/30/2882815.html#2610890
"Which of the following is more efficient for SQL Server insert operations in heap tables or clustered index tables? Why is it high ?"
Some colleagues have asked me this question before. To ensure the record efficiency of the logstore, I did a simple test. I should first emphasize the following concepts:
Heap table:For a table without clustered indexes, the iam page and PFS page are used to determine which page has free space.
Clustered index table:For tables with clustered indexes, records are maintained in the logical order of the key values on the page where the clustered key values are located.
Demo:
Five concurrent threads are performed on the heap table and the clustered table respectively, and each thread inserts 10000 loops each time.
1. Heap Table Test
--1. Create a heap tableCreate TableInsert_test (IDInt Identity, NameChar(200))Go
Heap table insert time: 34.127 seconds
2. Clustered index table test
Create TableInsert_test2 (IDInt Identity Primary Key Clustered, NameChar(200))Go
Insert of clustered index table: 22.885 seconds
Result: The insert speed of the clustered index is more than 10 seconds faster than that of the heap table.(The personal machine configuration is different, the time difference will be high or low, and my notebook performance is low)
Analysis
Heap table insert:
Each insert operation is always inserted into any available space of the table. You can use Iam to find the range in the file that belongs to the target table, find out which pages in these intervals have available space on the PFS page. If there is no available space on the page, search for the available zones of a table to be allocated on the GAM page and SGAM page.
Clustered index:
Since my clustering key is an auto-incremental ID column, every insertion will be concentrated on the last data page.
In general, because the target location of the rows inserted in the heap table is not defined, it is generally less efficient to determine where to place rows in the heap table than to place rows in a table with clustered indexes.
Disadvantages of insert for clustered index tables
According to the above analysis, when the clustered index is an auto-incrementing column, the final data page will become the target page of the centralized insert, so it will become a hot spot. when it passes through, SQL server uses the locks, therefore, it is predicted that a large concurrent insert operation will cause resource blocking on the final page. The actual measurement is true:
(200 concurrent threads, 100 insert operations per Thread)
View the waiting resources during execution
SelectWait_type,Count(*)AsNum_waiting_tasks,Sum(Wait_duration_ms)AsTotal_wait_time_msFromSYS. dm_ OS _waiting_tasksWhereSession_id>50Group ByWait_typeOrder ByWait_type
As predicted, 98 requests are waiting for the locked resources.
It is inferred that if GUID is used as the primary key, the data pages will be scattered during insertion, And the hotspot pages will be tiled. This is indeed effective, but the split page costs will be quite high, split pages is also very damaging.
Let's continue to add a situation. If you need to perform a large number of insert operations for a long time, it would be better to use batch, and the effect would be faster. Change the above script to the following:
Declare @ I Int Set @ I = 1 While @ I <= 10000 Begin If @ I % 5000 = 0 Begin If ( @ Trancount > 0 ) Begin Commit Tran Begin Tran End End Insert Into Insert_test2 Select ' Aaa ' Set @ I = @ I + 1 End If ( @ Trancount > 0 ) Commit Tran
A single execution is reduced from 8 seconds to 3 seconds. If you are interested, you can test it on your own.
To put it simply, the insert operation requires the log pre-writing step. Each separate insert operation requires the LDF file to be written once. This performance is very low, if every 5000 inserts are included in a transaction and then committed,It puts a lot of smallTransactionMerge into a large suitableTransactionTo reduce disk write operations and greatly improve performance.Batch SizeHow big is the best? This depends on your machine and needs to be tested by yourself.
Http://www.cnblogs.com/SQLServer2012/archive/2013/01/30/2882815.html#2610890