Analysis of the locking problem of SQL Server to prevent the range lock of Phantom read under Serializable isolation level

Last Update:2017-09-11 Source: Internet

Author: User

Tags serialization

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Original text: A brief analysis of the locking problem of SQL Server to prevent the range lock of Phantom read under Serializable isolation level

The source of this article: http://www.cnblogs.com/wy123/p/7501261.html
(It is not the original works right to retain the source, I my book still far to reach, just to link to the original text, because the following may exist some errors to amend or supplement, without him)

In the process of dealing with concurrent things, the database has different lock performance under different isolation levels, and under the non serializable isolation level, there are dirty reads, non-repeatable reads, lost updates, Phantom reads and so on.
This article does not discuss dirty reads and non-repeatable reads and missing updates, only the Phantom reads, which are in one thing, the same condition, where there are two times the number of rows of data read inconsistent.
The highest isolation level also allows the serialization of isolation levels to eliminate phantom reads, and the elimination of Phantom reads through a range lock (also known as a range lock) to isolate things.
So, how does a range lock produce? When a range lock is generated, how is the locked range determined? What is the difference between range lock ranges produced by different indexes?
This article will carry on a superficial analysis and inference.

A lot of information has been consulted, and a very clear answer has not been obtained, because:

1, do not specify the range of the lock scope, observe the lock when you see the range lock generated after the end, and did not analyze the range lock generated, the specific extent of the lock, lock the existing value is no problem, whether to lock the value of not exist?
1, non-unique index with a unique index in the case of the resulting range lock, the locked range package does not include the threshold value?
2, for the query table does not exist in the key value (two, one is between the maximum and the smallest key in the table, one is located outside the maximum or minimum key value), what is the scope of the lock?

The test found an interesting problem, for a unique index, when the target is a table in the existence of a key value, the surface is generated by a key lock, really just locked the current key (data row)?

Explain the principles of this test:
1, the tests are tested at Serializable isolation levels (SET TRANSACTION isolation level serializable).
2, the principle of testing is that Session1 in the use of exclusive locking lock, the use of shared lock and exclusive lock is incompatible with the characteristics of the Session2 in the use of shared locks to constantly detect the range of locks generated in the Session1.
3, the test database is SQL Server 2014

1, test environment construction

　　1.1 Creating a new test table and writing data

Create TableTestlock (Idint, Namevarchar( -))Create Clustered Indexidx_id ontestlock (ID)Insert  intoTestlockValues(Ten,'AAA')Insert  intoTestlockValues( -,'BBB')Insert  intoTestlockValues( -,'CCC')Insert  intoTestlockValues( +,'DDD')Insert  intoTestlockValues( -,'Eee')

1.2 Data row storage location analysis in a test table

Query the page information of a test table through a system command or table

--View data page informationDBCCInd'Test','Testlock',-1)--or query the system DMVSelect *  fromSys.dm_db_database_page_allocations (db_id('Test'),object_id('Testlock'),NULL,NULL,'detailed')

Table Testlock's data page is 147

1.3 Querying the Keyhashvalue of data rows on page 147th (which can be considered a unique identifier for a data row)

DBCC TRACEON (3604)DBCC PAGE (Test,1,147,3)

Here we find the corresponding keyhashvalue of the data row as shown in

The data row ID and its corresponding keyhashvalue are as follows

10:241332e1ddb0
20:69c872e07e60
30:0bdec3f2b948
40:199f61d4d268
50:0878442f3a75

Initial analysis of the locking range when the 2,range lock is generated

Scene analysis for 2.1 range lock generation

Under Serializable isolation level, test the case where a range lock is generated
As indicated in the comments in the code, the first session executes the following query, and the object is not committed

In the first session, the execution is persisted (not committed or rolled back), another query window is opened, which is the range lock generated by the query in the second session.

It is clear to see that the resource_description that produce two range locks are 0bdec3f2b948 and 199f61d4d268, respectively.
Comparing the data rows analyzed above with the Keyhashvalue, it is shown that the values of the two resource_description are 30 and 40 respectively.
The most important question is here, the range lock resource_description is 0bdec3f2b948 and 199f61d4d268, since is the rangex-x, that is, the range lock, then what is the scope of these two range locks?
Here we give the conclusion that when a range lock of type key is generated,
For example, for the above test case, each range lock corresponds to the following scope (the following table contains critical values, and the thresholds are related to whether the index is unique or not, as explained below)

Taking the above test as an example, there are two rangex-x types of key type locks, respectively, the ID 30 and 40 corresponding to the rangex-x, then the scope of the lock is 20~40,
Since it is a range lock, it has nothing to do with whether the data in the table is interval.
How do you understand the above words?
How to prove the scope of the lock is 20~40, see the following test:

2.2 Query the value of the locked interval, regardless of whether the value already exists in the table, is blocked

The following code is executed at the serialization isolation level in Session2,
The ID value of the query id = 35, although ID = 35 is a nonexistent value, but this interval is locked, as a reasonable query id = 35 of the query will be blocked.
Test as expected, because the interval is locked (exclusive lock), any value of the query interval is blocked, regardless of whether the ID value of the query exists

Continue the test, roll back the query in Session2, query the ID of a lower range,
In the same way, although ID = 25 is a nonexistent value, the interval is locked, and the query id = 25 is blocked by reason.
Also, as expected, because the interval is locked (exclusive lock), any value of the query interval is blocked, regardless of whether the ID value of the query exists

2.3 Query the value of a non-locking interval, regardless of whether the value already exists in the table, is not blocked

It said that the scope of the lock is 20~40, then query a non-interval ID, is not locked.
To continue the test, rollback Session2 query, query an ID = 50 value, within the non-locking range (that is, the ID of this interval is not 20~40), can be queried normally, is also expected.

Continue to roll back the query in Session2, query for an ID value less than 20 and the query succeeds

Continue rolling back the query in Session2, query for an ID value less than 20 and does not exist, where id = 15 is used, and the query succeeds

　　The above test can show that a range lock of a key type, all corresponding to a scope, lock when locked is a range, for the lock range of values, whether or not exist, will be blocked, not only the role of locking the existing data rows.

3, non-unique index case, Range lock lock range analysis

So, how big is the range lock of a key type?
This is also a very interesting question, here is the same conclusion, divided into the following situations:

　　3.1 If the value of the locked target ID exists in the table and is greater than the maximum value in the table, then the locked interval is less than the first maximum value of the lock target, which is greater than the first minimum value of the lock target.

The above tests have shown the range of the lock
such as the target value of the test lock above, in the Session1 to lock the id = 30, the resulting range lock, the range of the lock is the lower value is 20 (less than the maximum value of 30), the upper value is 40 (the minimum value greater than 30)
The text said a bit around, drawing a diagram looks intuitive, as follows
The target of the lock is 30, because a range lock is generated when the lock is 30, and the range lock lock is 20~40

　　3.2 If the value of the locked target ID does not exist with the table, and is greater than the maximum value in the table, less than the minimum value in the table, the locked interval is less than the first maximum value of the lock target, which is greater than the first minimum value of the lock target.

Restart Test, Session1 and Session2 before rolling back test
Execute an ID = 35 query in Session1, this query is executed by adding an exclusive lock, this ID is not present.

Observing the resulting lock in the Session2, a resource_description is found to be a 199f61d4d268 range lock.

The ID of Keyhashvalue for 199f61d4d268 is 40, combined with the above list, 40 the range of locks corresponding to this ID is 30~40

So whether the scope of the lock is not 30~40, can also be used in the Session2 shared lock query Way to detect the locked range in the Session1
Test 1, the value of query id = 31, is locked

Test 2, query the value of id=39, be locked

Test 3, Query id = 29 value, located outside the lock interval, the query succeeds, although this is a nonexistent value, but outside the lock interval, you can query the success.

Test 4, Query id = 50 value, located outside the lock interval, the query succeeds, this is an existing ID value

When the target of a lock does not exist in the table, and the lock target is greater than the minimum ID value already in the table, less than the maximum ID value,
Then the locked interval is less than the first maximum value of the lock target, which is greater than the first minimum value of the lock target.
Similarly, when a range lock is generated, the lock is an interval, regardless of whether the interval has a value or how many values exist.
It's also shown in a graph to make it look more intuitive.

3.3 If the value of the locked target ID does not exist with the table, and is greater than the maximum value in the table, the locked range is a range of maximum values in a table to infinity

Restart Test, Session1 and Session2 before rolling back test
Execute an ID = 60 query in Session1, this query is executed by adding an exclusive lock, this ID does not exist.

Observe the resulting range lock in Session2, this time discovering resource_description is a (FFFFFFFFFFFF), can be considered (FFFFFFFFFFFF) This keyhashvalue is an infinity value

So the problem is, the upper limit of the lock range is an infinity value, so where is the lower bound?
Similarly, a shared lock probe can be used in Session2 to observe the range of Session1 locks
Test 1, query the value of id = 70 in Session2, id = 70 is greater than one of the maximum values in the table, is locked (the maximum lock range is infinite, the same value can be locked)

Test 1, Query id = 50 value in Session2, id = 50 is a maximum value in the table, locked

Test 3, query the value of id = 49 in Session2, id = 49 is less than a maximum value in the table, is not locked, although this value does not exist

Test 4, query the value of id = 40 in Session2, id = 40 is less than one of the maximum values in the table, and the existing value is not locked

When the target of a lock does not exist in the table, and the target is greater than the maximum ID value already in the table, the locked interval is a range from the largest value in the table to the infinity.
It's also shown in a graph to make it look more intuitive.

4, about whether the index is unique to the lock period of the relationship between the threshold value

In the above test process, the given key and its corresponding range lock lock relationship is as follows, the lock range is the threshold (double closed interval), but has not deliberately tested the threshold.

The critical value is not deliberately tested because the threshold is locked, is related to the uniqueness of the index, if the index is not unique, the corresponding range lock is locked at the time of the lock contains a threshold value, if the index is unique, the situation is not the same.
This is explained later in this article.

For a unique index, the following scenarios are divided:

4.1 Unique index case, the lock target is an existing ID value, and the ID value is greater than the minimum ID in the table, less than the maximum ID in the table

When the index is unique, the lock target is an ID value that already exists in a table, so is it a range lock?
Many people think that if the target is a unique index that already exists, there is no concept of "range lock" when there is no range lock, but it is not.
Continue the test, roll back the Session1,session2, delete the non-unique index created at the beginning of the table, and create a unique clustered index on the ID.

Test the index page of the observation data, changed (rebuilt the clustered index, the data page has changed, think about why?). ）

The corresponding relationship between the keyhashvalue of the data and the data row in the same way is as follows

10:d08358b1108f
20:286fc18d83ea
30:8034b699f2c9
40:d8b6f3f4a521
50:f84b73ce9e8d

　Similarly, query an existing ID value in Session1 as the target of the lock

Observing the resulting lock in Session2, the locked line is obviously a data row of id = 30, but it is an X lock, not a range lock (rangex-x).

So at this point, will it just lock the current line?

Test 1, in the Session2 query a value less than the output target (but greater than 20, because 20 is less than the maximum value of the lock target already exists), the discovery is still locked,

Test 2, then measure the value of an ID =29, the same is locked

Here, take a look at Session2 (sess_id = 55) blocked by Session1 (session_id = 54)
What's the lock for lck_m_rs_s,lck_m_rs_s here, Wait_type? lck_m_rs_s: Wait for a shared lock on the current key value and a shared range lock between the current key and the previous key
is still a "shared range lock between the current and previous keys" Ah, is still a range of locks ah, so that the lock already exists with a unique index in the table, although there is no transition out of range lock (sys.dm_tran_locks), but is still essentially a range lock.

Test 3, test for a less than locked target, and exist with the maximum value in the table (that is, 20), the discovery is not locked (this is the unique index and non-unique index on the threshold of the lock difference, if the non-unique index, the 20 threshold will be locked)

Test 4, test a value greater than the lock interval, that is, the following id = 31, the query is successful, even if the id= 31 does not exist.

It can be found that, in the case of a unique index, if the value of the locked target ID exists with the table and is greater than the maximum value in the table, less than the minimum value in the table, then the locked interval is the current value to the first maximum value less than the lock target

4.2 Unique index scenario, the lock target is an ID value that does not exist, and the ID value is greater than the minimum ID in the table, less than the maximum ID in the table

　　This is not the case, the conclusion is like a non-unique index, such as the lock target ID = 35, the scope of the lock is (30,40], that is, the Zokai (interval) right-closed (interval)

4.3 Unique index scenario, the lock target is an ID value that does not exist, and the ID value is greater than the maximum ID in the table

This is not the case, the conclusion is like a non-unique index, such as the lock target ID = 60, the scope of the lock is (50,+∞), that is, Zokai (interval)

5, the condition that the query condition is an interval value

Because you know the interval of the lock for a single value query, the case of a range query is simply to decompose the query scope, break out the scope of a single value lock, and then merge the interval to get the union of an interval.
Interested can self-test.

6, on the condition that the query condition is a nonclustered index

　　All of the above are tested with a clustered index as the query condition, and if the nonclustered index is the same, it is just a non-clustered index-level lock that has time to test again.

Summarize:

The serialization isolation level prevents the generation of Phantom reads, and the generation of Phantom reads is achieved through a range lock lock.
The main thing about a range lock is to lock a range, not just the data already in the table, but an interval, regardless of whether or not there is data within the range,
Any session attempting to manipulate data locked by another session range lock, regardless of whether it exists in the table, will be blocked, knowing that the session that generated the range lock is committed.

At this point it is not difficult to understand, for the most classic problem: concurrency, there is an update, there is no insertion, you can use Xlock+hold (equivalent to serializable isolation level) behind the implementation of the principle.

The above is purely personal testing and simple inference, inevitably there is the wrong place, if interested, welcome to discuss, thank you.

At last
In fact, the landlord is to see the gap lock MySQL, Next-key lock back to look at the range lock in SQL Server,
Finally found that, in addition to some details, the implementation of the lock is the same on the routine, such as the treatment of Phantom reading, can be described in the "Tao" level is a principle.
A range of lock, called the Gap Lock, Next-key lock, the different forms of expression is only the "technique" on the problem.

Too tired, the eyes of the neck can not be bird.

reference materials, a variety of books, various online search.

Analysis of the locking problem of SQL Server to prevent the range lock of Phantom read under Serializable isolation level

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More