SQL query optimization, NOLOCK issues


In many of the projects we have worked on, every SELECT, no matter which table it touches, carries NOLOCK (or WITH (NOLOCK)), as if it had become an institutional rule. The senior developers explain that adding NOLOCK speeds up the query without affecting other concurrent operations on the table.
But does every query really need NOLOCK? Personally, I think whether or not to add it deserves a look at the actual situation (at the very least we should know what it does, and then why). Below is a simple analysis of the effects that adding or omitting NOLOCK has on real queries.
I. Important Concepts
(Pause here for five seconds and quietly recall the relevant concepts from the classic database textbooks.) What, nothing comes to mind? Never mind; the reasoning still has to be spelled out and the books still have to be consulted, so here is a straight transcription.
Concurrent access: multiple users access the same resource at the same time. If one of the concurrent users modifies the resource while another is reading it, the reader can end up seeing data it "should not have seen", which may adversely affect other users. The problems include:
1. Dirty read: one user modifies a resource, a second user reads the modified record, and then the first user abandons the modification, restoring the data to its previous state. The second user has read a value that was never committed; the two inconsistent results constitute a dirty read (a T-SQL sketch follows this list).
2. Phantom read: a user queries the same batch of records twice with the same condition. Between the two queries another user inserts, updates, or deletes rows matching that condition, so records from the first result set are missing from the second, or the second result set contains records the first did not, and the two results are inconsistent.
3. Non-repeatable read: one user operation in the system is a transaction, and that transaction reads the same record twice. If another user modifies the record just after the first read, the second read returns the modified data and the two reads differ. Of course, if we lock the record for the duration of the transaction, this can be avoided.
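To make the dirty-read scenario concrete, here is a minimal T-SQL sketch. The table dbo.Account, its columns, and the values are hypothetical, invented purely for illustration; run the two sessions in separate connections.

-- Session 1: modify a row but do not commit yet.
BEGIN TRANSACTION;
UPDATE dbo.Account SET Balance = Balance - 100 WHERE AccountId = 1;
-- ... the transaction is left open ...

-- Session 2: reads the uncommitted value; this is a dirty read.
SELECT AccountId, Balance
FROM dbo.Account WITH (NOLOCK)
WHERE AccountId = 1;

-- Session 1: abandon the change.
ROLLBACK TRANSACTION;
-- Session 2 has now seen a balance that never existed in a committed state.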


II. How to Eliminate the Adverse Effects of Concurrent Access
As mentioned earlier, concurrent access can cause all of these problems, so how do we solve them? The average programmer's instinctive answer is probably to lock, just as we do when controlling concurrency in multi-threaded programming. And that instinct is not wrong!
In fact, SQL Server offers two concurrency-control mechanisms: locking and row versioning. Microsoft's description of, and solutions for, concurrency control really are detailed and thoughtful; you have to admire how much of the thinking has already been done for us.
Let's analyze database locks first.
A short excerpt from the reference material:

1. Locks: each transaction requests locks of different types on the resources it depends on; these locks prevent other transactions from modifying the resources in ways that would cause errors for the transaction holding the lock. A lock is released when the transaction no longer depends on the locked resource. From the database system's point of view, locks can be divided into shared locks, exclusive locks, and update locks:
(1) Shared (S): for operations that do not change or update data (read-only operations), such as an ordinary SELECT statement.
(2) Update (U): for resources that may be updated. It prevents the common form of deadlock that occurs when several sessions read a resource, lock it, and may later update it.
(3) Exclusive (X): for data-modification operations such as INSERT, UPDATE, or DELETE. It ensures that several updates cannot be made to the same resource at the same time. (A sketch of how to observe these lock modes follows the list.)
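As a hedged illustration, the lock modes granted to a session can be inspected through the sys.dm_tran_locks view; dbo.Account is again a hypothetical table used only for the example.

-- Hold an exclusive (X) lock by updating inside an open transaction.
BEGIN TRANSACTION;
UPDATE dbo.Account SET Balance = 0 WHERE AccountId = 1;

-- List the locks currently granted in this database:
SELECT request_session_id,
       resource_type,   -- KEY, PAGE, OBJECT, ...
       request_mode,    -- S, U, X, IX, ...
       request_status
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID();

ROLLBACK TRANSACTION;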

Of course, something this fundamental cannot be left to a crowd of programmers to set up and control by hand. SQL Server manages the granting and releasing of locks automatically, driven by the transaction isolation level. The lock manager, working with the query parser, analyses the SQL statements to be executed, determines which resources they will access and what will be done to them, and then assigns the appropriate locks in combination with the isolation level that has been set.
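The isolation level is a per-session setting; a minimal sketch of how it is chosen (the levels themselves are described later in this article, and dbo.Account remains a hypothetical table):

-- The default level:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- A stricter level: shared locks taken by reads are held until the transaction ends.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
SELECT Balance FROM dbo.Account WHERE AccountId = 1;  -- S lock kept until COMMIT/ROLLBACK
COMMIT TRANSACTION;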

Next, let's look at row versioning.
2. Row versioning:
Another short excerpt:
(1) Introduction
"Row versioning isolation is a new isolation framework in SQL Server 2005. Using row versioning can significantly reduce blocking under heavy concurrency and, compared with NOLOCK, significantly reduces dirty reads, phantoms, lost updates, and similar problems (READ_COMMITTED_SNAPSHOT). When a transaction running under a row versioning-based isolation level reads data, the read does not acquire a shared lock (S lock) on the data being read and therefore does not block transactions that are modifying that data. In addition, because far fewer locks are acquired, the overhead of locking resources is minimized. Read-committed isolation and snapshot isolation using row versioning can provide statement-level or transaction-level read consistency from the versioned copies of the data."
(2) Principle
"The principle of row versioning in SQL Server 2005 is simple (my note): a timestamp-like column (a row version column) is silently added to every row in each table. When row versioning-based isolation is used, the SQL Server 2005 Database Engine assigns a transaction sequence number (XSN) to each transaction that manipulates data using row versioning. The transaction starts when the BEGIN TRANSACTION statement executes, but the transaction sequence number is assigned only when the first read or write after BEGIN TRANSACTION executes, and it is incremented by 1 on each assignment. As the transaction runs, SQL Server serves the appropriate version of each row based on the row version column. SQL Server maintains logical copies (versions) of all data modifications performed in the database: every time a transaction modifies a row, the Database Engine stores an image of the previously committed row in tempdb. Each version is marked with the transaction sequence number of the transaction that made the change, and the versions of a modified row are linked together in a list. The most recent row value is always stored in the current database and linked to the versions held in the version store in tempdb. (When a large object (LOB) is modified, only the changed fragment is copied to the version store; for short-running transactions, the modified row version may stay in the buffer pool and never be written to the tempdb data files. If a version is needed only briefly, it is simply dropped from the buffer pool without incurring I/O overhead.)"
(3) Advantages
Isolation levels that use row versioning have the following advantages:
A. Read operations retrieve a consistent snapshot of the database;
B. SELECT statements do not lock data during the read (readers do not block writers, and writers do not block readers);
C. SELECT statements can access the last committed row value while another transaction is updating the row, without blocking the application;
D. The number of deadlocks is reduced;
E. The number of locks required by a transaction is reduced, which lowers the system overhead of managing locks;
F. The number of lock escalations is reduced.
(4) Row versioning summary:
When a row versioning-based isolation level is enabled, the database engine maintains a version of each row that is modified. Instead of using locks to protect every read, an application can specify that its transactions use row versions to view the data as it existed at the start of the transaction or query. With row versioning, the likelihood of a read operation blocking other transactions is greatly reduced, which in practice is like adding NOLOCK to every table in a query. The difference is that reads return the last committed data rather than uncommitted changes, and we no longer have to add NOLOCK to every single query; a row versioning setting, configured once, takes care of it for us by default.
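A minimal sketch of that behaviour, assuming READ_COMMITTED_SNAPSHOT has been switched on for the database; the database name MyShop and the table dbo.Account are placeholders invented for the example.

-- Run once by an administrator (needs no other open connections in the database):
ALTER DATABASE MyShop SET READ_COMMITTED_SNAPSHOT ON;

-- Session 1: start a modification and leave it uncommitted.
BEGIN TRANSACTION;
UPDATE dbo.Account SET Balance = 0 WHERE AccountId = 1;

-- Session 2: a plain READ COMMITTED read. Instead of blocking on the X lock,
-- it returns the last committed Balance from the version store in tempdb.
SELECT Balance FROM dbo.Account WHERE AccountId = 1;

-- The row versions backing such reads can be observed (from SQL Server 2005 on):
SELECT TOP (10) transaction_sequence_num, version_sequence_num, database_id
FROM sys.dm_tran_version_store;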
By the way, having mentioned row versioning-based isolation levels, we have to talk about isolation levels themselves. What is there to say? Don't laugh; transcribing again:
<1> Purpose: controls how locks are applied, that is, which locking mechanism is used to solve the problems caused by concurrent processing;
<2> Classification:

A. Read Uncommitted: pessimistic; equivalent to NOLOCK; the lowest isolation level, guaranteeing only that physically corrupted data is not read.
B. Read Committed: pessimistic; the database engine's default; shared locks taken by a read are held only until the read finishes.
C. Repeatable Read: pessimistic; shared locks taken by reads are held longer than under Read Committed, until the transaction ends.
D. Serializable: pessimistic; equivalent to HOLDLOCK; the strictest level.
E. Read Committed Snapshot: optimistic; new in SQL Server 2005; based on row versioning; reads are not blocked by other locks; row versions are kept for a shorter time, so less tempdb space is used; supports distributed transactions.
ALTER DATABASE [database_name] SET READ_COMMITTED_SNAPSHOT ON
F. Snapshot: optimistic; new in SQL Server 2005; based on row versioning; reads are not blocked by other locks; row versions are kept longer, so more tempdb space is used; does not support distributed transactions. (A usage sketch follows this list.)
ALTER DATABASE [database_name] SET ALLOW_SNAPSHOT_ISOLATION ON
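Once ALLOW_SNAPSHOT_ISOLATION is on, each transaction opts in per session. A hedged sketch (dbo.Orders is a placeholder table):

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
-- Every read in this transaction sees the data as it was when the transaction
-- first accessed data, regardless of changes committed by others afterwards.
SELECT COUNT(*) AS OrderCount FROM dbo.Orders;
-- ... further reads, all consistent with the same snapshot ...
COMMIT TRANSACTION;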

<3> Viewing the current isolation mode and row versioning status (SQL Server 2005):

DBCC USEROPTIONS
SELECT name, snapshot_isolation_state, snapshot_isolation_state_desc, is_read_committed_snapshot_on FROM sys.databases

3. Summary
From the analysis above we know that SQL Server 2005 has two effective ways of controlling concurrent access. A statement executed with NOLOCK issues no shared locks and allows dirty reads, which makes it equivalent to the READ UNCOMMITTED transaction isolation level; in that sense NOLOCK really does speed up queries. But now we have to ask ourselves: does NOLOCK need to be added or not? Which queries really need it, and which really do not? Can we be a little more certain about it?
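The equivalence mentioned above can be written either per table or per session; a minimal sketch, with dbo.Orders as a placeholder table:

-- Per-table hint:
SELECT COUNT(*) FROM dbo.Orders WITH (NOLOCK);

-- Session-wide equivalent: every subsequent read behaves like NOLOCK.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM dbo.Orders;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- restore the default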

III. Scenarios Where NOLOCK Is Appropriate (the following points are entirely personal opinion and can be disregarded)
1. "Static" tables: tables whose data rarely changes, such as provinces, cities, counties, airlines, airports, and so on. Their common feature is that, at least for now, the data will not change for a long time. In fact, over the long run even a mature company's department table can be treated as this kind of data, although the relationship between departments and the employee table cannot;
2. Business logic that tolerates dirty reads: not much to say here; the customer's requirements determine whether you have to be "strict" about it. For example, a query of a business unit's quarterly or yearly performance statistics only needs to give a general picture, so for such queries it does not matter how often NOLOCK is used (a small sketch follows below).
3. Tables holding huge volumes of data: without doubt, the larger the data volume, the more important the table tends to be, the more it is accessed, and the more records are affected by concurrent operations ("the tall tree catches the wind"). Adding NOLOCK to queries against such tables can greatly improve performance and the user experience, but the gain comes at the expense of data consistency and safety.
Finally, from the analysis above we can conclude that querying without locks, especially over massive data sets, undoubtedly improves speed, but we should be selective about which tables get NOLOCK. We already know that concurrent operations on a table can distort some query results, the familiar "dirty read" being one example. Some inconsistencies are expected (the user accepts that results may differ, for instance because an order's status changed between two queries) and will be shrugged off. But an unexpected inconsistency caused by dirty data might be dismissed once as the user's eyes playing tricks; if it happens repeatedly, or involves a large amount of data, it will certainly confuse users. The calmer ones will habitually blame the system; the less calm will doubt their own operations, and it can even cause real alarm.
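As an illustration of scenario 2, a report-style query where approximate results are acceptable might look like this; the table dbo.SalesOrder and its columns are hypothetical, invented only for the example.

SELECT DepartmentId, SUM(Amount) AS QuarterSales
FROM dbo.SalesOrder WITH (NOLOCK)
WHERE OrderDate >= '2010-01-01' AND OrderDate < '2010-04-01'
GROUP BY DepartmentId;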

Main syntax examples: the NOLOCK hint follows the table name, with or without the WITH keyword:
SELECT * FROM t_bos220000001 WITH (NOLOCK)
SELECT * FROM t_bos2200000 (NOLOCK)
It can also be written on an update target, e.g. UPDATE t_bos2200000 (NOLOCK) SET a = '' WHERE id = 1

This article was reproduced from: http://youzhangcai.blog.163.com/blog/static/1668481842010111782534757/
