Transactions and transaction isolation levels in SQL Server: understanding the process and causes of dirty reads, non-repeatable reads, and phantom reads


I originally intended to write about transaction control in SSIS packages, but found that many of the basic concepts depend on transactions and transaction isolation levels in SQL Server. So I took some time to review the concept of a transaction in a SQL Server database, the ACID principles, the common problems in transactions, the causes of those problems, and the transaction isolation levels.

To tell the truth, transactions in SQL Server are a very broad topic with a lot of content; if I do not control the scope carefully it will go beyond my current knowledge, and it is not something a few words can explain clearly. I therefore hope readers will point out the shortcomings of this summary; for me, every correction is an opportunity to improve and to deepen our understanding of the subject.

The knowledge points covered in this article are:

    • The concept of a transaction in a SQL Server database
    • The ACID principles (with extra material on atomicity, covering explicit transactions and the XACT_ABORT mechanism for guaranteeing the atomicity of transactions)
    • Common problems in transactions and their causes: dirty reads, non-repeatable reads, phantom reads, etc.
    • The transaction isolation levels in SQL Server and how they avoid dirty reads, non-repeatable reads, and phantom reads (each issue is demonstrated in code and explained in chronological order)

The concept of transactions in a SQL Server database

A transaction is the basic unit of concurrency control in a database. One statement or a set of statements either all succeed, and the relevant data in the database is successfully modified; or all fail, and the data in the database is restored to the state it was in before the statements executed.

Take booking train tickets online as an example. Either the booking succeeds and the number of remaining tickets decreases by one, or the booking fails (or is cancelled) and the number of remaining tickets stays the same. What must never happen is that the booking succeeds but the remaining count does not decrease, or the booking is cancelled but the count still shows one ticket missing. This requirement forces the two different operations, purchasing the ticket and decrementing the remaining count, to be put together into one complete logical chain, which constitutes a transaction.

The ACID principle of transactions in a database

Atomicity: the atomicity of a transaction means that the statement or statements it contains form one complete logical unit that cannot be divided any further. This logical unit either commits entirely or fails entirely.

Consistency: can be understood as data integrity. Committing a transaction must guarantee that the operations on the database do not break the integrity of the data; for example, an insert or update must not violate the table's constraints. If the integrity of the data would be compromised, SQL Server rolls back the transaction to ensure that the data in the database remains consistent.

Isolation: related to transaction isolation levels and locks in the database. Multiple users can access the same data concurrently without disrupting its correctness and integrity, but the modifications made by one concurrent transaction must be isolated from those of the other concurrent transactions. Under different isolation levels, the same read operation in a transaction may return different results.

Durability: once a transaction's operations on the data are completed and committed, the modifications are permanent; even if the service restarts, the data will not change. Conversely, if the system crashes or restarts while a transaction is still executing, the transaction's operations are rolled back, i.e. the data returns to the state it was in before the transaction started.

My understanding is that in an extreme case such as a power outage or a system crash, the database records the transaction's ID and the data it has changed in the transaction log before the transaction commits. After power is restored and the database restarts, any unfinished transactions are undone and rolled back. If the outage occurs after a transaction has committed, the changed result may not yet have been written to disk, but the lost modification can be recovered automatically by replaying the transaction log and writing it to disk, which completes the durability guarantee.

Further understanding of atomicity

Atomicity deserves some extra discussion here, because the concept described above refers to the atomicity of transactions. A single SQL statement and a group of SQL statements are handled somewhat differently with respect to atomicity, and these differences are demonstrated below.

First run the following code to create a very simple test table that simulates an account ID and the account's balance.

USE BIWORK_SSIS
GO
IF OBJECT_ID('dbo.Account') IS NOT NULL
    DROP TABLE dbo.Account
GO
CREATE TABLE dbo.Account
(
    ID INT PRIMARY KEY,
    AccountBalance MONEY CHECK (AccountBalance >= 0)
)
GO

Atomicity of a single SQL statement

Insert a test statement, and then query the results.
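The insert referred to here appears to have been lost from the page; a minimal sketch, assuming an account ID of 1001 with a balance of 1000 (the values used by the later examples):

```sql
-- Autocommit mode: each statement below runs as its own transaction
INSERT INTO dbo.Account VALUES (1001, 1000)
SELECT * FROM dbo.Account
```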

This refers to autocommit transactions: by default every T-SQL statement runs as its own transaction, which commits automatically on success and rolls back automatically on failure. Besides autocommit transactions, SQL Server also supports explicit transactions and implicit transactions; their differences are not discussed further in this article.

In the two autocommit transactions above, each transaction contains only one SQL statement that cannot be divided further: it either succeeds or fails as a whole.

Even a single SQL statement that inserts multiple rows is atomic. If one of the rows satisfies the CHECK constraint but another violates it, the whole INSERT statement fails, so no rows at all are inserted into the table.
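The multi-row case can be sketched like this (the IDs 1002 and 1003 are assumed for illustration):

```sql
-- One INSERT statement, two rows: the second row violates the
-- CHECK (AccountBalance >= 0) constraint, so the whole statement
-- fails and neither row is inserted
INSERT INTO dbo.Account VALUES (1002, 300), (1003, -100)
SELECT * FROM dbo.Account   -- neither 1002 nor 1003 appears
```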

Atomicity of a whole formed by multiple SQL statements

Suppose the following two INSERT statements are meant to form one logical unit with atomic behavior, i.e. a whole that must become one transaction. How should this be done?

INSERT INTO dbo.Account VALUES (1004, -1)
INSERT INTO dbo.Account VALUES (1005, 500)

Obviously, if they are executed directly, the 1004 insert fails while the 1005 insert succeeds, because they are two separate transactions. SQL Server provides two ways to make a logical block containing multiple SQL statements atomic.

Method one - use an explicit transaction to combine multiple SQL statements into a whole and achieve the atomicity of a transaction

The first is the very common explicit transaction: use the BEGIN TRANSACTION, COMMIT TRANSACTION, and ROLLBACK TRANSACTION commands to make a set of SQL statements into one complete transaction that either commits successfully as a whole or fails as a whole.

-- Start a transaction
BEGIN TRANSACTION

BEGIN TRY
    -- This insert violates the CHECK constraint and fails
    INSERT INTO dbo.Account VALUES (1004, -1)
    -- This insert would succeed, but the transaction has not actually committed yet
    INSERT INTO dbo.Account VALUES (1005, 500)
END TRY
BEGIN CATCH
    -- An error occurred: roll the transaction back
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
END CATCH;

-- If the CATCH block was not entered, commit the transaction
IF @@TRANCOUNT > 0
    COMMIT TRANSACTION;
GO

The end result, of course, is that the transaction is rolled back and no data is inserted into the table: all the statements fail together, which guarantees the atomicity of the transaction.

Method two - ensure the atomicity of the transaction by setting XACT_ABORT to ON

Let's first look at the default setting, XACT_ABORT OFF.

-- SET XACT_ABORT OFF is the SQL Server default setting
SET XACT_ABORT OFF
BEGIN TRANSACTION
    -- This insert violates the CHECK constraint and fails
    INSERT INTO dbo.Account VALUES (1004, -1)
    -- This insert succeeds
    INSERT INTO dbo.Account VALUES (1005, 500)
COMMIT TRANSACTION

When XACT_ABORT is OFF, which is the SQL Server default, SQL Server typically rolls back only the statement that failed in the transaction above. That means only the 1004 insert is rolled back, while the 1005 insert succeeds. This clearly violates the atomicity of the transaction, because we did not write an explicit ROLLBACK TRANSACTION.

OK! Now set XACT_ABORT to ON. This tells the transactions that follow: if an error is encountered, terminate the transaction and roll back immediately. The atomicity of the transaction is then guaranteed even without an explicit ROLLBACK TRANSACTION.
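The code for this demonstration seems to be missing from the page; a minimal sketch of three batches under XACT_ABORT ON, with the IDs 1006-1009 assumed for illustration:

```sql
SET XACT_ABORT ON
GO
-- Transaction 1: the first insert violates the CHECK constraint,
-- so the whole transaction is terminated and rolled back
BEGIN TRANSACTION
    INSERT INTO dbo.Account VALUES (1004, -1)
    INSERT INTO dbo.Account VALUES (1005, 500)
COMMIT TRANSACTION
GO
-- Transaction 2: both inserts are valid, so it commits successfully
BEGIN TRANSACTION
    INSERT INTO dbo.Account VALUES (1006, 100)
    INSERT INTO dbo.Account VALUES (1007, 200)
COMMIT TRANSACTION
GO
-- Transaction 3: terminated and rolled back like transaction 1
BEGIN TRANSACTION
    INSERT INTO dbo.Account VALUES (1008, -1)
    INSERT INTO dbo.Account VALUES (1009, 300)
COMMIT TRANSACTION
GO
```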

In the example above, only transaction 2 commits successfully; transactions 1 and 3 are rolled back and their insert operations fail.

Note that a GO keyword is added after each transaction above. Without GO, executing these SQL statements together would cause transactions 2 and 3 to fail as soon as the execution of transaction 1 fails. The GO keyword ends a batch, indicating that the preceding group of SQL statements is to be processed together.

The GO keyword is quite interesting: you can put a number after GO to execute the preceding statement or batch that many times.
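For example, a small sketch of the GO n form:

```sql
-- GO 3 executes the preceding batch three times
SELECT COUNT(*) FROM dbo.Account
GO 3
```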

From the examples above, you should now be able to understand the relationship between atomicity and transactions, and how to implement the atomicity of a transaction.

Issues that are common in transactions

After understanding the ACID principles of transactions, let's look at some of the problems you might encounter when transactions are used by multiple concurrent users in SQL Server:

Dirty Reads: one transaction is modifying data in the database but has not yet committed, while another transaction reads that modified-but-uncommitted data. The first transaction's operations may still all be rolled back, for example because its modification violates a constraint on the table and breaks integrity; in that case the second transaction has read incorrect data, causing its own operations to go wrong.

Non-repeatable Reads: transaction A reads the same data twice, but before A's second read, transaction B updates that data. For transaction A, the first and second reads of the same data may then be inconsistent.

Phantom Reads: a bit like non-repeatable reads, in that there are also two reads. The difference is that transaction A's first operation reads, say, a whole table, and transaction B then not merely modifies an existing row but inserts a new one. When transaction A reads the whole table a second time, it finds more rows than before, as if it were hallucinating.

Lost Updates: two transactions update the same data at about the same time. If the first transaction's update fails and is rolled back, the second transaction's update can be wiped out along with it, so the data ends up not being updated at all; in effect the updates of both transactions are lost.

Isolation levels of transactions in SQL Server and their relationship with dirty reads, non-repeatable reads, phantom reads, etc. (demonstrated in code and explained in chronological order)

Knowing that these problems can occur with concurrent access to the database, we can move on to the concept of the database isolation level. In general terms, the question is: how much do you want to isolate concurrent transactions from one another, and to what extent? For example, can you tolerate dirty reads, or must dirty reads between concurrent transactions be avoided? By setting the isolation level you can make the isolation between concurrent transactions looser or stricter.

The higher the isolation level, the smaller the chance of reading dirty or incomplete data, but the more severe the performance degradation in a highly concurrent system. The lower the isolation level, the higher the concurrency of the system, but the data read may not be complete or consistent.

In SQL Server 2012, you can set the isolation level of a transaction (listed here from low to high) with this syntax:

SET TRANSACTION ISOLATION LEVEL
    { READ UNCOMMITTED
    | READ COMMITTED
    | REPEATABLE READ
    | SNAPSHOT
    | SERIALIZABLE
    }
[ ; ]

The following code examples demonstrate the behavior of each transaction isolation level. First reset the test data by running:

TRUNCATE TABLE BIWORK_SSIS.dbo.Account
GO
INSERT INTO BIWORK_SSIS.dbo.Account VALUES (1001, 1000)
SELECT * FROM BIWORK_SSIS.dbo.Account
GO


READ UNCOMMITTED (Read Uncommitted)

This is the lowest isolation level; its problem is the dirty read, because a transaction can read data that other transactions have modified but not yet committed. It behaves the same as putting the (NOLOCK) hint on the tables of the SELECT statements inside a transaction.

Open two query windows: the first window represents transaction A and the second represents transaction B. Transaction A keeps the default isolation level, transaction B sets its isolation level to READ UNCOMMITTED, and the changed setting can be verified with DBCC USEROPTIONS.


Test steps:

Execute the SQL code for transaction A first

BEGIN TRANSACTION
UPDATE BIWORK_SSIS.dbo.Account
SET AccountBalance = 500   -- the updated value is arbitrary for this demo
WHERE ID = 1001
WAITFOR DELAY '00:00:10'
ROLLBACK TRANSACTION
SELECT * FROM BIWORK_SSIS.dbo.Account WHERE ID = 1001

Then execute the SQL code for transaction B immediately.

-- The 1st query occurs before transaction A commits or rolls back
SELECT * FROM BIWORK_SSIS.dbo.Account WHERE ID = 1001
WAITFOR DELAY '00:00:10'
-- The 2nd query occurs after transaction A has rolled back
SELECT * FROM BIWORK_SSIS.dbo.Account WHERE ID = 1001

As you can see, transaction B reads the ID = 1001 row twice, and the first read clearly returns dirty data. The timing simulated below is not rigorous, but it helps to understand the cause of the dirty read.

You can also change transaction B's isolation level back to the default READ COMMITTED and run a query with the NOLOCK hint immediately after starting transaction A; the effect is the same as described above. Adding NOLOCK can be thought of as equivalent to the READ UNCOMMITTED isolation level.

SELECT * FROM BIWORK_SSIS.dbo.Account WITH (NOLOCK) WHERE ID = 1001

READ COMMITTED (Read Committed)

This is SQL Server's default setting. READ COMMITTED avoids dirty reads and satisfies most requirements: a statement in a transaction cannot read data that another transaction has modified but not yet committed, but it can read data that other transactions have modified and committed. That means non-repeatable reads and phantom reads are still possible, because the current transaction may read the same resource twice, and between the two reads another transaction may have updated and committed that resource, so the data may differ before and after. This default isolation level therefore solves dirty reads but does not solve non-repeatable reads.

Let's revisit the previous example with the isolation level set to READ COMMITTED: can it avoid dirty reads? Run transaction A first, then run transaction B.

Because READ COMMITTED cannot read data that another transaction has modified but not yet committed, transaction B must wait until transaction A finishes modifying the data and commits or rolls back before it can begin reading. Running transaction A and then transaction B, it is obvious that transaction B spends some time waiting for transaction A to commit or roll back; see their timing diagram.
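The listing for this demonstration appears to be missing; a minimal sketch (the updated value of 500 is assumed):

```sql
-- Window 1, transaction A: modify the row, hold it, then roll back
BEGIN TRANSACTION
UPDATE BIWORK_SSIS.dbo.Account SET AccountBalance = 500 WHERE ID = 1001
WAITFOR DELAY '00:00:10'
ROLLBACK TRANSACTION

-- Window 2, transaction B (default READ COMMITTED): this SELECT blocks
-- until A commits or rolls back, then returns the committed value
SELECT * FROM BIWORK_SSIS.dbo.Account WHERE ID = 1001
```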

From this you can see that the READ COMMITTED isolation level avoids dirty reads, but other problems can still arise; see this example. Execute transaction A first, then execute transaction B immediately.
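The listing for the non-repeatable read demonstration also seems to have been lost; a sketch consistent with the surrounding description (the balances of 1000 and 500 are taken from the text):

```sql
-- Window 1, transaction A (default READ COMMITTED): read the balance twice
BEGIN TRANSACTION
SELECT AccountBalance FROM BIWORK_SSIS.dbo.Account WHERE ID = 1001  -- shows 1000
WAITFOR DELAY '00:00:10'
SELECT AccountBalance FROM BIWORK_SSIS.dbo.Account WHERE ID = 1001  -- shows 500
COMMIT TRANSACTION

-- Window 2, transaction B: update and commit between A's two reads
UPDATE BIWORK_SSIS.dbo.Account SET AccountBalance = 500 WHERE ID = 1001
```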

From the execution results above, it is obvious that within transaction A the two reads of the ID = 1001 row are inconsistent. Suppose this were not a simple query: the first query shows an account balance of 1000 yuan and the next action is to withdraw 1000 yuan, yet the second query finds only 500 yuan left. The reason is that transaction B slipped in between the first query and the withdrawal and modified the balance. This is the non-repeatable read described above; see the timing diagram below.

So the READ COMMITTED isolation level avoids dirty reads, but still runs into the non-repeatable read problem.

REPEATABLE READ (Repeatable Read)

Under REPEATABLE READ, a transaction cannot read rows that another transaction has modified but not committed, and no other transaction may modify the data read by the current transaction until the current transaction completes. However, rows newly inserted by other transactions are still visible: the current transaction will retrieve them the second time it reads the table. This isolation level therefore solves non-repeatable reads, but it does not avoid phantom reads.

Now modify the example above slightly by adding some queries; remember to change the balance of ID = 1001 back to 1000 first. Set transaction A's isolation level to REPEATABLE READ and see how this isolation level behaves.
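A sketch of what the modified example might look like (transaction B's update to 500 is taken from the text):

```sql
-- Window 1, transaction A at REPEATABLE READ
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRANSACTION
SELECT AccountBalance FROM BIWORK_SSIS.dbo.Account WHERE ID = 1001
WAITFOR DELAY '00:00:10'
SELECT AccountBalance FROM BIWORK_SSIS.dbo.Account WHERE ID = 1001  -- same value as the first read
COMMIT TRANSACTION
-- Only after A commits does B's update become visible
SELECT AccountBalance FROM BIWORK_SSIS.dbo.Account WHERE ID = 1001

-- Window 2, transaction B: this update is blocked until A finishes
UPDATE BIWORK_SSIS.dbo.Account SET AccountBalance = 500 WHERE ID = 1001
```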

Although the final query result shows the balance of ID = 1001 as 500, the two reads inside transaction A return consistent balances and do not see transaction B's modification of this value, even though transaction B issued its update between them.

As the sequence diagram above shows, transaction A's first and second reads of the ID = 1001 balance are identical. You can understand this as: while transaction A is querying, transaction B is not allowed to modify the value. Because transaction A never sees the change, it believes transaction B obediently made no update. In reality transaction B has completed the operation, but transaction A's two reads remain consistent because its isolation level is set to REPEATABLE READ.

The example here has transaction B modifying existing data. What if it inserts a new row of records instead?
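A sketch of the phantom read demonstration (the inserted row 1002 with balance 800 is assumed for illustration):

```sql
-- Window 1, transaction A at REPEATABLE READ
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRANSACTION
SELECT * FROM BIWORK_SSIS.dbo.Account
WAITFOR DELAY '00:00:10'
SELECT * FROM BIWORK_SSIS.dbo.Account   -- one more row: a phantom read
COMMIT TRANSACTION

-- Window 2, transaction B: REPEATABLE READ does not block this insert
INSERT INTO BIWORK_SSIS.dbo.Account VALUES (1002, 800)
```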

Transaction A is dizzy again! The two queries actually return different results: the second query sees one more row. This is the phantom read!

SNAPSHOT (Snapshot Isolation)

SNAPSHOT can solve the phantom read problem: the version of the data seen by the current transaction is consistent from the start of the transaction until it commits. Other transactions may modify the data, but those changes are invisible to the current transaction. It is a bit as if the current transaction took a snapshot of the data when it started, so all operations within the same transaction work against that same data version. This isolation level therefore resolves phantom reads. Note, however, that other transactions can still modify the data read by the current transaction before the current transaction completes.

Before using SNAPSHOT, be aware that the database does not allow the SNAPSHOT isolation level by default; setting it directly produces an error similar to this:

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Msg 3952, Level 16, State 1, Line 8

Snapshot isolation transaction failed accessing database 'BIWORK_SSIS' because snapshot isolation is not allowed in this database. Use ALTER DATABASE to allow snapshot isolation.

So use ALTER DATABASE to turn this support on:

ALTER DATABASE BIWORK_SSIS
SET ALLOW_SNAPSHOT_ISOLATION ON

Also clear out the other IDs before starting, leaving only the record with ID = 1001.

DELETE FROM BIWORK_SSIS.dbo.Account
WHERE ID <> 1001

Setting the isolation level to SNAPSHOT solves the phantom read problem: it ensures that the row versions queried within transaction A are consistent, like a snapshot.
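A sketch of the snapshot demonstration (the inserted row 1002 with balance 800 is assumed for illustration):

```sql
-- Window 1, transaction A at SNAPSHOT
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
BEGIN TRANSACTION
SELECT * FROM BIWORK_SSIS.dbo.Account
WAITFOR DELAY '00:00:10'
SELECT * FROM BIWORK_SSIS.dbo.Account   -- same rows: B's insert is invisible to the snapshot
COMMIT TRANSACTION
SELECT * FROM BIWORK_SSIS.dbo.Account   -- after the commit, the new row is visible

-- Window 2, transaction B: neither blocked by A nor visible to A
INSERT INTO BIWORK_SSIS.dbo.Account VALUES (1002, 800)
```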

But have you noticed? Whether transaction A uses REPEATABLE READ or SNAPSHOT, it still cannot ultimately stop transaction B from modifying the shared resource. The modification is simply not seen by transaction A, so the data within transaction A stays consistent, but it has in fact been changed: as soon as transaction A commits, transaction B's change is immediately visible.

Think back to the bank example. If my first query shows 1000 yuan, my second action might be to withdraw 1000 yuan. If another transaction changes the amount between those two actions, then even though both of my reads show 1000 yuan, the withdrawal no longer matches reality. What I actually want is one of two things: I query and then withdraw as one coherent sequence, and only afterwards may another transaction change the amount; or the other transaction makes its change first, say deducting 500 yuan, and then my queries consistently see the new amount. In other words, while transaction A operates on a resource it should hold it exclusively and transaction B should not get in; or if transaction B is operating on the resource, transaction A must wait for it to finish before starting its own transaction. That calls for the strictest isolation level: SERIALIZABLE.

SERIALIZABLE (Serializable)

This is the most stringent isolation level, with the highest isolation and the lowest performance, and it can avoid almost all of the issues mentioned above. A transaction cannot read data that other transactions have modified but not committed; other transactions may not modify data read by the current transaction until it completes; and other transactions may not insert new rows until the current transaction completes. It behaves the same as putting the HOLDLOCK hint on all tables in all SELECT statements within a transaction. It is worth considering when concurrency is relatively low but correctness requirements are high; under high concurrency, this isolation level creates a very severe performance bottleneck.

Adjust transaction A's isolation level to SERIALIZABLE, then execute A followed by B.
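A sketch of the serializable demonstration (the values in transaction B are assumed for illustration):

```sql
-- Window 1, transaction A at SERIALIZABLE
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
SELECT * FROM BIWORK_SSIS.dbo.Account
WAITFOR DELAY '00:00:10'
SELECT * FROM BIWORK_SSIS.dbo.Account
COMMIT TRANSACTION

-- Window 2, transaction B: blocked until transaction A commits,
-- because A holds range locks on the rows it has read
BEGIN TRANSACTION
UPDATE BIWORK_SSIS.dbo.Account SET AccountBalance = 500 WHERE ID = 1001
SELECT * FROM BIWORK_SSIS.dbo.Account WHERE ID = 1001
COMMIT TRANSACTION
```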

As you can see, transaction B effectively makes progress only after transaction A commits: while transaction A is executing, transaction B is blocked because it accesses the same resource. This is not to say that transaction B does not start; rather, it sits in a wait state because transaction A is holding the resource.

Note that in SQL Server you can set only one isolation level option at a time, and the level you set remains in effect for the current connection until you explicitly change it. All read operations in a transaction run under the rules of the specified isolation level, unless a SELECT statement specifies other lock or row-versioning hints on the table.

Note: the sequence diagrams above are only meant to help understand transaction isolation levels. They show only the rough order of execution, which of course depends on the moments at which I started transactions A and B, so they do not precisely reflect the order in which the SQL statements were actually submitted and executed; you can trace that with SQL Profiler.
