1. Concurrency consistency
Common concurrency consistency problems include lost modifications, non-repeatable reads, dirty reads, and phantom reads (phantom reads are often classified as a kind of non-repeatable read in some materials).
1.1 Lost modification
Next, let's take an example to illustrate the data inconsistency caused by concurrent operations.
Consider the following sequence of activities in an airline booking system:
- Ticket agent A (transaction T1) reads the seat balance A of a flight and finds A = 16.
- Ticket agent B (transaction T2) reads the balance A of the same flight, which is also 16.
- Agent A sells a ticket, computes A - 1 = 15, and writes A back to the database.
- Agent B also sells a ticket, computes A - 1 = 15, and writes A back to the database.
As a result, two tickets were sold, but the balance was reduced by only 1.
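The interleaving above can be sketched in a few lines. This is an illustrative Python sketch (the chapter's samples use C# and SQL Server; the function below is hypothetical), showing why two read-then-write transactions lose one update:

```python
# Hypothetical sketch of the lost-update interleaving described above:
# both transactions read the same balance, then each writes back
# balance - 1, so the second write silently discards the first.

def lost_update_demo(initial_balance):
    read_by_t1 = initial_balance      # T1 reads A = 16
    read_by_t2 = initial_balance      # T2 reads the same A = 16
    balance = read_by_t1 - 1          # T1 writes back 15
    balance = read_by_t2 - 1          # T2 also writes back 15: T1's update is lost
    return balance

print(lost_update_demo(16))           # 15, although two tickets were sold
```

With proper locking, T2's read would have to wait for T1's write, and the final balance would be 14.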
To sum up, two transactions T1 and T2 read and modify the same data, and the result committed by T2 destroys the result committed by T1, so T1's modification is lost. The problems and solutions mentioned in the previous article (data deletion and update, section 2.1.4) are usually aimed at this kind of concurrency problem. However, several problems remain that the above method cannot solve:
1.2 Non-repeatable read
A non-repeatable read means that after transaction T1 reads data, transaction T2 performs an update, so that T1 cannot reproduce its previous read result. Specifically, non-repeatable reads involve three situations:
- After transaction T1 reads a data item, transaction T2 modifies it. When T1 reads the data again, it gets a value different from the previous one. For example, T1 reads B = 100 for a calculation; T2 reads the same item, modifies it, and writes B = 200 back to the database; when T1 re-reads B, the value is already 200, inconsistent with the first read.
- After transaction T1 reads some records from the database based on certain conditions, transaction T2 deletes some of them. When T1 reads with the same conditions again, some records have vanished.
- After transaction T1 reads some records from the database based on certain conditions, transaction T2 inserts new records. When T1 reads again with the same conditions, it finds extra records. (This is also called a phantom read.)
1.3 Reading "dirty" data
Reading "dirty" data means that transaction T1 modifies a value and writes it back to disk; transaction T2 then reads the same value; afterwards T1 is rolled back for some reason, restoring the original value. The data T2 read is now inconsistent with the database: it is "dirty", that is, incorrect.
The main reason for the above three kinds of data inconsistency is that concurrent operations undermine the isolation of transactions. Concurrency control schedules concurrent operations correctly, so that the execution of one user's transaction is not interfered with by other transactions and data inconsistency is avoided.
2. Solutions to concurrency consistency problems
2.1 Locking
Locking is a very important technique for implementing concurrency control. Locking means that before transaction T operates on a data object, such as a table or a record, it sends a request to the system to lock that object. After the lock is granted, transaction T has certain control over the data object; until T releases its lock, other transactions cannot update the object.
There are two basic types of locks: exclusive locks (X locks) and shared locks (S locks).
An exclusive lock is also called a write lock. If transaction T places an X lock on data object A, only T may read and modify A. No other transaction can place any type of lock on A until T releases its lock on A. This ensures that other transactions can neither read nor modify A before T releases the lock.
A shared lock is also called a read lock. If transaction T places an S lock on data object A, other transactions may place only S locks on A, not X locks, until T releases its S lock on A. This ensures that other transactions can read A but cannot modify it before T releases the S lock.
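The compatibility rules implied by these two lock types can be captured in a small table. Here is a minimal Python sketch (illustrative names, not a real DBMS API):

```python
# Lock compatibility matrix for shared (S) and exclusive (X) locks:
# S is compatible only with S; X is compatible with nothing.
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def can_grant(requested, held_by_others):
    """A lock request is granted only if it is compatible with every
    lock other transactions already hold on the object."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)

print(can_grant("S", ["S", "S"]))   # True: many readers may share
print(can_grant("X", ["S"]))        # False: a writer must wait for readers
print(can_grant("S", ["X"]))        # False: readers must wait for the writer
```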
2.2 Locking protocols
When using X and S locks on data objects, certain rules must be agreed upon: for example, when to request an X or S lock, how long to hold it, and when to release it. These rules are called locking protocols. Specifying different rules yields different locking protocols. The following describes the three levels of locking protocols, which solve the inconsistency problems of lost modification, non-repeatable read, and reading "dirty" data to varying degrees, and guarantee correct scheduling of concurrent operations. This section only defines the three levels without further discussion.
- Level-1 locking protocol
Level-1 locking protocol: transaction T must place an X lock on data R before modifying it, and hold the lock until the transaction ends. A transaction ends either normally (COMMIT) or abnormally (ROLLBACK). The level-1 protocol prevents lost modifications and guarantees that transaction T is recoverable. Under the level-1 protocol, a transaction that only reads data without modifying it needs no lock, so the protocol cannot guarantee repeatable reads or prevent reading "dirty" data.
- Level-2 locking protocol
Level-2 locking protocol: the level-1 protocol, plus the rule that transaction T must place an S lock on data R before reading it and may release the S lock after reading. Besides preventing lost modifications, the level-2 protocol also prevents reading "dirty" data.
- Level-3 locking protocol
Level-3 locking protocol: the level-1 protocol, plus the rule that transaction T must place an S lock on data R before reading it and hold it until the transaction ends. Besides preventing lost modifications and reading "dirty" data, the level-3 protocol also prevents non-repeatable reads.
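To make the three levels concrete, here is a toy lock manager in Python (an illustrative model, not a real DBMS: conflicts raise an exception instead of blocking). It shows how level 3's rule of holding the S lock until the transaction ends makes reads repeatable:

```python
# Toy model of the locking protocols: X before write (level 1),
# S before read, held until the transaction ends (level 3).
class Resource:
    def __init__(self, value):
        self.value = value
        self.s_holders = set()   # transactions holding an S lock
        self.x_holder = None     # transaction holding the X lock, if any

    def read(self, tx):
        # Level 3: take an S lock and keep it until end().
        if self.x_holder not in (None, tx):
            raise RuntimeError("blocked: X lock held by another transaction")
        self.s_holders.add(tx)
        return self.value

    def write(self, tx, value):
        # Level 1: an X lock is required before modifying.
        if self.s_holders - {tx} or self.x_holder not in (None, tx):
            raise RuntimeError("blocked: incompatible lock held")
        self.x_holder = tx
        self.value = value

    def end(self, tx):
        # COMMIT or ROLLBACK releases all locks held by tx.
        self.s_holders.discard(tx)
        if self.x_holder == tx:
            self.x_holder = None

r = Resource(100)
print(r.read("T1"))       # 100: T1 reads and keeps its S lock
try:
    r.write("T2", 200)    # T2's X request conflicts with T1's S lock
except RuntimeError as e:
    print(e)
print(r.read("T1"))       # 100 again: the read is repeatable
r.end("T1")
r.write("T2", 200)        # after T1 ends, T2's write succeeds
```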
2.3 Transaction isolation levels
Although database theory provides a sound solution to the concurrency consistency problem, it is clearly difficult for a programmer to control what to lock and when to lock and unlock. The vast majority of databases and development tools therefore provide transaction isolation levels, allowing users to handle concurrency consistency issues more easily. Common transaction isolation levels include ReadUncommitted, ReadCommitted, RepeatableRead, and Serializable. Under different isolation levels, the same database access may return different results. Below we explore the transaction isolation levels through several experiments and see how SQL Server converts them into locks behind the scenes.
2.3.1 ReadUncommitted and ReadCommitted
ReadUncommitted is the lowest isolation level. It allows reading dirty data that other transactions have not yet committed. In addition, non-repeatable reads can still occur at this isolation level.
ReadCommitted is the default level for many databases. At this isolation level there is no problem of reading uncommitted data, but non-repeatable reads (including phantom reads) still cannot be avoided. When your system has strict concurrency-control requirements, this default isolation level may not provide effective data protection, but it is sufficient for most applications.
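The relationship between the four isolation levels and the three anomalies can be summarized as a lookup table (standard ANSI behaviour; True means the anomaly can still occur at that level):

```python
# Which anomalies each ANSI isolation level still permits.
ANOMALIES = {
    "ReadUncommitted": {"dirty_read": True,  "non_repeatable_read": True,  "phantom_read": True},
    "ReadCommitted":   {"dirty_read": False, "non_repeatable_read": True,  "phantom_read": True},
    "RepeatableRead":  {"dirty_read": False, "non_repeatable_read": False, "phantom_read": True},
    "Serializable":    {"dirty_read": False, "non_repeatable_read": False, "phantom_read": False},
}

for level, anomalies in ANOMALIES.items():
    print(level, anomalies)
```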
We use the following experiment for testing:
First, configure the SQL Server 2000 database and add the dbapp database. Create a console application in Visual Studio .NET and add the necessary namespace references:
using System;
using System.Data;
using System.Data.SqlClient;
using System.Configuration;
Then create two database connections that use different transaction isolation levels:
private static SqlConnection conn1;
private static SqlConnection conn2;
private static SqlTransaction tx1;
private static SqlTransaction tx2;

private static void Setup()
{
    conn1 = new SqlConnection(connectionString);
    conn1.Open();
    tx1 = conn1.BeginTransaction(IsolationLevel.ReadUncommitted);

    conn2 = new SqlConnection(connectionString);
    conn2.Open();
    tx2 = conn2.BeginTransaction(IsolationLevel.ReadCommitted);
}
Here, transaction 1 allows reading uncommitted data, while transaction 2 allows reading only committed data.
In the main program, we simulate different operations of two people to generate concurrency consistency problems:
public static void Main()
{
    Setup();
    try
    {
        ReadUnCommittedDataByTransaction1();
        UnCommittedUpdateByTransaction2();
        ReadUnCommittedDataByTransaction1();
        tx2.Rollback();
        Console.WriteLine("\n-- Transaction 2 rollbacked!\n");
        ReadUnCommittedDataByTransaction1();
        tx1.Rollback();
    }
    catch
    {
        ……
    }
}
Step 1: Use the ReadUnCommittedDataByTransaction1 method to read the student record with ID 1 from the database. At this point the record still holds its initial values.
Step 2: Call the UnCommittedUpdateByTransaction2 method, which sends an update command from transaction 2 without committing it yet.
Step 3: Call ReadUnCommittedDataByTransaction1 again to read the data from transaction 1. You will find that the uncommitted update made by transaction 2 is read by transaction 1 (ReadUncommitted).
Step 4: Transaction 2 abandons the commit and rolls back: tx2.Rollback();.
Step 5: Call ReadUnCommittedDataByTransaction1(); once more to read the data. This time it is the rolled-back data.
The program running result is as follows:
-- Read age from database:
Age:20

-- Run an uncommitted command:
UPDATE student SET age=30 WHERE id=1

-- Read age from database:
Age:30

-- Transaction 2 rollbacked!

-- Read age from database:
Age:20
The ReadUnCommittedDataByTransaction1() and UnCommittedUpdateByTransaction2() methods are defined as follows:
private static void UnCommittedUpdateByTransaction2()
{
    string command = "UPDATE student SET age=30 WHERE id=1";
    Console.WriteLine("\n-- Run an uncommitted command:\n{0}\n", command);
    SqlCommand cmd = new SqlCommand(command, conn2);
    cmd.Transaction = tx2;
    cmd.ExecuteNonQuery();
}

private static void ReadUnCommittedDataByTransaction1()
{
    Console.WriteLine("-- Read age from database:");
    SqlCommand cmd = new SqlCommand("SELECT age FROM student WHERE id = 1", conn1);
    cmd.Transaction = tx1;
    try
    {
        int age = (int)cmd.ExecuteScalar();
        Console.WriteLine("Age:{0}", age);
    }
    catch (SqlException e)
    {
        Console.WriteLine(e.Message);
    }
}
This experiment shows that at the ReadUncommitted isolation level the program may read uncommitted data; on the other hand, this level places the fewest locks on database resources.
The complete code of this experiment can be found under "samplecode\Chapter 2\lab 2-6".
Let's run another experiment (this one requires quick actions, or you may not see the expected results). First, change the following line in the Setup() method above:
tx1 = conn1.BeginTransaction(IsolationLevel.ReadUncommitted);
to:
tx1 = conn1.BeginTransaction(IsolationLevel.ReadCommitted);
Run the code again, and you will find that the program stalls at step 3. If you wait patiently, you will eventually see the message "Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding." What does this message mean? Let's look at what happened:
Step 1: Before this experiment, open the SQL Server 2000 Enterprise Manager, and then open the SQL Server Profiler.
Step 2: Run the modified program. When execution pauses halfway, quickly switch to the Enterprise Manager, right-click "Current Activity" under "Management", and select "Refresh" (the whole process should be completed within about 15 seconds, as shown in Figure 2-8). This gives us a snapshot of the database's current processes.
Figure 2-8 use the Enterprise Manager to view the current activity
We find that a process is blocked at this moment: process 52 is blocked, and process 53 is the blocker. In other words, the work of process 53 prevents process 52 from continuing. (The process numbers may differ between runs.)
Step 3: To identify the cause further, we switch to the Profiler window to see what the two processes are doing. As shown in Figure 2-9, the Profiler displays the details of both processes. Process 52 corresponds to our transaction 1 and process 53 to our transaction 2. Transaction 2 executed the update command but has not yet committed; transaction 1 then tries to read that uncommitted data and is blocked. This is why process 52 is blocked.
If transaction 2 committed at this point, process 52 could stop waiting and obtain its result. But our program never commits at this stage, so process 52 would wait indefinitely. Fortunately, the blocked command eventually times out (this produces the error message above, "Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding."); the exception is caught and printed, the program then rolls back transaction 2, and the occupied resources are released so that subsequent reads by process 52 can proceed.
Figure 2-9 the Profiler shows the blocking commands
Step 4: Having understood what happened, we can now discuss the use of shared and exclusive locks in more depth. Return to the Enterprise Manager and inspect the resources occupied by each process. As shown in Figure 2-10, process 53 (transaction 2) placed an exclusive lock (X lock) on the corresponding key before executing the update command; according to the level-1 locking protocol described earlier, that exclusive lock is released only after transaction 2 commits or rolls back. Process 52 (transaction 1) now needs to read the same row; according to the level-2 locking protocol, it must first place a shared lock on that row. But the row already carries transaction 2's exclusive lock, so transaction 1 can only wait for the exclusive lock to be released. Hence the "blocking" we observed earlier.
Figure 2-10 the exclusive lock is applied before the process executes the write operation.
Figure 2-11 The read operation of a process requires a shared lock but is blocked.
When transaction 1 runs at the ReadUncommitted isolation level, its reads take no locks, so the exclusive lock does not block them: the process is not blocked, but the data it reads is indeed "dirty".
2.3.2 RepeatableRead
RepeatableRead (repeatable read) offers a higher isolation level than ReadCommitted. It guarantees that repeated reads within a transaction return unchanged data, but it still cannot solve the phantom read problem. To better understand what RepeatableRead does solve, we again confirm it with an experiment:
Step 1: Set both transaction 1 and transaction 2 to ReadCommitted and start both transactions.
private static void Setup()
{
    conn1 = new SqlConnection(connectionString);
    conn1.Open();
    tx1 = conn1.BeginTransaction(IsolationLevel.ReadCommitted);

    conn2 = new SqlConnection(connectionString);
    conn2.Open();
    tx2 = conn2.BeginTransaction(IsolationLevel.ReadCommitted);
}
Step 2: Transaction 1 reads data from the database. Note that transaction 1 is not ended by a commit or rollback at this point; it remains active.
private static int ReadAgeByTransaction1()
{
    return (int)ExecuteScalar("SELECT age FROM student WHERE (id = 1)");
}

private static object ExecuteScalar(string command)
{
    Console.WriteLine("-- Execute command: {0}", command);
    SqlCommand cmd = new SqlCommand(command, conn1);
    cmd.Transaction = tx1;
    return cmd.ExecuteScalar();
}
Step 3: Transaction 2 modifies the age data and commits the change.
private static void ModifyAgeByTransaction2()
{
    string command = "UPDATE student SET age=30 WHERE id=1";
    Console.WriteLine("-- Modify age by transaction2, command:{0}", command);
    SqlCommand cmd = new SqlCommand(command, conn2);
    cmd.Transaction = tx2;
    try
    {
        cmd.ExecuteNonQuery();
        tx2.Commit();
    }
    catch (Exception e)
    {
        Console.WriteLine(e.Message);
        tx2.Rollback();
    }
}
Step 4: Read the age data again in transaction 1. The value read has been modified and differs from the previous read: as the name says, the read is not repeatable. The main program is as follows:
public static void Main()
{
    Setup();
    try
    {
        int age1 = ReadAgeByTransaction1();
        ModifyAgeByTransaction2();
        int age2 = ReadAgeByTransaction1();
        Console.WriteLine("\nFirst Read: age={0}\nSecond Read: age={1}", age1, age2);
    }
    catch (Exception e)
    {
        Console.WriteLine("Got an error! " + e.Message);
    }
    finally
    {
        CleanUp();
    }
}
The program running result is as follows:
-- Execute command: SELECT age FROM student WHERE (id = 1)
-- Modify age by transaction2, command:UPDATE student SET age=30 WHERE id=1
-- Execute command: SELECT age FROM student WHERE (id = 1)

First Read: age=20
Second Read: age=30
The repeated read returns different data from the first read because transaction 1 uses the ReadCommitted isolation level, which cannot prevent non-repeatable reads. To read identical data twice within one transaction, you must use the RepeatableRead isolation level.
Let's modify the Setup() method above and set the isolation level of transaction 1 to RepeatableRead:
tx1 = conn1.BeginTransaction(IsolationLevel.RepeatableRead);
Run the program again: it pauses after the second step. After a while you will see the error message "Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.", and this time the repeated read returns exactly the same data as the first read. The program output is as follows:
-- Execute command: SELECT age FROM student WHERE (id = 1)
-- Modify age by transaction2, command:UPDATE student SET age=30 WHERE id=1
Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
-- Execute command: SELECT age FROM student WHERE (id = 1)

First Read: age=20
Second Read: age=20
To find the cause, run the program again as before. When it pauses, quickly switch to the Enterprise Manager to view the snapshot of current activity and inspect the locks held by the blocked process. You will see the content shown in Figure 2-12 and Figure 2-13:
Figure 2-12 under RepeatableRead, the S lock applied on read is held until the transaction ends
Figure 2-13 the X lock requested for the modification is blocked
According to the level-3 locking protocol, a transaction must place an S lock on data before reading it and hold the lock until the transaction ends. So when transaction 1 first reads the data, it places a shared lock on it; after the first read the transaction has not yet ended, so the shared lock is not released. Transaction 2 then tries to modify the data; by the locking protocol it must obtain an exclusive lock before writing, but the shared lock on the data has not been released, so transaction 2 has to wait. When transaction 2's wait times out, SQL Server forcibly rolls it back. Although transaction 2 fails, transaction 1 preserves isolation at the repeatable-read level.
The RepeatableRead isolation level makes repeated reads within a transaction reliable, but it does not avoid phantom reads. If your program has potential phantom read problems, you must use the highest isolation level: Serializable.
2.3.3 Serializable
Serializable is the highest transaction isolation level. At this level, dirty reads, non-repeatable reads, and phantom reads cannot occur. Before explaining why in detail, let's first look at what a phantom read is.
A phantom read occurs when transaction 1 reads some records from the database based on certain conditions, transaction 2 inserts new records that satisfy transaction 1's search conditions, and transaction 1, reading again with the same conditions, finds extra records. Let's reproduce the phantom read problem with the following case:
Step 1: Set both transaction 1 and transaction 2 to the RepeatableRead isolation level and start both transactions.
private static void Setup()
{
    conn1 = new SqlConnection(connectionString);
    conn1.Open();
    tx1 = conn1.BeginTransaction(IsolationLevel.RepeatableRead);

    conn2 = new SqlConnection(connectionString);
    conn2.Open();
    tx2 = conn2.BeginTransaction(IsolationLevel.RepeatableRead);
}
Step 2: Transaction 1 reads the average mark and the number of courses taken by the student whose ID is 1. At this point student 1 has taken three courses, with an average mark of 73.67. Note that transaction 1 is not committed at this point.
private static double ReadAverageMarksByTransaction1()
{
    return (double)ExecuteScalar("SELECT AVG(mark) AS AvgMark FROM SC WHERE (id = 1)");
}

private static int ReadTotalCoursesByTransaction1()
{
    return (int)ExecuteScalar("SELECT COUNT(*) AS num FROM SC WHERE (id = 1)");
}

private static object ExecuteScalar(string command)
{
    Console.WriteLine("-- Execute command: {0}", command);
    SqlCommand cmd = new SqlCommand(command, conn1);
    cmd.Transaction = tx1;
    return cmd.ExecuteScalar();
}
Step 3: Transaction 2 inserts a new record: student 1 takes another course and scores 80. The change is then committed to the database.
private static void InsertRecordByTransaction2()
{
    string command = "INSERT INTO SC VALUES(1, 5, 80)";
    Console.WriteLine("-- Insert to table SC by transaction 2");
    Console.WriteLine("-- Command:{0}\n", command);
    SqlCommand cmd = new SqlCommand(command, conn2);
    cmd.Transaction = tx2;
    try
    {
        cmd.ExecuteNonQuery();
        tx2.Commit();
    }
    catch (Exception e)
    {
        Console.WriteLine(e.Message);
        tx2.Rollback();
    }
}
Step 4: Transaction 1 again reads the average mark and the number of courses for student 1. This time it reads four courses with an average of 75.25, different from the first read! An extra course has appeared before us like a phantom. The test program is as follows:
public static void Main()
{
    Setup();
    try
    {
        Console.WriteLine(">>>> Step 1");
        double avg = ReadAverageMarksByTransaction1();
        int total = ReadTotalCoursesByTransaction1();
        Console.WriteLine("avg={0,5:F2}, total={1}\n", avg, total);

        Console.WriteLine(">>>> Step 2");
        InsertRecordByTransaction2();

        Console.WriteLine(">>>> Step 3");
        avg = ReadAverageMarksByTransaction1();
        total = ReadTotalCoursesByTransaction1();
        Console.WriteLine("avg={0,5:F2}, total={1}\n", avg, total);
    }
    catch (Exception e)
    {
        Console.WriteLine("Got an error! " + e.Message);
    }
    finally
    {
        CleanUp();
    }
}
The program execution result is as follows:
>>>> Step 1
-- Execute command: SELECT AVG(mark) AS AvgMark FROM SC WHERE (id = 1)
-- Execute command: SELECT COUNT(*) AS num FROM SC WHERE (id = 1)
avg=73.67, total=3

>>>> Step 2
-- Insert to table SC by transaction 2
-- Command:INSERT INTO SC VALUES(1, 5, 80)

>>>> Step 3
-- Execute command: SELECT AVG(mark) AS AvgMark FROM SC WHERE (id = 1)
-- Execute command: SELECT COUNT(*) AS num FROM SC WHERE (id = 1)
avg=75.25, total=4
You may want to think about why the RepeatableRead isolation level cannot keep the two averages the same. (Hint: it can be explained from the locking perspective; the S locks taken under RepeatableRead cover only the existing rows, not rows that transaction 2 inserts afterwards.)
As before, let's look at what happened. Before probing, set the isolation level of transaction 1 in the Setup method to Serializable and run the program again. When the program pauses, view the snapshot of the database's current activity; you will find the locking situation shown in Figure 2-14 and Figure 2-15:
Figure 2-14 under Serializable isolation, a RangeS-S lock is placed on the data that satisfies the search conditions
Figure 2-15 an attempt to insert a record into the locked range can only wait
We can see that in Serializable isolation mode, when the database retrieves data it places RangeS-S shared locks covering all records that satisfy the search conditions. When transaction 2 tries to insert a record falling within that range (which requires a RangeI-N lock), it must wait for the RangeS-S locks to be released; until then it can only wait. After the wait times out, transaction 2 is forcibly rolled back by SQL Server.
The modified program running result is as follows:
>>>> Step 1
-- Execute command: SELECT AVG(mark) AS AvgMark FROM SC WHERE (id = 1)
-- Execute command: SELECT COUNT(*) AS num FROM SC WHERE (id = 1)
avg=73.67, total=3

>>>> Step 2
-- Insert to table SC by transaction 2
-- Command:INSERT INTO SC VALUES(1, 5, 80)
Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

>>>> Step 3
-- Execute command: SELECT AVG(mark) AS AvgMark FROM SC WHERE (id = 1)
-- Execute command: SELECT COUNT(*) AS num FROM SC WHERE (id = 1)
avg=73.67, total=3
The failure of transaction 2's insert guarantees that transaction 1 does not encounter a phantom read. Note that the level-1, level-2, and level-3 locking protocols alone cannot effectively solve the phantom read problem.
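The key-range idea behind Figures 2-14 and 2-15 can be modelled in a few lines. This Python sketch is purely illustrative (real SQL Server key-range locking is far more involved): a range lock covers the whole search predicate, existing and future rows alike, so an insert into the range conflicts:

```python
# Toy model of key-range locking: a RangeS-S-style lock guards a predicate,
# and an insert (which would need a RangeI-N-style lock) conflicts whenever
# the new row satisfies a range locked by another transaction.
class RangeLockTable:
    def __init__(self):
        self.shared_ranges = []   # (transaction, predicate) pairs

    def lock_range(self, tx, predicate):
        """Serializable read: lock every row matching the predicate,
        including rows that do not exist yet."""
        self.shared_ranges.append((tx, predicate))

    def can_insert(self, tx, row):
        return not any(pred(row)
                       for holder, pred in self.shared_ranges
                       if holder != tx)

locks = RangeLockTable()
locks.lock_range("T1", lambda row: row["id"] == 1)     # T1: SELECT ... WHERE id = 1
print(locks.can_insert("T2", {"id": 1, "course": 5}))  # False: would be a phantom, so it waits
print(locks.can_insert("T2", {"id": 2, "course": 5}))  # True: outside the locked range
```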
2.4 Suggestions
Through the examples above we have gained a deeper understanding of the measures a database takes to solve the concurrency consistency problem. The locking mechanism is the underlying guarantee, but it is hard to use directly. Through the different transaction isolation levels we can use the locking mechanism indirectly to ensure the integrity and consistency of our data. When choosing an isolation level, also pay attention to the following points:
- In general, the ReadCommitted isolation level is enough. A higher isolation level locks too many resources and hurts data-sharing efficiency.
- The isolation level you choose depends on your system and business logic.
- Avoid using locks directly unless you have to.
- We can implement different update policies by controlling which fields appear in the WHERE clause, thereby preventing lost modifications. However, overly defensive update policies make the SQL commands inefficient, so prefer a timestamp (rowversion) column over comparing many guarded fields as the basis for the update.
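The last point can be sketched as an optimistic update with a single version column instead of many guarded fields. An illustrative Python/sqlite3 sketch (table and column names are hypothetical, not from the chapter's schema):

```python
# Optimistic concurrency control with a version column: the UPDATE succeeds
# only if the row still carries the version we originally read.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, age INTEGER, version INTEGER)")
conn.execute("INSERT INTO student VALUES (1, 20, 1)")

def update_age(conn, student_id, new_age, expected_version):
    """Return True if the update was applied, False if someone else
    changed the row first (the guard in the WHERE clause failed)."""
    cur = conn.execute(
        "UPDATE student SET age = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_age, student_id, expected_version))
    return cur.rowcount == 1

ok1 = update_age(conn, 1, 30, expected_version=1)  # first writer wins
ok2 = update_age(conn, 1, 25, expected_version=1)  # stale version: lost update prevented
print(ok1, ok2)  # True False
```

The second caller detects the conflict from the return value and can re-read the row and retry, instead of silently overwriting the first writer's change.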