A database is a shared resource that can be used by multiple users. User programs could be executed one at a time, in a serial manner: at any moment only one user program runs and accesses the database, and the others can access it only after that program has finished. However, if a user program spends much of its time on data input/output, the database system sits idle for most of that time. To make full use of database resources and exploit the database as a shared resource, multiple users should therefore be allowed to access the database in parallel. In that case, though, several user programs access the same data concurrently. If concurrent operations are not controlled, incorrect data may be read or stored and database consistency is destroyed, so the database management system must provide a concurrency control mechanism. The quality of this mechanism is one of the important indicators of a database management system's performance. DM uses a locking mechanism to solve concurrency problems: it allows multiple user programs to run at any time, while each program runs as if in a completely isolated environment.
I. Preparations for Concurrency Control
(1) Concurrency Control Overview
Concurrency control is performed in units of transactions.
1. The unit of concurrency control: the transaction
A transaction is the logical unit of work in a database. It is a user-defined sequence of operations: a transaction can be a single SQL statement, a group of SQL statements, or an entire program.
The start and end of a transaction can be controlled by the user. If the user does not explicitly define transactions, the database system divides them automatically according to default rules.
A transaction should have four properties: atomicity, consistency, isolation, and durability.
(1) Atomicity
The atomicity of a transaction guarantees that the group of update operations inside it is atomic: the operations form a whole, and either all of them are applied to the database or none are; they cannot be partially completed. This property holds even across a system crash. Crash recovery restores the database and undoes the effect that transactions active at the time of the crash had on it, which preserves atomicity. Before the system modifies any actual data on disk, it records information about the modification on disk; when a crash occurs, the system uses these records, together with the recorded state of each transaction, to decide whether to undo all of the transaction's modifications or to redo them.
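For example, the minimal sketch below (the account table and its data are illustrative, not taken from the text) shows two updates that take effect or are undone only as a unit:

    -- Minimal sketch of atomicity (illustrative table and data).
    CREATE TABLE account (id INT PRIMARY KEY, balance DECIMAL(12,2));
    INSERT INTO account VALUES (1, 1000.00);
    INSERT INTO account VALUES (2, 500.00);
    COMMIT;

    -- The two updates below belong to one transaction.
    UPDATE account SET balance = balance - 100 WHERE id = 1;
    UPDATE account SET balance = balance + 100 WHERE id = 2;
    -- If an error, a crash, or an explicit ROLLBACK occurs before COMMIT,
    -- both updates are undone together; the transaction never takes partial effect.
    ROLLBACK;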
(2) Consistency
Consistency requires that a transaction take the database from one consistent state to another consistent state. It is a logical property based on consistency rules; for example, in a transfer operation the amounts across the accounts involved must remain balanced. Enforcing such rules is the programmer's responsibility. Consistency is closely related to atomicity, and the consistency property further requires that consistency still hold when transactions execute concurrently. It is not logically independent; under concurrency it is expressed through the isolation of transactions.
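The transfer rule can be sketched as follows, reusing the illustrative account table from the previous sketch: the debit and the credit are made in one transaction, so the total balance is the same before and after the transaction.

    -- Consistency rule for the transfer example: the total of all balances is unchanged.
    SELECT SUM(balance) FROM account;                         -- e.g. 1500.00 before

    UPDATE account SET balance = balance - 100 WHERE id = 1;  -- debit
    UPDATE account SET balance = balance + 100 WHERE id = 2;  -- credit
    COMMIT;

    SELECT SUM(balance) FROM account;                         -- still 1500.00 after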
(3) Isolation
Isolation means that the execution of a transaction must not be disturbed by other transactions. That is, the operations and data used inside a transaction are isolated from other concurrently executing transactions, and concurrent transactions must not interfere with one another. Even when several transactions execute concurrently, each committed transaction must appear to have executed under a serial schedule. Another name for this property is serializability: any interleaved schedule the system allows must be equivalent to some serial schedule. In a serial schedule only one transaction is scheduled at a time, and no operation of another transaction may begin before all operations of the current transaction have finished. For performance reasons we want to schedule interleaved operations, but we also want the effect of such a schedule to be the same as that of a serial one. DM achieves this by placing appropriate locks on the data objects a transaction accesses, thereby preventing other transactions from operating concurrently on the same database objects.
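The locking behavior can be sketched as two sessions touching the same row (the account table is the illustrative one from above; SELECT ... FOR UPDATE is the common SQL form for taking a row lock explicitly, and locks are also acquired implicitly for ordinary updates):

    -- Session 1: lock the row, then modify it.
    SELECT balance FROM account WHERE id = 1 FOR UPDATE;
    UPDATE account SET balance = balance - 100 WHERE id = 1;

    -- Session 2: a conflicting update on the same row blocks here
    -- until session 1 commits or rolls back.
    UPDATE account SET balance = balance - 50 WHERE id = 1;

    -- Session 1: releasing the lock lets session 2 proceed,
    -- so the net effect equals some serial order of the two transactions.
    COMMIT;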
(4) Durability
Durability (also called persistence) guarantees that once a transaction commits, its modifications to the database are permanent; no subsequent machine or system failure should affect them. For example, once an ATM has paid money to a customer, there is no need to worry that the record of the withdrawal could be lost. Durability ensures that a transaction's effect on the database persists even if the system crashes. As mentioned under atomicity, the system provides this guarantee through its logging of modifications.
DM does not provide a statement that explicitly marks the start of a transaction; the first executable SQL statement (other than the CONNECT statement) implicitly starts one. The end of a transaction, however, can be controlled explicitly by the user. In DM, a transaction ends (normally or abnormally) in the following situations:
(1) When the connection attribute is set to auto-commit, every statement is committed as soon as it executes;
(2) When a COMMIT/ROLLBACK statement is encountered, the transaction is committed/rolled back;
(3) When the system's automatic DDL commit switch is enabled (on by default), encountering a DDL statement automatically commits that DDL statement together with the preceding DML and DDL operations;
(4) The program containing the transaction ends normally and the user exits;
(5) The system terminates abnormally.
Note: DM provides the DDL_AUTO_COMMIT switch for DDL statements in its configuration file. When this configuration item is set to 1 (the default), all DDL statements are committed automatically. When it is set to 0, no DDL statement is committed automatically except the CREATE DATABASE, ALTER DATABASE, and CREATE SCHEMA statements.
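The sketch below illustrates cases (2) and (3) above with the illustrative account table; the behavior shown in the last step assumes the DDL auto-commit switch described in the note is on:

    -- Case (2): COMMIT and ROLLBACK end the transaction explicitly.
    UPDATE account SET balance = balance - 50 WHERE id = 1;
    COMMIT;                                   -- ends the transaction, keeps the change

    UPDATE account SET balance = balance - 50 WHERE id = 1;
    ROLLBACK;                                 -- ends the transaction, discards the change

    -- Case (3): with DDL auto-commit on (the default), a DDL statement also
    -- commits the uncommitted DML that precedes it.
    UPDATE account SET balance = balance + 10 WHERE id = 2;
    CREATE TABLE t_demo (x INT);              -- implicitly commits the UPDATE above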
Consistency in DM is transaction-based. DM uses COMMIT and ROLLBACK to make modifications to the database permanent or to discard them, but in either case DM guarantees that the database is consistent both before a transaction starts and after it ends. To make transaction management more flexible, DM provides the SAVEPOINT statement and the rollback-to-savepoint statement. A savepoint offers flexible partial rollback: during execution a transaction can roll back to a savepoint, so that the operations before the savepoint remain valid while the operations after it are undone.
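A minimal sketch of a savepoint follows (illustrative data; the ROLLBACK TO SAVEPOINT form used here is the common SQL syntax):

    UPDATE account SET balance = balance - 100 WHERE id = 1;   -- kept

    SAVEPOINT sp1;

    UPDATE account SET balance = balance - 999 WHERE id = 2;   -- undone below

    ROLLBACK TO SAVEPOINT sp1;   -- undoes only the work done after sp1
    COMMIT;                      -- the first UPDATE becomes permanent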
Transactions in DM therefore have the four properties: atomicity, consistency, isolation, and durability.
2. Data inconsistency caused by concurrent operations
Without locking, when multiple users access the database at the same time and their transactions use the same data concurrently, problems can arise that leave the database with inconsistent data.
The most common example of concurrent operations is ticket booking in a train or airline reservation system. Consider the following sequence of activities in such a system:
1. Ticket clerk A reads the remaining number of tickets for a flight, A = 16;
2. Ticket clerk B reads the remaining number of tickets for the same flight, also 16;
3. Clerk A sells a ticket, computes A = A - 1 = 15, and writes A back to the database;
4. Clerk B also sells a ticket, computes A = A - 1 = 15, and writes A back to the database.
As a result, two tickets were sold but the remaining count decreased by only 1.
This situation is called database inconsistency, and it is caused by the concurrent operations of ticket clerks A and B. Under concurrent operation, the order in which the two transactions' operations are scheduled is random. With the schedule above, the modification made by clerk A's transaction is lost, because in step 4 clerk B's transaction writes back its own modification and overwrites the value written by clerk A.
The database inconsistencies caused by concurrent operations fall into four categories: lost (overwritten) update, dirty read, non-repeatable read, and phantom read. The example above shows only one of these concurrency problems.
(1) Lost update
When two or more transactions select the same data and then update it based on the value they originally selected, a lost update occurs. Because each transaction is unaware of the others, the last update overwrites the updates made by the other transactions, and data is lost. The plane ticket booking example above is exactly this concurrency problem: transaction 1 and transaction 2 both read the same data A = 16; transaction 1 computes A - 1 and writes back A = 15; transaction 2 also computes A - 1 and writes back A = 15. The result committed by transaction 2 overwrites the modification made by transaction 1, so transaction 1's modification to the database is lost.
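The booking example can be written as one interleaved timeline over two sessions (the ticket table and flight number are illustrative):

    -- Session 1: read the remaining count (sees 16)
    SELECT remaining FROM ticket WHERE flight_no = 'XX123';

    -- Session 2: read the same row (also sees 16)
    SELECT remaining FROM ticket WHERE flight_no = 'XX123';

    -- Session 1: sell a ticket based on the value it read, write back 15
    UPDATE ticket SET remaining = 15 WHERE flight_no = 'XX123';
    COMMIT;

    -- Session 2: sell a ticket based on the value it read, also write back 15
    UPDATE ticket SET remaining = 15 WHERE flight_no = 'XX123';
    COMMIT;
    -- Two tickets were sold, but remaining decreased by only 1:
    -- session 1's update has been lost.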
(2) Dirty read
A dirty read occurs when one transaction reads data written by another concurrent transaction that has not yet committed. The data being read is unconfirmed and may still be changed or undone by the transaction that is updating it. In other words, transaction 1 modifies a data item and writes it back to disk, transaction 2 reads the same data, and then transaction 1 is cancelled for some reason; when transaction 1 restores the data item to its original value, the data transaction 2 has read no longer matches what is in the database. It is incorrect data, and reading it is called a dirty read.
For example, transaction 1 changes the value of C from 100 to 200, transaction 2 reads C and obtains 200, and then transaction 1 is cancelled for some reason; the modification is voided and C is restored to its original value 100. The value 200 that transaction 2 read is incorrect, "dirty" data.
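As an interleaved timeline over two sessions (table t and column c are illustrative):

    -- Session 1: change C from 100 to 200 but do not commit yet
    UPDATE t SET c = 200 WHERE id = 1;

    -- Session 2: if allowed to read uncommitted data, it sees 200 here
    SELECT c FROM t WHERE id = 1;

    -- Session 1: cancel the change; C goes back to 100
    ROLLBACK;

    -- The value 200 that session 2 read never existed in any committed state:
    -- it is "dirty" data.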
(3) Non-repeatable read
A transaction re-reads data it has read before and finds that the data has been modified by another, already committed transaction. That is, after transaction 1 reads a data item, transaction 2 modifies it; when transaction 1 reads the same data again, it obtains a value different from the first read.
For example, transaction 1 reads B = 100 and uses it in a calculation; transaction 2 reads the same data item B, modifies it, and writes B = 200 back to the database; when transaction 1 re-reads B to check the value it used, B is already 200, which no longer matches the value read the first time.
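As an interleaved timeline over two sessions (table t and column b are illustrative):

    -- Session 1: first read, sees B = 100
    SELECT b FROM t WHERE id = 1;

    -- Session 2: modify the same data item and commit
    UPDATE t SET b = 200 WHERE id = 1;
    COMMIT;

    -- Session 1 (same transaction as its first read): now sees B = 200,
    -- a different value than before.
    SELECT b FROM t WHERE id = 1;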
(4) Phantom read
A phantom read occurs when another transaction changes the result set of a query before the querying transaction is finished with it. Put another way: transaction 1 reads a set of records from the database according to some condition, and before it is done, transaction 2 deletes some of those records; when transaction 1 reads again with the same condition, it finds that some records have mysteriously disappeared. Or: transaction 1 reads a set of records according to some condition, transaction 2 inserts new records that satisfy the same condition, and when transaction 1 reads again with that condition it finds extra records.
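As an interleaved timeline over two sessions (table t and column kind are illustrative):

    -- Session 1: first query by condition, returns N rows
    SELECT COUNT(*) FROM t WHERE kind = 'X';

    -- Session 2: insert a new row that satisfies the same condition, and commit
    INSERT INTO t(id, kind) VALUES (99, 'X');
    COMMIT;

    -- Session 1 (same transaction): the same query now returns N + 1 rows;
    -- the extra row is a "phantom".
    SELECT COUNT(*) FROM t WHERE kind = 'X';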
The main reason for these four kinds of data inconsistency is that concurrent operations break the isolation of transactions. Concurrency control schedules concurrent operations correctly so that the execution of one user's transaction is not affected by other transactions, thereby avoiding data inconsistency.
3. Listing of concurrency scenarios
The following uses SQL statements to list the various concurrency situations (both those that can cause data inconsistency and those that do not affect consistency). a denotes a single data item; b and c each denote two or more data items that satisfy a certain condition; ^ means "not"; 1 denotes the first transaction and 2 denotes the second transaction.