Summary:
A transaction is the smallest logical execution unit and the basic unit of concurrency control of a database, and the result of its execution must change the database from one consistency state to another. A transaction has four important characteristics, atomicity (atomicity), consistency (consistency), isolation (isolation), and persistence (durability). This paper first describes the nature of the transaction in the database and its four characteristics (ACID), then focuses on the motivation and connotation of the transaction isolation, and introduces the transaction isolation level provided by the database and the transaction concurrency problems that can be resolved by these transaction isolation levels. In the balance between concurrency security and concurrency efficiency, we generally do not blindly increase the transaction isolation level to ensure transactional concurrency security, but by combining other mechanisms, including the optimistic lock and pessimistic locking mechanism mentioned by the author, to solve the database transaction concurrency problem.
Copyright Notice and Acknowledgements:
This article original nerd Rico
Author Blog address: http://blog.csdn.net/justloveyou_/
Examples of the explanations for dirty reads, non-repeatable reads, and Phantom reads are from the blog post, "Database transaction isolation level."
I. Overview of transactions
In general, each request of a user corresponds to a business logic method, and each business logic method tends to have a logical atomicity. In addition, a business logic method often includes a series of database atomic access operations, and these database atomic access operations should be bound to a whole, that is, either all executed, or all do not execute, in this way we can guarantee the integrity of the database. In other words, a transaction is the smallest logical execution unit and the basic unit of database maintenance data consistency.
In general, a transaction is an indivisible sequence of operations and the basic unit of concurrency control of a database, and the results of its execution must change the database from one consistent state to another. a transaction has four important characteristics, atomicity (atomicity), consistency (consistency), isolation (isolation), and persistence (durability).
(1). atomicity (atomicity)
Atomicity means that all the operations that a transaction contains are either all successful or all fail to roll back. Therefore, the operation of a transaction must be fully applied to the database if it succeeds, and if the operation fails, it will have no effect on the database, that is, the transaction is the smallest logical executor that is not allowed to be divided in the application.
(2). Conformance (consistency)
Consistency means that the result of a transaction execution must change the database from one consistency state to another, that is, the database must be in a consistent state before and after a transaction executes. Take the transfer, assume that user A and User B both the sum of money together is 5000, then no matter how a and B transfer, transfer several times, after the end of the transaction two users of the sum should be 5000, this is the consistency of the transaction.
(3). Isolation (Isolation) -Directly related to transactional concurrency
isolation means that transactions that are executed concurrently cannot affect each other. that is, for any two concurrent transactions T1 and T2, in the case of the transaction T1, the T2 either ends before the T1 starts, or starts after the T1 ends, so that no other transaction is sensed to be executing concurrently. The isolation of transactions is discussed below.
(4). Persistence (Durability)
Persistence refers to the fact that once a transaction is committed, changes to the data in the database are permanent, even if the database system encounters a failure, and the commit transaction is not lost. In other words, once a transaction is committed, any changes made to the database are recorded in permanent memory (usually saved to the physical database).
Two. The connotation of transaction isolation
The above introduction of the basic concept of the transaction and its four characteristics (called acid), now focus on the separation of the transaction. We know that when multiple threads are opening data in a transactional operations database, the database system is able to isolate operations to ensure the accuracy of each thread's access to the data. In other words, the isolation of the transaction is mainly used to solve the concurrency security problem of the transaction, so what are the specific problems that the transaction's isolation solves?
1. Problems caused by transaction concurrency
(1). Dirty Read
Dirty reads refer to data that is not committed by another transaction during one transaction. For example, when a transaction is modifying a data several times, and when the transaction changes to the data are not committed, then a concurrent transaction to access the data will result in dirty reading of the data. Look at the following example:
The company paid, the leader of the 5000 yuan to the Singo account, but the transaction did not submit, and Singo just to check the account, found that the salary has been to the account, is 5000 yuan whole, very happy. Unfortunately, the leadership found that the amount of wages issued to Singo is not correct, is 2000 yuan, and then quickly rolled back to business, modify the amount, the transaction will be submitted, and finally singo the actual salary of only 2000 yuan, Singo empty joy a game.
The above-mentioned situation is what we call dirty reading, that is, for two concurrent transactions (transaction A: lead to Singo payroll, Transaction B:singo Query Payroll account), transaction B reads the data that transaction A has not yet committed. In particular, dirty reads can be avoided when the isolation level is set to read Committed, but may still cause non-repeatable reads. In particular, the default level for most databases is read committed, such as SQL Server, Oracle.
(2). Non-repeatable reading
Non-repeatable reading means that for a data in a database, multiple queries within a transaction scope return different data values because the data is modified and committed by another transaction at the query interval. For example, the transaction T1 reads a certain data, and the transaction T2 immediately modifies the data and commits the transaction, and when the transaction T1 reads the data again, it results in a different result, that is, non-repeatable reads occur. the difference between non-repeatable reads and dirty reads is that dirty reads are dirty data that a transaction reads from another transaction, and non-repeatable reads the data submitted by the previous transaction. look at the following example:
Singo take the payroll card to spend, the system read to Cary really have 2000 yuan, and at this time her wife also just in the online transfer, the Singo Pay card of 2000 yuan to another account, and before Singo submitted the business, when Singo deduction, System Check to Singo's payroll card has no money, deduction failure, Singo very puzzled, obviously card money, why ...
This is what we call non-repeatable reading, that is, two concurrent transactions (transaction A:singo consumption, transaction B:singo wife online transfer), transaction A in advance read the data, transaction B immediately after updating the data and commit the transaction, and transaction a read the data again, the data has changed. You can avoid non-repeatable reads when the isolation level is set to repeatable read. At this time, when Singo take the payroll card to spend, once the system began to read the Payroll card information (that is, the start of the transaction), Singo's wife can not be modified, that is Singo wife can not transfer at this time. In particular, MySQL's default isolation level is repeatable read.
(3). Phantom Reading
Phantom reading is a phenomenon that occurs when a transaction is not executed independently, that is, in the process of one transaction reading, another transaction may insert a new data record, affecting the result of the transaction read. For example, a transaction T1 a data item from "1" to "2" for all rows in a table, and the transaction T2 inserts a row of data items into the table, and the value of the data item is "1" and is submitted to the database. At this point, the user of the operation transaction T1, if you look at the data you just modified, you will find that there is another line is not modified, in fact, this line is added from the transaction T2, as if to produce hallucinations, this is a phantom read. both Phantom and non-repeatable reads read another transaction that has already been committed (this is different from dirty reading), and the difference is that non-repeatable read queries are the same data item, and Phantom reads are for data record insertion/deletion issues, and the points of concern are not the same. look at the following example:
Singo's wife works in the banking department, and she often views Singo's credit card consumption records through the internal banking system. One day, she is inquiring to Singo month credit card total consumption amount is 80 yuan, and Singo at this time is good outside Hu eats sea plug to pay at the cashier, spend 1000 yuan, namely added a 1000 yuan of consumption record and submitted the transaction, Then Singo's wife will singo the current month credit card consumption details printed to A4 paper, but found that the total consumption of 1080 yuan, Singo wife is very surprised, thought there was an illusion, the illusion of this produced. When the isolation level is set to serializable (the highest transaction isolation level), you can avoid both dirty reads, non-repeatable reads, and Phantom reads. But at the same time the cost is the highest, the performance is very low, generally rarely used, because at this level concurrent transactions will be executed serially.
2. Summary
In general, the isolation of transactions is primarily used to resolve transactional concurrency security issues. The above mentioned three typical problems of dirty reading, non-repeatable reading and phantom reading occur under the premise of transaction concurrency, and the difference is that the problem of the three concerns is slightly different. Dirty reading is concerned that the transaction reads uncommitted data from another transaction, and non-repeatable reads are concerned with the fact that the results of multiple reads of the same item in the same transaction are different; Phantom reads focus more on Insert/delete issues for data records, such as the results of multiple queries for data records that match the same criteria in the same transaction. Further, the non-repeatable reading is concerned about the problems caused by the updating of data, which is concerned with the problem of adding and deleting data.
Three. Transaction isolation level of the database
The transaction isolation levels for different databases vary. For example, as we mentioned in the previous section, the MySQL database supports the following four isolation levels, and defaults to the repeatable read level, whereas in the Oracle database, only the serializable level and read committed levels are supported, and the default is Read committed level. The MySQL database provides us with four isolation levels, namely:
- Serializable (serialization): The highest level, can avoid dirty reading, non-repeatable reading, the occurrence of phantom reading;
- REPEATABLE READ (Repeatable Read): Can avoid dirty read, non-repeatable reading occurs;
- Read Committed (reading committed): can avoid the occurrence of dirty reading;
READ UNCOMMITTED (unread): lowest level, no case guaranteed.
As you can see, the highest of the above four isolation levels is the serializable level, and the lowest is the Read uncommitted level. Of course, the higher the isolation level, the more secure the transaction concurrency is, but the less efficient it is to perform. For example, the level of Serializable is to ensure serial execution of concurrent transactions in the form of a lock table (similar to a lock in Java Multi-threading), but at this point the execution efficiency is minimized, so the choice of isolation level is essentially a balance of concurrency security and concurrency efficiency. Should depend on the actual situation. In particular, in the MySQL database, the default transaction isolation level is REPEATABLE READ (repeatable read), let's look at how to manipulate the isolation level of transactions in the MySQL database.
1). mysql default transaction isolation level view
In the MySQL database, we can view the isolation level of the current transaction in the following ways:
select @@tx_isolation;
2). mysql Transaction ISOLATION level modification
In the MySQL database, we can set the isolation level of the transaction in the following two ways, respectively:
set [glogal | session] transaction isolation level 隔离级别名称;或 set tx_isolation=‘隔离级别名称‘;
3). Use JDBC to set the isolation level of database transactions
Setting the isolation level of the database must be before the transaction is opened. In particular, when using JDBC to set the isolation level for a transaction on a database, we should call the settransactionisolation of the connection object before calling the Setautocommit (false) method of the Connection object ( Level) To set the isolation levels for the current link are as follows:
As for the parameter level, you can use the field of the connection interface, as shown in the following code:
In particular, setting the transaction isolation level in this way is only valid for the current link. For a MySQL command window, a window is equivalent to a link, and the isolation level set by the current window is only valid for transactions in the current window; for the JDBC operations database, a Connection object is equivalent to a link, The isolation level set for the connection object is only valid for the connection object, regardless of the other linked connection objects.
Four. Database concurrency control
Perhaps you have heard that there are two types of locks, one called pessimistic lock , one called optimistic lock . In fact, whether pessimistic lock or optimistic lock, is the concept of people defined, is a problem-solving thinking. therefore, not only in the database system has optimistic lock and pessimistic lock concept, like Memcache, Hibernate, Tair, etc. have similar concepts. For example, in-line concurrent processing, synchronized built-in lock is a pessimistic lock, also known as an exclusive lock , add the Synchronized keyword code is basically only a single-threaded form to execute, It causes other threads that need the resource to hang until the previous thread finishes executing the freed resource , and the optimistic lock is a more efficient mechanism, the principle of which is to do an operation without locking each time, if there is a conflict and then try again, until successful, in fact, is not a lock, so many places also called spin .
When dealing with the transaction concurrency access problem of the database, although the transaction serialization can ensure that the data is not inconsistent with the data under multi-transaction concurrency, serial execution degrades the processing performance of the database, which is often unacceptable to us. So, in general, we often combine transaction isolation levels with other concurrency mechanisms to ensure transactional concurrency, which balances the efficiency and security of transactional concurrency. In fact, the isolation level of most databases is set to read Committed (only read data submitted by other transactions), and the application uses the optimistic lock/pessimistic locking mechanism to solve other transactional concurrency problems, such as non-repeatable reads. In particular, optimistic concurrency control (optimistic locking) and pessimistic concurrency control (pessimistic locking) are the main technical means of concurrency control.
In particular, the idea of optimistic locking is that, assuming no concurrency conflicts occur, only check for breaches of data integrity while committing the operation, and the idea of pessimistic locking is to assume that concurrency conflicts occur and that all operations that might violate data integrity are masked. needles for different business scenarios, you should choose a different concurrency control mode. Therefore, the optimistic concurrency control and pessimistic concurrency control should not be understood narrowly as the concept in DBMS, not to confuse them with the locking mechanism (row, table, exclusive, shared) provided in the data. It should be noted that in the DBMS, pessimistic locking is implemented using the lock mechanism provided by the database itself.
Ps: More about the synchronized keyword , please visit my blog "Java Concurrency: Built-in lock synchronized".
1. Optimistic lock
optimistic Lock, although the name has a "lock", but the optimistic lock does not lock anything, but when the transaction is committed to check whether the record is modified by other transactions: if not, commit; otherwise, roll back. In contrast to pessimistic locks, optimistic locks do not use the lock mechanism provided by the database when processing the database. If the likelihood of concurrency is not significant, then the performance cost of optimistic locking policies is very small. optimistic locking is generally implemented by recording the data version.
A data version is a version identifier that is added to the data. When the data is read, the version identification value is read together, and the version ID is updated every time the data is updated. When we submit an update, the current version of the corresponding record of the database table is judged to be compared with the version ID that was first taken out, if the current version number of the database table is equal to the version identity value of the first fetch, it is updated, otherwise it is considered to be outdated data. In general, there are two ways to implement a data version, one is to use a version number, and the other is to use a timestamp.
2, Pessimistic lock
Pessimistic locking, as its name implies, is a conservative (pessimistic) attitude to the data being modified by the outside world, so the data is locked during the entire data processing process. pessimistic lock implementation often relies on the lock mechanism provided by the database, and only the lock mechanism provided by the database layer can truly guarantee the exclusivity of data access, otherwise, even in this system, the locking mechanism can not guarantee that the external system will not modify the data. pessimistic concurrency control is primarily used in environments where data contention is intense, and when concurrency conflicts occur, the cost of using locks to protect data is lower than the cost of rolling back the transaction. A pessimistic lock is a real lock when compared to an optimistic lock, which locks the batch of data out of a select by using the SQL statement "Select for Update", and waits for other transactions to update the batch of data.
pessimistic concurrency control is actually a conservative strategy of "first fetch lock and then access", which guarantees the security of data processing. However, in terms of efficiency, the mechanism for handling locks can incur additional overhead for the database, as well as an increase in the chance of deadlocks, and in the case of read-only transactions where there is no conflict, there is no need to use locks, which can only increase the load on the system and reduce parallelism, if a transaction locks a row of data. Other transactions must wait for the transaction to finish before processing that row of data.
3. Summary
pessimistic lock and optimistic lock are all a kind of idea to solve concurrency control problem. in particular, in the context of database concurrency control, there are several differences between pessimistic and optimistic locks:
thought: in a transactional concurrency environment, optimistic locking assumes that there is no concurrency conflict, so only checking for violation of data integrity when committing an operation, and pessimistic locking assumes concurrency conflicts that mask everything that might violate data integrity.
implementation: pessimistic lock is implemented by using the lock mechanism provided by the database itself, while optimistic lock is realized by recording data version;
scenario: Pessimistic locks are primarily used for data contention in a fierce environment, or when concurrency conflicts occur when the cost of using locks to protect data is lower than the cost of rolling back the transaction, while optimistic locking is primarily used in environments where concurrency is less likely and data is not competitive At this time the optimistic lock brings the performance consumption is very small;
Dirty read: optimistic lock does not solve dirty read problem, but pessimistic lock can.
In general, pessimistic locks are more secure than optimistic locks, but they are also more expensive, and may even be a case of database deadlock, and are recommended only when optimistic locks are not working.
Five. More
For more information about synchronized keywords , please visit my blog, "Java Concurrency: Built-in lock synchronized".
For more on Java concurrency Programming, follow my column, Java Concurrency Programming learning notes. This column is a comprehensive record of Java concurrency programming knowledge, combined with the operating system, Java memory model and the relevant source code for concurrent programming principles, technology, design, the underlying implementation of in-depth analysis and summary, and continue to follow the concurrency-related technology.
references
Four characteristics of database transactions and the isolation level of transactions
Database Transaction ISOLATION LEVEL
Deep understanding of optimistic lock and pessimistic lock
Summarize the concurrency mechanism of database transaction