I. BACKGROUND
Friends who are familiar with MySQL database know that there are three common patterns of querying data:
1. Select ... : Snapshot read, no lock
2. Select ... in share mode: current read, read lock
3. Select ... for update: current read, add write lock
It's not difficult to understand the three-way scenario from a technical level, so let's review the technical differences in the three reading modes.
Note: To simplify the description of the problem, all of the following conclusions are for the MySQL database InnoDB storage Engine RR isolation level.
1.1 Select ...
reads the snapshot version of the result set at the beginning of the current transaction , and the snapshot version can also be understood as a historical version.
Since the historical version itself is an immutable version, because only a historical version is read and history is not modified, the read mode is relatively simple for resource processing before and after reading:
1. Before the read behavior occurs, if there are other uncommitted transactions that have modified the result set, this read mode does not wait for these transactions to end and the modifications are not read by nature.
2. After the read behavior occurs, this read mode does not prevent other transactions from modifying the data until the current transaction commits, resulting in an updated version of the result set.
1.2 Select ... in share mode
reads the latest version of the result set while preventing other transactions from generating the updated version of the data.
Since the latest version of the data is constantly changing, this read mode needs to force the latest version to be blocked, ensuring that it reads the latest version that everyone agrees to.
This read mode handles the following resources before and after reading:
1. Obtain a read lock before the read behavior occurs. This means that if there are other uncommitted transactions that have modified the result set, this read mode waits for those transactions to end to ensure that they can later read the changes to the result set, while waiting blocks modifications to the result set by other transactions.
2. After the read behavior occurs, this read mode continues to block changes to the result set by other transactions until the current transaction commits.
3. After the current transaction commits, the read lock is freed. This means that all previously blocked transactions can resume and continue executing.
1.3 Select ... for update
This read mode has all the features of select ... in share mode, and It also has the ability to block other transactions from reading the latest version .
This read mode handles the following resources before and after reading:
1. Obtain a write lock before the read behavior occurs. This means that if there are other uncommitted transactions that have modified the result set, this read mode waits for those transactions to end, to ensure that they can later read the changes to the result set, while waiting for other transactions to be organized to read and modify the latest version of the result set.
2. After the read behavior occurs, this read mode continues to block changes to the result set by other transactions, and also blocks the read of the latest version of the result set by other transactions before the current transaction commits.
3. After the current transaction commits, release the write lock. This means that all previously blocked transactions can resume and continue executing.
Three kinds of reading mode in the technical aspects of the difference to this review is over, but we in the actual business programming process, read the database record when to read the lock, when to add a write lock?
What kind of business logic is required to read the snapshot version of historical data and read the latest version of the data map to the business level? Every write a database query code, should be from the technical level to think carefully about the different reading mode of its read behavior before, and then the processing of resources to meet business needs? It's too hard to program.
With the above questions, this article will try to start from the technical functions of each reading mode, the different modes of the technical functions of the differences into business requirements, so as to summarize the different functions of the application scenario, the final output of a few operational strong scene decision rules, Used to quickly answer the question whether the query database should be read-lock or write-locked under different business scenarios.
Second, the difference between the technical function and the business needs of the conversion 2.1 Select ... for update vs Select ... in share mode
The Select ... for update is more exclusive to the latest version of the result set read than the Select ... in share mode. Select ... in share mode just blocks other transactions from generating an updated version of the result set, while select: The for update also blocks other transactions from reading the latest version of the result set.
At what point does the business level need to block other transactions from reading the latest version of the result set?
You don't want others to be able to read the latest version, often because you want to make changes on the latest version, and worry about others as well as yourself. Because when you modify data, you always want your changes to be merged with the latest version of the data (not the historical version), so you will try to get the latest version of the data before you modify it, and modify it based on the latest version. If everyone can get the latest version of the data at the same time and add their own changes in the latest version, and finally we submit data together, there will inevitably be a person's changes to cover the situation of other people modified, this is the classic "update Lost" problem.
In fact, this problem can also be asked in turn, under what circumstances do not have to block other transactions on the result set read it?
If you don't block the reading, the result set read by other transactions is the same, why should you block it? If you do not modify the read result set, then what is the difference between others morning reading late reading?
Through the above thinking, we can draw the following conclusions:
Conclusion One: If you read the data yourself without modifying it, it is definitely not necessary to use the Select ... for update.
Conclusion Two: If you read the data yourself need to modify it, "update lost" problem in most business scenarios should be avoided, so this time need to use SELECT ... for update.
2.2 Select ... in share mode vs Select ...
Select ... in share mode vs. Select ... , the main new two-point constraint has been added:
1. Wait for the transaction commit to modify the data before reading the data.
2. After reading the data, prevent other transactions from modifying the data.
We'll start by using the business-level language to summarize the above two-point constraint as follows: You want to read the latest version of data that everyone agrees to (that is, no one else is modifying the data) and lock it.
So what kind of business scenario do we need to achieve this effect?
I can think of the following two typical scenarios:
Example 1. Processing data based on update timestamp increment
This time I read and process the data before point A, I will not read and process the data in this range again, this is the requirement of incremental processing. If I read before someone has modified the data in this range, only the transaction has not yet been committed (because the modification behavior occurs before point A, so that the update timestamp of the data is also before point a), I read these changes after the commit.
If I use a normal select ... This means that although I have read and processed the data before point A, there are new data in this range after I read it. This will leave out some of the data that has not yet been processed.
If I am using SELECT ... in share mode, you will be able to avoid the problem of leakage processing by waiting for changes within the time range of the query to be submitted before processing the data in this range.
The problem in this example implies a precondition, that is, when the new data is submitted, the party that added the data does not actively notify me to do the processing, but I am going to scan the new data based on the timestamp. The equivalent of the integrity of the business logic is guaranteed by me unilaterally, while the other party is unwilling to serve. In fact, in a scenario where data is processed based on the update timestamp increment, typically the handler is a third party, and the timestamp-based scan of incremental data is simply to ensure that the application system on the original data table does not need to be modified, which reduces intrusion.
(Note: Setting secure read latency is a more common solution when new data is processed based on the update timestamp.) That is, the time per read is set to the current time, x minutes, and X minutes is greater than the maximum time the thing lasts in the system, to ensure that all modifications before the extraction point have been committed. But this approach reduces the real-time nature of data processing. )
So, suppose that every party that modifies data is willing to work together to make every effort to ensure consistency of data and the integrity of business logic, is there no problem? Take a look at the example below.
Example 2. Update an association relationship
For example, for example, there are books and students two tables, a bookstostudents of many-to-many association tables. The new book needs to have this book for every studuent. The new student requires all book to belong to the student. At any time, the requirement for data consistency is that all student have all the book.
If two people A and B, at the same time to open the transaction, a person added Booka, one new STUDUENTB, we each strictly according to data consistency requirements to maintain Bookstostudents Association table.
If you do not use SELECT ... in share mode instead of using SELECT ..., because each transaction cannot read to the other's uncommitted new entities, a does not know that there are studentb, so A's booka will not belong to Studentb;b do not know Booka , so there will be no Booka under B's STUDENTB. After the final two transactions are committed, the result is that STUDENTB does not own Booka.
Both A and B have the opportunity to establish an association record of having Booka under STUDENTB, but this association record is established only when a is added Booka, and B is processed when the STUDENTB is added, and if the required records are not read at both moments, the establishment of this association record will never be triggered again. For many-to-many associations such as "I Have you, you have me" circular dependency of the update, the behavior of both sides if unconstrained random occurrence, it can happen
However, if you use Select ... in share mode, when a reads the students table, it finds that there is no studentb, and B can no longer add STUDENTB to the students table until a transaction commits. At that time, B again read the books table, can also find a submitted Booka, and then the correct new STUDENTB have Booka this association record.
Although the example of many-to-many association relationships, there may be similar problems in a one-to-many association relationship.
Example 1 shows a scenario that can be summed up as:
Conclusion 3: When data consistency and business logic integrity can only be guaranteed unilaterally by yourself, and you use a certain monotonic increment of data to process data, you need to use SELECT ... in share mode to query for updated data.
Example 2 shows a scenario that can be summed up as:
Conclusion 4: When the two entities associated with the relationship may be added at the same time, one party modifies the association by adding a new entity, using the Select ... in share mode to query the other party data for an update of the association relationship.
2.3 Select ... Is it so dangerous to read a snapshot?
Read the above introduction, you may wish to all queries using the most stringent select ... for update, so at least not wrong. But is it really that dangerous as the most common SELECT statement?
Snapshot reading means reading historical data, in fact, the time to look at the long-term, basically most of the data follow up with the possibility of updating. So even with the most stringent select ... for update read mode, the data that is read will eventually fall through the passage of time and become historical data. The user is not more concerned about the number of new data, but some of the data is not too outdated, the snapshot read the historical data is usually the last dozens of milliseconds to a few seconds before the historical version, fully able to meet the user's viewing needs.
When reading the data is for the background strict logic control decision, we will worry about the reading process of the updated version of the data will miss the processing logic in this transaction, but this worry is generally redundant, because other people produce new version of the data, must also trigger a series of processing to ensure the consistency of data and the integrity of business logic, do not have to worry about others in their own affairs.
Iii. Summary
Our principle is usually to prioritize the use of a query pattern with a small lock range to maximize the concurrency of the database. That is, select ..., no, no, no, No. In share mode, no further promotion to select ... for update. And conclusion 1 tells us when we don't need to use SELECT ... for update, in this principle, we need to figure out when we need to use SELECT ... for update, so this conclusion can be ignored.
In our daily development, in most cases, we do not need to unilaterally guarantee the consistency of data and the integrity of business logic, all the data can be modified to cooperate. So conclusion 3 can be omitted temporarily.
So, in the daily development process, we need to remember:
1. Priority to use SELECT ...
2. When the two entities associated with the relationship may be added at the same time, a party modifying the association with the new entity requires the use of SELECT ... in share mode to query the data of the other party for an update of the association relationship.
3. If the data being read needs to be modified and then submitted, the data needs to be read using the Select ... for update.
If you unfortunately need to integrate with a third-party system (or a legacy system that is difficult to modify) in a database, you need to remember one more point:
4. When data consistency and business logic integrity can only be guaranteed unilaterally by you, and you use some monotonic increment of data to process data, you need to use SELECT ... in share mode to query for updated data.
If there are other missing scene rules, you are welcome to add.
A summary of application scenarios for read and write locks on the database in the business