Concurrent Programming (IV): A Further Look at Database Locking Mechanisms

Source: http://www.2cto.com/database/201403/286730.html

1. Problems with database concurrency

The concurrency issues that can arise in a database include:

1. Lost updates.

2. Unconfirmed correlation (dirty read).

3. Inconsistent analysis (non-repeatable read).

4. Phantom reads.

Detailed descriptions are as follows:

1.1. Lost updates

A lost update occurs when two or more transactions select the same row and then update it based on the value originally selected. Each transaction is unaware of the others, so the last update overwrites the updates made by the other transactions, and data is lost.

e.g. transaction A and transaction B modify the value of the same row at the same time.

Transaction A changes the value to 1 and commits; then transaction B changes the value to 2 and commits.

The final value of the data is 2, and the update made by transaction A is lost.

Look at the following SQL:

select old_attributes from table where primary_key = ?                      -- step 1
attributes = merge(old_attributes, new_attributes)                          -- step 2 (in application code)
update table set attributes_column = attributes where primary_key = ?       -- step 3

How to solve it? There are two basic approaches: pessimistic locking and optimistic locking. Pessimistic locking assumes that conflicts are likely, so it locks the record from the very beginning, to avoid updates that keep failing. Optimistic locking assumes that conflicts are rare, so it locks only briefly at update time, to avoid holding a lock so long that it blocks other people's operations.

1.1.1 Pessimistic lock

A) Traditional pessimistic locking method (not recommended):

In the salary-editing example, when the page is initialized (the current value is usually queried from the database at this point), the initial query uses SELECT ... FOR UPDATE NOWAIT. The FOR UPDATE NOWAIT clause locks the record and prevents other users from updating it, so the subsequent update is applied to a known, correct state. The update is then committed on the same connection. The premise is that the connection, and its transaction, is held open the whole time the user is editing; for today's high-concurrency, high-frequency web systems this is obviously unrealistic.
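A minimal sketch of this traditional flow, assuming a hypothetical employee table with id and salary columns, Oracle-style FOR UPDATE NOWAIT syntax (MySQL 8.0 also accepts NOWAIT), and :id / :new_salary as application-supplied placeholders:

-- page initialization: lock the row and keep the connection/transaction open while the user edits
SELECT salary FROM employee WHERE id = :id FOR UPDATE NOWAIT;   -- fails immediately if another session holds the lock
-- ... the user edits the salary in the browser while the lock is still held ...
UPDATE employee SET salary = :new_salary WHERE id = :id;
COMMIT;                                                         -- only now is the lock released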

b) Current pessimistic locking method (recommended):

When the edited salary is submitted, the page performs another query, and this query must also lock the record (SELECT ... FOR UPDATE NOWAIT). Some people ask why a query is needed here at all: it is to confirm whether the record has changed. Without FOR UPDATE there is no guarantee that the record has not been updated by another session between your read and your update. So this method also locks the record at query time, guaranteeing that the update is applied to an unchanged record; if the record has changed, the user is notified.
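A sketch of this recommended flow, under the same assumptions (hypothetical employee table, :placeholders for application-supplied values); the lock is held only inside this short submit transaction:

-- on submit: re-query with a lock, confirm nothing changed, then update
BEGIN;
SELECT salary FROM employee WHERE id = :id FOR UPDATE NOWAIT;   -- lock the row; fail fast if it is already locked
-- compare the locked value with the value the user originally saw; if it differs, notify the user instead of updating
UPDATE employee SET salary = :new_salary WHERE id = :id;
COMMIT;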

1.1.2. Optimistic lock

A) Old-value condition (pre-image) method:

The old state of the row is used as the condition of the UPDATE. The SQL looks roughly like: UPDATE table SET col1 = newcol1value, col2 = newcol2value, ... WHERE col1 = oldcol1value AND col2 = oldcol2value, .... In the example above we can use the current salary as the update condition; if the record has already been updated by another session, this update affects 0 rows, and the application usually prompts the user to re-query and then retry the update. Which old values to use as conditions depends on the actual system. (This update can still block: if another application session has locked the record with a long-held pessimistic lock, this session has to wait, so when this approach is used it is best that all sessions use optimistic locking.)
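A sketch of the old-value condition, again assuming a hypothetical employee table and application-supplied placeholder values:

UPDATE employee SET salary = :new_salary WHERE id = :id AND salary = :old_salary;
-- 1 row affected: success; 0 rows affected: another session changed the record, so prompt the user to re-query and retry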

b) Version column method (recommended):

This approach is really a specialized pre-image method: instead of using several old values as conditions, you only need to add a version column to the table. The column can be a number or a date/timestamp, and its purpose is to record the version of the row (in table design we usually add some redundant number and date fields to every table, and these can serve perfectly well as the version column). The application maintains the version column on every operation, and the update uses the last-read version as a condition: the WHERE clause is restricted to primary key + version number, and the update also increments the version number of the record.

The pseudo code is as follows:

start transaction
select attributes, old_version from table where primary_key = ?
-- attribute merge operations (in application code)
update table set version = old_version + 1, attributes_column = attributes_value
    where primary_key = ? and version = old_version
commit

After the update executes, check whether the number of rows affected by the last update was 1. If it was not, another transaction updated the record first, and the business layer should prompt the user to retry (this indicates that the record is being updated under relatively high concurrency).
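In MySQL, for example, the affected-row count of the previous statement can be read with ROW_COUNT() (most client drivers also expose it directly); a minimal sketch, with @pk, @old_version and @new_attributes as hypothetical placeholders:

UPDATE table_name
   SET version = version + 1, attributes_column = @new_attributes
 WHERE primary_key = @pk AND version = @old_version;
SELECT ROW_COUNT();   -- 1: the optimistic update succeeded; 0: someone else changed the row first, retry at the business layer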

Choose pessimistic lock method (b) in application systems where the number of concurrent users is small but conflicts are serious; in other cases, prefer the optimistic version-column method.

Specify locks in SQL Server:

SELECT * FROM table WITH (HOLDLOCK)    -- other transactions can read the table, but cannot update or delete rows
SELECT * FROM table WITH (TABLOCKX)    -- other transactions cannot read, update, or delete the table

Lock types and syntax differ between databases; consult each database's own documentation.

1.2. Unconfirmed correlation (dirty read)

A dirty read occurs when one transaction reads a modification that another transaction has not yet committed. For example:

1. Mary's original salary was 1000. A finance staff member changed Mary's salary to 8000 but had not yet committed the transaction.

2. Mary read her own salary, saw that it had become 8000, and was overjoyed!

3. Finance then discovered the mistake and rolled back the transaction, so Mary's salary went back to 1000.

The 8000 that Mary read is dirty data.

Workaround: the problem is avoided if no other transaction can read the modified value before the first transaction commits.
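A two-session sketch of how the READ COMMITTED isolation level avoids this, assuming MySQL/InnoDB and a hypothetical employee table:

-- both sessions: SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- Session 1 (finance):
BEGIN;
UPDATE employee SET salary = 8000 WHERE name = 'Mary';   -- not yet committed
-- Session 2 (Mary):
SELECT salary FROM employee WHERE name = 'Mary';         -- still returns 1000; the uncommitted 8000 is invisible
-- Session 1 (finance):
ROLLBACK;                                                -- the dirty value was never exposed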

1.3. Inconsistent analysis (non-repeatable read)

A non-repeatable read occurs when the same query is executed more than once within one transaction and returns a different result each time, because of modifications or deletions committed by other transactions. For example:

In transaction 1, Mary reads her salary as 1000. While transaction 1 is still in progress, the finance officer modifies Mary's salary to 2000 in transaction 2 and commits. When Mary reads her salary again inside transaction 1, it has become 2000.

Workaround: the problem is avoided if the data can only be read after the modifying transaction has fully committed and the reading transaction has finished.
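A two-session sketch of how the REPEATABLE READ isolation level avoids this, again assuming MySQL/InnoDB and a hypothetical employee table:

-- Session 1 (Mary), at REPEATABLE READ:
BEGIN;
SELECT salary FROM employee WHERE name = 'Mary';          -- returns 1000
-- Session 2 (finance):
UPDATE employee SET salary = 2000 WHERE name = 'Mary';
COMMIT;
-- Session 1 (Mary):
SELECT salary FROM employee WHERE name = 'Mary';          -- still returns 1000 inside this transaction
COMMIT;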

1.4. Phantom read

A phantom read occurs when the same query is executed more than once within one transaction and returns a different result set each time, because of inserts committed by other transactions. Phantom reads happen when INSERT or DELETE operations affect rows that fall within the range of rows a transaction is reading: a row present in the transaction's first read of a range is gone in a later read because another transaction deleted it, or a row that did not exist in the first read appears in a later read because another transaction inserted it.

e.g. the company currently employs 10 people, all with a salary of 1000.

Transaction 1 reads all employees with a salary of 1000 and gets 10 records. Transaction 2 then inserts a new employee record with a salary of 1000 into the employee table. When transaction 1 again reads all employees with a salary of 1000, it gets 11 records.
Workaround: the problem is avoided if no other transaction can insert new rows into the queried range before the reading transaction finishes.
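Under MySQL InnoDB's REPEATABLE READ level, a current read can block such phantom inserts with gap locks; a sketch, assuming a hypothetical employee table (if salary is not indexed, InnoDB locks the entire scanned range, so the insert still blocks):

-- Session 1:
BEGIN;
SELECT * FROM employee WHERE salary = 1000 FOR UPDATE;      -- current read: record locks plus gap locks on the scanned range
-- Session 2:
INSERT INTO employee (name, salary) VALUES ('New', 1000);   -- blocks until session 1 commits or rolls back
-- Session 1:
COMMIT;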

To discuss locking mechanisms, we first have to understand the database's isolation mechanism.

2. Database isolation mechanism

When it comes to database isolation mechanisms, we have to start with transactions. Database transactions are strictly defined and must satisfy four properties: atomicity (Atomic), consistency (Consistency), isolation (Isolation), and durability (Durability), collectively referred to as ACID.

Atomicity: guarantees that the operations in a transaction are either all performed or not performed at all. For example, a transfer transaction either succeeds completely or fails completely: on success the amount is moved from the source account to the destination account and both balances change accordingly; on failure both balances are unchanged. There is never a case where the source account is debited but the destination account does not receive the money (see the transfer sketch after these definitions).
Consistency: ensures that the database always remains in a consistent state: the data is consistent before the transaction and consistent after it, whether or not the transaction succeeds. In the example above, the database data is consistent both before and after the transfer.
Isolation: when multiple transactions execute concurrently, the result should be the same as if they had executed serially. In concurrent operations, each transaction has its own data space, and their operations do not interfere with each other. Isolation makes each transaction behave as if it were the only transaction modifying the database during its execution. The degree to which a transaction is isolated from other transactions is called the isolation level. The database defines several transaction isolation levels, corresponding to different degrees of permitted interference: the higher the isolation level, the better the data consistency, but the weaker the concurrency.
Durability: once a transaction completes, its effect on the database is permanent, and the database should be recoverable even if it suffers a failure. The usual implementation relies on logging.
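A minimal transfer sketch in SQL, assuming a hypothetical account table with id and balance columns; either both updates take effect or neither does:

START TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE id = 1;   -- debit the source account
UPDATE account SET balance = balance + 100 WHERE id = 2;   -- credit the destination account
COMMIT;   -- on any error, ROLLBACK instead, and both balances remain unchanged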
The ANSI/ISO SQL-92 standard defines isolation levels for database operations. Each isolation level specifies the kinds of interaction that are not allowed while the current transaction executes, that is, whether transactions are isolated from one another and whether they may read or update information being used by another transaction. A higher isolation level includes all the restrictions imposed by the lower levels.

The four isolation levels defined are:

Read Uncommitted

Uncommitted records can be read. This isolation level is rarely used and is not discussed further here.
Read Committed (RC)

Snapshot reads are not considered in this article.

For current reads, the RC isolation level guarantees locks on the records that are read (record locks), but phantom reads can still occur.
Repeatable Read (RR)

Snapshot reads are not considered in this article.

For current reads, the RR isolation level guarantees locks on the records that are read (record locks) and also locks the range that was read, so new records satisfying the query condition cannot be inserted (gap locks).
Serializable

Concurrency control degrades from MVCC to lock-based concurrency control. There is no distinction between snapshot reads and current reads: all read operations are current reads, reads take shared locks (S locks), and writes take exclusive locks (X locks).

At the Serializable isolation level, reads and writes conflict with each other, so concurrency drops sharply; it is not recommended.
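For reference, a sketch of how the session isolation level can be inspected and changed in MySQL (the @@transaction_isolation variable is the MySQL 8.0 name; older versions use @@tx_isolation):

SELECT @@transaction_isolation;                           -- e.g. 'REPEATABLE-READ', InnoDB's default
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;   -- applies to subsequent transactions in this session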

The concurrency problems that each isolation level still allows are summarized below; at the lower isolation levels, transactions therefore need to take locks explicitly in order to avoid these concurrency problems.

Read Uncommitted: dirty reads, non-repeatable reads, and phantom reads are all possible.
Read Committed: dirty reads are prevented; non-repeatable reads and phantom reads are possible.
Repeatable Read: dirty reads and non-repeatable reads are prevented; phantom reads are possible (InnoDB prevents them for current reads via gap locks).
Serializable: dirty reads, non-repeatable reads, and phantom reads are all prevented.

3. Locking mechanism of the database

The basic theory of locks used in various large-scale databases is consistent, but there are differences in concrete implementation.

SQL Server emphasizes locks that are managed by the system. When a user issues a SQL request, the system analyzes it and automatically applies appropriate locks to the database, balancing locking requirements against system performance; the system also continually optimizes during operation, adjusting locks dynamically.

SQLite uses coarse-grained locks. When a connection wants to write to the database, all other connections are locked out until the writing connection ends its transaction. SQLite has a set of lock states that allows a writing connection to acquire locks as late as possible, to ensure maximum concurrency.
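A sketch of SQLite's late locking in its default mode (rollback journal, deferred transactions), with a hypothetical account table; the lock escalation points are noted in the comments:

BEGIN;                                                     -- DEFERRED by default: no lock taken yet
SELECT balance FROM account WHERE id = 1;                  -- takes a SHARED lock for reading
UPDATE account SET balance = balance - 100 WHERE id = 1;   -- escalates to a RESERVED lock; other connections can still read
COMMIT;                                                    -- briefly takes the EXCLUSIVE lock to write, then releases everything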

Because of its architecture, MySQL has multiple storage engines. Each storage engine targets different application scenarios, and to meet the needs of its particular scenarios each engine's locking mechanism is optimized for them; as a result, the locking mechanisms of the storage engines differ considerably.

For ordinary users, the system's automatic lock management is basically sufficient. When writes are involved, however, it is important to understand the isolation mechanism and the concurrency problems it can leave open, and to add locks explicitly in the transaction or SQL where needed. As for database deadlocks, database systems generally have a mechanism to resolve them, so they rarely paralyze the database, but resolving a deadlock causes a sharp drop in database performance, which shows up in the application as degraded performance and some failed operations.

In actual development, we should fully consider all possible concurrency scenarios, avoid locks that are not strictly necessary, and still guarantee the correctness of data processing. A deep understanding of locks therefore has very important practical significance.

3.1 Snapshot Read vs current Read

The greatest benefit of the multi-version concurrency control protocol (MVCC, Multi-Version Concurrency Control) is, I believe, familiar to everyone: reads take no locks, and reads and writes do not conflict. In OLTP applications that read much and write little, eliminating read-write conflicts greatly increases system concurrency, which is why almost all RDBMSs support MVCC today.

The counterpart of MVCC is lock-based concurrency control.

Under MVCC concurrency control, read operations fall into two categories: snapshot reads and current reads. A snapshot read reads the visible version of a record (possibly a historical version) without locking. A current read reads the latest version of the record, and the records it returns are locked to ensure that other transactions cannot concurrently modify them.

In a system that supports MVCC concurrency control, which read operations are snapshot reads and which are current reads? Take MySQL InnoDB as an example:
Snapshot read: a plain SELECT is a snapshot read and takes no locks. (There are exceptions, which are analyzed below.)
Current read: special read operations (locking reads such as SELECT ... LOCK IN SHARE MODE and SELECT ... FOR UPDATE) and INSERT/UPDATE/DELETE operations are current reads and must take locks.
All of these statements are current reads: they read the latest version of the record and, after reading, lock the records so that other concurrent transactions cannot modify them. The share-mode locking read takes an S lock (shared lock) on the records it reads; the other operations take X locks (exclusive locks). Note: the locking for these statements is performed by the database itself.
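A sketch contrasting the two kinds of read in MySQL InnoDB, assuming a hypothetical account table (in MySQL 8.0 the share-mode syntax is FOR SHARE; LOCK IN SHARE MODE is the older spelling):

SELECT balance FROM account WHERE id = 1;                      -- snapshot read: no locks
SELECT balance FROM account WHERE id = 1 LOCK IN SHARE MODE;   -- current read: S lock on the row
SELECT balance FROM account WHERE id = 1 FOR UPDATE;           -- current read: X lock on the row
UPDATE account SET balance = balance - 100 WHERE id = 1;       -- current read internally, then X lock and update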

3.2 Current Read lock

Why are INSERT/UPDATE/DELETE operations categorized as current reads? Take a look at the execution flow of an UPDATE operation inside the database:

When an UPDATE statement is sent to MySQL, the MySQL server reads the first record that satisfies the WHERE condition; the InnoDB engine returns that record and locks it (a current read). Once the MySQL server receives the locked record, it issues an update request and the record is updated. When one record is done, the next record is read, and so on until no more records satisfy the condition. So an UPDATE operation contains a current read internally. The same applies to DELETE. INSERT is slightly different: an insert may trigger a unique-key conflict check, which is also a current read.

Note: for SQL statements that perform current reads, InnoDB and the MySQL server interact one record at a time, so locking is also done one record at a time: lock a record that satisfies the condition, return it to the MySQL server, perform the DML operation, then read and lock the next record, and so on until the read is complete.

One of the principles of traditional RDBMS locking is 2PL (two-phase locking). 2PL is relatively easy to understand: a transaction's lock operations are divided into two phases, a locking phase and an unlocking phase, and the two phases must not overlap. Below, again using MySQL as an example, let's briefly look at how 2PL is implemented in MySQL.

In short, 2PL divides locking and unlocking into two completely disjoint phases. Locking phase: only acquire locks, never release them. Unlocking phase: only release locks, never acquire them.
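A minimal sketch of the two phases within a single InnoDB transaction, assuming a hypothetical account table; locks accumulate while the transaction runs and are released together at commit:

BEGIN;
UPDATE account SET balance = balance - 100 WHERE id = 1;   -- locking phase: X lock on row 1 acquired
UPDATE account SET balance = balance + 100 WHERE id = 2;   -- locking phase: X lock on row 2 acquired
COMMIT;                                                    -- unlocking phase: both X locks released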
