[Reprint] Do you really understand the isolation of transactions?

Source: Internet
Author: User

Original: http://mp.weixin.qq.com/s?__biz=MjM5MjIxNDA4NA==&mid=400262409&idx=1&sn= 827bdcde075ef2f96c82a3f11fa422d4&scene=0&key= B410d3164f5f798ea52ee30c99dcb9c7e079c79659c2ffd9927976ab0093eb3003db7a80ff517fb25e7d1f4d95c602ad&ascene=0 &uin=mjk1odmyntyymg%3d%3d&devicetype=imac+macbookpro11%2c4+osx+osx+10.11.1+build (15B42) &version= 11020201&pass_ticket=r6pckpge0e2s4glv9xfkwcxltt2iyloosgjyv8lfo%2bbf7dcsh7cstqs%2bquqwzjaf

Isolation of transactions

Inside found that some users are completely unacceptable to the MySQL InnoDB lock mechanism because they are mostly accustomed to the read Committed (RC) Transaction ISOLATION level of a database such as Microsoft SQL Server. But as Jim Gray pointed out in his book Transaction processing:concepts and techniques, most of the databases do not actually meet the requirements of true isolation (true isolation). This paper will discuss the requirements of transaction isolation by combining theory and engineering.

Interviews usually ask the interviewer what the ACID properties represent, and even this has become one of the standard questions of the NetEase database interview. According to personal observation, most users of a, C, D attribute description is not too big problem, but isolation this point, often not accurate. This is because users generally think that a database supports transactions, so it should be acid-compliant and compliant with isolation requirements.

But let's see how the father of the transaction, Jim Gray, defines transactional isolation [1]:

isolation:concurrently executing transactions see the stored information as if they were running serially (one after Anot Her).

Concurrently executed transactions can see the stored information as if the transactions were executed serially. This may seem too theoretical, but as long as the three concurrency problems are resolved, there will be no loops between transactions, so you can achieve the requirements of true isolation. And these three questions Jim Gray summed it up as lost Update, read UNCOMMITTED, unrepeatable read:

The common RC Isolation level solves only the lost update and read UNCOMMITTED issues, and does not address the unrepeatable read problem, so RC does not meet the real requirements of isolation, and in some cases there may be some problems. Here inside the MySQL database to cite an example, assuming that table P has data 2, 4, 6, now has two transactions T1, T2 run under the RC transaction isolation level, then if the following steps to run:

Time

T1

T2

|

|

|

|

|

|

V

BEGIN;

DELETE from P WHERE a <= 6;

BEGIN

INSERT into P SELECT 3;

COMMIT;

COMMIT;

The data in the final table P is only 31 records, the contents of which are recorded in the database log:

INSERT into P SELECT 3;

DELETE from P WHERE a <= 6;

Then if the above log is sent to the slave (slave) to execute, you will find the slave table p record number is 0, that is, the master-slave data is inconsistent. In summary, it is not the same as the result of the serial execution of the transaction that occurred concurrently, that is to say, it is not consistent with the requirements of transaction isolation.

So how do you solve this problem in the MySQL database? If the isolation level of the transaction is RC, the log format must be set to row. Such a log format in inside, closer to the state machine mechanism, which also solves the problem of inconsistency. Both Oracle and Microsoft SQL server databases are based on the physical logical replication of pages, so there is no such problem. However, from the point of view of strict theory, RC does not conform to the requirement of real isolation of transaction.

Isolation level of a transaction

Database vendors must know the real requirements of isolation, but they have compromised to some extent, resulting in four different levels of isolation (isolation degree), degree 0, degree 1, Degree 2, degree 3, respectively allowing lost The problem with Update, read UNCOMMITTED, unrepeatable read exists, and degree 3 is the true requirement for isolation.

While the ANS SQL standard does not define isolation, it defines the level of isolation for transactions, and defines the three major concurrency problems that are addressed by different transaction isolation levels:

Isolation level

Dirty Read

unrepeatable Read

Phantom Read

Read UNCOMMITTED

YES

YES

YES

READ COMMITTED

NO

YES

YES

READ Repeatable

NO

NO

YES

SERIALIZABLE

NO

NO

NO

From the table above you can see different meanings between isolation levels and isolation. So why are each database vendor more inclined to choose the RC transaction isolation level? because generally, the lower the transaction isolation level, the less time is required to hold the lock, and the better the concurrency performance.

But the above conclusions are not certain, Jim Gray in his book also said that Serializable's transaction isolation level may have better performance in some cases. It is also mentioned in Inside's "MySQL Technology Insider: InnoDB Storage Engine" book. Then some serious little friends will ask, in what case, serializable transaction isolation level will have better performance? This will be specified in the next article of inside, because to understand this problem first of all to explain the implementation of the lock.

The InnoDB storage engine for the MySQL database is the engine that supports transactions, and its default transaction isolation level is repeatable READ (RR). The ANSI SQL standard RR transaction isolation level is degree 2.9999, but unlike the ANSI SQL standard, the INNODB storage engine solves the Phantom read problem at the transaction isolation level of the RR to achieve the isolation requirements of degree 3. Thus achieving the requirement of true isolation. In contrast to other databases, the transaction isolation level must be set to serializable to achieve true transaction isolation requirements. In other words,the default transaction isolation level of MySQL InnoDB can be understood as the seriablizable level of other databases .

However, in a strict sense, the implementation of the INNODB RR transaction isolation level is somewhat different from the traditional serialiazable transaction isolation level, which leads to a feeling of consternation or unacceptable to the user in certain scenarios, such as a unique index column that allows duplicate values to exist in a transaction. However, this does not undermine the consistency of the transaction, as long as the understanding of the InnoDB storage engine lock and MVCC implementation, in fact, some strange phenomenon is well understood.

Snapshot transaction Isolation Level

The classic serialiazable transaction isolation level is implemented with a strict two-phase lock (Strict two-phrase lock, short name: STPL), which is the method Jim Gray mentions in his book. However, because both read and write need to be locked, so that in most cases the performance of the transaction is inferior to the transaction isolation level like RC.

To address performance issues, a growing number of databases have recently started to support the snapshot transaction isolation level (SI), such as Oracle, PostgreSQL, Microsoft SQL Server database, and so on. The SI transaction isolation level seems to solve the dirty read, unrepeatable read, and Phantom Read issues. Unfortunately, it still does not meet the requirements of true isolation, and there is a write skew exception to the problem. At the SIGMOD conference in 2008 [3], a serializable Snapshot isolation (SSI) was proposed, which solved the problem of previous SI transaction isolation levels. PostgreSQL9.1 version based on this paper, SSI transaction ISOLATION level is implemented and optimized accordingly. In addition, at the VLDB conference in 2012, PostgreSQL published a corresponding paper [4], and interested readers can continue to study.

The snapshot-based transaction isolation level (SI or SSI) is considerably higher than the classic serializable transaction isolation level, but there are two issues that cannot be overlooked. One is that it causes a "bad" rollback, because its strategy is to ensure that it is correct, although it is sometimes possible to manslaughter some non-problematic transactions. The second is that support for large transactions requires additional memory guarantees, and if the amount of data modified is particularly large, this can cause a memory overflow problem to occur. However, SSI may be a future direction for the development of a default transaction isolation level. Expect MySQL database to be supported as soon as possible.

Reference documents

    1. Jim Gray, Http://amturing.acm.org/info/gray_3649936.cfm, 1998

    2. Jim Gray, Andreas Reuter, Transaction processing-concepts and Techniques, 1993, Morgan Kaufmann, ISBN 1-55860-190-2

    3. Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. Serializable isolation for snapshot databases. In Sigmod ' 08:proceedings of the $ ACM SIGMOD International Conference on Management of data, pages 729–738, New York, NY, USA. Acm.

    4. Dan R. K. Ports and Kevin Grittner. Serializable Snapshot isolation in PostgreSQL. Proceedings of the VLDB Endowment Vol. 5 (a), August (pp. 1850--1861)

    5. Https://wiki.postgresql.org/wiki/Serializable#SSI_Algorithm

    6. Kang, MySQL technology insider: InnoDB Storage engine, mechanical industry press, 2011

    7. Kang, MySQL kernel: InnoDB storage Engine Volume 1, Electronics industry Press, 2014

[Reprint] Do you really understand the isolation of transactions?

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.