Recently I have been doing the Order class project, using the transaction. Our database chooses MySQL, the storage engine chooses the INNODB,INNODB to have the good support to the transaction. In this article, we're going to pick up a piece of business-related knowledge.
Why should there be a business?
Transactions are widely used in order systems, banking systems and many other scenarios. If you have one of the following scenarios: a user and B user are depositors of the bank. Now a is going to transfer 500 dollars to B. Here are a few things to do:
1. Check the account balance of a >500 yuan; 2. A account deduction of 500 yuan; 3. b Account increase of 500 yuan;
Normal process to go down, a account deducted 500,b account plus 500, happy. What if the system fails after the money is deducted from a account? A lost 500 in vain, and B did not receive 500 of his own. In the above case, there is a precondition: a deduct money and B plus money, either succeed at the same time, or fail at the same time. This is where the need for business is.
What is a transaction?
Rather than defining a transaction, let's talk about the nature of the transaction. As we all know, transactions need to meet acid four characteristics.
- A (atomicity) atomicity. The execution of a transaction is treated as an indivisible minimum unit. Operations inside the transaction, either all successfully executed, or all failed to rollback, can not execute only part of it.
- C (consistency) consistency. The execution of a transaction should not break the integrity constraints of the database. If the system crashes after the 2nd operation in the above example, it is guaranteed that the sum of money for A and B will not change, also known as completeness.
- I (Isolation) isolation. Generally, the behavior between transactions should not affect each other. In practice, however, the degree of interaction between transactions is affected by the isolation level. The article will be detailed later.
- D (durability) persistence. After a transaction commits, the committed transaction needs to be persisted to disk. Even if the system crashes, the submitted data should not be lost.
Four isolation levels for transactions
As mentioned earlier, the isolation of transactions is affected by the isolation level. So what is the isolation level of the transaction?
The isolation level of a transaction can be thought of as the "selfish" degree of a transaction, which defines the visibility between transactions. The isolation level is divided into the following categories:
- READ UNCOMMITTED (not submitted). Under the isolation level of RU, transaction A changes the data, even if it is not committed, and is also visible to transaction B, which is called dirty reading . This is an isolation level with a low degree of isolation, which can cause many problems in practical use, and is generally not commonly used.
- Read COMMITTED (submitted for reading). Under RC isolation level, there is no problem with dirty reads. Transaction a changes the data, the commit will be visible to transaction B, for example, when transaction B is opened to read Data 1, then transaction a opens, the data is changed to 2, commit, B read the data again, will read the latest data 2. under the isolation level of RC, an issue with non-repeatable reads occurs. This isolation level is the default isolation level for many databases.
- REPEATABLE READ (Repeatable read). There is no problem of non-repeatable reads under the isolation level of RR. transaction A changes the data, and after it is committed, it is not visible to transactions that are opened before transaction a. For example, transaction B reads data 1 when it is opened, then transaction a opens, changes the data to 2, commits, and B reads the data again, still reading only 1. Under the isolation level of RR, a phantom read problem occurs. Phantom reading means that when a transaction reads a value in a range, another transaction inserts a new record within that range, and the previous transaction reads the value of the range again, reading the newly inserted data. MySQL default isolation level is RR, however MySQL's InnoDB engine gap lock successfully solved the problem of phantom reading.
- SERIALIZABLE (Serializable). Serializable is the highest isolation level. This isolation level forces all things to be executed serially, and at this isolation level, every row of data that is read is locked, resulting in a large number of lock expropriation problems with the worst performance.
To help understand the four levels of isolation, here's an example. 1, transaction A and transaction B are turned on, and data 1 is updated several times. Four small men open a transaction at different times, what values of data 1 might be seen?
The first villain, may read to any one between 1-20. Because of the uncommitted read isolation level, changes to the data by other transactions are also visible to the current transaction. The second villain may read 1, 10, and 20, and he can only read data that has been submitted by other transactions. The third villain reads the data to decide on the point in time when its affairs are opened. How much is read when the transaction is opened, and what is the value read before the transaction is committed. The fourth villain, only open between a end to B start, is possible to read the data, and during the execution of transaction A and transaction B is not able to read the data. Because the four villain reading data is required to lock, transactions A and B during execution, will occupy the data of the write lock, resulting in a fourth villain waiting for the lock.
Question: When the third villain starts execution from a execution, until transaction B commits, does it see 1 or 10, always 1? According to the submission read it seems that he can only read 1, but according to Repeatable read and can read the appearance of 10 ... What the hell is going on here?
In fact, it needs to be clear that the RR level must be read to the transaction opened when the transaction has been committed, the uncommitted is of course not see, otherwise there is no repeatable read, but this guarantee through the MVCC to achieve. (if there is an omission, look at the king)
Figure 2 lists the issues that are faced with different isolation levels.
Obviously, the higher the isolation level, the greater the resource consumption (lock) it brings, and the lower the concurrency performance. To be precise, there is no concurrency under the Serializable isolation level.
A transaction in MySQL
The implementation of a transaction is a database-based storage engine. Different storage engines do not have the same level of support for transactions. The storage engines that support transactions in MySQL are InnoDB and NDB. InnoDB is the default storage engine for MySQL, the default isolation level is RR, and is further under the isolation level of RR, which resolves non-repeatable read issues through the multiple version concurrency control (mvcc,multiversion Concurrency). Plus the Gap Lock (also known as concurrency control) to solve the Phantom reading problem. Therefore, the RR isolation level of InnoDB actually achieves the serialization level effect, and preserves the better concurrency performance. The isolation of transactions is achieved through locks, while the atomicity, consistency, and persistence of transactions are implemented through transaction logs. When it comes to transaction logs, you have to say redo and undo.
Redo Log
In the InnoDB storage engine, the transaction log is implemented by the Redo (redo) log and the InnoDB storage engine's log buffers (InnoDB log buffer). When a transaction is turned on, the operations in the transaction are written to the storage engine's log buffer, which needs to be flushed to disk in advance before the transaction commits, which is what DBAs often call "log-Ahead" (Write-ahead Logging).
When a transaction is committed, the mapped data file in buffer pool is slowly flushed to disk. At this point, if the database crashes or goes down, then when the system restarts for recovery, the database can be restored to a pre-crash state based on the logs recorded in the Redo log. Incomplete transactions can either proceed with the commit or rollback, which is based on the policy of the recovery.
When the system starts, it has allocated a contiguous storage space for the redo log, records the redo log in sequential append, and improves performance through sequential IO. All transactions share redo log storage space, and their redo log is sequentially recorded in the order in which the statements are executed. Here's a simple example:
Recording1:<Trx1,Insert... > record 2:< trx2delete ... > record 3:< trx3update ... > record 4:< trx1update ... > record 5:< trx3insert ... >
Undo Log
Undo log is primarily a rollback service for a transaction. In addition to logging redo log, a certain amount of undo log is logged during the execution of a transaction. Undo log records the state of the data before each operation and can be rolled back based on the undo log if a rollback is required during the execution of the transaction. A rollback of a single transaction rolls back only the operations that are done by the current transaction and does not affect the operation of other transactions.
The following is a simplified procedure for undo+redo transactions:
Suppose there are 2 values, A and B, respectively, and a value of
1.StartTransaction;2.RecordingA=1ToUndoLog;3.UpdateA=3;4.RecordingA=3Toredo log5 record b=2 to Undo log6update b = 4 7 record b = 4 to Redo log8 redo log flushed to disk 9commit
In any one step of 1-8 the system goes down, the transaction is not committed, and the transaction does not have any effect on the data on the disk. If the outage is between 8-9, you can choose to roll back after recovery, or you can choose to continue the transaction commit because redo log is persisted at this time. If the system goes down after 9 and the data changed in the memory map is not too late to brush back to the disk, then after the system recovers, the data can be brushed back to disk according to redo log.
So, redo log actually guarantees the persistence and consistency of the transaction, while the undo log guarantees the atomicity of the transaction.
Distributed transactions
There are many ways to implement distributed transactions, which can either adopt the native transaction support provided by InnoDB, or use Message Queuing to achieve the eventual consistency of distributed transactions. Here we mainly talk about InnoDB support for distributed transactions.
, MySQL's distributed transaction model. The model is divided into three blocks: application (AP), Resource Manager (RM), transaction manager (TM).
- The application defines the boundary of the transaction and specifies which transactions need to be done;
- The resource Manager provides a way to access transactions, usually a database is a resource manager;
- The transaction manager coordinates each transaction in the global transaction.
A distributed transaction takes a two-stage commit (two-phase commit).
- The first stage all the transaction nodes begin to prepare, telling the transaction manager ready.
- The second stage of the transaction manager tells each node whether it is a commit or a rollback.
If one node fails, the global node is required to rollback all the nodes in order to guarantee the atomicity of the transaction.
Summarize
When do I need to use a transaction? I think that as long as the business needs to satisfy the acid scenario, it needs the support of the transaction. Especially in the order system, the banking system, business is indispensable.
This article focuses on the nature of transactions and the support of MySQL InnoDB for transactions. Business-related knowledge far more than the text said, this article only to make a point, the shortcomings of the readers are also looking for a lot of apologies.
Reference documents:
- The third edition of high-performance MySQL
- "MySQL Technology insider InnoDB storage Engine"
"MySQL" Ramble about the affairs in MySQL and its implementation