"Shallow Into Deep": The Implementation of Transactions in MySQL


In a relational database, the importance of transactions is self-evident. Anyone with even a little database knowledge knows that transactions have the four basic ACID properties, yet few of us know how the database actually implements them. This article analyzes how transactions are implemented and tries to understand what the database does underneath; along the way it briefly introduces how MySQL implements each of the four ACID properties.

A transaction is the basic unit of concurrency control: a sequence of operations that are either all executed or none executed, an indivisible unit of work. The four ACID properties are the foundation of transactions; once we understand how ACID is implemented, we understand how transactions are implemented. The rest of this article walks through how a database implements each of these four properties in turn.

Atomicity

When learning about transactions, you are often told that a transaction is a series of operations that either all execute or none execute. That is precisely the atomicity of a transaction. Although atomicity is a property of transactions, it is not unique to them; the concept appears in many other places.

Because an ordinary operation is not atomic and can be subdivided into multiple steps, when an error or exception is thrown mid-way the remaining steps never run, and the side effects of the steps that did run can leave data updates lost or incorrect.

A transaction is not fundamentally different from such an operation: it is a set of database operations (think of them as SQL statements). If transactions were not atomic, there would be no way to ensure that all operations in the same transaction execute or none do, and the entire database system would be neither usable nor trustworthy.

Rollback log (undo log)

To guarantee the atomicity of a transaction, the actions already performed must be undone when an exception occurs. In MySQL this recovery mechanism is implemented with the rollback log (undo log): every change a transaction makes is recorded in the rollback log first, and only then is the corresponding row in the database written.

This process is easy to understand: to be able to undo all previous operations when an error occurs, those operations must be recorded first, so that there is something to roll back to.

Beyond providing the information needed when an error occurs or the user executes ROLLBACK, the rollback log also covers crashes: even if the whole system crashes or the database process is killed outright, when the database process starts again it can consult the rollback log and immediately roll back any transactions that were left unfinished. This requires that the rollback log be persisted to disk before the data itself, which is the main reason the log is written first.

The rollback log does not physically restore the database to its state before the statement or transaction executed; it is a logical log, and when it is used it simply revokes the changes record by record according to the log. You can think of it this way: every INSERT we issue in a transaction corresponds to a DELETE in the rollback log, and every UPDATE corresponds to an opposite UPDATE statement.
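The idea of a logical undo log can be sketched in a few lines of Python. This is a hypothetical illustration, not InnoDB's actual on-disk format: each change appends its inverse operation, and rollback replays the inverses in reverse order.

```python
# A minimal sketch of a logical undo log (hypothetical, not InnoDB's
# real format): every change records the operation that reverses it.
class UndoLog:
    def __init__(self):
        self._inverse_ops = []

    def record_insert(self, table, row_id):
        # The logical inverse of INSERT is DELETE of the same row.
        self._inverse_ops.append(("DELETE", table, row_id, None))

    def record_update(self, table, row_id, old_value):
        # The logical inverse of UPDATE restores the old value.
        self._inverse_ops.append(("UPDATE", table, row_id, old_value))

    def rollback(self, db):
        # Undo in reverse order, newest change first.
        for op, table, row_id, old in reversed(self._inverse_ops):
            if op == "DELETE":
                del db[table][row_id]
            elif op == "UPDATE":
                db[table][row_id] = old
        self._inverse_ops.clear()

db = {"users": {1: "alice"}}
log = UndoLog()

db["users"][1] = "bob"
log.record_update("users", 1, "alice")
db["users"][2] = "carol"
log.record_insert("users", 2)

log.rollback(db)
print(db)  # {'users': {1: 'alice'}}
```

Note that a real undo log is written and persisted *before* the data change, whereas this toy records it alongside the change.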

We will not cover the format of the rollback log or how it is managed here; this article focuses on what each mechanism is and which problem it solves. Readers who want the implementation details will find plenty of articles about the rollback log online.

Transaction states

Because a transaction is atomic, viewed from a distance it is an indivisible whole with only three states: active, committed, and failed. A transaction is either executing, or it has ended in success or failure:

But if we zoom in, the transaction no longer looks atomic: it includes intermediate states such as partially committed, and the state diagram becomes more complex.

The state diagram and the state descriptions below are taken from Chapter 14 of Database System Concepts.

    • Active: the initial state; the transaction is executing;
    • Partially committed: the last statement has been executed;
    • Failed: the transaction has been found unable to proceed normally;
    • Aborted: the transaction has been rolled back and the database restored to its state before the transaction began;
    • Committed: the entire transaction executed successfully;
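The states above can be expressed as a small state machine. This is a sketch of the transitions in the Database System Concepts diagram, not anything MySQL exposes:

```python
# The transaction state machine described above; allowed transitions
# follow the diagram in Database System Concepts.
TRANSITIONS = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed": {"aborted"},
    "committed": set(),   # terminal state
    "aborted": set(),     # terminal state
}

def advance(state, next_state):
    if next_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {next_state}")
    return next_state

s = "active"
s = advance(s, "partially committed")
s = advance(s, "committed")
print(s)  # committed
```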

Although the state of the database can be restored when an error occurs, some things done during a transaction cannot be rolled back because they are externally visible output: printing a log to standard output, sending a message to an external system, modifying something on disk outside the database, or even transferring money. Such problems must be resolved and handled by the application developer, and in the vast majority of cases, operations that cannot be rolled back should be triggered only after the entire transaction commits.

For example, if we issue a request to a third party right after a transaction ends, we may well lose track of that request: calling the third party and obtaining its result takes a relatively long time, and if the database or server crashes just as the transaction commits, a serious problem arises. The database cannot guarantee anything here; the developer must check at an appropriate time whether the request was actually sent and whether it succeeded or failed.

The atomicity of parallel transactions

So far we have assumed transactions execute serially, without considering parallel execution. In practice, however, parallel execution is the norm, and under parallelism very tricky problems can arise:

Suppose Transaction1 reads and writes the user row with id = 1 during its execution but neither commits nor rolls back, and Transaction2 then reads the same data and commits; Transaction2 now depends on Transaction1. If Transaction1 later needs to roll back because of some error, then to preserve atomicity we must also roll back Transaction2, but Transaction2 has already committed, so there is no way to roll it back. Database System Concepts calls this phenomenon a nonrecoverable schedule. Under what circumstances is recovery possible?

A recoverable schedule is one where, for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the commit operation of Tj.

Put simply, if Transaction2 depends on Transaction1, then Transaction1 must commit before Transaction2 commits:
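The recoverability condition quoted above can be checked mechanically. The sketch below (an illustration, not a database component) scans a schedule of read/write/commit events and verifies that every writer a transaction read from committed first:

```python
# Check the recoverability condition: if Tj reads an item previously
# written by Ti, then Ti must commit before Tj commits.
# A schedule is a list of (transaction, action, item) tuples, where
# action is "read", "write", or "commit" (item is None for commits).
def is_recoverable(schedule):
    last_writer = {}   # item -> transaction that last wrote it
    reads_from = {}    # reader -> set of writers it depends on
    committed = set()
    for txn, action, item in schedule:
        if action == "write":
            last_writer[item] = txn
        elif action == "read":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif action == "commit":
            # Every transaction this one read from must already be committed.
            if any(w not in committed for w in reads_from.get(txn, ())):
                return False
            committed.add(txn)
    return True

# T2 reads T1's write but commits first: nonrecoverable.
bad = [("T1", "write", "x"), ("T2", "read", "x"),
       ("T2", "commit", None), ("T1", "commit", None)]
# T1 commits before T2: recoverable.
good = [("T1", "write", "x"), ("T2", "read", "x"),
        ("T1", "commit", None), ("T2", "commit", None)]
print(is_recoverable(bad), is_recoverable(good))  # False True
```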

Even this is not the whole story: as the number of transactions grows, the recovery process becomes more complex, and recovering from a failed transaction is not so easy.

In the scenario shown, Transaction2 and Transaction3 both depend on Transaction1. When Transaction1 is rolled back because of a problem during execution, atomicity requires rolling back Transaction2 and Transaction3 as well. This is called a cascading rollback, and it can cause a large amount of work to be undone, which is unacceptable. Nevertheless, achieving absolute atomicity requires dealing with it; we explain later in the article how the atomicity of parallel transactions is handled.
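Computing which transactions a cascading rollback drags down is just a reachability problem over the "reads-from" dependency graph. A small sketch (illustrative only):

```python
# Given "reads-from" dependencies between uncommitted transactions,
# compute the full set that must roll back when one of them aborts.
def cascade_rollback(aborted, depends_on):
    """depends_on maps a transaction to the set of transactions it read from."""
    to_abort, stack = {aborted}, [aborted]
    while stack:
        t = stack.pop()
        for txn, deps in depends_on.items():
            if t in deps and txn not in to_abort:
                to_abort.add(txn)
                stack.append(txn)
    return to_abort

# T2 and T3 both read data written by T1, so aborting T1 drags them down.
deps = {"T2": {"T1"}, "T3": {"T1"}}
print(sorted(cascade_rollback("T1", deps)))  # ['T1', 'T2', 'T3']
```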

Durability

Being a database, it naturally has a strong need to store data persistently: once data is written to the database, it must be stored safely on disk. The durability of a transaction means that once the transaction commits, its data must be written to the database and persisted.

Once a transaction has committed, it can no longer be rolled back; the only way to "undo" a committed transaction is to create a reverse transaction that compensates for the original operations. This too is a manifestation of durability.

Redo Log

As with atomicity, the durability of a transaction is achieved through logging: MySQL uses the redo log to implement durability. The redo log consists of two parts: the redo log buffer, which lives in memory and is therefore volatile, and the redo log file on disk, which is persistent.

When a transaction tries to modify data, the data is read from disk into memory, the cached copy in memory is updated, and a redo record is generated and written to the redo log buffer. When the transaction actually commits, MySQL flushes the contents of the redo log buffer to the redo log file, and only afterwards are the in-memory changes written back to disk; steps 4 and 5 in the diagram execute at commit time.
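The commit-time ordering described above can be sketched as follows. This toy engine is vastly simpler than InnoDB (hypothetical names throughout), but it shows the key invariant: the redo buffer reaches the durable log before data pages do.

```python
# A toy write-ahead path (hypothetical, far simpler than InnoDB):
# changes go to an in-memory page and to a redo buffer; commit flushes
# the redo buffer to a durable log before data pages are written back.
class Engine:
    def __init__(self):
        self.disk_pages = {}    # persistent data pages
        self.buffer_pool = {}   # cached pages in memory
        self.redo_buffer = []   # volatile redo log buffer
        self.redo_file = []     # stands in for the durable redo log file

    def update(self, key, value):
        # Steps 1-3: read into memory, modify the cached copy,
        # and append a redo record to the in-memory buffer.
        self.buffer_pool.setdefault(key, self.disk_pages.get(key))
        self.buffer_pool[key] = value
        self.redo_buffer.append((key, value))

    def commit(self):
        # Step 4: flush the redo buffer to the durable log FIRST ...
        self.redo_file.extend(self.redo_buffer)
        self.redo_buffer.clear()
        # Step 5: ... only then may dirty pages reach the data files.
        self.disk_pages.update(self.buffer_pool)

engine = Engine()
engine.update("id=1", "new row")
engine.commit()
print(engine.redo_file, engine.disk_pages)
```

If the machine dies between steps 4 and 5, the redo file still holds everything needed to replay the committed change.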

In InnoDB, the redo log is stored in 512-byte blocks. Because a block is the same size as a disk sector, redo log writes are atomic: a machine outage cannot leave a block half-written with dirty data.

A redo record is generated for every modification to the database, and that includes the rollback log itself: because the rollback log also requires persistent storage, changes to it produce corresponding redo records. After a crash, the database scans the redo log for records whose changes had not yet reached the data files on disk and re-executes them, thereby satisfying durability.

Undo logs and redo logs

So far we have met two logs in MySQL: the rollback log (undo log) and the redo log. In a database system, the atomicity and durability of transactions are guaranteed by the transaction log, which in practice means these two logs: the former undoes the effects of a transaction, and the latter, during error recovery, redoes committed transactions. Together they guarantee two things:

    1. A transaction that fails or needs to roll back can be rolled back successfully (atomicity);
    2. If the machine goes down after a transaction commits but before its data reaches disk, the data can still be recovered on the next restart (durability);

In the database these two logs usually work together, and we can regard them as a single combined transaction log containing the transaction ID, the modified row, and the values before and after the modification.

Because a transaction log record contains both the before and after values, it can support both rollback and redo very simply. We will not expand further on the redo and undo logs here; a later article on database recovery mechanisms may cover how the two logs are used.
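To see why carrying both images makes recovery simple, here is a sketch of the classic undo/redo recovery pass over such a combined log (illustrative, not MySQL's actual recovery algorithm):

```python
# Crash recovery with a combined log whose records carry both the
# before- and after-image: redo committed transactions, undo the rest.
def recover(db, log, committed):
    # Redo pass: reapply after-images of committed transactions.
    for txn, key, before, after in log:
        if txn in committed:
            db[key] = after
    # Undo pass: walk backwards, restoring before-images of the others.
    for txn, key, before, after in reversed(log):
        if txn not in committed:
            db[key] = before
    return db

log = [("T1", "a", 1, 2),   # T1 committed: redo a -> 2
       ("T2", "b", 5, 9)]   # T2 did not commit: undo b -> 5
print(recover({"a": 1, "b": 9}, log, committed={"T1"}))  # {'a': 2, 'b': 5}
```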

Isolation

The author already introduced the isolation of database transactions in the earlier article "Shallow Into Shallow": MySQL and InnoDB, but for the independence and completeness of this article we introduce isolation again here; the content may differ slightly.

The isolation of transactions is one of the foundations of how a database processes data. If a database provided no isolation between transactions, the cascading rollbacks mentioned in the atomicity section would cause significant performance loss. Executing all transactions in a strictly linear order would make transactions much easier to manage, but allowing transactions to execute in parallel increases throughput and resource utilization and reduces the waiting time of each transaction.

When multiple transactions execute concurrently, isolation may be violated: although each individual transaction may execute without error, their combined result can break database consistency. Serial execution lets developers ignore the effects of parallelism and maintains consistency well, but it hurts the performance of transaction execution.

Isolation level of a transaction

Isolation versus performance is therefore a trade-off developers must weigh: the isolation level a database provides determines both its performance and the degree of consistency it can achieve. The SQL standard defines four transaction isolation levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE. Each level solves one more problem than the level below it:

    • READ UNCOMMITTED: query statements take no locks and may read uncommitted rows (dirty read);
    • READ COMMITTED: only record locks are taken, with no gap locks between records, so new records may be inserted next to a locked record; running the same query multiple times may therefore return different results (non-repeatable read);
    • REPEATABLE READ: reading the same range of data multiple times returns the snapshot of the first query rather than different rows, but phantom reads may still occur (phantom read);
    • SERIALIZABLE: InnoDB implicitly adds a shared lock to every query statement, which solves the phantom read problem;

None of the isolation levels above allows dirty writes (one transaction updating data that another transaction has updated but not yet committed). Most databases use READ COMMITTED as the default isolation level, but MySQL uses REPEATABLE READ. From READ UNCOMMITTED to SERIALIZABLE, as the isolation level becomes stricter, the database's performance on concurrently executing transactions declines.
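The difference between the two weakest levels can be illustrated with a toy model (purely conceptual, not how any engine stores data): under READ UNCOMMITTED a reader may see another transaction's write buffer, while under READ COMMITTED it sees only committed data.

```python
# A toy illustration of READ UNCOMMITTED vs READ COMMITTED.
committed_data = {"balance": 100}
txn_write_buffer = {"balance": 42}   # T1 updated but has NOT committed yet

def read(key, isolation):
    if isolation == "READ UNCOMMITTED" and key in txn_write_buffer:
        return txn_write_buffer[key]   # dirty read: sees uncommitted data
    return committed_data[key]         # sees only committed data

print(read("balance", "READ UNCOMMITTED"))  # 42  (dirty read)
print(read("balance", "READ COMMITTED"))    # 100
```

If T1 now rolls back, the reader at READ UNCOMMITTED acted on a value that never existed in any committed state.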

In theory, database users do not need to know how an isolation level is implemented; we only need to know which problems each level solves. In practice, however, the differing implementation details across databases often lead to unexpected pitfalls.

If the reader is unfamiliar with dirty reads, non-repeatable reads, and phantom reads, see the earlier article "Shallow Into Shallow": MySQL and InnoDB; here we only include a diagram showing which problems each isolation level solves.

Implementation of Isolation Levels

Databases implement isolation levels with concurrency control mechanisms that coordinate simultaneously executing transactions and limit their access to and updates of the same resources. Here we briefly describe how the three most important and common concurrency control mechanisms work.

Lock

Locking is the most common concurrency control mechanism. Within a transaction we do not lock the entire database, only the data items that need to be accessed. In MySQL, as in most databases, locks come in two kinds: shared locks and exclusive locks, also called read locks and write locks respectively.

Read locks ensure that read operations can execute concurrently without affecting one another, while write locks ensure that while a transaction is updating data, no other transaction can access or change the same record and cause unpredictable problems.
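The shared/exclusive semantics can be sketched as a minimal readers-writer lock. This is a conceptual illustration using Python threading primitives, not how InnoDB implements row locks:

```python
import threading

# A minimal shared/exclusive lock: many readers may hold it at once,
# but a writer needs exclusive access (a simple readers-writer lock).
class SharedExclusiveLock:
    def __init__(self):
        self._mutex = threading.Lock()
        self._readers = 0
        self._write_lock = threading.Lock()

    def acquire_shared(self):
        with self._mutex:
            self._readers += 1
            if self._readers == 1:      # first reader blocks writers
                self._write_lock.acquire()

    def release_shared(self):
        with self._mutex:
            self._readers -= 1
            if self._readers == 0:      # last reader lets writers in
                self._write_lock.release()

    def acquire_exclusive(self):
        self._write_lock.acquire()

    def release_exclusive(self):
        self._write_lock.release()

lock = SharedExclusiveLock()
lock.acquire_shared()
lock.acquire_shared()        # concurrent readers are fine
lock.release_shared()
lock.release_shared()
lock.acquire_exclusive()     # now a writer gets exclusive access
lock.release_exclusive()
print("ok")
```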

Timestamps

Besides locks, another way to implement transaction isolation is with timestamps. A database that implements transactions this way, PostgreSQL for example, keeps two extra fields for each record: the read timestamp stores the largest timestamp among the transactions that have read the record, and the write timestamp stores the timestamp of the transaction that changed the record to its current value.

When timestamps are used to implement isolation, optimistic locking is the usual approach: modify the data first, and at write-back time check whether the current value, that is, the timestamp, has changed. If it has not changed, write; otherwise generate a new timestamp and apply the update again. Optimistic locking is not a real locking mechanism but rather an idea, and we will not go deeper into it here.
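The check-then-write loop just described can be sketched as a compare-and-swap on a version field (the field names here are illustrative):

```python
# Optimistic concurrency control sketch: read the version, compute the
# new value, and write only if the version is unchanged; otherwise retry.
record = {"value": 10, "version": 1}

def compare_and_swap(rec, expected_version, new_value):
    if rec["version"] != expected_version:
        return False                     # someone else wrote first: retry
    rec["value"] = new_value
    rec["version"] += 1
    return True

def optimistic_increment(rec):
    while True:
        snapshot_version = rec["version"]
        new_value = rec["value"] + 1     # work done without holding any lock
        if compare_and_swap(rec, snapshot_version, new_value):
            return rec["value"]

print(optimistic_increment(record))  # 11
```

No lock is ever held while computing; the cost is wasted work and a retry whenever a conflicting write slips in first.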

Multi-version and Snapshot isolation

By keeping multiple versions of the data, a database can let transactions read an old version of the data even while another transaction is updating it. Many databases implement this mechanism, known as MVCC: because read operations no longer have to wait for write locks to be released, it significantly improves read performance. MySQL and PostgreSQL each have their own implementation; MySQL builds MVCC on the rollback log mentioned earlier in this article, ensuring that concurrently executing transactions can obtain data directly without waiting for a mutual-exclusion lock to be released.
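The core idea of multi-versioning fits in a few lines. This toy store (hypothetical, far simpler than InnoDB's undo-chain-based implementation) stamps each version with the writer's transaction id, and a reader sees the newest version visible to its snapshot:

```python
# A toy multi-version store: each write appends a version stamped with
# the writer's transaction id, and a reader sees the newest version
# visible to its snapshot -- no read ever waits on a write lock.
class MVCCStore:
    def __init__(self):
        self.versions = {}   # key -> list of (txn_id, value), oldest first

    def write(self, txn_id, key, value):
        self.versions.setdefault(key, []).append((txn_id, value))

    def read(self, snapshot_txn_id, key):
        visible = [v for t, v in self.versions[key] if t <= snapshot_txn_id]
        return visible[-1]   # newest version visible to this snapshot

store = MVCCStore()
store.write(1, "x", "old")
store.write(5, "x", "new")
print(store.read(3, "x"), store.read(9, "x"))  # old new
```

A transaction whose snapshot predates the update (id 3) keeps seeing "old" even after transaction 5 wrote "new"; that is exactly the repeatable-read behavior described above.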

Isolation and atomicity

It is worth briefly returning to the cascading rollback encountered in the atomicity section: if a transaction acquires an exclusive lock when it writes data, any other transaction that wants a read lock on the same row must wait for the write lock to be released, and problems such as cascading rollback naturally cannot occur.

However, most databases, MySQL included, use mechanisms such as MVCC, under which ordinary reads do not acquire locks at all. When you intend to update data you have read, you use SELECT ... FOR UPDATE to try to acquire the exclusive lock on the corresponding rows, ensuring that different transactions can work correctly together.

Consistency

The author believes database consistency is a very confusing concept, because the database field actually contains two notions of consistency: consistency as defined in ACID, and consistency as defined in CAP.

These two kinds of consistency are not the same thing, and many people misunderstand them deeply. Whenever we discuss database consistency, we must be clear about the semantic context, and whenever possible ask whether the discussion concerns consistency in the ACID sense or in the CAP sense.

ACID

The definition of consistency in ACID is this: if a transaction runs atomically and in isolation on a consistent database, then after it executes, the database must again be consistent. The first layer of meaning concerns data integrity constraints, including primary key constraints, referential constraints, and other constraint checks: before, during, and after the transaction, the data integrity constraints must never be violated; every write to the database must be legitimate and must not produce an illegal data state.

A transaction must preserve database consistency - if a transaction is run atomically in isolation starting from a consistent database, the database must again be consistent at the end of the transaction.

We can think of a transaction as a function: it accepts external SQL input and a consistent database, and it is bound to return a consistent database.

The second layer of meaning concerns the requirements on the developer's logic: we must write logically correct transactions in our code. In a bank transfer, for example, the logic inside the transaction must not only deduct money or only add money; this is the application-level consistency requirement on the database.
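The bank-transfer example can be written as one atomic transaction. The sketch below uses SQLite via Python's standard library so it runs anywhere; the same shape (both updates inside one transaction, rollback on error) applies to MySQL.

```python
import sqlite3

# The bank-transfer example as one atomic transaction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        # Both updates happen inside one transaction, or neither does.
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()
    except Exception:
        conn.rollback()   # undo the debit if the credit failed
        raise

transfer(conn, "alice", "bob", 30)
print(dict(conn.execute("SELECT name, balance FROM accounts")))
# {'alice': 70, 'bob': 30}
```

The database guarantees atomicity, but only the programmer can guarantee that the transaction body preserves the invariant (total balance unchanged); a transfer that debits without crediting would commit just as happily.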

Ensuring consistency for an individual transaction is the responsibility of the application programmer who codes the transaction. - Database System Concepts

Consistency in database ACID therefore requires more than checking data integrity and legitimacy; it also requires correctness of application-level logic.

Consistency in the CAP theorem means that every copy of the same data on every node of a distributed system has the same value, whereas consistency in ACID refers to the database's own rules: if the schema says a value must be unique, a consistent system must ensure that the value remains unique across all operations. CAP and ACID thus define consistency in fundamentally different ways.

Summary

The four ACID properties of transactions are the cornerstone that allows a database to operate, but fully guaranteeing ACID, especially isolation, has a significant performance cost, so in practice we adjust the isolation level to fit business needs. Apart from isolation, atomicity and durability are the easier properties to understand: the former ensures that a database transaction either executes entirely or not at all, and the latter ensures that database writes are stored persistently and are not volatile. Consistency is not only the database's requirement on the integrity of its own data, but also a demand on developers: to write logically correct and reasonable transactions.

Finally, and most importantly, when others talk about consistency, be sure to understand their context. If you have questions about the content of this article, you can leave a comment in the comments section.

References
    • Database System Concepts
    • Database Transactions
    • How does MVCC (Multi-Version Concurrency Control) work
    • How does a relational database work
    • Implementing transaction processing using redo logs
    • Implementing transaction processing using undo logs
    • Undo/redo logging rules
    • MySQL decryption: a tour of the InnoDB storage engine's redo log
    • The difference between the C in ACID and the C in the CAP theorem
    • Disambiguating ACID and CAP

Source link: "Shallow Into Deep": The Implementation of Transactions in MySQL · Faith-Oriented Programming
