How are the atomicity and consistency of database transactions implemented?

Source: Internet
Author: User
Tags: mysql

The interesting thing about this question is not the question itself ("what mechanisms implement atomicity and consistency"), but another question reflected in the disagreement among the answers: what is the relationship between atomicity and consistency?

I paid special attention to one answer in particular, which correctly pointed out that in order to guarantee the atomicity of transactional operations, a log-based redo/undo mechanism must be implemented. But that answer is still incomplete, because atomicity alone does not fully guarantee consistency.

As I understand it, consistency is the most fundamental of the ACID properties of transactions; the other three exist to guarantee consistency.

First, review the definition of consistency. Consistency means that the data is in a meaningful state, where "meaningful" is semantic rather than syntactic. The most common example is a transfer: if money is deducted from account A but the money in account B does not increase, we consider the data to be in an inconsistent state.

In a database implementation, consistency can be divided into consistency external to the database and consistency within it. The former is guaranteed by the application code: when performing the database operations for a transfer, the application must invoke the operations on account A and account B within the same transaction. An error at this level is not something the database itself can solve, and it is outside the scope of this discussion. The latter is the database's guarantee that a set of operations within the same transaction either all succeed or all fail. This is the atomicity of transaction processing.
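
To make the division of labor concrete, here is a minimal sketch in Go of the transfer as a single database transaction, assuming a hypothetical accounts(id, balance) table and the standard database/sql package; the table and function names are illustrative, not from the original answer.

```go
// A minimal sketch: debit and credit as one all-or-nothing unit.
// Table accounts(id, balance) is assumed for illustration.
package transfer

import (
	"context"
	"database/sql"
)

// Transfer moves amount from one account to another inside a single
// transaction: either both UPDATEs take effect, or neither does.
func Transfer(ctx context.Context, db *sql.DB, from, to, amount int64) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // a no-op once Commit has succeeded

	if _, err := tx.ExecContext(ctx,
		"UPDATE accounts SET balance = balance - ? WHERE id = ?", amount, from); err != nil {
		return err // rollback: account A keeps its money
	}
	if _, err := tx.ExecContext(ctx,
		"UPDATE accounts SET balance = balance + ? WHERE id = ?", amount, to); err != nil {
		return err // rollback: the debit above is revoked too
	}
	return tx.Commit()
}
```

The deferred Rollback is the idiomatic safety net here: if any statement fails, everything done so far is undone, and it quietly becomes a no-op once Commit succeeds.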

To achieve atomicity, all update operations on the data must be logged. If part of a transaction's operations have succeeded but the rest cannot continue (power failure, system crash, or other hardware or software errors), the log is walked backwards and the operations that did succeed are revoked, achieving the goal of "all operations failed". The most common scenario is a crash and restart of the database system: the database is then in an inconsistent state and must first run a crash-recovery procedure. It reads the log to redo (replaying all operations that succeeded but had not yet been written to disk, guaranteeing durability) and to undo all transactions that had not successfully committed at the time of the crash (reverting operations that were partially executed but not committed, guaranteeing atomicity). After crash recovery finishes, the database is back in a consistent state and can continue to be used.
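
As a toy illustration of the redo/undo idea (not InnoDB's actual algorithm; the Record type and its fields are invented for this sketch), a crash-recovery pass might look like this in Go:

```go
// Toy crash-recovery pass over a write-ahead log: redo all logged work,
// then undo transactions that never committed. A caricature only; a
// real engine such as InnoDB is far more involved.
package recovery

type Record struct {
	TxID   int
	Kind   string // "update" or "commit"
	Key    string
	Before string // undo image
	After  string // redo image
}

func Recover(log []Record, db map[string]string) {
	committed := map[int]bool{}
	for _, r := range log {
		if r.Kind == "commit" {
			committed[r.TxID] = true
		}
	}
	// Redo phase: replay every logged update so the database reflects
	// all work that reached the log (guaranteeing durability).
	for _, r := range log {
		if r.Kind == "update" {
			db[r.Key] = r.After
		}
	}
	// Undo phase: walk backwards and revert updates from transactions
	// that never committed (guaranteeing atomicity).
	for i := len(log) - 1; i >= 0; i-- {
		if r := log[i]; r.Kind == "update" && !committed[r.TxID] {
			db[r.Key] = r.Before
		}
	}
}
```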

The management and replay of logs is one of the most complex parts of a database implementation. Once parallel processing and distributed systems are involved (log replication and replay are the foundation of database high availability), it becomes far more complex than the scenario described above.

However, atomicity does not fully guarantee consistency. With multiple transactions running in parallel, the data can still end up inconsistent even if each transaction's atomicity is guaranteed. For example, transaction 1 needs to add $100 to account A: it first reads the value of account A, then adds 100 to that value. But between those two operations, transaction 2 also modifies account A, adding $100 to it. The final result should therefore be an increase of $200; in fact, after transaction 1 finishes, account A has only increased by $100, because transaction 1 overwrote transaction 2's modification.
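
The anomaly can be replayed deterministically without any database at all; a minimal sketch of the interleaving:

```go
// Deterministic replay of the lost-update interleaving described above:
// both "transactions" read before either writes, so one increment is lost.
package main

import "fmt"

func main() {
	account := 0

	// Transaction 1 and transaction 2 both read the current balance...
	read1 := account
	read2 := account

	// ...each adds $100 to the value it read...
	read1 += 100
	read2 += 100

	// ...and each writes its result back. Transaction 2 commits first,
	// then transaction 1 overwrites it.
	account = read2
	account = read1

	fmt.Println(account) // prints 100, not the expected 200
}
```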

To guarantee consistency under concurrency, isolation is introduced: the data each transaction sees is always consistent, as if the other concurrent transactions did not exist. In formal terms, the state after multiple transactions execute concurrently must be equivalent to their state after executing serially in some order. How isolation is achieved has been answered by many people; in principle it comes down to two kinds of locking:

One is pessimistic locking: the current transaction locks every object involved in its operations and releases the locks when the operations are complete. To maximize performance, locks of various granularities (database-level, table-level, row-level, ...) and various kinds (shared, exclusive, shared intent, exclusive intent, shared-exclusive intent, ...) have been invented, along with a series of techniques such as the two-phase locking protocol and deadlock detection to deal with deadlocks.

The other is optimistic locking: different transactions can see different historical versions of the same object (typically a data row) at the same time. If two transactions modify the same row concurrently, conflict detection happens when the later transaction commits. There are two implementations: one reconstructs historical versions of a row from the undo log; the other simply keeps multiple historical versions of the row in memory, distinguished by timestamps.
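
Both styles can be sketched from Go against MySQL. SELECT ... FOR UPDATE is genuine MySQL syntax for taking a pessimistic row lock; the version column that drives the optimistic variant is a convention we are assuming for illustration, not something the database provides:

```go
// Two ways to guard the same read-modify-write, sketched against MySQL.
// An accounts table is assumed; the `version` column in the optimistic
// variant is our own convention for the example.
package locking

import (
	"context"
	"database/sql"
	"errors"
)

// Pessimistic: SELECT ... FOR UPDATE takes a row lock up front, so
// concurrent writers block until this transaction commits.
func AddPessimistic(ctx context.Context, db *sql.DB, id int64, delta int) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback()
	var bal int
	if err := tx.QueryRowContext(ctx,
		"SELECT balance FROM accounts WHERE id = ? FOR UPDATE", id).Scan(&bal); err != nil {
		return err
	}
	if _, err := tx.ExecContext(ctx,
		"UPDATE accounts SET balance = ? WHERE id = ?", bal+delta, id); err != nil {
		return err
	}
	return tx.Commit()
}

// Optimistic: read without locking, then update only if the version we
// read is still current; a concurrent writer makes the UPDATE match
// zero rows, which surfaces as a conflict the caller can retry.
func AddOptimistic(ctx context.Context, db *sql.DB, id int64, delta int) error {
	var bal, ver int
	if err := db.QueryRowContext(ctx,
		"SELECT balance, version FROM accounts WHERE id = ?", id).Scan(&bal, &ver); err != nil {
		return err
	}
	res, err := db.ExecContext(ctx,
		"UPDATE accounts SET balance = ?, version = version + 1 WHERE id = ? AND version = ?",
		bal+delta, id, ver)
	if err != nil {
		return err
	}
	if n, _ := res.RowsAffected(); n == 0 {
		return errors.New("write conflict: retry")
	}
	return nil
}
```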

Locking is likewise one of the most complex parts of a database implementation. Again, once distributed systems are involved (distributed locks and two-phase commit are the foundation of distributed transactions), it is far more complex than the scenarios described above.

The answer I singled out above also notes that the other respondents are really describing atomicity as understood in operating systems, that is, concurrency control. I can't entirely agree with that. The database has its own concurrency-control and locking problems; although they are very similar in principle to the operating-system concepts, they are not the same thing at the same level. Locks in a database differ completely from operating-system locks in granularity, type, and implementation; what the operating system calls a lock is called a latch inside a database implementation. What the other respondents were really answering is "how to ensure data consistency under concurrent transaction processing".

Finally, back to the original question ("what mechanisms implement atomicity and consistency"). I have a copy of Database System Concepts (4th edition, a bit old); the beginning of Chapter 15 gives a concise introduction to the ACID concepts and their relationships. If you want a conceptual understanding of the implementation, read the relevant chapters of that book and you should be able to figure it out. If you want to know how it is implemented in practice, you can find and read the source code of an open-source engine such as InnoDB. Be warned, though: even for a fairly rough open-source implementation (setting aside the complexities of parallel processing, distributed systems, and operating-system and hardware optimizations), truly understanding it will likely take more than a year or two.

Edited on 2015-11-16


Shen Jie

PHP Development Engineer


First, to borrow the words of a predecessor: database transactions have different isolation levels; each isolation level uses locks differently, and it is this application of locks that ultimately produces the different isolation levels.

Isolation is divided into four levels (a sketch of selecting each level programmatically follows the list):
1. Read Uncommitted (READ UNCOMMITTED)
2. Read Committed (READ COMMITTED), the default isolation level for most databases
3. Repeatable Read (REPEATABLE READ), the default level for MySQL
4. Serializable (SERIALIZABLE)
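
For reference, Go's database/sql package exposes these same four levels; the driver translates them into the corresponding SET TRANSACTION ISOLATION LEVEL statement. A minimal sketch:

```go
package isolevels

import (
	"context"
	"database/sql"
)

// BeginAtEachLevel opens (and immediately rolls back) one transaction
// at each of the four standard isolation levels.
func BeginAtEachLevel(ctx context.Context, db *sql.DB) error {
	levels := []sql.IsolationLevel{
		sql.LevelReadUncommitted,
		sql.LevelReadCommitted,
		sql.LevelRepeatableRead, // MySQL's default
		sql.LevelSerializable,
	}
	for _, lvl := range levels {
		tx, err := db.BeginTx(ctx, &sql.TxOptions{Isolation: lvl})
		if err != nil {
			return err
		}
		// ... statements here run under isolation level lvl ...
		if err := tx.Rollback(); err != nil {
			return err
		}
	}
	return nil
}
```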

For how each level is implemented and how they differ, please read the following carefully.

First, programs execute concurrently: in MySQL, two or more sessions can read and write data in the same table at the same time, and that by itself is no problem.

For example, two sessions reading the same data at the same time is fine and is allowed. But what happens if one session is reading a row while another is writing that same row (updating or deleting it)? Similarly, if two sessions modify the same row at the same time, whose change wins? The result is unpredictable; the data may well end up corrupted. This is a conflict.

Since conflicts must be resolved, who resolves them? The locking mechanism. So how are locks used to keep operations from conflicting?

At the start of a transaction, a row that is about to be written is given an exclusive lock; a row that is about to be read is given a read (shared) lock. While a row is being modified, no other session may operate on it; while a row is being read, other sessions may read it but not change it. When the read or write completes, the lock is released, and finally the transaction commits. Reads and writes are thus kept apart, as are writes and writes.
Note: acquiring and releasing locks is handled by the MySQL database itself and requires no human intervention. The MySQL developers gave this conflict-resolution scheme, in which locks are released as soon as each operation finishes, a name: Read Uncommitted (READ UNCOMMITTED). This is the first transaction isolation level.

But this level of isolation is simply not enough. Look at the following experiment:

1) Session A sets its transaction isolation level to READ UNCOMMITTED, starts a transaction, and queries the user table.

2) Session B starts a transaction and updates one record.

3) Before B has committed, A queries again within its transaction and finds that the result has changed.

4) B rolls back its transaction.

5) A queries once more; the result has changed back.

The experiment shows the following: one session's transaction modified a row and released its lock as soon as the modification was done; another session then read that data while the first transaction was still uncommitted; and when the first transaction later rolled back, the second session was left holding useless, wrong data. We usually call such data dirty data, and reading it is called a dirty read.
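
A rough reproduction of this experiment from Go (error handling elided; the user table, its columns, and the DSN are assumptions for illustration):

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	ctx := context.Background()
	// Hypothetical DSN; adjust for a real server.
	db, err := sql.Open("mysql", "root:secret@tcp(127.0.0.1:3306)/test")
	if err != nil {
		log.Fatal(err)
	}

	// Session A reads at READ UNCOMMITTED; session B writes but never commits.
	a, _ := db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelReadUncommitted})
	b, _ := db.BeginTx(ctx, nil)

	read := func(tag string) {
		var name string
		a.QueryRowContext(ctx, "SELECT name FROM user WHERE id = 1").Scan(&name)
		fmt.Println(tag, name)
	}

	read("1) before B's update:")
	b.ExecContext(ctx, "UPDATE user SET name = 'changed' WHERE id = 1")
	read("3) B updated, uncommitted:") // dirty read: A already sees 'changed'
	b.Rollback()
	read("5) after B's rollback:") // the value has changed back
	a.Rollback()
}
```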

What to do? Still the locking mechanism, but with the lock released at a different point. Before, the lock was released as soon as the operation on the data finished; now its release is moved to after the transaction commits. Until the transaction commits, no other session can read the row, nor perform any other operation on it. The database rules for this state are given a name: Read Committed (the anomaly it still allows is called the non-repeatable read). This is the second transaction isolation level.

In some cases, non-repeatable reads are not a problem. For example, if we query a piece of data several times, of course the result should be whatever the final query returns. In other cases, though, they cause trouble: if A and B query the same data in turn and get different results, A and B may well end up in a fight...

Continue with the following experiment:

1) Session A sets its transaction isolation level to READ COMMITTED, starts a transaction, and queries the user table.

2) Session B starts a transaction and modifies a record.

3) A queries the user table again and finds the records unaffected.

4) B commits its transaction.

5) A queries the user table once more and finds that the record has been modified.

The experiment shows that reading the same data twice within the same transaction can yield inconsistent results. We call this phenomenon a non-repeatable read: after the first transaction reads the data, another transaction modifies it and commits, so the first transaction sees something different on its second read, one value before the modification and another after.
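
Again, a rough Go reproduction of the experiment, under the same assumed schema and DSN:

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	ctx := context.Background()
	db, err := sql.Open("mysql", "root:secret@tcp(127.0.0.1:3306)/test") // hypothetical DSN
	if err != nil {
		log.Fatal(err)
	}

	// Session A reads twice at READ COMMITTED; session B commits in between.
	a, _ := db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelReadCommitted})

	var first, second string
	a.QueryRowContext(ctx, "SELECT name FROM user WHERE id = 1").Scan(&first)

	b, _ := db.BeginTx(ctx, nil)
	b.ExecContext(ctx, "UPDATE user SET name = 'changed' WHERE id = 1")
	b.Commit() // A cannot see the change until this commit

	a.QueryRowContext(ctx, "SELECT name FROM user WHERE id = 1").Scan(&second)
	fmt.Println(first, "->", second) // same query, two different answers
	a.Rollback()
}
```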

But the careful reader will notice a contradiction: if this isolation level releases locks only after the transaction commits, how could the other session in the experiment still read the data before it was committed? Did I mis-state it? No. Here MySQL uses a concurrent versioning mechanism they call MVCC (multi-version concurrency control). Put simply: to improve concurrency, even though the data a transaction is operating on is locked before the transaction commits, other transactions can still read it (they see an earlier version). Read Committed is the default isolation level for most databases, but not for MySQL.

And this issue does not only arise when updating data; inserting data causes a similar phenomenon. Although MySQL locks the rows being operated on, it cannot prevent another transaction from inserting new rows into the table. For example, one transaction reads or updates all rows in the table; another transaction then inserts a new row and commits. When the first transaction reads the same data a second time, the two result sets contain a different number of rows: it had updated all the rows, yet now finds a row that was not updated. This is called a phantom read.

So how do we prevent two reads within the same transaction from being inconsistent (covering both non-repeatable reads and phantom reads)?

MySQL again uses MVCC to solve this. Specifically, if the same data is read more than once within a transaction, MySQL reads the latest committed data only on the first read; on every read after that, it returns the data as it was on the first read. This guarantees consistent results when the same transaction reads the same data multiple times. MySQL calls this solution Repeatable Read (REPEATABLE READ), the third level listed above, and it is MySQL's default isolation level.
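
The snapshot idea behind this can be caricatured in a few lines of Go; this is a toy read view, not InnoDB's actual data structures:

```go
// Toy MVCC read view: each committed write appends a version stamped
// with a commit sequence number; a transaction reads the newest version
// no later than its snapshot.
package mvcc

type version struct {
	commitSeq int
	value     string
}

type row struct {
	versions []version // ascending by commitSeq
}

// readAt returns the row's value as of snapshot seq (fixed at the
// transaction's first read), so repeated reads with the same seq agree.
func (r *row) readAt(seq int) (string, bool) {
	for i := len(r.versions) - 1; i >= 0; i-- {
		if r.versions[i].commitSeq <= seq {
			return r.versions[i].value, true
		}
	}
	return "", false
}

// commit records a new committed version of the row.
func (r *row) commit(seq int, value string) {
	r.versions = append(r.versions, version{seq, value})
}
```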

Note: both phantom reads and non-repeatable reads (at Read Committed) involve reading data that another transaction has already committed (unlike a dirty read); the difference is that a non-repeatable read concerns the same data item, while a phantom read concerns a batch of data as a whole (for example, the number of rows).

Having said all that, is the problem really solved? In fact, MySQL has not completely solved the data consistency problem; it has only intervened on the read side, solving phantom reads and non-repeatable reads in the traditional sense. Consider the following example (sketched in code after the list):
1) Transaction A starts, and transaction B starts.
2) B inserts a row into the table but does not commit.
3) A runs a query and does not see the row B inserted; B then commits.
4) A, believing the row does not exist, tries to insert it itself, only to find a duplicate-key conflict.
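
The duplicate-key surprise can be sketched from Go as well, under the same assumed user table (with id as its primary key):

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	ctx := context.Background()
	db, err := sql.Open("mysql", "root:secret@tcp(127.0.0.1:3306)/test") // hypothetical DSN
	if err != nil {
		log.Fatal(err)
	}

	// A runs at MySQL's default REPEATABLE READ.
	a, _ := db.BeginTx(ctx, nil)

	var n int
	a.QueryRowContext(ctx, "SELECT COUNT(*) FROM user WHERE id = 99").Scan(&n)
	fmt.Println("A sees rows with id=99:", n) // 0

	// B inserts the row and commits while A's transaction stays open.
	b, _ := db.BeginTx(ctx, nil)
	b.ExecContext(ctx, "INSERT INTO user (id, name) VALUES (99, 'ghost')")
	b.Commit()

	// A's snapshot still reports the row as absent...
	a.QueryRowContext(ctx, "SELECT COUNT(*) FROM user WHERE id = 99").Scan(&n)
	fmt.Println("A still sees:", n) // still 0

	// ...yet A's own insert collides with the row B committed.
	_, err = a.ExecContext(ctx, "INSERT INTO user (id, name) VALUES (99, 'mine')")
	fmt.Println("A's insert:", err) // duplicate-key error
	a.Rollback()
}
```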

Finally, we arrive at the last and highest isolation level: Serializable (SERIALIZABLE).
This isolation level automatically locks the whole table you want to operate on; if another session's transaction wants to touch any data in that table, it must wait for the holder to finish and release the lock. This prevents dirty reads, non-repeatable reads, and phantom reads from occurring. Of course, performance drops considerably, and many sessions end up queuing and contending with each other.

Postscript: the locking behind the four isolation levels described above is applied automatically by the database and needs no human intervention. Isolation-level settings are valid only for the current connection. In the MySQL command-line client, each window corresponds to one connection, so an isolation level set in the current window applies only to transactions in that window.
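
In Go's database/sql, which pools connections, this per-connection scoping means you must pin a single connection before issuing SET SESSION; a sketch:

```go
package session

import (
	"context"
	"database/sql"
)

// WithRepeatableRead pins one pooled connection, sets its session-level
// isolation, and hands it to the caller; the other connections in the
// pool are unaffected, mirroring the one-window-one-connection rule.
func WithRepeatableRead(ctx context.Context, db *sql.DB, work func(*sql.Conn) error) error {
	conn, err := db.Conn(ctx)
	if err != nil {
		return err
	}
	defer conn.Close()
	if _, err := conn.ExecContext(ctx,
		"SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ"); err != nil {
		return err
	}
	return work(conn)
}
```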

https://www.zhihu.com/question/30272728

