MVCC implementation principles of PostgreSQL, Oracle/MySQL, and SQL Server

Source: Internet
Author: User

MVCC implementation principles of PostgreSQL, Oracle/MySQL, and SQL Server
The relational database management system uses MVCC (Multiversion Concurrency Control multi-version Concurrency Control) to prevent write operations from blocking read operation Concurrency, MVCC is a mechanism that ensures that concurrent reads and writes do not conflict with each other by using multiple versions of data. Different databases have different implementations, which is also a headache for database systems, relational databases seem simple and convenient on the surface, and it is reassuring to operate using standard SQL statements. However, as the number of systems increases and concurrent users increase, the performance of the database may decrease, at this time, we may need to go from external fine-tuning to in-depth research on internal principles, and the internal concurrency principle of each database is different. If we have multiple different databases, different tuning methods are required, and the core database used as the production system becomes less reassuring. This article provides different implementations of internal MVCC for several popular databases on the market. MVCC has two different implementation methods. The first implementation method is to save multiple versions of data records in the database. When data of different versions is no longer needed, the garbage collector recycles these records. This method is adopted by PostgreSQL and Firebird/Interbase, and similar mechanisms are used by SQL Server. The difference is that the old version of data is not stored in the database, however, the new version of data is stored in tempdb, which is different from the master database. The second method only saves the latest version of data in the database. However, the old version of data is dynamically reconstructed when undo is used, this method is used by Oracle and MySQL/InnoDB. The following describes the specific database implementation mechanism. In PostgreSQL, when a row of records is updated, the new version (called tuple) of the row data will be created and inserted into the table. The previous version provides a pointer pointing to the new version, the previous version is marked as "expired" expired, but it remains in the database until the garbage collector is recycled. To support multiple versions, each tuple has the following additional data records: xmin-the ID of the transaction that inserts the update record and creates the tuple. xmax-the transaction that deletes the record or creates a new version of the tuple or deletes the record. This field is initially null. the transaction status is saved in the CLOG of $ Data/pg_clog. this table contains two bytes of the transaction status information, which may be in-progress, committed, or aborted. After a transaction ends, PostgreSQL does not roll back the database record changes undo. It only marks the transaction as aborted in CLOG. A PostgreSQL table may contain a lot of data such as aborted to exit the transaction. The Vacuum cleanup process provides garbage collection for the record version that expired/aborted exited. The Vacuum cleanup tool also deletes the tuple-related index items. The xmin of a tuple is valid and xmax is invalid. it is visible. "Valid" means "or committed or represents the current transaction ". to avoid repeated operations on CLOG tables, PostgreSQL maintains a status ID in tuple to indicate whether tuple is "known committed" or "known aborted ". oracle MVCC Oracle stores the old version in the rollback segment (that is, 'undo log'). A transaction ID is not a sequential number, but a series of numbers, these numbers point to the transaction slot in the header of the rollback segment ). Rollback segments allow new transactions to reuse storage and reuse the transaction slots used by committed or exited old transactions, this automatic reuse mechanism enables Oracle to use limited rollback segments to manage a large number of transactions. The header block of the rollback segment is used as a transaction table, which stores the transaction state, called System Change Number or SCN. Oracle is not the transaction ID of each record on the storage page, instead, it saves space by saving the array of unique transaction IDs recorded in each row on the page. Only the array offset of the record is saved, and each transaction ID is saved with a pointer, the last undo record that points to the transaction created on this page. Not only is table record stored in this way, but index record also uses the same technology, which is one of the main differences between Oracle and PostgreSQL. when an Oracle transaction starts, it will mark a current transaction state SCN. when reading a table or an index page, Oracle uses the SCN number to determine whether the page contains the effect of transactions that should not be known to the current transaction, oracle checks the status of the transaction by looking for the associated rollback segment header. However, in order to save time, it is the first time that the transaction is actually queried, after the query is completed, its status is recorded on this page to avoid further query. If the page is found to contain invisible transactions Oracle re-creates the old version of the page by undoing each such transaction impact. It scans records related to each transaction and applies these transaction effects to this page until all transaction effects are removed after they are applied, A new page created in this way is used to access the tuple. Record headers in Oracle: a record header does not grow and always has a fixed size. For non-cluster tables, the record header is 3 bytes, and one byte is used to store the identifier, one byte is used to show whether the record is locked (for example, if it is updated but it is not confirmed to submit committed), and one byte is used for column count. The MVCC of SQL Server uses the record version in the SQL Server database to implement Snapshot isolation and read and submit. Only databases that require this item must be enabled and incur corresponding costs. When a record is modified or deleted, the copy-on-write mechanism can be used to effectively start the version, row versioning-based transactions can effectively "view" various versions of data from the past to the present. The record version Row is stored in version storage, and resides in the tempdb database outside the primary database. More specifically, when a record in a table or index is modified, the New Record carries the "sequence_number" of the transaction on which the modification is executed ". the old version of the record will be copied to version storage. The new record contains a pointer pointing to the old record in version storage. If multiple long-running transactions exist, multiple versions of versions are required. Records in version storage may contain pointers pointing to earlier versions of the record. SQL Server version storage cleanup: SQL Server automatically manages the size of version storage and maintains a cleanup thread to ensure that the number of versions recorded in version storage is not too long and exceeds the requirement, for queries running under Snapshot isolation, version storage retains the recorded version until the transaction for data modification is completed, and all statements containing the transaction that need to modify data are completed, for SELECT statements running under Read Committed Snapshot isolation, a special record version is no longer needed and will be removed once the SELECT statement is executed. If tempdb does not have free space, SQL Server will call the clear function to increase the file size, provided that the configuration file is automatically increased. If the disk has no space, files cannot grow automatically, and SQL Server stops generating versions. If this happens, any snapshot query that needs to be read will fail due to space restrictions. In SQL Server, the record header is 4 bytes-two-byte record metadata (record type)-two bytes direct forward to the NULL bitmap in the record. this is the difference offset of the actual size of the record (Fixed Length Column. versioning tag-this is a 14-byte structure. It contains a timestamp and a pointer to version storage in tempdb. The timestamp here is trasaction_seq_number. To support a version operation, the time when version information is added to the record.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.