A Tour of the Logs in the MySQL InnoDB Storage Engine

Source: Internet
Author: User

00–Undo Log
The undo log exists to provide transaction atomicity; in the MySQL InnoDB storage engine, it is also used to implement multi-version concurrency control (MVCC).

-Atomicity of the transaction (atomicity)
All operations in a transaction either complete in full or do nothing at all; it is not possible to perform only a subset of them. If an error occurs during execution, the transaction is rolled back (Rollback) to the state before it started, as if it had never been executed.

-Principle
The principle of the undo log is simple: to satisfy atomicity, before any data is modified, the old data is first backed up to a place called the undo log. Only then is the data modified. If an error occurs or the user executes a ROLLBACK statement, the system uses the backup in the undo log to restore the data to its state before the transaction began.

In addition to guaranteeing atomicity, the undo log can also help provide transaction durability.

-Durability of the transaction (durability)
Once a transaction commits, all of its modifications to the database are permanent. To ensure durability, the database system must record the modified data completely in persistent storage.

-Simplified process for atomic and durable transactions with only an undo log
Suppose there are two data items a and b, with values 1 and 2 respectively.
A. Transaction begins.
B. Record a=1 in the undo log.
C. Modify a=3.
D. Record b=2 in the undo log.
E. Modify b=4.
F. Write the undo log to disk.
G. Write the data to disk.
H. Transaction commits.
There is an implied precondition: data is first read into memory, modified in memory, and then written back to disk.

Atomicity and durability can both be guaranteed at the same time because of the following properties:
A. The undo log is recorded before the data is updated.
B. To ensure durability, the data must be written to disk before the transaction commits; as long as the transaction commits successfully, the data is guaranteed to be persistent.
C. The undo log is persisted to disk before the data. If the system crashes between steps G and H, the undo log is complete and can be used to roll back the transaction.
D. If the system crashes between steps A and F, the data has not been written to disk, so the data on disk remains in the pre-transaction state.
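The steps and crash cases above can be sketched as a toy model. The structures below ("disk", "undo_log") are illustrative stand-ins, not InnoDB's actual data structures:

```python
# A toy model of the undo-log-only protocol (steps A-H above).

disk = {"a": 1, "b": 2}          # persistent state before the transaction
memory = dict(disk)              # data is read into memory first
undo_log = []                    # undo records, still in memory
undo_on_disk = []                # undo records that reached the disk

# Steps B-E: record each old value, then modify in memory.
undo_log.append(("a", memory["a"]))   # B. record a=1
memory["a"] = 3                       # C. modify a=3
undo_log.append(("b", memory["b"]))   # D. record b=2
memory["b"] = 4                       # E. modify b=4

undo_on_disk = list(undo_log)         # F. write undo log to disk
# --- suppose the system crashes here, between steps F and G ---

# The data pages were never written (step G), so the disk still holds
# the pre-transaction state.
assert disk == {"a": 1, "b": 2}

# Had the crash happened after step G but before commit, recovery would
# apply the persisted undo records to restore the old values:
partially_written = {"a": 3, "b": 4}
for key, old_value in reversed(undo_on_disk):
    partially_written[key] = old_value
assert partially_written == {"a": 1, "b": 2}
```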

Flaw: both the data and the undo log must be written to disk before each transaction commits, which causes a large amount of disk IO and therefore poor performance.

If the data could be cached in memory for a while, IO could be reduced and performance improved. However, that would sacrifice durability. This led to the introduction of another mechanism to achieve durability: the redo log.

01–Redo Log

-Principle
In contrast to the undo log, the redo log is a backup of the new data. Only the redo log needs to be persisted before the transaction commits; the data itself does not. If the system crashes, the data may not be persistent, but the redo log is, and the system can use its contents to restore all data to the latest state.

-Simplified undo + redo transaction process
Suppose there are two data items a and b, with values 1 and 2 respectively.
A. Transaction begins.
B. Record a=1 in the undo log.
C. Modify a=3.
D. Record a=3 in the redo log.
E. Record b=2 in the undo log.
F. Modify b=4.
G. Record b=4 in the redo log.
H. Write the redo log to disk.
I. Transaction commits.

-Features of undo + redo transactions
A. To ensure durability, the redo log must be persisted before the transaction commits.
B. The data need not be written to disk before the transaction commits; it stays cached in memory.
C. The redo log guarantees the durability of the transaction.
D. The undo log guarantees the atomicity of the transaction.
E. There is an implied property: data must reach persistent storage no earlier than the corresponding redo log.
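The key point, that only the redo log needs to be flushed before commit, can be sketched with a toy replay. The structures are illustrative, not InnoDB's:

```python
# A toy model of the undo + redo protocol: the redo log is persisted
# before commit; data pages stay cached in memory.

disk_data = {"a": 1, "b": 2}     # data pages on disk (stale)
memory = dict(disk_data)
redo_log = []

memory["a"] = 3; redo_log.append(("a", 3))   # modify, then log the new value
memory["b"] = 4; redo_log.append(("b", 4))

redo_on_disk = list(redo_log)    # H. flush redo log before commit
# --- crash after commit: memory is lost, data pages were never written ---

# Recovery: replay the persisted redo log over the stale disk pages.
recovered = dict(disk_data)
for key, new_value in redo_on_disk:
    recovered[key] = new_value
assert recovered == {"a": 3, "b": 4}
```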

-IO Performance
The undo + redo design is primarily about improving IO performance. Although caching the data reduces the IO of writing data pages, it introduces a new IO: writing the redo log. If redo log writes perform poorly, the design defeats its own purpose. To ensure the redo log has good IO performance, InnoDB's redo log design has the following features:

A. The redo log is kept in contiguous space as far as possible; the log file's space is fully allocated when the system first starts. The redo log is written in sequential-append mode, exploiting sequential IO for performance.
B. Logs are written in batches. Instead of going directly to the file, log records are first written to the redo log buffer; when the logs need to be flushed to disk (for example at transaction commit), many records are written to disk together.
C. Concurrent transactions share the redo log's storage space: their records are interleaved sequentially in statement execution order, which reduces the space the log occupies. For example, the records in the redo log might look like this:
Record 1: <trx1, insert ...>
Record 2: <trx2, update ...>
Record 3: <trx1, delete ...>
Record 4: <trx3, update ...>
Record 5: <trx2, insert ...>
D. Because of C, when one transaction writes its redo log to disk, the log records of other, uncommitted transactions are written to disk as well.
E. The redo log is append-only: when a transaction needs to be rolled back, its redo log records are not deleted from the redo log.
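Features C and D above can be sketched with the interleaved record list from the example. The record shapes are illustrative:

```python
# A shared redo log as a list of (trx, operation) records in execution
# order, mirroring the interleaved example above.

redo_log = [
    ("trx1", "insert ..."),
    ("trx2", "update ..."),
    ("trx1", "delete ..."),
    ("trx3", "update ..."),
    ("trx2", "insert ..."),
]

# When trx1 commits, the log is flushed sequentially up to trx1's last
# record, so records of uncommitted transactions written earlier in the
# shared log reach disk too (feature D).
flush_up_to = max(i for i, (trx, _) in enumerate(redo_log) if trx == "trx1")
flushed = redo_log[: flush_up_to + 1]
assert ("trx2", "update ...") in flushed      # uncommitted work flushed as well
assert ("trx2", "insert ...") not in flushed  # written after trx1's last record
```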

02–Recovery

-Recovery Strategy
Because the redo log also records uncommitted and rolled-back transactions, those transactions need special handling at recovery time. There are two possible recovery strategies:

A. At recovery time, redo only the transactions that have been committed.
B. At recovery time, redo all transactions, including uncommitted and rolled-back ones, then use the undo log to roll back the uncommitted ones.

-Recovery mechanism of the InnoDB storage engine
The MySQL InnoDB storage engine uses strategy B. Its recovery mechanism has several features:

A. When replaying the redo log, transactions are ignored: recovery knows nothing of BEGIN, COMMIT, or ROLLBACK, and does not care which transaction a log record belongs to. Although transaction-related content such as the transaction ID is written into the redo log, it is treated as just part of the data being manipulated.
B. With strategy B, the undo log must be persistent, and the corresponding undo log would have to reach disk before the redo log. Coupling undo and redo persistence like this would complicate durability. To reduce the complexity, InnoDB treats the undo log as data: the operations that write undo log records are themselves recorded in the redo log. This way the undo log can be cached like ordinary data instead of having to be written to disk before the redo log.
The redo log containing the undo log operations looks like this:
Record 1: <trx1, Undo log Insert <undo_insert ...>>
Record 2: <trx1, insert ...>
Record 3: <trx2, Undo log Insert <undo_update ...>>
Record 4: <trx2, update ...>
Record 5: <trx3, Undo log Insert <undo_delete ...>>
Record 6: <trx3, delete ...>
C. One question remains: since redo is transaction-blind, won't it re-execute transactions that were rolled back? Yes, it will. But InnoDB also records the rollback operations themselves in the redo log: a rollback is, in essence, a modification of the data, so the data operations performed during rollback are recorded in the redo log too.
The redo log of a rolled-back transaction looks like this:
Record 1: <trx1, Undo log Insert <undo_insert ...>>
Record 2: <trx1, insert a...>
Record 3: <trx1, Undo log Insert <undo_update ...>>
Record 4: <trx1, update b...>
Record 5: <trx1, Undo log Insert <undo_delete ...>>
Record 6: <trx1, delete c...>
Record 7: <trx1, insert c>
Record 8: <trx1, update b to old value>
Record 9: <trx1, delete a>
A rolled-back transaction is thus redone and then undone, which does not break the consistency of the data.
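Strategy B as described above (redo everything, transaction-blind, then undo the uncommitted transactions) can be sketched as a toy recovery. All names and record shapes are illustrative:

```python
# A toy model of recovery strategy B: replay all redo records ignoring
# transaction boundaries, then roll back uncommitted transactions using
# their undo records.

redo_log = [   # (trx, key, new_value); trx1 committed, trx2 did not
    ("trx1", "a", 3),
    ("trx2", "b", 4),
]
undo_log = [("trx2", "b", 2)]      # old value recorded for trx2
committed = {"trx1"}

data = {"a": 1, "b": 2}
for _trx, key, new in redo_log:    # 1. redo all, transactionally blind
    data[key] = new
for trx, key, old in reversed(undo_log):   # 2. undo the uncommitted ones
    if trx not in committed:
        data[key] = old
assert data == {"a": 3, "b": 2}    # trx1's change kept, trx2's rolled back
```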

03–The Contents of the Log

-What is the data?
From different perspectives and levels, the data in a database can be viewed as:
A. Relational data
B. Tuples or objects
C. Binary sequences within pages

Accordingly, the log can record different kinds of content:
-Physical log (physical log)
A. Record the full page.
B. Record the modified portion of the page (offset within the page, content, and length).

Pros: recovery does not depend on the original contents of the page at all, so the persisted data need not be in a consistent state. For example, if the system fails while writing a page to disk, so that part of the page's data is written and the rest is lost, the correct data can still be recovered.

Cons: log records are bulky and take up a great deal of space. A B-tree split, for instance, requires logging about a full page of content.

-Logical log (logical log)
Records tuple operations on a relation (table):
A. Insert a row.
B. Modify a row.
C. Delete a row.
Logical logs are much more concise than physical logs and take up far less space.
However, logical logs have two drawbacks:
A. Partial execution
For example, if table T has two indexes, inserting one record into T requires inserting an entry into each of the two B-trees. The first B-tree insert may succeed while the second fails. Such cases require special handling during recovery or rollback.
B. Operational consistency
Suppose an insert triggers a B-tree split: half of page A's data moves to page B, page A is written to disk, but page B is not. If a failure then occurs and recovery is needed, the logical log is hard to apply.

The 'partial execution' problem of logical logs is relatively manageable, but the 'consistency' problem is very complex to handle.

-Physiological log (combined physical and logical log)
This kind of log combines the physical and logical approaches, taking the benefits of each while avoiding their drawbacks, and thereby reaches a comparatively good trade-off. It has two features:
A. Physical to the page: operations are subdivided to the page level, and a log record is written separately for each page's part of the operation. For example, if an insert touches two B-tree nodes, a log record is written for each page's operation.
B. Logical within the page: for example, inserting a record into a B-tree page physically means modifying the page header (e.g., incrementing the page's record count), writing a row of data at some location, adjusting the linked-list pointers of adjacent records, modifying slot attributes, and so on. Logically, it is simply "a row was inserted into this page". The in-page logical log therefore records only "this is an insert operation" and "the contents of this row".
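A physiological record as described above can be sketched like this: it is physical in that it names the page, and logical in that it says "insert this row" rather than which bytes change. All structures here are illustrative simplifications:

```python
# Applying one physiological log record: the header update and the row
# insert are redone together as one in-page logical operation.

def apply_insert(page: dict, row: tuple) -> None:
    page["n_recs"] += 1            # header: record count incremented
    page["rows"].append(row)       # the row itself; slots/links omitted

log_record = {"page_no": 7, "op": "insert", "row": ("k1", "v1")}
pages = {7: {"n_recs": 0, "rows": []}}

apply_insert(pages[log_record["page_no"]], log_record["row"])
assert pages[7] == {"n_recs": 1, "rows": [("k1", "v1")]}
```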

The redo log of the MySQL InnoDB storage engine is such a combined physical-logical log.
Using a logical log within the page reduces the space the log occupies. But since it is still partly a logical log, can the two problems above be avoided?
A. Partial execution within a page can be considered non-existent: the operation on a single page is atomic in memory, and the page is not written to disk until the operation is finished.
B. The consistency problem remains. A failed page write to disk can leave, say, the record count in the page header incremented while the row data itself was not flushed, making the data within the page inconsistent.

Fortunately, the problem is now confined to a single page, which makes it easier to solve. The InnoDB storage engine uses a method called double write to solve it.

-Double Write
The idea of double write is simple:
A. Before overwriting the data on disk, write the page's contents to another place on disk (the doublewrite buffer in the InnoDB storage engine; this "buffer" is not memory but space in persistent storage).
B. Then overwrite the original data on disk with the page's contents.

If the system fails during step A, the original data on disk is still intact and untouched.
If the system fails during step B, the original data may be incomplete, but the new data has been completely written to the doublewrite buffer, so on recovery the incomplete page can be repaired from its copy in the doublewrite buffer.
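The recovery path in step B can be sketched with a toy torn-page repair. A CRC stands in for InnoDB's page checksum; everything here is an illustrative simplification:

```python
# A toy model of doublewrite recovery: the home copy of a page is torn,
# but the doublewrite copy is intact and is used to repair it.
import zlib

def page_with_checksum(body: bytes) -> bytes:
    return body + zlib.crc32(body).to_bytes(4, "big")

def is_intact(page: bytes) -> bool:
    body, stored = page[:-4], page[-4:]
    return zlib.crc32(body).to_bytes(4, "big") == stored

new_page = page_with_checksum(b"new contents")
doublewrite_area = new_page                   # step A completed
# step B failed mid-write: only a prefix of the page reached its home
torn_home_page = new_page[:6] + b"\x00" * (len(new_page) - 6)

# Recovery: the home copy fails its checksum, the doublewrite copy
# passes, so the home page is rewritten from the doublewrite area.
assert not is_intact(torn_home_page)
assert is_intact(doublewrite_area)
home_page = doublewrite_area
assert is_intact(home_page)
```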

Double write obviously adds disk IO. Intuitively the number of IOs doubles, but the performance loss is not large: Peter states in "innodb-double-write" that the loss does not exceed 5-10%. This is presumably because bulk writes are used in most cases:
A. The doublewrite buffer is a contiguous storage space that can be written sequentially.
B. Double write has its own in-memory write buffer.
C. Multiple pages are first written to this memory buffer, then written to disk together.

The code is in buf0dblwr.cc: buf_flush_write_block_low() calls buf_dblwr_write_single_page() or buf_dblwr_add_to_batch() to implement doublewrite.

-Checksum
Page consistency is detected with a checksum: each time a page is modified, a checksum is computed over the page and stored at its end. Each time a page is read from disk into memory, its consistency must be checked.
The function buf_page_is_corrupted() performs this check.

-Log types of the InnoDB redo log
The format of an InnoDB redo log record can be summarized as:
<space id> + <page no.> + <operation type> + <data>.

Redo log records of page operations fall broadly into the following categories:
A. Write N bytes of content on the page; this can be considered a physical log.
MLOG_1BYTE, MLOG_2BYTES, MLOG_4BYTES, MLOG_8BYTES, MLOG_WRITE_STRING
Changes to the pointers of the various page lists, as well as to the contents of headers, segment pages, and so on, are recorded this way.
B. Record operations on the page.
MLOG_REC_*, MLOG_LIST_*, MLOG_COMP_REC_*, MLOG_COMP_LIST_*
These log types record insert, delete, and update operations on B-tree pages, as well as split and merge operations.
C. File and page operations.
MLOG_FILE_CREATE, MLOG_FILE_RENAME, MLOG_FILE_DELETE,
MLOG_PAGE_CREATE, MLOG_INIT_FILE_PAGE, MLOG_PAGE_REORGANIZE
D. Undo log operations.
MLOG_UNDO_*
Operations on the undo log in InnoDB are themselves recorded in the redo log; the reason was explained in the 'Recovery' section above.

Only some of the redo log types are mentioned here; the full set is defined in the file mtr0mtr.h. With those type definitions it is easy to find where each one is used.

Although the redo log subdivides data operations to the page level, some operations spanning multiple pages are logically indivisible. In a B-tree split, for example, the parent node and two child nodes are all modified; on recovery they must be either fully restored or not restored at all, never just some of the pages. InnoDB uses the mini-transaction (MTR) to guarantee the atomicity of these indivisible operations.

-Log types of the InnoDB undo log
The undo log of the MySQL InnoDB storage engine is a logical log.
The format of an InnoDB undo log record can be summarized as: <operation type> + <table id> + <data>.

A. Deleting a row from a table
TRX_UNDO_DEL_MARK_REC (the primary key is recorded in the log)
When a record is deleted, the data is not actually removed from the database; it is merely marked as deleted. The advantage is that the undo log need not record the entire row, and the undo operation itself becomes simple.
B. Inserting a row into a table
TRX_UNDO_INSERT_REC (the primary key is recorded in the log)
TRX_UNDO_UPD_DEL_REC (the primary key is recorded in the log): used when the table already contains a record marked as deleted whose primary key equals that of the inserted data; the actual operation performed is an update of that delete-marked record.
C. Updating a row in a table
TRX_UNDO_UPD_EXIST_REC (the primary key and the updated field contents are recorded in the log)
TRX_UNDO_DEL_MARK_REC and TRX_UNDO_INSERT_REC: when the primary key itself is updated, the actual execution deletes the old record and then inserts a new one.

Because the undo log is also used by MVCC and purge, special content such as trx_id and data_roll_ptr is recorded in the log. TRX_UNDO_INSERT_REC need not record these, because a row that did not previously exist has no old version for MVCC to reference. This is why a transaction keeps its insert undo log separate from its update/delete undo log: the space occupied by the insert undo log can be released immediately after the transaction commits.

These types are defined in trx0rec.h.
The logging is done in trx_undo_page_report_insert() and trx_undo_page_report_modify().
The undo operations are in row0undo.c, row0uins.c, and row0umod.c; the entry function is row_undo().

-Consistency issues with logical logs
The consistency problem of logical logs is very complex, so why does the undo log use logical logging?
Because the redo log uses physiological logging plus MTRs, it can guarantee that after the redo pass of recovery the data is consistent. The undo pass therefore does not have to worry about consistency.

04–Checkpoint

In theory, if the InnoDB buffer were large enough, there would be no need to persist the data itself: re-executing all the redo logs would recover all the data. But over time the redo log grows very large, and recovering from the first record every time would be intolerably slow. The checkpoint mechanism was introduced to reduce recovery time.

-Dirty page (dirty page)
A data page that has been modified in memory but not yet flushed to disk is called a dirty page.

-Log sequence number (LSN)
The LSN is the end position of each log record in the log space, expressed as a byte offset. It is used during checkpointing and recovery.

-Principle
Suppose that at some moment all dirty pages have been flushed to disk. Then none of the redo log before that moment needs to be replayed. The system records the end of the redo log at that moment as the checkpoint. On recovery, replay can start from the checkpoint position, and the log before the checkpoint is no longer needed and can be deleted. To make better use of log space, InnoDB uses the log space as a ring (circular buffer).

[Figure: Sharp Checkpoint]

-Sharp Checkpoint
On a busy system, such a moment rarely occurs naturally. The simplest way to create one is to stop all updates at some point in time, wait until all dirty pages are flushed to disk, and then record the checkpoint. Obviously this approach is unacceptable for a busy system. Can a checkpoint be taken without stopping user operations?

-Fuzzy Checkpoint
As the figure shows, if users keep updating data while the dirty pages are being flushed, a dirty page from before LSN1 may, before it reaches persistent storage, be modified again by an operation after LSN1. When the checkpoint completes, some operations after LSN1 (those corresponding to R1 and R2) have therefore also been persisted. When a sharp checkpoint completes, persistent storage holds a snapshot of the in-memory data at a single point in time; when a fuzzy checkpoint completes, it does not. To some extent, the data in persistent storage has lost consistency, and this is a problem that recovery must address.

[Figure: Fuzzy Checkpoint]

-Idempotence (idempotence) rule
As the figure shows, the checkpoint is at position LSN1, and by the time it completes, the modifications corresponding to R1 and R2 have also been flushed to persistent storage. Recovery starts from position LSN1 and therefore replays R1 and R2. Will the data still be correct after re-execution?
The idempotence rule requires that the data ends up correct no matter how many times the redo log is replayed.
InnoDB's redo log is physical to the page, with logical logging within the page. Physical logs naturally support idempotence, but logical logs require special handling to satisfy it.

-The latest (maximum) LSN of a data page
To satisfy the idempotence rule, InnoDB records an LSN on every data page. Each time the page is updated, this LSN is set to the LSN of the redo log record of the current operation. On recovery, if the LSN of the data page is greater than or equal to the LSN of the current redo log record, the record is skipped.
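The page-LSN rule above can be sketched as a toy replay loop: a record is applied only when its LSN is newer than the page's, so replaying the same log any number of times gives the same result. The structures are illustrative:

```python
# A toy model of idempotent redo replay via per-page LSNs.

def replay(page: dict, log: list) -> None:
    for lsn, value in log:
        if lsn <= page["lsn"]:    # page already contains this change: skip
            continue
        page["value"] = value
        page["lsn"] = lsn

log = [(100, "v1"), (200, "v2")]
page = {"lsn": 100, "value": "v1"}   # page already reflects LSN 100

replay(page, log)
assert page == {"lsn": 200, "value": "v2"}
replay(page, log)                    # replaying again changes nothing
assert page == {"lsn": 200, "value": "v2"}
```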

-Asynchronous checkpoint
With the idempotence rule in place, dirty pages can be written to persistent storage in any order, at any time. InnoDB's buffer pool has a separate set of mechanisms for flushing dirty pages, so in most cases a checkpoint writes no dirty pages at all: it simply records the smallest LSN among all dirty pages as the checkpoint.
Checkpointing is implemented in log0log.c.
log_checkpoint() implements the asynchronous checkpoint.
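The asynchronous checkpoint described above flushes nothing; it only takes the minimum "oldest modification" LSN over the dirty pages, since all redo before that point is already reflected on disk. A sketch with illustrative structures:

```python
# A toy asynchronous checkpoint: the checkpoint LSN is the smallest
# first-modification LSN among the dirty pages.

dirty_pages = [
    {"page_no": 3, "oldest_modification": 1500},
    {"page_no": 9, "oldest_modification": 1200},
    {"page_no": 4, "oldest_modification": 1800},
]

checkpoint_lsn = min(p["oldest_modification"] for p in dirty_pages)
assert checkpoint_lsn == 1200   # recovery must start no later than this
```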

-Synchronous checkpoint
The InnoDB buffer pool uses the LRU algorithm to decide which dirty pages should be written to persistent storage. If the page holding the minimum LSN is updated frequently, it may never be flushed, which can prevent the checkpoint from advancing for a long time and can even cause the log space to fill up completely. At that point, some of the dirty pages must be written to persistent storage in order of LSN, from smallest to largest.
See log_checkpoint_margin().
log_calc_max_ages() computes the parameters used to decide whether a synchronous checkpoint must be performed.

05–Buffer Pool (buffer pool)
Having studied this far, I am inclined to call this a "redo + undo + buffer" model. For IO performance, dirty pages are cached in the buffer, the redo log is first accumulated in memory, and doublewrite also has its memory buffer. The buffer pool is critical in this model.

-Page categories
The pages in the buffer pool fall into three categories:
A. Free pages (blank buffers): pages not mapped to any page in a data file.
B. Clean pages: pages mapped to a data file page and not modified; their content matches the data file page.
C. Dirty pages: pages mapped to a data file page whose data has been modified; their content differs from the data file page.

-LRU
InnoDB uses the LRU algorithm to decide which dirty pages should be written to disk first, and which clean pages should be released when space runs low. There are therefore two LRU lists:
A. The dirty-page LRU list (buffer_pool->LRU)
B. The clean-page LRU list (unzip_LRU)

-Flush_list
During a synchronous checkpoint, dirty pages need to be written to persistent storage in the order in which the data pages were modified. So in addition to the LRU lists, the buffer pool maintains a list of dirty pages in modification order, called the flush_list. When a synchronous checkpoint is needed, data is flushed to persistent storage in the order of the pages in the flush_list.
A. A page appears in the flush_list at most once, because a page only needs to be written once.
B. Pages are ordered by the time of their earliest modification.
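Both properties above follow from one rule: a page is added to the list only on its first modification. A sketch with illustrative structures:

```python
# A toy flush_list: ordered by first-modification LSN, one entry per page.

flush_list = []                      # (first_modification_lsn, page_no)
in_list = set()

def note_modification(page_no: int, lsn: int) -> None:
    if page_no not in in_list:       # later modifications don't reposition it
        flush_list.append((lsn, page_no))
        in_list.add(page_no)

note_modification(7, 100)
note_modification(3, 150)
note_modification(7, 200)            # page 7 modified again: no new entry
assert flush_list == [(100, 7), (150, 3)]
```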

06–Mini-Transaction (MTR)
As mentioned earlier, the redo log subdivides data operations to the page level, but some multi-page operations are logically indivisible. InnoDB uses the mini-transaction to represent these indivisible logical operations.

-Consistency of the MTR
To guarantee MTR consistency, the MTR is designed as follows:
A. All the logs of an MTR are packaged together and written to the redo log buffer together when the MTR commits.
This has two benefits:
* It reduces contention between concurrent MTRs for the redo log buffer.
* The records are stored contiguously, which makes recovery simpler.
B. At the redo log level, InnoDB treats all the logs of one MTR as the smallest unit of the redo log: on recovery, all the logs of an MTR must be complete to be replayed.

-Packaging of MTR logs
To distinguish the different MTRs in the log file, the MTR writes an MLOG_SINGLE_REC_FLAG or an MLOG_MULTI_REC_END into the redo log (see mtr_log_reserve_and_write()).
A. If the MTR's log contains only one record, MLOG_SINGLE_REC_FLAG is set at the start of that record, indicating that this MTR consists of a single record.
B. If the MTR's log contains multiple records, a log record of type MLOG_MULTI_REC_END is appended at the end, marking where this MTR's log ends.
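The grouping rule above can be sketched as a toy parser: a single-record MTR is self-contained, a multi-record MTR is complete only when its end marker is present, and a trailing group without an end marker (a torn write) is discarded. The marker names follow the text; the record shapes are illustrative:

```python
# A toy parser that recovers only complete MTRs from a redo log stream.

SINGLE = "MLOG_SINGLE_REC_FLAG"
END = "MLOG_MULTI_REC_END"

def complete_mtrs(log: list) -> list:
    mtrs, current = [], []
    for rec in log:
        if rec[0] == SINGLE:
            mtrs.append([rec])       # one self-contained record
        elif rec == (END,):
            mtrs.append(current)     # multi-record MTR is complete
            current = []
        else:
            current.append(rec)
    return mtrs                      # anything left in `current` is torn

log = [
    (SINGLE, "write page 1"),
    ("write page 2",), ("write page 3",), (END,),
    ("write page 4",),               # crash before its MLOG_MULTI_REC_END
]
recovered = complete_mtrs(log)
assert len(recovered) == 2                        # torn MTR discarded
assert ("write page 4",) not in sum(recovered, [])
```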

-LSN of the MTR
A. The LSN is obtained when the log is written into the redo log buffer. Therefore, the LSN on a page cannot be set at the moment the data is modified; the pages are updated uniformly after the MTR obtains its LSN.
B. An MTR has exactly one LSN: all pages modified within one MTR carry the same LSN. A checkpoint therefore never falls in the middle of an MTR.
C. After the LSN is obtained, any dirty pages modified by the MTR that are not yet on the buffer pool's flush_list are added to it.

See mtr_memo_slot_note_modification() and buf_flush_note_modification().

-Page-level locks
Writing the logs to the redo log only at commit time dictates that the MTR must use page-level locks:
A. A page cannot be modified by multiple active MTRs at the same time.
B. The locks on the data pages held by an MTR are not released until the MTR commits (i.e., until its logs are written to the redo log buffer).

The lock objects are stored in the MTR's memo: when mtr_s_lock or mtr_x_lock is called to take a lock, the lock object is saved into the memo. Unlocking is done in mtr_memo_slot_release().

-Rollback of the MTR
Reading the MTR code shows that the MTR records no undo log and performs no rollback. An MTR is a very small unit of operation with a clear goal, which makes it easier to guarantee its correctness:
A. Because page operations are performed in memory and pages have a fixed format, many page operations cannot fail. Many page-writing functions in the InnoDB storage engine do not even return a value.
B. Before operating on any page, the code checks whether an error could occur; if it could, execution does not proceed. For example, before inserting a row into a B-tree node, the code first checks that the page has enough space.
C. Coarser-grained locks (such as B-tree locks) are used and taken in a fixed order, so deadlocks cannot arise.

The above is my rough impression after reading the code and may not be entirely accurate. The MTR module's own code is simple, but the MTR is used heavily by other modules; to understand it thoroughly, one would probably have to read those modules' code as well, since most MTR operations are orchestrated there.

