Innodb Crash Recovery
InnoDB has two very important logs, one is undo log, the other is redo log, which is used to guarantee the atomicity of the transaction and the MVCC of InnoDB, which is used to ensure the persistence of the transaction. Like most relational databases, InnoDB records physical changes to the data file and ensures that the log is always first, the so-called Wal, which ensures that the previous redo log is written to disk before persisting the data file.
Crash Recovery is a feature of the InnoDB engine, when the failure occurs, after restarting the service, will automatically complete the recovery operation, restore the database to a previous normal state, the recovery process will complete two steps
The first step: Check the Redo log to redo all the previously completed and committed transactions;
Step two: Undo log, uncommitted transactions, all canceled
Lsn
The LSN (log sequence number) is used to record the log ordinal, which is a continuously incrementing unsigned long long integer. In InnoDB's log system, LSN is ubiquitous, which is used both to represent the log sequence number when a dirty page is modified, and also to record checkpoint, which, through the LSN, can be specifically located in the Redo log file.
To manage the dirty pages, the page on each instance in Buffer Pool is maintained on a flush List,flush list, sorted by the LSN number in which the page was modified. So when doing redo checkpoint periodically, the LSN of choice is always the oldest page on the flush list of all BP instance (with the smallest LSN). Because of the Wal strategy, the redo log is persisted every time a transaction commits to ensure that the transaction is not lost. The delayed Brush Dirty page has the effect of merging multiple modifications to avoid the performance problems caused by frequent writing of data files.
Dirty page
In InnoDB, the dirty page inside the buffer pool can speed up data processing while also causing inconsistencies (RAM vs DISK)
1. The InnoDB engine loads data and indexes into the buffer pool in memory every time a user has a request for data changes, and if each modification of the data and index needs to be updated to disk, the I/O request must be greatly increased
Because the location of each update is random, the head needs frequent positioning resulting in low efficiency, and the data is temporarily in memory, which also improves the reading speed to a certain extent. So InnoDB only adds one log after each request (Transaction), and one thread is responsible for intelligently reading log files and bulk updating to disk for the most efficient disk writes. InnoDB since the use of mem buffer to improve the corresponding speed, it will certainly bring inconsistent data, the term is dirty data, MySQL called Dirty page.
Process: When a transaction (Transaction) needs to modify a record (row), InnoDB needs to read the page from disk to buffer pool, and after the transaction commits, InnoDB modifies the record (row) in the page. The page in buffer pool is not the same as in disk, and the data in mem is called Dirty data (Dirty page).
2. At the time of each transaction commit, the transaction change operation is logged to redo log immediately. So even if the dirty page in the buffer pool is lost during a power outage, InnoDB will still complete the data recovery based on the records in redo log at startup, and the other effect of redo log is to delay dirty The flush of the page minimizes the random writes of the disk. (Redo log will merge the TRX changes to a page over a period of time)
3, under normal circumstances, dirty page when flush to disk
- Redo log is a ring structure that, when the redo space fills up, will flush part of the dirty page to disk and then release some redo log. This can be observed by innodb_log_wait (SHOW GLOBAL status), where the counter is incremented once
- When you need to assign a page to the buffer pool, but it is full, and all the page is dirty (otherwise you can release the page that is not dirty), it usually does not happen. You must flush dirty pages to disk at this time. This situation will be recorded in the Innodb_buffer_pool_wait_free. In general, you can control this situation with the start parameter innodb_max_dirty_pages_pct, and when the dirty page in buffer pool reaches this scale, it will force a checkpoint to be set and dirty page Flush to disk
- When it detects that the system is idle, it will flush, each time pages
InnoDB configuration parameters involved: Innodb_flush_log_at_trx_commit, innodb_max_dirty_pages_pct; State parameters: Innodb_log_wait, Innodb_buffer_ Pool_wait_free
4. Dirty page since it is in buffer pool, if the system suddenly loses power dirty the data changes in the page will be lost
For example, if a user completes an operation (the database has completed a transaction and the page has been modified in buffer pool, but the dirty page has not been flush), the system loses power and the buffer pool data disappears. Will the user's completed operations (resulting in database modifications) be lost? The answer is no (innodb_flush_log_at_trx_commit=1). That's what redo log is going to do, record updates on disk (data in buffer pool is not permanent)
There are two reasons why a system failure causes database inconsistencies:
- An update to the database that has not completed the transaction may have been written to the database
- The update of the committed transaction to the database may still remain in the buffer before it can be written to the database
Here we first say the general method of recovery:
- Scan the log file (from start to finish) to find the transaction that was committed before the failure occurred (there is a BEGIN transaction and commit record) and record its identity in the Redo (redo) queue. Also identify transactions that were not completed at the time of the failure (only begin TRANSACTION, no commit), and mark their identities in (undo) queue
- Revokes transactions for the undo queue. Undo is handled by reversing the scan log file, performing an inverse operation on each undo transaction, and writing the "Pre-update value" in the log record to the database
- Redo the transactions in the Redo log. The redo is handled by a forward scan log, which performs a log file enlistment operation on each redo transaction. Write "updated values" to the database in the log
Innodb_flush_log_at_trx_commit
The default value of 1 means that every transaction commit or out-of-transaction instruction requires the log to be written to (flush) the hard disk, which is time consuming. Especially when using the battery-powered cache (Battery backed up cache). Set to 2 for many applications, especially from the MyISAM table is possible, it means to write to the system cache instead of writing to the hard disk. The log will still flush to the hard disk every second, so you will not typically lose more than 1-2 seconds of updates. Set to 0 will be faster, but the security side is poor, even if MySQL hangs may also lose the transaction data. A value of 2 will only lose data if the entire operating system is hung.
innodb_max_dirty_pages_pct
He is a integer in the range from 0 to 100. The default value is 90. The main thread in InnoDB tries to write pages from the buffer pool so, the percentage of dirty (not yet written) page s won't exceed this value.
Log Buffer
The log is also cached in memory, which is called log buffer. The log files on the disk are called log file. The log file is generally appended, and can be considered sequential, and the disk IO overhead of sequential writes is less than the random write
ACID
Atomicity (atomicity): All operations in a transaction are either complete or do nothing, not just a partial operation. If an error occurs during execution, roll back (Rollback) to the state before the transaction starts, as if the transaction had never been executed
Transactional Persistence (Durability): Once a transaction is complete, all modifications made to the database by the transaction are persisted to the database. To ensure durability, the database system records the modified data completely into persistent storage
Undo Log
Undo log is an important part of the InnoDB MVCC transaction feature. The undo record is generated when we make a change to the record, and the undo record is logged to the system tablespace (Ibdata) by default, but a separate undo tablespace can be used starting with 5.6
The undo record stores the old version of data, and when an old transaction needs to read the data, in order to be able to read the old version of the data, a record that satisfies its visibility needs to be found along the undo chain. When a version chain is long, it can often be thought of as a time-consuming operation (such as bug#69812).
Most change operations on data include Insert/delete/update, where the insert operation is visible only to the current transaction before the transaction commits, so the resulting undo log can be deleted directly after the transaction commits (who will have visibility requirements for the data just inserted)!! ), and for Update/delete to maintain multiple versions of the information, in InnoDB, the update and delete operations generated by the undo log is classified as a class, that is, Update_undo.
Principle
The principle of Undo log is simple, in order to satisfy the atomicity of a transaction, before any data is manipulated, the data is first backed up to a place where the backup of the storage data is called Undo Log. The data is then modified. If an error occurs or the user executes the ROLLBACK statement, the system can use the backup in the Undo log to restore the data to the state before the transaction began.
In addition to guaranteeing the atomicity of transactions, Undo Log can also be used to assist in the persistence of transactions.
Undo log is for the atomicity of transactions, in the MySQL database InnoDB storage engine, also with the undo log to achieve multi-version concurrency control (abbreviation: MVCC).
Simplified process for atomic and persisted transactions with undo log
Suppose there is a, b two data, the values are 1, 2, respectively.
A. Start the transaction. B. Record a=1 to undo log. C. Modify A=3.D. Record b=2 to undo log. E. Modify B=4.f. Write undo log to disk. G. Write data to disk. H. Transaction submission
Here is an implied precondition: ' data is read into memory first, then modify the data in memory, and finally write the data back to disk ', the reason can be guaranteed both atomicity and persistence, because of the following characteristics:
A. Record the undo log before updating the data.
B. To ensure persistence, the data must be written to disk before the transaction commits. As long as the transaction is successfully committed, the data must be persisted.
C. Undo log must be persisted to disk before data. If the system crashes between g,h, undo log is complete and can be used to roll back the transaction.
D. If the system crashes between a-f, the data is not persisted to disk. So the data on the disk remains in the state before the transaction begins.
Flaw: Writes data and undo log to disk before each transaction commits, which results in a lot of disk IO, so performance is low and if the data can be cached for a period of time, the IO can be reduced to improve performance. However, this will lose the persistence of the transaction.
Redo Log
InnoDB transaction log refers to redo log, which is stored in the log file ib_logfile*
Redo log can be configured as multiple files with parameter Innodb_log_files_in_group, and the other parameter innodb_log_file_size represents the size of each file. So the total Redo log size is Innodb_log_files_in_group * Innodb_log_file_size,redo log file ib_logfile[number] name, log directory can be controlled by parameters innodb_log_group_home_dir .
The Redo log is written sequentially to the file, and when it is full, it goes back to the first file and overwrites the write. (But when doing redo checkpoint, the first log file's header checkpoint tag is also updated, so it's strictly not a sequential write)
The Redo log file is written in a loop, always ensuring that the corresponding dirty page has been brushed to disk before overwriting the write. Under very large loads, the Redo log can produce very fast, resulting in frequent brush dirty operations, which in turn leads to degraded performance, usually after the checkpoint log exceeds 76% of the total file size, InnoDB thinks it may be an unsafe point, Preflush dirty pages will be enforced, causing a large number of user threads to live stall. If this scenario is expected, we recommend resizing the redo log file. You can do a clean shutdown and then modify the redo log configuration to restart the instance.
Principle
In contrast to undo log, Redo log is a backup of the new data. You do not need to persist the data until the transaction commits, as long as the redo log is persisted. When the system crashes, the redo log is persisted, although the data is not persisted. The system can restore all data to the latest state based on the contents of the Redo log.
The simplified process of Undo + redo transactions
Suppose there is a, b two data, the values are each.
A. Start the transaction. B. Record a=1 to undo log. C. Modify A=3.D. Log a=3 to redo log. E. Record b=2 to undo log. F. Modify B=4.G. Log b=4 to redo log. H. Write redo log to disk. I. Transaction Submission
Features of Undo + Redo Transaction
A. To ensure persistence, the redo log must be persisted before the transaction is committed.
B. The data does not need to be written to the disk before the transaction commits, but is cached in memory.
C. Redo Log guarantees the persistence of the transaction.
D. Undo Log guarantees the atomicity of the transaction.
E. There is an implied characteristic that data must be written to persistent storage later than redo log.
IO performance
The design of Undo + redo is primarily concerned with improving IO performance. Although the data is cached, the IO is reduced, but a new IO is introduced, that is, the IO that writes redo log. If the IO performance of redo log is not good, it can not be the purpose of lifting high performance. In order to ensure Redo log can have better IO performance, InnoDB's Redo log design has the following features:
A. Try to keep the redo log stored in a contiguous space. Therefore, the space of the log file is fully allocated when the system is first started. Record redo log in sequential append mode to improve performance through sequential IO.
B. Bulk write logs. Instead of writing directly to the file, the log writes redo log buffer. When you need to flush logs to disk (such as transaction commits), write many logs together to disk.
C. Concurrent transaction sharing redo log storage space, their redo log in the order of execution of the statement, sequentially alternating records together to reduce the space occupied by the log. For example, the record in Redo log might be something like this:
Record 1: <trx1, insert ...> record 2: <trx2, update ...> record 3: <trx1, delete ...> record 4: <trx3, update ...> record 5: <trx2, Insert ...>
D. Because of C, when a transaction writes redo log to disk, the log of other uncommitted transactions is also written to disk.
E. Redo logs are only sequentially appended, and when a transaction needs to be rolled back, its Redo log record is not removed from the Redo log.
Recovery (Recovery)It is very necessary to see log in recovery: The data can be recovered when the database has an exception. For exceptions that do not damage the disk drive, recovery is automatic. InnoDB reads the latest checkpoint log records, checks if dirty pages is written to disk before the exception occurs, and if not, reads the log records that affect the page and applies them. This is called "rolling forward". Because there are LSN, InnoDB only need to compare this number to be able to synchronize.
Recovery strategy
The redo log is also logged for uncommitted transactions and rollback transactions, so these transactions are handled in a special process when recovering. There are 2 different recovery strategies:
A. When recovering, only the transactions that have already been committed are re-made.
B. When recovering, redo all transactions include uncommitted transactions and rolled back transactions. The uncommitted transactions are then rolled back through the undo log.
When you use Update, INSERT, DELETE statement, you change the data in two places: Log buffer and buffers. Buffers is a fixed-length block of memory, typically 512 bytes. LOG Buffer DATA Buffer
================= ===============
= Log Record #1 = = Page Header =
= Log Record #2 = = Data Row =
= Log Record #3 = = Data Row =
= Log Record #4 = = Data Row =
================= ===============
For example, after the INSERT into JOBS VALUES statement executes, log buffer adds a new log record, called the log record #5, which contains the contents of a rowid and a new record. At the same time, data buffer will add a new row, but it will also be identified on the page header: The latest log record for the page is the #5 of the log record. In this example, the # # is the log Sequence number (LSN), which is critical for the timing of the next operation. Here are some details of Data-change: 1. An insert log record contains only one new data, which is sufficient for the redo operation on the page and is therefore referred to as a redo entry. 2. LSN is not a field of log records, it is a relative offset value of an absolute address in a file. After InnoDB changed the log buffer and data buffer, the next step was to write the disk. This is a complex place. There are three scenarios in which multiple threads monitor the activity of buffer, ――overflow, checkpoint, and Commit―― can lead to write operations.
what happened in the overflows case?
Overflow is a rare occurrence, as InnoDB uses pro-active measures to prevent the buffers from being filled. But let's take a look at the following two scenarios:
1. If log buffer is full, InnoDB writes log at the end of buffer. So the situation is the same as the following figure (log buffer has only four records of space, now insert fifth record): Log FILE (S) before WRITING log RECORD #5
=================
= Log Record #1 =
= Log Record #2 =
= Log Record #3 =
= Log Record #4 =
=================
Log FILE (S) after WRITING LOG RECORD #5
=================
= Log Record #5 =
= Log Record #2 =
= Log Record #3 =
= Log Record #4 =
=================
Logs cannot grow forever. Even if InnoDB uses some compression algorithms, the log file is too large to fit on any disk drive. So InnoDB takes a circular write, which means that it will overwrite the previous log record.
2. If data buffer is full, InnoDB writes the most recently used buffer to the database, but it cannot be fast enough. In this case, the LSN of the header works, and InnoDB checks to see if its LSN is larger than the LSN of the nearest log record in the log file, and writes the data to disk only when log catches up with data. In other words, the data page does not write to the disk until the corresponding log record needs to be written. This is the Write log policy first.
what happened when checkpoints.
Previously said InnoDB took some pro-active measures to ensure that overflows not occur, the most important measure is checkpointing. There is a separate thread, or a thread that is detached from a set of threads that modify buffers. At a specific time interval, Checkpointer will wake up, check for buffer changes, and ensure that the write operation has occurred.
Most DBMS at this time will have all the buffer written, so that all the changes but not the disk buffer is written. This means that the DBMS will flush all "dirty" buffers through "sharp Checkpoint". But InnoDB only guarantees:
A, log, and data buffers will not exceed a certain limit point;
b, log is always more than data write disk;
C, the header LSN of no data buffer equals the log record that is overwritten. That is to say InnoDB is "Fuzzy Checkpoint".
at commit time, InnoDB will not dirty data page to write to the disk. The reason for this is that it's easy to think that committing a change is about writing everything to a persistent medium. In fact, only log records need to be written. Writing Dirty data page can only occur at overflow or checkpoint moments, because their contents are superfluous. Reference articles
http://www.orczhou.com/index.php/2009/08/innodb-dirty-page-redo-log-2/
Http://mysqldump.azundris.com/archives/78-Configuring-InnoDB-An-InnoDB-tutorial.html
Http://dev.mysql.com/doc/refman/5.0/en/innodb.html
http://mysql.taobao.org/monthly/2015/06/01/
http://mysql.taobao.org/monthly/2015/04/01/
http://mysql.taobao.org/monthly/2015/05/01/
"mSQL" Redo and undo log