[MySQL] redo and undo log


InnoDB has two very important logs: the undo log and the redo log. The former ensures transaction atomicity and supports InnoDB's MVCC; the latter ensures transaction durability. Like most relational databases, InnoDB records physical changes to data files and guarantees that the log is always written first, the so-called WAL (Write-Ahead Log): before a data file is persisted, all the corresponding redo log records must already have been written to disk.

I. Concepts

1. InnoDB Crash Recovery

This is a feature of the InnoDB engine. When the service restarts after a failure, recovery runs automatically and restores the database to a consistent state (not all logs need to be replayed; only the log after the last flush point, called the checkpoint, must be applied). Recovery takes two steps.

Step 1: scan the redo log and redo all transactions that had completed and committed before the failure;

Step 2: use the undo log to roll back all unfinished transactions.

2. LSN

The LSN (log sequence number) records the position in the log stream. It is a monotonically increasing unsigned long integer.

In InnoDB's log system, the LSN is everywhere: it marks the log position at which a dirty page was modified, it records the checkpoint, and it can be used to locate positions within the redo log files.

The LSN is expressed as a byte offset. Each page has an LSN, the redo log has an LSN, and the checkpoint has an LSN. You can run the command show engine innodb status to observe them:

---
LOG
---
Log sequence number 33646077360
Log flushed up to   33646077360
Last checkpoint at  33646077360
0 pending log writes, 0 pending chkp writes
49687445 log i/o's done, 1.25 log i/o's/second
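For example, the checkpoint age (the amount of redo that would have to be replayed after a crash) can be computed directly from these numbers: Log sequence number - Last checkpoint at = 33646077360 - 33646077360 = 0 bytes, meaning every change up to the current LSN has already been checkpointed.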

To manage dirty pages, each Buffer Pool instance maintains a flush list; pages on the flush list are ordered by the LSN at which they were modified. Therefore, when a redo checkpoint is performed, the LSN chosen is always that of the oldest dirty page (smallest LSN) across the flush lists of all Buffer Pool instances. Because the WAL policy is used, every transaction commit must persist its redo log to guarantee the transaction is not lost, while dirty-page flushing is delayed so that multiple modifications to a page can be merged, avoiding the performance cost of frequently writing the data files.

3. Dirty page

In InnoDB, writing data into the buffer pool accelerates data processing but causes temporary inconsistency between memory and disk (RAM vs DISK).

1. For every user request that changes data, the InnoDB engine loads the affected data and index pages into the in-memory buffer pool. If the data and indexes had to be written back to disk on every modification, the number of I/O requests would increase enormously.

Because the location of each update is random, the disk head would have to seek frequently, resulting in low efficiency. Keeping the data temporarily in memory also increases read speed to some extent. Therefore, after processing a request (transaction), InnoDB only appends a log record, and a separate thread reads the log files and batch-applies the updates to disk, which is the most efficient way to write to disk. Since InnoDB uses a memory buffer to improve response speed, it also introduces inconsistency between memory and disk. The inconsistent data is called dirty data, and MySQL calls such a page a dirty page.

How dirty pages arise: when a transaction needs to modify a record (row), InnoDB reads the page containing that data from disk into the buffer pool and modifies the record (row) in the in-memory page. At this point, the page in the buffer pool differs from the one on disk; the data in memory is called dirty data (a dirty page).

2. When a transaction commits, its change operations are immediately recorded in the redo log. Therefore, even if the dirty pages in the buffer pool are lost in a power failure, InnoDB will recover the data from the redo log records at startup. Another function of the redo log is to minimize random disk writes by delaying the flushing of dirty pages (the redo log merges all changes made to a page over a period of time).

3. When are dirty pages normally flushed to disk?

  • The redo log is a ring structure. When the redo space is full, some dirty pages are flushed to disk so that part of the redo log can be released. You can observe this with the Innodb_log_waits status counter (show global status); it is incremented each time this happens.
  • When a page must be allocated in the buffer pool but the pool is full and every page is dirty (otherwise a clean page could simply be evicted), dirty pages must be flushed to disk; this rarely happens. Such events are recorded in Innodb_buffer_pool_wait_free. More generally, the innodb_max_dirty_pages_pct parameter controls this situation: when the proportion of dirty pages in the buffer pool reaches this value, a checkpoint is forced and dirty pages are flushed to disk.
  • When the system is idle, dirty pages are flushed in batches of 64 pages.

Related InnoDB configuration parameters: innodb_flush_log_at_trx_commit, innodb_max_dirty_pages_pct. Status variables: Innodb_log_waits, Innodb_buffer_pool_wait_free.
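You can inspect these counters and settings directly; a minimal sketch (standard MySQL statements, the values shown will of course differ per server):

show global status like 'Innodb_log_waits';             -- waits caused by a full log buffer
show global status like 'Innodb_buffer_pool_wait_free'; -- waits for a free buffer pool page
show variables like 'innodb_flush_log_at_trx_commit';
show variables like 'innodb_max_dirty_pages_pct';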

4. Since dirty pages live in the buffer pool, will the modifications in a dirty page be lost if the system suddenly loses power?

For example, suppose a user completes an operation (the database commits a transaction; the page has been modified in the buffer pool, but the dirty page has not yet been flushed) and then the system loses power, so all the buffer pool data disappears. Is the user's completed operation (and the resulting database modification) lost? The answer is no (given innodb_flush_log_at_trx_commit = 1). This is exactly what the redo log is for: the updates are recorded on disk, whereas the data in the buffer pool is not durable.

There are two reasons for Database Inconsistency caused by system faults:

  • Uncommitted transactions may have already written updates into the database files.
  • Updates made by committed transactions may still be sitting in the buffer, not yet written to the database files.

Here is the general recovery procedure:

  • Scan the log file (from beginning to end) to find the transactions that committed before the failure (those with both a begin-transaction record and a commit record) and mark them for redo; at the same time, find the transactions that were still unfinished at the time of the failure (a begin-transaction record but no commit) and record their identifiers in an undo queue.
  • Undo all transactions in the undo queue: scan the log file backwards and perform the reverse of each update operation of every undo transaction, writing the "pre-update value" from the log record back to the database.
  • Redo all transactions marked for redo: scan the log forwards and re-execute each operation registered in the log file for every redo transaction, writing the "post-update value" from the log to the database.

innodb_flush_log_at_trx_commit

The default value 1 means the log is written and flushed to disk at every transaction commit (and for every statement outside a transaction). This is the safest setting but also time-consuming, unless the disk has a battery-backed write cache. A value of 2 suits many applications, especially those converted from MyISAM tables: the log is written to the operating system cache at each commit rather than flushed to disk, and is flushed to disk roughly once per second, so normally no more than 1-2 seconds of updates can be lost. A value of 0 is faster but less safe: even a MySQL crash alone can lose transaction data. With value 2, data can only be lost if the entire operating system goes down.
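The variable is a dynamic global, so the durability/speed trade-off can be adjusted at runtime; a minimal sketch (choose one):

set global innodb_flush_log_at_trx_commit = 1;  -- flush redo to disk at every commit (safest, the default)
set global innodb_flush_log_at_trx_commit = 2;  -- write to the OS cache at commit, flush about once per second
set global innodb_flush_log_at_trx_commit = 0;  -- write and flush about once per second (fastest, least safe)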

innodb_max_dirty_pages_pct

This is an integer in the range from 0 to 100; the default value is 90. The main thread in InnoDB tries to write pages from the buffer pool so that the percentage of dirty (not yet written) pages does not exceed this value.

4. ACID

Atomicity: either all operations in a transaction complete, or none are performed. If an error occurs during execution, the database rolls back to the state before the transaction started, as if the transaction had never been executed.

Durability: once a transaction completes, all the modifications it made to the database are saved permanently. To guarantee durability, the database system must record the modified data in persistent storage.

5. Log buffer

Logs are also cached in memory, in what is called the log buffer; the log files on disk are called log files. Log file writes are usually appends and can be considered sequential writes, whose disk I/O cost is much lower than that of random writes.

When a data block in the buffer cache is modified, the server process generates redo data and writes it into the redo log buffer. The log writer (LGWR, a name this description borrows from Oracle; InnoDB's log writer plays the same role) writes entries from the redo log buffer to the online redo log when any of the following conditions is met:

  • The redo log buffer becomes 1/3 full
  • Every 3 seconds (timeout)
  • The data in the log buffer reaches 1 MB
  • A transaction commits

If the system has a fast processor but a disk with relatively low I/O speed, the processor may fill the remaining buffer space faster than LGWR can drain it, forcing LGWR to move part of the buffered data to disk. In this case, a larger log buffer can temporarily mask the effect of the slow disks. Alternatively, you can:

  • Improve the checkpoint or archiving process
  • Improve LGWR performance (maybe you can store all online redo logs to faster bare devices)

Rational use of redo log buffer

  • Use batch commits for bulk operations, so that LGWR can write redo entries to the online redo log files more efficiently.
  • Use NOLOGGING when loading large amounts of data.

Set Log Buffer

The redo log buffer size is determined by the initialization parameter LOG_BUFFER; modifying it requires restarting the instance. An appropriate redo log buffer value can significantly improve system throughput, especially for workloads with heavy inserts, updates, and deletes. The default log buffer size is MAX(0.5 MB, 128 KB * number of CPUs). The default is usually sufficient; increasing the log buffer does not negatively affect performance or recoverability, it merely uses additional memory.

Generally, a log buffer of 3-5 MB is enough; going beyond that is just a waste of memory, although making it too small can hurt performance. Now that memory is no longer very expensive, if you have a large number of big transactions, set log_buffer to 5 MB.
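Note that LOG_BUFFER is the Oracle parameter name; in MySQL/InnoDB the corresponding setting is innodb_log_buffer_size. A minimal sketch of checking and configuring it (5M simply mirrors the sizing advice above and is only an illustrative value):

show variables like 'innodb_log_buffer_size';

# my.cnf (takes effect after a restart; illustrative value)
[mysqld]
innodb_log_buffer_size = 5M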

II. Undo Log

The undo log is an important part of InnoDB's MVCC and transaction features. Whenever we change a record, an undo record is generated. Undo records are stored in the system tablespace (ibdata) by default, but from MySQL 5.6 onward an independent undo tablespace can be used instead.

The old version of the data is stored in the undo record. When an older transaction needs to read the data, it must follow the undo chain to find a version that satisfies its visibility rules; when the version chain is very long, this is generally a time-consuming operation (see, for example, bug #69812).
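A minimal sketch of how MVCC reads an old version through the undo chain (the table name t and the two-session choreography are hypothetical; the default REPEATABLE READ isolation level is assumed):

create table t (id int primary key, val int) engine=InnoDB;
insert into t values (1, 1);

-- session 1
start transaction;
select val from t where id = 1;     -- reads 1

-- session 2 (runs concurrently)
update t set val = 9 where id = 1;  -- the old value 1 is preserved in an undo record
commit;

-- session 1, again
select val from t where id = 1;     -- still reads 1, reconstructed via the undo chain
commit;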

Most data changes are INSERT, DELETE, or UPDATE operations. An INSERT is visible only to the current transaction until it commits, so the undo log it generates can be deleted as soon as the transaction commits (nobody else needs visibility into freshly inserted data!). For UPDATE and DELETE operations, multi-version information must be maintained; in InnoDB, the undo logs generated by UPDATE and DELETE are grouped into a single type, update_undo.

Principle

The principle of the undo log is very simple: to satisfy transaction atomicity, before any data is modified it is first backed up to another place (the area storing this backup is called the undo log), and only then is the data modified. If an error occurs or the user executes a ROLLBACK statement, the system can use the backup in the undo log to restore the data to the state before the transaction started.

In addition to ensuring the atomicity of transactions, Undo logs can also be used to assist in transaction persistence.

The undo log is used to implement transaction atomicity; in MySQL's InnoDB storage engine, the undo log is also used to implement multi-version concurrency control (MVCC).

A simplified process for atomic and durable transactions using only the Undo Log

Assume there are two records, A and B, with values 1 and 2 respectively.

A. The transaction starts.
B. Record A = 1 to the undo log.
C. Modify A = 3.
D. Record B = 2 to the undo log.
E. Modify B = 4.
F. Write the undo log to disk.
G. Write the data to disk.
H. The transaction commits.

There is an implicit precondition here: data is read into memory first, modified in memory, and finally written back to disk. This process guarantees both atomicity and durability because it has the following features:

A. The undo log is recorded before the data is updated.
B. To guarantee durability, the data must be written to disk before the transaction commits; as long as the transaction commits successfully, the data has been persisted.
C. The undo log must be persisted to disk before the data. If the system crashes between G and H, the undo log is complete and can be used to roll the transaction back.
D. If the system crashes between A and F, the data has not been persisted to disk, so the data on disk remains in the state it was in before the transaction started.
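The same A/B example can be acted out in SQL; a minimal sketch (table and column names are hypothetical) showing the undo log restoring the pre-transaction state on rollback:

create table ab (name char(1) primary key, val int) engine=InnoDB;
insert into ab values ('A', 1), ('B', 2);

start transaction;
update ab set val = 3 where name = 'A';  -- the old value A = 1 is recorded in the undo log first
update ab set val = 4 where name = 'B';  -- the old value B = 2 is recorded in the undo log first
rollback;                                -- the undo records restore A = 1, B = 2

select * from ab;                        -- back to the state before the transaction started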

Defect: before each transaction commits, both the data and the undo log must be written to disk, which causes a large amount of disk I/O, so performance is very low. If the data could be cached in memory for a while, disk I/O could be reduced and performance improved; however, transaction durability would then be lost. This is the problem the redo log solves.

III. Redo Log

The transaction log of InnoDB is the redo log, stored in the log files ib_logfile*.

The redo log can be split across multiple files via the innodb_log_files_in_group parameter, and innodb_log_file_size sets the size of each file, so the total redo log size is innodb_log_files_in_group * innodb_log_file_size. The files are named ib_logfile[number], and the log directory is controlled by the parameter innodb_log_group_home_dir.
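A sketch of the corresponding configuration (the values are purely illustrative, not a recommendation):

# my.cnf
[mysqld]
innodb_log_group_home_dir = /var/lib/mysql
innodb_log_files_in_group = 2
innodb_log_file_size      = 512M   # total redo capacity = 2 * 512M = 1G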

Redo logs are written to the files sequentially; when the last file is full, writing wraps back to the first file and overwrites it. (Strictly speaking, the writing is not purely sequential, because performing a redo checkpoint also updates the checkpoint mark in the header of the first log file.)

Because the redo log files are written cyclically, InnoDB must ensure, before overwriting a region, that the corresponding dirty pages have been flushed to disk. Under very heavy load, redo may be generated so fast that dirty pages are flushed constantly, degrading performance. In general, once the un-checkpointed portion of the log exceeds 76% of the total file size, InnoDB considers the situation potentially unsafe and forcibly preflushes dirty pages, stalling large numbers of user threads. If this is expected under your workload, we recommend increasing the redo log file size: perform a clean shutdown, modify the redo log configuration, and restart the instance.
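To make the 76% figure concrete: with the illustrative configuration above (2 files of 512M, i.e. 1 GB of total redo space), once roughly 0.76 GB of redo has accumulated since the last checkpoint, InnoDB begins forcibly preflushing dirty pages; a workload that regularly produces redo faster than checkpointing can keep up with is therefore a candidate for larger log files.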

Principle

In contrast to the undo log, the redo log records a backup of the new data. Before a transaction commits, only the redo log needs to be persisted; the data itself does not. If the system crashes, the data may not be persistent, but the redo log is, so the system can recover all data to its latest state from the redo log contents.

Simplified transaction process with Undo + Redo

Assume there are two records, A and B, with values 1 and 2 respectively.

A. The transaction starts.
B. Record A = 1 to the undo log.
C. Modify A = 3.
D. Record A = 3 to the redo log.
E. Record B = 2 to the undo log.
F. Modify B = 4.
G. Record B = 4 to the redo log.
H. Write the redo log to disk.
I. The transaction commits.

Features of Undo + Redo transactions

A. To guarantee durability, the redo log must be persisted before the transaction commits.
B. Data is kept in memory rather than written to disk before the transaction commits.
C. The redo log guarantees transaction durability.
D. The undo log guarantees transaction atomicity.
E. There is an implicit feature: data reaches persistent storage later than the corresponding redo log.
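Mapped onto SQL, the sequence A-I is simply an ordinary transaction; a sketch (it reuses the hypothetical table ab from the undo-log example above, and steps H/I assume innodb_flush_log_at_trx_commit = 1):

start transaction;                       -- A
update ab set val = 3 where name = 'A';  -- B, C, D: undo record, page change, redo record
update ab set val = 4 where name = 'B';  -- E, F, G
commit;                                  -- H, I: redo log flushed to disk, then the commit returns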

IV. IO Performance

The Undo + Redo design is mainly about improving I/O performance. Although caching data reduces the I/O of writing data pages, it introduces new I/O: writing the redo log. If redo log I/O were slow, overall performance could not improve. To guarantee that the redo log has good I/O performance, InnoDB's redo log design has the following features:

A. Keep the redo log in a contiguous region of storage as far as possible. For this reason, the log file space is fully allocated when the system starts for the first time, and redo records are written in sequential append mode, gaining performance from sequential I/O.
B. Write logs in batches. Log records are not written directly to the files; they are first written to the redo log buffer, and when the log needs to be flushed to disk (for example, at transaction commit), many records are written together.
C. Concurrent transactions share the redo log's storage space; their records are interleaved in the order the statements execute, which reduces the space the log occupies. For example, the redo log contents might look like this:

Record 1: <trx1, insert...>
Record 2: <trx2, update...>
Record 3: <trx1, delete...>
Record 4: <trx3, update...>
Record 5: <trx2, insert...>

D. Because of C, when one transaction writes its redo log to disk, the log records of other, uncommitted transactions are written along with it.
E. The redo log is only ever appended sequentially; when a transaction is rolled back, its redo log records are not deleted from the redo log.

V. Recovery

During recovery we can see why the log is essential: when a database failure occurs, the data can be restored from it. For failures that do not damage the disk drive, recovery is automatic. InnoDB reads the latest checkpoint log record and checks whether the dirty pages were written to disk before the failure; if they were not, it reads the log records that affect those pages and applies them. This is called "rolling forward". Because every page carries an LSN, InnoDB only needs to compare LSNs to know what to synchronize.

Recovery Policy

As mentioned above, uncommitted transactions and rolled-back transactions also have records in the redo log, so these transactions need special handling during recovery. There are two different recovery policies:

A. Only committed transactions are redone during restoration.
B. During recovery, redo all the transactions, including uncommitted transactions and rolled back transactions. Then, roll back the uncommitted transactions through the Undo Log.

When you use UPDATE, INSERT, and DELETE statements to update data, you change it in two places: the log buffer and the data buffers. Buffers are fixed-length blocks of memory, usually 512 bytes.

   LOG BUFFER               DATA BUFFER
=======================  =======================
= Log Record #1       =  = Page Header         =
= Log Record #2       =  = Data Row            =
= Log Record #3       =  = Data Row            =
= Log Record #4       =  = Data Row            =
=======================  =======================

For example, after an insert into jobs values (..., 3) statement executes, the log buffer gains a new log record, call it Log Record #5, containing a rowid and the content of the new record. At the same time, the data buffer gains a new row, and the page header is updated to note that the latest log record for this page is Log Record #5. In this example, #5 is the Log Sequence Number (LSN), which is crucial for ordering subsequent operations. Two details about this data change:

1. An INSERT log record contains only the new data, which is enough to re-perform the operation on the page; it is therefore called a redo entry.
2. The LSN is not a field of the log record; it is the relative offset of an absolute address within the log file.

After InnoDB changes the log buffer and the data buffer, the next step is writing to disk, and this is where things get complicated. Multiple threads monitor buffer activity, and three situations, overflow, checkpoint, and commit, can lead to disk writes.

What happens at overflow?

Overflow is rare because InnoDB takes proactive measures to prevent the buffers from filling up. Still, consider the following two situations:

1. If the log buffer fills up, InnoDB must write it out to the log files. Because the log files are written circularly, the new record overwrites the oldest one. In the figures below, the log file holds only four records, and a fifth record is now written:

Log file(s) before writing Log Record #5
========================
= Log Record #1 =
= Log Record #2 =
= Log Record #3 =
= Log Record #4 =
========================

Log file(s) after writing Log Record #5
========================
= Log Record #5 =
= Log Record #2 =
= Log Record #3 =
= Log Record #4 =
========================
Logs cannot grow forever. Even though InnoDB uses some compression, log files would eventually become too large for any disk drive if they grew without bound. Therefore, InnoDB writes them circularly, overwriting the oldest log records.

2. If the data buffer is full, InnoDB writes the least recently used buffers to the database files, but it cannot always do so immediately. This is where the page-header LSN comes in: InnoDB checks whether the LSN of the page header is greater than the LSN of the most recent log record in the log file, and writes the data page to disk only once the log has caught up with the data. In other words, a data page is never written to disk before the corresponding log records are. This is the write-ahead logging policy.

What happens at a checkpoint?

How many pages does the checkpoint mechanism flush at a time, where are the dirty pages taken from, and when is a flush triggered? These are complex questions. There are two kinds of checkpoint: the Sharp Checkpoint and the Fuzzy Checkpoint.

A Sharp Checkpoint flushes all dirty pages back to disk when the database shuts down.

A Fuzzy Checkpoint flushes only some of the dirty pages while the system is running. Dirty pages are flushed in the following situations:

A. Master Thread Checkpoint

The Master Thread flushes a certain percentage of pages from the buffer pool's dirty page list back to disk every second or every ten seconds. This process is asynchronous and does not block query threads.

B. FLUSH_LRU_LIST Checkpoint

InnoDB must ensure that there are roughly 100 free pages in the LRU list. Before InnoDB 1.1.x, the user query thread itself checked whether the LRU list had enough free pages; if not, pages at the tail of the LRU list were evicted, and if any of the evicted pages were dirty, a checkpoint was forced to flush them to disk, which obviously blocked the user query thread. From InnoDB 1.2.x, this check runs in a separate Page Cleaner thread, and the parameter innodb_lru_scan_depth controls the number of available pages in the LRU list, with a default of 1024.
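The depth parameter can be inspected and tuned like any other global variable; a minimal sketch (2048 is only an illustrative value):

show variables like 'innodb_lru_scan_depth';
set global innodb_lru_scan_depth = 2048;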

C. Async/Sync Flush Checkpoint

When the redo log files become unavailable for reuse, some pages must be forcibly flushed from the dirty page list back to disk, guaranteeing that the redo log files can be used cyclically. Before InnoDB 1.2.x, an Async Flush Checkpoint blocked the user query thread that discovered the problem, and a Sync Flush Checkpoint blocked all query threads; from InnoDB 1.2.x, both run in the separate Page Cleaner thread.

D. Dirty Page too much Checkpoint

When there are too many dirty pages, the InnoDB engine forces a checkpoint, the purpose being to guarantee that enough free pages remain available in the buffer pool. The threshold is set via the parameter innodb_max_dirty_pages_pct:

mysql> show variables like 'innodb_max_dirty_pages_pct';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| innodb_max_dirty_pages_pct | 90    |
+----------------------------+-------+
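The threshold is a dynamic global variable, so it can be lowered at runtime to make flushing more aggressive; a sketch (75 is only an illustrative value):

set global innodb_max_dirty_pages_pct = 75;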

As mentioned above, InnoDB takes proactive measures to ensure that overflows do not occur, the most important of which is checkpointing. A separate thread, or a thread separated from the group of threads that modify the buffers, wakes up at specific intervals, checks the buffer changes, and guarantees that the necessary disk writes have occurred.

Most DBMSs write the entire set of changed buffers to disk at this point, which guarantees that every buffer that was changed but not yet written has reached the disk; in other words, the DBMS flushes all "dirty" buffers with a Sharp Checkpoint. InnoDB, however, only guarantees that:

A. The log and data buffers do not fill beyond a certain limit point;

B. Logs are always written to disk before the corresponding data;

C. No data buffer page has a page-header LSN pointing to a log record that is about to be overwritten. In other words, InnoDB performs a "Fuzzy Checkpoint".

During COMMIT, InnoDB does not write dirty data pages to disk. It is tempting to think that committing a change means writing everything to persistent media, but in fact only the log records need to be written; writing the dirty data pages happens only at overflow or checkpoint time, because their content is redundant (the log can reproduce it).

References

http://www.orczhou.com/index.php/2009/08/innodb-dirty-page-redo-log-2/
http://mysqldump.azundris.com/archives/78-Configuring-InnoDB-An-InnoDB-tutorial.html
http://dev.mysql.com/doc/refman/5.0/en/innodb.html
http://mysql.taobao.org/monthly/2015/06/01/
http://mysql.taobao.org/monthly/2015/04/01/
http://mysql.taobao.org/monthly/2015/05/01/
http://my.oschina.net/jockchou/blog/478162
http://www.mysqlops.com/2011/12/20/understanding_index2.html

 
