MySQL & #183; Engine Features & #183; InnoDB redo log roaming

Last Update:2016-03-06 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

MySQL Engine Features-InnoDB redo log roaming
Preface

InnoDB has two very important logs: undo log and redo log. The former is used to ensure the atomicity of transactions and the MVCC of InnoDB, and the latter is used to ensure transaction persistence.

Like most relational databases, InnoDB records physical changes to data files and ensures that logs are always logged first, that is, the so-called WAL, before the persistent data files, ensure that the previous redo logs have been written to the disk.

The LSN (log sequence number) is used to record the log sequence number. It is an increasing unsigned long integer. In the log system of InnoDB, LSN is everywhere. It is used to indicate the log sequence number when a dirty page is modified, and also to record the checkpoint, you can locate the location in the redo log file.

To manage dirty pages, each instance in the Buffer Pool maintains a flush list. pages on the flush list are sorted by the LSN of the modified pages. Therefore, when redo checkpoint is regularly performed, the selected LSN is always the oldest page (with the smallest LSN) on the flush list of all bp instances ). Because the WAL policy is adopted, each transaction commit requires a persistent redo log to ensure that the transaction is not lost. The delayed dirty page refreshing results in merging multiple modifications to avoid performance problems caused by frequent data file writing.

Because the features of the InnoDB log group have been discarded (multiple copies of redo logs are written), the InnoDB archive log feature is also completely removed in 5.7, this document ignores the related logic. In addition, due to space limitations, the InnoDB crash recovery logic will be described in the next phase. This article focuses on the lifecycle of redo logs and some improvements to MySQL 5.7.

This article is based on the latest MySQL 5.7.7-RC version.

MySQL InnoDB table-Basic BTree Data Structure

Optimization of the count (*) function in the InnoDB Storage engine of MySQL

MySQL InnoDB Storage engine lock mechanism Experiment

Startup, shutdown, and restoration of the InnoDB Storage Engine

MySQL InnoDB independent tablespace Configuration

Architecture of MySQL Server layer and InnoDB Engine Layer

InnoDB deadlock Case Analysis

MySQL Innodb independent tablespace Configuration

InnoDB log files

The redo log of InnoDB can use parametersinnodb_log_files_in_groupConfigure multiple files with another parameterinnodb_log_file_sizeThe size of each file. Therefore, the total redo log size isinnodb_log_files_in_group * innodb_log_file_size.

The Redo log file usesib_logfile[number]The log directory can be named by Parametersinnodb_log_group_home_dirControl. Redo logs are written to the file in sequence. When the file is full, it is traced back to the first file for overwrite. (However, when performing redo checkpoint, it will also update the header checkpoint mark of the first log file, so strictly speaking, it is not sequential writing ).

Inside InnoDB, logicallyib_logfileIt is treated as a file and corresponds to the same space id. Because a 512-byte block is used to write data to a file, you can conveniently calculate the file to be written and the corresponding offset based on the globally maintained LSN.

Redo log files are written cyclically. Before overwriting, ensure that the corresponding dirty pages have been flushed to the disk. Under a very large load, Redo logs may generate very fast speeds, resulting in frequent dirty operations, leading to performance degradation, generally, after the log with no checkpoint exceeds 76% of the total file size, InnoDB deems this may be an insecure point and will forcibly preflush dirty pages, resulting in a large number of user threads stall. If this is expected, we recommend that you increase the size of the redo log file. You can perform a clean shutdown, modify the Redo log configuration, and restart the instance.

In addition to the redo log file, InnoDB also has other log files, such as intermediate log files generated to ensure the truncate operation, including the truncate innodb table and truncate undo log tablespace, an intermediate file is generated to identify whether the operations are successful or failed. If the truncate is not completed, redo the operations during crash recovery. Interestingly, according to the official worklog description, the truncate operation was initially atomic by adding a new redo log type, but I don't know why I changed it to the log file method later. Maybe it's because of the compatibility of earlier versions.

Key struct log_sys object

log_sysIt is the hub and core object of the InnoDB log system and controls core functions such as log copying, writing, and checkpoint. It is also a hotspot module for large write load scenarios. It is the hub connecting InnoDB log files and log buffer, and the corresponding struct islog_t.

The member variables related to the redo log file include:

Variable name	Description
Log_groups	Log group. The current version only supports one set of logs. The corresponding type is`log_group_t`Contains the number of files in the current log group, the size of each file, and the space id.
Lsn_t log_group_capacity	Indicates the total capacity of the current log file. The value is: (total size of the Redo log file-Number of redo files * LOG_FILE_HDR_SIZE) * 0.9, and LOG_FILE_HDR_SIZE is 4*512 bytes.
Lsn_t max_modified_age_async	Asynchronous preflush dirty page point
Lsn_t max_modified_age_sync	Synchronize preflush dirty page points
Lsn_t max_checkpoint_age_async	Asynchronous checkpoint
Lsn_t max_checkpoint_age	Synchronize checkpoints

For the calculation methods of the above several sync/async points, refer to the Functionlog_calc_max_agesTake the following instance configuration as an example:

Innodb_log_files_in_group = 4innodb_log_file_size = 4g Total file size: 17179869184

Variable values of each member and the proportion to the total file size:

log_sys->log_group_capacity = 15461874893 (90%)log_sys->max_modified_age_async = 12175607164 (71%)log_sys->max_modified_age_sync = 13045293390 (76%)log_sys->max_checkpoint_age_async = 13480136503 (78%)log_sys->max_checkpoint_age = 13914979615 (81%)

Generally:

The distance between the oldest lsn before Dangdang and the current lsn is greatermax_modified_age_async(71%), and the option is enabledinnodb_adaptive_flushingThe page cleaner thread will try to do more dirty page flush work to avoid dirty page accumulation.
The distance between the oldest lsn before Dangdang and the current Lsn is greatermax_modified_age_sync(76%), the user thread needs to do synchronous dirty brush, this is a critical point of performance decline, will greatly affect the overall throughput and response time.
When the lsn of the last checkpoint and the current lsn exceedmax_checkpoint_age(81%). The user thread needs to perform a checkpoint synchronously and wait until the checkpoint write is completed.
When the distance between the lsn of the last checkpoint and the current lsn exceedsmax_checkpoint_age_async(78%) but lessmax_checkpoint_age(81%), the user thread performs an asynchronous checkpoint (the background asynchronous thread executes the CHECKPOINT information write operation), without waiting for the checkpoint to complete.

log_group_tThe main structure members are shown in the following table:

Variable name	Description
Ulint n_files	Number of Ib_logfile files
Lsn_t file_size	File Size
Ulint space_id	Space id of the Redo log, fixed size, value: SRV_LOG_SPACE_FIRST_ID
Ulint state	LOG_GROUP_ OK or log_group_upted
Lsn_t lsn	Lsn written in the group
Lsn_t lsn_offset	File offset corresponding to the above lsn
Byte ** file_header_bufs	Buffer area, used to set the log file header information and write it to ib logfile. When you switch to a new ib_logfile, update the starting lsn of the file and write it to the header. The header information also contains: LOG_GROUP_ID, LOG_FILE_START_LSN (Starting lsn of the current file), LOG_FILE_WAS_CREATED_BY_HOT_BACKUP (function: log_group_file_header_flush)
Lsn_t scanned_lsn	Used to assist in recording the scanned lsn during crash recovery
Byte * checkpoint_buf	Checkpoint buffer, used to write checkpoint information to log files (detailed description below)

The member variables related to the redo log memory buffer include:

Variable name	Description
Ulint buf_free	Currently idle writable location in Log buffer
Byte * buf	Log buffer start position pointer
Ulint buf_size	The Log buffer size, which is controlled by the innodb_log_buffer_size parameter, but may be automatically extend.
Ulint max_buf_free	The value is log_sys-> buf_size/LOG_BUF_FLUSH_RATIO-LOG_BUF_FLUSH_MARGIN, of which: Limit = 2, LOG_BUF_FLUSH_MARGIN = (4512 + 4 page_size), and page_size is 16 k by default, when buf_free exceeds this value, the user thread may be triggered to write redo. After the transaction copies the redo to the buffer, the user thread will also judge this value. If the value exceeds buf_free, set log_sys-> check_flush_or_checkpoint to true.
Ulint buf_next_to_write	Log buffer offset. the starting position of the next redo file write, that is, the end position of the current write.
Volatile bool is_extending	Whether the Log buffer is being extended (to prevent too many redo log entries from being written to the buffer). In fact, when the length of the written redo log exceeds buf_size/2, it will call the log_buffer_extend function, once the Buffer is extended, it will not be reduced back!
Ulint write_end_offset	Offset of the end position of the write operation (logically redundant, use log_sys-> buf_free)

Member variables related to the Checkpoint:

Variable name	Description
Ib_uint64_t next_checkpoint_no	This value is incremented every time the checkpoint is completed.
Lsn_t last_checkpoint_lsn	The lsn of the last checkpoint. Each checkpoint is completed, the value of next_checkpoint_lsn is assigned to last_checkpoint_lsn.
Lsn_t next_checkpoint_lsn	The lsn of the next checkpoint (the lsn of the checkpoint initiated this time)
Mtr_buf_t * append_on_checkpoint	5.7. When DDL is performed (for example, adding or deleting columns), The buf containing the MLOG_FILE_RENAME2 log is first mounted to this variable. After the DDL operation is complete, clear it again. (Log_append_on_checkpoint), mainly to prevent data dictionary inconsistency generated by crash during DDL. This variable is added to the following commit: a5ecc38f44abb66aa2024c70e37d1f4aa4c8ace9
Ulint n_pending_checkpoint_writes	If the value is greater than 0, a checkpoint write operation is in progress. This value is incremented when the user initiates a checkpoint. This value (log_io_complete) is decreased after the background thread completes checkpoint writing)
Rw_lock_t checkpoint_lock	Checkpoint lock. The x lock is required each time the checkpoint information is written. The x lock is released by an asynchronous io thread.
Byte * checkpoint_buf	Checkpoint information buffer. Before each checkpoint, write the buf and then fl the buf to the disk.

Other status variables

Variable name	Description
Bool check_flush_or_checkpoint	When this variable is set, the user thread may need to check and release log buffer, or perform preflush and checkpoint to prevent insufficient Redo space.
Lsn_t write_lsn	The last LSN written to the file
Lsn_t current_flush_lsn	LSN to which fsync is currently in progress
Lsn_t flushed_to_disk_lsn	Last time fsync was completed to the LSN of the file
Ulint n_pending_flushes	Indicates the redo fsync of pending. The maximum value is 1.
OS _event_t flush_event	If fsync is in progress and the request is also fsync, you need to wait until the last fsync operation is completed.

The relationship between log_sys and log files and the log buffer can be expressed as follows:

Mini transaction

Mini transaction (mtr) is the smallest transaction Unit for InnoDB to operate physical data files. It is used to manage Page locking, modification, release, and commit logs to the public buffer. An mtr operation must be atomic. A transaction can contain multiple mtr instances. After each mtr is complete, copy the locally generated logs to the public buffer, and place the modified dirty pages on the flush list.

The class corresponding to the mtr transaction ismtr_t,mtr_t::ImplInformation about the current mtr is saved, including:

Variable name	Description
Mtr_buf_t m_memo	Used to store the lock type held by the mtr
Mtr_buf_t m_log	Store redo log records
Bool m_made_dirty	Whether at least one dirty page is generated
Bool m_inside_ibuf	Whether the change buffer operation is in progress
Bool m_modifications	Whether the buffer pool page has been modified
Ib_uint32_t m_n_log_recs	Number of mtr log records
Mtr_log_t m_log_mode	MTR_LOG_ALL: default mode, which records all operations that modify disk data. MTR_LOG_NONE: Record No redo, dirty pages are not stored on the flush list, And MTR_LOG_NO_REDO: redo records are not recorded, but dirty pages are put on the flush list; MTR_LOG_SHORT_INSERTS: insert REDO records, which are used to copy records from one page to another new page, in this case, the write index information is ignored in the redo log. (Refer to the page_cur_insert_rec_write_log function)
Fil_space_t * m_user_space	User tablespace modified by current mtr
Fil_space_t * m_undo_space	Undo tablespace modified by the current mtr
Fil_space_t * m_sys_space	System tablespace modified by the current mtr
Mtr_state_t m_state	There are four statuses: MTR_STATE_INIT, MTR_STATE_COMMITTING, and MTR_STATE_COMMITTED.

When modifying or reading data in a data file, mtr is generally used to control the locking of the corresponding page or index tree. In 5.7, there are several lock types (mtr_memo_type_t):

Variable name	Description
MTR_MEMO_PAGE_S_FIX	Used for the S lock on the PAGE
MTR_MEMO_PAGE_X_FIX	Used for the X lock on the PAGE
MTR_MEMO_PAGE_SX_FIX	Used for the SX lock on the PAGE. The above lock is saved to mtr through mtr_memo_push.
MTR_MEMO_BUF_FIX	No read/write lock is applied to the PAGE, and only the buf fix is used.
MTR_MEMO_S_LOCK	S lock, usually used for index lock
MTR_MEMO_X_LOCK	X lock, usually used for index lock
MTR_MEMO_SX_LOCK	SX lock, usually used for index lock. The above three locks are locked through mtr_s/x/sx_lock and released through mtr_memo_release.

Mtr log generation

The redo log of InnoDB is generated by mtr. It is written to the mtr cache and then submitted to the public buffer. This section takes the modification of page generated by inserting a record as an example, describes the typical life cycle of an mtr.

Entry function:row_ins_clust_index_entry_low

Enable mtr

Execute the following code block

mtr_start(&mtr);mtr.set_named_space(index->space);

Mtr_start mainly includes:

Initialize various status variables of mtr
The default mode is MTR_LOG_ALL, which indicates that all data changes are recorded.
Mtr status is set to ACTIVE (MTR_STATE_ACTIVE)
Initializes the memory (mtr_buf_t) and object linked list for Lock management objects and log management objects.

mtr.set_named_spaceIs the new logic of 5.7. It will change the currently modified tablespace objectfil_space_tSave: if it is a system tablespace, it is assignedm_impl.m_sys_spaceOtherwise, assign the valuem_impl.m_user_space.

Tips: the temporary table is optimized in 5.7, And the redo record is directly disabled:
Mtr. set_log_mode (MTR_LOG_NO_REDO)

Locate the location where the record is inserted

Main Entry functions:btr_cur_search_to_nth_level

The insert or update operations are performed in an optimistic manner, so the index S lock is applied first.
mtr_s_lock(dict_index_get_lock(index),&mtr), Correspondingmtr_t::s_lockFunction
If you insert records in a pessimistic manner, it means that index split may occur. Before 5.7, the index X lock will be applied, version 5.7 will apply the SX lock (but in some cases it will also degrade to the X lock)
Apply the X lock:mtr_x_lock(dict_index_get_lock(index), mtr), Correspondingmtr_t::x_lockFunction
Apply the SX lock:mtr_sx_lock(dict_index_get_lock(index),mtr), Correspondingmtr_t::sx_lockFunction

Corresponding to the internal implementation, in fact, is to add the corresponding lock object, and then build the pointer and type of the lockmtr_memo_slot_tInsert objectmtr.m_impl.m_memo.

When you find the block corresponding to the pre-inserted page, you also need to apply the block lock and add the corresponding lock type to mtr:mtr_memo_push(mtr, block, fix_type)

If the MTR_MEMO_PAGE_X_FIX or MTR_MEMO_PAGE_SX_FIX lock is applied to the page and the current block is cleanm_impl.m_made_dirtySet to true, indicating that a clean page will be modified.

If the lock type is MTR_MEMO_BUF_FIX, it actually does not lock objects, but you need to judge the scenario of the temporary table. The modification of the temporary table page does not contain latch, but you needm_impl.m_made_dirtySet to true (based on the block Memberm_impl.m_made_dirty), Which is also an optimization of 5.7 for the InnoDB temporary table scenario.

Similarly, it is constructed based on the lock type and Lock Object.mtr_memo_slot_tAddm_impl.m_memo.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

MySQL & #183; Engine Features & #183; InnoDB redo log roaming

Contact Us

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support