MySQL & #183; Engine Features & #183; InnoDB redo log roaming

Source: Internet
Author: User

MySQL Engine Features-InnoDB redo log roaming
Preface

InnoDB has two very important logs: undo log and redo log. The former is used to ensure the atomicity of transactions and the MVCC of InnoDB, and the latter is used to ensure transaction persistence.

Like most relational databases, InnoDB records physical changes to data files and ensures that logs are always logged first, that is, the so-called WAL, before the persistent data files, ensure that the previous redo logs have been written to the disk.

The LSN (log sequence number) is used to record the log sequence number. It is an increasing unsigned long integer. In the log system of InnoDB, LSN is everywhere. It is used to indicate the log sequence number when a dirty page is modified, and also to record the checkpoint, you can locate the location in the redo log file.

To manage dirty pages, each instance in the Buffer Pool maintains a flush list. pages on the flush list are sorted by the LSN of the modified pages. Therefore, when redo checkpoint is regularly performed, the selected LSN is always the oldest page (with the smallest LSN) on the flush list of all bp instances ). Because the WAL policy is adopted, each transaction commit requires a persistent redo log to ensure that the transaction is not lost. The delayed dirty page refreshing results in merging multiple modifications to avoid performance problems caused by frequent data file writing.

Because the features of the InnoDB log group have been discarded (multiple copies of redo logs are written), the InnoDB archive log feature is also completely removed in 5.7, this document ignores the related logic. In addition, due to space limitations, the InnoDB crash recovery logic will be described in the next phase. This article focuses on the lifecycle of redo logs and some improvements to MySQL 5.7.

This article is based on the latest MySQL 5.7.7-RC version.

MySQL InnoDB table-Basic BTree Data Structure

Optimization of the count (*) function in the InnoDB Storage engine of MySQL

MySQL InnoDB Storage engine lock mechanism Experiment

Startup, shutdown, and restoration of the InnoDB Storage Engine

MySQL InnoDB independent tablespace Configuration

Architecture of MySQL Server layer and InnoDB Engine Layer

InnoDB deadlock Case Analysis

MySQL Innodb independent tablespace Configuration

InnoDB log files

The redo log of InnoDB can use parametersinnodb_log_files_in_groupConfigure multiple files with another parameterinnodb_log_file_sizeThe size of each file. Therefore, the total redo log size isinnodb_log_files_in_group * innodb_log_file_size.

The Redo log file usesib_logfile[number]The log directory can be named by Parametersinnodb_log_group_home_dirControl. Redo logs are written to the file in sequence. When the file is full, it is traced back to the first file for overwrite. (However, when performing redo checkpoint, it will also update the header checkpoint mark of the first log file, so strictly speaking, it is not sequential writing ).

Inside InnoDB, logicallyib_logfileIt is treated as a file and corresponds to the same space id. Because a 512-byte block is used to write data to a file, you can conveniently calculate the file to be written and the corresponding offset based on the globally maintained LSN.

Redo log files are written cyclically. Before overwriting, ensure that the corresponding dirty pages have been flushed to the disk. Under a very large load, Redo logs may generate very fast speeds, resulting in frequent dirty operations, leading to performance degradation, generally, after the log with no checkpoint exceeds 76% of the total file size, InnoDB deems this may be an insecure point and will forcibly preflush dirty pages, resulting in a large number of user threads stall. If this is expected, we recommend that you increase the size of the redo log file. You can perform a clean shutdown, modify the Redo log configuration, and restart the instance.

In addition to the redo log file, InnoDB also has other log files, such as intermediate log files generated to ensure the truncate operation, including the truncate innodb table and truncate undo log tablespace, an intermediate file is generated to identify whether the operations are successful or failed. If the truncate is not completed, redo the operations during crash recovery. Interestingly, according to the official worklog description, the truncate operation was initially atomic by adding a new redo log type, but I don't know why I changed it to the log file method later. Maybe it's because of the compatibility of earlier versions.

Key struct log_sys object

log_sysIt is the hub and core object of the InnoDB log system and controls core functions such as log copying, writing, and checkpoint. It is also a hotspot module for large write load scenarios. It is the hub connecting InnoDB log files and log buffer, and the corresponding struct islog_t.

The member variables related to the redo log file include:

Variable name Description
Log_groups Log group. The current version only supports one set of logs. The corresponding type islog_group_tContains the number of files in the current log group, the size of each file, and the space id.
Lsn_t log_group_capacity Indicates the total capacity of the current log file. The value is: (total size of the Redo log file-Number of redo files * LOG_FILE_HDR_SIZE) * 0.9, and LOG_FILE_HDR_SIZE is 4*512 bytes.
Lsn_t max_modified_age_async Asynchronous preflush dirty page point
Lsn_t max_modified_age_sync Synchronize preflush dirty page points
Lsn_t max_checkpoint_age_async Asynchronous checkpoint
Lsn_t max_checkpoint_age Synchronize checkpoints

For the calculation methods of the above several sync/async points, refer to the Functionlog_calc_max_agesTake the following instance configuration as an example:

Innodb_log_files_in_group = 4innodb_log_file_size = 4g Total file size: 17179869184

Variable values of each member and the proportion to the total file size:

log_sys->log_group_capacity = 15461874893 (90%)log_sys->max_modified_age_async = 12175607164 (71%)log_sys->max_modified_age_sync = 13045293390 (76%)log_sys->max_checkpoint_age_async = 13480136503 (78%)log_sys->max_checkpoint_age = 13914979615 (81%)

Generally:

The distance between the oldest lsn before Dangdang and the current lsn is greatermax_modified_age_async(71%), and the option is enabledinnodb_adaptive_flushingThe page cleaner thread will try to do more dirty page flush work to avoid dirty page accumulation.
The distance between the oldest lsn before Dangdang and the current Lsn is greatermax_modified_age_sync(76%), the user thread needs to do synchronous dirty brush, this is a critical point of performance decline, will greatly affect the overall throughput and response time.
When the lsn of the last checkpoint and the current lsn exceedmax_checkpoint_age(81%). The user thread needs to perform a checkpoint synchronously and wait until the checkpoint write is completed.
When the distance between the lsn of the last checkpoint and the current lsn exceedsmax_checkpoint_age_async(78%) but lessmax_checkpoint_age(81%), the user thread performs an asynchronous checkpoint (the background asynchronous thread executes the CHECKPOINT information write operation), without waiting for the checkpoint to complete.

log_group_tThe main structure members are shown in the following table:

Variable name Description
Ulint n_files Number of Ib_logfile files
Lsn_t file_size File Size
Ulint space_id Space id of the Redo log, fixed size, value: SRV_LOG_SPACE_FIRST_ID
Ulint state LOG_GROUP_ OK or log_group_upted
Lsn_t lsn Lsn written in the group
Lsn_t lsn_offset File offset corresponding to the above lsn
Byte ** file_header_bufs Buffer area, used to set the log file header information and write it to ib logfile. When you switch to a new ib_logfile, update the starting lsn of the file and write it to the header. The header information also contains: LOG_GROUP_ID, LOG_FILE_START_LSN (Starting lsn of the current file), LOG_FILE_WAS_CREATED_BY_HOT_BACKUP (function: log_group_file_header_flush)
Lsn_t scanned_lsn Used to assist in recording the scanned lsn during crash recovery
Byte * checkpoint_buf Checkpoint buffer, used to write checkpoint information to log files (detailed description below)

The member variables related to the redo log memory buffer include:

Variable name Description
Ulint buf_free Currently idle writable location in Log buffer
Byte * buf Log buffer start position pointer
Ulint buf_size The Log buffer size, which is controlled by the innodb_log_buffer_size parameter, but may be automatically extend.
Ulint max_buf_free The value is log_sys-> buf_size/LOG_BUF_FLUSH_RATIO-LOG_BUF_FLUSH_MARGIN, of which: Limit = 2, LOG_BUF_FLUSH_MARGIN = (4*512 + 4 * page_size), and page_size is 16 k by default, when buf_free exceeds this value, the user thread may be triggered to write redo. After the transaction copies the redo to the buffer, the user thread will also judge this value. If the value exceeds buf_free, set log_sys-> check_flush_or_checkpoint to true.
Ulint buf_next_to_write Log buffer offset. the starting position of the next redo file write, that is, the end position of the current write.
Volatile bool is_extending Whether the Log buffer is being extended (to prevent too many redo log entries from being written to the buffer). In fact, when the length of the written redo log exceeds buf_size/2, it will call the log_buffer_extend function, once the Buffer is extended, it will not be reduced back!
Ulint write_end_offset Offset of the end position of the write operation (logically redundant, use log_sys-> buf_free)

Member variables related to the Checkpoint:

Variable name Description
Ib_uint64_t next_checkpoint_no This value is incremented every time the checkpoint is completed.
Lsn_t last_checkpoint_lsn The lsn of the last checkpoint. Each checkpoint is completed, the value of next_checkpoint_lsn is assigned to last_checkpoint_lsn.
Lsn_t next_checkpoint_lsn The lsn of the next checkpoint (the lsn of the checkpoint initiated this time)
Mtr_buf_t * append_on_checkpoint 5.7. When DDL is performed (for example, adding or deleting columns), The buf containing the MLOG_FILE_RENAME2 log is first mounted to this variable. After the DDL operation is complete, clear it again. (Log_append_on_checkpoint), mainly to prevent data dictionary inconsistency generated by crash during DDL. This variable is added to the following commit: a5ecc38f44abb66aa2024c70e37d1f4aa4c8ace9
Ulint n_pending_checkpoint_writes If the value is greater than 0, a checkpoint write operation is in progress. This value is incremented when the user initiates a checkpoint. This value (log_io_complete) is decreased after the background thread completes checkpoint writing)
Rw_lock_t checkpoint_lock Checkpoint lock. The x lock is required each time the checkpoint information is written. The x lock is released by an asynchronous io thread.
Byte * checkpoint_buf Checkpoint information buffer. Before each checkpoint, write the buf and then fl the buf to the disk.

Other status variables

Variable name Description
Bool check_flush_or_checkpoint When this variable is set, the user thread may need to check and release log buffer, or perform preflush and checkpoint to prevent insufficient Redo space.
Lsn_t write_lsn The last LSN written to the file
Lsn_t current_flush_lsn LSN to which fsync is currently in progress
Lsn_t flushed_to_disk_lsn Last time fsync was completed to the LSN of the file
Ulint n_pending_flushes Indicates the redo fsync of pending. The maximum value is 1.
OS _event_t flush_event If fsync is in progress and the request is also fsync, you need to wait until the last fsync operation is completed.

The relationship between log_sys and log files and the log buffer can be expressed as follows:

Mini transaction

Mini transaction (mtr) is the smallest transaction Unit for InnoDB to operate physical data files. It is used to manage Page locking, modification, release, and commit logs to the public buffer. An mtr operation must be atomic. A transaction can contain multiple mtr instances. After each mtr is complete, copy the locally generated logs to the public buffer, and place the modified dirty pages on the flush list.

The class corresponding to the mtr transaction ismtr_t,mtr_t::ImplInformation about the current mtr is saved, including:

Variable name Description
Mtr_buf_t m_memo Used to store the lock type held by the mtr
Mtr_buf_t m_log Store redo log records
Bool m_made_dirty Whether at least one dirty page is generated
Bool m_inside_ibuf Whether the change buffer operation is in progress
Bool m_modifications Whether the buffer pool page has been modified
Ib_uint32_t m_n_log_recs Number of mtr log records
Mtr_log_t m_log_mode MTR_LOG_ALL: default mode, which records all operations that modify disk data. MTR_LOG_NONE: Record No redo, dirty pages are not stored on the flush list, And MTR_LOG_NO_REDO: redo records are not recorded, but dirty pages are put on the flush list; MTR_LOG_SHORT_INSERTS: insert REDO records, which are used to copy records from one page to another new page, in this case, the write index information is ignored in the redo log. (Refer to the page_cur_insert_rec_write_log function)
Fil_space_t * m_user_space User tablespace modified by current mtr
Fil_space_t * m_undo_space Undo tablespace modified by the current mtr
Fil_space_t * m_sys_space System tablespace modified by the current mtr
Mtr_state_t m_state There are four statuses: MTR_STATE_INIT, MTR_STATE_COMMITTING, and MTR_STATE_COMMITTED.

When modifying or reading data in a data file, mtr is generally used to control the locking of the corresponding page or index tree. In 5.7, there are several lock types (mtr_memo_type_t):

Variable name Description
MTR_MEMO_PAGE_S_FIX Used for the S lock on the PAGE
MTR_MEMO_PAGE_X_FIX Used for the X lock on the PAGE
MTR_MEMO_PAGE_SX_FIX Used for the SX lock on the PAGE. The above lock is saved to mtr through mtr_memo_push.
MTR_MEMO_BUF_FIX No read/write lock is applied to the PAGE, and only the buf fix is used.
MTR_MEMO_S_LOCK S lock, usually used for index lock
MTR_MEMO_X_LOCK X lock, usually used for index lock
MTR_MEMO_SX_LOCK SX lock, usually used for index lock. The above three locks are locked through mtr_s/x/sx_lock and released through mtr_memo_release.
Mtr log generation

The redo log of InnoDB is generated by mtr. It is written to the mtr cache and then submitted to the public buffer. This section takes the modification of page generated by inserting a record as an example, describes the typical life cycle of an mtr.

Entry function:row_ins_clust_index_entry_low

Enable mtr

Execute the following code block

mtr_start(&mtr);mtr.set_named_space(index->space);

Mtr_start mainly includes:

  1. Initialize various status variables of mtr

  2. The default mode is MTR_LOG_ALL, which indicates that all data changes are recorded.

  3. Mtr status is set to ACTIVE (MTR_STATE_ACTIVE)

  4. Initializes the memory (mtr_buf_t) and object linked list for Lock management objects and log management objects.

mtr.set_named_spaceIs the new logic of 5.7. It will change the currently modified tablespace objectfil_space_tSave: if it is a system tablespace, it is assignedm_impl.m_sys_spaceOtherwise, assign the valuem_impl.m_user_space.

Tips: the temporary table is optimized in 5.7, And the redo record is directly disabled:
Mtr. set_log_mode (MTR_LOG_NO_REDO)

Locate the location where the record is inserted

Main Entry functions:btr_cur_search_to_nth_level

The insert or update operations are performed in an optimistic manner, so the index S lock is applied first.
mtr_s_lock(dict_index_get_lock(index),&mtr), Correspondingmtr_t::s_lockFunction
If you insert records in a pessimistic manner, it means that index split may occur. Before 5.7, the index X lock will be applied, version 5.7 will apply the SX lock (but in some cases it will also degrade to the X lock)
Apply the X lock:mtr_x_lock(dict_index_get_lock(index), mtr), Correspondingmtr_t::x_lockFunction
Apply the SX lock:mtr_sx_lock(dict_index_get_lock(index),mtr), Correspondingmtr_t::sx_lockFunction

Corresponding to the internal implementation, in fact, is to add the corresponding lock object, and then build the pointer and type of the lockmtr_memo_slot_tInsert objectmtr.m_impl.m_memo.

When you find the block corresponding to the pre-inserted page, you also need to apply the block lock and add the corresponding lock type to mtr:mtr_memo_push(mtr, block, fix_type)

If the MTR_MEMO_PAGE_X_FIX or MTR_MEMO_PAGE_SX_FIX lock is applied to the page and the current block is cleanm_impl.m_made_dirtySet to true, indicating that a clean page will be modified.

If the lock type is MTR_MEMO_BUF_FIX, it actually does not lock objects, but you need to judge the scenario of the temporary table. The modification of the temporary table page does not contain latch, but you needm_impl.m_made_dirtySet to true (based on the block Memberm_impl.m_made_dirty), Which is also an optimization of 5.7 for the InnoDB temporary table scenario.

Similarly, it is constructed based on the lock type and Lock Object.mtr_memo_slot_tAddm_impl.m_memo.

  • 1
  • 2
  • Next Page

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.