Redo Log Write and Flush

Source: Internet
Author: User

http://bbs.chinaunix.net/thread-1753130-1-1.html

When a transaction commits, InnoDB calls innobase_commit in ha_innodb.cc. innobase_commit calls trx_commit_complete_for_mysql (trx0trx.c), which in turn calls log_write_up_to (log0log.c). In other words, when InnoDB commits a transaction, log_write_up_to is called to write the redo log.
In innobase_commit:

    /* "all" is set when the whole transaction is being committed;
    the second test covers a statement end with autocommit on */
    if (all
        || (!thd_test_options(thd, OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN))) {

The following code limits how many threads may commit at the same time (innobase_commit_concurrency):

    retry:
    if (innobase_commit_concurrency > 0) {
        pthread_mutex_lock(&commit_cond_m);
        commit_threads++;
        if (commit_threads > innobase_commit_concurrency) {
            /* too many committers: back off, wait, and try again */
            commit_threads--;
            pthread_cond_wait(&commit_cond,
                              &commit_cond_m);
            pthread_mutex_unlock(&commit_cond_m);
            goto retry;
        }
        else {
            pthread_mutex_unlock(&commit_cond_m);
        }
    }

The commit itself then runs with the redo log write and flush deferred:

    trx->flush_log_later = TRUE;   /* do not write and flush the redo log
                                   to disk during the commit itself */
    innobase_commit_low(trx);
    trx->flush_log_later = FALSE;
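
The concurrency limit at the top of this excerpt can be sketched as a small gate built on a mutex and a condition variable. This is only an illustration under assumptions: the names commit_gate_enter, commit_gate_leave and commit_gate_active are hypothetical, the sketch uses a while loop around pthread_cond_wait instead of the excerpt's decrement-and-goto-retry, and the matching decrement-and-signal step is assumed because the excerpt does not show it.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical names; they mirror, but are not, InnoDB's own variables. */
static pthread_mutex_t gate_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  gate_cond  = PTHREAD_COND_INITIALIZER;
static unsigned long   active_committers = 0;

/* Block until fewer than `limit` threads are inside the commit section.
A limit of 0 means "no limit", as in the excerpt. */
void commit_gate_enter(unsigned long limit)
{
	if (limit == 0) {
		return;
	}
	pthread_mutex_lock(&gate_mutex);
	/* pthread_cond_wait may wake spuriously, so re-check in a loop */
	while (active_committers >= limit) {
		pthread_cond_wait(&gate_cond, &gate_mutex);
	}
	active_committers++;
	pthread_mutex_unlock(&gate_mutex);
}

/* Leave the commit section and wake one waiter (assumed counterpart). */
void commit_gate_leave(unsigned long limit)
{
	if (limit == 0) {
		return;
	}
	pthread_mutex_lock(&gate_mutex);
	assert(active_committers > 0);
	active_committers--;
	pthread_cond_signal(&gate_cond);
	pthread_mutex_unlock(&gate_mutex);
}

/* Number of threads currently inside the gate. */
unsigned long commit_gate_active(void)
{
	unsigned long n;

	pthread_mutex_lock(&gate_mutex);
	n = active_committers;
	pthread_mutex_unlock(&gate_mutex);
	return n;
}
```

A committing thread would call commit_gate_enter(limit) before the commit and commit_gate_leave(limit) after it, with the same limit in both calls.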

Setting the innobase_commit_low call aside for now, look at the call to trx_commit_complete_for_mysql, which performs the log write:

    trx_commit_complete_for_mysql(trx);   /* start flushing the log */
    trx->active_trans = 0;

trx_commit_complete_for_mysql mainly checks the value of the system parameter srv_flush_log_at_trx_commit to decide how to call log_write_up_to: do nothing, write the redo log to the log file only, or write it and flush it to disk.

    if (!trx->must_flush_log_later) {
        /* do nothing */
    } else if (srv_flush_log_at_trx_commit == 0) {
        /* flush_log_at_trx_commit = 0: the commit does not write
        the redo log */
        /* do nothing */
    } else if (srv_flush_log_at_trx_commit == 1) {
        /* flush_log_at_trx_commit = 1: the commit writes the log and
        flushes it to disk, unless the flush method is SRV_UNIX_NOSYNC
        (I am not very familiar with this mode) */
        if (srv_unix_file_flush_method == SRV_UNIX_NOSYNC) {
            /* Write the log but do not flush it to disk */
            log_write_up_to(lsn, LOG_WAIT_ONE_GROUP, FALSE);
        } else {
            /* Write the log to the log files and flush them to
            disk */
            log_write_up_to(lsn, LOG_WAIT_ONE_GROUP, TRUE);
        }
    } else if (srv_flush_log_at_trx_commit == 2) {
        /* flush_log_at_trx_commit = 2: only write to the redo log file */
        /* Write the log but do not flush it to disk */
        log_write_up_to(lsn, LOG_WAIT_ONE_GROUP, FALSE);
    } else {
        ut_error;
    }
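
The branches above reduce to a small decision table. The sketch below restates it as a pure function so the three outcomes are easy to compare; the enum and the function name commit_log_action are hypothetical and introduced only for illustration, and plain ints stand in for the InnoDB types.

```c
#include <assert.h>

typedef enum {
	LOG_ACTION_NONE,		/* leave the redo log in the log buffer */
	LOG_ACTION_WRITE,		/* write to the log file, no flush */
	LOG_ACTION_WRITE_AND_FLUSH	/* write, then flush to disk */
} log_action_t;

/* Mirror of the if/else chain in the excerpt.
must_flush_log_later:     trx->must_flush_log_later
flush_log_at_trx_commit:  srv_flush_log_at_trx_commit (0, 1 or 2)
flush_method_is_nosync:   srv_unix_file_flush_method == SRV_UNIX_NOSYNC */
log_action_t commit_log_action(int must_flush_log_later,
			       int flush_log_at_trx_commit,
			       int flush_method_is_nosync)
{
	/* the excerpt calls ut_error for any other value */
	assert(flush_log_at_trx_commit >= 0 && flush_log_at_trx_commit <= 2);

	if (!must_flush_log_later || flush_log_at_trx_commit == 0) {
		return LOG_ACTION_NONE;
	}
	if (flush_log_at_trx_commit == 1 && !flush_method_is_nosync) {
		return LOG_ACTION_WRITE_AND_FLUSH;
	}
	/* value 1 with nosync, or value 2 */
	return LOG_ACTION_WRITE;
}
```

Only the combination of value 1 and a flush method other than nosync passes TRUE as the flush_to_disk argument of log_write_up_to.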

Then look at log_write_up_to.

    /* A flush to disk is requested: if the LSN of this commit is not
    greater than the LSN already flushed to disk, return */
    if (flush_to_disk
        && ut_dulint_cmp(log_sys->flushed_to_disk_lsn, lsn) >= 0) {
        mutex_exit(&(log_sys->mutex));
        return;
    }
    /* No flush to disk is requested: return if the LSN of this commit
    is not greater than the LSN written to all redo log groups, or,
    when completion in one group is enough, not greater than the LSN
    written to some redo log group */
    if (!flush_to_disk
        && (ut_dulint_cmp(log_sys->written_to_all_lsn, lsn) >= 0
            || (ut_dulint_cmp(log_sys->written_to_some_lsn, lsn)
                >= 0
                && wait != LOG_WAIT_ALL_GROUPS))) {
        mutex_exit(&(log_sys->mutex));
        return;
    }
    /* The code below checks whether a log write is in progress; if so,
    wait for it to complete */
    if (log_sys->n_pending_writes > 0) {
        /* A flush is requested and the flush in progress already covers
        the LSN of this commit: just wait for it to complete */
        if (flush_to_disk
            && ut_dulint_cmp(log_sys->current_flush_lsn, lsn)
            >= 0) {
            goto do_waits;
        }
        /* Only a write to the redo log file is requested and the write
        in progress already covers the LSN of this commit: just wait */
        if (!flush_to_disk
            && ut_dulint_cmp(log_sys->write_lsn, lsn) >= 0) {
            goto do_waits;
        }
        ......
    /* No I/O is in progress and no flush to disk is needed: if the next
    position to write has reached buf_free, everything has already been
    written, so return directly */
    if (!flush_to_disk
        && log_sys->buf_free == log_sys->buf_next_to_write) {
        mutex_exit(&(log_sys->mutex));
        return;
    }
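
The early exits of log_write_up_to can be summarized as a classification of the request. The sketch below is a simplified model, not the InnoDB code: LSNs are plain uint64_t values instead of dulint, the struct log_state_t and the function decide_write are hypothetical, and the path that the excerpt elides (a write is pending but does not cover the requested LSN) is only reported, not implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
	WRITE_ALREADY_DONE,	/* requested LSN is already written or flushed */
	WRITE_WAIT_FOR_RUNNING,	/* a running write covers the LSN: wait for it */
	WRITE_PENDING_OTHER,	/* a write is pending but does not cover the
				LSN; the excerpt elides this path */
	WRITE_NOTHING_TO_DO,	/* the buffer holds no unwritten data */
	WRITE_MUST_START	/* start a new write */
} write_decision_t;

/* Hypothetical snapshot of the log_sys fields used by the checks. */
typedef struct {
	uint64_t flushed_to_disk_lsn;
	uint64_t written_to_all_lsn;
	uint64_t written_to_some_lsn;
	uint64_t current_flush_lsn;
	uint64_t write_lsn;
	unsigned n_pending_writes;
	uint64_t buf_free;
	uint64_t buf_next_to_write;
} log_state_t;

write_decision_t decide_write(const log_state_t *s, uint64_t lsn,
			      int flush_to_disk, int wait_all_groups)
{
	assert(s != NULL);

	if (flush_to_disk && s->flushed_to_disk_lsn >= lsn) {
		return WRITE_ALREADY_DONE;
	}
	if (!flush_to_disk
	    && (s->written_to_all_lsn >= lsn
		|| (s->written_to_some_lsn >= lsn && !wait_all_groups))) {
		return WRITE_ALREADY_DONE;
	}
	if (s->n_pending_writes > 0) {
		if (flush_to_disk && s->current_flush_lsn >= lsn) {
			return WRITE_WAIT_FOR_RUNNING;
		}
		if (!flush_to_disk && s->write_lsn >= lsn) {
			return WRITE_WAIT_FOR_RUNNING;
		}
		return WRITE_PENDING_OTHER;
	}
	if (!flush_to_disk && s->buf_free == s->buf_next_to_write) {
		return WRITE_NOTHING_TO_DO;
	}
	return WRITE_MUST_START;
}
```

Note the asymmetry in the last check: an empty buffer ends the call only when no flush was requested, because data already written to the file may still need to be flushed to disk.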

The following code takes the first group, sets the write- and flush-related fields, and computes the block-aligned start and end positions of the area to write:

    log_sys->n_pending_writes++;
    group = UT_LIST_GET_FIRST(log_sys->log_groups);
    group->n_pending_writes++;  /* We assume here that we have only
                                one log group! */
    os_event_reset(log_sys->no_flush_event);
    os_event_reset(log_sys->one_flushed_event);
    start_offset = log_sys->buf_next_to_write;
    end_offset = log_sys->buf_free;
    area_start = ut_calc_align_down(start_offset, OS_FILE_LOG_BLOCK_SIZE);
    area_end = ut_calc_align(end_offset, OS_FILE_LOG_BLOCK_SIZE);
    ut_ad(area_end - area_start > 0);
    log_sys->write_lsn = log_sys->lsn;
    if (flush_to_disk) {
        log_sys->current_flush_lsn = log_sys->lsn;
    }
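
The area to write always consists of whole log blocks, so the start offset is rounded down and the end offset is rounded up to a block boundary. A minimal sketch of that rounding follows; the function names align_down and align_up are hypothetical stand-ins for ut_calc_align_down and ut_calc_align, and the block size of 512 bytes is an assumption about OS_FILE_LOG_BLOCK_SIZE.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_BLOCK_SIZE 512	/* assumed value of OS_FILE_LOG_BLOCK_SIZE */

/* Round n down to a multiple of align; align must be a power of two. */
size_t align_down(size_t n, size_t align)
{
	assert(align > 0 && (align & (align - 1)) == 0);
	return n & ~(align - 1);
}

/* Round n up to a multiple of align; align must be a power of two. */
size_t align_up(size_t n, size_t align)
{
	assert(align > 0 && (align & (align - 1)) == 0);
	return (n + align - 1) & ~(align - 1);
}
```

With start_offset = 700 and end_offset = 1300, for example, the area to write becomes the two blocks from byte 512 up to byte 1536, even though only part of each block holds new data.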

The call to log_block_set_checkpoint_no sets the LOG_BLOCK_CHECKPOINT_NO field of the block that contains end_offset to the next checkpoint number in log_sys.

    log_block_set_flush_bit(log_sys->buf + area_start, TRUE);  /* the purpose
                                        of this is not clear to me */
    log_block_set_checkpoint_no(
        log_sys->buf + area_end - OS_FILE_LOG_BLOCK_SIZE,
        log_sys->next_checkpoint_no);

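The last block of the area may be only partly filled (end_offset can fall in the middle of it). Copy that block one block length up in the buffer, to the next free block, so that later appends to the log do not change the area being written: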

    ut_memcpy(log_sys->buf + area_end,
              log_sys->buf + area_end - OS_FILE_LOG_BLOCK_SIZE,
              OS_FILE_LOG_BLOCK_SIZE);
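
The copy above can be tried out on a plain byte array. The sketch below is an illustration only: copy_last_block_up is a hypothetical name, the block size of 512 bytes is an assumption about OS_FILE_LOG_BLOCK_SIZE, and the bounds checks are additions that the excerpt does not show.

```c
#include <assert.h>
#include <string.h>

#define LOG_BLOCK_SIZE 512	/* assumed value of OS_FILE_LOG_BLOCK_SIZE */

/* Duplicate the block that ends at area_end into the block that starts
at area_end. The buffer must have room for one more block after
area_end. */
void copy_last_block_up(unsigned char *buf, size_t buf_size, size_t area_end)
{
	assert(area_end >= LOG_BLOCK_SIZE);
	assert(area_end % LOG_BLOCK_SIZE == 0);
	assert(area_end + LOG_BLOCK_SIZE <= buf_size);

	memcpy(buf + area_end,
	       buf + area_end - LOG_BLOCK_SIZE,
	       LOG_BLOCK_SIZE);
}
```

After the call, both copies of the block hold the same bytes: the lower one belongs to the area being written, and the upper one is free to receive further log records.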

For each group, log_group_write_buf is called to write the redo log buffer:

    while (group) {
        log_group_write_buf(
            group, log_sys->buf + area_start,
            area_end - area_start,
            ut_dulint_align_down(log_sys->written_to_all_lsn,
                                 OS_FILE_LOG_BLOCK_SIZE),
            start_offset - area_start);
        /* compute the LSN and offset of this write and store them in
        group->lsn and group->lsn_offset */
        log_group_set_fields(group, log_sys->write_lsn);
        group = UT_LIST_GET_NEXT(log_groups, group);
    }
    ......
    /* what is this mode? */
    if (srv_unix_file_flush_method == SRV_UNIX_O_DSYNC) {
        /* O_DSYNC means the OS did not buffer the log file at all:
        so we have also flushed to disk what we have written */
        log_sys->flushed_to_disk_lsn = log_sys->write_lsn;
    } else if (flush_to_disk) {
        group = UT_LIST_GET_FIRST(log_sys->log_groups);
        fil_flush(group->space_id);  /* finally, fil_flush performs the
                                     flush to disk */
        log_sys->flushed_to_disk_lsn = log_sys->write_lsn;
    }

Next, look at what log_group_write_buf does.

It first converts the LSN into a file offset with log_group_calc_lsn_offset. That function takes the LSN last recorded for the group (note that the log files of a group form a ring buffer) and computes the offset of the requested LSN from its difference to that recorded LSN:

    gr_lsn = group->lsn;
    /* call log_group_calc_size_offset to convert group->lsn_offset into
    an offset without the log file headers; for example, if lsn_offset
    falls in the 3rd log file, 3 * LOG_FILE_HDR_SIZE must be subtracted */
    gr_lsn_size_offset = (ib_longlong)
        log_group_calc_size_offset(group->lsn_offset, group);
    /* size of the data portion of the group, with all
    LOG_FILE_HDR_SIZE headers removed */
    group_size = (ib_longlong) log_group_get_capacity(group);
    /* below is a typical difference calculation for a ring structure */
    if (ut_dulint_cmp(lsn, gr_lsn) >= 0) {
        difference = (ib_longlong) ut_dulint_minus(lsn, gr_lsn);
    } else {
        difference = (ib_longlong) ut_dulint_minus(gr_lsn, lsn);
        difference = difference % group_size;
        difference = group_size - difference;
    }
    offset = (gr_lsn_size_offset + difference) % group_size;
    /* finally add the log file headers back in and return the real
    offset */
    return(log_group_calc_real_offset((ulint) offset, group));
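
The ring calculation is easier to follow with small numbers. The sketch below models a group as n_files files of file_size bytes, each starting with a header of hdr_size bytes. It is a simplified model under assumptions: the struct ring_group_t and all function names are hypothetical, LSNs are uint64_t rather than dulint, and the header arithmetic in the two helper functions is my reading of what log_group_calc_size_offset and log_group_calc_real_offset do based on the comments in the excerpt.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
	uint64_t lsn;		/* LSN of a known position in the group */
	uint64_t lsn_offset;	/* real file offset of that position */
	uint64_t file_size;	/* size of each file, header included */
	uint64_t n_files;	/* number of files in the ring */
	uint64_t hdr_size;	/* header size at the start of each file */
} ring_group_t;

/* Real offset -> offset counted over data bytes only. */
static uint64_t to_data_offset(uint64_t real, const ring_group_t *g)
{
	return real - g->hdr_size * (1 + real / g->file_size);
}

/* Data-only offset -> real offset, with headers added back. */
static uint64_t to_real_offset(uint64_t data, const ring_group_t *g)
{
	return data + g->hdr_size * (1 + data / (g->file_size - g->hdr_size));
}

/* Total number of data bytes in the group. */
static uint64_t capacity(const ring_group_t *g)
{
	return (g->file_size - g->hdr_size) * g->n_files;
}

/* Real offset in the group at which the given LSN is stored. */
uint64_t lsn_to_real_offset(uint64_t lsn, const ring_group_t *g)
{
	uint64_t group_size = capacity(g);
	uint64_t base = to_data_offset(g->lsn_offset, g);
	uint64_t difference;

	assert(g->file_size > g->hdr_size && g->n_files > 0);

	if (lsn >= g->lsn) {
		difference = lsn - g->lsn;
	} else {
		/* the LSN lies behind the known position: walk
		backwards around the ring */
		difference = (g->lsn - lsn) % group_size;
		difference = group_size - difference;
	}
	return to_real_offset((base + difference) % group_size, g);
}
```

With three files of 100 bytes and 10-byte headers, the group holds 270 data bytes, so an LSN that is 270 ahead of the known position maps to the same offset as the known position itself.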

Then, back in log_group_write_buf:

    /* if the write would run past the end of the current file */
    if ((next_offset % group->file_size) + len > group->file_size) {
        write_len = group->file_size  /* write only up to the end of file */
            - (next_offset % group->file_size);
    } else {
        write_len = len;  /* otherwise write all len bytes in one go */
    }
    /* what finally gets written is the buffer itself; but when the write
    starts a new log file, the header of that log file is written first */
    if ((next_offset % group->file_size == LOG_FILE_HDR_SIZE)
        && write_header) {
        /* We start to write a new log file instance in the group */
        log_group_file_header_flush(group,
                                    next_offset / group->file_size,
                                    start_lsn);
        srv_os_log_written += OS_FILE_LOG_BLOCK_SIZE;
        srv_log_writes++;
    }
    /* call fil_io to perform the buffer write */
    if (log_do_write) {
        log_sys->n_log_ios++;
        srv_os_log_pending_writes++;
        fil_io(OS_FILE_WRITE | OS_FILE_LOG, TRUE, group->space_id,
               next_offset / UNIV_PAGE_SIZE,
               next_offset % UNIV_PAGE_SIZE, write_len, buf, group);
        srv_os_log_pending_writes--;
        srv_os_log_written += write_len;
        srv_log_writes++;
    }
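
The two tests on next_offset can be isolated into small helpers. This is a sketch with hypothetical names (first_chunk_len, starts_new_file) and plain integer types. The excerpt does not show how the remainder of a split write is handled, so the sketch covers only the length of the first chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes that can be written at next_offset without crossing
the end of the current log file. */
size_t first_chunk_len(uint64_t next_offset, size_t len, uint64_t file_size)
{
	uint64_t pos_in_file;

	assert(file_size > 0);
	pos_in_file = next_offset % file_size;

	if (pos_in_file + len > file_size) {
		/* only write up to the end of this file */
		return (size_t) (file_size - pos_in_file);
	}
	return len;
}

/* True when next_offset is the first data byte of a log file, that is,
the position right after that file's header. */
int starts_new_file(uint64_t next_offset, uint64_t file_size,
		    uint64_t hdr_size)
{
	assert(file_size > hdr_size);
	return next_offset % file_size == hdr_size;
}
```

For 100-byte files with 10-byte headers, a 30-byte write at offset 90 is cut to 10 bytes, and offset 110 is recognized as the start of the data area of the second file.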
