Redo logs in MySQL

Source: Internet
Author: User

Redo logs implement the durability of transactions, that is, the D in ACID. The redo log consists of two parts:

The first is the in-memory redo log buffer, which is volatile.

The second is the redo log file, which is persistent.

InnoDB is a transactional storage engine. It implements transaction durability through the Force Log at Commit mechanism: when a transaction commits, all of its logs must be written to the redo log file and made durable before the COMMIT is considered complete. In InnoDB, the log here consists of two parts, the redo log and the undo log. The redo log guarantees transaction durability, while the undo log supports transaction rollback and MVCC. The redo log is written almost entirely sequentially, and the redo log file does not need to be read while the database is running. The undo log, by contrast, requires random reads and writes.

To ensure that every log record reaches the redo log file, the InnoDB storage engine must call fsync each time it writes the redo log buffer to the redo log file. Because the redo log file is opened without the O_DIRECT option, the redo log buffer is first written to the file system cache, so an fsync is required to make sure the redo log actually reaches the disk. The efficiency of fsync depends on disk performance, so disk performance determines the performance of transaction commits, and therefore of the database.
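The difference between writing to the file system cache and forcing data to disk can be illustrated with a small Python sketch. This is not InnoDB code: the function name and file name are hypothetical, and it only demonstrates the write-then-fsync pattern described above.

```python
import os
import tempfile


def append_and_sync(path: str, record: bytes, do_fsync: bool = True) -> int:
    """Append a record to a log file and optionally force it to disk.

    The file is opened without O_DIRECT, so os.write() only copies the
    bytes into the file system cache. os.fsync() asks the operating
    system to flush them to the storage device.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        written = os.write(fd, record)
        if do_fsync:
            os.fsync(fd)
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical file name, used only for this demonstration.
    log_path = os.path.join(tempfile.mkdtemp(), "demo_redo.log")
    print(append_and_sync(log_path, b"redo record 1\n"))
```

Calling the function with do_fsync=False returns as soon as the data is in the cache, which is why skipping fsync is faster but leaves the data exposed to an operating system crash.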

The InnoDB storage engine allows the user to deliberately relax durability in order to improve database performance. That is, when a transaction commits, the log is not forced to the redo log file; instead, the fsync is performed after a time interval. Because fsync is not forced at commit time, this clearly improves performance, but if the database goes down, the most recent transactions are lost because part of the log has not yet been flushed to disk.

The parameter innodb_flush_log_at_trx_commit controls the policy for flushing the redo log to disk. The default value is 1, which means an fsync must be called when a transaction commits. It can also be set to 0 or 2. A value of 0 means that no redo log write is performed at commit; the write is done only in the master thread, which performs an fsync of the redo log file once per second. A value of 2 means that the redo log is written to the redo log file at commit, but only to the file system cache, without an fsync. With this setting, if the MySQL server goes down but the operating system does not, no transactions are lost. If the operating system goes down, the transactions that had not been flushed from the file system cache to the redo log file are lost when the database restarts.
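The three settings can be summarized in a small Python model. It is only a sketch of the behavior described above, not InnoDB code; the names CommitPolicy, POLICIES, and survives are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommitPolicy:
    write_at_commit: bool  # redo log buffer copied to the file system cache
    fsync_at_commit: bool  # cache forced to disk


# Keys are the values of innodb_flush_log_at_trx_commit.
POLICIES = {
    0: CommitPolicy(write_at_commit=False, fsync_at_commit=False),
    1: CommitPolicy(write_at_commit=True, fsync_at_commit=True),
    2: CommitPolicy(write_at_commit=True, fsync_at_commit=False),
}


def survives(setting: int, crash: str) -> bool:
    """Return True if a just-committed transaction is guaranteed to survive.

    crash is "mysqld" (only the database process dies) or "os"
    (the operating system goes down).
    """
    policy = POLICIES[setting]
    if crash not in ("mysqld", "os"):
        raise ValueError(f"unknown crash type: {crash!r}")
    if policy.fsync_at_commit:
        return True
    if crash == "mysqld":
        # Data already handed to the file system cache outlives the process.
        return policy.write_at_commit
    return False


if __name__ == "__main__":
    for setting in (0, 1, 2):
        print(setting, survives(setting, "mysqld"), survives(setting, "os"))
```

Only setting 1 gives a guarantee in both crash scenarios; setting 2 protects against a database crash alone, and setting 0 gives no guarantee for the most recent commits.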

The following example compares the effect of different innodb_flush_log_at_trx_commit settings on transactions. First create the table test_load and the stored procedure p_load with the following code:

CREATE TABLE test_load (
    a INT,
    -- the column length is garbled in the source; 80 is assumed here
    b CHAR(80)
) ENGINE=InnoDB;

DELIMITER //
CREATE PROCEDURE p_load(count INT UNSIGNED)
BEGIN
    DECLARE s INT UNSIGNED DEFAULT 1;
    DECLARE c CHAR(80) DEFAULT REPEAT('a', 80);
    WHILE s <= count DO
        INSERT INTO test_load SELECT NULL, c;
        -- commit after every single row
        COMMIT;
        SET s = s + 1;
    END WHILE;
END;
//
DELIMITER ;

The stored procedure p_load inserts rows into the table test_load one at a time and issues an explicit COMMIT after each insert. With the default setting, innodb_flush_log_at_trx_commit is 1, so on every commit InnoDB writes the redo log buffer to the file and calls fsync. Executing CALL p_load(500000) inserts 500,000 rows into the table and therefore performs 500,000 fsync operations. First, look at the time needed to insert the 500,000 rows with the default setting:

CALL p_load(500000);

On a virtual machine, inserting the 500,000 rows took 18 minutes, which is certainly unacceptable for a production environment. Most of that time is spent in fsync. Next, set the parameter to 0:

SET GLOBAL innodb_flush_log_at_trx_commit=0;

Then run the procedure again:

CALL p_load(500000);

This time the total is 40 seconds.

You can see that with innodb_flush_log_at_trx_commit set to 0, the time to insert 500,000 rows drops by nearly 17 minutes (the original table listed the time taken by the stored procedure p_load to insert 500,000 rows under each setting of innodb_flush_log_at_trx_commit). The main reason is that setting 0 greatly reduces the number of fsync calls, which makes the database perform much better.
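A quick back-of-the-envelope calculation shows what these two timings mean per commit. The Python sketch below uses only the figures quoted above (18 minutes, 40 seconds, 500,000 rows); the function name is hypothetical.

```python
def per_commit_ms(total_seconds: float, commits: int) -> float:
    """Average wall-clock cost of one commit, in milliseconds."""
    return total_seconds / commits * 1000.0


ROWS = 500_000
SLOW_SECONDS = 18 * 60  # setting 1: one fsync per commit
FAST_SECONDS = 40       # setting 0: fsync about once per second

if __name__ == "__main__":
    print(round(per_commit_ms(SLOW_SECONDS, ROWS), 2))
    print(round(per_commit_ms(FAST_SECONDS, ROWS), 2))
    print(SLOW_SECONDS / FAST_SECONDS)
```

With the default setting each commit costs roughly 2.16 ms on average, against about 0.08 ms with setting 0, a difference of about 27 times on this virtual machine.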

Although the user can set innodb_flush_log_at_trx_commit to 0 or 2 to improve the performance of transaction commits, keep in mind that these settings give up the ACID guarantees of the transaction. For the stored procedure above, a better way to improve commit performance is to issue a single COMMIT after all 500,000 rows have been inserted, instead of committing after every row. This has the added advantage that the transaction can be rolled back to its initial state.

MySQL also has a binary log, which is used for point-in-time (PIT) recovery and for setting up master-slave replication. On the surface it looks very similar to the redo log, since both record operations on the database, but in essence the two are very different.

First, the redo log is generated at the InnoDB storage engine layer, while the binary log is generated at the upper layer of the MySQL database. The binary log is not specific to the InnoDB storage engine: changes made through any storage engine in MySQL generate binary log entries.

Second, the two logs record content in different forms. The binary log at the MySQL upper layer is a logical log that records the corresponding SQL statements, while the redo log at the InnoDB storage engine layer is a physical log that records the changes made to each page.

In addition, the two logs are written to disk at different points in time. The binary log is written only once, when the transaction commits, while the InnoDB redo log is written continuously while the transaction is in progress, so its entries do not appear in the order in which transactions commit.

As the figure shows, the binary log is written only at transaction commit, and each transaction has exactly one log entry. The InnoDB redo log records physical operations, so each transaction corresponds to multiple log entries. Redo log writes from different transactions are concurrent and do not happen only at commit time, so the order of entries in the file is not the order in which the transactions started. In the figure, *T1, *T2, and *T3 denote the log entries written at transaction commit.
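The ordering difference can be made concrete with a toy Python simulation. It is a simplified model, not MySQL behavior in detail: the event list and the function build_logs are hypothetical, and each page change is reduced to a single label.

```python
def build_logs(events):
    """Build a toy redo log and binary log from interleaved events.

    events is a list of (transaction, action) pairs, where action is
    "change" (a page modification) or "commit".
    """
    redo, binlog = [], []
    for txn, action in events:
        if action == "change":
            # Redo entries are appended as soon as the change happens.
            redo.append(f"{txn}:page-change")
        elif action == "commit":
            redo.append(f"*{txn}")  # commit marker, like *T1 in the figure
            binlog.append(txn)      # one binary log entry per transaction
        else:
            raise ValueError(f"unknown action: {action!r}")
    return redo, binlog


if __name__ == "__main__":
    events = [
        ("T1", "change"), ("T2", "change"), ("T1", "change"),
        ("T3", "change"), ("T2", "commit"), ("T1", "commit"),
        ("T3", "change"), ("T3", "commit"),
    ]
    redo, binlog = build_logs(events)
    print(redo)
    print(binlog)
```

In this model the redo log interleaves entries from T1, T2, and T3, while the binary log holds one entry per transaction in commit order.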

2 Log Block

In the InnoDB storage engine, redo logs are stored in units of 512 bytes. This means that both the redo log buffer and the redo log files are stored as blocks, called redo log blocks, each 512 bytes in size.

If the redo log generated for a page is larger than 512 bytes, it must be split across multiple redo log blocks. Because a redo log block is the same size as a disk sector, 512 bytes, redo log writes are guaranteed to be atomic and do not need the doublewrite technique.

Besides the log content itself, a redo log block also contains a log block header and a log block tailer. The header occupies 12 bytes and the tailer occupies 8 bytes, so each redo log block can actually store 492 bytes of log content (512 - 12 - 8). The figure shows the structure of the redo log buffer.

The figure shows that the redo log buffer is made up of 512-byte log blocks, and that each log block consists of three parts: the log block header, the log body, and the log block tailer.

The log block header consists of 4 parts:

The log buffer is made up of log blocks and internally behaves like an array, so LOG_BLOCK_HDR_NO marks a block's position in that array. It is incremented and reused cyclically. It occupies 4 bytes, but because the first bit is used as the flush bit, the maximum value is 2G.

LOG_BLOCK_HDR_DATA_LEN occupies 2 bytes and indicates how much of the log block is in use. When the log block is full, the value is 0x200, meaning that all 512 bytes of the log block are used.

LOG_BLOCK_FIRST_REC_GROUP occupies 2 bytes and gives the offset of the first new log in the log block. If its value equals LOG_BLOCK_HDR_DATA_LEN, the current log block contains no new log. For example, suppose the redo log of transaction T1 occupies 762 bytes and the redo log of transaction T2 occupies 100 bytes. Since each log block can actually hold only 492 bytes, the two logs are laid out in the log buffer across two log blocks.

As the figure shows, the redo log of transaction T1 occupies 762 bytes, so it needs two log blocks. In the left log block, LOG_BLOCK_FIRST_REC_GROUP is 12, the position where the first log in that block starts. The second log block begins with the remaining 270 bytes of T1's redo log, so the first new log in that block is the log of transaction T2, and its LOG_BLOCK_FIRST_REC_GROUP is 282 (270 + 12).
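The T1 and T2 example can be checked with a short Python sketch. It uses the sizes given in the text (512-byte blocks, a 12-byte header, an 8-byte tailer) and packs records back to back. The function name is hypothetical, and for simplicity it returns None for a block in which no new record starts, instead of mirroring the data length as the real field does.

```python
BLOCK_SIZE = 512
HEADER_SIZE = 12
TAILER_SIZE = 8  # as stated in the text
PAYLOAD = BLOCK_SIZE - HEADER_SIZE - TAILER_SIZE  # 492 bytes of log body


def first_rec_groups(record_sizes):
    """Return, per log block, the offset of the first record starting in it.

    Offsets are counted from the start of the block, so they include the
    12-byte header. Blocks in which no record starts are reported as None.
    """
    if any(size <= 0 for size in record_sizes):
        raise ValueError("record sizes must be positive")
    total = sum(record_sizes)
    n_blocks = max(1, -(-total // PAYLOAD))  # ceiling division
    result = [None] * n_blocks
    pos = 0  # position within the concatenated log bodies
    for size in record_sizes:
        block, offset = divmod(pos, PAYLOAD)
        if result[block] is None:
            result[block] = HEADER_SIZE + offset
        pos += size
    return result


if __name__ == "__main__":
    # T1 occupies 762 bytes, T2 occupies 100 bytes.
    print(first_rec_groups([762, 100]))
```

For the sizes 762 and 100 the sketch yields 12 for the first block and 282 for the second, matching the values in the text.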

LOG_BLOCK_CHECKPOINT_NO occupies 4 bytes and holds the low 4 bytes of the checkpoint number at the time the log block was last written.

The log block tailer consists of only one part, LOG_BLOCK_TRL_NO, which is 4 bytes in size. Its value is the same as LOG_BLOCK_HDR_NO, and it is initialized in the function log_block_init.
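The four header fields can be packed and parsed with Python's struct module, as in the sketch below. The field widths come from the text. The big-endian byte order and the use of the highest bit of the first field as the flush bit are assumptions made for this illustration, and the function names are hypothetical.

```python
import struct

FLUSH_BIT = 0x80000000
# 4-byte block number, 2-byte data length, 2-byte first record group,
# 4-byte checkpoint number: 12 bytes in total. Big-endian is assumed.
HEADER = struct.Struct(">IHHI")


def pack_header(hdr_no, data_len, first_rec_group, checkpoint_no, flush=False):
    """Build a 12-byte log block header."""
    if not 0 <= hdr_no < FLUSH_BIT:
        raise ValueError("hdr_no must fit in 31 bits")
    word = hdr_no | (FLUSH_BIT if flush else 0)
    # Only the low 4 bytes of the checkpoint number are stored.
    return HEADER.pack(word, data_len, first_rec_group,
                       checkpoint_no & 0xFFFFFFFF)


def unpack_header(raw):
    """Parse the first 12 bytes of a log block into a dictionary."""
    word, data_len, first_rec_group, checkpoint_no = HEADER.unpack(
        raw[:HEADER.size])
    return {
        "flush": bool(word & FLUSH_BIT),
        "hdr_no": word & 0x7FFFFFFF,
        "data_len": data_len,
        "first_rec_group": first_rec_group,
        "checkpoint_no": checkpoint_no,
    }


if __name__ == "__main__":
    raw = pack_header(5, 0x200, 12, 7, flush=True)
    print(len(raw), unpack_header(raw))
```

Because one bit of the 4-byte block number is reserved for the flush flag, only 31 bits remain for the number itself, which is where the 2G maximum comes from.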

3 Log Group

A log group is a redo log group, which contains multiple redo log files. Although the source code already supports log group mirroring, the feature is disabled in the ha_innobase.cc file, so the InnoDB storage engine actually has only one log group.

A log group is a logical concept; there is no physical file that represents log group information. A log group consists of several redo log files, each of the same size. Before InnoDB 1.2, the total size of the redo log files had to be less than 4GB; starting from InnoDB 1.2, the limit was raised to 512GB. The InnoSQL version of the InnoDB storage engine already supported redo logs larger than 4GB in version 1.1.

The redo log file stores the log blocks that were previously held in the log buffer. Its physical storage is therefore also managed in blocks, each the same size as a log block, 512 bytes. While the InnoDB storage engine is running, the log buffer flushes its in-memory log blocks to disk according to certain rules. Specifically, the flush happens:

When a transaction commits

When half of the memory space in the log buffer is in use

When a log checkpoint occurs
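The three rules above can be written as a single predicate. The Python sketch below is only a model with a hypothetical function name and parameters; in particular, it ignores the fact that the commit-time behavior also depends on innodb_flush_log_at_trx_commit.

```python
def should_flush(used_bytes: int, capacity: int, *,
                 committing: bool = False,
                 checkpoint: bool = False) -> bool:
    """Decide whether the log buffer should be flushed to the redo log file.

    The buffer is flushed on transaction commit, on a log checkpoint,
    or when at least half of the buffer is in use.
    """
    if capacity <= 0 or not 0 <= used_bytes <= capacity:
        raise ValueError("invalid buffer sizes")
    half_full = used_bytes * 2 >= capacity
    return committing or checkpoint or half_full


if __name__ == "__main__":
    print(should_flush(100, 1024))
    print(should_flush(100, 1024, committing=True))
    print(should_flush(512, 1024))
```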

Log blocks are appended to the end of the redo log file. When one redo log file is full, writing continues in the next redo log file, in round-robin fashion.

Because log blocks are always appended to the end of the redo log file, some readers may assume that writes to the redo log file are strictly sequential. This is not the case. Besides the log blocks flushed from the log buffer, the redo log file also holds some other information, which takes up 2KB in total. That is, the first 2KB of each redo log file does not store log blocks. In the first redo log file of the log group, this 2KB region holds four 512-byte blocks, which include the log file header and the checkpoint values.

Note that this information is stored only in the first redo log file of each log group. The remaining redo log files in the group reserve the same space but do not store the information. Because this information exists, writes to the redo log file are not fully sequential: besides appending log blocks, InnoDB must also update the information in the first 2KB, which is critical for the recovery operation of the InnoDB storage engine. The relationship between the log group and the redo log files follows from this layout.

The blocks after the log file header hold the checkpoint values saved by the InnoDB storage engine, which by design are written alternately. This design guards against a media failure leaving no usable checkpoint.
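The round-robin writing and the reserved 2KB region can be combined into one small Python sketch that maps a logical position in the log to a file and a byte offset. The function names are hypothetical. The exact positions of the two checkpoint blocks (the second and fourth 512-byte blocks) and the rule that even and odd checkpoint numbers alternate between them are assumptions of this sketch, not statements from the text.

```python
BLOCK_SIZE = 512
FILE_HEADER_SIZE = 4 * BLOCK_SIZE  # first 2KB of every redo log file

# Assumed positions of the two checkpoint blocks in the first file.
CHECKPOINT_OFFSETS = (1 * BLOCK_SIZE, 3 * BLOCK_SIZE)


def locate(log_offset: int, file_size: int, n_files: int):
    """Map a logical log offset to (file index, byte offset in that file).

    Log blocks are appended file after file in round-robin order, and
    each file reserves its first 2KB.
    """
    body = file_size - FILE_HEADER_SIZE
    if body <= 0 or n_files <= 0 or log_offset < 0:
        raise ValueError("invalid arguments")
    file_slot, within = divmod(log_offset, body)
    return file_slot % n_files, FILE_HEADER_SIZE + within


def checkpoint_offset(checkpoint_no: int) -> int:
    """Alternate between the two checkpoint blocks (assumed even/odd rule)."""
    return CHECKPOINT_OFFSETS[checkpoint_no % 2]


if __name__ == "__main__":
    # Two tiny 4KB files, so each file holds 2KB of log blocks.
    for offset in (0, 2047, 2048, 4096):
        print(offset, locate(offset, 4096, 2))
```

Alternating between two checkpoint blocks means that a write that fails halfway can damage at most one of them, so the other still provides a usable checkpoint.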

4 Redo Log Format

Different database operations have corresponding redo log formats. In addition, because storage management in the InnoDB storage engine is page-based, the redo log format is page-based as well. Although the formats differ, they share a common header format.

The common header format consists of 3 parts:

redo_log_type: the type of the redo log

space: the tablespace ID

page_no: the offset of the page

After the header comes the redo log body, whose content depends on the redo log type. For example, insert and delete operations on the records of a page each have their own format.
