Redo and undo log
InnoDB has two crucial logs: the undo log and the redo log. The former guarantees transaction atomicity and underpins InnoDB's MVCC; the latter guarantees transaction durability. Like most relational databases, InnoDB records physical changes to data files and ensures that the log is always written first. This is the so-called WAL (Write-Ahead Logging) rule: before a data page is persisted, the redo log records describing its changes must already be on disk.

I. Concepts

1. InnoDB crash recovery

Crash recovery is a feature of the InnoDB engine. When the service restarts after a failure, recovery runs automatically and restores the database to its previous consistent state. Not all of the log has to be replayed, only the part written after the last point at which dirty pages were flushed; that point is called the checkpoint. Recovery has two steps:

Step 1: scan the redo log and redo all transactions that had completed and committed.
Step 2: apply the undo logs of unfinished transactions, cancelling all of their changes.

2. LSN

The LSN (log sequence number) is a monotonically increasing unsigned long integer, expressed as a byte offset into the log. In InnoDB's log system the LSN is everywhere: it marks the point at which a dirty page was modified, it records checkpoints, and it locates positions inside the redo log files. Every data page carries an LSN, the redo log has an LSN, and the checkpoint has an LSN. You can observe them with the show engine innodb status command:

    ---
    LOG
    ---
    Log sequence number 33646077360
    Log flushed up to   33646077360
    Last checkpoint at  33646077360
    0 pending log writes, 0 pending chkp writes
    49687445 log i/o's done, 1.25 log i/o's/second

Each buffer pool instance maintains a flush list to manage its dirty pages, ordered by the LSN at which each page was modified. When a redo checkpoint is taken periodically, the LSN chosen is therefore always that of the oldest page (the smallest LSN) on the flush lists of all buffer pool instances. Because the WAL policy is used, every transaction commit persists its redo log, so the transaction cannot be lost, while dirty-page flushing is deferred so that many modifications to one page are merged, avoiding the performance cost of writing the data files frequently.

3. The buffer pool and dirty pages

In InnoDB, writing data to buffer pool pages speeds up processing, at the price of temporary inconsistency between RAM and disk.

(1) For every user request that changes data, the InnoDB engine loads the affected data and index pages into the in-memory buffer pool. If the data and indexes had to be written back to disk on every modification, I/O requests would increase enormously: the positions of the updates are random, the disk head would have to seek constantly, and efficiency would be very low. Holding the data in memory also speeds up reads to some extent. So InnoDB merely appends one log record after processing a request (transaction), and a separate thread later reads the log files and applies the updates to disk in batches, the most efficient way to write a disk. Because InnoDB uses a memory buffer to improve response time, it also creates inconsistent data; the term is dirty data, which MySQL calls a dirty page. How this happens: when a transaction needs to modify a record (row), InnoDB reads the page containing that row from disk into the buffer pool and modifies the row in the in-memory page. The page in the buffer pool now differs from the one on disk; the in-memory copy is the dirty page. The relevant counters can be watched directly, as shown below.
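To make this concrete, here is a minimal sketch for watching dirty pages accumulate (the status variable names are the real ones MySQL exposes; the values are whatever your server reports):

    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';  -- pages modified in memory, not yet flushed
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_total';  -- total pages in the buffer pool

The ratio of the first to the second is what the innodb_max_dirty_pages_pct threshold discussed below is compared against.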
(2) When a transaction commits, its change operations are immediately recorded in the redo log. Even if the dirty pages in the buffer pool are lost in a power failure, InnoDB will recover the data from the redo log records at startup. The other role of the redo log is to minimize random disk writes by deferring the flush of dirty pages: over a period of time, the changes of many transactions to one page are merged into a single write.

(3) Under normal operation, when are dirty pages flushed to disk?

The redo log is a ring structure. When the redo space is full, some dirty pages are flushed to disk so that part of the redo log can be released. Log writes may then stall; the Innodb_log_waits counter (show global status) increments each time a log write has to wait.

When a page must be allocated in the buffer pool but the pool is full and every page is dirty (if any page were clean it could simply be released), dirty pages must be flushed to disk first. Such waits are recorded in the Innodb_buffer_pool_wait_free counter.

In general, the innodb_max_dirty_pages_pct parameter controls this situation: when the proportion of dirty pages in the buffer pool reaches this ratio, a checkpoint is forced, and when the system is idle the dirty pages are flushed to disk (64 pages at a time).

The InnoDB configuration parameters involved: innodb_flush_log_at_trx_commit, innodb_max_dirty_pages_pct. Status counters: Innodb_log_waits, Innodb_buffer_pool_wait_free. A sketch of how to check the counters follows below.
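As a minimal monitoring sketch (these are the real MySQL identifiers; the comments give the usual interpretation):

    SHOW GLOBAL STATUS LIKE 'Innodb_log_waits';              -- waits because the log buffer was too small; should stay near 0
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_wait_free';  -- waits for a free buffer pool page; should stay near 0

If either counter grows steadily under load, the log buffer or the buffer pool (or the flushing configuration) is undersized for the workload.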
(4) Dirty pages live in the buffer pool; if the system suddenly loses power, are the modifications in the dirty pages lost?

For example: a user's operation completes (the database finishes a transaction and the page has been modified in the buffer pool), but the dirty page has not yet been flushed when the power fails and all buffer pool data disappears. Is the change produced by the user's completed operation lost? The answer is no (with innodb_flush_log_at_trx_commit = 1). This is exactly what the redo log is for.

A system fault can leave the on-disk database inconsistent for two reasons (the data in the buffer pool is not permanent): updates of uncommitted transactions may already have been written to the database, and updates of committed transactions may still have been sitting in the buffer, not yet written to the database. The general recovery method is: scan the log file from start to end; put the transactions that committed before the fault (those with both a begin transaction record and a commit record) in the redo queue; and put the transactions unfinished at the time of the fault (a begin transaction record but no commit) in the undo queue. Then undo the transactions in the undo queue: scan the log backwards and perform the reverse of each update operation of each such transaction, writing the pre-update value from the log back into the database. Finally redo the transactions in the redo queue: scan the log forwards and re-execute each operation registered in the log, writing the updated value from the log into the database.

innodb_flush_log_at_trx_commit

The default value 1 means the log is written and flushed to the hard disk at every transaction commit (or every statement outside a transaction). This is required for full durability, and is cheap especially when a battery-backed write cache is used. The value 2 is suitable for many applications, particularly those converted from MyISAM tables: at commit the log is written to the operating system cache rather than flushed to the hard disk, and it is flushed to disk about once per second, so generally no more than the last 1-2 seconds of updates are lost. The value 0 is faster still but less safe: transaction data may be lost even if only the MySQL process fails. With the value 2, data can be lost only when the entire operating system goes down. A sketch of checking and changing these settings appears at the end of this section.

innodb_max_dirty_pages_pct

This is an integer in the range 0 to 100; the default is 90. The main thread in InnoDB tries to write pages from the buffer pool so that the percentage of dirty (not yet written) pages does not exceed this value.

4. ACID

Atomicity: all operations in a transaction are completed, or none are performed. If an error occurs during execution, everything rolls back to the state before the transaction started, as if the transaction had never been executed. Durability: once a transaction completes, all the modifications it made to the database are saved permanently; to guarantee durability, the database system must record the modified data completely on persistent storage.

5. Log buffer

Logs, too, are cached in memory, in what is called the log buffer; the log file on disk is simply called the log file. A log file is usually appended to and can be regarded as sequentially written, and the I/O overhead of sequential writes is much smaller than that of random writes. (The following description uses Oracle terminology, LGWR and the online redo log, but InnoDB's log buffer behaves analogously.) When a data block in the buffer cache is modified, the server process generates redo data and writes it into the redo log buffer. LGWR writes the entries of the redo log buffer into the online redo log when any of the following conditions is met: the redo log buffer is 1/3 full; a 3-second timeout expires; the data in the log buffer reaches 1 MB; or a transaction commits.

If the system has a fast processor and relatively slow disk I/O, the processor may fill up the rest of the buffer, prompting LGWR to move part of the buffered data to disk; here a large log buffer can temporarily mask the impact of the slow disks on the system. Alternatively, you can improve the checkpoint or archiving process, or improve LGWR performance (perhaps by storing all online redo logs on faster raw devices). The redo log buffer also allows batch commits during bulk operations, so that LGWR can write redo entries to the online redo log files more efficiently, and large data loads can use nologging operations. The size of the redo log buffer is set by the initialization parameter LOG_BUFFER; changing it requires an instance restart. An appropriate value can significantly improve system throughput, especially for workloads with large volumes of inserts, updates, and deletes. The default is MAX(0.5M, 128K * number of CPUs), and the default is usually sufficient. Increasing the log buffer has no negative effect on performance or recoverability; it merely consumes extra memory. Generally 3-5 MB is enough; beyond that is just a waste of memory, while too small a value may hurt performance. Today, when memory is no longer very expensive, if you have a large number of "large transactions", set the log buffer to 5 MB.
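A hedged sketch of the MySQL-side knobs (InnoDB's counterpart to the LOG_BUFFER parameter above is innodb_log_buffer_size; both variable names below are real):

    -- Durability/speed trade-off at commit time (dynamic, global)
    SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
    SET GLOBAL innodb_flush_log_at_trx_commit = 2;  -- example: accept up to ~1s of loss for speed

    -- Log buffer sizing; a steadily growing Innodb_log_waits suggests it is too small
    SHOW GLOBAL VARIABLES LIKE 'innodb_log_buffer_size';
    SHOW GLOBAL STATUS LIKE 'Innodb_log_waits';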
II. Undo log

The undo log is an important part of InnoDB's MVCC transaction feature. When we change a record, an undo record is generated. Undo records are stored in the system tablespace (ibdata) by default, but starting from MySQL 5.6 an independent undo tablespace can be used. Undo records store the old versions of the data: when an older transaction needs to read a row, it follows the undo chain until it finds the version that satisfies its visibility. When the version chain is very long, this is generally considered a time-consuming operation (see, for example, bug #69812).

Most data changes are INSERT, DELETE, or UPDATE operations. An INSERT is visible only to the current transaction before it commits, so the undo log it generates can be deleted directly once the transaction commits (who could need visibility into data that was just inserted!). UPDATE and DELETE, in contrast, must maintain multi-version information; in InnoDB, the undo logs generated by UPDATE and DELETE operations are grouped into one class, update_undo.

The principle of the undo log is very simple. To satisfy transaction atomicity, before any data is operated on, it is first backed up to another place (the place where the backup is stored is the undo log); only then is the data modified. If an error occurs, or the user executes a ROLLBACK statement, the system can use the backup in the undo log to restore the data to its state before the transaction started. Besides guaranteeing atomicity, the undo log also helps provide transaction durability, and in the InnoDB storage engine of MySQL the undo log is additionally used to implement multi-version concurrency control (MVCC).

A simplified atomic and durable transaction using only the undo log. Suppose there are two values, A and B, equal to 1 and 2 respectively:

a. The transaction starts.
b. Record A = 1 in the undo log.
c. Modify A = 3.
d. Record B = 2 in the undo log.
e. Modify B = 4.
f. Write the undo log to disk.
g. Write the data to disk.
h. The transaction commits.

There is an implicit precondition here: data is first read into memory, modified there, and then written back to disk. This scheme guarantees atomicity and durability at the same time because:

a. The undo log is recorded before the data is updated.
b. To guarantee durability, the data must be written to disk before the transaction commits; as long as the transaction committed successfully, the data is persistent.
c. The undo log is persisted to disk before the data is. If the system crashes between g and h, the undo log is complete and the transaction can be rolled back.
d. If the system crashes between a and f, the data was never persisted, so the data on disk remains in the state it was in before the transaction started.

The defect: every transaction must write its data and its undo log to disk before committing, which causes a large amount of disk I/O, so performance is very low. If the data could be cached for a while, I/O could be reduced and performance improved, but then transaction durability would be lost. The rollback side of this can be seen directly in SQL, as sketched below.
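A minimal sketch of the undo log at work, using a hypothetical table t that holds the A and B of the example above:

    CREATE TABLE t (id INT PRIMARY KEY, a INT, b INT) ENGINE=InnoDB;
    INSERT INTO t VALUES (1, 1, 2);    -- A = 1, B = 2
    START TRANSACTION;
    UPDATE t SET a = 3 WHERE id = 1;   -- old value A = 1 goes into the undo log
    UPDATE t SET b = 4 WHERE id = 1;   -- old value B = 2 goes into the undo log
    ROLLBACK;                          -- InnoDB applies the undo records in reverse
    SELECT a, b FROM t WHERE id = 1;   -- returns (1, 2) again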
III. Redo log

InnoDB's transaction log is the redo log, saved in the log files ib_logfile*. The redo log can be split into several files with the innodb_log_files_in_group parameter; another parameter, innodb_log_file_size, gives the size of each file, so the total redo log size is innodb_log_files_in_group * innodb_log_file_size. The redo log files are named ib_logfile[number], and the log directory is controlled by the innodb_log_group_home_dir parameter (see the variable sketch after this section).

Redo logs are written to the files sequentially; when the files are full, writing wraps back to the first file and overwrites it. (Strictly speaking this is not pure sequential writing: a redo checkpoint also updates the checkpoint mark in the header of the first log file.) Because the redo log files are written cyclically, before overwriting a region InnoDB must make sure the corresponding dirty pages have been flushed to disk. Under a very heavy load, redo may be generated very fast, forcing dirty pages to be flushed frequently and degrading performance. In general, once the log not yet covered by a checkpoint exceeds 76% of the total file size, InnoDB deems this potentially unsafe and forcibly pre-flushes dirty pages, which can stall a large number of user threads. If such loads are expected, it is advisable to increase the size of the redo log files: perform a clean shutdown, modify the redo log configuration, and restart the instance.

Contrary to the undo log, the redo log records the new data. Before a transaction commits, only the redo log needs to be persisted; the data itself does not. So even if the data is not persistent when the system crashes, the redo log is, and the system can recover all data to the latest state from the redo log contents.

A simplified undo + redo transaction. Again suppose two values, A and B, equal to 1 and 2 respectively:

a. The transaction starts.
b. Record A = 1 in the undo log.
c. Modify A = 3.
d. Record A = 3 in the redo log.
e. Record B = 2 in the undo log.
f. Modify B = 4.
g. Record B = 4 in the redo log.
h. Write the redo log to disk.
i. The transaction commits.

Features of undo + redo at commit time:

a. To guarantee durability, the redo log must be persisted before the transaction commits.
b. Before the transaction commits, the data is only cached in memory, not written to disk.
c. The redo log guarantees transaction durability.
d. The undo log guarantees transaction atomicity.
e. There is an implicit rule: data must reach persistent storage no earlier than its redo log.
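A quick sketch of the sizing variables named above (the identifiers are the real InnoDB parameters; the arithmetic is illustrative):

    SHOW GLOBAL VARIABLES LIKE 'innodb_log_files_in_group';  -- number of ib_logfile* files
    SHOW GLOBAL VARIABLES LIKE 'innodb_log_file_size';       -- size of each file, in bytes
    SHOW GLOBAL VARIABLES LIKE 'innodb_log_group_home_dir';  -- directory holding the files
    -- Total circular redo space = innodb_log_files_in_group * innodb_log_file_size;
    -- e.g. 2 files of 512 MB give 1 GB, and pre-flushing starts at roughly 76% of it.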
IV. I/O performance

The undo + redo design focuses on improving I/O performance. Caching the data reduces the I/O of data writes, but it introduces new I/O: writing the redo log. If redo log writes performed badly, nothing would be gained. To keep redo log I/O fast, InnoDB's redo log design has the following characteristics:

a. Redo logs are kept in contiguous space as far as possible, so the log file space is fully allocated when the system first starts. Redo records are written in sequential append mode, and the performance comes from sequential I/O.

b. Logs are written in batches. Log records are not written directly to the files; they first go into the redo log buffer, and when the log needs to be flushed to disk (at a transaction commit, for instance), many records are written out together.

c. Concurrent transactions share the redo log's storage space: their redo records are interleaved in statement-execution order, which reduces the space the log occupies. For example, the redo log contents might look like this:

    Record 1: <trx1, insert ...>
    Record 2: <trx2, update ...>
    Record 3: <trx1, delete ...>
    Record 4: <trx3, update ...>
    Record 5: <trx2, insert ...>

d. As a consequence of c, when one transaction writes its redo log to disk, the log records of other, uncommitted transactions are written along with it.

e. Only sequential append operations are performed on the redo log. When a transaction must be rolled back, its redo records are not deleted from the redo log.

V. Recovery

The essential point of keeping the log is recovery: when a database failure occurs, the data can be recovered. Recovery runs automatically for any failure that does not damage the disk drive. InnoDB reads the latest checkpoint log record, checks whether the dirty pages after it were written to disk before the failure, and if not, reads the log records that affect those pages and applies them. This is called "rolling forward", and thanks to the LSN, InnoDB only needs to compare this number to synchronize.

As mentioned in the recovery discussion earlier, uncommitted and rolled-back transactions also have redo records, so they require special handling during recovery. There are two possible strategies:

a. During recovery, redo only the committed transactions.
b. During recovery, redo all transactions, including uncommitted and rolled-back ones, and then roll the uncommitted transactions back using the undo log. (InnoDB takes this second approach.)

When you change data with UPDATE, INSERT, or DELETE statements, you change it in two places: the log buffer and the data buffers. Buffers are fixed-length blocks of memory, usually 512 bytes.

    Log buffer                Data buffer
    =================         ===============
    = Log Record #1 =         = Page Header =
    = Log Record #2 =         = Data Row    =
    = Log Record #3 =         = Data Row    =
    = Log Record #4 =         = Data Row    =
    =================         ===============

For example, after the statement insert into jobs values (..., 3) executes (the full value list has not been preserved here), the log buffer gains one new record, Log Record #5, containing a rowid and the new record contents. At the same time the data buffer gains a new row, and the page header is updated to note that the newest log record for this page is Log Record #5. In this example #5 is the LSN (log sequence number), which is crucial for ordering subsequent operations. A concrete version of this example is sketched below, after the following two details about such a data change:

1. An INSERT log record contains only the new data, which is sufficient to replay the operation on the page; it is therefore called a redo entry.
2. The LSN is not a field of the log record; it is the relative offset of the record's absolute address in the file.
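A concrete, hypothetical version of the jobs example (the table definition and the leading column values below are assumptions made for illustration, since the original statement's first values were not preserved):

    CREATE TABLE jobs (c1 INT, c2 INT, c3 INT) ENGINE=InnoDB;  -- hypothetical schema
    INSERT INTO jobs VALUES (1, 2, 3);                         -- illustrative values
    -- Per the text above: the log buffer now holds Log Record #5 (rowid + new row),
    -- and the page header in the data buffer records #5 as its newest LSN.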
After InnoDB has changed the log buffer and the data buffer, the next step is writing to disk, and this is where things get complicated. Multiple threads monitor buffer activity, and three conditions can lead to disk writes: overflow, checkpoint, and commit.

What happens on overflow? Overflow rarely occurs, because InnoDB takes pro-active measures to prevent the buffers from filling up (see checkpointing below). Still, consider the following two cases:

1. If the log buffer is full, InnoDB writes the buffered log records out to the log file(s). The log file(s) before writing Log Record #5 (the log buffer holds only four records, and the fifth is now being inserted):

    =================
    = Log Record #1 =
    = Log Record #2 =
    = Log Record #3 =
    = Log Record #4 =
    =================

and after writing Log Record #5:

    =================
    = Log Record #5 =
    = Log Record #2 =
    = Log Record #3 =
    = Log Record #4 =
    =================

Logs cannot grow forever. Even though InnoDB uses some compression, log files would become too large to store on any disk drive, so InnoDB writes them circularly, overwriting the oldest records; this is why Log Record #5 has replaced Log Record #1 above.

2. If the data buffer is full, InnoDB writes the least recently used buffers to the database, but not too quickly. Here the LSN in the page header takes effect: InnoDB checks whether the page header's LSN is greater than the LSN of the most recent record in the log file, and writes the data page only once the log has caught up with the data. In other words, a data page is not written to disk until the corresponding log records have been written to disk. This is the log-first write policy.

What about the checkpoint mechanism? How many pages are flushed each time, where are the dirty pages taken from, and when is a flush triggered? These questions are complex. There are two kinds of checkpoint: Sharp Checkpoint and Fuzzy Checkpoint. A Sharp Checkpoint flushes all dirty pages back to disk when the database shuts down. A Fuzzy Checkpoint flushes only some dirty pages at runtime, in the following ways:

a. Master Thread Checkpoint: every second or every ten seconds, the master thread flushes a certain percentage of pages from the buffer pool's dirty page list back to disk. The process is asynchronous and does not block query threads.

b. FLUSH_LRU_LIST Checkpoint: InnoDB must ensure there are about 100 free pages in the LRU list. Before InnoDB 1.1.x, the user query thread itself checked whether the LRU list had enough free pages; if not, pages at the tail of the LRU list were evicted, and if any of them were dirty, a checkpoint was forced to flush them to disk, which obviously blocked the user query thread. Since InnoDB 1.2.x this check runs in a separate Page Cleaner Thread, and innodb_lru_scan_depth controls the number of available pages in the LRU list, with a default of 1024 (see the sketch after this section).

c. Async/Sync Flush Checkpoint: when the redo log files are running out of reusable space, some pages on the dirty page list must be forcibly flushed back to disk so that the redo log files can continue to be used cyclically. Before InnoDB 1.2.x, an Async Flush Checkpoint blocked the user query thread that discovered the problem, while a Sync Flush Checkpoint blocked all query threads; since InnoDB 1.2.x this work, too, has moved into the separate Page Cleaner Thread.

d. Dirty Page too much Checkpoint: when the number of dirty pages is too large, the InnoDB engine forces a checkpoint, to ensure the buffer pool keeps enough free pages available. It is controlled by the innodb_max_dirty_pages_pct parameter:

    mysql> show variables like 'innodb_max_dirty_pages_pct';
    +----------------------------+-------+
    | Variable_name              | Value |
    +----------------------------+-------+
    | innodb_max_dirty_pages_pct | 90    |
    +----------------------------+-------+
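A companion sketch for the page-cleaner behaviour above (both identifiers are real MySQL names; the comments carry the interpretation):

    SHOW GLOBAL VARIABLES LIKE 'innodb_lru_scan_depth';         -- pages the cleaner scans per LRU list (default 1024)
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_flushed'; -- running total of flushed pages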
It was said earlier that InnoDB takes pro-active measures to guarantee there is no overflow, and the most important of them is checkpointing. There is a separate thread (or a thread split off from the group of threads that modify the buffers) that wakes up at specific intervals, checks the buffered changes, and makes sure the necessary disk writes have happened.

At this point most DBMSs write the entire buffer to disk, ensuring that every buffer that was changed but not yet written reaches the disk; that is, they flush all "dirty" buffers with a Sharp Checkpoint. InnoDB, however, only guarantees that: a. the log and the data buffers do not grow past a certain limit point; b. the log is always written to disk before the data; and c. no data page's header LSN refers to a log record that is about to be overwritten. That is to say, InnoDB performs a Fuzzy Checkpoint.

At COMMIT, InnoDB does not write dirty data pages to disk. It is easy to assume that committing a change means writing everything to persistent media, but in fact only the log records need to be written; writing the dirty data pages can only happen at overflow or checkpoint time, because their content would be redundant. This behaviour shows up in the server's counters, as sketched below.
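A small observational sketch of that commit behaviour (the status counters are real; the causal reading is the assumption):

    SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';  -- bytes of redo written: advances with every commit
    SHOW GLOBAL STATUS LIKE 'Innodb_data_writes';     -- data file writes: driven by flushing, not by commits

After a burst of small committed transactions you would expect the first counter to advance immediately, while the second catches up only as checkpointing flushes the dirty pages.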