Understanding the undo logs, redo logs, and checkpoints in the database

Source: Internet
Author: User


In this article, the file in which the database stores its data is called the data file.

Database content is cached in memory, in an area called the db buffer. When an operation reads data from a table, that data stays cached in memory for some time. Any modification to the data initially changes only the in-memory copy. When the db buffer is full, or under certain other conditions, the data is written to the data file.
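The read, modify-in-memory, and write-back cycle described above can be sketched as a small cache. This is a minimal Python sketch, not how any particular database implements its buffer: the class name `DbBuffer`, the LRU eviction rule, the `capacity` of two rows, and the use of a dict as the "data file" are all assumptions for illustration. It also assumes every row already exists in the data file.

```python
from collections import OrderedDict


class DbBuffer:
    """Toy db buffer caching rows from a 'data file' (a plain dict here)."""

    def __init__(self, data_file, capacity=2):
        self.data_file = data_file   # stands in for the on-disk data file
        self.capacity = capacity     # hypothetical: max rows held in memory
        self.cache = OrderedDict()   # key -> value, least recently used first
        self.dirty = set()           # keys changed in memory but not on disk

    def read(self, key):
        if key not in self.cache:
            self._make_room()
            self.cache[key] = self.data_file[key]  # load row from the data file
        self.cache.move_to_end(key)                # mark as most recently used
        return self.cache[key]

    def write(self, key, value):
        self.read(key)            # bring the row into the buffer first
        self.cache[key] = value   # modify only the in-memory copy
        self.dirty.add(key)

    def _make_room(self):
        # Buffer full: evict the least recently used row, writing it back
        # to the data file first if it is dirty.
        while len(self.cache) >= self.capacity:
            key, value = self.cache.popitem(last=False)
            if key in self.dirty:
                self.data_file[key] = value
                self.dirty.discard(key)


data_file = {"X": 5, "Y": 7, "Z": 9}
buf = DbBuffer(data_file, capacity=2)
buf.write("X", 15)
print(data_file["X"])  # 5: the change exists only in the db buffer
buf.read("Y")
buf.read("Z")          # buffer is full, so dirty X is written back
print(data_file["X"])  # 15
```

Eviction is only one of the "other cases" that trigger a write; the checkpoint described later is another.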

Undo, redo

Logs are also cached in memory, in an area called the log buffer; the corresponding file on disk is called the log file. Log files are usually only appended to, so writing them is a sequential write, and sequential writes cost less disk I/O than random writes.

Undo logs record the value of data before it is modified and can be used to roll back a transaction that fails. Redo logs record the value of a data block after it is modified and can be used to restore updates made by successful transactions that have not yet been written to the data file. The following example is adapted, with slight changes, from Yang Chuanhui's Principle Analysis and Architecture Practice of the Big Data Distributed Storage System.

For example, suppose a transaction with serial number T1 modifies data item X, changing its value from 5 to 15. The undo log record is then <T1, X, 5> and the redo log record is <T1, X, 15>.

The combination of undo and redo is also called the Undo/Redo log. In this example, the Undo/Redo log is <T1, X, 5, 15>.
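These log records can be modeled directly. In the minimal sketch below, the names `UndoRedoRecord`, `undo`, and `redo` are hypothetical and values are plain integers; real systems use their own record formats.

```python
from typing import NamedTuple


class UndoRedoRecord(NamedTuple):
    """One Undo/Redo log record <txn, item, old, new>, e.g. <T1, X, 5, 15>."""
    txn: str
    item: str
    old: int  # before-image, used for undo
    new: int  # after-image, used for redo

    def undo_part(self):
        return (self.txn, self.item, self.old)  # the undo record <T1, X, 5>

    def redo_part(self):
        return (self.txn, self.item, self.new)  # the redo record <T1, X, 15>


def undo(record, data):
    # Roll back: restore the value from before the modification.
    data[record.item] = record.old


def redo(record, data):
    # Roll forward: reapply the value from after the modification.
    data[record.item] = record.new


rec = UndoRedoRecord("T1", "X", 5, 15)
print(rec.undo_part())  # ('T1', 'X', 5)
print(rec.redo_part())  # ('T1', 'X', 15)
```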

When a user executes a database transaction, the undo log buffer records the original values of the modified data, and the redo log buffer records their updated values.

The redo log must first be persisted to disk; only then are the transaction's results written to the db buffer. (At this point the data in memory differs from the data file, and the in-memory data is called dirty data.) The db buffer then persists the data to the data file at a suitable time. This ordering guarantees that the latest modifications can be restored when fault recovery is needed. This log-persistence policy is called Write-Ahead Logging (WAL): the log is written before the data.
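The write-ahead ordering can be illustrated with a toy engine. This is a hedged sketch: `WalEngine` and its methods are hypothetical, lists and dicts stand in for the on-disk log file and data file, and every logged update is assumed to belong to a successful transaction (commit marks are ignored here).

```python
class WalEngine:
    """Toy engine following the write-ahead order described above."""

    def __init__(self):
        self.log_file = []   # stands in for the on-disk redo log (append-only)
        self.db_buffer = {}  # in-memory data; may hold dirty values
        self.data_file = {}  # stands in for the on-disk data file
        self.dirty = set()

    def update(self, txn, item, new_value):
        # 1. Persist the redo record first (a sequential append).
        self.log_file.append((txn, item, new_value))
        # 2. Only then apply the change to the db buffer; it is now dirty.
        self.db_buffer[item] = new_value
        self.dirty.add(item)

    def flush_dirty(self):
        # 3. Later, at a convenient time, write dirty data to the data file.
        for item in self.dirty:
            self.data_file[item] = self.db_buffer[item]
        self.dirty.clear()

    def crash_and_recover(self):
        # A crash loses the db buffer; only log_file and data_file survive.
        self.db_buffer = dict(self.data_file)
        self.dirty = set()
        # Replay redo records to restore updates missing from the data file.
        for _txn, item, value in self.log_file:
            self.db_buffer[item] = value
            self.dirty.add(item)
        self.flush_dirty()


engine = WalEngine()
engine.update("T1", "X", 15)
print(engine.data_file)  # {}: the update reached only the log and the buffer
engine.crash_and_recover()
print(engine.data_file)  # {'X': 15}
```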

In many systems, undo logs are stored not in log files but in a special segment of the database. For simplicity, this article refers to all of these storage forms as undo logs kept in an undo log file.

For a transaction T, the log file must record a start mark (such as "start T") when the transaction begins and an end mark (such as "end T" or "commit T") when it ends. During recovery, if a transaction in the log file has no end mark, the system performs an undo operation on it; if it has an end mark, the system performs a redo operation.
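This recovery rule (redo transactions that have an end mark, undo those that do not) can be sketched as follows. The tuple-based record format, including treating both `commit` and `end` as end marks, is an assumption made for illustration. Redo runs forward in log order; undo runs backward so that the oldest before-image is the one left in place.

```python
def recover(log, data):
    """Recover `data` from a log of hypothetical tuples:
    ('start', T), ('update', T, item, old, new), ('commit', T) or ('end', T).
    """
    finished = {rec[1] for rec in log if rec[0] in ("commit", "end")}

    # Redo: reapply after-images of finished transactions in log order.
    for rec in log:
        if rec[0] == "update" and rec[1] in finished:
            _, _txn, item, _old, new = rec
            data[item] = new

    # Undo: restore before-images of unfinished transactions, newest first.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in finished:
            _, _txn, item, old, _new = rec
            data[item] = old
    return data


log = [
    ("start", "T1"), ("update", "T1", "X", 5, 15), ("commit", "T1"),
    ("start", "T2"), ("update", "T2", "Y", 7, 70),  # crash before T2 ends
]
# Assume the data file holds X=5 (T1 not yet flushed) and Y=70 (T2's dirty write flushed).
print(recover(log, {"X": 5, "Y": 70}))  # {'X': 15, 'Y': 7}
```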

Before the contents of the db buffer are written to the data file on disk, the contents of the log buffer must be written to the log file on disk.

This raises two questions: how many transactions' worth of logs do the redo log buffer and undo log buffer hold, and by what rule are they written to the log files? If the buffers hold only one transaction, logs are flushed to disk immediately, which guarantees data consistency. Suppose a sudden power failure occurs while transaction T is executing. If T's records were not yet appended to the redo log file on disk, T is regarded as unsuccessful. If they were appended, T is considered successful: after the database restarts, the redo log is used to restore T's data to the db buffer and the data file.

If the buffers hold multiple transactions, the situation is also easy to explain. The logs are still written to the log file before the db buffer writes to the data file, but flushing many transactions at once reduces disk I/O and increases throughput. However, this approach suits only scenarios with relaxed consistency requirements: if a system failure such as a power outage occurs, completed transactions whose logs are still in the log buffer and whose data is still in the db buffer are lost. Take a money transfer as an example: if a user's transfer transaction is lost this way, the user has to submit the transfer again after the system recovers.
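The trade-off between flushing at every commit and flushing several transactions at once can be simulated. In this sketch the class `LogBuffer` and its `batch` parameter are hypothetical; `batch=1` stands for the one-transaction case and a larger value for the multi-transaction case. Only the log side is modeled.

```python
class LogBuffer:
    """Toy log buffer that flushes to the log file every `batch` commits."""

    def __init__(self, batch):
        self.batch = batch
        self.pending = []     # committed transactions still only in memory
        self.log_file = []    # stands in for the on-disk log file
        self.disk_writes = 0  # number of flushes (disk I/O operations)

    def commit(self, txn):
        self.pending.append(txn)
        if len(self.pending) >= self.batch:
            self.flush()

    def flush(self):
        self.log_file.extend(self.pending)  # one sequential append per flush
        self.pending.clear()
        self.disk_writes += 1

    def power_failure(self):
        # Memory is lost; only transactions already in the log file survive.
        lost = list(self.pending)
        self.pending.clear()
        return lost


for batch in (1, 4):
    buf = LogBuffer(batch)
    for i in range(1, 7):
        buf.commit(f"T{i}")
    lost = buf.power_failure()
    print(batch, buf.disk_writes, lost)
# 1 6 []
# 4 1 ['T5', 'T6']
```

With `batch=4`, six commits cost one disk write instead of six, but T5 and T6 must be resubmitted after the failure, as in the transfer example.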

Checkpoint

A checkpoint periodically flushes the contents of the db buffer to the data file. When memory is insufficient or the db buffer is full, all or part of the db buffer's contents (especially dirty data) must be flushed to the data file, and the "time" of the checkpoint is recorded during the flush. When recovering from a fault, only the redo/undo operations after the last checkpoint need to be processed.
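A checkpoint bounds how much of the log recovery has to read. The sketch below covers only the redo side; the `("checkpoint",)` marker and `("set", item, value)` record format are assumptions, and it assumes that everything logged before the last checkpoint is already in the data file.

```python
def recover_from_checkpoint(log, data):
    """Redo only the records after the last checkpoint marker.

    Returns the restored data and the number of records replayed.
    """
    start = 0
    for i, rec in enumerate(log):
        if rec[0] == "checkpoint":
            start = i + 1  # position just after the latest checkpoint
    replayed = 0
    for rec in log[start:]:
        if rec[0] == "set":
            _, item, value = rec
            data[item] = value
            replayed += 1
    return data, replayed


checkpoint_log = [("set", "X", 15), ("set", "Y", 8), ("checkpoint",), ("set", "X", 20)]
# The data file already reflects everything before the checkpoint.
print(recover_from_checkpoint(checkpoint_log, {"X": 15, "Y": 8}))  # ({'X': 20, 'Y': 8}, 1)
```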

Idempotence

Operation records in log files should be idempotent. Idempotence means that executing an operation multiple times produces the same result as executing it once. For example, 5*1 = 5*1*1*1, so multiplying 5 by 1 is idempotent. During fault recovery, log files may be replayed more than once (for example, if the system loses power halfway through the first replay, the logs must be replayed again). If the operation records are not idempotent, replaying them can corrupt the data.
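The difference between idempotent and non-idempotent log records shows up when a log is replayed twice. In this sketch, `set_x_15` mirrors a redo record that stores the new value (like <T1, X, 15>), while `add_10_to_x` is a hypothetical record that stores only the change; both are plain Python functions rather than a real log format.

```python
def replay(records, data):
    # Apply each log record, in order, to the data.
    for apply in records:
        apply(data)
    return data


def replay_twice(records, initial):
    data = dict(initial)
    replay(records, data)  # first replay, interrupted by a power failure
    replay(records, data)  # recovery restarts and replays the whole log again
    return data


def set_x_15(d):
    d["X"] = 15   # stores the after-image: idempotent


def add_10_to_x(d):
    d["X"] += 10  # stores only the delta: not idempotent


print(replay_twice([set_x_15], {"X": 5}))     # {'X': 15}
print(replay_twice([add_10_to_x], {"X": 5}))  # {'X': 25}, not the intended 15
```

Recording the after-image, as the <T1, X, 15> example does, makes a redo record idempotent in this sense.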
