The log implementation mechanism of PostgreSQL


1. The concept of a transaction
A transaction is a concept that databases borrow from everyday life: the operations within a transaction either all take effect or none of them do. Take a bank transfer: when money is deducted from one account, the same amount must be deposited into another. If the transaction aborts after the deduction but before the deposit, the money simply vanishes; this all-or-nothing property is the atomicity of the transaction. Once a transaction completes, its result must be recorded durably, otherwise there would be no way to tell whether the transaction happened at all; this is the durability of the transaction. In addition, transactions are isolated and consistent.


2. Why introduce a log?
First, let's look at how a transaction is executed in a database. When a transaction starts, data is read from disk and then operated on: it may be filtered, aggregated, updated and so on, and new data may be produced. If the data changes, the changes must be written back to disk once the work is done; that completes the transaction. This is of course the simplest possible description, so let's examine each step more closely. The first step is reading data from disk. In an application system the same data is often read again and again; if it were fetched from disk every time, performance would be poor, because disk I/O is slow. As you might expect, a buffer mechanism is used to speed up reads; the details of buffering are not the subject of this article. Next comes operating on the data. When the transaction finishes, the updated data must be written back to disk, and the same disk I/O performance problem appears. Can a buffer be used here as well? It can, and it does relieve the disk I/O bottleneck for writes. But while the buffer solves the I/O performance problem, it introduces a new one: what happens if a failure occurs before the buffered changes reach disk?
In database system design, losing data is unacceptable, and the log is introduced precisely to solve the problem created by buffering writes to disk. Before the data is modified, a log record describing the operation is written; only then is the data changed (logging the change after the data has already been modified would obviously defeat the purpose). By reading the log and redoing the lost operations, all lost data can be recovered. Some will ask: isn't writing the log just another disk write, the same as flushing the buffer? Quite right, both are disk writes, but with one subtle difference: the log is written to disk sequentially, whereas buffer pages are written to random locations. Small as that difference looks, its impact on performance is huge; interested readers can measure it themselves. The volume of log data is also much smaller than the volume of data pages that would otherwise have to be written.
Why must the operation be logged first and the data modified afterwards? Because if the data were modified first and the system crashed before the log was written, the operation would be lost without a trace. This rule is what database systems call WAL (write-ahead logging).
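To make the write-ahead rule concrete, here is a minimal illustrative sketch in Python. It is not PostgreSQL source code, and all names in it (WAL, buffer_pool, update_row) are hypothetical: the point is only that the log record describing a change is forced to disk before the in-memory data page is modified.

    import os

    # Minimal sketch of the write-ahead rule; all names are hypothetical.
    class WAL:
        def __init__(self, path):
            self.log = open(path, "ab")          # append-only: purely sequential disk writes

        def append(self, record: bytes):
            self.log.write(record)
            self.log.flush()
            os.fsync(self.log.fileno())          # force the record to disk before the data change

    def update_row(wal, buffer_pool, page_no, new_value):
        # 1. Describe the change in the log FIRST.
        wal.append(b"UPDATE page=%d value=%s\n" % (page_no, new_value))
        # 2. Only then modify the in-memory page; it may be flushed to disk much later.
        buffer_pool[page_no] = new_value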


3. Introducing the log buffer
To improve performance further, a log buffer is introduced: log records are written to disk in batches rather than one at a time. This brings a new problem of its own: if a failure occurs before the log buffer has been written to disk, the buffered log records are lost, and with them the data they describe. How do we resolve this? We need to look more closely at what the log is for, namely redoing lost operations. If a transaction has not committed, the operations it has already performed do not really matter; losing them has no effect. Take the bank transfer again: if money has been deducted from one account and the system fails before it can be deposited into the other, the transaction is rolled back, that is, the system returns to the state before the deduction, so the deduction never took effect. Losing the log record for that deduction therefore has no consequence at all; it may even speed up recovery slightly, since there is one less record to process. The conclusion is that flushing the log buffer to disk can be postponed, but no later than transaction commit. In practice there are a few other events that also flush the log buffer, such as a checkpoint or the buffer filling up, so it does not always wait for a commit.
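Continuing the earlier sketch (still hypothetical, not PostgreSQL code), a buffered WAL accumulates records in memory and forces them to disk when the buffer fills or, at the latest, when a transaction commits:

    import os

    class BufferedWAL:
        def __init__(self, path, max_buffered=8192):
            self.log = open(path, "ab")
            self.pending = bytearray()               # in-memory log buffer
            self.max_buffered = max_buffered

        def append(self, record: bytes):
            self.pending += record
            if len(self.pending) >= self.max_buffered:
                self.flush()                         # buffer full: flush without waiting for commit

        def flush(self):
            if self.pending:
                self.log.write(self.pending)         # one sequential write covers many records
                self.log.flush()
                os.fsync(self.log.fileno())
                self.pending.clear()

    def commit(wal):
        wal.append(b"COMMIT\n")
        wal.flush()                                  # the commit record must be durable before success is reported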


4. The origin and function of LSN
Now that we have a log, it must do its job during recovery: lost operations are redone by reading the log. But in what order should they be redone? Recording the order of historical operations is essential, and the consequences of replaying them out of order are severe. For example, take the value 100, subtract 100, then double the result, which gives 0; reverse the order, doubling first and then subtracting 100, and you get 100 instead. So we need a rule: number each log record in the order in which it was generated, and redo records in that order; then no reordering chaos can occur. In the actual implementation, log records are written to disk in the order they were produced: they enter the log buffer in that order, and the buffer is written out in that order too. We can therefore use a record's byte offset within the log file as its number; this costs no extra disk space and also lets us locate a record quickly from its offset, which is a rather elegant idea. This number was given a special name, the LSN (log sequence number), and that is where the LSN comes from.
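A hypothetical sketch of this idea: the LSN handed back by the log is nothing more than the record's byte offset in the log file.

    class WALWithLSN:
        def __init__(self, path):
            self.log = open(path, "wb")          # fresh log file, for illustration only
            self.next_lsn = 0                    # byte offset of the next record to be written

        def append(self, record: bytes) -> int:
            lsn = self.next_lsn                  # offset where this record starts = its LSN
            self.log.write(record)
            self.log.flush()
            self.next_lsn += len(record)
            return lsn                           # LSNs grow strictly in the order records are produced
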
But a new problem appears: we now know all the historical operations and their order, but not whether their effects have already reached disk. If we blindly redid every operation, operations whose effects are already on disk would be applied twice; would you want a single purchase payment to be deducted twice? To avoid this, each data block records in its header the LSN of the last log record that modified it. During redo, the data block is loaded into the buffer as a page. If the LSN in the page header is smaller than the LSN of the log record currently being redone, that record has not yet been applied to the page and must be redone; if it is not smaller, that is, greater than or equal to the current record's LSN, the record has already been applied (or does not need to be) and is skipped.
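The redo decision can be sketched like this (hypothetical code; page stands for a data block loaded into the buffer, with its header LSN stored under the key "lsn"):

    def redo_record(page, record_lsn, apply_change):
        if page["lsn"] < record_lsn:
            apply_change(page)               # the page has not seen this change yet: redo it
            page["lsn"] = record_lsn         # remember the last record applied to this page
        # else: page LSN >= record LSN, the change is already on the page, so skip it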


5. Accelerating the recovery process with checkpoint
When the system fails, the log ensures we do not have to worry about data loss: recovery is a matter of reading the log back. But if the system has been running for a long time and has performed many operations, the log becomes very large and replaying it takes a long time. In a production environment the shorter the recovery time the better, so how do we shorten it? The checkpoint is the answer. A special type of record, the checkpoint record, is introduced into the log; it means that all "dirty data" produced before it has already been written to disk, so everything in the log before it can simply be skipped during recovery instead of being processed.

Although we would like a checkpoint to be instantaneous, that is very hard to achieve in practice: we cannot flush all the dirty data to disk in an instant (if we could, there would be no need for a log at all). So a checkpoint is a process with a beginning and an end. When the checkpoint starts, we record the current log offset (LSN) and mark all dirty pages as ready to write; then the marked dirty pages are written to disk. Note that other processes or threads may produce new dirty pages while this is happening; whether those newly dirtied pages are written out does not matter here. When all the marked dirty pages have reached disk, a checkpoint record is inserted into the log to indicate that the checkpoint is complete; it also carries the log offset recorded at the start of the checkpoint, known as the redo offset.

During recovery, we first locate the most recent checkpoint record, read it, extract the redo offset from it, and start replaying the log from that offset. By tuning the interval between checkpoints, an acceptable recovery time can be achieved.
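To tie the pieces together, here is a hypothetical sketch, building on the earlier ones and not PostgreSQL code, of a checkpoint and of selecting the log records to replay starting from the checkpoint's redo offset:

    def checkpoint(wal, buffer_pool, disk):
        redo_lsn = wal.next_lsn                          # log offset at the START of the checkpoint
        marked = [p for p in buffer_pool.values() if p["dirty"]]
        for page in marked:                              # pages dirtied after this point are ignored
            disk.write_page(page)
            page["dirty"] = False
        wal.append(b"CHECKPOINT redo_lsn=%d\n" % redo_lsn)   # completion record carries the redo offset

    def records_to_replay(wal_records):
        # wal_records: (lsn, record) pairs read back from the log file, in order.
        redo_lsn = 0
        for lsn, record in wal_records:                  # locate the most recent checkpoint record
            if record.startswith(b"CHECKPOINT"):
                redo_lsn = int(record.rsplit(b"=", 1)[1])
        # Only records at or after the redo offset need the per-page LSN check and redo.
        return [(lsn, rec) for lsn, rec in wal_records if lsn >= redo_lsn]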
