PostgreSQL Replication, Chapter 2: Understanding the PostgreSQL Transaction Log (4)


2.4 Adjusting checkpoints and xlog

So far, this chapter has provided insight into how PostgreSQL writes data and, in general, what the XLOG is used for. Given this knowledge, we can now move on and learn what we can do to make our databases work more efficiently, both in replicated and in single-server setups.

2.4.1 Understanding Checkpoints

In this chapter, we have seen that data must be written to the XLOG before it can go anywhere else. The problem is that if the XLOG were never removed, we could not keep writing to it forever without filling up the disk.

To solve this problem, the XLOG must be deleted at some point. This process is called a checkpoint.

The main question arising from this is: when can the XLOG be truncated up to a certain point? The answer is: when PostgreSQL has put everything that is already in the XLOG into the storage files. If all the changes recorded in the XLOG have also been applied to the data files, the XLOG can be truncated.

[Keep in mind that simply writing the data is not enough; we must also flush the data to the data files.]

In a way, the XLOG can be seen as the repairman of the data files in case something goes wrong. Once everything has been fully repaired, the repair instructions can safely be removed. This is exactly what happens during a checkpoint.

2.4.2 Configuring checkpoints

Checkpoints are very important for consistency, but they are also very important for performance. If checkpoints are configured poorly, you may face severe performance degradation.

When it comes to configuring checkpoints, the following parameters are relevant. Note that all of these parameters can be changed in postgresql.conf:

checkpoint_segments = 3

checkpoint_timeout = 5min

checkpoint_completion_target = 0.5

checkpoint_warning = 30s

In the following sections, we'll look at these variables:

About segments and timeouts

checkpoint_segments and checkpoint_timeout define the distance between two checkpoints. A checkpoint happens either when we run out of segments or when the timeout is reached.

Keep in mind that a segment is usually 16 MB, so three segments means that we will do a checkpoint every 48 MB. On modern hardware, such a short distance is far from enough. In a typical production system, a checkpoint_segments setting of 256 or even higher is perfectly feasible.
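To make this arithmetic concrete, here is a minimal sketch in Python. It assumes the usual 16 MB segment size and models only the segment-based trigger, not the timeout. The function name xlog_between_checkpoints_mb is made up for this illustration and is not part of PostgreSQL.

```python
# Size of one XLOG segment in MB; 16 MB is the usual default (assumption).
SEGMENT_SIZE_MB = 16


def xlog_between_checkpoints_mb(checkpoint_segments: int,
                                segment_size_mb: int = SEGMENT_SIZE_MB) -> int:
    """Approximate XLOG volume (in MB) after which a segment-based checkpoint starts.

    Hypothetical helper for illustration only; it ignores checkpoint_timeout.
    """
    if checkpoint_segments < 1:
        raise ValueError("checkpoint_segments must be at least 1")
    return checkpoint_segments * segment_size_mb


if __name__ == "__main__":
    # Compare the shipped default with two larger, illustrative settings.
    for segments in (3, 32, 256):
        volume = xlog_between_checkpoints_mb(segments)
        print(f"checkpoint_segments = {segments}: "
              f"checkpoint roughly every {volume} MB of XLOG")
```

With the default of three segments this gives the 48 MB mentioned above; a setting of 256 corresponds to roughly 4 GB of XLOG between two checkpoints.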

However, when setting checkpoint_segments, one thing has to be kept in mind: in the event of a crash, PostgreSQL has to replay all changes made since the last checkpoint. If the distance between two checkpoints is very large, you may notice that your failed database instance takes a long time to start up again. For the sake of availability, this should be avoided.

[There is always a trade-off between performance and recovery time after a crash; you must balance your configuration accordingly.]

checkpoint_timeout is also very important. It is the upper limit on the time allowed between two checkpoints. There is no point in increasing checkpoint_segments indefinitely while leaving the time limit untouched. On large systems, increasing checkpoint_timeout has proven to make sense for many people.

[In PostgreSQL, you will find that a constant number of transaction log files is kept around. Unlike in other database systems, the number of XLOG files has nothing to do with the maximum size of a transaction; a transaction can easily be larger than the distance between two checkpoints.]

To write or not to write?

We have already learned in this chapter that at COMMIT time we cannot be sure whether the data is already in the data files.

So, if the data files do not have to be consistent anyway, why not vary the point in time at which the data is written? This is exactly what we can do with checkpoint_completion_target. The idea is to have a setting that specifies the target for checkpoint completion as a fraction of the total time between two checkpoints.
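As a rough illustration of what "a fraction of the total time between two checkpoints" means, the following Python sketch computes the time window PostgreSQL would aim to spread the checkpoint writes over. It assumes that checkpoints are triggered by checkpoint_timeout alone (not by running out of segments), and checkpoint_write_window_s is a hypothetical helper name, not a PostgreSQL function.

```python
def checkpoint_write_window_s(checkpoint_timeout_s: float,
                              completion_target: float) -> float:
    """Seconds over which checkpoint writes are spread (simplified model).

    Assumes time-triggered checkpoints only; hypothetical helper for illustration.
    """
    if checkpoint_timeout_s <= 0:
        raise ValueError("checkpoint_timeout_s must be positive")
    if not 0.0 < completion_target <= 1.0:
        raise ValueError("completion_target must be in the range (0, 1]")
    return checkpoint_timeout_s * completion_target


if __name__ == "__main__":
    timeout_s = 5 * 60  # checkpoint_timeout = 5min, as in the listing above
    for target in (0.05, 0.5, 0.9):
        window = checkpoint_write_window_s(timeout_s, target)
        print(f"checkpoint_completion_target = {target}: "
              f"writes spread over about {window:.0f} s")
```

With the default values (5min and 0.5), the writes of one checkpoint would be spread over roughly half of the interval, that is, about two and a half minutes.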

Now let's discuss three scenarios to illustrate the purpose of checkpoint_completion_target:

Scenario 1: Storing stock market data

In this scenario, we will store the most recent quotes of all stocks in the Dow Jones Industrial Average (DJIA). We don't want to store the entire price history, only the most recent, current price.

Given the type of data we are dealing with, we can assume that we will have a workload dominated by UPDATE statements.

What will happen? PostgreSQL has to update the same data over and over again. Given that the DJIA consists of only 30 different stocks, the amount of data is very limited and our table will be very small. In addition, prices may be updated once per second, or even more often.

Internally, the situation is as follows: when the first UPDATE comes along, PostgreSQL grabs a block, puts it into memory, and modifies it. Every subsequent UPDATE will most likely change the same block. Logically, all writes have to go to the transaction log, but what happens to the cached block in the shared buffer? The general rule is that if there are many UPDATEs (and therefore many changes to the same block), it is wise to keep the block in memory as long as possible. This greatly increases the odds of avoiding I/O by writing multiple changes in one go.

[If you want to increase the odds of having many changes in a single disk I/O, consider decreasing checkpoint_completion_target. Blocks will stay in memory longer, so many changes may go into the same block before a write occurs.]

In the scenario just outlined, a checkpoint_completion_target of 0.05 (or 5 percent) might be reasonable.
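The effect of keeping a hot block in memory can be shown with a toy model in Python. This is only a sketch of the idea of combining several changes into one physical write; it does not model how PostgreSQL actually schedules its writes, and physical_writes as well as the flush_every parameter are invented for this illustration.

```python
from typing import Iterable


def physical_writes(updated_blocks: Iterable[int], flush_every: int) -> int:
    """Count block writes if dirty blocks are flushed after every `flush_every` updates.

    Toy model (hypothetical): each update dirties one block; a flush writes
    every dirty block exactly once, no matter how often it was changed.
    """
    if flush_every < 1:
        raise ValueError("flush_every must be at least 1")
    dirty = set()
    writes = 0
    for count, block in enumerate(updated_blocks, start=1):
        dirty.add(block)
        if count % flush_every == 0:
            writes += len(dirty)
            dirty.clear()
    return writes + len(dirty)  # flush whatever is still dirty at the end


if __name__ == "__main__":
    # One hour of per-second price updates that all hit the same block.
    hot_block_updates = [0] * 3600
    print("flush after every update:   ", physical_writes(hot_block_updates, 1))
    print("flush after every 300 updates:", physical_writes(hot_block_updates, 300))
```

In this model, holding the block back for 300 updates turns 3600 physical writes into 12, which is the kind of saving the scenario above is after.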

Scenario 2: Bulk loading

In our second scenario, we will load 1 TB of data into an empty table. If you are loading that much data at once, what are the odds of hitting a block you hit 10 minutes ago once again? The odds are basically zero. In this case, there is no point in buffering writes, because we would simply waste disk capacity by idling and waiting for I/O to take place.

During a bulk load, we want to use all the I/O capacity we have, all the time. To make sure that PostgreSQL writes out data right away, we have to increase checkpoint_completion_target to a value close to 1.

Scenario 3: I/O spikes and throughput considerations

Sharp spikes can kill you; at the very least, they can do serious harm, which should be avoided. What is true in the real world around you is also true in the database world.

In this scenario, we will assume an application that stores the call detail records (CDRs) of a phone company. As you can imagine, a lot of writing will happen, since people make phone calls all day long. Of course, there will be people who make one call immediately followed by the next, but we will also see a great number of people who make just one call a week.

Technically, this means that there is a good chance that a block in shared memory that has recently been changed will face a second or third change soon, but we will also have a great many changes made to blocks that will never be visited again.

How should we handle this? It is a good idea to write data out late, so that as many changes as possible go to pages that have been modified before. But what happens during a checkpoint? If changes (in this case, dirty pages) have been held back for too long, the checkpoint itself will be intense, and many blocks must be written within a very short period of time. This can lead to a so-called I/O spike. During an I/O spike, you will see that your I/O system is busy. This may show up as poor response times, and poor response times can be felt by your end users.
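A back-of-the-envelope calculation shows why held-back dirty pages hurt. The Python sketch below divides an amount of dirty data by the time available to write it; the 3000 MB figure and the two time windows are hypothetical numbers chosen only for illustration, and required_write_rate_mb_s is not a PostgreSQL function.

```python
def required_write_rate_mb_s(dirty_mb: float, write_window_s: float) -> float:
    """Average write rate (MB/s) needed to flush `dirty_mb` within `write_window_s`.

    Hypothetical helper; a simplified model that ignores all other I/O.
    """
    if dirty_mb < 0:
        raise ValueError("dirty_mb must not be negative")
    if write_window_s <= 0:
        raise ValueError("write_window_s must be positive")
    return dirty_mb / write_window_s


if __name__ == "__main__":
    dirty_mb = 3000.0  # assumed amount of dirty data waiting for the checkpoint
    for label, window_s in (("written within 15 s", 15.0),
                            ("spread over 150 s", 150.0)):
        rate = required_write_rate_mb_s(dirty_mb, window_s)
        print(f"{dirty_mb:.0f} MB {label}: about {rate:.0f} MB/s")
```

The same amount of data needs ten times the write rate when it has to go out in a tenth of the time, and that burst is what the end user perceives as a stall.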

This adds a dimension to the problem: predictable response times.

Let's put it this way: assume you have been using online banking successfully for quite a while. You are happy. Now somebody at your bank has found a tweak that makes the database behind the online banking 50 percent faster, but this gain comes with a downside: for two hours a day, the system will not be reachable. Clearly, from a performance point of view, the throughput will be better:

24 hours * 1 x < 22 hours * 1.5 x

But will the customers be happy? Clearly, they will not. This is a typical use case in which optimizing for maximum throughput does no good. If you can meet your performance requirements, it may be wiser to have evenly distributed response times at the cost of a small performance penalty. In our banking example, this would mean that your system is up 24x7 instead of just 22 hours a day.

[Would you pay your bills more often if your online banking were 10 times faster? Clearly, you would not. Sometimes it is not about optimizing for as many transactions per second as possible, but about handling a predefined load in the most reasonable way.]

The same concept applies to our phone application. We cannot write all the changes during the checkpoint, because this may cause latency issues during the checkpoint. It is also not good to change the data files more or less immediately (meaning a high checkpoint_completion_target), because we would write too much, too often.

This is a classic example of a situation in which you have to compromise. A checkpoint_completion_target of 0.5 may be the best choice in this case.

The conclusion to be drawn from these three examples is that no configuration fits all purposes. To come up with a good, workable configuration, you really need to think about the type of data you are dealing with. For many applications, a value of 0.5 has proven to be just fine.

2.4.3 Adjusting the WAL buffers

In this chapter, we have already adjusted some important parameters, such as shared_buffers, fsync, and so on. There is one more parameter, however, which can have a significant impact on performance. The wal_buffers parameter tells PostgreSQL how much memory to keep around for XLOG that has not yet been written to disk. So, if somebody pumps in a large transaction, PostgreSQL does not have to write every small change to the XLOG before COMMIT. Keep in mind that if an uncommitted transaction is lost during a crash, we do not need to care about it, because COMMIT is the only thing that really counts in everyday life. It therefore makes sense to write the XLOG in larger chunks before a COMMIT happens. This is exactly what wal_buffers does: unless it is changed manually in postgresql.conf, it is an auto-tuned parameter (represented by -1) that lets PostgreSQL use about 3 percent of shared_buffers, but no more than 16 MB, to hold XLOG before it is written to disk.
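The auto-tuning rule can be written down as a small Python sketch. It follows the rule as described in the PostgreSQL documentation for wal_buffers = -1: 1/32 (about 3 percent) of shared_buffers, but not less than 64 kB and not more than one XLOG segment, which is typically 16 MB. The function name auto_wal_buffers_bytes is made up for this illustration; the real calculation happens inside the server.

```python
KB = 1024
MB = 1024 * KB


def auto_wal_buffers_bytes(shared_buffers_bytes: int,
                           wal_segment_bytes: int = 16 * MB) -> int:
    """Approximate the value chosen for wal_buffers = -1 (illustrative sketch).

    1/32 of shared_buffers, clamped to the range [64 kB, one XLOG segment].
    """
    if shared_buffers_bytes <= 0:
        raise ValueError("shared_buffers_bytes must be positive")
    candidate = shared_buffers_bytes // 32
    return max(64 * KB, min(candidate, wal_segment_bytes))


if __name__ == "__main__":
    for shared_buffers_mb in (1, 128, 1024, 8192):
        chosen = auto_wal_buffers_bytes(shared_buffers_mb * MB)
        print(f"shared_buffers = {shared_buffers_mb} MB: "
              f"wal_buffers of about {chosen // KB} kB")
```

Note that the 16 MB cap is already reached at a shared_buffers setting of 512 MB, so on most production-sized instances the auto-tuned value simply equals one XLOG segment.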

[In older versions of PostgreSQL, this parameter was set to 64 kB. Such a low value is unreasonable for modern machines. If you are running an older version of PostgreSQL, consider increasing wal_buffers to 16 MB. For a reasonably sized database instance, this is usually a sensible value.]
