4.7 Conflict Management
In PostgreSQL, stream replication data flows in only one direction. Xlog is provided by master to several slave that consume transaction logs and provide you with a better backup of your data. You may wonder how this can lead to a conflict, which can occur.
Consider the situation: as you know, there is a small delay in data replication. Therefore, Xlog ends in slave after it is generated by master. This slight delay causes the scenario as shown:
Let's say a slave starts reading a table. It is a long read operation. At the same time, master receives a request to actually delete the table. This is a bit of a problem because slave still needs this data to execute its SELECT statement. On the other hand, all requests from Master must be obeyed under any circumstances. This is a classic clash.
[If a conflict occurs, PostgreSQL will issue an error message: The connection was terminated because of a recovery conflict.] ]
There are two options to solve this problem:
1. Do not replay conflicting transaction logs until slave terminates the problematic operation
2. Kill the query on the slave to solve the problem.
The first choice may cause severe delays during playback of the replay process, especially if the slave performs a fairly long operation. The second option may often kill queries on slave. A DB instance cannot know what is best for your application by itself. So you have to find a proper balance between deferred replay and kill query.
To find this delicate balance, PostgreSQL provides two parameters in postgresql.conf:
Max_standby_archive_delay = 30s
# Max delay before canceling queries
# when the reading WAL from archive;
#-1 allows indefinite delay
Max_standby_streaming_delay = 30s
# Max delay before canceling queries
# when reading streaming WAL;
#-1 allows indefinite delay
When there are conflicting actions, the Max_standby_archive_delay parameter tells the system how long it will take to terminate the Xlog replay. In the default settings, if a conflict is found, slave will defer xlog replay for up to 30s. This setting is valid if slave is replaying the transaction log from a file.
If Xlog is entering slave,max_standby_streaming_delay through the stream, it will tell slave how long it will take to terminate the Xlog replay. If the time has expired and the conflict persists, PostgreSQL will cancel the statement because a recovery problem causes problems on the slave system, recovering xlog recovery to catch up.
In the previous example, we have shown that if a table is deleted, the conflict may occur. This is an obvious scenario; however, to date, it is not the most common one. It is more likely that a row is removed by vacuum or hot-update, resulting in a conflict on the slave.
It's annoying to get out of a conflict for a while and cause your app to behave negatively. In other words, conflicts should be avoided if possible. We have seen how the replay log is delayed. These are not the only mechanisms provided by PostgreSQL. There are two more settings that we can use.
The first of the two settings is also an earlier setting that is vacuum_defer_cleanup_age. It is measured in a transaction and tells PostgreSQL when to delete a row of data. Typically, if no more transactions can see the data, a row of data is deleted by vacuum. Vacuum_defer_cleanup_age tells the vacuum not to purge a row of data immediately, but waits for some transactions before the data is purged.
Deferring cleanup will keep a row of data longer than it takes. This helps slave to complete queries that rely on old rows of data. Especially if your slave provides services to handle some analytical work, this will help ensure that no queries are killed in vain.
There is also a way to control conflicts by using Hot_standby_feedback. The idea is that slave reports the transaction ID to master, which in turn can use this information to postpone vacuuming. This is one of the simplest ways to avoid slave on a clean conflict.
[Keep in mind that delayed cleanup can lead to increased space usage and some side effects that must be kept in mind in all cases. The effect is the same as running a long transaction on master. ]
The fourth chapter of PostgreSQL replication set up asynchronous replication (7)