背景
對於金融級的應用情境,2個副本通常是不夠的,使用者可能會需要多個副本。
例如,一主4從,要求除了主以外,還需要2個同步的副本,其他可以為非同步副本。
另一方面,我們在使用資料庫時,為了擴充讀的能力,讀寫分離是比較常見的用法。
9.6以前的版本,同步複製是確保XLOG已經複製到備庫,而不是已經在備庫apply,雖然APPLY通常都很快,可能也在毫秒層級完成,但是以前沒有apply層級的同步機制。
例如,使用者A往使用者B的賬戶匯了一筆錢,同一時間使用者B在異地馬上要看到A匯過來的錢,這種異地、異庫的讀寫分離情境,就需要主備事務級的嚴格一致,因此9.6的remote_apply就非常有效。
還有一種情境,在主備切換時,要求備庫快速啟用成和主庫一樣的狀態,也需要remote_apply層級的複製。
以上是PostgreSQL 9.6在流複製方面的增強。
同步多副本配置方法
參數配置如下
synchronous_standby_names (string)
支援兩種寫法
num_sync ( standby_name [, ...] )
standby_name [, ...]
num_sync 數字,表示同步的副本數。
沒有寫num_sync的格式相容以前的配置,表示1個副本。
standby_name是standby配置在recovery.conf中的application_name,可以使用*代替所有standby。
例子
2 (s1, s2, s3, s4)
表示2個副本為同步standby,其他為非同步standby。
事物提交時,確保XLOG已同步到任意2個副本(前面兩個是同步節點,後面2個是替補,當前面兩個中任意的節點出問題,後面的會依次替補上)。
手冊說明
Specifies a list of standby servers that can support synchronous replication, as described in Section 26.2.8.
There will be one or more active synchronous standbys;
transactions waiting for commit will be allowed to proceed after these standby servers confirm receipt of their data.
The synchronous standbys will be those whose names appear earlier in this list, and that are both currently connected and streaming data in real-time
(as shown by a state of streaming in the pg_stat_replication view).
Other standby servers appearing later in this list represent potential synchronous standbys.
If any of the current synchronous standbys disconnects for whatever reason, it will be replaced immediately with the next-highest-priority standby.
Specifying more than one standby name can allow very high availability.
This parameter specifies a list of standby servers using either of the following syntaxes:
num_sync ( standby_name [, ...] )
standby_name [, ...]
where num_sync is the number of synchronous standbys that transactions need to wait for replies from, and standby_name is the name of a standby server.
For example, a setting of 3 (s1, s2, s3, s4) makes transaction commits wait until their WAL records are received by three higher-priority standbys chosen from standby servers s1, s2, s3 and s4.
The second syntax was used before PostgreSQL version 9.6 and is still supported.
It's the same as the first syntax with num_sync equal to 1.
For example, 1 (s1, s2) and s1, s2 have the same meaning:
either s1 or s2 is chosen as a synchronous standby.
The name of a standby server for this purpose is the application_name setting of the standby, as set in the primary_conninfo of the standby's WAL receiver.
There is no mechanism to enforce uniqueness.
In case of duplicates one of the matching standbys will be considered as higher priority, though exactly which one is indeterminate.
The special entry * matches any application_name, including the default application name of walreceiver.
Note: Each standby_name should have the form of a valid SQL identifier, unless it is *.
You can use double-quoting if necessary.
But note that standby_names are compared to standby application names case-insensitively, whether double-quoted or not.
If no synchronous standby names are specified here, then synchronous replication is not enabled and transaction commits will not wait for replication.
This is the default configuration.
Even when synchronous replication is enabled, individual transactions can be configured not to wait for replication by setting the synchronous_commit parameter to local or off.
注意standby_name只是指定了同步的standby節點,而事務層級還有一個開關為synchronous_commit,通過配置synchronous_commit用於控制事務的同步層級,可以是非同步,也可以是同步的。
對於可靠性要求高的,可以配置為同步,否則可以為了效能配置為非同步。
事務同步層級說明
通過參數synchronous_commit (enum)配置事務的同步層級。
如果要配置事務或者會話層級的參數,通過以下文法
SET [ SESSION | LOCAL ] configuration_parameter { TO | = } { value | 'value' | DEFAULT }
也就是說,使用者可以根據實際的業務需求,對不同的事務,設定不同的同步層級。
目前支援的同步層級如下
事務提交或復原時,會產生一筆事務結束的commit/rollback redo record,在REDO的地址系統中,用LSN表示。
on,
事務commit或rollback時,等待其redo在primary、以及同步standby(s)已持久化(>=其lsn)。
remote_apply,
事務commit或rollback時,等待其redo在primary、以及同步standby(s)已持久化(>=其lsn)。 並且其redo在同步standby(s)已apply(>=其lsn)。
remote_write,
事務commit或rollback時,等待其redo在primary已持久化; 其redo在同步standby(s)已調用write介面(寫到OS, 但是還沒有調用持久化介面如fsync)(>=其lsn)。
local,
事務commit或rollback時,等待其redo在primary已持久化;
off
事務commit或rollback時,等待其redo在primary已寫入wal buffer,不需要等待其持久化;
手冊說明
synchronous_commit (enum)
Specifies whether transaction commit will wait for WAL records to be written to disk before the command returns a "success" indication to the client.
Valid values are on, remote_apply, remote_write, local, and off. The default, and safe, setting is on.
When off, there can be a delay between when success is reported to the client and when the transaction is really guaranteed to be safe against a server crash.
(The maximum delay is three times wal_writer_delay.)
Unlike fsync, setting this parameter to off does not create any risk of database inconsistency:
an operating system or database crash might result in some recent allegedly-committed transactions being lost, but the database state will be just the same as if those transactions had been aborted cleanly.
So, turning synchronous_commit off can be a useful alternative when performance is more important than exact certainty about the durability of a transaction.
For more discussion see Section 30.3.
If synchronous_standby_names is non-empty, this parameter also controls whether or not transaction commits will wait for their WAL records to be replicated to the standby server(s).
When set to on, commits will wait until replies from the current synchronous standby(s) indicate they have received the commit record of the transaction and flushed it to disk.
This ensures the transaction will not be lost unless both the primary and all synchronous standbys suffer corruption of their database storage.
When set to remote_apply, commits will wait until replies from the current synchronous standby(s) indicate they have received the commit record of the transaction and applied it,
so that it has become visible to queries on the standby(s).
When set to remote_write, commits will wait until replies from the current synchronous standby(s) indicate they have received the commit record of the transaction and written it out to their operating system.
This setting is sufficient to ensure data preservation even if a standby instance of PostgreSQL were to crash,
but not if the standby suffers an operating-system-level crash, since the data has not necessarily reached stable storage on the standby.
Finally, the setting local causes commits to wait for local flush to disk, but not for replication.
This is not usually desirable when synchronous replication is in use, but is provided for completeness.
If synchronous_standby_names is empty, the settings on, remote_apply, remote_write and local all provide the same synchronization level:
transaction commits only wait for local flush to disk.
This parameter can be changed at any time; the behavior for any one transaction is determined by the setting in effect when it commits.
It is therefore possible, and useful, to have some transactions commit synchronously and others asynchronously.
For example, to make a single multistatement transaction commit asynchronously when the default is the opposite, issue SET LOCAL synchronous_commit TO OFF within the transaction.
remote_apply事務同步層級配置方法
通過參數synchronous_commit (enum)配置事務的同步層級。
如果要配置事務或者會話層級的參數,通過以下文法
SET [ SESSION | LOCAL ] configuration_parameter { TO | = } { value | 'value' | DEFAULT }
小結
1. 可靠性
結合synchronous_standby_names與synchronous_commit的設定,可以實現非常靈活的事務層級可靠性的要求。
2. 讀寫分離的讀一致性保證
當synchronous_commit=remote_apply時,這個事務產生的REDO必須要在sync standby(s)已apply,才返回給用戶端事務結束。
這類事務,事務結束後發起的讀請求,在primary以及sync standby可以做到讀寫完全一致。
犧牲寫事務的RT,獲得在多借點的讀事務的完全一致。
3. 切換時延
當設定了synchronous_commit=remote_apply事務同步複製層級時,可以降低切換延遲。
備庫在啟用時,不需要apply redo,可以迅速的啟用成為主庫。