1.2 Different types of replication
Now that you have a solid understanding of the physical and theoretical limitations, you can start learning about the different types of replication.
1.2.1 Synchronous and asynchronous replication
The first distinction we can make is between synchronous and asynchronous replication.
What does that mean? Suppose we have two servers and want to replicate data from one server (the master) to the second server (the slave). To illustrate the concepts of synchronous and asynchronous replication, we can use a simple transaction such as the following:
BEGIN;
INSERT INTO foo VALUES ('bar');
COMMIT;
In the case of asynchronous replication, the data is replicated only after the transaction has been committed on the master. In other words, the slave is never ahead of the master and, when writes are happening, it is usually a little behind the master. This delay is called lag.
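The following toy model is a sketch of the asynchronous behaviour just described, not how PostgreSQL implements it. The class names `AsyncMaster` and `Replica` and the `ship_one` helper are hypothetical. The master acknowledges a commit immediately and queues the change, and the slave only catches up when the change is shipped. Lag is counted here as the number of committed transactions the slave has not yet seen.

```python
from collections import deque


class AsyncMaster:
    """Toy master: commits locally and ships changes to the slave later."""

    def __init__(self):
        self.data = []          # rows committed on the master
        self.outbox = deque()   # committed changes not yet sent to the slave

    def commit(self, row):
        # The client gets its COMMIT before any replication has happened.
        self.data.append(row)
        self.outbox.append(row)


class Replica:
    """Toy slave: applies whatever the master sends, in order."""

    def __init__(self):
        self.data = []

    def apply(self, row):
        self.data.append(row)


def ship_one(master, slave):
    """Send the oldest pending change to the slave, if there is one."""
    if master.outbox:
        slave.apply(master.outbox.popleft())


def lag(master, slave):
    """Number of committed transactions the slave has not received yet."""
    return len(master.data) - len(slave.data)


master, slave = AsyncMaster(), Replica()
master.commit("bar")
master.commit("baz")
print(lag(master, slave))  # 2: both commits acknowledged, nothing replicated yet
ship_one(master, slave)
print(lag(master, slave))  # 1: the slave is behind, but never ahead
```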
Synchronous replication enforces stricter rules of consistency. If you decide to replicate synchronously (how this is done in practice is discussed in Chapter 5, on setting up synchronous streaming replication), the system has to ensure that the data written by the transaction is stored on at least two servers by the time the transaction commits. This means that the slave does not lag behind the master, and that the data seen by end users is identical on both servers.
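For comparison, here is a minimal sketch of the synchronous rule, under simplifying assumptions. `Node` and `SyncMaster` are hypothetical names, and "acknowledgement" is reduced to a return value. The commit is reported as successful only after the slave confirms that it holds the row. A real system would usually keep waiting for the slave rather than raising an error as this toy does.

```python
class Node:
    """Toy server holding a list of committed rows."""

    def __init__(self, online=True):
        self.online = online
        self.data = []

    def store(self, row):
        # Returns True as an acknowledgement, False if the node is unreachable.
        if not self.online:
            return False
        self.data.append(row)
        return True


class SyncMaster(Node):
    """Toy master that only confirms a commit once its slave has the row."""

    def __init__(self, slave):
        super().__init__()
        self.slave = slave

    def commit(self, row):
        self.store(row)
        # The client is told "COMMIT" only after the slave has confirmed the row.
        # (A real system would typically block here; this toy raises instead.)
        if not self.slave.store(row):
            raise RuntimeError("no confirmation from slave; commit not acknowledged")
        return "COMMIT"


slave = Node()
master = SyncMaster(slave)
print(master.commit("bar"))        # COMMIT
print(master.data == slave.data)   # True: the slave does not lag behind
```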
[Some systems also use a quorum to decide. So it is not always about just two servers; if a quorum is used, more than half of the servers in the cluster must agree on an action.]
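The quorum rule in the note above ("more than half of the servers must agree") fits into a single comparison. The following sketch assumes one boolean vote per server, and `quorum_reached` is a hypothetical helper name. Note that an exact half, such as 2 out of 4, is not a majority.

```python
def quorum_reached(votes):
    """votes: one bool per server in the cluster (True = agrees with the action).

    The action is accepted only if strictly more than half of all servers agree.
    """
    return sum(votes) > len(votes) // 2


print(quorum_reached([True, True, False]))         # True  (2 of 3)
print(quorum_reached([True, True, False, False]))  # False (2 of 4 is not more than half)
```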
Understanding replication and data loss
When a transaction is replicated from the master to a slave, many things have to be taken into consideration, especially when it comes to data loss.
Suppose we are replicating data asynchronously and the following happens:
- A transaction is sent to the master.
- The transaction commits on the master.
- The master dies before the transaction has been sent to the slave.
- The slave never receives this transaction.
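The four steps above can be replayed in a few lines. This is a sketch only. The module-level lists and the helper names `commit_on_master` and `master_crashes` are hypothetical, and the crash is modelled simply as losing everything that was still queued for shipping.

```python
from collections import deque

master_data, outbox, slave_data = [], deque(), []


def commit_on_master(txn):
    master_data.append(txn)  # the transaction commits on the master
    outbox.append(txn)       # ...and is queued for later shipping to the slave


def master_crashes():
    # The master dies; every change still waiting to be shipped is gone with it.
    outbox.clear()


commit_on_master("txn 1")
slave_data.append(outbox.popleft())  # txn 1 reaches the slave in time
commit_on_master("txn 2")
master_crashes()                     # txn 2 was committed but never shipped

lost = [t for t in master_data if t not in slave_data]
print(lost)  # ['txn 2']: the slave never receives it
```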
In the case of asynchronous replication, there is a window (the lag) during which data can be lost. The size of this window varies depending on the type of setup: it can be very short (a few milliseconds) or very long (minutes, hours, or even days). The important fact is that data can be lost. A small lag only makes data loss less likely; any lag greater than zero leaves the system susceptible to data loss.
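To make the lag window concrete, the sketch below assumes a constant replication lag. This is a simplification, since real lag fluctuates. Under that assumption, a transaction committed at time `t` reaches the slave at `t + lag`, so a crash at time `crash_ms` loses every commit for which `t <= crash_ms < t + lag_ms`. The function name `lost_on_failover` and the timestamps are made up for illustration.

```python
def lost_on_failover(commit_times_ms, crash_ms, lag_ms):
    """Commits that happened on the master but had not reached the slave
    when the master crashed, assuming a constant replication lag."""
    return [t for t in commit_times_ms if t <= crash_ms < t + lag_ms]


commits = [0, 50, 95, 99]  # hypothetical commit timestamps in milliseconds
print(lost_on_failover(commits, crash_ms=100, lag_ms=0))    # []: zero lag, nothing lost
print(lost_on_failover(commits, crash_ms=100, lag_ms=3))    # [99]
print(lost_on_failover(commits, crash_ms=100, lag_ms=10))   # [95, 99]
print(lost_on_failover(commits, crash_ms=100, lag_ms=200))  # [0, 50, 95, 99]
```

A larger window simply catches more of the most recent commits. Only a lag of exactly zero guarantees that nothing is lost.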
If you want to make sure that data is never lost, you have to switch to synchronous replication. As you have seen in this section, a synchronous transaction is only considered valid once it has been committed on at least two servers.
Considering performance issues
As we have seen in the section on the speed of light and latency, sending unnecessary messages over the network can be expensive and time-consuming. If a transaction is replicated synchronously, PostgreSQL has to make sure that the data reaches the second node before the commit is confirmed, and this leads to latency issues.
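Some back-of-the-envelope arithmetic shows why this matters. The model below is deliberately simple and rests on assumptions. A synchronous commit waits for the local flush plus one network round trip to the slave, and the slave's own flush time is ignored. A single client commits one transaction at a time. The 1 ms flush and the 0.5 ms and 50 ms round trips are illustrative numbers, not measurements.

```python
def commit_latency_ms(local_flush_ms, round_trip_ms, synchronous):
    # Asynchronous: the client waits only for the local flush.
    # Synchronous: it additionally waits for one round trip to the slave
    # (a simplification that ignores the slave's own flush time).
    return local_flush_ms + (round_trip_ms if synchronous else 0.0)


def max_commits_per_second(latency_ms):
    # One client issuing commits back to back, one at a time.
    return 1000.0 / latency_ms


for label, rtt, sync in [("async", 50.0, False),
                         ("sync, LAN (0.5 ms RTT)", 0.5, True),
                         ("sync, WAN (50 ms RTT)", 50.0, True)]:
    latency = commit_latency_ms(1.0, rtt, sync)
    print(f"{label}: {latency} ms per commit, "
          f"{max_commits_per_second(latency):.1f} commits/s")
```

With these assumed numbers, a single session drops from 1000 commits per second (asynchronous) to about 667 on a LAN and under 20 across a WAN.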
Synchronous replication can be more expensive than asynchronous replication in many ways, so you should think twice about whether this overhead is really needed and justified.
[Use synchronous replication only when it is really necessary.]
1.2.2 Single-master replication and multi-master replication
The second way to classify replication setups is to distinguish between single-master and multi-master replication.
Single-master means that writes can go to exactly one server, which distributes the data to the slaves inside the setup. Slaves may receive only reads, not writes.
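In a single-master setup, something has to send writes to the master and reads to the slaves. The sketch below shows that split with a hypothetical client-side router (`SingleMasterRouter` is not a PostgreSQL component). It classifies statements crudely by their first keyword. A real router would need more care, because a `SELECT ... FOR UPDATE` or a `SELECT` calling a function that writes must also go to the master.

```python
import itertools


class SingleMasterRouter:
    """Hypothetical router: every write goes to the one master,
    reads are spread round-robin across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        # Very rough classification by the first keyword only.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._slaves) if is_read else self.master


router = SingleMasterRouter("master", ["slave1", "slave2"])
print(router.route("INSERT INTO foo VALUES ('bar')"))  # master
print(router.route("SELECT * FROM foo"))               # slave1
print(router.route("SELECT * FROM foo"))               # slave2
```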
In contrast to single-master replication, multi-master replication allows writes to go to any server inside the cluster. At a conceptual level, each node accepts writes and passes them on to the other nodes.
Being able to write to any node inside the cluster sounds like an advantage, but it is not necessarily one, because multi-master replication adds a lot of complexity to the system. With only one master, it is totally clear which data is correct and in which direction the data flows, and conflicts during replication are rare. Multi-master replication is quite different: writes can go to many nodes at the same time, so the cluster has to be fully aware of conflicts and handle them gracefully. Using locks to solve the problem is an alternative, but this approach has its own challenges.
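The kind of conflict a multi-master cluster must detect can be sketched as follows. Assume each node records the changes it accepted since the last synchronization as a `{key: new_value}` dict; this representation and the `find_conflicts`/`merge` helpers are hypothetical. A key changed on both nodes to different values is a conflict that the cluster must resolve somehow. This toy only reports it, since choosing a winner (last writer wins, rejecting one write, and so on) is a separate policy decision. In a single-master setup, this situation cannot arise.

```python
def find_conflicts(changes_a, changes_b):
    """changes_a / changes_b: {key: new_value} written on node A and node B
    since they last synchronized. Keys changed on both sides to different
    values are conflicts."""
    return sorted(k for k in changes_a.keys() & changes_b.keys()
                  if changes_a[k] != changes_b[k])


def merge(changes_a, changes_b):
    conflicts = find_conflicts(changes_a, changes_b)
    if conflicts:
        # The cluster would have to pick a winner here; this sketch just refuses.
        raise ValueError(f"conflicting writes for keys: {conflicts}")
    return {**changes_a, **changes_b}


node_a = {"acct:1": 100, "acct:2": 50}
node_b = {"acct:1": 120, "acct:3": 70}
print(find_conflicts(node_a, node_b))  # ['acct:1']
```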
[Keep in mind that the need to resolve conflicts causes network traffic, and this can quickly turn into a scalability issue caused by latency.]
1.2.3 Logical replication and physical replication
Another way to classify replication is to differentiate between logical replication and physical replication.
The difference is subtle but highly important. Physical replication means that the system moves data as is to the remote server.
So, if something is inserted, the remote server receives the data in binary format, not as SQL.
Logical replication means that a change, equivalent to the data coming in, is replicated.
Let's look at an example to fully understand the difference:
test=# CREATE TABLE t_test (t date);
CREATE TABLE
test=# INSERT INTO t_test VALUES (now())
RETURNING *;
t
------------
2013-02-08
(1 row)
INSERT 0 1
We can see that two transactions are executed: the first one creates a table. Once this is done, the second one adds a simple date to the table and commits.
In the case of logical replication, the change is sent to some sort of queue in logical form, so the system does not send the plain SQL but something like this:
test=# INSERT INTO t_test VALUES ('2013-02-08');
INSERT 0 1
Note that the function call has been replaced with the actual value. It would be a total disaster if the slave evaluated now() again, because the date on the remote server might be a completely different one.
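The risk of re-evaluating `now()` can be shown with two clocks. The scenario is hypothetical: the master evaluates the insert just before midnight, and the slave replays it two seconds later. `now_on` is a stand-in for SQL `now()` and not a real API. Shipping the statement lets the slave compute a different date, while shipping the computed value keeps both servers identical.

```python
import datetime


def now_on(server_clock):
    # Stand-in for SQL now() cast to date: the result depends on who evaluates it.
    return server_clock.date()


master_clock = datetime.datetime(2013, 2, 8, 23, 59, 59)
slave_clock = master_clock + datetime.timedelta(seconds=2)  # replay happens a bit later

row_on_master = now_on(master_clock)

# Statement shipped as is: the slave evaluates now() again on its own clock.
row_on_slave_statement = now_on(slave_clock)

# Value shipped (logical change): the slave stores exactly what the master computed.
row_on_slave_logical = row_on_master

print(row_on_master, row_on_slave_statement, row_on_slave_logical)
# 2013-02-08 2013-02-09 2013-02-08
```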
[Some systems use statement-based replication as their core technology. MySQL, for example, uses a so-called binlog for replication, which is actually not binary but rather a form of logical replication.]
Physical replication works in a completely different way: instead of sending over some SQL (or something else) that is logically equivalent to the changes made, the system sends the binary changes made internally by PostgreSQL.
Here are some of the binary changes our two transactions might have triggered (this is by no means a complete list):
- Add an 8 KB block to pg_class and insert a new record there (to indicate that the table exists).
- Add rows to pg_attribute to store the column names.
- Perform various other changes inside these tables.
- Record the commit status, and so on.
The goal of physical replication is to create a copy of your system that is (largely) identical at the physical level. This means that the same data is in the same place inside your tables on all servers. In the case of logical replication, the content should be identical, but it makes no difference whether it is stored in the same place or not.
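A toy heap makes the distinction visible. In this sketch, a table is a Python list where the list position stands for the physical location, and a delete leaves a hole, roughly as a heap does until it is vacuumed. This is a loose analogy, not PostgreSQL's storage format, and the `insert`/`delete`/`rows` helpers are hypothetical. The physical copy reproduces the layout, holes included. The logical copy re-inserts the surviving rows, so the content matches while the positions differ.

```python
def insert(slots, row):
    slots.append(row)


def delete(slots, row):
    slots[slots.index(row)] = None  # leaves a hole in the toy heap


def rows(slots):
    """The logical content of a table, ignoring where rows are stored."""
    return sorted(r for r in slots if r is not None)


master_slots = []  # position in the list = physical location
for r in ("a", "b", "c"):
    insert(master_slots, r)
delete(master_slots, "b")

physical_replica = list(master_slots)  # copy the storage as is

logical_replica = []
for r in rows(master_slots):
    insert(logical_replica, r)  # replay the content row by row

print(physical_replica == master_slots)                     # True: same layout
print(rows(logical_replica) == rows(master_slots))          # True: same content
print(logical_replica.index("c"), master_slots.index("c"))  # 1 2: different place
```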
When to use physical replication
Physical replication is very convenient to use and especially easy to set up. It is widely used when the goal is to have identical replicas of your system (for example, to have a backup or simply to scale out).
In many setups, physical replication is the standard method, and it exposes the end user to the lowest possible complexity. It is ideal for scaling out the data.
When to use logical replication
Logical replication is usually a bit harder to set up, but it offers greater flexibility. It is also especially important when it comes to upgrading an existing database. Physical replication is totally unsuitable for version jumps, because you cannot simply rely on every version of PostgreSQL having the same on-disk layout. The storage format might change over time, so a binary copy is clearly not feasible for a jump from one version to the next.
Logical replication decouples the way data is stored from the way it is transported and replicated. Because it uses a neutral protocol that is not bound to any specific version of PostgreSQL, it is easy to jump from one version to the next.
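The decoupling idea can be sketched with two made-up storage formats standing in for an "old" and a "new" server version. Neither resembles PostgreSQL's real on-disk format, and all function names here are hypothetical. The change travels as a version-neutral JSON message that describes what changed, not how it is stored. Each side encodes the change in its own layout. Copying the old bytes directly to the new side would fail to decode, which is the physical-replication problem in miniature.

```python
import json


# Hypothetical "old version" storage format: "id|name" as bytes.
def encode_v1(row):
    return f"{row['id']}|{row['name']}".encode()


def decode_v1(raw):
    id_, name = raw.decode().split("|", 1)
    return {"id": int(id_), "name": name}


# Hypothetical "new version" storage format: a JSON array in a different order.
def encode_v2(row):
    return json.dumps([row["name"], row["id"]]).encode()


def decode_v2(raw):
    name, id_ = json.loads(raw)
    return {"id": id_, "name": name}


def to_logical_change(row):
    # Version-neutral message: describes *what* changed, not how it is stored.
    return json.dumps({"op": "INSERT", "table": "foo", "row": row})


# The old server reads a row from its own storage and emits a logical change...
stored_old = encode_v1({"id": 1, "name": "bar"})
message = to_logical_change(decode_v1(stored_old))

# ...and the new server applies it using its own storage format.
change = json.loads(message)
stored_new = encode_v2(change["row"])

print(stored_old == stored_new)                        # False: the bytes differ
print(decode_v2(stored_new) == decode_v1(stored_old))  # True: same content
```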
The first chapter of PostgreSQL Replication Understanding Replication Concepts (2)