PostgreSQL Replication series, translated from the book PostgreSQL Replication
In this chapter, you will look at different replication concepts and see which types of replication are best suited to which practical scenarios. By the end of this chapter, you will be able to judge whether a given concept is feasible in a variety of situations.
We will cover the following topics in this chapter:
• The CAP theory
• Physical limitations of replication
• Why latency matters
• Synchronous and asynchronous replication
• Sharding and replication
Before we actually work with PostgreSQL, we'll walk you through some very basic ideas and facts about replication.
1.1 The CAP theory and physical limitations
You may wonder why a theory gets such a prominent place in a book that is meant to be very practical. Well, there is a very simple reason: some commercial database vendors like to leave the impression, through good-looking marketing material, that everything is possible, that there are no serious constraints, and that everything is easy. This is not true. Every software vendor has to deal with these limitations. There is no way around the laws of nature, and no amount of marketing can overcome them.
In this section, you will be introduced to the so-called CAP theory. Understanding its basic ideas is essential if you want to avoid requirements that can never be turned into reality.
1.1.1 Understanding the CAP theory
Before we go into the details, we have to discuss what CAP actually means. CAP is an abbreviation for the following three concepts:
Consistency: This term indicates whether all the nodes in a cluster see the same data at the same time.
Availability: Is it certain that every request will receive a response? In other words, can a user consider all the nodes in a cluster to be available? Consider data or state information that is split between two machines. A request comes in; machine 1 holds part of the data and machine 2 holds the rest. If either machine goes down, not all requests can be fulfilled, because neither machine has all of the data or state information.
Partition tolerance: This term indicates whether the system keeps working if arbitrary messages are lost. A network partition event occurs when a system is no longer reachable (think of a failed network connection). Another way of looking at partition tolerance is in terms of message passing: if an individual system can no longer send messages to or receive messages from the other systems, it has effectively been cut off from the network.
Why are these three concepts relevant to ordinary users? The bad news is that a replicated (or distributed) system can provide only two of these three properties at the same time.
It is therefore theoretically impossible to offer consistency, availability, and partition tolerance all at once. As you will see later in this book, this has a significant impact on which system layouts are safe and feasible to use. There is simply no single solution that solves every replication-related problem. When you plan a large-scale system, you may have to come up with different concepts, depending on your specific requirements.
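[PostgreSQL, Oracle, DB2, and similar systems give you consistency and availability (the "C" and the "A" of CAP), while NoSQL systems such as MongoDB or Cassandra give you availability and partition tolerance (the "A" and the "P"). This is why NoSQL systems are often described as eventually consistent.]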
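To make the trade-off concrete, here is a deliberately tiny Python model of two replicas that lose their network link. It is a hypothetical illustration only: `ToyCluster`, its `mode` setting, and `write_to_a` are invented for this sketch and do not reflect how PostgreSQL or any NoSQL system is implemented. Once the partition happens, the cluster has to choose: refuse the write and stay consistent, or accept it and let the replicas disagree.

```python
from dataclasses import dataclass, field

# Hypothetical toy model: two replicas holding a single value, plus a flag
# saying whether the network link between them is currently broken.

@dataclass
class Replica:
    name: str
    value: int = 0

@dataclass
class ToyCluster:
    mode: str  # invented setting: "consistent" or "available"
    a: Replica = field(default_factory=lambda: Replica("A"))
    b: Replica = field(default_factory=lambda: Replica("B"))
    partitioned: bool = False

    def write_to_a(self, value: int) -> bool:
        """Write through replica A; return True if the write was accepted."""
        if not self.partitioned:
            # Link is up: apply the write on both replicas.
            self.a.value = value
            self.b.value = value
            return True
        if self.mode == "consistent":
            # B is unreachable, so refuse the write rather than let the
            # replicas diverge: consistency is kept, availability is lost.
            return False
        # "available": accept the write locally; B now holds stale data.
        self.a.value = value
        return True

    def replicas_agree(self) -> bool:
        return self.a.value == self.b.value

for mode in ("consistent", "available"):
    cluster = ToyCluster(mode)
    cluster.write_to_a(1)        # normal operation, both replicas get 1
    cluster.partitioned = True   # the network link fails
    accepted = cluster.write_to_a(2)
    print(mode, "accepted:", accepted, "replicas agree:", cluster.replicas_agree())
```

In "consistent" mode the second write is rejected and both replicas still hold 1; in "available" mode the write succeeds, but replica A holds 2 while replica B still holds 1. Neither mode gets all three properties once the link is gone.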
1.1.2 Why the speed of light matters
The speed of light is not just a theoretical issue; it really does have an impact on your daily life. More importantly, it has major implications when it comes to finding the right solution for your cluster.
We all know that there is a kind of cosmic speed limit: the speed of light. So why should you care? Well, let's do a simple mental experiment. Let's assume for a second that our database server runs at a clock speed of 3 GHz.
How far can light travel within a single clock cycle of your CPU? If you do the math, you will find that light travels roughly 10 cm per clock cycle (in pure vacuum). We can safely assume that an electrical signal inside a CPU is considerably slower than light in a vacuum. The core idea is this: 10 cm in one clock cycle? That is not much at all.
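The 10 cm figure follows from dividing the speed of light by the clock frequency. A quick check in Python, using the 3 GHz clock assumed in the text:

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # speed of light in a vacuum
CLOCK_HZ = 3e9                        # the 3 GHz CPU from the mental experiment

cycle_time_s = 1 / CLOCK_HZ           # duration of one clock cycle
distance_per_cycle_cm = SPEED_OF_LIGHT_M_PER_S * cycle_time_s * 100

print(f"one cycle lasts {cycle_time_s * 1e9:.3f} ns")
print(f"light travels {distance_per_cycle_cm:.1f} cm per cycle")
```

One cycle lasts about a third of a nanosecond, and light covers just under 10 cm in that time.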
For the sake of our mental experiment, let's now consider various distances:
• Distance from one end of the CPU to the other
• Distance from your server to some other server next door
• Distance from a server in Central Europe to a server in China
Considering the size of a CPU core on a die, you can assume that a signal gets from one part of the CPU to another quite quickly, even though it does not travel anywhere near the speed of light. Adding up two numbers that are already in your CPU's first-level cache certainly won't take a million clock cycles.
But what happens if you have to send a signal from one server to another server and back? You can safely assume that sending a signal from server A to server B next door takes much longer, simply because the cable is much longer: usually a lot more than 10 cm. In addition, network switches and other network components add latency of their own.
[I am talking about the length of the cable here, not about its bandwidth.]
Sending a message (or some physical object) from Europe to China naturally takes many times longer than sending some data to a server next door. Again, the important point is that the amount of data is far less relevant than the so-called latency: the time it takes a message to travel from A to B.
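Putting numbers on the three distances from the list makes the point sharper. The sketch below computes a lower bound on round-trip time from signal propagation alone and converts it into clock cycles of the 3 GHz CPU. The distances (2 cm across a die, 10 m of cable to the next server, 8,000 km from Central Europe to China) and the signal speed of about two-thirds of the speed of light, a common approximation for optical fiber, are assumptions chosen for illustration; real networks add switches, routers, and longer cable paths on top.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458
# Assumption: signals travel at roughly 2/3 of c (typical for optical fiber).
SIGNAL_SPEED_M_PER_S = SPEED_OF_LIGHT_M_PER_S * 2 / 3
CLOCK_HZ = 3e9  # the 3 GHz CPU from the mental experiment

# Illustrative distances in meters (assumptions, not measurements).
DISTANCES_M = {
    "across a CPU die": 0.02,
    "server next door": 10.0,
    "Central Europe to China": 8_000_000.0,
}

def round_trip_seconds(distance_m: float) -> float:
    """Lower bound for a there-and-back trip: pure propagation, no equipment delay."""
    return 2 * distance_m / SIGNAL_SPEED_M_PER_S

for label, distance in DISTANCES_M.items():
    rtt = round_trip_seconds(distance)
    cycles = rtt * CLOCK_HZ
    print(f"{label:>25}: {rtt:.3e} s, about {cycles:,.1f} CPU cycles")
```

Even this optimistic bound puts the server next door at a few hundred clock cycles away and China at roughly 80 ms, which is hundreds of millions of cycles.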
1.1.3 Long-distance transmission
Let me try to explain the concept of latency with a very simple example. Suppose you are in Europe and you are sending a letter to China. You will readily accept that the size of your letter is not the limiting factor here. Whether your letter is two or 20 pages long makes no difference; it takes basically the same time to reach its destination. Likewise, it makes no difference whether you send one, two, or 10 letters at the same time. Given a reasonable number of letters, the size of the aircraft that carries them to China (the bandwidth) is usually not the problem. The so-called round trip, however, may very well be. If you depend on a reply from China to continue your work, you will soon find yourself waiting for a long time.
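The letter analogy maps onto a simple cost model: one request/response exchange takes one round trip plus the time needed to push the payload through the link. The sketch below assumes, purely for illustration, a 200 ms round trip between Europe and China and a 100 Mbit/s link; `exchange_time_s` is a hypothetical helper, not part of any library.

```python
def exchange_time_s(size_bytes: int, rtt_s: float, bandwidth_bytes_per_s: float) -> float:
    """One request/response exchange: a full round trip plus transmission time."""
    return rtt_s + size_bytes / bandwidth_bytes_per_s

RTT_EUROPE_CHINA_S = 0.2   # assumed round-trip time
BANDWIDTH = 100e6 / 8      # assumed 100 Mbit/s link, in bytes per second

# Size matters little: a tenfold larger message barely changes the total.
small = exchange_time_s(8_000, RTT_EUROPE_CHINA_S, BANDWIDTH)
large = exchange_time_s(80_000, RTT_EUROPE_CHINA_S, BANDWIDTH)
# More bandwidth helps even less: the round trip is still there.
faster_link = exchange_time_s(8_000, RTT_EUROPE_CHINA_S, BANDWIDTH * 10)
print(f"8 KB: {small:.4f} s, 80 KB: {large:.4f} s, 8 KB on a 10x link: {faster_link:.4f} s")

# Waiting for each reply before sending the next message multiplies the round trip,
# while sending all 10 "letters" at once pays for it only once.
sequential = 10 * exchange_time_s(8_000, RTT_EUROPE_CHINA_S, BANDWIDTH)
batched = exchange_time_s(10 * 8_000, RTT_EUROPE_CHINA_S, BANDWIDTH)
print(f"10 messages one by one: {sequential:.2f} s, all at once: {batched:.2f} s")
```

Under these assumptions every variant of a single exchange costs about 0.2 s, whereas waiting for 10 replies in a row costs about 2 s: the round trip, not the payload, dominates.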
1.1.4 Why latency matters
The same concept applies to replication: if you are sending chunks of data from Europe to China, you should avoid waiting for a response. If you are sending chunks of data between two servers in the same rack, you may well be able to wait for a response, because the electrical signal is simply fast enough to make it back in time.
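A back-of-the-envelope model shows why. Assume, hypothetically, that a local commit costs 1 ms, that a synchronous setup additionally waits one full round trip for the replica to acknowledge each commit, and that an asynchronous setup does not wait at all. The round-trip times below (0.2 ms within a rack, 200 ms from Europe to China) are illustrative assumptions, and the model ignores concurrency: it describes a single client committing one transaction after another.

```python
LOCAL_COMMIT_S = 0.001  # assumed: 1 ms to make a commit durable locally

# Assumed round-trip times to the replica.
RTT_S = {"same rack": 0.0002, "Europe to China": 0.2}

def commits_per_second(rtt_s: float, synchronous: bool) -> float:
    """Throughput of one client that commits transactions strictly one after another."""
    per_commit = LOCAL_COMMIT_S + (rtt_s if synchronous else 0.0)
    return 1 / per_commit

for where, rtt in RTT_S.items():
    sync = commits_per_second(rtt, synchronous=True)
    async_ = commits_per_second(rtt, synchronous=False)  # "async" is a Python keyword
    print(f"{where}: synchronous {sync:,.0f}/s, asynchronous {async_:,.0f}/s")
```

Within a rack, waiting for the replica costs only a modest fraction of throughput in this model; across continents, the same single client drops from about 1,000 to about 5 commits per second.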
[The basic latency problems described in this section are not specific to PostgreSQL. The same concepts and physical limitations apply to all types of databases and systems. As mentioned before, this fact is sometimes quietly hidden and ignored in shiny commercial marketing material. Nevertheless, the laws of physics do not change, and they apply to commercial and open source software alike.]
Most importantly, remember that in a replicated environment, more bandwidth is not always the answer to performance problems. In many setups, latency is at least as important as bandwidth.
PostgreSQL Replication, Chapter 1: Understanding Replication Concepts (Part 1)