Working with Walbouncer
In the final chapter of this book, you will be guided through a tool that was released in 2014, called walbouncer. Most of the techniques in this book show how to replicate an entire database instance, how to shard, and so on. This last chapter is about walbouncer, which filters the transaction log stream to selectively replicate database objects from one server to a set of (not necessarily identical) slaves.
This chapter covers the following topics:
• The basic concept of walbouncer
• Installing walbouncer
• Selectively replicating databases, tables, and tablespaces
The Walbouncer tool is available for PostgreSQL 9.4 or later versions.
The concept of walbouncer
The purpose of the PostgreSQL transaction log is to help a failed database instance recover after a crash. It can also be used to replicate an entire database instance, as discussed in the chapters on synchronous and asynchronous replication in this book.
The trouble is that the entire database instance has to be replicated. In many real-world scenarios, this is a problem. Let's assume that there is a central server containing information about students studying at many universities. Each university should have a copy of its own data. As of PostgreSQL 9.4, this is not possible with a single database instance because streaming replication can only replicate an instance completely. Running many instances is clearly a lot more work and perhaps not the desired approach.
The idea behind walbouncer is to connect to the PostgreSQL transaction log and filter it. In this scenario, a slave receives only a subset of the data, and everything that must not be visible on it, for policy or security reasons, is filtered out. In the university example, each university would have a copy of only its own database, so there is no way to see data from other organizations. Hiding data is a huge step forward when it comes to securing systems. There might also be a use case for walbouncer for the purpose of sharding.
The following diagram shows how this works:
[Figure: walbouncer placed between the master and two slaves]
The walbouncer tool is a process that sits between the master and the slaves. It connects to the master, fetches the transaction log, and filters it before it is transmitted to the slaves. In this example, two slaves are available to consume the transaction log, just like regular slaves.
The walbouncer tool is ideal for geographically distributed database systems because it allows us to easily decide which data has to go where and which database is needed in which location. A basic block diagram is shown here:
[Figure: basic block diagram of a geographically distributed setup]
Filtering XLOG
Now the core question is: how does walbouncer filter the transaction log? Remember that positions in the transaction log are critical; tampering with those positions is dangerous and, in many cases, not feasible at all.
The solution to this problem lies deep in the core of PostgreSQL. The core knows how to handle dummy transaction log entries. To cut out all the data that is not destined to reach a particular server, dummy XLOG is injected to replace the original records. The slave can then safely consume the XLOG and ignore those dummy records. The beauty of this technique is that the master and the slave can stay unchanged; no patching is needed. All the work is done by walbouncer alone, which simply acts as a kind of XLOG proxy.
The technique used has the following consequences:
• Each slave will receive the same number of XLOG records, regardless of the amount of data that has actually changed in the target system
• Metadata (system tables) has to be replicated completely and must never be left out
• The target system will still see that a certain database is supposed to exist, but using it will simply fail
The last item deserves some attention. Remember that the system cannot analyze the semantics of a particular XLOG record; all it does is check whether the record is needed or not. Therefore, the metadata has to be copied as it is. When a slave system tries to read filtered data, it receives a nasty error indicating that data files are missing. If a query cannot read a file from disk, it errors out and rolls back. This behavior might confuse some people, but it is the only possible way to solve the underlying technical problem.
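The effect of injecting dummy records can be illustrated with a toy model in Python. This is only a sketch under stated assumptions: the `Record` layout, the `filter_stream` helper, and the convention that `db_oid == 0` marks shared metadata are all hypothetical and reflect neither the real XLOG format nor walbouncer's source code.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Record:
    """Hypothetical, heavily simplified stand-in for one XLOG record."""
    db_oid: int    # owning database (0 = shared metadata in this toy model)
    length: int    # size of the record in bytes
    payload: str   # "DUMMY" marks a record whose content was blanked out


def filter_stream(records: List[Record], wanted: Set[int]) -> List[Record]:
    """Replace records of unwanted databases with dummies of the same size."""
    out = []
    for rec in records:
        if rec.db_oid == 0 or rec.db_oid in wanted:
            out.append(rec)  # metadata and wanted data pass through unchanged
        else:
            # Keep the length so that every later record keeps its position.
            out.append(Record(rec.db_oid, rec.length, "DUMMY"))
    return out


stream = [Record(0, 32, "catalog"), Record(24576, 64, "a"), Record(24577, 48, "b")]
filtered = filter_stream(stream, wanted={24576})
```

The filtered stream has the same number of records and the same total length as the original one, which is why the positions seen by the slave stay valid.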
Installing walbouncer
The walbouncer tool can be downloaded free of charge from the Cybertec website (http://www.cybertec.at/postgresql_produkte/walbouncer/) and installed using the following steps:
1. For the purpose of this book, the following file has been used:
wget http://cybertec.at/download/walbouncer-0.9.0.tar.bz2
2. The first thing to do is to extract the tarball, as follows:
tar xvfj walbouncer-0.9.0.tar.bz2
3. Once the package has been extracted, you can enter the directory. Before calling make, it is important to check for missing libraries. Make sure that support for YAML is available. On my CentOS test system, the following command does the job:
# yum install libyaml-devel
4. The library will be installed by these lines:
---> Package libyaml-devel.x86_64 0:0.1.4-11.el7_0 will be installed
--> Processing Dependency: libyaml = 0.1.4-11.el7_0 for package: libyaml-devel-0.1.4-11.el7_0.x86_64
5. In the next step, just call make. The code compiles cleanly. Finally, only make install is left:
# make install
cp walbouncer /usr/local/pgsql/bin/walbouncer
The binary needed to run walbouncer will be copied to your PostgreSQL binary directory (in my case, under /usr/local/pgsql/).
As you can see, deploying walbouncer is easy; all it takes is a few commands.
Configuring walbouncer
Once the code has been deployed successfully, it is necessary to come up with a simple configuration that tells walbouncer what to do. To demonstrate how walbouncer works, a simple setup has been created here. In this example, two databases exist on the master. Only one of them should end up on the slave:
$ createdb a
$ createdb b
The goal is to replicate a to the slave and skip b.
It makes sense to write the walbouncer configuration before starting the base backup. A basic configuration is very simple and easy to write:
listen_port: 5433
master:
    host: localhost
    port: 5432
configurations:
    - slave1:
        filter:
            include_databases: [a]
The configuration consists of the following components:
• listen_port: This is mandatory. It defines the port walbouncer listens on. Slaves connect to this port and stream the transaction log directly from the bouncer.
• master: This section tells walbouncer where to find its master. In our example, the master is on the same host and listens on port 5432. Note that no database is listed. The system connects to the XLOG stream, so no database information is required.
• configurations: This covers the slave configuration. It is possible to list more than one slave. For each slave, a couple of filters can be applied. In this example, only the a database is included and all other databases are filtered out.
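If configuration files have to be produced for several setups, generating them from a small script can avoid typing mistakes. The following Python sketch renders the three components described above; the `render_config` helper and the exact indentation are assumptions for illustration, since the original listing does not show which layout walbouncer strictly requires.

```python
from typing import Dict, List


def render_config(listen_port: int, master_host: str, master_port: int,
                  slaves: Dict[str, List[str]]) -> str:
    """Render a walbouncer-style config; `slaves` maps a block name to the
    databases that should be included for that slave."""
    lines = [
        f"listen_port: {listen_port}",
        "master:",
        f"    host: {master_host}",
        f"    port: {master_port}",
        "configurations:",
    ]
    for name, databases in slaves.items():
        lines.append(f"    - {name}:")
        lines.append("        filter:")
        lines.append(f"            include_databases: [{', '.join(databases)}]")
    return "\n".join(lines) + "\n"


# Reproduces the example of this chapter: one slave that only receives "a".
config_text = render_config(5433, "localhost", 5432, {"slave1": ["a"]})
print(config_text)
```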
Creating a base backup
Once the configuration has been written, it is time to clone an initial database instance. The tricky thing here is that there is no ready-made tool such as pg_basebackup that does most of the work for you. The reason is that pg_basebackup has been designed to copy an entire database instance. In the case of walbouncer, the idea is to have only part of the data on the target system. Therefore, users have to fall back on more basic means of creating backups. The method of choice is the traditional way of performing base backups.
However, before getting started, it is important to prepare the master for standard streaming replication. This includes:
• Adjusting postgresql.conf (wal_level, wal_keep_segments, max_wal_senders, and so on)
• Adjusting pg_hba.conf on the master
• Setting the data directory on the slave to chmod 700
All of these steps have already been described in Chapter 4, Setting Up Asynchronous Replication. As already mentioned, the tricky part is the initial base backup. Assuming that the a database has to be replicated, it is necessary to find its object ID:
test=# SELECT oid, datname FROM pg_database WHERE datname = 'a';
  oid  | datname
-------+---------
 24576 | a
(1 row)
In this example, the object ID is 24576. The general rule is as follows: all databases with OIDs greater than 16383 have been created by users. Only these databases can be filtered in a useful way.
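The OID rule is easy to express in code. The following Python sketch separates system databases from user-created ones; the helper names are hypothetical, and the sample OIDs are simply the ones used in this chapter's example.

```python
FIRST_USER_OID = 16384  # OIDs greater than 16383 belong to user-created databases


def is_user_database(oid: int) -> bool:
    """Return True if the OID follows the rule for user-created databases."""
    return oid >= FIRST_USER_OID


def split_databases(oids):
    """Split a list of database OIDs into (system, user) lists."""
    system = sorted(o for o in oids if not is_user_database(o))
    user = sorted(o for o in oids if is_user_database(o))
    return system, user


system_oids, user_oids = split_databases([1, 13051, 13056, 16384, 24576, 24577])
```

System databases always have to be copied to the slave, while the user-created ones are the candidates for filtering.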
Now go to the data directory of the master and copy everything except the base directory to the directory where the slave resides. In this example, a little trick is applied: there are no filenames starting with a, so it is possible to safely copy everything starting with c and later letters and add the backup label to the copy process. Note that backup_label only exists while a base backup is in progress, that is, after pg_start_backup has been called on the master. The system will now copy everything except the base directory:
cp -Rv [c-z]* backup_label ../slave/
Once everything has been copied to the slave, which happens to be on the same server in this case, the missing base directory can be created in the slave directory:
$ mkdir ../slave/base
In the next step, all the databases that are needed can be copied over. At this point, the base directory of the sample master looks as follows:
$ ls -l
total 72
drwx------ 2 hs hs 8192 Feb 24 11:53 1
drwx------ 2 hs hs 8192 Feb 24 11:52 13051
drwx------ 2 hs hs 8192 Feb 25 11:49 13056
drwx------ 2 hs hs 8192 Feb 25 11:48 16384
drwx------ 2 hs hs 8192 Feb 25 10:32 24576
drwx------ 2 hs hs 8192 Feb 25 10:32 24577
All OIDs greater than 16383 have been created by end users. In this case, there are three such databases.
Therefore, all system databases (template0, template1, and postgres) and the database required on the slave can be copied to the base directory:
cp -Rv 24576 1 13051 13056 ../../slave/base/
[Note that the slave is usually on a remote system, so rsync or a similar tool should be used. In this example, everything is on the same node to make life easier.]
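The two copy steps (everything except base, then only selected subdirectories of base) can also be scripted. The following Python sketch works on a throwaway directory tree that mimics the layout of this example; `clone_filtered` is a hypothetical helper, it only handles a local copy, and on a real system it would have to run between pg_start_backup and pg_stop_backup.

```python
import shutil
import tempfile
from pathlib import Path


def clone_filtered(master: Path, slave: Path, wanted_oids) -> None:
    """Copy everything except base/, then only the selected base/<oid> dirs."""
    slave.mkdir(parents=True, exist_ok=True)
    for entry in master.iterdir():
        if entry.name == "base":
            continue  # handled separately below
        if entry.is_dir():
            shutil.copytree(entry, slave / entry.name)
        else:
            shutil.copy2(entry, slave / entry.name)
    (slave / "base").mkdir()
    for oid in wanted_oids:
        shutil.copytree(master / "base" / str(oid), slave / "base" / str(oid))


# Build a fake data directory (names only, no real PostgreSQL files).
root = Path(tempfile.mkdtemp())
master, slave = root / "master", root / "slave"
for oid in (1, 13051, 13056, 16384, 24576, 24577):
    (master / "base" / str(oid)).mkdir(parents=True)
(master / "global").mkdir()
(master / "backup_label").write_text("demo\n")

# System databases plus database a (OID 24576), as in the cp command above.
clone_filtered(master, slave, wanted_oids=[1, 13051, 13056, 24576])
copied = sorted(p.name for p in (slave / "base").iterdir())
```

Unlike the shell glob, this version does not depend on the first letter of the filenames, so it keeps working if the data directory contains other entries starting with a or b.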
The important thing is that in most setups, the base directory is by far the largest directory. Once the data has been secured, the backup can be stopped, as follows:
test=# SELECT pg_stop_backup();
NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup
 pg_stop_backup
----------------
 0/2000238
(1 row)
So far, everything is the same as with regular streaming replication; the only difference is that not all directories inside the base directory are actually synchronized to the slave.
In the next step, a simple recovery.conf file can be created:
slave]$ cat recovery.conf
primary_conninfo = 'host=localhost port=5433'
standby_mode = on
The most important thing here is that the port of walbouncer (not the port of the master) must be written to the configuration file. The slave no longer sees its master directly; instead, it consumes all of its XLOG through walbouncer.
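When slaves are set up repeatedly, the file can be generated rather than typed. The following Python sketch builds the same two settings; `recovery_conf` is a hypothetical helper, and the values are the ones used in this chapter.

```python
def recovery_conf(bouncer_host: str, bouncer_port: int) -> str:
    """Build recovery.conf contents pointing the slave at walbouncer,
    not at the master."""
    return (
        f"primary_conninfo = 'host={bouncer_host} port={bouncer_port}'\n"
        "standby_mode = on\n"
    )


# 5433 is walbouncer's listen_port in this chapter; the master uses 5432.
conf = recovery_conf("localhost", 5433)
print(conf)
```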
Starting walbouncer
Once the configuration has been completed, it is possible to fire up walbouncer. The syntax of walbouncer is simple:
$ walbouncer --help
walbouncer proxies PostgreSQL streaming replication connections and optionally does filtering
Options:
  -?, --help                Print this message
  -c, --config=FILE         Read configuration from this file.
  -h, --host=HOST           Connect to master on this host.
                            Default localhost
  -P, --masterport=PORT     Connect to master on this port.
                            Default 5432
  -p, --port=PORT           Run proxy on this port. Default 5433
  -v, --verbose             Output additional debugging information
All the relevant information is in the config file, so starting walbouncer is simple:
$ walbouncer -v -c config.ini
[2015-02-25 11:56:57] wbsocket.c INFO: Starting socket on port 5433
The -v option is not mandatory. All it does is provide a bit more information about what is going on. Once the starting socket message shows up, it means that everything is working properly.
Finally, it is time to start the slave. Go to the slave's data directory and start it, as follows:
slave]$ pg_ctl -D . -o "--port=5444" start
server starting
LOG:  database system was shut down in recovery at 2015-02-25 11:59:12 CET
LOG:  entering standby mode
LOG:  redo starts at 0/2000060
LOG:  record with zero length at 0/2000138
INFO:  WAL stream is being filtered
DETAIL:  Databases included: a
LOG:  started streaming WAL from primary at 0/2000000 on timeline 1
LOG:  consistent recovery state reached at 0/2000238
LOG:  database system is ready to accept read only connections
The dot here represents the local directory (of course, putting the full path there is also a good idea). Here is one more trick: when a slave is synchronized over and over again on the same server for testing purposes, changing the port every time is annoying. The -o option overrides the settings in postgresql.conf so that the system can be started directly on a different port.
[If PostgreSQL is started on a separate server, it is of course useful to start the server through the normal init procedures.]
As soon as the slave comes to life, walbouncer will start issuing more log information, telling us more about the state of the stream:
[2015-02-25 11:58:37] wbclientconn.c INFO: Received conn from 0100007f:54729
[2015-02-25 11:58:37] wbclientconn.c DEBUG1: Sending authentication packet
[2015-02-25 11:58:37] wbsocket.c DEBUG1: Conn: Sending to client 9 bytes of data
[2015-02-25 11:58:37] wbclientconn.c INFO: Start connecting to host=localhost port=5432 user=hs dbname=replication replication=true application_name=walbouncer
FATAL:  no pg_hba.conf entry for replication connection from host "::1", user "hs"
Once these messages are displayed, the system is up and running, and the transaction log flows from the master to the slave. If a FATAL line such as the last one in the listing shows up, the pg_hba.conf file on the master still lacks a replication entry for the connecting host and user and has to be adjusted first.
Now it is time to test the setup:
$ psql -h localhost -p 5444 b
FATAL:  database "b" does not exist
DETAIL:  The database subdirectory "base/24577" is missing.
When a connection to a filtered database is established, PostgreSQL errors out and tells the user that the files needed to serve the request are not there. This is exactly the expected behavior: the request should be rejected because the data is missing.
When connecting to the database that is included, everything works like a charm:
$ psql -h localhost -p 5444 a
psql (9.4.1)
Type "help" for help.
a=#
The next test checks whether data is replicated properly from the master to the slave. To perform the check, a table can be created on the master:
a=# CREATE TABLE a (aid int);
CREATE TABLE
As expected, the table ends up on the slave:
a=# \d
        List of relations
 Schema | Name | Type  | Owner
--------+------+-------+-------
 public | a    | table | hs
(1 row)
The system is now ready for action and can be used safely.
Using additional configuration options
The walbouncer tool can do a lot more for users than has been outlined so far. A couple of additional configuration parameters are available.
The first example shows what can be done if there is more than one slave:
listen_port: 5433
master:
    host: localhost
    port: 5432
configurations:
    - slave1:
        match:
            application_name: slave1
        filter:
            include_tablespaces: [spc_slave1]
            exclude_databases: [test]
Inside the configurations block, there is a slave1 section. The slave1 configuration is chosen if a slave connects using slave1 as its application_name (as listed in the match clause). If this configuration is chosen by the server (there can be many such slave sections), the filters listed in the next block are applied.
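The selection step can be sketched in Python as follows. This is a toy model and not walbouncer's implementation: the data structure, the `select_block` helper, the first-match-wins rule, the `None` result for an unknown name, and the second block named slave2 are all assumptions made for illustration.

```python
from typing import Optional

CONFIGURATIONS = [
    {"name": "slave1",
     "match": {"application_name": "slave1"},
     "filter": {"include_tablespaces": ["spc_slave1"],
                "exclude_databases": ["test"]}},
    {"name": "slave2",  # hypothetical second block, not from the chapter
     "match": {"application_name": "slave2"},
     "filter": {"include_databases": ["a"]}},
]


def select_block(application_name: str) -> Optional[dict]:
    """Return the first block whose match clause fits the connecting slave."""
    for block in CONFIGURATIONS:
        if block["match"].get("application_name") == application_name:
            return block
    return None  # what walbouncer does in this case is not covered here


chosen = select_block("slave1")
```

On the slave side, the name is passed as part of the connection string, for example by adding application_name=slave1 to primary_conninfo.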
Basically, each type of filter comes in two flavors: include_ and exclude_. In this example, only the spc_slave1 tablespace is included. The final setting says that only test is excluded (all other databases are included if the tablespace filter matches them).
Of course, it is also possible to state this the other way around:
exclude_tablespaces: [spc_slave1]
include_databases: [test]
In this case, all tablespaces except spc_slave1 are included. Only the system databases and the test database are replicated. Given these include_ and exclude_ settings, you have the flexibility to configure what is replicated to which slave.
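The interplay of the include_ and exclude_ settings can be modeled in a few lines of Python. This is a simplified reading of the rules as described in this chapter, not walbouncer's actual logic: the `passes` helper is hypothetical, system databases are treated as always passing the database filter, and the handling of PostgreSQL's default tablespaces is not modeled at all.

```python
def passes(filter_cfg: dict, tablespace: str, database: str,
           system_databases=("template0", "template1", "postgres")) -> bool:
    """Toy model: does data in (tablespace, database) reach the slave?"""
    inc_ts = filter_cfg.get("include_tablespaces")
    exc_ts = filter_cfg.get("exclude_tablespaces", [])
    inc_db = filter_cfg.get("include_databases")
    exc_db = filter_cfg.get("exclude_databases", [])

    # No include_ list means "everything"; exclude_ always removes entries.
    ts_ok = (inc_ts is None or tablespace in inc_ts) and tablespace not in exc_ts
    if database in system_databases:
        db_ok = True  # assumption: system databases are never filtered out
    else:
        db_ok = (inc_db is None or database in inc_db) and database not in exc_db
    return ts_ok and db_ok


first = {"include_tablespaces": ["spc_slave1"], "exclude_databases": ["test"]}
second = {"exclude_tablespaces": ["spc_slave1"], "include_databases": ["test"]}

print(passes(first, "spc_slave1", "a"))      # data in spc_slave1, database a
print(passes(second, "spc_other", "test"))   # spc_other is a made-up tablespace
```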
Keep in mind that application_name is also needed for synchronous replication. If the application_name parameter passed through by walbouncer is the same as the one listed in synchronous_standby_names, synchronous replication is possible.
As you can see, application_name serves two purposes here: it decides which config block to use and tells the master which level of replication is required.
Adjusting filtering rules
One question that is often asked is whether filtering rules can be adjusted later. Objects might be added, or objects might be removed. This is a very common scenario.
Changing the configuration of a walbouncer setup is not as simple as it might seem. The core problems are synchronizing the XLOG and making sure that all the dependent objects are in place. Let's go through the core challenges one by one.
Removing and filtering objects
Basically, removing objects is fairly simple. The first thing to do is to shut down the slave as well as walbouncer. Once this has been done, the objects that are no longer needed can be physically deleted from the filesystem of the slave. The crucial part here is finding those objects. Again, digging into the system tables is the method of choice. The core system tables or views involved are as follows:
• pg_class: This table contains a list of objects (tables, indexes, and so on). It is important to fetch the identifier of each object from this table.
• pg_namespace: This is used to fetch information about schemas.
• pg_inherits: Information about inheritance is stored in this table.
There is no general guideline on how to find all of these objects because things depend heavily on the type of filtering applied.
The simplest way to come up with the proper SQL queries to find the objects (files) that have to be deleted is to use the -E option together with psql. It displays the SQL code behind the backslash commands. The SQL code run by the frontend can come in handy. Here is some sample output:
test=# \q
$ psql test -E
psql (9.4.1)
Type "help" for help.
test=# \d
********* QUERY **********
SELECT n.nspname as "Schema",
  c.relname as "Name",
  CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END as "Type",
  pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r','v','m','S','f','')
      AND n.nspname <> 'pg_catalog'
      AND n.nspname <> 'information_schema'
      AND n.nspname !~ '^pg_toast'
  AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;
**************************
         List of relations
 Schema | Name               | Type  | Owner
--------+--------------------+-------+-------
...
Once the files have been deleted from the base directory, walbouncer and the slave instance can be restarted. Keep in mind that walbouncer is a tool that makes regular streaming replication more powerful. Therefore, the slave is still read-only, and it is not possible to use commands such as DELETE and DROP. You really have to delete the files from disk.
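To get from a catalog entry to the files on disk, it helps to know that a relation in the default tablespace is stored under base/<database OID>/<relfilenode>, where relfilenode is a column of pg_class. Large relations are split into numbered segments, and additional forks carry suffixes such as _fsm and _vm. The following Python sketch lists the files of one relation; `relation_files` is a hypothetical helper, the file names in the demo are made up, and relations in other tablespaces are not covered.

```python
import re
import tempfile
from pathlib import Path


def relation_files(data_dir: Path, db_oid: int, relfilenode: int):
    """List the on-disk files of one relation in the default tablespace."""
    # Main fork, extra segments (.1, .2, ...) and the _fsm/_vm/_init forks.
    pattern = re.compile(rf"^{relfilenode}(_(fsm|vm|init))?(\.\d+)?$")
    db_dir = data_dir / "base" / str(db_oid)
    return sorted(p.name for p in db_dir.iterdir() if pattern.match(p.name))


# Demo with made-up file names in a temporary directory.
demo = Path(tempfile.mkdtemp())
db_dir = demo / "base" / "24576"
db_dir.mkdir(parents=True)
for name in ("16390", "16390.1", "16390_fsm", "16390_vm", "163901", "16391"):
    (db_dir / name).touch()

found = relation_files(demo, 24576, 16390)
print(found)
```

The pattern is anchored on purpose so that a relation such as 163901 is not mistaken for a segment of 16390. On a live master, the pg_relation_filepath() function returns the path of a relation directly and is a good way to cross-check the result before anything is deleted.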
Adding objects to slaves
Adding objects is by far the most complicated task. Therefore, it is highly recommended to use a safer and simpler approach to the problem. The safest and most reliable approach is to completely resynchronize any instance that needs new objects.
Simply use the mechanism described earlier in this chapter to avoid all pitfalls.
Summary
In this chapter, the workings of walbouncer, a tool for filtering transaction logs, were discussed. In addition to the installation process, the configuration options and a basic setup were outlined.
You learned how to build geographically distributed setups.
PostgreSQL Replication, Chapter 15: Working with walbouncer