3.3 Do the basic backup
In the previous section, you have seen that enabling archiving requires only a few lines of command and provides great flexibility. In this section, we will see how to create a so-called base backup, which can be used later with Xlog. A basic backup is a copy of the original data.
[Keep in mind that Xlog itself is of no value.] It is only useful when combined with the initial backup. ]
In PostgreSQL, there are two main choices to create an initial basic backup:
• Use Pg_basebackup
• Traditional Copy/rsync-based approach
The following two sections describe in detail how to create a basic backup:
Using Pg_basebackup
The first and most common creation of an existing server backup is to run a command called Pg_basebackup, which is introduced in PostgreSQL 9.1.0. Basically, Pg_basebackup can get a basic backup of a database directly from a database connection. However, when executed on slave, Pg_basebackup will connect to a database server of your choice and copy all data files from the data directory to your machine. There is no need to login to the server, it needs only one line of code and run it; Pg_basebackup will do all the rest for you.
In this example, we will assume that we are going to make a basic backup of a host called postgresql-support.de. You must perform the following steps:
• Modify pg_hba.conf to allow replication
• Signal Master to consider changes in pg_hba.conf
• Call Pg_basebackup
Modify Pg_hba.conf
In order to allow a remote server to log on to a PostgreSQL server and allow the XLOG to be streamed, you must explicitly allow replication.
In PostgreSQL, there is a file that becomes pg_hba.conf, which tells the server which server is allowed to use which type of credentials to connect. The entire range of IP addresses can be allowed or simply discarded via pg_hba.conf.
To enable replication, we must add a row for the range of IP addresses we want to allow.
The following list contains an example of a valid configuration:
# TYPE DATABASE USER ADDRESS METHOD
Host replication All 192.168.0.34/32 MD5
In this example, we allow replication from the 192.168.0.34 connection. The range of IP addresses is determined using 32-bit (this represents our example only). We decided to use MD5 as the authentication method. This means that the pg_basebackup must provide a password to the server. If you use trust as a certification method in a low-security environment, it may also be a choice.
[What happens if you really have a database called replication in your system?] Basically, setting up a database for replication only configures your streaming behavior, and if you want to introduce a rule for a database that processes a database named replication, you must reference the database name as follows: "Replication". However, we strongly recommend that you do not do this in a way that avoids confusion. ]
Send a signal to the master server
Once the pg_hba.conf has been changed, we can tell PostgreSQL to reload the configuration. It is not necessary to completely restart the database. We have three options to reload pg_hba.conf:
• By running SQL command: SELECT pg_reload_conf ();
• By sending a signal to master: Kill–hup 4711 (4711 is the process ID of master)
• By calling Pg_ctl:pg_ctl–d $PGDATA Reload ($PGDATA is the home directory of your DB instance)
Once we tell the server to accept stream connections as a data source, we can continue to run pg_basebackup.
pg_basebackup-Basic Functions
For PostgreSQL, Pg_basebackup is a very easy-to-use command-line tool. It must be called by the target system and provide you with a ready-to-use base backup that is ready to consume the transaction log for instant recovery.
The syntax for Pg_basebackup is as follows:
imac:dbhs$ Pg_basebackup--help
Pg_basebackup takes a base backup of a running PostgreSQL server.
Usage:
Pg_basebackup [OPTION] ...
Options controlling the output:
-D,--pgdata=directory receive base backup into
Directory
-F,--format=p|t output format (plain (default),
Tar
-X,--xlog include required WAL files in
Backup (Fetch mode)
-X,--xlog-method=fetch|stream
Include Required WAL files with
Specified method
-Z,--gzip compress tar output
-Z,--compress=0-9 compress tar output with given
Compression level
General Options:
-C,--checkpoint=fast|spread
Set fast or spread checkpointing
-L,--label=label set backup label
-P,--progress show progress information
-V,--verbose output verbose messages
-V,--version output version information, then exit
-?,--Help show this help and then exit
Connection options:
-H,--host=hostname database server host or
Socket Directory
-p,--port=port database server port number
-S,--status-interval=interval
Time between status packets sent to server (in seconds)
-U,--username=name connect as specified database
User
-W,--no-password never prompt for password
-W,--password force password prompt (should
happen automatically)
A basic pg_basebackup is as follows:
imac:dbhs$ pg_basebackup-d/target_directory \
-H sample.postgresql-support.de
In this example, we will get the base backup from Sample.postgresql-support.de and put it in our local directory/target_directory. It requires this line to replicate the entire DB instance to the target system. When we create a base backup is, as shown in this section, Pg_basebackup will connect to the server and wait for a checkpoint to occur before the actual process starts. This is necessary because the replay process will start at that point in the Xlog. The problem is that it takes a while for a checkpoint to occur, and Pg_basebackup does not force a checkpoint directly on the source server to ensure that normal operations are undisturbed.
[If you do not want to wait for a checkpoint, consider using-checkpoint=fast. It forces a checkpoint to be executed immediately, and Pg_basebackup begins copying immediately. ]
By default, a basic base backup will be created. It will include all files that can be found in the source server directory. If the underlying backup is to be stored on tape, we recommend using –format=t. He will automatically create a tar archive (perhaps on tape). If you want to transfer data to tape, you can easily omit intermediate steps. When using tar, it is beneficial to use it with-gzip to reduce the size of the underlying backup on disk.
[There is also a way to do the basic backup is to look at the progress bar, but we do not recommend using this option (--progress) because it requires pg_basebackup to first determine the size of the source instance, which can be expensive. ]
pg_basebackup-Self-sufficiency backup
In general, a basic backup without xlog is useless. This is because the basic backup is taken from the fully functioning master. Although the underlying backup was taken, the files stored in the DB instance may have been severely modified by the gene. The purpose of Xlog is to solve the potential problems of those dependent data files. But if we are going to create a base backup, it can have no (explicitly archived) XLOG? In this case, we can use the-xlog-method=stream option. If this option is selected, Pg_basebackup will not just copy the data, it will also stream the Xlog created during the base backup we created for our destination server. This will provide us with enough xlog to allow us to start the basic backup using that way directly. It is self-sufficient and does not require additional xlog files. This is not for immediate recovery, but it can be useful in case of trouble. Having a basic backup that can be started right away is usually a good thing, and its cost is pretty low.
[Note that--xlog-method=stream will require a database connection to the source server, not just one. You must keep this in mind when tuning max_wal_senders on the source server. ]
If you plan to use instant recovery and do not need to start a backup, you can safely skip Xlog and save some space (default mode).
Use traditional methods to create a basic backup
Now Pg_basebackup is the most common way to get an initial copy of a database server. This is not always the case. Traditionally, a different approach has been used to work as follows:
• Call SELECT Pg_start_backup (' some label ');
• Copy all data files to a remote server via rsync or any other method
• Run SELECT pg_stop_backup ();
The main advantage of this old approach is that you do not need to open a database connection and you do not need to configure the Xlog streaming infrastructure on the source server.
Another major advantage of you can use some such as zfs-snapshots or similar methods, which can reduce the number of I/OS created for the initial backup.
[Once you've started the Pg_start_backup, don't worry.] This is not necessary, especially if you do not want to leave the backup mode. If you are in Backup mode, nothing will happen. PostgreSQL will archive the transaction log as usual and the user will not have any disadvantage. Of course, there is no change in how the underlying backup works internally when it is running. There is nothing to fill, no disk I/O latency, or any such thing. ]
Table Space Issues
If you happen to use more than one tablespace, Pg_basebackup will handle the problem well if the file system layout on the target server is the same as the file system layout on master. However, if your target system does not use the same file system layout, there is more to do. Using traditional methods to do base backups can be beneficial in this case.
If you are using –format=t (for tar), you will be given a tar file for each tablespace.
Pay close attention to network bandwidth
Let's assume a scenario that contains two servers. Each server may have only one disk (no SSD). Two of our servers may be interconnected via a gigabit link. What happens to your app if the second server starts running Pg_basebackup? The second server will connect to start streaming data at full speed and damage your hard drive by using the full bandwidth of your network. Applications running on master may immediately face disk waits to provide a higher response time. Therefore, it is highly recommended to control bandwidth usage via rsync to ensure that your business applications have sufficient backup capabilities (usually disk, CPU is not a problem).
[If you want to restrict rsync to 20mb/sec, you can simply use rsync–bwlimit=20000. This will take a longer time to create the underlying backup, but it will ensure that your client app does not face a problem. In general, we recommend a private network interconnection between master and slave to ensure that the underlying backup does not affect normal operations. ]
Limiting bandwidth cannot use Pg_basebackup's onboard functionality. Of course, you can use any other tool to copy data and achieve a similar effect.
[If you are using-gzip using gzip compression, it can be used as a deceleration tool to work. However, this is primarily a possible workaround. ]
The third chapter of PostgreSQL Replication Understanding Instant Recovery (3)