Web site mirroring and backup with Rsync [reprint]

Source: Internet
Author: User

Introduction
Small and medium-sized businesses or websites that use Linux as their application platform often need remote data backups or site mirrors. Commercial backup and mirroring products exist, but they are often too expensive. How to implement remote backup and website mirroring efficiently with free software is therefore a topic worth discussing.

The simplest way to make a remote backup or mirror over a network is to use wget, but this approach retransmits all the data over the network every time, regardless of which files have changed, and is therefore inefficient. When the amount of data to back up is large, the transfer often takes several hours.

This article therefore introduces rsync, an efficient remote backup and mirroring tool that can meet most requirements that are not particularly stringent.

rsync is a data mirroring and backup tool for Unix-like systems; as its name suggests, it performs a remote sync. It has the following characteristics:

It can mirror entire directory trees and file systems.
It can easily preserve the original file permissions, times, soft and hard links, and so on.
It can be installed without special privileges.
Its optimized transfer process gives high file transfer efficiency.
It can transfer files over rsh, ssh, or a direct socket connection.
It supports anonymous transfers, which is convenient for site mirroring.
Software download
The rsync homepage address is:

At the time of writing, the latest version was 2.4.6. You can download it from the original website: http://rsync.samba.org/ftp/rsync/.

Compiling and installing
Building and installing rsync requires only the following simple steps:

[[email protected] rsync-2.4.6]# ./configure
[[email protected] rsync-2.4.6]# make
[[email protected] rsync-2.4.6]# make install

Note, however, that rsync must be installed on both server A and server B. Server A runs rsync in server mode, while B runs rsync as a client. The rsync daemon runs on Web server A, and the client program is run periodically on B to back up the required content from A.

Rsync server
1. Starting the rsync server

On Web server A, start the rsync server as a daemon by running:

[[email protected] rsync-2.4.6]# /usr/local/bin/rsync --daemon

The default rsync service port is 873; the server accepts anonymous or authenticated backup requests from clients on that port.

The service can also be started in other ways, for example:

A. Add it to inetd.conf

Edit /etc/services and add the following line to specify that the rsync service port is 873:

rsync 873/tcp

Then edit /etc/inetd.conf and add:

rsync stream tcp nowait root /bin/rsync rsync --daemon

Note: For xinetd, the setup is similar.
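As a rough illustration of the xinetd case, a service file might look like the sketch below. The file location /etc/xinetd.d/rsync is an assumption, and the binary path simply mirrors the inetd.conf line above; adjust both to your system.

```text
# /etc/xinetd.d/rsync (assumed location)
service rsync
{
    disable         = no
    socket_type     = stream
    protocol        = tcp
    wait            = no
    user            = root
    server          = /bin/rsync
    server_args     = --daemon
}
```

After adding such a file, xinetd normally has to be reloaded before it starts listening on port 873.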

B. Add it to rc.local

Edit /etc/rc.d/rc.local and add the daemon start command shown earlier (/usr/local/bin/rsync --daemon) at the end.

2. Configuring rsync

For the rsync server, the most important and most complex part is its configuration. The server configuration file is /etc/rsyncd.conf, which controls authentication, access, logging, and so on.

The file consists of one or more modules. A module definition begins with the module name in square brackets and continues until the next module definition or the end of the file. A module contains parameter definitions of the form name = value. Each module corresponds to a directory tree to be backed up. For example, in our sample environment three directory trees need to be backed up: /www/, /home/web_user1/, and /home/web_user2/, so three modules must be defined in the configuration file, one for each directory tree.

The configuration file is line-oriented: each new line represents a comment, a module definition, or a parameter assignment. Lines beginning with # are comments, and a line ending with "\" continues on the next line. The value after the equals sign is either a string or a Boolean, which may be written as true/false (yes/no and 0/1 are also accepted). Case is not significant in Boolean values.
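To make the syntax concrete, here is a minimal sketch of an rsyncd.conf fragment. The module name www and the path /www/ come from the sample environment described above; the comment text, the log file location, and the exclude patterns are illustrative assumptions only.

```ini
# Lines starting with # are comments.
# Parameters before the first module are global.
max connections = 4
log file = /var/log/rsyncd.log

# A module starts with its name in square brackets.
[www]
path = /www/
comment = main web tree
read only = true
# A trailing backslash continues the value on the next line.
exclude = logs/ \
    conf/ssl.*/
```

Everything from [www] down to the next bracketed name (or the end of the file) belongs to that module.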

Global parameters

All parameters that appear before the first [module] section in the file are global parameters. You can also set module parameters in the global section, in which case the value becomes the default for all modules.

motd file

The "motd file" parameter specifies a message file whose contents are displayed to the client when it connects to the server. By default there is no motd file.

log file

"log file" specifies a log file for rsync to write to instead of sending log messages to syslog.

pid file

Specifies the file to which rsync writes its process ID.

syslog facility

Specifies the syslog facility that rsync uses when sending log messages to syslog. Common facility names are: auth, authpriv, cron, daemon, ftp, kern, lpr, mail, news, security, syslog, user, uucp, local0, local1, local2, local3, local4, local5, local6 and local7. The default value is daemon.

Module parameters

After the global parameters, you define one or more modules. The following parameters can be defined in a module:

comment

Assigns a description to the module. It is displayed to the client, together with the module name, when the client requests the module list. By default there is no description.

path

Specifies the path of the directory tree to be backed up for the module. This parameter is required.

use chroot

If use chroot is set to true, rsync first chroots to the directory specified by the path parameter before transferring files. This provides additional security, but it has the disadvantages that root privileges are required and that symbolic links pointing outside the directory tree cannot be followed when backing up. The default value is true.

max connections

Specifies the maximum number of concurrent connections to the module in order to protect the server; connection requests beyond the limit are told to retry later. The default value is 0, which means there is no limit.

lock file

Specifies the lock file used to support the max connections parameter. The default value is /var/run/rsyncd.lock.

read only

This option sets whether clients are allowed to upload files. If true, any upload request fails; if false, uploads are allowed provided the directory permissions on the server permit reading and writing. The default value is true.


list

This option sets whether the module is listed when a client requests the list of available modules. Setting this option to false creates a hidden module. The default value is true.

uid

This option specifies the UID that the daemon should run as when transferring files for the module. Together with the gid option, it determines which files can be accessed and with what permissions. The default value is "nobody".

gid

This option specifies the GID that the daemon should run as when transferring files for the module. The default value is "nobody".

exclude

Specifies a space-separated list of patterns to add to the exclude list. This is equivalent to using --exclude in the client command, except that exclude patterns specified in the configuration file are not passed to the client and apply only on the server. A module can specify only one exclude option, but you can prefix a pattern with "-" or "+" to indicate whether it is an exclude or an include pattern.

Note, however, that this option has certain security limitations: a client may be able to bypass the exclude list. If you want to be sure that a particular file cannot be accessed, it is best to combine it with the uid/gid options.

exclude from

Specifies the name of a file containing exclude pattern definitions; the server reads the exclude list from this file.

include

Specifies a space-separated list of patterns that rsync should not exclude. This is equivalent to using --include in the client command. Combining include and exclude lets you define complex exclude/include rules. A module can specify only one include option, but you can prefix a pattern with "-" or "+" to indicate whether it is an exclude or an include pattern.

include from

Specifies the name of a file containing include pattern definitions; the server reads the include list from this file.

auth users

This option specifies a list of user names, separated by spaces or commas; only these users are allowed to connect to the module. These users have no relationship to system users. If "auth users" is set, a client's connection request to the module is authenticated by rsync using a challenge/response authentication protocol. The user names and passwords are stored in plain text in the file specified by the "secrets file" option. By default, no password is required to connect to the module (that is, anonymous mode).

secrets file

This option specifies a file containing username:password pairs. The file is only used if "auth users" is defined. Each line of the file contains one username:password pair. In general, passwords should not exceed 8 characters. There is no default secrets file name; you must specify one (for example, /etc/rsyncd.secrets).

strict modes

This option specifies whether the permissions of the password file are checked. If the value is true, the password file must be accessible only to the user running the rsync server and to no other user. The default value is true.
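The following is a minimal sketch of how such a secrets file could be created so that it passes the strict modes check. The helper name create_secrets_file and the user name and password in the usage comment are hypothetical; only the file name /etc/rsyncd.secrets is taken from the text above.

```shell
#!/bin/sh
# Sketch: write a secrets file with one username:password pair and
# restrict it to its owner, as "strict modes = true" expects.
# The helper name and its arguments are illustrative only.
create_secrets_file() {
    file=$1 user=$2 password=$3
    # Use a restrictive umask so a newly created file is never world-readable.
    ( umask 077 && printf '%s:%s\n' "$user" "$password" > "$file" ) || return 1
    # Also tighten the mode in case the file already existed.
    chmod 600 "$file"
}

# Example (run as root on the server):
#   create_secrets_file /etc/rsyncd.secrets backup 'changeme'
```

Note that the sketch overwrites the file; a file holding several users would need one printf per pair or an append.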

hosts allow

This option specifies which client hosts are allowed to connect to the module. A client pattern can take one of the following forms:

o xxx.xxx.xxx.xxx: the client is allowed access only if its IP address matches exactly.

o a.b.c.d/n: all clients belonging to the network with the given prefix length are allowed to connect to the module.

o a.b.c.d/e.f.g.h: all clients belonging to the network with the given netmask are allowed to connect to the module.

o a host name: the client is allowed access only if it has that host name, for example backup.linuxaid.com.cn.

o *.linuxaid.com.cn: all hosts that belong to the domain are allowed.

The default is to allow all hosts to connect.
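As an illustration of the five forms, the fragment below shows one line per form. All IP addresses are made-up values from a documentation range, not addresses from the example environment; in a real module you would normally keep a single hosts allow line.

```ini
# One line per pattern form; addresses are illustrative only.
hosts allow = 192.0.2.10
hosts allow = 192.0.2.0/24
hosts allow = 192.0.2.0/255.255.255.0
hosts allow = backup.linuxaid.com.cn
hosts allow = *.linuxaid.com.cn
```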

hosts deny

Specifies the hosts that are not allowed to connect to the rsync server, using the same pattern forms as hosts allow. By default there is no hosts deny definition.

ignore errors

Specifies that rsyncd should ignore IO errors on the server when deciding whether to run the delete phase of a transfer. Normally, rsync skips the --delete operation when an IO error has occurred, in order to prevent serious problems caused by temporary resource shortages or other IO errors.

ignore nonreadable

Specifies that the rsync server should completely ignore files that the user cannot read. This is useful when a directory being backed up contains some files that should not be available to the person making the backup.

transfer logging

Makes the rsync server log download and upload operations per file, in a format similar to that used by FTP daemons.

log format

This option lets you customize the fields written to the log when transfer logging is enabled. The format is a string containing format escapes; the following escapes are available:

o %h remote host name

o %a remote IP address

o %l length of the file in bytes

o %p process ID of this rsync session

o %o operation type: "send" or "recv"

o %f file name

o %P module path

o %m module name

o %t current time

o %u authenticated user name (empty when anonymous)

o %b number of bytes actually transferred

o %c when a file is sent, the number of checksum bytes received for this file

The default log format is "%o %h [%a] %m (%u) %f %l"; in addition, "%t [%p] " is added to the beginning of each line. A Perl script called rsyncstats is also distributed with the source code to summarize log files in this format.


timeout

This option lets you override the IO timeout specified by the client. It ensures that the rsync server does not wait forever for a crashed client. The timeout is given in seconds; 0 means no timeout and is the default value. A good choice for an anonymous rsync server is 600.

refuse options

This option lets you define a list of command-line options that clients are not allowed to use on the module. The full option names must be used, not the abbreviations. When an option is refused, the server reports an error message and exits. If you want to prevent compression without returning an error to clients that request it, use "dont compress = *" instead.

dont compress

Specifies files that are not compressed before being transferred. The default value is

*.gz *.tgz *.zip *.z *.rpm *.deb *.iso *.bz2 *.tbz

Rsync client commands
After the rsync server has been configured, the next step is to run the rsync command on the client to back up the server's files to the client. rsync is a very powerful tool with many options, which are described below.

First, the rsync command can take the following forms:

rsync [OPTION]... SRC [SRC]... [USER@]HOST:DEST

rsync [OPTION]... [USER@]HOST:SRC DEST

rsync [OPTION]... SRC [SRC]... DEST

rsync [OPTION]... [USER@]HOST::SRC [DEST]

rsync [OPTION]... SRC [SRC]... [USER@]HOST::DEST

rsync [OPTION]... rsync://[USER@]HOST[:PORT]/SRC [DEST]

rsync has six different modes of operation:

Copying local files. This mode is used when neither the SRC nor the DEST path contains a single colon ":" separator.

Copying from the local machine to a remote machine using a remote shell program (such as rsh or ssh). This mode is used when the DEST path contains a single colon ":" separator.

Copying from a remote machine to the local machine using a remote shell program (such as rsh or ssh). This mode is used when the SRC path contains a single colon ":" separator.

Copying files from a remote rsync server to the local machine. This mode is used when the SRC path contains the "::" separator.

Copying files from the local machine to a remote rsync server. This mode is used when the DEST path contains the "::" separator.

Listing files on a remote machine. This works like an rsync transfer, except that the local destination is omitted from the command.
1. Usage

When using rsync to transfer files, you must specify a source and a destination, one of which may be on a remote machine. For example:

rsync *.c foo:src/

This transfers all files ending in .c in the current directory to the src directory on the machine foo. If any of the files already exist on the remote system, the rsync remote-update protocol is used to send only the differences.

rsync -avz foo:src/bar /data/tmp

This command recursively transfers the entire contents of the src/bar directory on the machine foo into the local /data/tmp/bar directory. The files are transferred in archive mode, which ensures that symbolic links, attributes, permissions, and other information are preserved in the transfer. In addition, compression is used to reduce the amount of data transferred.

rsync -avz foo:src/bar/ /data/tmp

A trailing "/" on the source path means that the contents of the directory are copied rather than the directory itself, so no additional bar directory level is created under /data/tmp. The difference between the two forms becomes apparent when the --delete option is used.
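The effect of the trailing slash can be tried out locally without a remote host. The sketch below is only an illustration: the function name demo_trailing_slash and the directory names are made up, and it works entirely inside a temporary directory.

```shell
#!/bin/sh
# Sketch: show how a trailing slash on the source changes the result.
# Prints the temporary working directory so the result can be inspected.
demo_trailing_slash() {
    work=$(mktemp -d) || return 1
    mkdir -p "$work/src/bar" "$work/with_slash" "$work/without_slash"
    echo hello > "$work/src/bar/file.txt"

    # No trailing slash: the directory "bar" itself is created in the destination.
    rsync -a "$work/src/bar" "$work/without_slash/"

    # Trailing slash: only the contents of "bar" are copied.
    rsync -a "$work/src/bar/" "$work/with_slash/"

    echo "$work"
}
```

After calling the function, file.txt should sit at without_slash/bar/file.txt in the first case and directly at with_slash/file.txt in the second.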

rsync can also be used in local mode: if neither the SRC nor the DEST path contains a ":" character, the command runs locally, much like the cp command.

rsync somehost.mydomain.com::

This lists all the modules available on somehost.mydomain.com.

Option description

-v, --verbose verbose output mode
-q, --quiet quiet output mode
-c, --checksum enable checksumming, forcing files to be compared by checksum
-a, --archive archive mode: transfer files recursively and preserve all file attributes; equivalent to -rlptgoD
-r, --recursive recurse into subdirectories
-R, --relative use relative path information

rsync foo/bar/foo.c remote:/tmp/

creates the file foo.c in the /tmp directory, whereas with the -R option:

rsync -R foo/bar/foo.c remote:/tmp/

the file /tmp/foo/bar/foo.c is created; that is, the full path information is preserved.

-b, --backup make backups: when a file with the same name already exists at the destination, the old file is renamed to filename~. You can use the --suffix option to specify a different backup suffix.
--backup-dir store backup files (such as filename~) in the given directory
--suffix=SUFFIX define the backup file suffix
-u, --update update only: skip any file that already exists at the destination and has a modification time newer than the source file (newer files are not overwritten)
-l, --links preserve soft links
-L, --copy-links treat soft links like regular files
--copy-unsafe-links copy only links that point outside the source directory tree
--safe-links ignore links that point outside the directory tree
-H, --hard-links preserve hard links
-p, --perms preserve file permissions
-o, --owner preserve file owner information
-g, --group preserve file group information
-D, --devices preserve device file information
-t, --times preserve file time information
-S, --sparse handle sparse files specially to save space at the destination
-n, --dry-run show which files would be transferred
-W, --whole-file copy whole files without incremental checking
-x, --one-file-system do not cross file system boundaries
-B, --block-size=SIZE block size used by the checksum algorithm; the default is 700 bytes
-e, --rsh=COMMAND specify the remote shell program to use in place of rsh
--rsync-path=PATH specify the path of the rsync command on the remote machine
-C, --cvs-exclude automatically ignore files in the same way CVS does, to exclude files you do not want to transfer
--existing only update files that already exist at the destination; do not create new files
--delete delete files at the destination that do not exist in the source
--delete-excluded also delete excluded files on the receiving side
--delete-after delete after the transfer has finished
--ignore-errors delete even if there are IO errors
--max-delete=NUM delete at most NUM files
--partial keep partially transferred files, to speed up subsequent transfers
--force force deletion of directories even if they are not empty
--numeric-ids do not map numeric user and group IDs to user and group names
--timeout=TIME IO timeout, in seconds
-I, --ignore-times do not skip files that have the same time and length
--size-only when deciding whether to back up a file, look only at the file size and ignore the file time
--modify-window=NUM timestamp window used when deciding whether two files have the same time; the default is 0
-T, --temp-dir=DIR create temporary files in DIR
--compare-dest=DIR also compare against files in DIR when deciding whether a backup is needed
-P equivalent to --partial --progress
-z, --compress compress file data during the transfer
--exclude=PATTERN exclude files matching PATTERN
--include=PATTERN do not exclude files matching PATTERN
--exclude-from=FILE exclude files matching the patterns listed in FILE
--include-from=FILE do not exclude files matching the patterns listed in FILE
--version print version information
--address bind to a specific address
--config=FILE specify a different configuration file instead of the default rsyncd.conf
--port=PORT specify a different rsync service port
--blocking-io use blocking IO for the remote shell
--stats give some file transfer statistics
--progress show progress during the transfer
--log-format=FORMAT specify the log format
--password-file=FILE get the password from FILE
--bwlimit=KBPS limit I/O bandwidth, in KBytes per second
-h, --help display help information
Example analysis
Assume there are two servers, A and B. A is the primary Web server, with the domain name www.linuxaid.com.cn, and B is the backup machine, with the domain name backup.linuxaid.com.cn. A's Web content resides in the following places: /www/, /home/web_user1/, and /home/web_user2/. We need to back up the contents of these directories on backup machine B.

Server configuration example

Create the rsyncd configuration file /etc/rsyncd.conf on www.linuxaid.com.cn with the following contents:

uid = nobody
gid = nobody
use chroot = no
max connections = 4
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsync.lock
log file = /var/log/rsyncd.log

[www]
path = /www/
ignore errors = true
read only = true
list = false
hosts allow =
hosts deny =
auth users = backup
secrets file = /etc/backserver.pas

[web_user1]
path = /home/web_user1/
ignore errors = true
read only = true
list = false
hosts allow =
hosts deny =
uid = web_user1
gid = web_user1
auth users = backup
secrets file = /etc/backserver.pas

[web_user2]
path = /home/web_user2/
ignore errors = true
read only = true
list = false
hosts allow =
hosts deny =
uid = web_user2
gid = web_user2
auth users = backup
secrets file = /etc/backserver.pas

Three modules are defined here, one for each of the three directory trees that need to be backed up. Only the hosts listed in hosts allow may back up this machine's data, and authentication is required. All three modules are backed up by the user backup, whose credentials are stored in the file /etc/backserver.pas, one username:password pair per line.

This file must be readable and writable only by root; otherwise rsyncd reports an error at startup. After these files have been configured, start the rsync daemon on server A as described earlier.


Client command example

/usr/local/bin/rsync -vzrtopg --delete --exclude "logs/" --exclude "conf/ssl.*/" --progress [email protected]::www /backup/www/ --password-file=/etc/rsync.pass

In this command line, in -vzrtopg, v means verbose, z means compress, r means recursive, and topg preserve the original attributes of the files: times, owner, permissions, and group. --progress shows detailed progress. --delete means that if a file is deleted on the server, the client deletes it too, keeping the two truly consistent. --exclude "logs/" means that files in the /www/logs directory are not backed up. --exclude "conf/ssl.*/" means that files in directories matching /www/conf/ssl.*/ are not backed up.

[email protected]::www indicates that the command backs up the www module on the server; the user name before the @ (backup) is the user that performs the backup.

--password-file=/etc/rsync.pass specifies the password file, so that the command can be used in a script without entering the password interactively. Note that the password file's permissions must be set so that only root can read it.

The backup is stored in the /backup/www/ directory on the backup machine.

[[email protected] /]# /usr/local/bin/rsync -vzrtopg --delete --exclude "logs/" --exclude "conf/ssl.*/" --progress [email protected]::www /backup/www/ --password-file=/etc/rsync.pass
receiving file list ... done
785 (100%)
4086 (100%)
10680 (100%)
0 (100%)
3956 (100%)
wrote 2900 bytes read 145499 bytes 576.34 bytes/sec
total size is 2374927 speedup is 45.34

The commands for the other two modules are:

/usr/local/bin/rsync -vzrtopg --delete --progress [email protected]::web_user1 /backup/web_user1/ --password-file=/etc/rsync.pass

/usr/local/bin/rsync -vzrtopg --delete --progress [email protected]::web_user2 /backup/web_user2/ --password-file=/etc/rsync.pass
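The three per-module commands differ only in the module name, so they can be wrapped in a small loop. The sketch below is an assumption-laden illustration, not part of the original setup: the function names, the DRY_RUN switch, and the default user name and host (backup and www.linuxaid.com.cn, inferred from the example description) are all hypothetical and can be overridden through environment variables.

```shell
#!/bin/sh
# Sketch: back up all three example modules with one helper.
# Every default below is an assumption; override via the environment.
RSYNC=${RSYNC:-/usr/local/bin/rsync}
SERVER=${SERVER:-www.linuxaid.com.cn}
RSYNC_USER=${RSYNC_USER:-backup}
DEST_ROOT=${DEST_ROOT:-/backup}
PASSFILE=${PASSFILE:-/etc/rsync.pass}

backup_module() {
    module=$1
    # With DRY_RUN set to a non-empty value, the command is printed, not run.
    ${DRY_RUN:+echo} "$RSYNC" -vzrtopg --delete \
        --password-file="$PASSFILE" \
        "${RSYNC_USER}@${SERVER}::${module}" "${DEST_ROOT}/${module}/"
}

backup_all() {
    for m in www web_user1 web_user2; do
        backup_module "$m" || echo "backup of $m failed" >&2
    done
}
```

Running DRY_RUN=1 backup_all prints the three commands without contacting the server, which is a convenient way to check the arguments before scheduling the script.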

To automate the backups, run crontab -e on the client and add entries that invoke the client commands above.
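A crontab entry for the www module might look like the sketch below. The schedule (every day at 03:00), the log file path, and the user name and host in the rsync source are assumptions chosen for illustration; the remaining options follow the client command example.

```text
# m h dom mon dow command
# Hypothetical schedule: back up the www module every day at 03:00.
0 3 * * * /usr/local/bin/rsync -vzrtopg --delete --password-file=/etc/rsync.pass backup@www.linuxaid.com.cn::www /backup/www/ >> /var/log/rsync-backup.log 2>&1
```

The --progress option is left out here because its output is of little use in a log file.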


Some sample scripts
These scripts are examples from the rsync website:

1. Make a seven-day rotating incremental backup to a central server

#!/bin/sh

# This script does personal backups to a rsync backup server. You will end up
# with a 7 day rotating incremental backup. The incrementals will go
# into subdirectories named after the day of the week, and the current
# full backup goes into a directory called "current"
# [email protected]

# directory to back up (example value)
BDIR=/home/$USER

# excludes file - this contains a wildcard pattern per line of files to exclude
EXCLUDES=$HOME/cron/excludes

# the name of the backup machine (example value)
BSERVER=owl

# your password on the backup server
export RSYNC_PASSWORD=XXXXXX


BACKUPDIR=`date +%A`
OPTS="--force --ignore-errors --delete-excluded --exclude-from=$EXCLUDES
      --delete --backup --backup-dir=/$BACKUPDIR -a"

export PATH=$PATH:/bin:/usr/bin:/usr/local/bin

# the following line clears the last week's incremental directory
[ -d $HOME/emptydir ] || mkdir $HOME/emptydir
rsync --delete -a $HOME/emptydir/ $BSERVER::$USER/$BACKUPDIR/
rmdir $HOME/emptydir

# now the actual transfer
rsync $OPTS $BDIR $BSERVER::$USER/current

2. Back up to a spare hard drive

#!/bin/sh

export PATH=/usr/local/bin:/usr/bin:/bin

LIST="rootfs usr data data2"

for d in $LIST; do
    rsync -ax --exclude fstab --delete /$d/ /backup/$d/
done

DAY=`date "+%A"`


3. Mirror the CVS tree of vger.rutgers.edu

#!/bin/sh

# refuse to start if another rsync is already running
RUN=`ps x | grep rsync | grep -v grep | wc -l`
if [ "$RUN" -gt 0 ]; then
    echo already running
    exit 1
fi

# fetch the ChangeLog first and compare it with the local copy
rsync -az vger.rutgers.edu::cvs/CVSROOT/ChangeLog $HOME/ChangeLog

sum1=`sum $HOME/ChangeLog`
sum2=`sum /var/www/cvs/vger/CVSROOT/ChangeLog`

if [ "$sum1" = "$sum2" ]; then
    echo nothing to do
    exit 0
fi

rsync -az --delete --force vger.rutgers.edu::cvs/ /var/www/cvs/vger/
exit 0

Q: How can I run rsync over SSH without entering a password?
A: You can follow these steps:

1. Generate SSH keys on server A with ssh-keygen, without specifying a passphrase. You will see the identity and identity.pub files under ~/.ssh.
2. On server B, create a .ssh subdirectory in the home directory.
3. Copy A's identity.pub to server B.
4. Append identity.pub to ~[user B]/.ssh/authorized_keys.
5. As user A on server A, you can then use the following command to ssh to server B as user B:
e.g. ssh -l userB serverB
This allows user A on server A to ssh to server B as user B without a password.

Q: How can I use rsync through a firewall without compromising security?
A: The answer is as follows:

There are usually two cases: the server is inside the firewall, or the server is outside the firewall. In either case SSH is normally used. It is best to create a dedicated backup user and configure sshd to allow only this user access, via RSA authentication. If the server is inside the firewall, it is best to restrict access to the client's IP address and deny all other connections. If the client is inside the firewall, it is enough to have the firewall allow outgoing SSH connections on TCP port 22.

Q: Can I also keep backups of files that have been changed or deleted?
A: Yes, you can:

Use a command such as: rsync -other -options --backup --backup-dir=./backup-2000-2-13 ...
If the source file /path/to/some/file.c has changed, the old file is then moved to ./backup-2000-2-13/path/to/some/file.c.
This directory needs to be created by hand.

Q: Which ports do I need to open on the firewall for rsync?
A: It depends on how rsync is used.

rsync can transfer files directly over a TCP connection on port 873, or over SSH on port 22. You can also change the port with commands such as the following:

rsync --port 8730 otherhost::
rsync -e 'ssh -p 2002' otherhost:

Q: How can I copy only the directory structure with rsync, ignoring the files?
A: rsync -av --include '*/' --exclude '*' source-dir dest-dir
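The include/exclude pair in this answer can be wrapped in a small helper and tried on local directories. The function name copy_dir_structure is hypothetical, and the sketch adds trailing slashes so that the contents of the source are replicated directly inside the destination.

```shell
#!/bin/sh
# Sketch: replicate only the directory tree of $1 under $2, skipping files.
copy_dir_structure() {
    # --include '*/' keeps every directory; --exclude '*' drops everything else.
    rsync -a --include '*/' --exclude '*' "$1/" "$2/"
}

# Example:
#   copy_dir_structure /www /tmp/www-skeleton
```

The order of the two options matters: rsync applies the first matching pattern, so directories are included before the catch-all exclude is considered.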

Q: Why do I keep getting a "Read-only file system" error?
A: Check whether you forgot to set "read only = no".

Q: Why do I get an '@ERROR: invalid gid' error?
A: By default rsync runs with uid=nobody and gid=nobody. If your system has no nobody group, this error occurs; you can try gid = nogroup or another group.

Q: Why do I get a "failed to bind port 873" error?
A: This error occurs if you are not running the daemon with root privileges, because ports below 1024 are privileged ports. You can use the --port option to change the port.

Q: Why does my authentication fail?
A: Judging from your command line:

You are using:
> bash$ rsync -a test
> Password:
> @ERROR: auth failed on module test
> I dont understand this. Can somebody explain as-to-acomplish this.
> All suggestions is welcome.

The problem is probably that you did not log in with your user name; try rsync -a [email protected]::test test
