Rsync Data Synchronization in CentOS

Source: Internet
Author: User
Tags: perl script

Introduction
Small and medium-sized enterprises and websites that choose Linux as their application platform often face the problem of how to implement remote data backup or website mirroring. Commercial backup and mirroring products are available, but they are often too expensive. How to use free software to implement remote backup and website mirroring efficiently is therefore a topic worth discussing.

The simplest way to back up remote data or mirror a website over the network is to use wget. However, with this approach all data must be transmitted over the network every time, regardless of which files have actually changed, so efficiency is very low. When the volume of data to back up is large, each transfer can take several hours.

This article therefore introduces rsync, an efficient tool for remote backup and mirroring over the network that can meet most backup requirements that are not particularly strict.

Rsync is a data mirroring and backup tool for Unix-like systems; as the name suggests, it stands for remote sync. Its features are as follows:

It can mirror an entire directory tree and file system.
It can easily preserve the permissions, timestamps, and soft links of the original files.
It can be installed without special privileges.
Its optimized process gives high file transfer efficiency.
It can transfer files over rsh, ssh, and similar methods, or over a direct socket connection.
It supports anonymous transfers, which is convenient for website mirroring.
Software Download
The rsync home page is:

http://rsync.samba.org/

At the time of writing, the latest version is 2.4.6. Download it from the original website: http://rsync.samba.org/ftp/rsync/. You can also download rsync 2.4.6 from this site.

Compile and install
Compiling and installing rsync is very simple. You only need to perform the following steps:

[root@www rsync-2.4.6]# ./configure
[root@www rsync-2.4.6]# make
[root@www rsync-2.4.6]# make install

Note that rsync must be installed on both server A and server B. Server A runs rsync in server mode and server B runs rsync in client mode. That is, the rsync daemon runs on web server A, and the client program runs periodically on B to back up the content of web server A.

Rsync server
1. rsync server startup

On web server A, start the rsync server as a daemon by running:

[root@www rsync-2.4.6]# /usr/local/bin/rsync --daemon

The default service port of rsync is 873. The server receives anonymous or authenticated backup requests from clients on this port.

There are several ways to have the service start at boot, such as:

A. Add it to inetd.conf

Edit /etc/services and add the line rsync 873/tcp, which assigns port 873 to the rsync service. Then edit /etc/inetd.conf and add the line rsync stream tcp nowait root /usr/local/bin/rsync rsyncd --daemon

Note: For xinetd, the setup is similar.

B. Add it to rc.local

Edit /etc/rc.d/rc.local and add:

/usr/local/bin/rsync --daemon

2. rsync Configuration

The most important and most complex part of the rsync server is its configuration. The configuration file of the rsync server is /etc/rsyncd.conf, which controls authentication, access, and logging.

The file consists of one or more modules. A module definition starts with the module name in square brackets and continues until the next module definition starts or the file ends. A module contains parameter definitions of the form name = value. Each module corresponds to one directory tree to be backed up. For example, if the environment has three directory trees to back up, /www/, /home/web_user1/, and /home/web_user2/, the configuration file needs three modules, one for each directory tree.

The configuration file is line-based: each line represents a comment, a module definition, or a parameter assignment. A line starting with # is a comment, and a line ending with a backslash (\) is continued on the following line. The value after the equals sign is either a string or a Boolean value, written as true/false; Boolean values are not case-sensitive.
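
As a rough illustration of the syntax just described, the following shell sketch writes a sample configuration with three modules, one per directory tree in the example environment. The module names (www, web_user1, web_user2), the comment text, and the output location (a temporary directory rather than /etc/rsyncd.conf) are assumptions made for the illustration.

```shell
#!/bin/sh
# Sketch only: writes a sample rsyncd.conf to a temporary directory.
# Module names and the output location are illustrative assumptions.
conf_dir=$(mktemp -d)
conf_file="$conf_dir/rsyncd.conf"

cat > "$conf_file" <<'EOF'
# Lines starting with # are comments.
# Parameters before the first [module] are global or module defaults.
uid = nobody
gid = nobody
read only = true

[www]
path = /www/
comment = main web tree

[web_user1]
path = /home/web_user1/

[web_user2]
path = /home/web_user2/
EOF

# Count module headers: lines consisting of a name in square brackets.
module_count=$(grep -c '^\[.*]$' "$conf_file")
echo "modules defined: $module_count"
```

Writing to a temporary directory keeps the sketch harmless to run; on a real server the content would go into /etc/rsyncd.conf after review.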

Global Parameters

All parameters that appear in the file before the first [module] are global parameters. You can also define module parameters in the global section; in that case the value becomes the default for all modules.

motd file

The "motd file" parameter specifies a message file whose content is displayed to the client when it connects to the server. By default, there is no motd file.

log file

"log file" specifies the file to which rsync writes its log, instead of sending the log to syslog.

pid file

Specifies the file to which rsync writes its process ID.

syslog facility

Specifies the syslog facility name used when rsync sends log messages to syslog. Common facility names are: auth, authpriv, cron, daemon, ftp, kern, lpr, mail, news, security, syslog, user, uucp, local0, local1, local2, local3, local4, local5, local6, and local7. The default value is daemon.

Module Parameters

One or more modules are defined after the global parameters. The following parameters can be defined in a module:

comment

Specifies a description for the module. The description is displayed together with the module name when a client requests the module list. By default, no description is defined.

path

Specifies the directory tree that this module backs up. This parameter must be specified.

use chroot

If "use chroot" is set to true, rsync first chroots to the directory specified by the path parameter before transferring files. This gives additional security protection, but it requires root privileges and makes it impossible to back up files that symbolic links point to outside the directory tree. The default value of use chroot is true.

max connections

Specifies the maximum number of concurrent connections for this module, to protect the server. Connection requests beyond the limit are told to try again later. The default value is 0, meaning no limit.

lock file

Specifies the lock file that supports the max connections parameter. The default value is /var/run/rsyncd.lock.

read only

Determines whether clients may upload files. If this parameter is set to true, all upload attempts fail. If it is set to false, uploads are allowed provided the read/write permissions of the server directory permit them. The default value is true.

list

Determines whether the module is listed when a client requests the list of available modules. Setting this option to false creates a hidden module. The default value is true.

uid

Specifies the uid that the daemon should run as when transferring files for this module. Together with the gid option, it determines which files and permissions are accessible. The default value is "nobody".

gid

Specifies the gid that the daemon should run as when transferring files for this module. The default value is "nobody".

exclude

Specifies a space-separated list of patterns to add to the exclude list. This is equivalent to specifying the patterns with --exclude on the client command line. However, the exclude patterns in the configuration file are not passed to the client; they apply only on the server. A module can contain only one exclude option, but you can prefix each pattern with "-" or "+" to mark it as an exclude or an include.

Note that this option should not be relied on for security, because a client may be able to bypass the exclude list. To make sure that specific files cannot be accessed, it is best to combine it with the uid/gid options.

exclude from

Specifies a file that contains exclude pattern definitions. The server reads the exclude list from this file.

include

Specifies a space-separated list of patterns that should not be excluded. This is equivalent to specifying the patterns with --include on the client command line. You can combine include and exclude to define complex exclude/include rules. A module can contain only one include option, but you can prefix each pattern with "-" or "+" to mark it as an exclude or an include.

include from

Specifies a file that contains include pattern definitions. The server reads the include list from this file.

auth users

Specifies a list of user names separated by spaces or commas. Only these users may connect to this module. These users have nothing to do with system users. If "auth users" is set, rsync challenges every client connection to this module for authentication, using a challenge/response protocol. The user names and passwords are stored in plain text in the file specified by the "secrets file" option. By default, a module can be connected to without a password (anonymous mode).

secrets file

Specifies a file that contains username:password pairs. The file is used only when "auth users" is defined. Each line of the file contains one username:password pair. Generally, the password should not exceed 8 characters. There is no default secrets file name; you must specify one (for example, /etc/rsyncd.secrets).

strict modes

Specifies whether the permissions of the secrets file are checked. If this option is set to true, the secrets file must be accessible only to the user the rsync server runs as; no other user may be able to read it. The default value is true.
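
A minimal sketch of creating a secrets file that should pass the strict modes check follows. The user name backup, the placeholder password XXXXXX, and the temporary location are assumptions; on a real server the file would live at the path given by "secrets file" and be owned by the user the daemon runs as.

```shell
#!/bin/sh
# Sketch only: creates a username:password file readable by its owner alone.
# The user name, password, and location are placeholders.
umask 077                          # new files get no group/other permissions
secrets_dir=$(mktemp -d)
secrets_file="$secrets_dir/rsyncd.secrets"

# One username:password pair per line.
printf '%s:%s\n' "backup" "XXXXXX" > "$secrets_file"
chmod 600 "$secrets_file"          # owner read/write only

# Capture the permission string (first 10 characters of the ls -l output).
perms=$(ls -l "$secrets_file" | cut -c1-10)
echo "permissions: $perms"
```

Setting the umask before creating the file avoids a short window in which the file could exist with wider permissions.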

hosts allow

Specifies which clients are allowed to connect to the module. Each client pattern can take one of the following forms:

o xxx.xxx.xxx.xxx: a single IP address. Only a client whose address matches exactly is allowed. Example: 192.167.0.1

o a.b.c.d/n: an address and a number of mask bits. All clients in this network can connect to this module. Example: 192.168.0.0/24

o a.b.c.d/e.f.g.h: an address and a netmask. All clients in this network can connect to this module. Example: 192.168.0.0/255.255.255.0

o A host name. Only a client with this host name can connect, for example, backup.linuxaid.com.cn.

o A host name pattern with wildcards, such as *.linuxaid.com.cn. All hosts in this domain are allowed.

By default, all hosts are allowed to connect.

hosts deny

Specifies the hosts that are not allowed to connect to the rsync server. It is defined with the same patterns as hosts allow. By default, hosts deny is not defined.

ignore errors

Tells rsyncd to ignore I/O errors on the server when deciding whether to run the delete phase of a transfer. Normally, rsync skips the --delete step when an I/O error occurs, to prevent serious damage caused by a temporary resource shortage or other I/O errors.

ignore nonreadable

Tells the rsync server to completely ignore files that the user cannot read. This is useful when the directory to be backed up contains some files that the backup user should not be able to obtain.

transfer logging

Enables the rsync server to record download and upload operations in its own log, in a format similar to that used by ftp daemons.

log format

When transfer logging is enabled, this option lets you customize the fields written to the log file. The format is a string containing format specifiers. The following specifiers are available:

o %h remote host name

o %a remote IP address

o %l length of the file in bytes

o %p process ID of this rsync session

o %o operation type: "send" or "recv"

o %f file name

o %P module path

o %m module name

o %t current time

o %u authenticated user name (empty when anonymous)

o %b number of bytes actually transferred

o %c when a file is sent, the number of checksum bytes received for that file

The default log format is "%o %h [%a] %m (%u) %f %l". In general, "%t [%p] " is added at the start of each line. The rsync source distribution includes a perl script called rsyncstats that produces statistics from log files in this format.

timeout

Overrides the I/O timeout value chosen by the client. This ensures that the rsync server does not wait forever for a crashed client. The timeout is measured in seconds; 0 means no timeout, which is also the default. A good value for an anonymous rsync server is 600.

refuse options

Defines a list of command-line options that clients are not allowed to use with this module. The full name of each option must be used. When an option is refused, the server reports an error message and exits. To prevent compression without returning an error to clients that request it, use "dont compress = *" instead.

dont compress

Specifies the files that are not compressed before transfer. The default value is

*.gz *.tgz *.zip *.z *.rpm *.deb *.iso *.bz2 *.tbz

Rsync client commands
After the rsync server is configured, the next step is to run the rsync command on the client to back up the files on the server to the client. Rsync is a very powerful tool, and its command has many options. They are described below.

First, the rsync command can take the following forms:

rsync [OPTION]... SRC [SRC]... [USER@]HOST:DEST

rsync [OPTION]... [USER@]HOST:SRC DEST

rsync [OPTION]... SRC [SRC]... DEST

rsync [OPTION]... [USER@]HOST::SRC [DEST]

rsync [OPTION]... SRC [SRC]... [USER@]HOST::DEST

rsync [OPTION]... rsync://[USER@]HOST[:PORT]/SRC [DEST]

Rsync has six different working modes:

Copy local files. This mode is used when neither the SRC nor the DST path contains a colon (:) separator.

Copy from the local machine to a remote machine using a remote shell program (such as rsh or ssh). This mode is used when the DST path contains a single colon (:) separator.

Copy from a remote machine to the local machine using a remote shell program (such as rsh or ssh). This mode is used when the SRC path contains a single colon (:) separator.

Copy files from a remote rsync server to the local machine. This mode is used when the SRC path contains a double colon (::) separator.

Copy files from the local machine to a remote rsync server. This mode is used when the DST path contains a double colon (::) separator.

List the files on a remote machine. This works like an rsync transfer, except that the local destination is omitted from the command.
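
The colon rule that separates these modes can be sketched as a small shell function. This is only an illustration of the rule as stated above, not how rsync itself parses its arguments; classify_rsync_arg is a hypothetical helper, the sample arguments are illustrative, and the sketch ignores corner cases such as local file names that contain a colon.

```shell
#!/bin/sh
# Sketch only: mimics the colon rule described above.
# classify_rsync_arg is a hypothetical helper, not part of rsync.
classify_rsync_arg() {
    case "$1" in
        rsync://*) echo "daemon" ;;        # rsync:// URL
        *::*)      echo "daemon" ;;        # HOST::MODULE, double colon
        *:*)       echo "remote-shell" ;;  # HOST:PATH, single colon
        *)         echo "local" ;;         # no colon at all
    esac
}

# Print the mode that each sample argument would select.
for arg in /data/tmp foo:src/bar backup@172.16.1.5::www rsync://somehost.mydomain.com/www; do
    printf '%-40s %s\n' "$arg" "$(classify_rsync_arg "$arg")"
done
```

The rsync:// pattern is tested first because such a URL also contains a single colon and would otherwise be classified as a remote shell path.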
1. Usage

When using rsync to transfer files, you specify a source and a destination, one of which may be on a remote machine. For example:

rsync *.c foo:src/

This transfers all files ending in .c in the current directory to the src directory on machine foo. If any of the files already exist on the remote system, the rsync remote-update protocol is used to send only the differences.

rsync -avz foo:src/bar /data/tmp

This command recursively transfers everything in the src/bar directory on machine foo into the local /data/tmp/bar directory. Files are transferred in archive mode, which preserves symbolic links, attributes, permissions, ownership, and so on. In addition, compression is used to reduce the amount of data transferred.

rsync -avz foo:src/bar/ /data/tmp

A trailing slash (/) on the source path means "copy the contents of this directory", whereas a path without a trailing slash means "copy the directory itself". With the command above, the files in src/bar end up directly in /data/tmp. The difference between the two cases becomes especially important when the --delete option is used.
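
The trailing-slash rule can be tried out locally without a remote machine. The sketch below assumes rsync is installed (it skips the demonstration otherwise); the directory and file names are illustrative and everything is created under a temporary directory.

```shell
#!/bin/sh
# Sketch only: local demonstration of the trailing-slash rule.
# Directory and file names are illustrative; everything lives in a temp dir.
work=$(mktemp -d)
mkdir -p "$work/src/bar" "$work/no_slash" "$work/with_slash"
echo "hello" > "$work/src/bar/file.txt"

if command -v rsync >/dev/null 2>&1; then
    # No trailing slash: the directory "bar" itself is copied.
    rsync -a "$work/src/bar" "$work/no_slash/"
    # Trailing slash: only the contents of "bar" are copied.
    rsync -a "$work/src/bar/" "$work/with_slash/"
    # List the copied files to compare the two layouts.
    find "$work/no_slash" "$work/with_slash" -type f
else
    echo "rsync is not installed; skipping the demonstration"
fi
```

After the run, the first destination contains bar/file.txt while the second contains file.txt directly.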

You can also use rsync in local mode. If neither the SRC nor the DST path contains a ":" character, the command runs in local mode and behaves much like the cp command.

rsync somehost.mydomain.com::

In this form, all modules available on somehost.mydomain.com are listed.

Option description

-v, --verbose: verbose output
-q, --quiet: reduced output
-c, --checksum: enable checksum comparison, forcing files to be verified by checksum
-a, --archive: archive mode; transfer files recursively and preserve all file attributes, equivalent to -rlptgoD
-r, --recursive: recurse into subdirectories
-R, --relative: use relative path names. For example:

rsync foo/bar/foo.c remote:/tmp/

creates the file foo.c in the /tmp directory, whereas with the -R option:

rsync -R foo/bar/foo.c remote:/tmp/

the file /tmp/foo/bar/foo.c is created, that is, the full path information is preserved.

-b, --backup: make backups; if a file with the same name already exists at the destination, the old file is renamed with a ~ suffix (filename~). Use the --suffix option to specify a different backup suffix.
--backup-dir=DIR: store backup files (such as filename~) in the directory DIR.
--suffix=SUFFIX: define the backup file suffix.
-u, --update: update only; skip files that already exist in DST and are newer than the file to be backed up (do not overwrite newer files).
-l, --links: preserve soft links
-L, --copy-links: treat soft links like regular files
--copy-unsafe-links: copy only links that point outside the SRC directory tree.
--safe-links: ignore links that point outside the directory tree being transferred.
-H, --hard-links: preserve hard links
-p, --perms: preserve file permissions
-o, --owner: preserve file owner information
-g, --group: preserve file group information
-D, --devices: preserve device file information
-t, --times: preserve file time information
-S, --sparse: handle sparse files specially to save space in DST
-n, --dry-run: show which files would be transferred without transferring them
-W, --whole-file: copy whole files, without incremental detection
-x, --one-file-system: do not cross file system boundaries
-B, --block-size=SIZE: block size used by the checksum algorithm. The default value is 700 bytes.
-e, --rsh=COMMAND: specify the remote shell program to use instead of rsh
--rsync-path=PATH: specify the path of the rsync command on the remote server.
-C, --cvs-exclude: automatically ignore files in the same way as CVS, to exclude files that should not be transferred
--existing: only update files that already exist in DST; do not create new files.
--delete: delete files in DST that do not exist in SRC.
--delete-excluded: also delete files on the receiving side that are excluded by this transfer.
--delete-after: delete after the transfer has completed.
--ignore-errors: delete even if I/O errors occur.
--max-delete=NUM: delete at most NUM files.
--partial: keep files that were not completely transferred for any reason, to speed up subsequent retransmission.
--force: force deletion of directories, even if they are not empty
--numeric-ids: do not map numeric user and group IDs to user and group names.
--timeout=TIME: I/O timeout, in seconds
-I, --ignore-times: do not skip files that have the same time and length
--size-only: when deciding whether to back up a file, check only the file size, regardless of the file time.
--modify-window=NUM: timestamp window used to decide whether file times are the same. The default value is 0.
-T, --temp-dir=DIR: create temporary files in DIR
--compare-dest=DIR: also compare against the files in DIR to decide whether to back up a file.
-P: equivalent to --partial --progress.
-z, --compress: compress file data during transfer
--exclude=PATTERN: exclude files matching PATTERN from the transfer.
--include=PATTERN: do not exclude files matching PATTERN
--exclude-from=FILE: exclude files matching the patterns listed in FILE.
--include-from=FILE: do not exclude files matching the patterns listed in FILE.
--version: print version information.
--address: bind to a specific address
--config=FILE: specify another configuration file instead of the default rsyncd.conf.
--port=PORT: specify another rsync service port
--blocking-io: use blocking I/O for the remote shell
--stats: give some file transfer statistics.
--progress: show the actual progress during the transfer
--log-format=FORMAT: specify the log file format.
--password-file=FILE: obtain the password from FILE.
--bwlimit=KBPS: limit I/O bandwidth, in KBytes per second
-h, --help: display help information

Instance analysis

Assume there are two servers, A and B. A is the primary web server with the domain name xucg.me (172.16.1.5); B is the backup machine with the domain name backup.xucg.me (172.16.1.6). The web content of A is stored in /data/www/. We need to back up the contents of this directory on backup machine B.

Server Configuration instance

Create the rsyncd configuration file /etc/rsyncd.conf on xucg.me with the following content:

uid = nobody
gid = nobody
use chroot = no
max connections = 4
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsync.lock
log file = /var/log/rsyncd.log

[www]
path = /data/www/
ignore errors = true
read only = true
list = false
hosts allow = 172.16.1.0/24
hosts deny = *
auth users = backup
secrets file = /etc/backserver.pas

Here, only hosts in the 172.16.1.0/24 network may back up data from this server, and authentication is required. The user authorized for the www module is backup, and the user information is stored in the file /etc/backserver.pas, whose content is as follows:

backup:back

The file must be readable and writable only by the root user; otherwise rsyncd reports an error. After these files are configured, start the rsyncd server on server A:

rsync --daemon

Client command example

/usr/local/bin/rsync -vzrtopg --delete --progress backup@172.16.1.5::www /backup/www/ --password-file=/etc/rsync.pass

In the option group -vzrtopg in the preceding command line, v is verbose, z is compression, r is recursive, and topg preserve the original file attributes such as modification time, owner, permissions, and group. --progress displays detailed progress, and --delete means that if a file is deleted on the server, the client deletes it as well, so that the two stay truly consistent.

backup@172.16.1.5::www means that the command backs up the www module on server 172.16.1.5, connecting as the user backup.

--password-file=/etc/rsync.pass specifies the password file, so that the command can be used in a script without entering the authentication password interactively. Note that the permissions of this password file must make it readable only by root.

The backed-up content is stored in the /backup/www/ directory of the backup machine.

[root@linuxaid /]# /usr/local/bin/rsync -vzrtopg --delete --progress backup@172.16.1.5::www /backup/www/ --password-file=/etc/rsync.pass
receiving file list ... done
./
1
785 (100%)
1.py
4086 (100%)
2.py
10680 (100%)
A
0 (100%)
Ip
3956 (100%)
./
wrote 2900 bytes read 145499 bytes 576.34 bytes/sec
total size is 2374927 speedup is 45.34

You can use the crontab -e command to schedule this backup to run automatically.
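
As a sketch of what such a scheduled entry might look like, the snippet below builds a crontab line that runs the client command from this example once a day. The schedule (02:30 every day) and the log file path /var/log/rsync-backup.log are assumptions and do not come from the original setup; --progress is left out because nobody watches the output of a scheduled run.

```shell
#!/bin/sh
# Sketch only: builds a crontab entry that runs the backup command daily.
# The schedule (02:30) and the log file path are assumptions.
cron_entry='30 2 * * * /usr/local/bin/rsync -vzrtopg --delete backup@172.16.1.5::www /backup/www/ --password-file=/etc/rsync.pass >> /var/log/rsync-backup.log 2>&1'

# Print the entry; paste it into the editor opened by "crontab -e".
echo "$cron_entry"
```

The five leading fields are minute, hour, day of month, month, and day of week; the rest of the line is the command to run.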

Some sample scripts
These scripts are examples from the rsync website:

1. Back up data to a central server with a seven-day rotating incremental backup

#!/bin/sh

# This script does personal backups to a rsync backup server. You will end up
# with a 7-day rotating incremental backup. The incrementals will go
# into subdirectories named after the day of the week, and the current
# full backup goes into a directory called "current"
# tridge@linuxcare.com

# directory to backup
BDIR=/home/$USER

# excludes file - this contains a wildcard pattern per line of files to exclude
EXCLUDES=$HOME/cron/excludes

# the name of the backup machine
BSERVER=owl

# your password on the backup server
export RSYNC_PASSWORD=XXXXXX

########################################################################

# name of the current day of the week, used as the incremental directory
BACKUPDIR=`date +%A`
OPTS="--force --ignore-errors --delete-excluded --exclude-from=$EXCLUDES
      --delete --backup --backup-dir=/$BACKUPDIR -a"

export PATH=$PATH:/bin:/usr/local/bin

# the following line clears the last week's incremental directory
[ -d $HOME/emptydir ] || mkdir $HOME/emptydir
rsync --delete -a $HOME/emptydir/ $BSERVER::$USER/$BACKUPDIR/
rmdir $HOME/emptydir

# now the actual transfer
rsync $OPTS $BDIR $BSERVER::$USER/current

2. Back up data to a spare hard disk

#!/bin/sh

export PATH=/usr/local/bin:/usr/bin:/bin

# file systems to mirror, one backup mount point per entry
LIST="rootfs usr data data2"

for d in $LIST; do
    mount /backup/$d
    rsync -ax --exclude fstab --delete /$d/ /backup/$d/
    umount /backup/$d
done

# name of the current day of the week
DAY=`date "+%A"`

rsync -a --delete /usr/local/apache /data2/backups/$DAY
rsync -a --delete /data/solid /data2/backups/$DAY

3. Mirror the cvs tree of vger.rutgers.edu

#!/bin/bash

cd /var/www/cvs/vger/
PATH=/usr/local/bin:/usr/freeware/bin:/usr/bin:/bin

# do nothing if another rsync is already running
RUN=`lps x | grep rsync | grep -v grep | wc -l`
if [ "$RUN" -gt 0 ]; then
    echo already running
    exit 1
fi

# fetch the remote ChangeLog and compare it with the local copy
rsync -az vger.rutgers.edu::cvs/CVSROOT/ChangeLog $HOME/ChangeLog

sum1=`sum $HOME/ChangeLog`
sum2=`sum /var/www/cvs/vger/CVSROOT/ChangeLog`

if [ "$sum1" = "$sum2" ]; then
    echo nothing to do
    exit 0
fi

rsync -az --delete --force vger.rutgers.edu::cvs/ /var/www/cvs/vger/
exit 0

FAQ
Q: How can I run rsync through ssh without entering a password?
A: Follow these steps:

1. Use ssh-keygen to create SSH keys on server A without specifying a passphrase. The identity and identity.pub files are created in ~/.ssh.
2. Create the subdirectory .ssh in the home directory on server B.
3. Copy identity.pub from A to server B.
4. Append identity.pub to ~[userB]/.ssh/authorized_keys.
5. User A on server A can then use the following command to ssh to server B as user B:
e.g. ssh -l userB serverB
In this way, user A on server A can log on to server B as user B without a password.
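
The key-handling part of these steps can be rehearsed safely inside a temporary directory, as sketched below. The sketch assumes a current OpenSSH, where ssh-keygen -t rsa produces id_rsa and id_rsa.pub rather than the identity and identity.pub files named above; the serverA and serverB directories merely stand in for the two home directories and are purely illustrative.

```shell
#!/bin/sh
# Sketch only: performs the key steps inside a temporary directory so that
# nothing in the real ~/.ssh is touched. Directory names are illustrative.
demo=$(mktemp -d)
mkdir -p "$demo/serverA/.ssh" "$demo/serverB/.ssh"
chmod 700 "$demo/serverA/.ssh" "$demo/serverB/.ssh"

if command -v ssh-keygen >/dev/null 2>&1 &&
   ssh-keygen -q -t rsa -N "" -f "$demo/serverA/.ssh/id_rsa"; then
    # Step 1 is done: a key pair with an empty passphrase (-N "") exists.
    # Steps 2-4: on the real server B, the public key would be copied over
    # and appended to ~userB/.ssh/authorized_keys.
    cat "$demo/serverA/.ssh/id_rsa.pub" >> "$demo/serverB/.ssh/authorized_keys"
    chmod 600 "$demo/serverB/.ssh/authorized_keys"
    echo "authorized_keys entries: $(wc -l < "$demo/serverB/.ssh/authorized_keys")"
else
    echo "ssh-keygen is unavailable or failed; skipping the demonstration"
fi
```

On real machines the same commands would operate on ~/.ssh on server A and on ~userB/.ssh on server B, with the public key copied between them by any convenient means.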

Q: How can I use rsync through a firewall without compromising security?
A: The answer is as follows:

There are two common cases: the server is inside the firewall, or the server is outside the firewall. In either case, ssh is usually used. It is best to create a dedicated backup user and configure sshd to allow this user to log in only through RSA authentication. If the server is inside the firewall, it is best to restrict access to the client's IP address and reject all other connections. If the client is inside the firewall, you can simply have the firewall allow outbound ssh connections on TCP port 22.

Q: Can I keep a backup of changed or deleted files?
A: Of course:

You can do this with a command such as rsync [other options] --backup-dir=./backup-2000-2-13 ...
In this case, if the source file /path/to/some/file.c has changed, the old file is moved to ./backup-2000-2-13/path/to/some/file.c.
You need to create this directory manually.

Q: What ports do I need to open on the firewall for rsync?
A: It depends on the situation.

Rsync can transfer files directly over a TCP connection on port 873, or through ssh on port 22. You can also change the port with the following commands:

rsync --port 8730 otherhost::
or
rsync -e 'ssh -p 2002' otherhost:

Q: How can I copy only the directory structure with rsync and ignore the files?
A: rsync -av --include '*/' --exclude '*' source-dir dest-dir

Q: Why do I always see the "Read-only file system" error?
A: Check whether you forgot to set "read only = no".

Q: Why do I encounter the '@ERROR: invalid gid' error?
A: By default, rsync runs with uid = nobody and gid = nobody. If your system does not have a nobody group, this error occurs. You can try gid = nogroup or another group.

Q: Why does binding to port 873 fail?
A: This error occurs if you do not run the daemon with root privileges, because ports below 1024 are privileged ports. You can use the --port parameter to change the port.

Q: Why does my authentication fail?
A: Judging from your command line:

You are using:
> bash$ rsync -a 144.16.251.213::test
> Password:
> @ERROR: auth failed on module test
>
> I dont understand this. Can somebody explain as to how to acomplish this.
> All suggestions are welcome.

The problem is most likely that you did not specify a user name, so rsync logged in with your local user name. Try rsync -a max@144.16.251.213::test
