A brief introduction to Rsync and its algorithms

Source: Internet
Author: User

Modern storage systems come with strong migration and backup strategies of their own. They still depend on network transfer, so there is some delay, but they are very convenient. In addition, read/write bottlenecks in current storage systems are now mostly addressed by moving to object storage.

When working with file storage, the biggest headaches are data migration and off-site backup. The best sync tool for these jobs is rsync.


"Basic Introduction"

Rsync: "a fast, versatile, remote (and local) file-copying tool".

Notable features:

1, Fast: based on its distinctive "rsync" algorithm, it performs incremental synchronization.

2, Simple: it is open source and easy to install and configure. For simple synchronization scenarios, only the server needs to be configured; the client ships with the Linux distribution itself.

3, Convenient: powerful options let you preserve symbolic links, hard links, permissions, timestamps, ownership and other metadata during synchronization.

4, Secure: the configuration file supports password authentication, read/write permissions, IP-based access control and other security restrictions.

5, Complete: the "rsync algorithm" ensures the integrity of the synchronized data, and common scenarios such as directory synchronization, compressed transfer and recursive copying are supported.

The rsync daemon listens on port 873 by default. Files can be transferred either via rsync's native transfer protocol (talking to the daemon) or over a remote shell such as ssh.
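To make the two access modes concrete, here is a small Python sketch that only builds (it does not run) an rsync command line for each. The function names, host name, module name and paths are hypothetical placeholders. The rsync:// URL form (equivalently HOST::MODULE) talks to a daemon on port 873, while the single-colon HOST:PATH form goes over a remote shell, ssh here.

```python
import shlex


def daemon_pull(host: str, module: str, dest: str, port: int = 873) -> list[str]:
    """Pull a whole module from an rsync daemon using the rsync:// URL form."""
    # -a: archive mode (same as -rlptgoD); -v: verbose
    return ["rsync", "-av", f"rsync://{host}:{port}/{module}/", dest]


def ssh_pull(host: str, remote_path: str, dest: str, user: str = "") -> list[str]:
    """Pull a directory over a remote shell; note the single colon."""
    remote = f"{user}@{host}:{remote_path}" if user else f"{host}:{remote_path}"
    # -e selects the remote shell program used for the transport
    return ["rsync", "-av", "-e", "ssh", remote, dest]


if __name__ == "__main__":
    # Hypothetical host and paths, for illustration only.
    print(shlex.join(daemon_pull("backup.example.com", "ftp", "/srv/mirror/")))
    print(shlex.join(ssh_pull("backup.example.com", "/home/ftp/", "/srv/mirror/", "admin")))
```

Building an argument list rather than a shell string means it can be handed to subprocess.run() without any quoting problems.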


"Installation and Configuration"

Get the source package:

# wget --no-check-certificate https://download.samba.org/pub/rsync/src/rsync-3.1.3.tar.gz

Installation and Deployment:

# tar zxvf rsync-3.1.3.tar.gz && cd rsync-3.1.3

# ./configure

# make && make install

# touch /etc/rsyncd.conf    # there is no configuration file by default; you need to create it yourself


Relevant files:

Binary: /usr/local/bin/rsync

Configuration file: /etc/rsyncd.conf


Configuration file reference (official documentation): https://download.samba.org/pub/rsync/rsyncd.conf.html

A summary of the official documentation:

Description

The rsyncd.conf file is the runtime configuration file for rsync when it runs as a daemon. It controls authentication, access, logging, and which modules are available.

[File format]

The rsync configuration file consists of modules and parameters. A module begins with the module name in square brackets and continues until the next module begins. Modules contain parameters of the form name = value.

The file is line-based: each newline-terminated line represents a comment, a module name, or a parameter.

Only the first equals sign in a parameter is significant, and whitespace before or after it is discarded. Leading, trailing and internal whitespace in module and parameter names is irrelevant. Leading and trailing whitespace in a parameter value is discarded, while internal whitespace within a value is retained verbatim.

Any line beginning with # is ignored, as are lines containing only whitespace. (If a # appears after anything other than leading whitespace, it is considered part of the line's content.)

Any line ending in \ is continued on the next line in the customary UNIX fashion.

The value following the equals sign is either a string (no quotes needed) or a boolean, which may be given as yes/no, 0/1 or true/false. Case is not significant in boolean values, but is preserved in string values.
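The line-oriented rules above are simple enough to sketch as a small Python parser. This is an illustration, not rsync's own parser: it keeps values as raw strings, does not expand %VAR% references, and assumes that parameter names are compared case-insensitively with all whitespace ignored. The function names are hypothetical.

```python
BOOLEANS = {"yes": True, "true": True, "1": True,
            "no": False, "false": False, "0": False}


def norm_name(name: str) -> str:
    # Assumption: parameter names are compared ignoring whitespace and case.
    return "".join(name.split()).lower()


def parse_rsyncd_conf(text: str) -> dict[str, dict[str, str]]:
    """Return {"global": {...}, "<module>": {...}} with raw string values."""
    sections: dict[str, dict[str, str]] = {"global": {}}
    current, pending = "global", ""
    for raw in text.splitlines():
        line = pending + raw
        if line.endswith("\\"):  # continued on the next line
            pending = line[:-1]
            continue
        pending = ""
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment; a later "#" is kept as content
        if stripped.startswith("[") and stripped.endswith("]"):
            name = " ".join(stripped[1:-1].split())
            current = "global" if name == "global" else name  # [global] is lowercase only
            sections.setdefault(current, {})
            continue
        key, sep, value = stripped.partition("=")  # only the first "=" counts
        if sep:
            # Value ends are trimmed; internal whitespace is kept verbatim.
            sections[current][norm_name(key)] = value.strip()
    return sections


def as_bool(value: str) -> bool:
    """Booleans may be yes/no, true/false or 1/0, in any case."""
    return BOOLEANS[value.strip().lower()]
```

Parameters that appear before the first module header end up in the "global" section, which matches how the global parameters described next are laid out.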


"Parameter description"

[Global parameters]

Parameters that appear before the first [module] header are global parameters. Rsync also allows a [global] module name to mark the start of one or more global-parameter sections (the name must be lowercase).

You may also include any module parameter in the global section of the configuration file, in which case the supplied value overrides the default value of that parameter.

Parameter values can also reference environment variables by enclosing the variable name in percent signs, for example %RSYNC_USER_NAME%. If the variable does not exist, the reference is passed through unchanged. The safest way to insert a literal % into a value is to write %%.
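A minimal Python sketch of this expansion rule follows. The function name is hypothetical, and it assumes variable names consist only of letters, digits and "_". "%%" becomes a literal "%", a known %NAME% is replaced, and anything else, including an unknown variable or an unpaired "%", is passed through unchanged.

```python
import os
import re
from typing import Mapping, Optional

# Either "%%" or "%NAME%" (NAME assumed to be letters, digits and "_").
_REF = re.compile(r"%%|%(\w+)%")


def expand_refs(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    env = os.environ if env is None else env

    def substitute(m: re.Match) -> str:
        if m.group(0) == "%%":
            return "%"  # literal percent sign
        # Unknown variables are left as-is rather than expanded to "",
        # so a path like /home/%MISSING% can never collapse to /home/.
        return env.get(m.group(1), m.group(0))

    return _REF.sub(substitute, value)
```

Leaving unknown references intact is the safer design choice for paths: silently expanding them to an empty string could point a module at an unintended directory.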


MOTD file: Specifies a "message of the day" file whose contents are displayed to clients on each connection. There is no motd file by default. It can be overridden with the --dparam=motdfile=FILE command-line option when starting the daemon.

PID file: Tells the rsync daemon to write its process ID to this file. If the file already exists, the daemon aborts rather than overwriting it. It can be overridden with the --dparam=pidfile=FILE command-line option when starting the daemon.

Port: Overrides the default listening port, 873. It is ignored if the daemon is run by inetd, and is superseded by the --port command-line option.

Address: Overrides the default IP address the daemon listens on. It is ignored if the daemon is run by inetd, and is superseded by the --address command-line option.

Socket options: Lets you tune the system's sockets to the utmost degree. You can set various socket options that may make transfers faster (or slower). See the man page for the setsockopt() system call for details on the options you can set. By default, no special socket options are set.

Listen backlog: Overrides the default backlog value used when the daemon listens for connections. It defaults to 5.

[Module parameters]

Module parameters follow the same basic rules as global parameters; for example, they can also reference environment variables.

A module name cannot contain a slash or a closing square bracket. If the name contains whitespace, each internal run of whitespace is changed into a single space, and leading or trailing whitespace is discarded.
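These naming rules fit in a few lines of Python. The sketch below is hypothetical (the function name is made up). It assumes, as the rsyncd.conf documentation describes, that leading and trailing whitespace is dropped. It also rejects "global", because that exact name marks a global-parameter section rather than a module.

```python
def normalize_module_name(raw: str) -> str:
    """Apply the rsyncd.conf module-name rules to the text inside [ ]."""
    if "/" in raw or "]" in raw:
        raise ValueError(f"invalid module name: {raw!r}")
    # Collapse every internal whitespace run to one space and drop the ends.
    name = " ".join(raw.split())
    if name == "global":
        raise ValueError('"global" introduces global parameters, not a module')
    return name
```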


Comment: Specifies the description string displayed next to the module name when a client requests the list of available modules. Default: none.

Path: Specifies the directory in the daemon's filesystem to make available in this module; it must be specified for each module. You can base the path on an environment variable by enclosing the variable name in percent signs, and can even reference variables that rsync sets when the user connects, for example path = /home/%RSYNC_USER_NAME%. If the final directory name has trailing spaces, append a slash to the path so that they are not lost.

Use chroot: If true, the rsync daemon chroots to the module's path before starting the file transfer with the client. This gives extra protection against possible implementation security holes, but requires superuser privileges. The default is true.

Daemon chroot: Like "use chroot", but applies to the entire daemon, which chroots before it begins communicating with clients. The chroot area may need various OS/lib/etc files installed for the daemon to work. Default: none (no chroot).

Numeric IDs: Enabling this parameter disables the mapping of users and groups by name for this daemon module. Default: none.

Munge symlinks: Tells rsync to modify ("munge") all symbolic links. Default: none.

Charset: Specifies the name of the character set in which the module's filenames are stored. If the client uses the --iconv option, the daemon uses the value of the "charset" parameter regardless of the character set the client actually passes. Default: none.

Max connections: Specifies the maximum number of simultaneous connections allowed. The default, 0, means no limit.

Log file: When set to a non-empty string, the rsync daemon logs messages to the specified file instead of using syslog. If the daemon cannot open the file, it falls back to syslog and logs an error about the failure.

Syslog facility: Specifies the name of the syslog facility to use when logging messages from the rsync daemon.

Syslog tag: Specifies the tag to use when logging messages from the rsync daemon to syslog. The default is "rsyncd".

Max verbosity: Controls the maximum amount of verbose information the daemon will generate. The default is 1, which allows the client to request one level of verbosity; it also limits the user's ability to request higher levels of --info and --debug logging.

Lock file: Specifies the file used to support the "max connections" parameter. The rsync daemon uses record locking on this file to ensure that the max connections limit is not exceeded for the modules sharing the lock file. The default is /var/run/rsyncd.lock.

Read only: Determines whether clients can upload files. If "read only" is true, any attempted upload fails. If it is false, uploads are possible if file permissions on the daemon side allow them. By default, all modules are read-only. "Auth users" can override this setting on a per-user basis.

Write only: Determines whether clients can download files. If "write only" is true, any attempted download fails. If it is false, downloads are possible if file permissions on the daemon side allow them. This parameter is disabled by default.

List: Determines whether this module is listed when a client asks for a list of available modules. By default, modules are listable.

UID: Specifies the user name or user ID that file transfers to and from this module run as when the daemon is running as root. When run by the superuser, the default is to switch to the system's "nobody" user; when run by a non-superuser, the default is not to try to change the user. The RSYNC_USER_NAME environment variable can be used to request that rsync run as the authorizing user, for example uid = %RSYNC_USER_NAME%.

GID: Specifies one or more group names/IDs used when accessing the module. The first one is the default group, and any others are set as supplementary groups. You can also specify "*" as the first gid in the list, which is replaced by all the normal groups of the transfer's user.

Daemon UID: Specifies the uid the daemon runs under. The daemon usually runs as root; when this is not set, the user is left unchanged.

Daemon GID: Specifies the gid the daemon runs under. The daemon usually runs as group root; when this is not set, the group is left unchanged.

Fake super: Setting "fake super = yes" has the same effect on the daemon side as the --fake-super command-line option: it allows full file attributes to be stored without actually running the daemon as root.

Filter: The daemon has its own filter chain that determines which files the client is allowed to access. This chain is not sent to the client. It is built from the "filter", "include from", "include", "exclude from" and "exclude" parameters, in that order of priority.

Exclude: Takes a space-separated list of daemon exclude patterns, similar to the client's --exclude option.

Include: Use "include" to override the effects of the "exclude" parameter.

Exclude from / Include from: Specifies the name of a file on the daemon that contains daemon exclude/include patterns, one per line.

Incoming chmod: Specifies a comma-separated set of chmod strings that affect the permissions of all incoming files (files the daemon receives). These changes happen after all other permission calculations, and they even override destination-default and/or existing permissions when the client does not specify --perms.

Outgoing chmod: Specifies a comma-separated set of chmod strings that affect the permissions of all outgoing files (files the daemon sends). These changes happen first, making the sent permissions appear different from those stored in the filesystem itself.

Auth users: Specifies a comma- and/or space-separated list of authorization rules. In its simplest form, it lists the usernames allowed to connect to this module. The usernames do not need to exist on the local system, and the rules may contain shell wildcard characters. A leading "@" (to match a group) and a ":" suffix (followed by an access level such as deny, ro or rw) are also supported, and the rules are checked in order. Refer to the documentation for details.
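The first-match-wins evaluation of "auth users" rules can be sketched in Python as follows. The sketch assumes the rule syntax described in the rsyncd.conf documentation: an optional ":deny", ":ro" or ":rw" suffix, "@" for a group match, and shell wildcards. The function name and the way group membership is passed in are hypothetical, and password checking is left out entirely.

```python
from fnmatch import fnmatchcase
from typing import Optional


def module_access(user: str, groups: set[str], auth_users: str,
                  read_only: bool = True) -> Optional[str]:
    """Return "ro", "rw", or None (denied) for an authenticated user."""
    for item in auth_users.replace(",", " ").split():
        pattern, _, level = item.partition(":")
        if pattern.startswith("@"):
            # Group rule: match against any group the user belongs to.
            matched = any(fnmatchcase(g, pattern[1:]) for g in groups)
        else:
            matched = fnmatchcase(user, pattern)  # shell-style wildcards
        if not matched:
            continue
        # The first matching rule decides the outcome.
        if level == "deny":
            return None
        if level in ("ro", "rw"):
            return level
        return "ro" if read_only else "rw"  # no suffix: module's own setting
    return None  # not listed at all
```

For example, with the rule list "joe:deny @guest:deny admin:rw @rsync:ro susan joe sam", user joe is always denied, because his own deny rule comes before the plain "joe" entry. An "admin" who is also in group "guest" is denied, because the group rule is reached first.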

Secrets file: Specifies the name of a file containing the username:password pairs used to authenticate this module. It is only consulted if "auth users" is set. The file is line-based, with one name:password pair per line. Passwords can contain any characters, but be warned that many operating systems limit the length of the password that can be typed at the client end.

Strict modes: Determines whether the permissions on the secrets file are checked. If true, the secrets file must not be readable by any user ID other than the one the rsync daemon runs under. If false, the check is not performed. The default is true.
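A minimal Python sketch of reading a secrets file with a "strict modes"-style check follows. It assumes a POSIX system and treats lines starting with # as comments. It approximates the permission check as "no permission bits at all for other users"; the daemon's exact test may differ between versions. The function name is hypothetical.

```python
import os
import stat


def load_secrets(path: str, strict_modes: bool = True) -> dict[str, str]:
    """Parse name:password lines, optionally refusing other-accessible files."""
    if strict_modes:
        mode = os.stat(path).st_mode
        # Approximation of the daemon's check: no access for "other" users.
        if mode & stat.S_IRWXO:
            raise PermissionError(f"{path}: secrets file must not be accessible by other users")
    secrets = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line or line.startswith("#"):
                continue  # blank line or comment
            # Split on the first colon only, so a password may contain ":".
            name, sep, password = line.partition(":")
            if sep:
                secrets[name] = password
    return secrets
```

Given the /etc/rsyncd.secrets example below, load_secrets would return {"tridge": "mypass", "susan": "herpass"}, provided the file is mode 600 or similar.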

Hosts allow: Specifies a comma- and/or space-separated list of patterns matched against the connecting client's hostname and IP address. If none of the patterns match, the connection is rejected.

Hosts deny: Specifies a comma- and/or space-separated list of patterns matched against the connecting client's hostname and IP address. If a pattern matches, the connection is rejected. If both "hosts allow" and "hosts deny" are set, "hosts allow" is checked first and a match lets the client connect; "hosts deny" is checked next and a match rejects it; a host matching neither is allowed.
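The allow/deny decision can be sketched in Python as follows. According to the rsyncd.conf documentation, when both lists are set, "hosts allow" is checked first (a match admits the client) and "hosts deny" second (a match rejects it), and a host matching neither list is admitted. This is a simplified model with hypothetical function names. It supports plain IP addresses, address/mask networks (via the standard ipaddress module) and hostname wildcards (via fnmatch). It takes an already-resolved hostname instead of doing any DNS lookups.

```python
import ipaddress
from fnmatch import fnmatchcase
from typing import Optional


def _pattern_matches(pattern: str, ip: str, hostname: Optional[str]) -> bool:
    try:
        # Handles "10.0.0.5", "192.168.1.0/24" and "192.168.0.0/255.255.0.0".
        return ipaddress.ip_address(ip) in ipaddress.ip_network(pattern, strict=False)
    except ValueError:
        pass  # not an address pattern: treat it as a hostname wildcard
    return hostname is not None and fnmatchcase(hostname.lower(), pattern.lower())


def host_allowed(ip: str, hostname: Optional[str], allow: str = "", deny: str = "") -> bool:
    allow_list = allow.replace(",", " ").split()
    deny_list = deny.replace(",", " ").split()
    if any(_pattern_matches(p, ip, hostname) for p in allow_list):
        return True   # an allow match wins
    if any(_pattern_matches(p, ip, hostname) for p in deny_list):
        return False  # then a deny match rejects
    # Neither list matched: reject only when an allow list is used on its own.
    return not (allow_list and not deny_list)
```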

Reverse lookup: Enabled by default. Controls whether the daemon performs a reverse lookup of the client's IP address to determine its hostname.

Forward lookup: Enabled by default. Controls whether the daemon performs a forward lookup of any hostname given in a hosts allow/deny setting. This allows the use of an explicit hostname that reverse DNS of the connecting IP would not return.

Ignore errors: Tells rsync to ignore I/O errors on the daemon when deciding whether to run the delete phase of the transfer. Normally, rsync skips the --delete step if any I/O error has occurred, to prevent disastrous deletion caused by a temporary resource shortage or other I/O error.

Ignore nonreadable: Tells the rsync daemon to completely ignore files that are not readable by the user.

Transfer logging: Enables per-file logging of downloads and uploads. The transfer is always logged at the end, so if a transfer is aborted, it is not mentioned in the log file.

Log format: Specifies the format used for logging file transfers when transfer logging is enabled. The format is a text string containing embedded single-character escape sequences prefixed with a percent (%) character. The default format is "%o %h [%a] %m (%u) %f %l". The escapes are:

  • %a the remote IP address (only available for a daemon)

  • %b the number of bytes actually transferred

  • %B the permission bits of the file (e.g. rwxrwxrwt)

  • %c the total size of the block checksums received for the basis file (only when sending)

  • %C the full-file checksum, if it is known for the file. For older rsync protocols/versions, the checksum was salted, so it is not a useful value (and is not displayed in that case). For the checksum of a file to be output, either the --checksum option must be in effect or the file must have been transferred without a salted checksum. See the --checksum-choice option for a way to choose the algorithm.

  • %f the filename (long form on the sender; no trailing "/")

  • %G the gid of the file (decimal) or "DEFAULT"

  • %h the remote host name (only available for a daemon)

  • %i an itemized list of what is being updated

  • %l the length of the file in bytes

  • %L the string "-> SYMLINK", "=> HARDLINK", or "" (where SYMLINK or HARDLINK is a filename)

  • %m the module name

  • %M the last-modified time of the file

  • %n the filename (short form; trailing "/" on a directory)

  • %o the operation, which is "send", "recv", or "del." (the latter includes the trailing period)

  • %p the process ID of this rsync session

  • %P the module path

  • %t the current date and time

  • %u the authenticated username, or an empty string

  • %U the uid of the file (decimal)
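As a rough illustration of how these escapes are substituted, here is a Python sketch that expands single-letter %X escapes from a dictionary of per-transfer values. The function name and the sample field values are hypothetical. The sketch handles only plain single-letter escapes and leaves unknown ones untouched.

```python
import re

DEFAULT_LOG_FORMAT = "%o %h [%a] %m (%u) %f %l"


def format_log_line(fmt: str, fields: dict) -> str:
    """Replace each %X escape with fields["X"]; unknown escapes stay as-is."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)  # escapes are case-sensitive: %l differs from %L
        return str(fields[key]) if key in fields else match.group(0)

    return re.sub(r"%([A-Za-z])", substitute, fmt)


if __name__ == "__main__":
    # Hypothetical values for one transfer.
    sample = {"o": "send", "h": "client.example.com", "a": "192.0.2.10",
              "m": "ftp", "u": "", "f": "pub/README", "l": 1234}
    print(format_log_line(DEFAULT_LOG_FORMAT, sample))
```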

Timeout: Lets you override the client's choice of I/O timeout for this module, ensuring that rsync won't wait on a dead client forever. The timeout is in seconds; the default, 0, means no timeout. A value of 600 is recommended for anonymous rsync daemons.

Refuse options: Specifies a space-separated list of rsync command-line options that the daemon will refuse.

Dont compress: Specifies wildcard patterns for filenames that should not be compressed during transfer.


Example configuration file:

A simple rsyncd.conf file that allows anonymous rsync to an ftp area at /home/ftp would be:

[ftp]
        path = /home/ftp
        comment = ftp export area

A more sophisticated example would be:

uid = nobody
gid = nobody
use chroot = yes
syslog facility = local5
pid file = /var/run/rsyncd.pid

[ftp]
        path = /var/ftp/./pub
        comment = whole ftp area (approx 6.1 GB)

[sambaftp]
        path = /var/ftp/./pub/samba
        comment = samba ftp area (approx 300 MB)

[rsyncftp]
        path = /var/ftp/./pub/rsync
        comment = rsync ftp area (approx 6 MB)

[sambawww]
        path = /public_html/samba
        comment = samba www pages (approx 240 MB)

[cvs]
        path = /data/cvs
        comment = CVS repository (requires authentication)
        auth users = tridge, susan
        secrets file = /etc/rsyncd.secrets

The /etc/rsyncd.secrets file would look something like this:

tridge:mypass
susan:herpass


"Rsync algorithm parsing"

1, The algorithm was invented by Andrew Tridgell, who is also the original author of rzip and KnightCap.

2, For the overall algorithm, see the detailed write-up by Chen Hao on CoolShell (linked in the references below).

3, Based on that write-up, the algorithm can be summarized as follows:

A, Block checksum algorithm: the target file is divided into fixed-size blocks, and two checksums are computed for each block: a weak one, which is cheap and is used to quickly rule out blocks that differ, and a strong one, which is used to confirm that two blocks are identical.

B, Transfer algorithm: the synchronization target sends its list of block checksums <strong, weak, block number> to the synchronization source.

C, Checksum lookup algorithm: the synchronization source stores the target file's checksum list in a hash table keyed by the weak checksum.

D, Matching algorithm:

1, Take the first block-sized window of the source file, compute its weak checksum, and look it up in the hash table.

2, If it is found, the target file may contain an identical block. Compute the strong checksum of the window and compare. If both the weak and strong checksums match, record the matching target block number, then continue with the window that starts right after the matched block.

3, If there is no match (or the strong checksums differ), slide the window forward by one byte and compute the weak checksum of the window starting at the next byte. Because the weak checksum is a rolling checksum, this update is cheap.

4, Proceeding this way, the source bytes lying between two adjacent matches are the content that actually needs to be sent to synchronize the file.
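Steps A to D can be combined into a short Python sketch. It is a simplification under stated assumptions, not rsync's actual implementation:

- The weak checksum is an Adler-32-style pair of 16-bit running sums, in the spirit of rsync's but not guaranteed to be bit-identical to it.
- MD5 stands in for the strong checksum; which hash rsync itself uses depends on its version.
- The "copy"/"data" delta format is made up for illustration.
- A short final block of the target is never matched.

All function names are hypothetical.

```python
import hashlib

M = 1 << 16  # both weak-checksum sums are kept modulo 2**16


def weak(block: bytes) -> tuple[int, int]:
    """A: weak checksum as two running sums (a, b)."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M  # weights n..1
    return a, b


def roll(ab: tuple[int, int], out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """D3: slide an n-byte window one byte to the right in O(1)."""
    a, b = ab
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def strong(block: bytes) -> bytes:
    """A: strong checksum (MD5 as a stand-in)."""
    return hashlib.md5(block).digest()


def signature_table(target: bytes, bs: int) -> dict:
    """B + C: the target's (block number, strong) entries, hashed on weak."""
    table: dict = {}
    for idx, start in enumerate(range(0, len(target), bs)):
        block = target[start:start + bs]
        table.setdefault(weak(block), []).append((idx, strong(block)))
    return table


def make_delta(source: bytes, table: dict, bs: int) -> list:
    """D1-D4: emit ("copy", block number) for matches, ("data", bytes) otherwise."""
    delta, literal, k = [], bytearray(), 0
    ab = weak(source[:bs]) if len(source) >= bs else None
    while k + bs <= len(source):
        window = source[k:k + bs]
        match = None
        candidates = table.get(ab)  # D1: cheap weak lookup
        if candidates:
            s = strong(window)  # D2: confirm with the strong checksum
            match = next((idx for idx, c in candidates if c == s), None)
        if match is not None:
            if literal:
                delta.append(("data", bytes(literal)))
                literal = bytearray()
            delta.append(("copy", match))
            k += bs  # continue right after the matched block
            if k + bs <= len(source):
                ab = weak(source[k:k + bs])
        else:
            literal.append(source[k])  # D3: no match, slide by one byte
            if k + bs < len(source):
                ab = roll(ab, source[k], source[k + bs], bs)
            k += 1
    literal += source[k:]  # D4: trailing bytes are sent as data
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(target: bytes, delta: list, bs: int) -> bytes:
    """Target side: rebuild the new file from its old copy plus the delta."""
    out = bytearray()
    for kind, value in delta:
        out += target[value * bs:(value + 1) * bs] if kind == "copy" else value
    return bytes(out)


if __name__ == "__main__":
    old = b"The quick brown fox jumps over the lazy dog. " * 4
    new = old[:50] + b"NEW TEXT " + old[50:]
    bs = 16
    delta = make_delta(new, signature_table(old, bs), bs)
    assert apply_delta(old, delta, bs) == new
    copies = sum(1 for kind, _ in delta if kind == "copy")
    print(f"{copies} blocks reused, {len(delta) - copies} literal runs")
```

Because roll() updates the weak checksum in constant time, sliding one byte at a time (step 3) stays cheap. The strong checksum is only computed for windows whose weak checksum already hits the table.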


"Common Reference"

None yet.


"References"

Wikipedia: https://zh.wikipedia.org/wiki/Rsync

Rsync official website: https://www.samba.org/ftp/rsync/rsync.html

Rsync configuration file documentation: https://download.samba.org/pub/rsync/rsyncd.conf.html

CoolShell's analysis of the algorithm: https://coolshell.cn/articles/7425.html

Andrew Tridgell on Wikipedia: https://en.wikipedia.org/wiki/Andrew_Tridgell

