Rsync incremental transfer of large files: optimization tip

Problem
Rsync is very useful for synchronizing data, especially incrementally, but there is one situation where the default behavior falls short. Suppose you are syncing several files of dozens of gigabytes each and the network suddenly drops. You restart the incremental synchronization, only to find that rsync spends a long time without transferring any data, while the machine's disk IO stays high.
Reason
I have not studied rsync's incremental synchronization algorithm in detail, but judging from this behavior, when it resumes a file that already partially exists at the destination, it first verifies that the already transferred portion is consistent with the source file, and only after that verification completes does it continue transferring the remaining data. For a very large file, this verification is very time consuming and takes up a lot of IO resources.
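For example, resuming an interrupted transfer with the default options looks something like the sketch below (the paths and host are hypothetical). If the explanation above is right, the high IO comes from rsync reading the existing partial file on the receiving side to build block checksums before any new data is sent.

# Resume an interrupted copy; -P is shorthand for --partial --progress,
# so the partially transferred file is kept and reused on resume.
rsync -avP /backup/db-dump.tar.gz user@remote:/backup/
# While resuming, the receiving side reads through the existing partial
# file to compute block checksums for the delta-transfer algorithm,
# which would explain heavy disk IO while no data appears to move.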
Method
After spending an hour in the middle of the night reading the rsync documentation, I found a parameter that lets the incremental sync of a large file resume quickly: --append. With --append, rsync only compares file sizes during the incremental synchronization and appends the new data directly to the end of the destination file, skipping the IO-heavy verification pass. However, this parameter is only safe to use when the already transferred portion of the source and destination files does not change, for example with backup files.
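A minimal sketch of the resumed transfer with --append (again, the paths and host are hypothetical):

# Append-only resume: rsync compares file sizes and sends only the data
# past the current end of the destination file, skipping the checksum pass.
# Safe only if the data already on the destination is identical to the source.
rsync -av --append /backup/db-dump.tar.gz user@remote:/backup/

Newer rsync releases (3.0.0 and later) also provide --append-verify, which behaves like --append but includes the pre-existing data in the whole-file checksum verification after the transfer, resending the file if that check fails; this is a reasonable middle ground when you want a safety check without the up-front IO cost.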
This article is from the Linux operations and maintenance log; please credit the source and include the relevant links when reproducing it.
Permanent link to this article: https://www.centos.bz/2015/10/rsync-transfer-big-file-optimizer-trick