Efficient large file copying

Source: Internet
Author: User

When you want to copy large files between two machines, combining nc (netcat) and pigz (parallel gzip) is a simple and efficient choice. But what if you want to distribute the files to multiple machines at the same time? At Tumblr this is a fairly common requirement, for example when we want to quickly bring up several MySQL slave servers at once.

You could copy the data from the source machine to each target machine in turn, but the total time then grows with the number of targets. Alternatively, you could copy from the source machine to all target machines simultaneously, but the source machine's outbound bandwidth is then shared among all the streams, so this is not necessarily faster.

Fortunately, you can do better with a couple of standard UNIX tools. Combining tee with a FIFO (named pipe) produces a fast distribution chain: each machine in the chain stores the files locally and, at the same time, forwards them to the next link.

First, choose one target machine as the last link of the chain. On this machine, you only need nc listening on a port (1234 here), piping the received data to pigz for decompression and from there to tar for extraction:

    nc -l 1234 | pigz -d | tar xvf -

Next, work backward from the end of the chain and set up the other target machines. Each of them also listens, decompresses, and extracts, but before decompressing, it uses tee to copy the incoming data into a named pipe (FIFO). A second pipeline reads from the FIFO and, at the same time, forwards the still-compressed data to the next link in the chain:

    mkfifo myfifo
    nc hostname_of_next_box 1234 < myfifo &
    nc -l 1234 | tee myfifo | pigz -d | tar xvf -
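To see the tee + FIFO idea end to end without several machines, here is a minimal single-machine sketch. It is not part of the original article and rests on a few assumptions: the FIFOs `link1` and `link2` stand in for the nc network connections, the directories `box1` and `box2` stand in for target machines, and gzip stands in for pigz (both read and write the gzip format, so pigz can be swapped in). The background `cat box1/myfifo > link2` plays the role of `nc hostname_of_next_box 1234 < myfifo &`.

```shell
#!/bin/sh
# Single-machine simulation of the tee + FIFO distribution chain (sketch).
# Assumptions: FIFOs replace the nc connections; gzip replaces pigz.
set -eu

z=gzip                       # replace with pigz if it is installed
work=$(mktemp -d)
cd "$work"

mkdir src box1 box2          # source data plus two simulated target machines
echo "hello chain" > src/data.txt

mkfifo link1 link2           # link1: source -> box1, link2: box1 -> box2
mkfifo box1/myfifo           # box1's named pipe, as in the article

# Last box (box2): decompress and extract whatever arrives on its link.
(cd box2 && $z -d < ../link2 | tar xf -) &

# Middle box (box1), forwarding side: copy the FIFO's contents to the next link,
# standing in for "nc hostname_of_next_box 1234 < myfifo &".
(cat box1/myfifo > link2) &

# Middle box (box1), receiving side: tee copies the still-compressed stream
# into myfifo, then the stream is decompressed and extracted locally.
(cd box1 && tee myfifo < ../link1 | $z -d | tar xf -) &

# Source: archive, compress, and send into the first link.
tar cf - -C src . | $z > link1

wait                         # let every simulated box finish
cat box1/data.txt box2/data.txt
```

Both directories end up with their own copy of `data.txt`, while the source wrote the compressed stream only once.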

Finally, on the source machine, start the transfer by sending the data into the first link of the chain:

    tar cvf - some_files | pigz | nc hostname_of_first_box 1234
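Start order matters: on an intermediate machine, tee opens the FIFO as soon as its pipeline starts, which lets the forwarding nc connect immediately, so the next machine should already be listening. The hypothetical helper below (not from the original article; the function name and host names are illustrative) only prints the command each machine should run, in the order they should be started, for a chain listed from first to last. It assumes the OpenBSD-style `nc -l PORT` syntax used above; traditional netcat needs `nc -l -p PORT` instead.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print each machine's command in start order.
# Usage: chain_commands PORT FILES HOST1 HOST2 ... HOSTN  (HOST1 receives from the source)
chain_commands() {
  port=$1
  files=$2
  shift 2

  # Reverse the host list so the last box in the chain is printed (and started) first.
  rev=""
  for h in "$@"; do
    rev="$h $rev"
  done

  next=""
  for host in $rev; do
    if [ -z "$next" ]; then
      # End of the chain: receive, decompress, extract.
      echo "$host: nc -l $port | pigz -d | tar xvf -"
    else
      # Intermediate box: forward the compressed stream to $next while extracting.
      echo "$host: mkfifo myfifo; nc $next $port < myfifo & nc -l $port | tee myfifo | pigz -d | tar xvf -"
    fi
    next=$host
  done

  # After the loop, $next holds the first box, which the source sends to.
  echo "source: tar cvf - $files | pigz | nc $next $port"
}

chain_commands 1234 some_files boxA boxB boxC
```

For the three example hosts, `boxC` is listed first and the source command, pointing at `boxA`, comes last.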

In my tests, each additional machine in the chain cost about 3%-10% of throughput compared with a one-to-one copy. That is still a significant improvement over copying to the machines one at a time or sending to all of them simultaneously from the source.
