Efficient Data Synchronization Method and Efficiency Test (2015-01-05): Packing, Compressing, Transmitting, and Decompressing at the Same Time
Backing up or synchronizing a large directory that contains many files (for example, a database or log directory of several GB or dozens of GB) can take a long time. Compressing first, then transmitting, then decompressing does reduce the amount of data sent, but compression and decompression themselves take a lot of time, so the overall result is not satisfactory. Last night I had an idea based on earlier work on streaming media and video-on-demand projects: watching an HD video online does not require downloading the whole file first, only the metadata header, after which playback proceeds while the download continues. This is progressive download (http://baike.baidu.com/link?url=fTWQYBTqQr1BisysCAkoqIytbwotfBYvFEMxEAlspRbNmE6b5lwVLNzA-qgw6yGlFgBepYBzqvUEb2tqQaehBK). If files could be packed, compressed, transmitted, and decompressed at the same time in the same way, that would be ideal. A search today showed that Linux can indeed do this, so I ran a comparison test:
Conclusion:
(1) In general, for text files, compressed transmission is more efficient than uncompressed transmission, but the gain is small, because the bottleneck is compression rather than the network (compare tests 1, 2, 3, and 4 below);
(2) Streaming the data, that is, packing, compressing, transmitting, and decompressing at the same time, improves transfer efficiency by about 35% compared with direct scp/rsync;
(3) Of the two streaming transports, ssh and nc, nc is slightly faster because it needs no user authentication and does not encrypt the data, but the difference is small (again, the bottleneck is compression, not network transmission);
(4) ssh is preferred in actual use: it supports both push and pull with a single command, and the same source can serve several transfers concurrently. With nc, the receiver must first listen on a port and the sender then starts the transfer, so two commands have to be run separately. One concern was that a third party sending data to the receiver's listening port at the same time might damage data integrity. In the actual test, however, the nc receiver accepted a connection from only one sender; while a transfer was in progress, data sent to the listening port by a third party was not transferred. Another transfer is possible only on a new listening port, or after the current transfer has finished and the port is opened again. In short, I still lean toward ssh.
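The integrity concern in point (4) can be checked directly after any of the transfers. The following is a minimal sketch, assuming `sha256sum` is installed on both hosts; `dir_checksum` is a hypothetical helper name and was not part of the tests.

```shell
# dir_checksum DIR: print one checksum covering every regular file under DIR.
# Hypothetical helper; assumes sha256sum exists and that file names
# contain no newline characters.
dir_checksum() {
    (
        cd "$1" || exit 1
        # Hash each file, sort by path so the order is stable,
        # then hash the resulting list into a single value.
        find . -type f -exec sha256sum {} + \
            | LC_ALL=C sort -k 2 \
            | sha256sum \
            | cut -d ' ' -f 1
    )
}
```

Running `dir_checksum` on the source directory on the sender and on the extracted copy on the receiver should print the same string. For the tar-based tests the copy ends up under /root/test-dir/var/log, since tar strips the leading '/'. Log files that are still being written during the transfer will naturally differ.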
Test environment: CentOS 5.5, Gigabit LAN
Test directory: /var/log, size 8.9 GB
[root@cap131 ~]# du -h /var/log/
28K     /var/log/prelink
8.0K    /var/log/conman.old
8.0K    /var/log/vbox
24K     /var/log/cups
50M     /var/log/redis
76K     /var/log/nginx
6.1M    /var/log/sa
8.0K    /var/log/conman
8.0K    /var/log/ppp
18M     /var/log/audit
152K    /var/log/php-fpm
8.8G    /var/log/rabbitmq
12K     /var/log/pm
16K     /var/log/mail
8.9G    /var/log/
[root@cap131 ~]#
1. Direct scp copy (5'20''):
[root@cap131 ~]# time scp -r /var/log/ 192.168.1.130:/root/test-dir/
real    5m20.834s
user    3m29.049s
sys     0m41.038s
2. Pack and compress, transfer, then decompress (3'33'' + 14'' + 1'19'' = 5'06''):
Compression time only:
[root@cap131 ~]# time tar czf varlog.tar.gz /var/log
tar: Removing leading '/' from member names
real    3m33.740s
user    3m28.068s
sys     0m19.081s
Size after compression:
[root@cap130 test-dir]# du -h ../varlog.tar.gz
399M    ../varlog.tar.gz
Transfer time of the compressed package only:
[root@cap131 ~]# time scp varlog.tar.gz 192.168.1.130:~
root@192.168.1.130's password:
varlog.tar.gz 100% 399MB 30.7MB/s
real    0m14.024s
user    0m9.510s
sys     0m1.283s
Extraction time:
[root@cap131 ~]# time tar xzf varlog.tar.gz
real    1m19.916s
user    0m49.498s
sys     0m35.588s
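Test 2 shows that compression accounts for most of the elapsed time (3'33'' out of roughly 5'06''). `tar czf` uses gzip at its default level (6), so one thing that could be tried is a lower level with an explicit `gzip` in the pipeline. The sketch below only reports the packed size for a given level; `packed_size` is a hypothetical helper, and no such measurement was made in these tests.

```shell
# packed_size LEVEL DIR: number of bytes produced by tar + gzip at LEVEL
# (1 = fastest, 9 = smallest). Hypothetical helper for comparison only.
packed_size() {
    tar cf - -C "$2" . | gzip "-$1" | wc -c | tr -d ' '
}
```

For example, `time packed_size 1 /var/log` and `time packed_size 9 /var/log` would show how much time and size each level costs on the same data.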
3. Direct rsync without compression (5'12''):
[root@cap131 ~]# rsync -r /var/log/ 192.168.1.130:/root/test-dir
rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(260) [sender=2.6.8]
[root@cap131 ~]# time rsync -r /var/log/ 192.168.1.130:/root/test-dir
root@192.168.1.130's password:
real    5m12.625s
user    3m55.503s
sys     0m34.568s
4. rsync with compression enabled (4'36''):
[root@cap131 ~]# time rsync -zr /var/log/ 192.168.1.130:/root/test-dir
real    4m35.991s
user    4m40.208s
sys     0m5.306s
5. Pack, compress, transmit, and decompress at the same time (remote command execution via ssh, push):
[root@cap131 ~]# time tar czf - /var/log | ssh 192.168.1.130 tar xzf - -C /root/test-dir/
tar: Removing leading '/' from member names
real    3m33.711s
user    3m37.066s
sys     0m22.210s
Pack, compress, transmit, and decompress at the same time (remote command execution via ssh, pull):
[root@cap130 test-dir]# time ssh 192.168.1.131 tar czf - /var/log | tar xzf - -C /root/test-dir/
tar: Removing leading '/' from member names
real    3m33.772s
user    1m13.207s
sys     0m55.302s
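The push and pull pipelines in test 5 differ only in which side of the pipe runs remotely. The sketch below wraps both in small functions; `push_dir` and `pull_dir` are hypothetical names. Unlike the commands above, the sketch packs with `-C SRC .`, so the contents of SRC land directly under DEST instead of under DEST/var/log. It also assumes that paths contain no spaces, because ssh passes the remote command through the remote shell.

```shell
# push_dir SRC DEST TRANSPORT...: pack SRC locally, unpack under DEST
# through TRANSPORT (a command prefix such as: ssh 192.168.1.130).
push_dir() {
    src=$1
    dest=$2
    shift 2
    tar czf - -C "$src" . | "$@" tar xzf - -C "$dest"
}

# pull_dir SRC DEST TRANSPORT...: pack SRC through TRANSPORT,
# unpack under DEST locally.
pull_dir() {
    src=$1
    dest=$2
    shift 2
    "$@" tar czf - -C "$src" . | tar xzf - -C "$dest"
}
```

On the sender, `push_dir /var/log /root/test-dir ssh 192.168.1.130` would correspond to the push case; on the receiver, `pull_dir /var/log /root/test-dir ssh 192.168.1.131` would correspond to the pull case. The exit status of a pipeline is that of its last command, so a failure of the packing side is not reported by these functions.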
6. Pack, compress, transmit, and decompress at the same time (nc, push):
The receiver listens on port 10086:
[root@cap130 test-dir]# nc -l 10086 | tar xzf - -C /root/test-dir/
The sender starts the transfer:
[root@cap131 ~]# time tar czf - /var/log | nc 192.168.1.130 10086
tar: Removing leading '/' from member names
real    3m31.218s
user    3m27.908s
sys     0m15.839s
Pack, compress, transmit, and decompress at the same time (nc, pull):
This method does not seem to work!
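To put the measured `real` times side by side, the saving of the streaming method relative to a baseline is (baseline - streaming) / baseline. The sketch below does this arithmetic with awk on the values reported by `time` for tests 1, 3, and 5; `to_seconds` and `saving_pct` are hypothetical helper names. Computed from the exact timings, the result comes out a little below the rounded 35% quoted in the conclusion.

```shell
# to_seconds 5m20.834s: convert a `time`-style value into seconds.
to_seconds() {
    echo "$1" | LC_ALL=C awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# saving_pct BASELINE NEW: percentage of elapsed time saved by NEW.
saving_pct() {
    base=$(to_seconds "$1")
    new=$(to_seconds "$2")
    LC_ALL=C awk -v b="$base" -v n="$new" 'BEGIN { printf "%.1f\n", (b - n) / b * 100 }'
}

# Streaming over ssh (test 5, push) against scp (test 1) and rsync (test 3).
echo "saving vs scp:   $(saving_pct 5m20.834s 3m33.711s)%"
echo "saving vs rsync: $(saving_pct 5m12.625s 3m33.711s)%"
```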
EOF