tar + lz4/pigz + ssh: faster data transfer

Source: Internet
Author: User
Tags ssh advantage

1. Conclusion

The maximum transfer performance is achieved with the tar + lz4 + ssh method:

time tar -c sendlog/ | pv | lz4 -B4 | ssh -c arcfour128 \
    -o "MACs umac-64@openssh.com" 10.xxx.xxx.36 "lz4 -d | tar -xC /u01/backup_supu"
3.91GiB 0:00:16 [ 249MiB/s]

real 0m16.067s
user 0m15.553s
sys 0m16.821s

That is 249 MB/s, comfortably done. It is about 6 times the speed of the original scp (40 MB/s): a 400 GB transfer that used to take about 3 hours now takes only 27 minutes.
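
The time estimate above is simple arithmetic on data size and throughput. A minimal sketch, assuming 1 GB = 1024 MB; the helper name transfer_minutes is hypothetical and not part of the original article:

```shell
#!/bin/sh
# transfer_minutes SIZE_GB SPEED_MB_PER_S
# Hypothetical helper: minutes needed to move SIZE_GB at a sustained speed.
# Assumption: 1 GB = 1024 MB.
transfer_minutes() {
    awk -v gb="$1" -v speed="$2" 'BEGIN { printf "%.1f\n", gb * 1024 / speed / 60 }'
}

echo "scp at 40 MB/s:          $(transfer_minutes 400 40) minutes"
echo "tar+lz4+ssh at 249 MB/s: $(transfer_minutes 400 249) minutes"
```

With the figures from the text this gives about 170.7 minutes (just under 3 hours) for scp and about 27.4 minutes for the lz4 pipeline.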

Note 1: LZ4's outstanding decompression performance is what makes it so important in this case. If the data does not need to be decompressed after transfer, consider pigz/pbzip2 instead.

Note 2: Observed with pv, network traffic is about 80 MB/s, so replacing ssh with nc would not bring a significant performance improvement.

Note 3: Compressing with lz4 -B4 (64 KB block size) and decompressing with -B7 (4 MB block size) gave the best results in the tests for this case.

2. About LZ4
LZ4 is a compression algorithm that just about everyone likes. It scales well across multiple cores. Its compression speed and compression ratio hold no great advantage over pigz, but its decompression speed is striking: in this case, LZ4 decompresses about 3 times faster than gunzip (more comparison tests). Because of its efficient multi-core compression, coupled with its very fast decompression, LZ4 is already used in a number of important places: the Linux 3.11 kernel implements LZ4 and can use it to compress and decompress the kernel image; HBase added an LZ4 option to HFile; and so on (reference).

LZ4 is ideal for scenarios that require frequent compression and instant decompression.

3. Performance Environment Description

Here we use the same two hosts as in the previous article: ping reports an RTT of 17 ms, and iperf measures a bandwidth of about 115 MB/s (see the Appendix).

The process consists of several stages: disk read --> packing (tar) --> compression --> transfer --> decompression --> unpacking --> disk write. The speed of each stage is tested below.

3.1 Disk read and write
Disk reads (with page cache) reach about 3 GB/s; disk writes reach about 428 MB/s:

# dd if=./sendlog.tar of=/dev/null bs=4096 count=1048576
1024002+1 records in
1024002+1 records out
4194314240 bytes (4.2 GB) copied, 1.33946 s, 3.1 GB/s

# dd if=/dev/zero of=./x.zero.file bs=4096 count=1048576
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 10.0306 s, 428 MB/s

3.2 Packing and unpacking
Both packing and unpacking run faster than 350 MB/s:

# time tar -cf sendlog.tar ./sendlog/
real 0m10.996s
# time tar -xf sendlog.tar
real 0m11.564s

3.3 Compression and decompression
Many people have already compared the performance of the various compression tools (compression speed, decompression speed, compression ratio), so this article does not discuss it in detail. Here gzip/pigz, lz4, and bzip2 are selected for the comparison:

           | Input speed | Output speed | Ratio  | Decompression speed
pigz -p    | 327.0 MB/s  | 57.2 MB/s    | 17.5%  | 95 MB/s
lz4        | 288.0 MB/s  | 79.2 MB/s    | 27.5%  | 264 MB/s
bzip2      |   4.9 MB/s  | 0.65 MB/s    | 13.1%  | 25.6 MB/s

Compression tool comparison test reference: Gzip vs Bzip2 vs LZMA vs XZ vs LZ4 vs LZO

As you can see, lz4 is slightly inferior to pigz in compression ratio, but it has a striking advantage in decompression speed.

3.4 Transmission

The previous article covered scp, whose fastest transfer speed was about 90 MB/s.

3.5 Overall flow

Disk read --> Packing --> Compression --> Transfer --> Decompression --> Unpacking --> Disk write
              |->tar      |->gzip         |->ssh       |->gzip           |->tar
                          |->bzip2        |->http      |->bzip2
                          |->...          |->nc        |->
                          |->lz4                       |->lz4
>400MB/s      >350MB/s    79MB/s          90MB/s       72MB/s            >350MB/s

Here you can see that decompression is the biggest bottleneck. Choosing the compression tool with the fastest decompression lets the transfer reach its maximum speed, and LZ4 has a huge advantage in decompression efficiency.

According to the LZ4 test above, the theoretical transfer speed is 264 MB/s of uncompressed data (at which point the network carries 264 × 27.3% ≈ 72 MB/s). This is also the theoretical upper limit for this test.
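
The reasoning behind this upper limit can be sketched as "the slowest stage sets the pace", once every stage is expressed in MB/s of uncompressed data. The script below is only an illustration under assumptions: the stage names are made up, the speeds are the figures quoted in section 3, and the 90 MB/s wire speed is divided by the 27.5% ratio from the table to convert it to uncompressed terms.

```shell
#!/bin/sh
# Find the slowest stage of the pipeline.
# Input lines: "<stage-name> <MB/s of uncompressed data the stage can sustain>"
bottleneck() {
    awk '
        NR == 1 || $2 + 0 < min { min = $2 + 0; name = $1 }
        END { printf "%s %.0f\n", name, min }
    '
}

# Assumption: 90 MB/s on the wire at a 27.5% ratio moves 90 / 0.275 MB/s of original data.
transfer=$(awk -v wire=90 -v ratio=0.275 'BEGIN { printf "%.0f", wire / ratio }')

# Stage names are hypothetical labels; speeds come from sections 3.1 to 3.4.
result=$(printf '%s\n' \
    "disk_read 400" \
    "tar_pack 350" \
    "lz4_compress 288" \
    "transfer $transfer" \
    "lz4_decompress 264" \
    "tar_unpack 350" \
    "disk_write 428" | bottleneck)

echo "$result"
```

Under these assumptions the script reports lz4_decompress at 264 MB/s as the limiting stage, which matches the figure used in the text.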

4. Experimental test
Transfer using LZ4 compression:

# time tar -c sendlog/ | lz4 | ssh -c arcfour128 \
    -o "MACs umac-64@openssh.com" 10.xxx.xx.36 "lz4 -d | tar -xC /u01/backup_supu"
real 0m25.646s
real 0m25.911s
real 0m29.019s

Three runs took 25.6 s, 26 s, and 29 s. The average transfer speed is 152 MB/s, and the network bandwidth used is about 41.9 MB/s.
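
The pipeline used in this test can be rehearsed on a single machine before pointing it at a remote host. The sketch below is a stand-in under stated assumptions, not the author's test: `cat` replaces the ssh hop, gzip is used when lz4 is not installed, the sample files are invented, and `-f -` is spelled out so that tar uses the pipe regardless of its default archive.

```shell
#!/bin/sh
# Local rehearsal of: tar | compress | <transport> | decompress | tar -C

# Prefer lz4 when it is installed; gzip is only a stand-in.
if command -v lz4 >/dev/null 2>&1; then
    compress="lz4 -B4"
    decompress="lz4 -d"
else
    compress="gzip -c"
    decompress="gzip -dc"
fi

work=$(mktemp -d)
mkdir -p "$work/src/sendlog" "$work/dst"

# Hypothetical sample data standing in for the real log directory.
seq 1 20000 > "$work/src/sendlog/a.log"
printf 'hello\n' > "$work/src/sendlog/b.log"

# `cat` takes the place of the ssh connection in the real command.
(cd "$work/src" && tar -cf - sendlog/) | $compress | cat | $decompress | tar -xf - -C "$work/dst"

# Verify that the unpacked copy is identical to the source.
if diff -r "$work/src/sendlog" "$work/dst/sendlog" >/dev/null; then
    roundtrip=ok
else
    roundtrip=mismatch
fi
echo "round trip: $roundtrip"
rm -rf "$work"
```

Replacing `cat` with the ssh invocation, and running the decompress and unpack half as the remote command, gives the shape of the command used in the article.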

Transfer using pigz compression:

# time tar -c sendlog/ | pigz -p 16 | ssh -c arcfour128 \
    -o "MACs umac-64@openssh.com" 10.xxx.xx.36 "gzip -d | tar -xC /u01/backup_supu"
real 0m37.030s

Three runs took 37 s, 37.2 s, and 35.6 s. The average transfer speed is 110.7 MB/s, and the network bandwidth used is about 19.4 MB/s.

The comparison shows that pigz and LZ4 differ little in compression, but LZ4 decompresses much faster. In a scenario like this one, where the data must be decompressed immediately, LZ4 wins easily (bzip2 does not even need to be tested).
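
The network bandwidth figures quoted for the two runs are roughly the uncompressed throughput multiplied by the compression ratio from section 3.3. A small sketch of that check; the helper name wire_rate is hypothetical:

```shell
#!/bin/sh
# wire_rate THROUGHPUT_MB_PER_S RATIO_PERCENT
# Hypothetical helper: estimated compressed traffic on the network, in MB/s.
wire_rate() {
    awk -v t="$1" -v r="$2" 'BEGIN { printf "%.1f\n", t * r / 100 }'
}

echo "lz4:  $(wire_rate 152 27.5) MB/s on the wire"
echo "pigz: $(wire_rate 110.7 17.5) MB/s on the wire"
```

This yields about 41.8 MB/s for lz4 and 19.4 MB/s for pigz, close to the 41.9 MB/s and 19.4 MB/s observed, which suggests the network link itself is far from saturated in either run.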

4.1 Analysis
According to the theoretical analysis in section 3.5, the transfer speed should reach about 260 MB/s, but the test above achieved only 152 MB/s, which shows there is room for tuning. Let us continue the analysis to find the bottlenecks.

Using the pv tool, tar + lz4 produces about 70 MB/s of output:

time tar -c sendlog/ | lz4 | pv > /dev/null
1.02GiB 0:00:14 [70.8MiB/s] [ <=> ]

This is about 10% slower than direct lz4 output (about 79 MB/s).

Adding the ssh network hop:

time tar -c sendlog/ | lz4 | pv | ssh -c arcfour128 -o "MACs umac-64@openssh.com" 10.xxx.xxx.36 "cat - > /dev/null"
1.02GiB 0:00:23 [43.9MiB/s] [ <=> ]

This is about 45% slower than direct lz4 output (about 79 MB/s). With decompression and unpacking added on the remote side, the compressed transfer speed is 41.9 MB/s. It is not clear why the speed drops, and the author has not found a way to directly speed up this kind of pipeline. Readers with suggestions are welcome to share them, to see whether it can be optimized further.

At this point the transfer speed reaches about 150 MB/s, roughly 4 times faster than the original scp (40 MB/s): a 400 GB transfer that used to take about 3 hours now takes only 45 minutes.

5. LZ4 parameter Test

The previous experiment found that LZ4 compression inside the full pipeline is about 45% slower than expected, and the only difference is that one run goes through pipes while the other reads directly. Here we try to improve performance by changing the lz4 block size and comparing:

Test command:

for i in `seq 4 7`; do time tar -c ./sendlog/ | lz4 -B$i | pv > /dev/null; done
1.07GiB 0:00:11 [94.4MiB/s] [ <=> ]
real 0m11.640s
user 0m10.375s
sys 0m4.308s
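
For reference, the -B values swept by this loop are lz4's predefined block size IDs. The mapping can be written down as a small lookup; the function name lz4_block_kb is hypothetical and exists only for illustration:

```shell
#!/bin/sh
# lz4_block_kb ID
# Map an lz4 -B block size ID (4 to 7) to its block size in KB.
lz4_block_kb() {
    case "$1" in
        4) echo 64 ;;
        5) echo 256 ;;
        6) echo 1024 ;;
        7) echo 4096 ;;
        *) echo "unsupported block ID: $1" >&2; return 1 ;;
    esac
}

for id in 4 5 6 7; do
    echo "-B$id = $(lz4_block_kb "$id") KB"
done
```

So -B4 selects 64 KB blocks and -B7 selects 4 MB blocks, the two sizes mentioned in Note 3.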

With a block size of 64 KB, lz4 compression speed increases noticeably (31%). So we add the new -B4 parameter to lz4 to see whether it improves overall performance:

Bang! Indeed, transfer performance rises to about 249 MB/s:

time tar -c sendlog/ | pv | lz4 -B4 | ssh -c arcfour128 \
    -o "MACs umac-64@openssh.com" 10.xxx.xxx.36 "lz4 -d | tar -xC /u01/backup_supu"
3.91GiB 0:00:16 [ 249MiB/s]

real 0m16.067s
user 0m15.553s
sys 0m16.821s

5. Why not use nc
You don't need it!!!

* nc is no faster than ssh; with compression in the pipeline, nc has no advantage over ssh

* nc is awkward to call from a script, since a command must be run at both ends

* nc requires an additional network port

* nc is not encrypted

6. Can it be faster
In this case lz4 decompresses at 264 MB/s and the transfer reaches 249 MB/s, so there should be a little more to squeeze out, but the author has run out of ideas.

Appendix
iperf bandwidth test:

iperf -c 10.xxx.xx.18 -p 3999 -t 30
------------------------------------------------------------
Client connecting to 10.xxx.xx.18, TCP port 3999
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 10.xx.xx.36 port 43838 connected with 10.xx.xx.18 port 3999
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-30.0 sec  3.15 GBytes   903 Mbits/sec

iperf -s -p 3999 -m
------------------------------------------------------------
Server listening on TCP port 3999
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 10.xx.xx.18 port 3999 connected with 10.xx.xx.36 port 43838
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-30.0 sec  3.15 GBytes   902 Mbits/sec
[  4] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
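
iperf reports megabits per second, while the rest of the article works in megabytes per second. A minimal conversion sketch, assuming 8 bits per byte and ignoring protocol overhead; the helper name mbit_to_mbyte is hypothetical:

```shell
#!/bin/sh
# mbit_to_mbyte MBITS_PER_S
# Hypothetical helper: convert Mbits/sec to MB/s (8 bits per byte).
mbit_to_mbyte() {
    awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 8 }'
}

echo "$(mbit_to_mbyte 903) MB/s"
```

903 Mbits/sec works out to about 112.9 MB/s, in line with the roughly 115 MB/s quoted in section 3.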
