Comparison of the bzip2 and pbzip2 compression tools
Linux: Debian 8.5
pbzip2 installation: apt-get install pbzip2
For details about pbzip2, see "File compression tool pbzip2 installation and usage".
bzip2 (single-threaded compression tool)
# Single-file compression test
# Size of the source file
root@wing:/data# du -h 2016.sql
3.4G    2016.sql
# Compress with tar and bzip2
time tar -jcf 2016.sql.bz2 2016.sql
# Compression time for the single file
real    10m7.996s
user    10m4.632s
sys     0m13.276s
# Size of the compressed file
root@wing:/data# du -sh 2016.sql.bz2
220M    2016.sql.bz2

# Directory compression test
# Size of the source directory
root@wing:/data# du -sh 20161122/
6.9G    20161122/
# tar with bzip2 can only use one core for compression
time tar -jcvf 20161122_bzip.bz2 20161122/*
# Compression time for the directory
real    24m30.013s
user    22m51.936s
sys     0m23.872s
# Size of the compressed file
root@wing:/data# du -h 20161122_bzip.bz2
356M    20161122_bzip.bz2
pbzip2 (multi-threaded compression tool)
# Single-file compression test
# Size of the source file
root@wing:/data# du -h 2016.sql
3.4G    2016.sql
# Compress with pbzip2 (-p3: three processors, -k: keep the input file)
time pbzip2 -p3 -k 2016.sql
# Compression time for the single file
real    3m22.909s
user    9m55.092s
sys     0m16.284s
# Size of the compressed file
root@wing:/data# du -sh 2016.pbzip.bz2
221M    2016.pbzip.bz2

# Directory compression test
# Size of the source directory
root@wing:/data# du -sh 20161122/
6.9G    20161122/
# tar piped into pbzip2, which uses three cores for compression
time tar -c 20161122 | pbzip2 -p3 -c > 20164242.tar.bz2
# Compression time for the directory
real    7m31.688s
user    22m5.736s
sys     0m42.520s
# Size of the compressed file
root@wing:/data# du -h 20164242.tar.bz2
358M    20164242.tar.bz2
Summary:
| | bzip2 | pbzip2 (3 threads) |
|---|---|---|
| Original file size | 3.4 GB | 3.4 GB |
| File compression time (real) | 10m7.996s | 3m22.909s |
| Compressed file size | 220 MB | 221 MB |
| Original directory size | 6.9 GB | 6.9 GB |
| Directory compression time (real) | 24m30.013s | 7m31.688s |
| Compressed directory size | 356 MB | 358 MB |
Note: Compression time is measured with real rather than user + sys because, in a multi-threaded run, user time is the sum of the CPU time spent in every thread, so it is far larger than the time we actually wait. The server was kept in its freshly initialized state for these tests, so real is close to the time a user perceives.
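The gap between real and user + sys can be checked against the single-file timings reported above. The sketch below is only illustrative: the helper name busy_cores is made up for this example, and it assumes a POSIX shell with awk. Dividing user + sys by real gives a rough estimate of how many cores were busy on average.

```shell
#!/bin/sh
# busy_cores is a hypothetical helper, not part of bzip2 or pbzip2.
# Arguments: real, user, sys, each in `time` format such as 3m22.909s.
busy_cores() {
    printf '%s %s %s\n' "$1" "$2" "$3" | awk '
        # Convert "XmY.ZZZs" to seconds.
        function secs(t,  p) {
            split(t, p, "m")
            sub(/s$/, "", p[2])
            return p[1] * 60 + p[2]
        }
        # (user + sys) / real: average number of busy cores.
        { printf "%.2f\n", (secs($2) + secs($3)) / secs($1) }'
}

busy_cores 10m7.996s 10m4.632s 0m13.276s   # bzip2, single file:      1.02
busy_cores 3m22.909s 9m55.092s 0m16.284s   # pbzip2 -p3, single file: 3.01
```

For bzip2 the ratio is close to 1, so real and user + sys nearly coincide. For pbzip2 with three threads it is close to 3, which is why user + sys overstates the waiting time by about that factor.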
The table shows that with three threads, pbzip2 compresses both the single file and the directory about three times faster than single-threaded bzip2 (roughly 3.0x and 3.3x by real time), while the compressed sizes are essentially the same.
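The speedup can be recomputed from the real times in the summary. This is a minimal sketch, assuming a POSIX shell with awk; the helper names to_seconds and speedup are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Convert a `time`-style duration such as "10m7.996s" to seconds.
to_seconds() {
    echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Speedup = single-threaded real time / multi-threaded real time.
speedup() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

speedup 10m7.996s 3m22.909s     # single file: 3.00
speedup 24m30.013s 7m31.688s    # directory:   3.25
```

The directory case compares tar -j with tar piped into pbzip2, so the ratio above 3 probably reflects differences in the pipeline as well as the thread count.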