Linux: Archiving and Compression

Last Update:2015-09-17 Source: Internet

Author: User


File archiving is used when one or more files need to be transmitted or stored as efficiently as possible. There are two aspects to this:

Archiving: combining multiple files into one, which eliminates the overhead in individual files and makes them easier to transmit
Compressing: making the files smaller by removing redundant information

Even though disk space is relatively cheap, archiving and compression still have value:

If you want to make a large number of files available, such as the source code to an application or a collection of documents, it is easier for people to download a compressed archive than it is to download individual files.
Log files have a habit of filling disks, so it is helpful to split them by date and compress older versions.
When you back up directories, it is easier to keep them all in one archive than it is to version each file.
Some streaming devices such as tapes perform better if you are sending a stream of data rather than individual files.
It can often be faster to compress a file before you send it to a tape drive or over a slower network and decompress it on the other end than it would be to send it uncompressed.

Part 1: Compressing Files

Compressing files makes them smaller by removing duplication from a file and storing it such that the file can be restored. A file with human-readable text might have frequently used words replaced by something smaller, or an image with a solid background might represent patches of that color by a code. You generally don't use the compressed version of the file; instead, you decompress it before use. The compression algorithm is a procedure the computer follows to encode the original file and, as a result, make it smaller. Computer scientists research these algorithms and come up with better ones that can work faster or make the input file smaller.

When talking about compression, there are two types:

Lossless: No information is removed from the file. Compressing a file and decompressing it leaves something identical to the original.
Lossy: Information might be removed from the file as it is compressed, so that uncompressing a file will result in a file which is slightly different from the original. For instance, an image with two subtly different shades of green might be made smaller by treating both shades as the same. Often, the eye can't pick out the difference anyway.
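
The lossless property can be checked directly. The following is a minimal sketch, assuming gzip and the POSIX cksum utility are installed; sample.txt is a made-up file created only for the demonstration. It compresses the file through a pipe, decompresses it again, and compares checksums of the original and the round-tripped data.

```shell
# Lossless round trip: compress, decompress, and compare checksums.
# sample.txt is a hypothetical file created only for this demonstration.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Build a small text file with plenty of repetition.
i=0
while [ "$i" -lt 200 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > "$workdir/sample.txt"

# Checksum of the original, and of the data after gzip followed by gzip -d.
before=$(cksum < "$workdir/sample.txt")
after=$(gzip -c "$workdir/sample.txt" | gzip -dc | cksum)

echo "original:   $before"
echo "round trip: $after"
if [ "$before" = "$after" ]; then
    echo "identical: nothing was lost"
fi
```

A lossy format offers no such guarantee, which is why it is reserved for media rather than documents or software.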

Generally, human eyes and ears don't notice slight imperfections in pictures and audio, especially as they are displayed on a monitor or played over speakers. Lossy compression often benefits media because it results in smaller file sizes and people can't tell the difference between the original and the version with the changed data. For things that must remain intact, such as documents, logs, and software, you need lossless compression.

Most image formats, such as GIF, PNG, and JPEG, implement some kind of compression. Of these, JPEG is the lossy one; PNG is lossless, and GIF compresses losslessly but is limited to a 256-color palette. With a lossy format you can generally decide how much quality you want to preserve. A lower quality results in a smaller file, but after decompression you may notice artifacts such as rough edges or discolorations. High quality will look much like the original image, but the file size will be closer to the original.

Compressing an already compressed file will not make it smaller. This is often forgotten when it comes to images, since they are already stored in a compressed format. With lossless compression, this multiple compression is not a problem, but if you compress and decompress a file several times using a lossy algorithm you will eventually have something that is unrecognizable.
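
To see why a second pass does not help, here is a rough sketch that compresses a file once and then compresses the result again. It assumes gzip, od, and a readable /dev/urandom; the file names are invented for the example. Because the input is random, the exact sizes differ on every run, so none are quoted here.

```shell
# Compress once, then compress the result again, and compare sizes.
# All file names here are hypothetical and live in a temporary directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# A hex dump of random bytes: plain text that compresses well on the first pass.
head -c 80000 /dev/urandom | od -An -tx1 > "$workdir/sample.txt"

gzip -c "$workdir/sample.txt" > "$workdir/once.gz"
gzip -c "$workdir/once.gz" > "$workdir/twice.gz"

original=$(($(wc -c < "$workdir/sample.txt")))
once=$(($(wc -c < "$workdir/once.gz")))
twice=$(($(wc -c < "$workdir/twice.gz")))

echo "original: $original bytes"
echo "once:     $once bytes"
echo "twice:    $twice bytes"
```

The first pass should shrink the file considerably, while the second pass has no redundancy left to remove and typically comes out about the same size or a few bytes larger.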

Linux provides several tools to compress files; the most common is gzip. Here we show a log file before and after compression.

bob:tmp $ ls -l access_log*
-rw-r--r-- 1 sean sean 372063 Oct  1 21:24 access_log
bob:tmp $ gzip access_log
bob:tmp $ ls -l access_log*
-rw-r--r-- 1 sean sean 26080 Oct  1 21:24 access_log.gz

In the example above, there is a file called "access_log" which is 372,063 bytes. The file is compressed by invoking the gzip command with the name of the file as the only argument. After that command completes, the original file is gone and a compressed version with a file extension of .gz is left in its place. The file size is now 26,080 bytes, giving a compression ratio of about 14:1, which is common with log files.

gzip will give you this information if you ask, by using the -l parameter, as shown here:

bob:tmp $ gzip -l access_log.gz
         compressed        uncompressed  ratio uncompressed_name
              26080              372063  93.0% access_log

Here, you can see that the compression ratio is given as 93%. This is the 14:1 ratio stated the other way around: the file shrank to about 1/14 of its size, so roughly 13/14 of it was saved. Additionally, when the file is decompressed it will be called access_log.
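
The two figures can be reproduced with a little arithmetic. This is a small sketch using awk and the byte counts quoted for access_log; it is not how gzip computes its ratio internally, which may differ slightly because of header overhead.

```shell
# Two ways of stating the same compression result for access_log.
# The byte counts come from the listings in the text.
uncompressed=372063
compressed=26080

# "N:1" form: how many times smaller the file became.
ratio=$(awk -v u="$uncompressed" -v c="$compressed" 'BEGIN { printf "%.1f", u / c }')

# Percentage form: how much of the original size was saved.
saved=$(awk -v u="$uncompressed" -v c="$compressed" 'BEGIN { printf "%.1f", (1 - c / u) * 100 }')

echo "compression ratio: about ${ratio}:1"   # about 14.3:1
echo "space saved:       ${saved}%"          # 93.0%
```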

bob:tmp $ gunzip access_log.gz
bob:tmp $ ls -l access_log*
-rw-r--r-- 1 sean sean 372063 Oct  1 21:24 access_log

The opposite of the gzip command is gunzip. Alternatively, gzip -d does the same thing (gunzip is just a script that calls gzip with the right parameters). After gunzip does its work you can see that the access_log file is back to its original size.
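
The in-place behaviour is easy to confirm in a scratch directory. The sketch below assumes only gzip and standard utilities; demo_log is a hypothetical file name, not one from the example above.

```shell
# gzip replaces the file with file.gz; gzip -d (or gunzip) puts it back.
# demo_log is a hypothetical file name used only for this sketch.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

i=0
while [ "$i" -lt 500 ]; do
    echo "GET /index.html HTTP/1.1 200 request=$i"
    i=$((i + 1))
done > "$workdir/demo_log"
size_before=$(($(wc -c < "$workdir/demo_log")))

gzip "$workdir/demo_log"
names_after_gzip=$(ls "$workdir")        # only demo_log.gz remains
echo "after gzip:    $names_after_gzip"

gzip -d "$workdir/demo_log.gz"           # same effect as: gunzip demo_log.gz
names_after_restore=$(ls "$workdir")     # only demo_log remains
echo "after gzip -d: $names_after_restore"

size_after=$(($(wc -c < "$workdir/demo_log")))
echo "size before: $size_before, size after: $size_after"
```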

gzip can also act as a filter, which means it doesn't read or write anything to disk but instead receives data through an input channel and writes it out to an output channel. You will learn more about how this works in the next chapter, so the next example just gives you an idea of what you can do by being able to compress a stream.

bob:tmp $ mysqldump -A | gzip > database_backup.gz
bob:tmp $ gzip -l database_backup.gz
         compressed        uncompressed  ratio uncompressed_name
              76866             1028003  92.5% database_backup

The mysqldump -A command outputs the contents of the local MySQL databases to the console. The | character (pipe) says "redirect the output of the previous command into the input of the next one". The program to receive the output is gzip, which recognizes that no filenames were given so it should operate in pipe mode. Finally, the > database_backup.gz means "redirect the output of the previous command into a file called database_backup.gz". Inspecting this file with gzip -l shows the compressed version is 7.5% of the size of the original, with the added benefit that the larger file never had to be written to disk.
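
The same pattern works with any command that writes to standard output. In the sketch below, generate_dump is a hypothetical stand-in for a command such as mysqldump, so the example runs without a database; backup.gz is likewise an invented name.

```shell
# gzip as a filter: data arrives on standard input and leaves on standard
# output, so the uncompressed dump never exists as a file.
# generate_dump is a hypothetical stand-in for a command such as mysqldump.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

generate_dump() {
    i=1
    while [ "$i" -le 1000 ]; do
        echo "INSERT INTO demo VALUES ($i, 'row number $i');"
        i=$((i + 1))
    done
}

generate_dump | gzip > "$workdir/backup.gz"

# Read it back the same way, again without a temporary file.
rows=$(($(gzip -dc "$workdir/backup.gz" | wc -l)))
echo "rows recovered from the compressed stream: $rows"

# Summary of compressed and uncompressed sizes.
gzip -l "$workdir/backup.gz"
```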

There is another pair of commands that operate virtually identically to gzip and gunzip. These are bzip2 and bunzip2. The bzip2 utilities use a different compression algorithm (called Burrows-Wheeler block sorting, versus the Lempel-Ziv coding used by gzip) that can compress files smaller than gzip at the expense of more CPU time. You can recognize these files because they have a .bz or .bz2 extension instead of .gz.
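
A quick side-by-side run shows how the two tools are used in the same way. This is only a sketch: sample_log is an invented file, the sizes depend on the data, and the bzip2 half is skipped if bzip2 is not installed on the machine.

```shell
# Compress the same data with gzip and, if it is installed, bzip2.
# sample_log is a hypothetical file generated for this comparison.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

i=0
while [ "$i" -lt 2000 ]; do
    echo "10.0.0.$((i % 250)) - - GET /page/$i HTTP/1.1 200 $((i * 37 % 5000))"
    i=$((i + 1))
done > "$workdir/sample_log"

gzip -c "$workdir/sample_log" > "$workdir/sample_log.gz"
orig_size=$(($(wc -c < "$workdir/sample_log")))
gz_size=$(($(wc -c < "$workdir/sample_log.gz")))
echo "original: $orig_size bytes"
echo "gzip:     $gz_size bytes"

if command -v bzip2 >/dev/null 2>&1; then
    bzip2 -c "$workdir/sample_log" > "$workdir/sample_log.bz2"
    echo "bzip2:    $(($(wc -c < "$workdir/sample_log.bz2"))) bytes"
    # bzip2 -dc (like bunzip2 -c) restores the data, just as gzip -dc does.
    if bzip2 -dc "$workdir/sample_log.bz2" | cmp -s - "$workdir/sample_log"; then
        echo "bzip2 round trip OK"
    fi
else
    echo "bzip2 is not installed; skipping that half of the comparison"
fi
```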

Part 2: Archiving Files

If you had several files to send to someone, you could compress each one individually. You would have a smaller amount of data in total than if you sent uncompressed files, but you would still have to deal with many files at one time.

Archiving is the solution to this problem. The traditional UNIX utility to archive files is called tar, which is a short form of TApe aRchive. Tar was used to stream many files to a tape for backups or file transfer. Tar takes in several files and creates a single output file that can be split up again into the original files on the other end of the transmission.

Tar has 3 modes you will want to be familiar with:

Create: Make a new archive out of a series of files
Extract: Pull one or more files out of an archive
List: Show the contents of the archive without extracting

Remembering the modes is key to figuring out the command line options necessary to do what you want. In addition to the mode, you will also want to make sure you remember where to specify the name of the archive, as you may be entering multiple file names on a command line.

Here, we show a tar file, also called a tarball, being created from multiple access logs.

bob:tmp $ tar -cf access_logs.tar access_log*
bob:tmp $ ls -l access_logs.tar
-rw-rw-r-- 1 sean sean 542720 Oct 21:42 access_logs.tar

Creating an archive requires two named options. The first, c, specifies the mode. The second, f, tells tar to expect a file name as the next argument. The first argument in the example above creates an archive called access_logs.tar. The remaining arguments are all taken to be input file names, either as a wildcard, a list of files, or both. In this example, we use the wildcard option to include all files that begin with access_log.
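
Create mode can be tried on a few throwaway files. In this sketch the access_log files are small hypothetical stand-ins written to a temporary directory, and -C (change directory before adding files) is used so the script does not depend on the current directory. The archive comes out larger than its inputs because tar stores a header block for each file and pads the data to fixed-size blocks.

```shell
# Create mode: -c builds the archive, -f names it, the rest are inputs.
# The access_log* files here are small hypothetical stand-ins.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

for name in access_log access_log.1 access_log.2; do
    echo "sample entry for $name" > "$workdir/$name"
done

# Total the inputs before the archive exists, since access_log* would
# also match access_logs.tar afterwards.
input_total=$(($(cat "$workdir"/access_log* | wc -c)))

# -C switches to the directory first so the stored names have no path prefix.
tar -cf "$workdir/access_logs.tar" -C "$workdir" access_log access_log.1 access_log.2

archive_size=$(($(wc -c < "$workdir/access_logs.tar")))
echo "inputs:  $input_total bytes"
echo "archive: $archive_size bytes"
```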

The example above does a long directory listing of the created file. The final size is 542,720 bytes, which is slightly larger than the input files. Tarballs can be compressed for easier transport, either by gzipping the archive or by having tar do it with the z flag as follows:

bob:tmp $ tar -czf access_logs.tar.gz access_log*
bob:tmp $ ls -l access_logs.tar.gz
-rw-rw-r-- 1 sean sean 46229 Oct 21:50 access_logs.tar.gz
bob:tmp $ gzip -l access_logs.tar.gz
         compressed        uncompressed  ratio uncompressed_name
              46229              542720  91.5% access_logs.tar

The example above shows the same command as the prior example, but with the addition of the z parameter. The output is much smaller than the tarball itself, and the resulting file is compatible with gzip. You can see from the last command that the uncompressed file is the same size as it would be if you tarred it in a separate step.
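
The compatibility with gzip can be demonstrated by building the compressed tarball both ways and reading each one back through plain gzip. This is a sketch with invented file names; the listings are sorted only so they can be compared.

```shell
# tar -czf in one step versus tar followed by gzip in two steps.
# All names below are hypothetical and live in a temporary directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir "$workdir/logs"
for n in 1 2 3; do
    echo "entry $n" > "$workdir/logs/access_log.$n"
done

# One step: tar compresses with gzip as it writes.
tar -czf "$workdir/one_step.tar.gz" -C "$workdir" logs

# Two steps: plain tarball first, then gzip it.
tar -cf "$workdir/two_step.tar" -C "$workdir" logs
gzip "$workdir/two_step.tar"

# Either file can be unpacked by plain gzip and handed to tar on stdin.
one_step_list=$(gzip -dc "$workdir/one_step.tar.gz" | tar -tf - | sort)
two_step_list=$(gzip -dc "$workdir/two_step.tar.gz" | tar -tf - | sort)

echo "$one_step_list"
if [ "$one_step_list" = "$two_step_list" ]; then
    echo "both archives list the same members"
fi
```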

While UNIX doesn't treat file extensions specially, the convention is to use .tar for tar files, and .tar.gz or .tgz for compressed tar files. You can use bzip2 instead of gzip by substituting the letter j for z and using .tar.bz2, .tbz, or .tbz2 for a file extension (e.g. tar -cjf file.tbz access_log*).

Given a tar file, compressed or not, you can see what is in it by using the t command:

bob:tmp $ tar -tjf access_logs.tbz
logs/
logs/access_log.3
logs/access_log.1
logs/access_log.4
logs/access_log
logs/access_log.2

This example uses 3 options:

t: List files in the archive
j: Decompress with bzip2 before reading
f: Operate on the given filename (access_logs.tbz)

The contents of the compressed archive are then displayed. You can see that a directory is prefixed to the files. Tar will recurse into subdirectories automatically when archiving and will store the path info inside the archive.
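
The recursion and the stored paths can be seen with a small nested directory. The layout below (logs with an archive subdirectory) is made up for the sketch; only the top-level directory is passed to tar.

```shell
# tar descends into directories and records each member's relative path.
# The directory layout is hypothetical.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir -p "$workdir/logs/archive"
echo "current" > "$workdir/logs/access_log"
echo "older"   > "$workdir/logs/archive/access_log.1"

# Only the top-level directory is named; tar walks the rest on its own.
tar -cf "$workdir/logs.tar" -C "$workdir" logs

# List mode shows the path stored for every member.
listing=$(tar -tf "$workdir/logs.tar" | sort)
echo "$listing"
```

Every member, including the file two levels down, appears with its path relative to the directory tar was started from.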

Just to show that this file is still nothing special, we will list the contents of the file in two steps using a pipeline.

bob:tmp $ bunzip2 -c access_logs.tbz | tar -t
logs/
logs/access_log.3
logs/access_log.1
logs/access_log.4
logs/access_log
logs/access_log.2

The left side of the pipeline is bunzip2 -c access_logs.tbz, which decompresses the file but (because of the -c option) sends the output to the screen. The output is redirected to tar -t. If you do not specify a file with -f then tar will read from the standard input, which in this case is the uncompressed file.

Finally, you can extract the archive with the -x flag:

bob:tmp $ tar -xjf access_logs.tbz
bob:tmp $ ls -l
total 36
-rw-rw-r-- 1 sean sean 30043 Oct 13:27 access_logs.tbz
drwxrwxr-x 2 sean sean  4096 Oct 13:26 logs
bob:tmp $ ls -l logs
total 536
-rw-r--r-- 1 sean sean 372063 Oct 21:24 access_log
-rw-r--r-- 1 sean sean    362 Oct 21:41 access_log.1
-rw-r--r-- 1 sean sean 153813 Oct 12 21:41 access_log.2
-rw-r--r-- 1 sean sean   1136 Oct 21:41 access_log.3
-rw-r--r-- 1 sean sean    784 Oct 21:41 access_log.4

The example above uses a similar pattern as before, specifying the operation (eXtract), the compression (the j flag, meaning bzip2), and a file name (-f access_logs.tbz). The original file is untouched and the new logs directory is created. Inside the directory are the files.

Add the -v flag and you will get verbose output of the files processed. This is helpful so you can see what is happening:

bob:tmp $ tar -xjvf access_logs.tbz
logs/
logs/access_log.3
logs/access_log.1
logs/access_log.4
logs/access_log
logs/access_log.2

It is important to keep the -f flag at the end, as tar assumes whatever follows it is a filename. In the next example, the f and v flags were transposed, leading to tar interpreting the command as an operation on a file called "v" (the relevant message is the first line of the output).

bob:tmp $ tar -xjfv access_logs.tbz
tar (child): v: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
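
The effect of the flag order can be reproduced safely in a scratch directory. This sketch uses an uncompressed archive with invented names, and it records the exit status of each command rather than relying on the exact wording of the error, which varies between tar implementations.

```shell
# Flag order matters: whatever follows f is taken as the archive name.
# logs.tar and the good/bad directories are hypothetical names.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir "$workdir/logs" "$workdir/good" "$workdir/bad"
echo "entry" > "$workdir/logs/access_log"
tar -cf "$workdir/logs.tar" -C "$workdir" logs

# f is last, so the next argument is the archive name: this works.
if tar -xvf "$workdir/logs.tar" -C "$workdir/good"; then
    good_status=0
else
    good_status=$?
fi

# f is followed by v, so tar looks for an archive literally named "v".
# Run inside the empty "bad" directory so no such file can exist.
if (cd "$workdir/bad" && tar -xfv ../logs.tar) 2>/dev/null; then
    bad_status=0
else
    bad_status=$?
fi

echo "f last:        exit status $good_status"
echo "f before v:    exit status $bad_status"
```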

If you only want some files out of the archive you can add their names to the end of the command, but by default they must match the name in the archive exactly or use a pattern:

bob:tmp $ tar -xjvf access_logs.tbz logs/access_log
logs/access_log

The example above shows the same archive as before, but extracting only the "logs/access_log" file. The output of the command (as verbose mode was requested with the v flag) shows that only the one file has been extracted.
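
Selective extraction can be checked by unpacking into an empty directory and looking at what arrives. The archive and file names in this sketch are invented; the member is named with its directory prefix, exactly as list mode would print it.

```shell
# Extract a single member by giving its stored path after the archive name.
# logs.tar and the out directory are hypothetical.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir "$workdir/logs" "$workdir/out"
for name in access_log access_log.1 access_log.2; do
    echo "contents of $name" > "$workdir/logs/$name"
done
tar -cf "$workdir/logs.tar" -C "$workdir" logs

# Name the member exactly as tar -t prints it, directory prefix included.
tar -xvf "$workdir/logs.tar" -C "$workdir/out" logs/access_log.1

extracted=$(ls "$workdir/out/logs")
echo "extracted: $extracted"
```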

Tar has many more features, such as the ability to use patterns when extracting files, excluding certain files, or outputting the extracted files to the screen instead of disk. The documentation for tar has in-depth information.
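
As a taste of those features, the sketch below tries two of them: -O to print a member instead of writing it to disk, and --exclude to skip members matching a pattern. Both options are documented for GNU tar; other tar implementations may spell them differently or handle patterns differently, so treat this as an assumption about your system. The file names are invented.

```shell
# Two extra extraction features: print to stdout, and exclude by pattern.
# Assumes a tar that supports -O and --exclude (GNU tar does).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir "$workdir/logs" "$workdir/out"
echo "current entries" > "$workdir/logs/access_log"
echo "older entries"   > "$workdir/logs/access_log.1"
tar -cf "$workdir/logs.tar" -C "$workdir" logs

# -O writes the member's contents to standard output instead of to disk.
printed=$(tar -xOf "$workdir/logs.tar" logs/access_log)
echo "printed: $printed"

# --exclude skips members whose names match the pattern.
tar -xf "$workdir/logs.tar" -C "$workdir/out" --exclude='*.1'
kept=$(ls "$workdir/out/logs")
echo "kept: $kept"
```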


