In-depth description of LZW Compression Algorithm

Source: Internet
Author: User

Address: http://tech.watchstor.com/management-115343.htm

    • Abstract: The LZW compression algorithm is a compression method created by Lempel, Ziv, and Welch and named after them. Notably, the string table can be rebuilt correctly during both compression and decompression, so it is discarded once either finishes.
    • Tags: LZW Compression Algorithm

 

The LZW compression algorithm is a compression method created by Abraham Lempel, Jacob Ziv, and Terry Welch and named after them. It is built around a string table: each string that occurs for the first time is placed in the table and represented by a number. The compressed file stores only these numbers, not the strings themselves, which greatly improves the compression efficiency of image files.

 

Notably, the same string table can be built correctly during both compression and decompression, so it never has to be stored. Once compression or decompression finishes, the string table is discarded.

 

1. Basic principles of LZW Compression Algorithm

 

First, a string table is created. Each string that appears for the first time is placed in the table and represented by a number related to its position in the table, and that number is saved to the compressed file. When the string appears again, it is replaced by its number, which is stored in the file instead.

 

After compression, the string table is discarded. For example, if the string "print" is assigned the number 266 during compression, it is stored in the string table, and every later occurrence is written as 266. When the decoder encounters the number 266, it looks up the string "print" that 266 represents in the string table. During decompression, the string table is regenerated from the compressed data.
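The principle above can be sketched in a few lines of Python. This is a minimal, generic LZW over bytes, not the GIF variant described later: the function names `lzw_compress` and `lzw_decompress` are hypothetical, codes are kept as plain integers rather than packed bits, and the table is allowed to grow without limit.

```python
def lzw_compress(data: bytes) -> list:
    """Replace repeated byte strings with their string-table numbers."""
    table = {bytes([i]): i for i in range(256)}  # single bytes are predefined
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate            # keep extending the match
        else:
            codes.append(table[current])   # emit the longest known string
            table[candidate] = next_code   # first occurrence gets a number
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list) -> bytes:
    """Rebuild the same table from the codes alone and expand them."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:
            # The code is not in the table yet, so it can only be
            # the previous string followed by its own first byte.
            entry = previous + previous[:1]
        out += entry
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return bytes(out)


text = b"print print print"
codes = lzw_compress(text)
restored = lzw_decompress(codes)
```

Neither function receives the other's table; the decompressor reconstructs it from the order in which codes arrive, which is why the table does not need to be saved.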

 

2. Implementation of LZW Compression Algorithm

 

A. Initializing the string table

 

When compressing image data, you must first create a string table to record each string that appears for the first time. A string table consists of at least two arrays, the current array and the prefix array, because each basic string in a GIF file is two elements long: the current character and its prefix. (The actual string it stands for can be several hundred or even thousands of characters long.)

 

The prefix array stores the first element of each basic string, and the current array stores the last character of the string at the same position. A single subscript is therefore enough to identify the basic string stored there, so during compression each basic string is replaced by its subscript.
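How two short arrays can stand for long strings is easier to see in code. The sketch below assumes that a prefix entry may itself be the subscript of an earlier entry, so following the prefix links recovers the whole string. The names `expand` and `NO_PREFIX` are hypothetical, and the tiny 4-color table ignores the clear and end codes for brevity.

```python
NO_PREFIX = -1  # hypothetical marker: this entry is a single color


def expand(index, prefix, current):
    """Follow prefix links back to a root entry and return the full string."""
    chars = []
    while index != NO_PREFIX:
        chars.append(current[index])  # last character of this basic string
        index = prefix[index]         # continue with its prefix
    chars.reverse()
    return chars


# Entries 0-3 are the four colors themselves.
prefix = [NO_PREFIX] * 4
current = [0, 1, 2, 3]

# Entry 4 is the basic string (prefix 1, current 2), i.e. the string 1 2.
prefix.append(1)
current.append(2)

# Entry 5 is (prefix 4, current 3); its prefix is entry 4, so it means 1 2 3.
prefix.append(4)
current.append(3)

full_string = expand(5, prefix, current)
```

Each entry costs only two stored values, no matter how long the string it represents.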

 

Generally, a string table holds 4096 entries (2 to the 12th power), which means it can store at most 4096 basic strings. During initialization, the starting entries of the table are assigned according to the number of colors in the image. Each of these elements of the current array is set to its own sequence number (that is, its subscript): the first element is 0, the second is 1, the fifteenth is 14, and so on until the number of colors plus 2 elements have been filled. With 256 colors, the first 258 elements are initialized, and the 258th holds the value 257.

 

The number 256 is the clear code, and the number 257 is the image end code. The entries after them store each string that appears for the first time in the file. The corresponding prefix array must be initialized as well. Its initial values can be anything, but generally each one is set to -1, that is, 0xFF. The number of initialized elements is the same as in the current array, and the later elements are used for the strings that appear for the first time. Increasing the length of the string table further improves compression efficiency but reduces decoding speed.
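The initialization just described might look like the following sketch. The function name `init_string_table` and its return layout are assumptions made for illustration, and the number of colors is assumed to be a power of two, as it is in the 256-color example.

```python
def init_string_table(num_colors):
    """Set up the current and prefix arrays for the initial entries."""
    initial_entries = num_colors + 2             # colors + clear code + end code
    current = list(range(initial_entries))       # each element holds its subscript
    prefix = [0xFF] * initial_entries            # placeholder value from the text
    clear_code = num_colors                      # 256 for a 256-color image
    end_code = num_colors + 1                    # 257 for a 256-color image
    next_free = num_colors + 2                   # first slot for a new string
    return current, prefix, clear_code, end_code, next_free


current, prefix, clear_code, end_code, next_free = init_string_table(256)
```

For 256 colors this yields 258 initialized elements, with new strings starting at subscript 258.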

 

B. Compression Method

 

To follow the compression method, you first need four terms: the character stream, the code stream, the current code, and the current prefix. The character stream is the uncompressed image data in the source image file; the code stream is the compressed image data written to the GIF file; the current code is the character just read from the character stream; and the current prefix is the character that precedes the one just read.

 

When a GIF file is compressed, each color value is handled as one whole byte regardless of the color depth of the image, so each byte represents one color. For 16-color, 4-color, and 2-color images this leaves four or more bits of every byte unused (four bits are already enough to represent 16 colors), but LZW compression recovers these idle bits.

 

During compression, the first character read from the character stream becomes the current prefix, and the second becomes the current code. Together they form the first basic string (for example, if the current prefix is A and the current code is B, the string is AB). The string table is searched, and at this point no match is found, so the string is written to the table: the current prefix goes into the prefix array and the current code into the current array. The current prefix is then sent to the code stream, and the current code becomes the new current prefix. The next character is read as the current code, forming a new basic string (if the current code is C, the basic string is BC), and the string table is searched again. If the string is already in the table, the value in the current prefix is discarded and replaced by the position code (subscript) of that string in the table, and the next character is read as the current code to form a new basic string. This repeats until the entire image has been compressed.
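The loop described above can be sketched as follows. This is a simplified illustration rather than a complete GIF encoder: the name `compress_indices` is hypothetical, a dictionary keyed by (prefix, code) stands in for searching the two arrays, and the clear code, end code, table limit, and bit packing are all left out.

```python
def compress_indices(pixels, num_colors):
    """Turn a list of color indices into a list of LZW codes."""
    if not pixels:
        return []
    table = {}                      # (prefix, code) -> subscript in the table
    next_index = num_colors + 2     # first free slot after clear and end codes
    code_stream = []
    prefix = pixels[0]              # first character is the current prefix
    for code in pixels[1:]:         # each further character is the current code
        key = (prefix, code)
        if key in table:
            prefix = table[key]     # known string: its subscript is the new prefix
        else:
            table[key] = next_index # first occurrence: record the basic string
            next_index += 1
            code_stream.append(prefix)  # send the current prefix to the code stream
            prefix = code               # the current code becomes the prefix
    code_stream.append(prefix)      # flush whatever is still pending
    return code_stream


stream = compress_indices([1, 2, 1, 2, 1, 2, 1, 2], num_colors=4)
```

With four colors, new strings start at subscript 6, so the eight input pixels come out as the five codes 1, 2, 6, 8, 2.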

 

It can be seen that, during compression, the values in the prefix array are the codes that appear in the code stream. A code above the clear and end codes must represent a string, while a code smaller than the number of colors is the color itself.

 

C. Clear code

 

In practice, the string table often has to be initialized several times while compressing one image, because the number of first-occurrence basic strings in an image usually exceeds 4096. During compression, as soon as the string table grows beyond 4096 entries, the current prefix and current code are sent to the code stream, a clear code is added to the code stream, the string table is reinitialized, and compression continues as described above.

 

D. End code

 

After all the data has been compressed, an image end code is written to the code stream. Its value is the number of colors plus 1; in a 256-color file, the end code is 257.
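Sections C and D can be combined into one sketch. The function name `encode_with_control_codes` is hypothetical, and two simplifications are assumptions of this example rather than statements from the text: the stream starts with a clear code, and when the table fills up the current code is carried over as the new prefix instead of being written out immediately. A small `max_entries` makes the reset easy to observe.

```python
def encode_with_control_codes(pixels, num_colors, max_entries=4096):
    """LZW-encode color indices, emitting clear and end codes."""
    clear_code = num_colors
    end_code = num_colors + 1
    first_free = num_colors + 2
    table = {}
    next_index = first_free
    out = [clear_code]                  # assumption: begin with a clear code
    if pixels:
        prefix = pixels[0]
        for code in pixels[1:]:
            key = (prefix, code)
            if key in table:
                prefix = table[key]
                continue
            out.append(prefix)
            table[key] = next_index
            next_index += 1
            if next_index >= max_entries:
                out.append(clear_code)  # table is full: tell the decoder to reset
                table = {}
                next_index = first_free
            prefix = code
        out.append(prefix)
    out.append(end_code)                # number of colors plus 1
    return out


normal = encode_with_control_codes([1, 2, 1, 2, 1, 2, 1, 2], num_colors=4)
tiny_table = encode_with_control_codes([0, 1, 2, 3, 0, 1], num_colors=4,
                                       max_entries=8)
```

In the second call the table can hold only subscripts 6 and 7, so a clear code (4) appears after every two new strings, and the stream still ends with the end code (5).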

 

E. Reclaiming byte space

 

Besides being stored as data packets, the codes in the code stream written to a GIF file are stored bit by bit, which effectively saves storage space. Take a 4-bit (16-color) image: when each pixel is stored in a whole byte, only four of the eight bits are used and the other four are wasted, whereas with bit-level storage each byte can hold two color codes.

 

In fact, GIF files use a variable-width storage method. The compression process shows that the values in the prefix array of the string table follow a pattern. In a 256-color GIF file, elements 258-511 hold values in the range 0-510, which fits exactly in a 9-bit binary number; elements 512-1023 hold values in the range 0-1022, which fits in 10 bits; elements 1024-2047 hold values in the range 0-2046, which fits in 11 bits; and elements 2048-4095 hold values in the range 0-4094, which fits in 12 bits.

 

When codes are stored with a variable number of bits, the starting width is the color depth of the image plus 1. As the number of codes grows, the width grows with it, up to a maximum of 12 bits (at which point the string table holds exactly 2 to the 12th power, that is, 4096, strings). The basic method is: each time a code is added to the code stream, check whether the position (subscript) of the string in the string table exceeds what the current width can represent, that is, 2 raised to the current number of bits. Once it does, the width is increased by 1.
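The relationship between a subscript and the number of bits needed to store it can be sketched with a small helper. The name `bits_for_index` is hypothetical, and the function only computes the width a given subscript requires; the exact moment at which a real encoder switches widths relative to writing a code is a detail this sketch does not cover.

```python
def bits_for_index(index, color_bits, max_bits=12):
    """Return the code width needed for a subscript, never below color_bits + 1."""
    width = color_bits + 1                  # starting width: color depth plus 1
    while index >= (1 << width) and width < max_bits:
        width += 1                          # subscript no longer fits: add a bit
    return width


# 256-color image (8 bits per pixel): boundaries match the ranges in the text.
widths = [bits_for_index(i, color_bits=8) for i in (258, 511, 512, 1023, 1024, 2047, 2048, 4095)]
```

For a 256-color image the width stays at 9 bits up to subscript 511, then steps to 10, 11, and finally 12 bits at subscript 2048.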

 

For example, in a 4-bit image the first code is stored in 5 bits. The low 5 bits of the first byte hold the first code, the high 3 bits of that byte hold the low 3 bits of the second code, the low 2 bits of the second byte hold the high 2 bits of the second code, and so on. For an 8-bit (256-color) image the starting width is 9 bits, so even a single code occupies two bytes.
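The bit layout in this example can be reproduced with a short packing routine. This is a minimal sketch assuming codes are written starting from the least significant bit of each byte, as the example describes; the name `pack_codes` is hypothetical, and splitting the result into GIF data packets is not shown.

```python
def pack_codes(codes_and_widths):
    """Pack (code, width) pairs into bytes, lowest bits first."""
    out = bytearray()
    accumulator = 0      # bits waiting to be written
    pending = 0          # how many bits the accumulator holds
    for code, width in codes_and_widths:
        accumulator |= code << pending   # place the code above the pending bits
        pending += width
        while pending >= 8:              # write out every complete byte
            out.append(accumulator & 0xFF)
            accumulator >>= 8
            pending -= 8
    if pending:
        out.append(accumulator & 0xFF)   # final partial byte, zero-padded
    return bytes(out)


# Two 5-bit codes: 0b10101 (21) and 0b11011 (27).
packed = pack_codes([(0b10101, 5), (0b11011, 5)])
first_byte, second_byte = packed[0], packed[1]
```

The first byte is 0b01110101: its low 5 bits are the first code and its high 3 bits are the low 3 bits of the second code. The second byte holds the remaining 2 bits of the second code.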

 

F. An LZW compression example

 

The following is an example of encoding a 256-color GIF file. Working through it shows why this is an elegant encoding method, why the string table is no longer needed after compression, and how the table can be recreated from the code stream during decoding.

 

String: ...

Current code: ...

Current prefix: 258, 262, ...

Current array: ...

Array subscripts: 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, ...

Code stream: 1, 260, 258, 5, ...
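The point that the string table can be recreated from the code stream alone can be checked with a small decoder. The following is a minimal sketch, not a full GIF decoder: the name `decode_codes` is hypothetical, it works on a list of integer codes rather than packed bits, and it uses a small 4-color code stream chosen for illustration instead of the 256-color example above.

```python
def decode_codes(codes, num_colors, max_entries=4096):
    """Expand LZW codes back into color indices, rebuilding the table on the fly."""
    clear_code = num_colors
    end_code = num_colors + 1
    table = {i: [i] for i in range(num_colors)}
    next_index = num_colors + 2
    previous = None                      # string produced by the previous code
    pixels = []
    for code in codes:
        if code == clear_code:           # start over with a fresh table
            table = {i: [i] for i in range(num_colors)}
            next_index = num_colors + 2
            previous = None
            continue
        if code == end_code:
            break
        if previous is None:
            entry = table[code]          # first code after a reset is a plain color
        else:
            if code in table:
                entry = table[code]
            else:
                # Code not defined yet: previous string plus its first element.
                entry = previous + previous[:1]
            if next_index < max_entries:
                table[next_index] = previous + entry[:1]
                next_index += 1
        pixels.extend(entry)
        previous = entry
    return pixels


# 4-color stream: clear (4), then codes 1, 2, 6, 8, 2, then end (5).
decoded = decode_codes([4, 1, 2, 6, 8, 2, 5], num_colors=4)
```

The decoder never sees the encoder's table. Each new entry is derived from the previous string and the first element of the current one, which is exactly the information the encoder used when it created that entry.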

 

As an important image file format, GIF has fairly complex encoding rules, but it compresses efficiently, especially for images with smooth transitions. Because it preserves the image information completely during compression, it has also been widely used in popular electronic images and e-books.

 

That concludes this in-depth explanation of the LZW compression algorithm; we hope it helps.