Miscellaneous Algorithms: A Detailed Look at Data Compression Algorithms (II)

Source: Internet
Author: User

In-depth analysis of data compression algorithms

 

Preface

Before starting this article, let's review the previous one. It explained two data compression algorithms: the Huffman compression algorithm and the RLE (run-length encoding) compression algorithm.

Data Compression Algorithms (Part I): http://blog.csdn.net/fengchaokobe/article/details/7934865

 

Body

 

This article explains two more data compression algorithms: the LZW compression algorithm and the Rice compression algorithm.

 

Section 1 LZW Compression Algorithm

 

LZW compression algorithm: Lempel-Ziv-Welch encoding. The name is simply the initials of its authors, and the algorithm itself is just as easy to understand. Enough small talk; let's get to work!

 

The LZW compression algorithm builds a string table during encoding and uses a number to represent each string in it. The compressed file stores only these numbers; they are the compressed data.

 

Implementation principle of the LZW algorithm: the source data is likely to contain repeated strings. When a string appears for the first time, we add it to the string table and assign it a number. When the same string is met again, that number is output in its place, which is how the data gets compressed. Obtaining the correct number table is therefore the key to successful compression.

It follows from this principle that the more repeated strings the source data contains, the better the compression.

 

That's enough theory. Now that we understand it, let's illustrate it with an example.

Suppose the following source data string is given, and the LZW compression algorithm is used to compress it:

 

Preparation:

1. As mentioned in the principle, we use numbers to represent the compression result. Where should these numbers start? This depends on where the LZW compression algorithm is applied. LZW is mainly used in image processing. If the number of colors is 256, the values 0 to 255 already stand for the colors themselves, so new numbers must start from 258 (256 is the clear code and 257 is the end-of-image code).

 

2. Some terms used in the encoding process:

Source data: the data to be compressed;

Index array: holds the current prefix and suffix, which are matched against the string table and its number table;

Matching result: Yes | No;

Encoding sequence table: the encoded result;

Number table number: the number assigned when a new string is added to the table.

 

Everything is ready; let's get started:

The following rules must be observed during encoding:

Encoding Process:

We can see from the table that the characters and numbers in the encoding sequence table are the result of LZW encoding. The string table and the number table correspond one to one, and in the end only the sequence of numbers is kept.
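As a minimal sketch of the encoding process described above, the following C function encodes a byte string and prints each step. It is not the article's table: the names `lzw_trace`, `find_entry`, `tbl_prefix`, and `tbl_char` are illustrative, the string table is searched linearly for simplicity, and the table is simply frozen when it is full.

```c
#include <stdio.h>

#define FIRST_CODE 258          /* 256 and 257 are reserved (clear / end) */
#define MAX_CODES  4096         /* 12-bit code space */

/* Entry for code c: the string behind tbl_prefix[c], followed by tbl_char[c]. */
static int tbl_prefix[MAX_CODES];
static unsigned char tbl_char[MAX_CODES];

/* Linear search for "prefix + c"; returns its code, or -1 if it is not in the table. */
static int find_entry(int prefix, unsigned char c, int next_code)
{
    int code;
    for (code = FIRST_CODE; code < next_code; code++)
        if (tbl_prefix[code] == prefix && tbl_char[code] == c)
            return code;
    return -1;
}

/* Encodes src[0..len-1] into codes[] and returns the number of codes written.
   Assumption: codes[] is large enough (len entries always suffice). */
int lzw_trace(const unsigned char *src, int len, int *codes)
{
    int next_code = FIRST_CODE, n = 0, prefix, i;

    if (len <= 0)
        return 0;
    prefix = src[0];                          /* first byte is the initial prefix */
    for (i = 1; i < len; i++) {
        int hit = find_entry(prefix, src[i], next_code);
        if (hit >= 0) {
            prefix = hit;                     /* matched: keep extending the string */
            continue;
        }
        codes[n++] = prefix;                  /* no match: output the prefix */
        printf("emit %d\n", prefix);
        if (next_code < MAX_CODES) {          /* register the new string */
            printf("add  %d = (%d, '%c')\n", next_code, prefix, src[i]);
            tbl_prefix[next_code] = prefix;
            tbl_char[next_code] = src[i];
            next_code++;
        }
        prefix = src[i];
    }
    codes[n++] = prefix;                      /* flush the last pending prefix */
    return n;
}
```

For the hypothetical input "ABABABA" (7 bytes), the function returns four codes: 65, 66, 258, 260. The seven input bytes are represented by four numbers because "AB" and "ABA" were found in the table the second time they appeared.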

 

Okay, this is the encoding process of the LZW compression algorithm. However, one problem remains: the encoding numbers start from 258, but where do they end? Obviously, they cannot keep growing forever. Therefore, we stipulate that when the number reaches 4096 (the GIF specification limits codes to 12 bits, so nothing beyond the 12-bit range can be expressed), the whole string table is reinitialized and numbering starts again from 258. This improves both utilization and efficiency.
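The reset rule can be sketched as a small helper. This is only an illustration under assumptions: the name `next_table_code` is hypothetical, and the caller is assumed to empty its string table and write the clear code (256) to the output whenever the helper returns it.

```c
#define CLEAR_CODE 256      /* tells the decoder to reinitialize its table */
#define FIRST_CODE 258      /* first number available for a new string */
#define MAX_CODES  4096     /* 2^12: numbers must fit in 12 bits */

/* Returns the number to assign to a new string. When all 12-bit numbers are
   used up, resets the counter to FIRST_CODE and returns CLEAR_CODE instead. */
int next_table_code(int *next_code)
{
    if (*next_code >= MAX_CODES) {
        *next_code = FIRST_CODE;
        return CLEAR_CODE;
    }
    return (*next_code)++;
}
```

Starting from 258, the helper hands out 258, 259, and so on up to 4095; the call after that returns 256 and numbering restarts at 258.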

 

That completes the LZW compression algorithm; its implementation process has been roughly described. Some details are still not covered, and I will add them to the article as they come up.

 

Finally, the core code of the LZW encoding algorithm:

#define MAXLENGTH 4096    /* 12-bit code space */
#define UNUSED    (-1)    /* marks a free slot in the string table */

/* One string-table entry: the string "prefix + char_add" is represented by value. */
typedef struct dictionary {
    int value;                /* encoded value, UNUSED while the slot is free */
    int prefix_string;        /* prefix: a byte or an earlier code */
    unsigned char char_add;   /* suffix: the byte appended to the prefix */
} dictionary;

dictionary dict[MAXLENGTH];

/* Check whether (prefix, suffix) has appeared in the string table. Returns the
   slot that holds it, or the free slot where it should be stored. */
int compare(int prefix, unsigned char suffix)
{
    int i = prefix % MAXLENGTH;
    while (1) {
        if (dict[i].value == UNUSED)
            return i;
        if (dict[i].prefix_string == prefix && dict[i].char_add == suffix)
            return i;
        i = (i + 1) % MAXLENGTH;   /* linear probing, wrapping at the table end */
    }
}

/* LZW compression.
   src_ch:        the source data string
   prefix_suffix: the index array, [0] is the prefix and [1] the suffix
   src_length:    the length of the source data string
   char_stream:   receives the encoded numbers
   Returns the number of codes written to char_stream. */
int lzw_coding(const unsigned char *src_ch, unsigned int *prefix_suffix,
               int src_length, unsigned int *char_stream)
{
    int i, k = 0, temp, code = 258;   /* start from 258 */

    if (src_length <= 0)
        return 0;
    for (i = 0; i < MAXLENGTH; i++)
        dict[i].value = UNUSED;

    prefix_suffix[0] = src_ch[0];     /* the first byte is the initial prefix */
    for (i = 1; i < src_length; i++) {
        prefix_suffix[1] = src_ch[i]; /* assign a value to the suffix */
        temp = compare((int)prefix_suffix[0], (unsigned char)prefix_suffix[1]);
        if (dict[temp].value != UNUSED) {
            /* found in the string table: continue with its code as the prefix */
            prefix_suffix[0] = (unsigned int)dict[temp].value;
            continue;
        }
        /* not found: output the prefix and register the new string */
        char_stream[k++] = prefix_suffix[0];
        if (code < MAXLENGTH) {
            dict[temp].value = code++;
            dict[temp].prefix_string = (int)prefix_suffix[0];
            dict[temp].char_add = (unsigned char)prefix_suffix[1];
        } else {
            /* numbers used up: output the clear code and reinitialize the table */
            char_stream[k++] = 256;
            for (temp = 0; temp < MAXLENGTH; temp++)
                dict[temp].value = UNUSED;
            code = 258;
        }
        prefix_suffix[0] = prefix_suffix[1];
    }
    char_stream[k++] = prefix_suffix[0];   /* flush the last pending prefix */
    return k;
}
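Since the compressed file stores only the numbers, the decoder has to rebuild the same string table on its own. The following is a minimal sketch of how that could look; it is not taken from the original code. The names `lzw_decode`, `expand`, `tbl_prefix`, and `tbl_char` are illustrative, and it assumes a valid code sequence that contains neither the clear code (256) nor the end code (257) and never fills the table.

```c
#define FIRST_CODE 258      /* first number assigned to a string */
#define MAX_CODES  4096     /* 12-bit code space */

static int tbl_prefix[MAX_CODES];            /* code of the string's prefix */
static unsigned char tbl_char[MAX_CODES];    /* last byte of the string */

/* Writes the string behind `code` to out, stores its first byte in *first,
   and returns the string length. */
static int expand(int code, unsigned char *out, unsigned char *first)
{
    unsigned char stack[MAX_CODES];
    int n = 0, len;

    while (code >= FIRST_CODE) {             /* walk back through the prefixes */
        stack[n++] = tbl_char[code];
        code = tbl_prefix[code];
    }
    stack[n++] = (unsigned char)code;        /* the chain ends in a plain byte */
    *first = stack[n - 1];
    len = n;
    while (n > 0)
        *out++ = stack[--n];                 /* reverse into reading order */
    return len;
}

/* Decodes count codes into out; returns the number of bytes written,
   or -1 on input this sketch does not handle. */
int lzw_decode(const int *codes, int count, unsigned char *out)
{
    int next = FIRST_CODE, prev, len, i;
    unsigned char first;

    if (count <= 0)
        return 0;
    prev = codes[0];                         /* the first code is a plain byte */
    len = expand(prev, out, &first);
    for (i = 1; i < count; i++) {
        int cur = codes[i];
        if (cur > next || next >= MAX_CODES)
            return -1;
        /* Find the byte that completes the encoder's newest entry. If cur is
           not in the table yet, it can only be prev + first byte of prev. */
        if (cur == next)
            expand(prev, out + len, &first);
        else
            expand(cur, out + len, &first);
        tbl_prefix[next] = prev;             /* mirror the encoder's table */
        tbl_char[next] = first;
        next++;
        len += expand(cur, out + len, &first);
        prev = cur;
    }
    return len;
}
```

For example, decoding the four codes 65, 66, 258, 260 yields the seven bytes "ABABABA". The last code, 260, is used before the decoder has defined it, which is the special case handled by the `cur == next` branch.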

References: http://tech.watchstor.com/management-115343.htm

 

Section 2 Rice Compression Algorithm

 

Rice encoding: an algorithm invented by Robert F. Rice, hence the name "Rice encoding".

The basic idea of the Rice compression algorithm is to use as few bits as possible to represent each word (or number). More importantly, it can tell where the current word (or number) ends and the next one begins.

Some people call the Rice compression algorithm a static Huffman encoding algorithm, which makes sense.

Note: The Rice compression algorithm is generally used on small numbers, because the smaller the number, the fewer bits it requires.

Principle of the Rice compression algorithm: dividend = divisor * quotient + remainder. After compression by the Rice compression algorithm, the result is composed of the quotient and the remainder.

 

Let me explain in detail this principle:

Let S = Q * M + R (the combination of Q and R is the compression result of S), where:

S: the number to be compressed;

M: a constant, M = 2^K;

Q: the quotient, Q = S >> K;

R: the remainder, R = S & (M - 1); R is expressed in K bits.
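The definitions above translate directly into shifts and masks. A minimal sketch in C, where the function name `rice_split` is illustrative and K is assumed to be smaller than the width of `unsigned int`:

```c
/* Splits s into quotient q and remainder r for M = 2^k, so that s = q * M + r. */
void rice_split(unsigned int s, unsigned int k, unsigned int *q, unsigned int *r)
{
    unsigned int m = 1u << k;   /* M = 2^K */

    *q = s >> k;                /* Q = S >> K */
    *r = s & (m - 1);           /* R = S & (M - 1) */
}
```

With K = 4, the number 18 splits into Q = 1 and R = 2, and 1 * 16 + 2 gives back 18.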

For K: K is a number of bits, and it is chosen according to the typical size of the numbers to be encoded. For example, given the five numbers 10, 12, 14, 18, and 36, the first three (the majority) each fit in 4 bits, so K is 4. In other words, if the numbers that appear most often in a certain range can be expressed in K bits, then that K is chosen.
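One way to check such a choice, offered here as an assumption rather than the article's own rule, is to try several values of K and keep the one that gives the shortest total output. The sketch below assumes each number costs Q + 1 bits for the quotient part plus K bits for the remainder; the names `rice_total_bits` and `rice_best_k` are illustrative.

```c
/* Total bits needed to encode n values with parameter k, assuming each value
   costs (Q + 1) bits for the quotient part plus k bits for the remainder. */
unsigned long rice_total_bits(const unsigned int *v, int n, unsigned int k)
{
    unsigned long bits = 0;
    int i;

    for (i = 0; i < n; i++)
        bits += (v[i] >> k) + 1 + k;
    return bits;
}

/* Tries k = 0..max_k and returns the k with the shortest total length.
   Assumption: max_k is smaller than the bit width of unsigned int. */
unsigned int rice_best_k(const unsigned int *v, int n, unsigned int max_k)
{
    unsigned int k, best = 0;
    unsigned long best_bits = rice_total_bits(v, n, 0);

    for (k = 1; k <= max_k; k++) {
        unsigned long bits = rice_total_bits(v, n, k);
        if (bits < best_bits) {
            best_bits = bits;
            best = k;
        }
    }
    return best;
}
```

For the five numbers 10, 12, 14, 18, and 36, this search also settles on K = 4, with a total of 28 bits (K = 3 would need 29 bits and K = 5 would need 31).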

 

Okay, the principle is clear. Let's walk through the encoding process with an example:

Before the Rice compression, I would like to show the conventional method first, so that we can compare it with the Rice compression algorithm for a clearer understanding.

Conventionally, we would encode the five numbers above in plain binary: 1010 1100 1110 10010 100100. But think about it: once the bits are written one after another, how do we tell where the current number ends and the next one begins during decoding? This is troublesome. The Rice algorithm solves this problem.

 

We use the Rice compression algorithm to encode the numbers 10, 12, 14, 18, and 36. As explained in the principle, K = 4. Each number is written as its quotient Q in unary (Q ones closed by a single 0), followed by its remainder R in K bits. The closing 0 tells the decoder where the quotient ends.

First, the numbers 10, 12, and 14 can each be expressed in 4 bits, so Q = 0 and R is the number itself: 1010, 1100, and 1110. The quotient part is just the closing 0, which gives 01010, 01100, and 01110. K = 4 is known during decoding, so this is easy.

Next, 18 is 10010 in binary. According to the encoding principle:

M = 2^K = 16;

Q = S >> K = 18 >> 4 = 0b10010 >> 4 = 0b0001, which is 1 in decimal. In unary this is one 1 followed by one 0, that is, 10;

R = S & (M - 1) = 18 & (16 - 1) = 0b10010 & 0b1111 = 0b0010.

Therefore, the compression result is: 100010.

Finally, 36 is 100100 in binary. According to the encoding principle:

M = 2^K = 16;

Q = S >> K = 36 >> 4 = 0b100100 >> 4 = 0b0010, which is 2 in decimal. In unary this is two 1s followed by one 0, that is, 110;

R = S & (M - 1) = 36 & (16 - 1) = 0b100100 & 0b1111 = 0b0100.

The compression result is: 1100100.

 

It's that simple. This is the Rice compression algorithm. The key to the algorithm is being able to tell where the current number ends and the next one begins. The decompression process is even simpler: once we know K and how Q and R are laid out, we can decompress directly.
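The worked example can be reproduced with a short C function that writes the code as a string of '0' and '1' characters. This is a sketch under one stated assumption: the quotient is written as Q ones closed by a single 0, which is the common Rice convention. The name `rice_encode` is illustrative.

```c
/* Writes the Rice code of s as '0'/'1' characters into out (NUL-terminated)
   and returns the number of bits. out must hold (s >> k) + k + 2 chars. */
int rice_encode(unsigned int s, unsigned int k, char *out)
{
    unsigned int q = s >> k;                  /* quotient */
    unsigned int r = s & ((1u << k) - 1);     /* remainder */
    int n = 0, i;

    while (q-- > 0)                           /* quotient in unary: Q ones ... */
        out[n++] = '1';
    out[n++] = '0';                           /* ... closed by a single 0 */
    for (i = (int)k - 1; i >= 0; i--)         /* remainder, high bit first */
        out[n++] = ((r >> i) & 1u) ? '1' : '0';
    out[n] = '\0';
    return n;
}
```

With K = 4, the number 10 becomes 01010 (5 bits), 18 becomes 100010 (6 bits), and 36 becomes 1100100 (7 bits). The larger the number, the longer its unary part, which is why the algorithm suits small numbers.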

 

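Okay. The compression and decompression algorithms are as follows:

#include <stdio.h>

#define K 4            /* number of remainder bits */
#define M (1 << K)     /* M = 2^K */

/* Compression: print the code of src as a sequence of 0s and 1s. */
void rice_coding(unsigned char src)
{
    int q = src >> K;          /* quotient */
    int r = src & (M - 1);     /* remainder */
    int i;

    while (q-- > 0)            /* exceeds K bits: one 1 per unit of Q */
        putchar('1');
    putchar('0');              /* closing 0; for Q = 0 it is the flag bit alone */
    for (i = K - 1; i >= 0; i--)   /* output the K remainder bits */
        putchar(((r >> i) & 1) ? '1' : '0');
}

/* Decompression: src points to one code written as '0'/'1' characters. */
unsigned char rice_decoding(const char *src)
{
    int q = 0, r = 0, i;

    while (*src == '1') {      /* count the 1s to recover Q */
        q++;
        src++;
    }
    src++;                     /* skip the closing 0 */
    for (i = 0; i < K; i++)    /* read the K remainder bits */
        r = (r << 1) | (*src++ - '0');
    return (unsigned char)(q * M + r);   /* S = Q * M + R */
}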
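To show that the boundaries between numbers really can be recovered, here is a minimal sketch that decodes several numbers from one continuous bit string. It assumes the quotient is stored as Q ones closed by a 0 and that the bits are given as '0'/'1' characters; the name `rice_decode_stream` is illustrative.

```c
/* Decodes up to max values from a NUL-terminated '0'/'1' string.
   Returns the number of values stored in out; stops early on truncated input. */
int rice_decode_stream(const char *bits, unsigned int k, unsigned int *out, int max)
{
    int count = 0;

    while (*bits != '\0' && count < max) {
        unsigned int q = 0, r = 0, i;

        while (*bits == '1') {            /* unary quotient: count the ones */
            q++;
            bits++;
        }
        if (*bits != '0')                 /* missing closing 0: input truncated */
            break;
        bits++;
        for (i = 0; i < k; i++) {         /* read the k remainder bits */
            if (*bits != '0' && *bits != '1')
                return count;
            r = (r << 1) | (unsigned int)(*bits++ - '0');
        }
        out[count++] = (q << k) + r;      /* S = Q * M + R */
    }
    return count;
}
```

Feeding it the 28-bit string 0101001100011101000101100100 with K = 4 gives back 10, 12, 14, 18, and 36. No separators are needed, because each closing 0 and the fixed remainder width mark where one number ends and the next begins.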

Section 3 Concluding remarks

 

Think about, write, and draw ......

 
