Stupid Data Compression Tutorial - Chapter 4, Extreme Challenge: Arithmetic Coding


In the previous chapter we saw that Huffman coding encodes each symbol with a whole number of binary digits. In many cases this cannot achieve the optimal compression. Suppose a character occurs with probability 80%. Ideally it needs only -log2(0.8) = 0.322 bits, but Huffman coding must assign it a code of at least one bit, a 0 or a 1. As you can imagine, 80% of the message is then encoded at roughly three times its ideal length, and the compression result suffers accordingly.
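
The figures in this paragraph can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `ideal_bits` is made up for this example.

```python
import math

def ideal_bits(p):
    """Ideal code length, in bits, of a symbol that occurs with probability p."""
    return -math.log2(p)

p = 0.8
length = ideal_bits(p)
print(round(length, 3))       # ideal length in bits
print(round(1 / length, 2))   # how many times longer a 1-bit Huffman code is
```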

Can we really output just 0.322 of a 0 or 0.322 of a 1? Do we take scissors to the binary digits in the computer's memory? Does the computer have some special capability for this? Slow down. We should not be fooled by surface appearances. We only need to look at the problem from a different angle ...... Still can't figure it out? How can there be a fraction of a bit? Don't worry: mathematicians only thought of the ingenious method of arithmetic coding a little more than a decade ago. Let's see where they found the breakthrough.

Output: a single fraction

More remarkably, the output of arithmetic coding for the entire message, no matter how long it is, is just one number: a binary fraction between 0 and 1. For example, if the arithmetic coder outputs 1010001111, this stands for the binary fraction 0.1010001111, which is approximately 0.64 in decimal.
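
As a quick check of this conversion, the sketch below turns a bit string into the fraction it stands for. The function name `binary_fraction_value` is an assumption made for illustration.

```python
from fractions import Fraction

def binary_fraction_value(bits):
    """Interpret the bit string b1 b2 b3 ... as the binary fraction 0.b1b2b3..."""
    return sum(Fraction(int(b), 2 ** (i + 1)) for i, b in enumerate(bits))

value = binary_fraction_value("1010001111")
print(value, float(value))
```

The exact value is 655/1024, a little under 0.64.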

Why? We set out to represent fractions of a bit, so why are we now outputting a fraction? What a strange thing arithmetic coding is! Don't worry. The simple example below explains the basic principle of arithmetic coding. For clarity we will, for the time being, write the fractions in decimal; this does not affect how the algorithm works.

Suppose the only characters that can appear in a message are a, b and c, and the message we want to compress and save is bccb.

Before compression starts, suppose we know nothing about the probabilities of the three characters in the message (we are using an adaptive model). With nothing else to go on, we provisionally take the three to be equally likely, with probability 1/3 each, and divide the interval 0-1 among the three characters in proportion to their probabilities: a gets 0.0000-0.3333, b gets 0.3333-0.6667, and c gets 0.6667-1.0000. Graphically:

           +-- 1.0000
Pc = 1/3   |
           +-- 0.6667
Pb = 1/3   |
           +-- 0.3333
Pa = 1/3   |
           +-- 0.0000

Now we read the first character, b, and look at the interval belonging to b: 0.3333-0.6667. Because one more b has now been seen, the probabilities of the three characters become Pa = 1/4, Pb = 2/4, Pc = 1/4. We divide the interval 0.3333-0.6667 according to the new distribution. Graphically:

           +-- 0.6667
Pc = 1/4   |
           +-- 0.5834
Pb = 2/4   |
           +-- 0.4167
Pa = 1/4   |
           +-- 0.3333

Next we read the character c, so we look at the interval that the previous step assigned to c: 0.5834-0.6667. With this c counted, the probabilities of the three characters become Pa = 1/5, Pb = 2/5, Pc = 2/5. We use this distribution to divide the interval 0.5834-0.6667:

           +-- 0.6667
Pc = 2/5   |
           +-- 0.6334
Pb = 2/5   |
           +-- 0.6001
Pa = 1/5   |
           +-- 0.5834

We read the next character, c. The probabilities of the three characters become Pa = 1/6, Pb = 2/6, Pc = 3/6. We divide the interval of c, 0.6334-0.6667:

           +-- 0.6667
Pc = 3/6   |
           +-- 0.6501
Pb = 2/6   |
           +-- 0.6390
Pa = 1/6   |
           +-- 0.6334

We read the last character, b. Because it is the last one, no further division is needed. The interval that the previous step assigned to b is 0.6390-0.6501. We pick a number in this interval that converts easily to binary, for example 0.64, which is approximately 0.1010001111 in binary. Dropping the leading 0 and the point, which carry no information, we output 1010001111. This is the compressed message, and we have completed the simplest possible arithmetic compression.
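
The walk-through above can be reproduced with exact fractions. The following is a minimal sketch, not the tutorial's own program: the function name `encode_adaptive` and the choice to start every count at 1 are assumptions that match the example, and the final step of picking a number inside the interval is left out.

```python
from fractions import Fraction

def encode_adaptive(message, alphabet="abc"):
    """Return the final [low, high) interval for `message` under an adaptive
    model with no context, in which every symbol count starts at 1."""
    counts = {s: 1 for s in alphabet}
    low, high = Fraction(0), Fraction(1)
    for ch in message:
        total = sum(counts.values())
        width = high - low
        cum = 0  # total count of the symbols that sort before ch
        for s in alphabet:
            if s == ch:
                # Narrow the interval to the slice that belongs to ch.
                high = low + width * Fraction(cum + counts[s], total)
                low = low + width * Fraction(cum, total)
                break
            cum += counts[s]
        counts[ch] += 1  # adapt the model after coding the symbol
    return low, high

low, high = encode_adaptive("bccb")
print(low, high, float(low), float(high))
```

With exact arithmetic the final interval is 23/36 to 13/20, about 0.6389 to 0.6500; the four-digit figures in the text differ in the last place only because of rounding. Both 0.64 and the binary fraction 0.1010001111 lie inside it.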

How about that? And how do we decompress? That is even easier. Before decompressing, we again assume that the three characters are equally likely, which gives the first distribution diagram above. The binary stream we face is 1010001111. We first put the 0 and the point back to get the binary fraction 0.1010001111, that is, about 0.64 in decimal. We find that 0.64 falls in the interval of b in the distribution diagram, so we immediately output b and compute the new probabilities of the three characters. Just as in compression, we divide the interval of b according to the new distribution. In the new division 0.64 falls in the interval of c, so we output c. Continuing in the same way, we output all the characters and complete the decompression. (Note: to keep the explanation simple we have skipped the question of how the decompressor knows when to stop. In practice this problem is not hard to solve.)
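
The decompression steps just described can be sketched as follows. This is an illustration under stated assumptions: the function name `decode_adaptive` is made up, the counts start at 1 as in the example, and the number of characters is simply passed in, since the text leaves the end-of-message question aside.

```python
from fractions import Fraction

def decode_adaptive(value, length, alphabet="abc"):
    """Recover `length` symbols from a number inside the final interval."""
    counts = {s: 1 for s in alphabet}
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        total = sum(counts.values())
        width = high - low
        cum = 0
        for s in alphabet:
            s_low = low + width * Fraction(cum, total)
            s_high = low + width * Fraction(cum + counts[s], total)
            if s_low <= value < s_high:
                out.append(s)              # value falls in the slice of s
                low, high = s_low, s_high  # continue inside that slice
                counts[s] += 1             # same model update as the encoder
                break
            cum += counts[s]
    return "".join(out)

decoded = decode_adaptive(Fraction(655, 1024), 4)  # 655/1024 is 0.1010001111 in binary
print(decoded)
```

Because the decoder updates its counts exactly as the encoder did, the two stay in step without the probabilities ever being transmitted.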

Now put the tutorial aside for a moment and think it over, until you understand the basic principle of arithmetic compression and have come up with plenty of new questions.

Is it really close to the limit?

By now you surely understand something, and you surely have many new questions. No matter. Let's resolve them one by one.

First, we have stressed again and again that arithmetic compression can represent fractions of a bit and can therefore approach the entropy limit of lossless compression. Why can't we see that in the description above?

In fact, arithmetic coding represents fractional bits by gathering the pieces into a whole. We really cannot represent a single character exactly with a fractional number of bits, but we can encode many characters together and allow only a small error in the last digit.

From the simple example above we can see that each input symbol adjusts the probability table and narrows the range of the fraction to be output. This narrowing of the output range is the key to the question. For example, after we input the first character b, the output range is restricted to 0.3333-0.6667. We cannot yet tell whether the first digit of the output will be 3, 4, 5 or 6; in other words, the code length of b is less than one decimal digit (note that we are explaining in decimal, which is not quite the same as binary). So we do not output any digit yet and go on reading characters. Only after the third character, c, is the output range restricted to 0.6334-0.6667, and we finally know that the first (decimal) digit of the output is 6, though we still cannot know the second digit; that is, the code length of the first three characters is between 1 and 2 digits. When all the characters have been input, the output range is 0.6390-0.6501, and we still have no definite information about the second digit. We now see that outputting all four characters takes somewhat more than one decimal digit. Our only choice is to output 2 decimal digits (0.64). In this way, with an error of less than one decimal digit, we have output all the information exactly, which is close to the entropy. (It must be noted that, in order to tie in with the later sections, the example above uses an order-0 adaptive model, so its result still differs somewhat from the true entropy.)
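
The code lengths mentioned here follow from the width of the interval: an interval of width w pins down about -log10(w) decimal digits, or -log2(w) bits. The sketch below tracks that width for the example message; the function name `code_lengths` is hypothetical, and the counts start at 1 as in the example.

```python
import math
from fractions import Fraction

def code_lengths(message, alphabet="abc"):
    """List (prefix, decimal digits, bits) after each symbol of `message`."""
    counts = {s: 1 for s in alphabet}
    width = Fraction(1)
    out = []
    for i, ch in enumerate(message, 1):
        # Each symbol shrinks the interval by its current probability.
        width *= Fraction(counts[ch], sum(counts.values()))
        counts[ch] += 1
        w = float(width)
        out.append((message[:i], -math.log10(w), -math.log2(w)))
    return out

lengths = code_lengths("bccb")
for prefix, digits, bits in lengths:
    print(prefix, round(digits, 3), round(bits, 3))
```

After b the length is under one decimal digit, after bcc it lies between one and two, and the whole message comes to a little under two decimal digits, which is why two digits are output.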

How long does the fraction get?

You have probably already thought of this: if the message is long, the fraction to be output will be very, very long. How can we hold such a long fraction in memory?

In fact there is no need to hold the whole fraction in memory. As the example shows, during encoding we continually learn more about the fraction to be output. Specifically, once the range has been narrowed to 0.6390-0.6501, we already know that the first (decimal) digit of the output must be 6. We can then output the 6, remove it from memory, and carry on compressing in the range 0.390-0.501. There is never a very long fraction in memory. The same holds in binary: we keep determining whether the next bit to output is 0 or 1, then output that bit and shorten the fraction held in memory.
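
A minimal sketch of this idea, assuming a hypothetical helper `emit_settled_digits` and using decimal digits as in the text: as long as every number in the interval starts with the same digit, that digit is emitted and the interval is rescaled.

```python
from fractions import Fraction

def emit_settled_digits(low, high, base=10):
    """Emit the leading digits shared by every number in [low, high), with
    0 <= low < high <= 1, and return them with the rescaled interval."""
    digits = []
    while True:
        d = int(low * base)               # leading digit of low
        if high > Fraction(d + 1, base):  # interval reaches into the next digit
            break
        digits.append(d)
        # Drop the settled digit and blow the interval up by a factor of base.
        low = low * base - d
        high = high * base - d
    return digits, low, high

digits, low, high = emit_settled_digits(Fraction("0.6390"), Fraction("0.6501"))
print(digits, float(low), float(high))
```

Passing `base=2` gives the binary version of the same step. Real coders also have to handle the case where the interval straddles a digit boundary for a long time, which this sketch ignores.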

How do we implement a static model?

We know that the simple example above uses an adaptive model. How do we implement a static model? It is actually very simple. For the message bccb, we count and find that only two characters occur, with probability distribution Pb = 0.5, Pc = 0.5. We do not update this distribution during compression; every interval is divided according to it, which in this example means each interval is simply split in half. Our compression process can then be summarized as follows:

                     Lower bound   Upper bound
Before compression   0.0           1.0
Input b              0.0           0.5
Input c              0.25          0.5
Input c              0.375         0.5
Input b              0.375         0.4375

We can see that the final output range is 0.375-0.4375, in which not even the first decimal digit is settled: the whole message costs barely more than one decimal digit. If we carry out the same process in binary, we find that we come very close to the information entropy of this message (some readers may already have worked out that it is 4 bits).
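
The static-model table can be reproduced in a few lines. This is a sketch with a hypothetical function name `encode_static`; the model is passed in as an ordered mapping from symbol to probability and is never updated.

```python
import math
from fractions import Fraction

def encode_static(message, probs):
    """Return the (low, high) interval before compression and after each symbol."""
    low, high = Fraction(0), Fraction(1)
    steps = [(low, high)]
    for ch in message:
        width = high - low
        cum = Fraction(0)  # total probability of the symbols listed before ch
        for s, p in probs.items():
            if s == ch:
                low, high = low + width * cum, low + width * (cum + p)
                break
            cum += p
        steps.append((low, high))
    return steps

probs = {"b": Fraction(1, 2), "c": Fraction(1, 2)}
steps = encode_static("bccb", probs)
for lo, hi in steps:
    print(float(lo), float(hi))

final_low, final_high = steps[-1]
bits = -math.log2(float(final_high - final_low))
print(bits)
```

The final interval has width 1/16, so four binary digits identify it exactly, matching the 4-bit entropy mentioned above.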

Why use adaptive models?

Since we can use a static model to get close to the entropy value, why do we need to adopt an adaptive model?

First, a static model cannot adapt to the variety of information. For example, the probability distribution we obtained above cannot be used for every message we might want to compress. To be able to decompress correctly, we must spend some space storing the probability distribution computed by the static model, and the space used to store the model pushes us away from the entropy. Second, a static model requires counting the character distribution of the message before compression begins, which takes a lot of time and makes arithmetic compression slower still.

In addition, and most importantly, for a long message the probability a static model computes for a symbol is its probability over the whole message, whereas an adaptive model can track the probability of the symbol within a particular part of the message, or relative to a particular context. In other words, the probability distribution an adaptive model obtains is more favorable to compression (one could say that the information entropy of a context-based adaptive model is built on a higher probability level, so the total entropy is smaller). A good context-based adaptive model compresses far better than a static model.

The order of an adaptive model

We usually use the term "order" to distinguish different adaptive models. The example at the beginning of this chapter uses an order-0 adaptive model: it tracks the probability of each symbol in the input so far, without considering any context.

If we change the model to track the probability of a symbol appearing after a particular symbol, it becomes an order-1 context adaptive model. Suppose we are encoding an English text and have already encoded 10000 characters. The character just encoded is t, and the next character to encode is h. During the encoding so far we have counted 113 occurrences of the letter t in the first 10000 characters, 47 of them followed by the letter h. So the frequency of h after t is 47/113, and encoding h with this frequency takes -log2(47/113) = 1.266 bits.

Compare the order-0 adaptive model: if h has appeared 82 times in the first 10000 characters, the probability of h is 82/10000, and encoding h with this probability takes -log2(82/10000) = 6.930 bits. The advantage of taking context into account is obvious.

We can push this advantage further. For example, if the two characters before h are gt, and in the text encoded so far h has followed gt with probability 80%, then we can encode h in only 0.322 bits. The model we are now using is called an order-2 context adaptive model.
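
The three code lengths quoted above can be recomputed directly from the counts. A small sketch, with a made-up helper name `code_length_bits`:

```python
import math

def code_length_bits(count, total):
    """Bits needed to code a symbol seen `count` times out of `total`."""
    return -math.log2(count / total)

order0 = code_length_bits(82, 10000)  # h anywhere in the first 10000 characters
order1 = code_length_bits(47, 113)    # h after t
order2 = code_length_bits(80, 100)    # h after gt, taken as 80%

print(round(order0, 3), round(order1, 3), round(order2, 3))
```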

Ideally we would use an order-3 adaptive model; combined with arithmetic coding, the compression then reaches an astonishing level. The memory and time required by models of still higher order are unacceptable, at least for the moment. Most applications that use arithmetic compression use adaptive models of order 2 or order 3.

The role of escape codes

An arithmetic coder that uses an adaptive model must decide how to encode a symbol in a context where it has never appeared before. For example, in an order-1 context model there are 256 * 256 = 65536 possible combinations of context and character, because any of the characters 0-255 may appear after any of the characters 0-255. When we meet a combination that has never occurred before (for example, we have just encoded the character b and want to encode the character d, but d has never appeared after b), how do we determine the character's probability?

A simple approach is to give every possible combination an occurrence count of 1 before compression starts. If the combination bd turns up during compression, we take the count of d after b to be 1 and can encode it with a valid probability. The trouble with this approach is that, before compression starts, every character in every context already has a rather small frequency. For example, in an order-1 context model the frequency of any character is artificially set to 1/65536 before compression, and at that frequency each character costs 16 bits at the start of compression. Compression improves only gradually, as frequent characters come to occupy a larger share of the frequency distribution. For order-2 or order-3 context models the situation is even worse: we waste a great deal of space on the vast majority of contexts that almost never occur.

We solve this problem by introducing the "escape code". The escape code is a special marker mixed into the compressed data stream. It tells the decompressor that the next symbol has never appeared in the current context and must be encoded using a lower-order context.

For example, in an order-3 context model we have just encoded ght, and the next character to encode is a. Up to now the character a has never appeared after ght. The compressor outputs an escape code and then checks the order-2 context table to see how many times a has appeared after ht. If a has appeared after ht, a is encoded with the probability from the order-2 context table. Otherwise another escape code is output and the order-1 context table is checked. If a is still not found there, an escape code is output and we drop to the lowest level, the order-0 context table, to see whether the character a has appeared before at all. If it has not, we turn to a special "escape" context table, which contains all the symbols 0-255, each with a count of 1 that is never updated. Any symbol that does not appear in a higher-order context can fall back to this table and be encoded with frequency 1/256.
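
The fallback chain described above can be sketched as a function that lists what the compressor would emit for one symbol. Everything here is illustrative: the name `encoding_steps`, the table layout, and the counts in `tables` are invented for the example, and how the escape code itself is assigned a probability is not covered by the text and is left out.

```python
def encoding_steps(history, symbol, tables, max_order=3):
    """List the escapes and the final encoding step for `symbol`.

    tables[k] maps a k-character context string to {symbol: count}.
    Order -1 stands for the fixed escape table: 256 symbols, count 1 each.
    """
    steps = []
    for order in range(max_order, -1, -1):
        if len(history) < order:
            continue  # assumption: skip orders for which there is too little history
        context = history[len(history) - order:]
        seen = tables.get(order, {}).get(context, {})
        if symbol in seen:
            # Found: encode with frequency count / total from this table.
            steps.append(("encode", order, seen[symbol], sum(seen.values())))
            return steps
        steps.append(("escape", order))  # not seen here, drop one order
    steps.append(("encode", -1, 1, 256))
    return steps

# Made-up counts, for illustration only.
tables = {
    3: {"ght": {"s": 2}},
    2: {"ht": {"s": 3, "e": 1}},
    1: {"t": {"a": 4, "h": 6}},
    0: {"": {"a": 5, "h": 7, "t": 9}},
}
steps_a = encoding_steps("ght", "a", tables)
steps_z = encoding_steps("ght", "z", tables)
print(steps_a)
print(steps_z)
```

With these counts, a is found only in the order-1 table, after two escapes; z is found nowhere and ends up in the fixed table at frequency 1/256.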

Introducing the escape code frees us from the trouble of contexts that have never occurred. It lets the model adjust quickly to the best state as the input data changes, and rapidly reduces the number of bits needed to encode high-probability symbols.

Storage space problems

In implementing high-order context models for arithmetic coding, the demand for memory is a very tricky problem. We must keep counts for every context that has occurred, and a high-order context model can have a great many contexts, so the design of the data structure directly determines whether the implementation succeeds.

In an order-1 context model it is feasible to keep the occurrence counts in an array. For order-2 or order-3 context models, however, the array size grows exponentially, and the memory of existing computers cannot meet the requirement.

A clever approach is to store all the contexts that have actually occurred in a tree. This exploits the fact that a higher-order context always builds on a lower-order one. The order-0 context table is stored as an array; each array element holds a pointer to the corresponding order-1 context table, the order-1 context tables hold pointers to order-2 context tables, and so on, forming the whole context tree. Only contexts that have occurred get a node in the tree; contexts that never occur take up no memory. Within each context table there is likewise no need to keep counts for all 256 characters: only characters that have appeared after that context have a count. In this way we keep space consumption to a minimum.
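
A minimal sketch of such a context tree, using dictionaries in place of arrays and pointers. The names `ContextNode` and `update` are assumptions, as is the choice to walk the context from the most recent character backwards; this is not the layout of any particular program.

```python
class ContextNode:
    """One context table: counts for the symbols seen after this context."""

    def __init__(self):
        self.counts = {}    # symbol -> count, only for symbols actually seen
        self.children = {}  # character -> node for the context one character longer

def update(root, history, symbol, max_order=2):
    """Count `symbol` in every context of order 0..max_order ending `history`."""
    node = root
    node.counts[symbol] = node.counts.get(symbol, 0) + 1  # order 0
    recent = history[max(0, len(history) - max_order):]
    for ch in reversed(recent):
        # Nodes are created only for contexts that really occur.
        node = node.children.setdefault(ch, ContextNode())
        node.counts[symbol] = node.counts.get(symbol, 0) + 1

text = "bccb"
root = ContextNode()
for i, ch in enumerate(text):
    update(root, text[:i], ch)

print(root.counts)                # order-0 counts
print(root.children["c"].counts)  # what has followed c
```

Contexts that never occur, such as anything after a in this message, have no node at all and so cost no memory.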


For the specific design and implementation of arithmetic compression, see the example program below.

Arith-N is provided by Mark Nelson of the League for Programming Freedom and was compiled and debugged by Wang Benben in Visual C++ 5.0.

Arith-N contains the Visual C++ projects ArithN.dsp and ArithNExpand.dsp, which correspond to the compression and decompression programs an.exe and ane.exe respectively.

Arith-N is a general-purpose compression and decompression program that uses order-N context adaptive arithmetic coding; the order N can be specified on the command line. Because it serves as a tutorial example, some parts are deliberately left unoptimized for the sake of clarity.

All source programs are packaged in the file
