Extreme Challenge: Arithmetic Coding
(Reposted from http://blog.csdn.net/hhf383530895/archive/2009/08/24/4478605.aspx)
We learned in the previous chapter that Huffman coding encodes each symbol with a whole number of binary bits. In many cases this cannot achieve the best possible compression. Suppose a character occurs with probability 80%. Ideally it needs only -log2(0.8) ≈ 0.322 bits, yet Huffman coding must still give it a code of at least one bit, a 0 or a 1. So the character that makes up 80% of the message ends up about three times its ideal length after compression, and you can imagine how much compression that costs.
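These numbers are easy to check by computing a symbol's ideal code length, -log2(p), directly. The short Python sketch below compares it with the one-bit minimum that a Huffman codeword must spend; the function name `ideal_bits` is our own, chosen for illustration.

```python
import math

def ideal_bits(p):
    """Ideal code length, in bits, of a symbol with probability p: -log2(p)."""
    return -math.log2(p)

bits = ideal_bits(0.8)
print(f"ideal: {bits:.3f} bits; Huffman spends at least 1 bit, about {1 / bits:.1f} times as much")
```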
So could we output just 0.322 of a 0, or 0.322 of a 1? Do we take scissors to the bits in the computer's storage? Does a computer have such a special function? Not so fast. We should not be fooled by appearances. In fact, we only need to look at the problem from another angle... Hmm, I still can't figure it out. How can there be a fraction of a bit? Don't worry. Mathematicians came up with the magical method of arithmetic coding more than a decade ago; let's see where they found their breakthrough.
Output: A fraction.
What's even more amazing is that arithmetic coding outputs just one number for the entire message, no matter how long the message is: a binary fraction between 0 and 1. For example, if an arithmetic coder outputs 1010001111, this denotes the binary fraction 0.1010001111, which is about 0.64 in decimal.
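To see how an output bit string maps to a fraction, the minimal Python sketch below reads the bits b1 b2 b3 ... as the binary fraction 0.b1b2b3... using exact rational arithmetic. The helper name `bits_to_fraction` is hypothetical.

```python
from fractions import Fraction

def bits_to_fraction(bits):
    """Read a bit string such as "1010001111" as the binary fraction 0.1010001111."""
    return Fraction(int(bits, 2), 2 ** len(bits))

x = bits_to_fraction("1010001111")
print(x, float(x))  # 655/1024 0.6396484375
```

The exact value, 0.6396484375, is what the text rounds to 0.64.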
Why? If it can represent fractions of a binary digit, why does it output a fraction? Why is arithmetic coding so strange? Don't worry. Let's use the following simple example to explain the basic principle of arithmetic coding. For clarity, we temporarily use decimal fractions in the example; this does not affect the validity of the algorithm.
Suppose the only characters that can occur in a message are a, b and c, and the message we want to compress and save is bccb.
Before compression starts, assume we know nothing about how often a, b and c occur in the message (we use an adaptive model). With no better option, we assume for now that the three are equally likely. That is, we divide the range 0 to 1 among the three characters in proportion to their probabilities: a gets 0.0000 to 0.3333, b gets 0.3333 to 0.6667, and c gets 0.6667 to 1.0000. Graphically:
+ 1.0000
    Pc = 1/3
+ 0.6667
    Pb = 1/3
+ 0.3333
    Pa = 1/3
+ 0.0000

Now we get the first character, b. Look at the range 0.3333 to 0.6667 that corresponds to b. Once b has been seen, the probability distribution of the three characters becomes Pa = 1/4, Pb = 2/4, Pc = 1/4. We divide the range 0.3333 to 0.6667 in the proportions of this new distribution. Graphically:

+ 0.6667
    Pc = 1/4
+ 0.5834
    Pb = 2/4
+ 0.4167
    Pa = 1/4
+ 0.3333

Next comes the character c. We now focus on the range of c from the previous step, 0.5834 to 0.6667. Once c has been seen, the distribution becomes Pa = 1/5, Pb = 2/5, Pc = 2/5, and we use it to divide the interval 0.5834 to 0.6667:

+ 0.6667
    Pc = 2/5
+ 0.6334
    Pb = 2/5
+ 0.6001
    Pa = 1/5
+ 0.5834

The next character is c again. The distribution is now Pa = 1/6, Pb = 2/6, Pc = 3/6, and we divide the range of c, 0.6334 to 0.6667:

+ 0.6667
    Pc = 3/6
+ 0.6501
    Pb = 2/6
+ 0.6390
    Pa = 1/6
+ 0.6334
Then comes the last character, b. Because it is the last character, no further division is needed. The range of b from the previous step is 0.6390 to 0.6501. We pick any number in this range that is easy to write in binary, for example 0.64: its binary expansion cut off after ten bits, 0.1010001111 (exactly 0.6396484375), still lies inside the range. So we output 1010001111, and this is the compressed message. We have completed the simplest possible arithmetic compression.
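The encoding steps above can be reproduced in a few lines of Python. This is a minimal sketch, not production code: it assumes the three-symbol alphabet a, b, c and the adaptive model of the example (every count starts at 1, and a symbol's count is incremented after it has been coded), and it uses exact fractions instead of the fixed-precision integer arithmetic that real coders use. The names `encode`, `cumulative` and `SYMBOLS` are our own.

```python
from fractions import Fraction

SYMBOLS = "abc"  # alphabet of the example, in interval order (a gets the lowest slice)

def cumulative(counts, sym):
    """Return the cumulative count range [low, high) that sym occupies."""
    low = 0
    for s in SYMBOLS:
        if s == sym:
            return low, low + counts[s]
        low += counts[s]
    raise ValueError(f"unknown symbol {sym!r}")

def encode(message):
    """Narrow [0, 1) symbol by symbol with an adaptive order-0 model; return the final interval."""
    counts = {s: 1 for s in SYMBOLS}  # no knowledge yet: every symbol starts with count 1
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        total = sum(counts.values())
        c_low, c_high = cumulative(counts, sym)
        width = high - low
        # both new bounds are computed from the old low before either is reassigned
        low, high = low + width * Fraction(c_low, total), low + width * Fraction(c_high, total)
        counts[sym] += 1  # update the model only after the symbol has been coded
    return low, high

low, high = encode("bccb")
print(low, high, float(low), float(high))
```

With exact arithmetic the final interval is [23/36, 13/20), about 0.6389 to 0.6500; the example's 0.6390 and 0.6501 come from rounding to four decimal places at each step. Both 0.64 and the binary fraction 0.1010001111 lie inside it.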
What about decompression? That is even easier. Before decompressing, we again assume the three characters are equally likely, which gives the first distribution chart above. The decompressor faces the binary stream 1010001111. We first prepend "0." to get the binary fraction 0.1010001111, which is about 0.64 in decimal. We find that 0.64 falls in the range of b in the chart, so we immediately output b and derive the new probability distribution of the three characters. Just as in compression, we divide the interval of b according to the new distribution. In the new division, 0.64 falls in the range of c, so we output c. Continuing in the same way, we output all the characters and complete the decompression. (Note that, for simplicity, we have sidestepped the question of how the decompressor knows when it is finished. In actual applications, …)
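Decoding mirrors encoding: at each step the decoder rebuilds the same division of the current interval, finds the slice that contains the received value, outputs that symbol and updates its counts. This Python sketch assumes the adaptive model of the example (alphabet a, b, c; all counts start at 1; a symbol's count is incremented after it is decoded). Because the text leaves end-of-message detection aside, the sketch is simply told the message length; passing the length is our assumption, not the author's method, and `decode` is a hypothetical name.

```python
from fractions import Fraction

SYMBOLS = "abc"  # alphabet of the example, in interval order

def decode(value, length):
    """Recover `length` symbols from a value inside the final interval."""
    counts = {s: 1 for s in SYMBOLS}  # same starting model as the encoder
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        total = sum(counts.values())
        width = high - low
        c_low = 0
        for s in SYMBOLS:
            c_high = c_low + counts[s]
            s_low = low + width * Fraction(c_low, total)
            s_high = low + width * Fraction(c_high, total)
            if s_low <= value < s_high:  # value falls in this symbol's slice
                out.append(s)
                low, high = s_low, s_high
                counts[s] += 1           # keep the model in step with the encoder
                break
            c_low = c_high
        else:
            raise ValueError("value lies outside the current interval")
    return "".join(out)

print(decode(Fraction(655, 1024), 4))  # bccb
```

Decoding the decimal value 0.64 gives the same result, since it lies in the same slices at every step.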
Now put the tutorial aside for a moment and think it over until you understand the basic principle of arithmetic compression; you will probably run into many new questions along the way.
Is it really close to the limit?
By now you probably understand the idea, and you probably also have plenty of new questions. That's fine. Let's take them one by one.
First, we have repeatedly stressed that arithmetic compression can represent fractions of a binary digit and can therefore approach the entropy limit of lossless compression. Why can't we see that in the description above?
In fact, arithmetic coding represents fractional bits by gathering the fragments into a whole. We indeed cannot represent a single fractional digit exactly, but we can represent many characters together, allowing only a small error in the last digit.
Look again at the simple example. Each input symbol updates the probability table and confines the fraction to be output to a smaller range. This narrowing of the output range is the key. For example, after the first character b, the output range is 0.3333 to 0.6667, so we cannot yet tell whether the first digit of the output will be 3, 4, 5 or 6. In other words, b costs less than one decimal digit (we explain in decimal here; binary works the same way, only the base differs), and at this point not a single digit of output is determined. We keep reading characters. After the third character, c, the output range narrows to 0.6334 to 0.6667. Now we finally know that the first decimal digit of the output is 6, but we still cannot know the second digit; that is, the first three characters cost between one and two decimal digits. After all four characters, the output range is 0.6390 to 0.6501, and the second digit is still undetermined (it could be 3, 4 or 5). So the four characters together need only one-point-something decimal digits, and the best we can do is output two decimal digits, 0.64. In this way we output the entire message with less than one decimal digit of overhead, which is very close to its entropy. (Note: to fit in with the later lessons, this example uses a zero-order adaptive model, so the result still falls somewhat short of the true entropy.)
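The "one-point-something digits" can be computed directly. Each symbol shrinks the interval by the probability the model gave it, so after a prefix the information carried so far is -log10(width) decimal digits, or -log2(width) bits. This Python sketch multiplies the probabilities the adaptive model assigned to b, c, c, b in the example; the list `probs` and the function `cost_table` are our own illustration.

```python
import math
from fractions import Fraction

# Probability the adaptive model gave each symbol of "bccb" in the example:
# b with counts a=1,b=1,c=1; c with a=1,b=2,c=1; c with a=1,b=2,c=2; b with a=1,b=2,c=3.
probs = [Fraction(1, 3), Fraction(1, 4), Fraction(2, 5), Fraction(2, 6)]

def cost_table(probs):
    """For each prefix, return (interval width, decimal digits, bits) of information so far."""
    width = Fraction(1)
    rows = []
    for p in probs:
        width *= p                           # the interval shrinks by the symbol's probability
        digits = -math.log10(float(width))   # information so far, in decimal digits
        rows.append((width, digits, digits * math.log2(10)))
    return rows

for width, digits, bits in cost_table(probs):
    print(f"width={width}  {digits:.3f} decimal digits  {bits:.3f} bits")
```

The widths come out as 1/3, 1/12, 1/30 and 1/90. The three-character prefix carries about 1.48 decimal digits, and the whole message about 1.95 decimal digits (about 6.5 bits), so the two digits of 0.64 waste less than one digit.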
How long is the decimal number?
You may have realized that if the message is long, the fraction to be output becomes very long. How can such a long fraction be held in memory?
In fact, we never need to keep the whole output fraction in memory. As the example shows, encoding steadily reveals information about the fraction. Specifically, once the range has narrowed to 0.6390 to 0.6501, we know the first decimal digit of the output must be 6, so we can output the 6, drop it from memory, and continue compressing within the range 0.390 to 0.501. A very long fraction never has to be held in memory. The same holds in binary: as compression proceeds, we learn whether the next bit is 0 or 1, output that bit, and shorten the fraction held in memory.
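The digit-shifting idea can be sketched as follows: while every number in [low, high) shares the same leading digit, output that digit and rescale the interval. This is a simplified illustration using exact fractions, and `shift_out` is a hypothetical name. Practical coders work with fixed-width integers and must also handle the case where low and high stay on opposite sides of a digit boundary for a long time (underflow), which this sketch ignores.

```python
import math
from fractions import Fraction

def shift_out(low, high, base=10):
    """Emit the leading digits shared by every number in [low, high).

    Returns (digits, low, high), where the rescaled interval keeps only the
    still-undecided part, so the values held in memory never grow longer.
    """
    digits = []
    while True:
        first = math.floor(low * base)     # leading digit of the smallest value in range
        last = math.ceil(high * base) - 1  # leading digit of the largest value (high is excluded)
        if first != last:
            return digits, low, high
        digits.append(first)
        low, high = low * base - first, high * base - first

digits, low, high = shift_out(Fraction(23, 36), Fraction(13, 20))
print(digits, low, high)  # [6] 7/18 1/2
```

Applied to the exact final interval of the example, [23/36, 13/20), the sketch outputs the digit 6 and continues with [7/18, 1/2), about 0.3889 to 0.5 (the text's 0.390 to 0.501 comes from the rounded bounds). With base=2 it emits the bits 1, 0, 1, 0, 0, the beginning of 1010001111.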
How do we implement a static model?
The simple example above uses an adaptive model. How would we implement a static model? It is actually very simple. The message bccb contains only two distinct characters, with probabilities Pb = 0.5 and Pc = 0.5. We never update this distribution during compression; every interval is divided according to it, which here means splitting each interval in half. The compression process can then be summarized as:
Step                  Output range lower limit   Output range upper limit
Before compression    0.0                        1.0
Input b               0.0                        0.5
Input c               0.25                       0.5
Input c               0.375                      0.5
Input b               0.375                      0.4375
The final output range is 0.375 to 0.4375, so not even the first decimal digit is determined (it could be 3 or 4). In other words, the whole message needs less than one decimal digit. If we carry out the process in binary, we find that we come very close to the entropy of this message (some readers may already have worked out that its entropy is 4 bits).
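The table can be reproduced with a small static-model encoder. In this Python sketch the probabilities are fixed in advance (Pb = Pc = 1/2, as above) and never updated; `encode_static` is a hypothetical name, and exact fractions stand in for real fixed-precision arithmetic.

```python
import math
from fractions import Fraction

def encode_static(message, probs):
    """Arithmetic-code message with a fixed model; probs maps symbol -> probability.

    Returns the interval before compression and after each symbol.
    """
    symbols = sorted(probs)  # fixed interval order: b below c
    low, high = Fraction(0), Fraction(1)
    steps = [(low, high)]
    for sym in message:
        width = high - low
        # total probability of the symbols ordered before sym
        c_low = sum((probs[s] for s in symbols[:symbols.index(sym)]), Fraction(0))
        low, high = low + width * c_low, low + width * (c_low + probs[sym])
        steps.append((low, high))
    return steps

probs = {"b": Fraction(1, 2), "c": Fraction(1, 2)}
steps = encode_static("bccb", probs)
for low, high in steps:
    print(float(low), float(high))
low, high = steps[-1]
print("bits needed:", math.log2(1 / (high - low)))
```

The final width is 0.4375 - 0.375 = 1/16, and log2(16) = 4, matching the 4-bit entropy mentioned above.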
Why use adaptive models?
Since we can use a static model to get close to the entropy value, why do we need to adopt an adaptive model?
The static model cannot adapt to the variety of information. First, a distribution like the one obtained above cannot be used for every message to be compressed. To decompress correctly, we must spend some space storing the distribution computed by the static model, and the space taken by the model moves us away from the entropy. Second, a static model needs a pass over the message to gather character statistics before compression. This pass takes a lot of time and slows down arithmetic compression.
Most importantly, for a long message, a static model computes the probability of a symbol over the whole message, whereas an adaptive model can compute the probability of a symbol within a particular part of the message, or relative to a particular context. In other words, the distribution obtained from an adaptive model is more useful for compression (one could say that a context-based adaptive model works with sharper, higher probabilities, so its total entropy is lower). A good context-based adaptive model compresses far better than a static model.
Order of Adaptive Model
We usually use the term "order" to distinguish adaptive models. The example at the beginning of this chapter uses a zero-order adaptive model: it counts how often each symbol appears in the input without considering any context.
If we instead count how often a symbol appears right after a specific symbol, the model becomes an order-1 context adaptive model. Suppose we are encoding English text. We have already encoded 10000 characters, the character just encoded is t, and the next character to encode is h. So far, the first 10000 characters contain 113 occurrences of t, 47 of which are followed by h. So the frequency of h after t is 47/113, and encoding h with this frequency takes -log2(47/113) ≈ 1.266 bits.
Compare this with the zero-order adaptive model: if h appeared 82 times in the first 10000 characters, its probability is 82/10000, and encoding h with that probability takes -log2(82/10000) ≈ 6.930 bits. The benefit of taking context into account is obvious.
We can push this advantage further. For example, if the two characters preceding h are g and t, and h has followed "gt" 80% of the time, we can encode h with only -log2(0.8) ≈ 0.322 bits. A model of this kind is called an order-2 context adaptive model.
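The counts above translate into code lengths through -log2(count / total). The Python sketch below also shows one way an order-1 model might gather its statistics: for each character, it counts which characters have followed it. The helper names and the sample text are ours, chosen for illustration.

```python
import math
from collections import Counter, defaultdict

def bits(count, total):
    """Cost in bits of coding an event seen `count` times out of `total`."""
    return math.log2(total / count)

def order1_counts(text):
    """For every character, count which characters have followed it."""
    follow = defaultdict(Counter)
    for prev, cur in zip(text, text[1:]):
        follow[prev][cur] += 1
    return follow

# Numbers from the text: h followed t 47 times out of 113; h appeared 82 times in 10000 characters.
print(f"order 1: {bits(47, 113):.3f} bits, order 0: {bits(82, 10000):.3f} bits")

follow = order1_counts("that then this")
print(follow["t"])  # which characters followed 't' in this sample
```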
Ideally we would use an order-3 adaptive model; combined with arithmetic coding, it compresses information to an astonishing degree. The space and time needed by still higher-order models are, at least for now, unacceptable, and most applications of arithmetic compression use order-2 or order-3 adaptive models.
Functions of escape codes
An arithmetic coder with an adaptive model must deal with contexts that have never occurred before. For example, in an order-1 context model there are 256 × 256 = 65536 combinations of a context and a following character whose probabilities may be needed, because any of the characters 0 to 255 can follow any of the characters 0 to 255. When we meet a context that has never occurred (for example, we have just encoded b and now want to encode d, but d has never followed b before), how do we determine the character's probability?
A simple approach is to give every possible context a count of 1 before compression starts. Then, if the pair bd appears during compression, we treat d as having followed b once, and it can be encoded with a valid probability. The problem is that every character starts with a very low frequency in every context. In the order-1 model, for example, each of the 256 characters in each context starts with the same artificial count of 1, a frequency of 1/256, so at the start of compression every character costs a full 8 bits, no better than storing it uncompressed. Compression improves only gradually, as frequent characters come to occupy a large share of the frequency distribution. For order-2 or order-3 models the situation is even worse: we waste a great deal of space on the vast majority of contexts that never occur.
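The slow start is easy to quantify. Assuming a single context whose 256 counts all start at 1, and one symbol that keeps recurring in it, this Python sketch (hypothetical name `cost_of_repeats`) lists the cost in bits of each occurrence.

```python
import math

def cost_of_repeats(alphabet_size, repeats):
    """Bits spent coding the same symbol `repeats` times in a context whose counts all start at 1."""
    symbol_count, total = 1, alphabet_size
    costs = []
    for _ in range(repeats):
        costs.append(math.log2(total / symbol_count))
        symbol_count += 1  # the symbol was just seen once more
        total += 1
    return costs

costs = cost_of_repeats(256, 50)
print([round(c, 2) for c in costs[:3]], round(costs[-1], 2))
```

The first occurrence costs the full 8 bits, and even the 50th still costs about 2.6 bits, although the symbol has occurred every single time. An order-1 model pays this start-up cost separately in each of its 256 contexts.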
We solve this problem by introducing an "escape code". An escape code is a special mark mixed into the compressed data stream. It tells the decompressor that the next symbol has never appeared in the current context and must be encoded with a lower-order context.
For example, in an order-3 context model, we have just encoded "ght" and the next character to encode is a. If a has never followed "ght" before, the compressor outputs an escape code and checks the order-2 context table for the number of times a has followed "ht". If a has followed "ht", it is encoded with the probability from the order-2 table. Otherwise another escape code is output and the order-1 table (context "t") is checked. If a is still not found there, another escape code is output and the compressor falls back to the lowest, order-0 table to see whether a has appeared at all. If a has never appeared, it goes to a special "escape" context table that contains all 256 symbols from 0 to 255, each with a count of 1 that is never updated. Any symbol that does not appear in a higher-order context can fall back to this table and be encoded with frequency 1/256.
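The fallback chain can be traced with a short Python sketch. It assumes contexts are strings of preceding characters and records only which table codes the symbol and where escapes are emitted. It does not model how the escape code's own probability is assigned: the text leaves that open, and PPM-style coders handle it in different ways. `build_tables` and `coding_steps` are hypothetical names.

```python
from collections import Counter, defaultdict

def build_tables(history, max_order):
    """tables[k][context] counts which symbols followed each order-k context in history."""
    tables = [defaultdict(Counter) for _ in range(max_order + 1)]
    for i, sym in enumerate(history):
        for k in range(min(i, max_order) + 1):
            tables[k][history[i - k:i]][sym] += 1
    return tables

def coding_steps(tables, history, sym):
    """List the events an escape-based coder would emit for sym, highest order first."""
    steps = []
    for k in range(len(tables) - 1, -1, -1):
        if k > len(history):
            continue  # not enough history for this order yet
        ctx = history[len(history) - k:]
        counts = tables[k].get(ctx, Counter())
        if counts[sym] > 0:
            steps.append((k, ctx, "symbol", counts[sym], sum(counts.values())))
            return steps
        steps.append((k, ctx, "escape"))
    # Last resort: the fixed table of all 256 byte values, each with count 1.
    steps.append((-1, "", "symbol", 1, 256))
    return steps

history = "abcab"
tables = build_tables(history, max_order=2)
for sym in "caz":
    print(sym, coding_steps(tables, history, sym))
```

For the history "abcab", the symbol c is found directly in the order-2 context "ab"; a needs two escapes before the order-0 table codes it with frequency 2/5; and an unseen symbol such as z escapes three times and ends in the fixed 1/256 table.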
Escape codes free us from the trouble of contexts that have never occurred. They let the model adjust quickly to the best state as the input data changes, and quickly reduce the number of bits needed to encode high-probability symbols.
Storage space problems
When implementing high-order context models for arithmetic coding, memory demand is a very tricky problem. We must keep counts for every context seen so far, and a high-order model can have an enormous number of distinct contexts, so the design of the data structures directly determines whether the implementation succeeds.
For an order-1 context model, counting occurrences in an array is feasible. For order-2 or order-3 models, however, the array size grows exponentially (an order-2 model needs 256^3 ≈ 16.8 million counters and an order-3 model 256^4 ≈ 4.3 billion), and the memory of current computers cannot meet the requirement.
A clever solution is to store every context that has occurred in a tree. A higher-order context always builds on a lower-order one: we store the order-0 context table in an array, each element of which holds a pointer to the corresponding order-1 context table; each order-1 table holds pointers to order-2 tables, and so on, forming a context tree. Only contexts that have actually occurred get nodes; contexts that never occur take no memory. Nor does each context table need counts for all 256 characters: only characters that have actually followed the context get a count. This keeps memory use to a minimum.
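Here is a minimal Python sketch of such a context tree, with dictionaries standing in for the arrays and pointers described above. It is our illustration of the idea, not a transcription of ArithN's C data structures; `ContextNode`, `update` and `count_nodes` are hypothetical names.

```python
class ContextNode:
    """One context table: counts of the symbols that followed this context,
    plus links to the longer contexts that extend it."""

    def __init__(self):
        self.counts = {}    # symbol -> number of times it followed this context
        self.children = {}  # symbol -> ContextNode for this context extended by that symbol

def update(root, history, sym, max_order):
    """Record that sym followed each of the last 0..max_order symbols of history."""
    for k in range(min(max_order, len(history)) + 1):
        node = root
        for s in history[len(history) - k:]:  # walk down the tree along the context
            child = node.children.get(s)
            if child is None:                 # allocate nodes only for contexts that occur
                child = node.children[s] = ContextNode()
            node = child
        node.counts[sym] = node.counts.get(sym, 0) + 1

def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.children.values())

root = ContextNode()
text = "abcab"
for i, sym in enumerate(text):
    update(root, text[:i], sym, max_order=2)
print(count_nodes(root))
```

After "abcab" with max_order=2, the tree holds 7 tables (the root, a, b, c, ab, bc, ca), whereas full arrays for a three-letter alphabet would need 1 + 3 + 9 = 13; the gap widens dramatically for 256 symbols and higher orders.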
Resources
For the specific design and implementation of arithmetic compression, see the example program below.
ArithN was provided by Mark Nelson of the League for Programming Freedom and compiled and debugged in Visual C++ 5.0 by Wang Benben.
ArithN contains the Visual C++ projects arithn.dsp and arithnexpand.dsp, which build the compression and decompression programs an.exe and ane.exe respectively.
ArithN is a general-purpose compression and decompression program; the order N of its context-adaptive arithmetic coding can be specified on the command line. Because it serves as a tutorial example, some places were intentionally left unoptimized for the sake of clarity.
All source code is packaged in ArithN.zip (…Net/wangyg/Tech/Benben/src/ArithN.zip).
This article is from the CSDN blog; if you repost it, please credit the source: http://blog.csdn.net/pyramide/archive/2010/01/12/5172334.aspx