http://blog.csdn.net/adam_tu/article/details/7696455
As early as 1948, Shannon proposed sorting source symbols in descending order of probability and using the binary value of a symbol sequence's cumulative probability as its code, and demonstrated the optimality of this scheme theoretically. In 1960, Peter Elias discovered that no sorting is needed as long as the encoder and the decoder use the same symbol ordering, which is the essential idea of arithmetic coding. Elias did not publish his discovery, because he knew that arithmetic coding, although mathematically sound, could not be implemented in practice at the time. In 1976, R. Pasco and J. Rissanen independently realized finite-precision arithmetic coding using fixed-length registers. In 1979, Rissanen and G. G. Langdon systematized arithmetic coding, and in 1981 a binary implementation followed. In 1987, Witten and others published a practical arithmetic coding program, CACM87 (later used in ITU-T's H.263 video compression standard). Over the same period, IBM published the well-known Q-coder (later used in the JPEG and JBIG image compression standards). Since then, arithmetic coding has rapidly attracted wide attention.
The basic principle of arithmetic coding is that the encoded message is represented as an interval of real numbers between 0 and 1. The longer the message, the smaller the interval becomes, and the more bits are required to represent that interval.
Arithmetic coding uses two basic parameters: the probability of each symbol and its encoding interval. The probabilities of the source symbols determine the efficiency of the compression, and they also determine the sub-intervals assigned to the source symbols during encoding, all of which are contained within [0, 1). The interval maintained during encoding determines the compressed output of the symbols.
The arithmetic encoding steps for a given sequence of events are as follows:
(1) At the start of encoding, the "current interval" [L, H) is set to [0, 1).
(2) For each event, the encoder performs steps (a) and (b):
(a) The encoder divides the "current interval" into sub-intervals, one for each possible event.
(b) The size of each sub-interval is proportional to the probability that the corresponding event occurs next. The encoder selects the sub-interval of the event that actually occurs next and makes it the new "current interval".
(3) The lower bound of the final "current interval" is the arithmetic code of the given event sequence.
Let Low and High denote the lower and upper bounds of the current interval, CodeRange the length of the current interval, and LowRange(symbol) and HighRange(symbol) the lower and upper bounds of the initial sub-interval assigned to the event symbol. The procedure above can be described in pseudocode as follows:
Set Low to 0
Set High to 1
While there are input symbols do
    Take a symbol
    CodeRange = High - Low
    High = Low + CodeRange * HighRange(symbol)
    Low = Low + CodeRange * LowRange(symbol)
End While
Output Low
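The encoding loop above can be sketched in a few lines of Python. This is a floating-point illustration only (a real coder uses fixed-precision integers); the RANGES table and function name are assumptions for the example, with the sub-intervals taken from Example 1 below.

```python
# Floating-point sketch of the encoding pseudocode above.
# RANGES maps each symbol to its initial [low, high) sub-interval.
RANGES = {"A": (0.0, 0.1), "B": (0.1, 0.5), "C": (0.5, 0.7), "D": (0.7, 1.0)}

def arithmetic_encode(message, ranges=RANGES):
    low, high = 0.0, 1.0
    for symbol in message:
        code_range = high - low
        # High must be computed from the old Low, so update it first.
        high = low + code_range * ranges[symbol][1]
        low = low + code_range * ranges[symbol][0]
    return low  # any number in [low, high) identifies the message

print(arithmetic_encode("CADACDB"))  # approximately 0.5143876
```

Note that High is updated before Low, exactly as in the pseudocode, so that both updates use the old value of Low.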
The decoding process of arithmetic coding is described in pseudocode as follows:
Get encoded number
Do
    Find symbol whose range straddles the encoded number
    Output the symbol
    Range = symbol.HighValue - symbol.LowValue
    Subtract symbol.LowValue from encoded number
    Divide encoded number by Range
Until no more symbols
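The decoding loop can likewise be sketched in Python. Here the number of symbols is assumed to be known in advance (a terminator symbol would be used in practice, as discussed later in the text); the RANGES table and function name are illustrative assumptions, with the sub-intervals taken from Example 1 below.

```python
# Floating-point sketch of the decoding pseudocode above.
RANGES = {"A": (0.0, 0.1), "B": (0.1, 0.5), "C": (0.5, 0.7), "D": (0.7, 1.0)}

def arithmetic_decode(code, length, ranges=RANGES):
    out = []
    for _ in range(length):
        # Find the symbol whose sub-interval straddles the encoded number.
        for symbol, (low, high) in ranges.items():
            if low <= code < high:
                out.append(symbol)
                # Remove this symbol's contribution and rescale to [0, 1).
                code = (code - low) / (high - low)
                break
    return "".join(out)

print(arithmetic_decode(0.51439, 7))  # CADACDB
```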
The encoding and decoding process of an arithmetic coder can be illustrated with an example.
Example 1: Suppose the source symbols are {A, B, C, D} with probabilities {0.1, 0.4, 0.2, 0.3}. These probabilities divide the interval [0, 1) into four sub-intervals: [0, 0.1), [0.1, 0.5), [0.5, 0.7), [0.7, 1), where [x, y) denotes a half-open interval that contains x but not y. This information is summarized in Table 03-04-1.
Table 03-04-1 Source symbols, probabilities and initial encoding intervals

Symbol | Probability | Initial encoding interval
A      | 0.1         | [0, 0.1)
B      | 0.4         | [0.1, 0.5)
C      | 0.2         | [0.5, 0.7)
D      | 0.3         | [0.7, 1)
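The sub-intervals in Table 03-04-1 follow directly from the probabilities by accumulating them in a fixed symbol order that encoder and decoder agree on. A minimal sketch (the function name is an assumption):

```python
# Build each symbol's half-open sub-interval [low, high) by accumulating
# the probabilities in a fixed, agreed symbol order.
def build_ranges(probs):
    ranges, cumulative = {}, 0.0
    for symbol, p in probs.items():
        ranges[symbol] = (cumulative, cumulative + p)
        cumulative += p
    return ranges

print(build_ranges({"A": 0.1, "B": 0.4, "C": 0.2, "D": 0.3}))
```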
Suppose the input message sequence is: C A D A C D B. The first symbol to be encoded is C, whose sub-interval is [0.5, 0.7), so this becomes the current interval. Since the sub-interval of the second symbol A is [0, 0.1), the encoder takes the first one-tenth of [0.5, 0.7) as the new interval [0.5, 0.52). Continuing in this way, encoding the 3rd symbol D gives the new interval [0.514, 0.52), and encoding the 4th symbol A gives [0.514, 0.5146), and so on. The encoded output of the message can be any number in the final interval. The whole encoding process is shown in Figure 03-04-1.
Figure 03-04-1 Arithmetic Coding Process Example
The complete encoding and decoding processes of this example are shown in Table 03-04-2 and Table 03-04-3, respectively.
Table 03-04-2 Encoding process

Step | Input symbol | Encoding interval     | Judgment
1    | C            | [0.5, 0.7)            | C's initial sub-interval is [0.5, 0.7)
2    | A            | [0.5, 0.52)           | the first 1/10 of [0.5, 0.7)
3    | D            | [0.514, 0.52)         | the last 3/10 of [0.5, 0.52)
4    | A            | [0.514, 0.5146)       | the first 1/10 of [0.514, 0.52)
5    | C            | [0.5143, 0.51442)     | the sub-range from 5/10 to 7/10 of [0.514, 0.5146)
6    | D            | [0.514384, 0.51442)   | the last 3/10 of [0.5143, 0.51442)
7    | B            | [0.5143876, 0.514402) | the sub-range from 1/10 to 5/10 of [0.514384, 0.51442)
8    |              |                       | select a number from [0.5143876, 0.514402) as the output: 0.5143876
Table 03-04-3 Decoding process

Step | Current interval       | Decoded symbol | Judgment
1    | [0.5, 0.7)             | C              | 0.51439 lies in C's sub-interval [0.5, 0.7)
2    | [0.5, 0.52)            | A              | 0.51439 lies in the first 1/10 of [0.5, 0.7)
3    | [0.514, 0.52)          | D              | 0.51439 lies in the last 3/10 of [0.5, 0.52)
4    | [0.514, 0.5146)        | A              | 0.51439 lies in the first 1/10 of [0.514, 0.52)
5    | [0.5143, 0.51442)      | C              | 0.51439 lies in the sub-range from 5/10 to 7/10 of [0.514, 0.5146)
6    | [0.514384, 0.51442)    | D              | 0.51439 lies in the last 3/10 of [0.5143, 0.51442)
7    | [0.5143876, 0.514402)  | B              | 0.51439 lies in the sub-range from 1/10 to 5/10 of [0.514384, 0.51442)
8    |                        |                | decoded message: C A D A C D B
In the example above, we assumed that both the encoder and the decoder know the length of the message, so the decoder does not run indefinitely. In practice, a special terminator symbol is added, and the decoder stops decoding when it sees the terminator.
There are several problems to be aware of in arithmetic coding:
· Since the precision of a real computer is finite, an obvious problem is that the interval arithmetic can overflow or underflow. Because most machines provide 16-, 32-, or 64-bit precision, this problem can be solved by proportional scaling: renormalizing the interval as soon as its leading digits are determined.
· An arithmetic encoder produces only one code word for the entire message, a real number in the interval [0, 1), so the decoder cannot begin decoding until it has received all the bits representing that number.
· Arithmetic coding is also sensitive to errors: a single corrupted bit can cause the entire message to be decoded incorrectly.
Arithmetic coding can be static or adaptive. In static arithmetic coding, the probabilities of the source symbols are fixed. In adaptive arithmetic coding, the probability of each source symbol is dynamically updated according to the frequency with which it has appeared so far; the process of estimating the source symbol probabilities during coding is called modeling. Adaptive arithmetic coding is needed because it is usually difficult or impractical to know the exact source probabilities in advance. Without them, an arithmetic coder cannot reach its maximum efficiency, and the best that can be done is to estimate the probabilities during coding. Dynamic modeling is therefore the key factor determining the compression efficiency of the coder.
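A sketch of such adaptive modeling: both encoder and decoder keep symbol counts and update them identically after each symbol, so their probability estimates stay synchronized without knowing the true source statistics. The class name and the count-based estimator are illustrative assumptions.

```python
# Adaptive frequency model: probabilities are re-estimated from counts
# after every symbol. Encoder and decoder apply the same updates, so
# they always derive the same sub-intervals.
class AdaptiveModel:
    def __init__(self, symbols):
        self.counts = {s: 1 for s in symbols}  # start from uniform counts

    def ranges(self):
        # Current [low, high) sub-intervals from the count estimates.
        total = sum(self.counts.values())
        result, cumulative = {}, 0.0
        for symbol, count in self.counts.items():
            p = count / total
            result[symbol] = (cumulative, cumulative + p)
            cumulative += p
        return result

    def update(self, symbol):
        self.counts[symbol] += 1  # called after each encoded/decoded symbol

model = AdaptiveModel("ABCD")
print(model.ranges()["A"])  # (0.0, 0.25) before any updates
model.update("A")
print(model.ranges()["A"])  # A's estimated probability grows to 2/5
```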
In addition, there are patent issues in the use of arithmetic coding. Some variants of the arithmetic coding described in the JPEG standard are covered by patents owned by IBM, AT&T and Mitsubishi, and a license from these companies is required to use JPEG arithmetic coding legally.