Arithmetic Coding principle


http://blog.csdn.net/adam_tu/article/details/7696455

As early as 1948, Shannon proposed ordering the source symbols by descending probability of occurrence and using the binary value of the cumulative probability of a symbol sequence as the source code, and he proved the superiority of this approach theoretically. In 1960, Peter Elias found that sorting was unnecessary as long as the encoder and decoder used the same symbol order, and he proposed the concept of arithmetic coding. Elias did not publish his findings because he knew that arithmetic coding, though mathematically sound, could not be implemented in practice at the time. In 1976, R. Pasco and J. Rissanen independently implemented finite-precision arithmetic coding using fixed-length registers. In 1979, Rissanen and G. G. Langdon systematized arithmetic coding, and in 1981 binary arithmetic coding was implemented. In 1987, Witten et al. published a practical arithmetic coding program known as CACM87 (later used in the ITU-T H.263 video compression standard). Over the same period, IBM published the well-known Q-coder (later used in the JPEG and JBIG image compression standards). Since then, arithmetic coding has quickly gained wide attention.

The basic principle of arithmetic coding is that the encoded message is represented by an interval of real numbers between 0 and 1. The longer the message, the smaller the interval that represents it, and the more bits are needed to specify that interval.
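The following minimal sketch makes the link between interval width and code length concrete. It searches for the shortest binary fraction 0.b1b2...bk that lies inside a given interval [low, high). The function name `shortest_code` is hypothetical. It uses exact `fractions.Fraction` arithmetic, which is an illustrative choice and not how practical coders work.

```python
from fractions import Fraction
from math import ceil

def shortest_code(low: Fraction, high: Fraction) -> str:
    """Return the shortest bit string b1..bk with low <= 0.b1..bk (binary) < high."""
    if not 0 <= low < high <= 1:
        raise ValueError("need 0 <= low < high <= 1")
    k = 1
    while True:
        scale = 2 ** k
        numerator = ceil(low * scale)           # smallest multiple of 2**-k that is >= low
        if Fraction(numerator, scale) < high:
            return format(numerator, f"0{k}b")  # k binary digits after the point
        k += 1

# A wide interval needs few bits; a narrow one (the final interval of Example 1) needs many more.
for low, high in [(Fraction(1, 2), Fraction(7, 10)),
                  (Fraction("0.5143876"), Fraction("0.514402"))]:
    code = shortest_code(low, high)
    print(f"width {float(high - low):.7f} -> {len(code)} bits: 0.{code}")
```

The code length grows roughly like -log2(high - low), so each halving of the interval costs about one more bit.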

Arithmetic coding uses two basic parameters: the probability of each symbol and its encoding interval. The probabilities of the source symbols determine the compression efficiency. They also determine the intervals that the source symbols occupy during encoding, and all of these intervals lie between 0 and 1. The interval reached during encoding determines the compressed output.
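The following minimal sketch shows how the probabilities fix each symbol's initial interval. It assumes that the probabilities are given as exact fractions. It also assumes that encoder and decoder agree on the symbol order, which is taken here to be the dictionary's insertion order. The helper name `initial_intervals` is hypothetical. With the probabilities of Example 1, it should yield the intervals listed in Table 03-04-1.

```python
from fractions import Fraction

def initial_intervals(probs: dict) -> dict:
    """Give each symbol a half-open interval [low, high) in [0, 1) whose width is
    its probability; the dictionary order is the symbol order both sides agree on."""
    if sum(probs.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    intervals, low = {}, Fraction(0)
    for symbol, p in probs.items():
        intervals[symbol] = (low, low + p)
        low += p
    return intervals

probs = {"A": Fraction("0.1"), "B": Fraction("0.4"), "C": Fraction("0.2"), "D": Fraction("0.3")}
for symbol, (low, high) in initial_intervals(probs).items():
    print(symbol, f"[{float(low)}, {float(high)})")
```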

The arithmetic encoding steps for a given sequence of events are as follows:

(1) The encoder starts with the "current interval" [L, H) set to [0, 1).

(2) For each event, the encoder performs steps (a) and (b):

(a) The encoder divides the "current interval" into sub-intervals, one for each possible event.

(b) The size of each sub-interval is proportional to the probability that the corresponding event occurs next. The encoder selects the sub-interval that corresponds to the event that actually occurs next and makes it the new "current interval".

(3) The lower bound of the final "current interval" is output as the arithmetic code of the given sequence of events.
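Steps (a) and (b) for a single event can be sketched as follows. The sketch assumes the symbol intervals of Example 1 (Table 03-04-1) and exact rational arithmetic. `subdivide` is a hypothetical helper name, and the loop encodes only the first two events, C and A.

```python
from fractions import Fraction

# Example 1's initial intervals: symbol -> [low, high) inside [0, 1)
RANGES = {"A": (Fraction(0), Fraction(1, 10)),
          "B": (Fraction(1, 10), Fraction(1, 2)),
          "C": (Fraction(1, 2), Fraction(7, 10)),
          "D": (Fraction(7, 10), Fraction(1))}

def subdivide(low: Fraction, high: Fraction) -> dict:
    """Step (a): split the current interval [low, high) into one sub-interval per
    event, each with width proportional to that event's probability."""
    width = high - low
    return {s: (low + width * lo, low + width * hi) for s, (lo, hi) in RANGES.items()}

current = (Fraction(0), Fraction(1))        # step (1)
for event in "CA":                          # step (2), here for two events only
    current = subdivide(*current)[event]    # step (b): keep the occurring event's sub-interval
print(float(current[0]), float(current[1])) # step (3) would output current[0]
```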

Let Low and High denote the lower and upper bounds of the current interval, and let CodeRange denote the length of the current interval. LowRange(symbol) and HighRange(symbol) denote the lower and upper bounds of the initial interval assigned to the event symbol. The procedure above can be described in pseudo-code as follows:

Set Low to 0
Set High to 1
While there are input symbols do
    Take a symbol
    CodeRange = High - Low
    High = Low + CodeRange * HighRange(symbol)
    Low = Low + CodeRange * LowRange(symbol)
End of While
Output Low
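The pseudo-code above translates almost line for line into Python. This sketch assumes the initial intervals of Example 1. It uses `fractions.Fraction` so that no precision is lost, and it therefore ignores the finite-precision issues discussed later.

```python
from fractions import Fraction

# LowRange(symbol) and HighRange(symbol) for Example 1
LOW_RANGE = {"A": Fraction(0), "B": Fraction(1, 10), "C": Fraction(1, 2), "D": Fraction(7, 10)}
HIGH_RANGE = {"A": Fraction(1, 10), "B": Fraction(1, 2), "C": Fraction(7, 10), "D": Fraction(1)}

def encode(symbols: str) -> Fraction:
    low, high = Fraction(0), Fraction(1)              # Set Low to 0, Set High to 1
    for symbol in symbols:                            # While there are input symbols
        code_range = high - low
        high = low + code_range * HIGH_RANGE[symbol]  # uses the not-yet-updated Low
        low = low + code_range * LOW_RANGE[symbol]
    return low                                        # Output Low

print(float(encode("CADACDB")))  # prints 0.5143876
```

The order of the two updates matters. High is computed first so that both updates use the old value of Low.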

The decoding process can be described in pseudo-code as follows:

Get encoded number
Do
    Find symbol whose range straddles the encoded number
    Output the symbol
    Range = symbol.HighValue - symbol.LowValue
    Subtract symbol.LowValue from encoded number
    Divide encoded number by Range
Until no more symbols
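A minimal Python version of the decoding pseudo-code is shown below. Like the example that follows, it assumes that the decoder already knows the message length. It also assumes the symbol ranges of Example 1 and exact rational arithmetic, so the repeated subtraction and division lose no precision.

```python
from fractions import Fraction

# Example 1's symbol ranges: symbol -> (LowValue, HighValue)
RANGES = {"A": (Fraction(0), Fraction(1, 10)),
          "B": (Fraction(1, 10), Fraction(1, 2)),
          "C": (Fraction(1, 2), Fraction(7, 10)),
          "D": (Fraction(7, 10), Fraction(1))}

def decode(encoded: Fraction, length: int) -> str:
    """Decode `length` symbols; the length is assumed known to the decoder."""
    out = []
    for _ in range(length):
        # Find the symbol whose range straddles the encoded number
        symbol = next(s for s, (lo, hi) in RANGES.items() if lo <= encoded < hi)
        out.append(symbol)                      # Output the symbol
        lo, hi = RANGES[symbol]
        encoded = (encoded - lo) / (hi - lo)    # subtract LowValue, divide by Range
    return "".join(out)

print(decode(Fraction("0.51439"), 7))  # prints CADACDB
```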

The encoding and decoding processes of an arithmetic coder can be illustrated with an example.

Example 1: Suppose the source symbols are {A, B, C, D} with probabilities {0.1, 0.4, 0.2, 0.3}. According to these probabilities, the interval [0, 1) can be divided into four sub-intervals: [0, 0.1), [0.1, 0.5), [0.5, 0.7) and [0.7, 1). Here [x, y) denotes a half-open interval that contains x but not y. This information is summarized in Table 03-04-1.

Table 03-04-1 Source symbols, probabilities and initial encoding intervals

Symbol   Probability   Initial encoding interval
A        0.1           [0, 0.1)
B        0.4           [0.1, 0.5)
C        0.2           [0.5, 0.7)
D        0.3           [0.7, 1)

Suppose the input message is the symbol sequence C A D A C D B. The first symbol to be encoded is C, whose encoding interval is [0.5, 0.7). The encoding interval of the second symbol, A, is [0, 0.1), so the first tenth of [0.5, 0.7) becomes the new interval [0.5, 0.52). Continuing in the same way, encoding the third symbol, D, gives the new interval [0.514, 0.52). Encoding the fourth symbol, A, gives [0.514, 0.5146), and so on. The encoded output of the message can be any number in the final interval. The entire encoding process is shown in Figure 03-04-1.

Figure 03-04-1 Arithmetic Coding Process Example

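The complete encoding and decoding processes for this example are shown in Table 03-04-2 and Table 03-04-3, respectively. The decoding table uses the number 0.51439. Like 0.5143876, it lies in the final interval [0.5143876, 0.514402), so it decodes to the same message.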

Table 03-04-2 Encoding process

Step   Input symbol   Encoding interval        Encoding decision
1      C              [0.5, 0.7)               The interval of C is [0.5, 0.7)
2      A              [0.5, 0.52)              First 1/10 of [0.5, 0.7)
3      D              [0.514, 0.52)            Last 3/10 of [0.5, 0.52)
4      A              [0.514, 0.5146)          First 1/10 of [0.514, 0.52)
5      C              [0.5143, 0.51442)        6th and 7th tenths of [0.514, 0.5146)
6      D              [0.514384, 0.51442)      Last 3/10 of [0.5143, 0.51442)
7      B              [0.5143876, 0.514402)    2nd through 5th tenths of [0.514384, 0.51442)
8      Select a number from [0.5143876, 0.514402) as output: 0.5143876
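To check Table 03-04-2, the following sketch recomputes each encoding interval with exact rational arithmetic. The helper `encode_trace` is hypothetical, and the symbol ranges are those of Table 03-04-1.

```python
from fractions import Fraction

# Table 03-04-1: symbol -> [low, high)
RANGES = {"A": (Fraction(0), Fraction(1, 10)),
          "B": (Fraction(1, 10), Fraction(1, 2)),
          "C": (Fraction(1, 2), Fraction(7, 10)),
          "D": (Fraction(7, 10), Fraction(1))}

def encode_trace(message: str) -> list:
    """Return (step, symbol, low, high) after each symbol is encoded."""
    low, high = Fraction(0), Fraction(1)
    rows = []
    for step, symbol in enumerate(message, start=1):
        width = high - low
        lo, hi = RANGES[symbol]
        low, high = low + width * lo, low + width * hi   # both use the old low
        rows.append((step, symbol, low, high))
    return rows

for step, symbol, low, high in encode_trace("CADACDB"):
    print(f"{step}  {symbol}  [{float(low)}, {float(high)})")
```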

Table 03-04-3 Decoding process

Step   Interval                 Decoded symbol   Decoding decision
1      [0.5, 0.7)               C                0.51439 lies in [0.5, 0.7)
2      [0.5, 0.52)              A                0.51439 lies in the first 1/10 of [0.5, 0.7)
3      [0.514, 0.52)            D                0.51439 lies in the last 3/10 of [0.5, 0.52)
4      [0.514, 0.5146)          A                0.51439 lies in the first 1/10 of [0.514, 0.52)
5      [0.5143, 0.51442)        C                0.51439 lies in the 6th and 7th tenths of [0.514, 0.5146)
6      [0.514384, 0.51442)      D                0.51439 lies in the last 3/10 of [0.5143, 0.51442)
7      [0.5143876, 0.514402)    B                0.51439 lies in the 2nd through 5th tenths of [0.514384, 0.51442)
8      Decoded message: C A D A C D B
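The decoder in Table 03-04-3 narrows the same intervals as the encoder and checks which sub-interval contains the received number. The sketch below implements that interval-tracking form, which is an alternative to the rescaling loop of the decoding pseudo-code. It again assumes the intervals of Table 03-04-1, exact arithmetic and a known message length. `decode_trace` is a hypothetical name.

```python
from fractions import Fraction

# Table 03-04-1: symbol -> [low, high)
RANGES = {"A": (Fraction(0), Fraction(1, 10)),
          "B": (Fraction(1, 10), Fraction(1, 2)),
          "C": (Fraction(1, 2), Fraction(7, 10)),
          "D": (Fraction(7, 10), Fraction(1))}

def decode_trace(number: Fraction, length: int) -> list:
    """Return (step, symbol, low, high): the sub-interval containing `number` at each step."""
    low, high = Fraction(0), Fraction(1)
    rows = []
    for step in range(1, length + 1):
        width = high - low
        for symbol, (lo, hi) in RANGES.items():
            sub_low, sub_high = low + width * lo, low + width * hi
            if sub_low <= number < sub_high:
                break
        else:
            raise ValueError("number is outside the current interval")
        low, high = sub_low, sub_high
        rows.append((step, symbol, low, high))
    return rows

for step, symbol, low, high in decode_trace(Fraction("0.51439"), 7):
    print(f"{step}  [{float(low)}, {float(high)})  {symbol}")
```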

In the example above, we assume that both the encoder and the decoder know the length of the message, so the decoder does not run indefinitely. In practice, a special terminator symbol is added to the message, and the decoder stops decoding when it sees the terminator.
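The terminator approach can be sketched by adding an extra symbol to the model. The terminator "!" and the probabilities below (0.1, 0.3, 0.2, 0.3 and 0.1) are hypothetical and chosen only for illustration, because the terminator must take some probability away from the other symbols. The encoder appends the terminator, and the decoder loops until it decodes the terminator, so no message length needs to be transmitted.

```python
from fractions import Fraction

# Hypothetical model: Example 1's symbols plus a terminator "!" (probabilities assumed)
RANGES = {"A": (Fraction(0), Fraction(1, 10)),
          "B": (Fraction(1, 10), Fraction(2, 5)),
          "C": (Fraction(2, 5), Fraction(3, 5)),
          "D": (Fraction(3, 5), Fraction(9, 10)),
          "!": (Fraction(9, 10), Fraction(1))}
TERMINATOR = "!"

def encode(message: str) -> Fraction:
    low, high = Fraction(0), Fraction(1)
    for symbol in message + TERMINATOR:         # the terminator is encoded last
        width = high - low
        lo, hi = RANGES[symbol]
        low, high = low + width * lo, low + width * hi
    return low

def decode(number: Fraction) -> str:
    out = []
    while True:                                 # no message length needed
        symbol = next(s for s, (lo, hi) in RANGES.items() if lo <= number < hi)
        if symbol == TERMINATOR:
            return "".join(out)
        out.append(symbol)
        lo, hi = RANGES[symbol]
        number = (number - lo) / (hi - lo)

print(decode(encode("CADACDB")))  # prints CADACDB
```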

There are several problems to be aware of in arithmetic coding:

· A real computer cannot calculate with unlimited precision, so the shrinking interval soon exceeds the precision of the registers, which is an overflow problem. Most machines offer 16-, 32-, or 64-bit precision, so this problem is solved by proportional scaling. Whenever Low and High share the same leading bits, those bits are output and the interval is rescaled.

· The arithmetic encoder produces only one code word for the entire message, a real number in the interval [0, 1). The decoder therefore cannot decode until it has received all the bits that represent this number.

· Arithmetic coding is also sensitive to errors. A single error can cause the entire message to be decoded incorrectly.
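One well-known form of the scaling method is the integer renormalization used in the CACM87 coder of Witten, Neal and Cleary. Low and High are kept in fixed-width integer registers. Whenever both lie in the same half of the range, their shared leading bit is output and the interval is doubled. When the interval straddles the midpoint within the middle half, the bit decision is deferred. The sketch below follows that scheme with 32-bit registers and with Example 1's probabilities as integer counts (1, 4, 2 and 3 out of 10). The register width, the names and the zero-padding of the decoder's input are assumptions of this sketch, not the article's own code.

```python
PRECISION = 32                        # register width in bits (assumed)
TOP = (1 << PRECISION) - 1
HALF = 1 << (PRECISION - 1)
QUARTER = 1 << (PRECISION - 2)

# Example 1 as integer counts out of TOTAL: symbol -> (cumulative low, cumulative high)
CUM = {"A": (0, 1), "B": (1, 5), "C": (5, 7), "D": (7, 10)}
TOTAL = 10

def narrow(low, high, symbol):
    """Shrink the integer interval [low, high] to the symbol's share."""
    span = high - low + 1
    c_lo, c_hi = CUM[symbol]
    return low + span * c_lo // TOTAL, low + span * c_hi // TOTAL - 1

def encode(message):
    low, high, pending, bits = 0, TOP, 0, []

    def emit(bit):
        nonlocal pending
        bits.append(bit)
        bits.extend([1 - bit] * pending)      # resolve deferred middle-half decisions
        pending = 0

    for symbol in message:
        low, high = narrow(low, high, symbol)
        while True:
            if high < HALF:                   # lower half: leading bit is 0
                emit(0)
            elif low >= HALF:                 # upper half: leading bit is 1
                emit(1)
                low, high = low - HALF, high - HALF
            elif low >= QUARTER and high < HALF + QUARTER:
                pending += 1                  # middle half: defer the bit
                low, high = low - QUARTER, high - QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1  # scale the interval by 2
    pending += 1                              # two final bits pick a quarter
    emit(0 if low < QUARTER else 1)           # that lies inside [low, high]
    return bits

def decode(bits, length):
    stream = iter(bits)
    value = 0
    for _ in range(PRECISION):
        value = 2 * value + next(stream, 0)   # pad with zeros past the end
    low, high, out = 0, TOP, []
    for _ in range(length):
        span = high - low + 1
        target = ((value - low + 1) * TOTAL - 1) // span
        symbol = next(s for s, (c_lo, c_hi) in CUM.items() if c_lo <= target < c_hi)
        out.append(symbol)
        low, high = narrow(low, high, symbol)
        while True:
            if high < HALF:
                pass                          # lower half: nothing to subtract
            elif low >= HALF:
                low, high, value = low - HALF, high - HALF, value - HALF
            elif low >= QUARTER and high < HALF + QUARTER:
                low, high, value = low - QUARTER, high - QUARTER, value - QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
            value = 2 * value + next(stream, 0)
    return "".join(out)

bits = encode("CADACDB")
print(len(bits), "bits:", "".join(map(str, bits)))
print(decode(bits, 7))
```

Bits are emitted as soon as they are settled, so the registers never need more than PRECISION bits. The decoder mirrors every scaling step of the encoder.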

Arithmetic coding can be static or adaptive. In static arithmetic coding, the probabilities of the source symbols are fixed. In adaptive arithmetic coding, the probabilities are updated dynamically according to how often each symbol has appeared so far during coding. The process of estimating the source-symbol probabilities during coding is called modeling. Adaptive (dynamic) arithmetic coding is needed because it is difficult and impractical to know the exact source probabilities in advance. When compressing a message, we cannot expect the arithmetic encoder to reach maximum efficiency with fixed probabilities, and the most effective approach is to estimate the probabilities during coding. Dynamic modeling is therefore the key factor in the compression efficiency of the encoder.
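A minimal adaptive model can be sketched with frequency counts that both sides update after every symbol. The sketch assumes that every count starts at 1 and that the alphabet "ABCD" is agreed in advance, and it uses exact fractions instead of a scaled integer coder. The names `symbol_interval`, `encode` and `decode` are hypothetical. Because the decoder applies the same update as the encoder, the two models stay in step without any probabilities being transmitted.

```python
from fractions import Fraction

SYMBOLS = "ABCD"  # alphabet both sides agree on (assumed)

def symbol_interval(counts: dict, symbol: str):
    """[low, high) of `symbol` under the current frequency counts."""
    total = sum(counts.values())
    below = sum(counts[s] for s in SYMBOLS[:SYMBOLS.index(symbol)])
    return Fraction(below, total), Fraction(below + counts[symbol], total)

def encode(message: str) -> Fraction:
    counts = {s: 1 for s in SYMBOLS}        # every symbol starts with count 1
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        lo, hi = symbol_interval(counts, symbol)
        width = high - low
        low, high = low + width * lo, low + width * hi
        counts[symbol] += 1                 # update the model after coding
    return low

def decode(number: Fraction, length: int) -> str:
    counts = {s: 1 for s in SYMBOLS}        # must mirror the encoder's model
    out = []
    for _ in range(length):
        for symbol in SYMBOLS:
            lo, hi = symbol_interval(counts, symbol)
            if lo <= number < hi:
                break
        else:
            raise ValueError("number is outside [0, 1)")
        out.append(symbol)
        number = (number - lo) / (hi - lo)
        counts[symbol] += 1                 # identical update keeps both sides in sync
    return "".join(out)

message = "BBBBBBBBBA"
print(decode(encode(message), len(message)))  # prints BBBBBBBBBA
```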

Arithmetic coding also raises patent issues. Some variants of arithmetic coding described in the JPEG standard are covered by patents owned by IBM, AT&T and Mitsubishi, and legal use of JPEG arithmetic coding required licenses from these companies.
