First, the definition of arithmetic coding
Arithmetic coding is a non-block coding algorithm. It works on the whole sequence and encodes it recursively, one symbol after another. Instead of mapping each single source symbol to a code word, it maps the entire input sequence to a small subinterval of [0, 1) on the real axis, and a representative binary fraction from that subinterval is selected as the actual encoded output.
Unlike the Huffman code, the arithmetic code is a non-block code. It starts from the full sequence and takes the relationship between symbols into account when encoding. Arithmetic coding relies on the concept of cumulative probabilities. The core of the method is to calculate the interval that corresponds to the sequence of input source symbols. Because every input symbol is processed with multiplication and addition operations during encoding, the method is called arithmetic coding.
Second, encoding with the arithmetic code
Let the input symbol string s be taken from the symbol set S = {a1, a2, a3, ..., am}, with symbol probabilities p(ai) = pi taken from {p1, p2, p3, ..., pm}. When s is followed by the symbol ai, it extends into the symbol string sai. Writing C(s) for the lower end of the interval and A(s) for its width, the recursive relations of arithmetic coding are:
1) Code word update: C(sai) = C(s) + P(ai) A(s)
2) Interval update: A(sai) = p(ai) A(s)
Symbol cumulative probability: P(ai) = p(a1) + p(a2) + ... + p(a(i-1)), with P(a1) = 0
Initial: C(empty string) = 0, A(empty string) = 1
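The recursion can be sketched in C as a single update step. This is only a minimal sketch: the names ArithState, arith_init and arith_update are hypothetical and not taken from the author's code, it assumes the usual starting interval [0, 1), and it uses double precision, which is only adequate for short messages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state: low is C(s), the lower end of the interval;
 * width is A(s), the width of the interval. */
typedef struct {
    double low;
    double width;
} ArithState;

/* Assumed initial condition: C(empty) = 0, A(empty) = 1. */
static ArithState arith_init(void)
{
    ArithState st = { 0.0, 1.0 };
    return st;
}

/* One step of the recursion for the symbol with 0-based index i.
 * p[]   holds the symbol probabilities p(a_i).
 * cum[] holds the cumulative probabilities P(a_i) = p[0] + ... + p[i-1].
 * m     is the number of symbols in the alphabet. */
static ArithState arith_update(ArithState st, const double *p,
                               const double *cum, size_t m, size_t i)
{
    assert(i < m);
    /* The code word update must use the old width, so it comes first. */
    st.low += cum[i] * st.width;   /* C(sa_i) = C(s) + P(a_i) A(s) */
    st.width *= p[i];              /* A(sa_i) = p(a_i) A(s)        */
    return st;
}
```

Calling arith_update once per input symbol, starting from arith_init(), leaves the final interval [low, low + width) in the state.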
Third, code word calculation of the arithmetic code
By computing the cumulative distribution function F(s) of the source symbol sequence, the interval [0, 1) is divided into many subintervals, and different source symbol sequences correspond to different subintervals [F(s), F(s) + p(s)), where p(s) is the probability of the whole sequence. Any point inside the subinterval can be taken to represent the sequence.
Encoding method: write the cumulative distribution function of the symbol sequence as a binary fraction and keep k digits after the binary point. If there is a remainder beyond the k-th digit, carry it into the k-th digit. This gives a number C, where k satisfies:
k = ceil(log2(1 / p(s)))
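The truncate-and-carry step can be sketched in C as follows. This is a hedged sketch, assuming the usual length condition k = ceil(log2(1 / p(s))), which agrees with the value of K used in the example later on. The function names codeword_length, codeword_value and codeword_bits are hypothetical, and the code assumes k is at most 52 so that a double can hold the scaled value exactly.

```c
#include <assert.h>

/* Smallest k with 2^-k <= width, which equals ceil(log2(1 / width)).
 * width is p(s), the width of the final interval. */
static int codeword_length(double width)
{
    assert(width > 0.0 && width <= 1.0);
    int k = 0;
    double step = 1.0;            /* step == 2^-k throughout the loop */
    while (step > width) {
        step *= 0.5;
        ++k;
    }
    return k;
}

/* Keep k binary digits of value (0 <= value < 1). If a remainder is
 * left beyond the k-th digit, carry into the k-th digit.
 * Returns the digits as an integer n, so the code word is n / 2^k. */
static unsigned long long codeword_value(double value, int k)
{
    assert(value >= 0.0 && value < 1.0 && k >= 1 && k <= 52);
    double scaled = value * (double)(1ULL << k);        /* exact scaling */
    unsigned long long n = (unsigned long long)scaled;  /* truncate      */
    if ((double)n < scaled) {
        ++n;                                            /* carry         */
    }
    assert(n < (1ULL << k));   /* the carry must not reach 1.0 */
    return n;
}

/* Write n as exactly k binary digits; out needs room for k + 1 chars. */
static void codeword_bits(unsigned long long n, int k, char *out)
{
    for (int i = k - 1; i >= 0; --i) {
        out[i] = (char)('0' + (int)(n & 1ULL));
        n >>= 1;
    }
    out[k] = '\0';
}
```

The length is found by repeated halving rather than by calling log2, which keeps the sketch free of any dependency on the math library.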
Fourth, an example
[Example] Suppose the source symbols are {a, b, c, d} and their probabilities are {0.1, 0.4, 0.2, 0.3} respectively. Apply arithmetic coding to the input message sequence cadacdb.
Solution: According to these probabilities, the interval [0, 1) can be divided into 4 sub-intervals: [0, 0.1), [0.1, 0.5), [0.5, 0.7), [0.7, 1). The information can be summarized in a table:
Symbol        a          b           c           d
Probability   0.1        0.4         0.2         0.3
Interval      [0, 0.1)   [0.1, 0.5)  [0.5, 0.7)  [0.7, 1)
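The division of [0, 1) into sub-intervals follows directly from the cumulative probabilities. A minimal C sketch, with the hypothetical helper name cumulative_probs and the assumption that the probabilities are positive and sum to 1:

```c
#include <assert.h>
#include <stddef.h>

/* Fill cum[i] with p[0] + ... + p[i-1], so that symbol i owns the
 * sub-interval [cum[i], cum[i] + p[i]) of [0, 1).
 * p and cum must both have room for m entries. */
static void cumulative_probs(const double *p, size_t m, double *cum)
{
    double acc = 0.0;
    for (size_t i = 0; i < m; ++i) {
        assert(p[i] > 0.0);
        cum[i] = acc;     /* lower end of the sub-interval of symbol i */
        acc += p[i];
    }
    /* The probabilities should sum to 1, up to rounding error. */
    assert(acc > 1.0 - 1e-9 && acc < 1.0 + 1e-9);
}
```

For the probabilities {0.1, 0.4, 0.2, 0.3} this yields the lower ends 0, 0.1, 0.5 and 0.7 of the four sub-intervals.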
The first symbol to be encoded is c, whose coding range is [0.5, 0.7). Since the coding range of the second symbol a is [0, 0.1), the first one-tenth of [0.5, 0.7) is taken as the new interval [0.5, 0.52). Continuing in the same way, encoding the 3rd symbol d gives the new interval [0.514, 0.52), and so on. The encoded output of the message can be any number in the final interval.
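The interval narrowing for this message can be traced with a short C sketch. It is an illustration only: encode_interval and Interval are hypothetical names, the alphabet {a, b, c, d} and its probabilities are hard-coded from this example, and double precision is assumed to be sufficient for a message this short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    double low;    /* lower end of the current interval */
    double width;  /* width of the current interval     */
} Interval;

/* Narrow [0, 1) once per symbol of msg and print every step.
 * Returns the final interval [low, low + width). */
static Interval encode_interval(const char *msg)
{
    static const double p[4]   = { 0.1, 0.4, 0.2, 0.3 };  /* probabilities */
    static const double cum[4] = { 0.0, 0.1, 0.5, 0.7 };  /* lower ends    */
    Interval iv = { 0.0, 1.0 };

    for (size_t n = 0; msg[n] != '\0'; ++n) {
        assert(msg[n] >= 'a' && msg[n] <= 'd');
        size_t i = (size_t)(msg[n] - 'a');
        iv.low += cum[i] * iv.width;   /* shift by the cumulative share */
        iv.width *= p[i];              /* shrink by the symbol's share  */
        printf("%c: [%.7f, %.7f)\n", msg[n], iv.low, iv.low + iv.width);
    }
    return iv;
}
```

Calling encode_interval("cadacdb") narrows the interval seven times. Worked by hand, the final interval is [0.5143876, 0.514402), with a width of 0.0000144.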
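The code word length follows from the width of the final interval: p(s) = 0.0000144, so log2(1 / p(s)) is about 16.08 and K is taken as 17.
Next, the fraction 0.5143876, the lower end of the final interval, is converted to binary, keeping 17 digits and carrying the remainder into the 17th digit: 0.10000011101011110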
The final result is: 10000011101011110
For a concrete implementation, please refer to my code. Thank you!
Using C to implement arithmetic coding on Linux (iii)--Explanation of arithmetic coding theory