Arithmetic coding: Wikipedia introduction at http://zh.wikipedia.org/wiki/%E7%AE%97%E6%9C%AF%E7%BC%96%E7%A0%81
I found a good PPT on the Internet.
Here, too.
In arithmetic coding, a message (a string of symbols) is represented as an interval within [0, 1); that is, the whole string is encoded directly as a single fractional number in [0, 1). Before any symbol is processed, the range for the message is the full interval [0, 1). Each time a symbol is processed, the range is narrowed to the part of it assigned to that symbol. The longer the message, the smaller the resulting interval, and the more digits are needed to specify a number inside it. Arithmetic coding is thus a process of repeatedly subdividing the code interval according to the occurrence probabilities of the source symbols.
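To make the link between message length and digit count concrete, here is a minimal sketch in C. The helper names interval_width and decimal_digits_needed are my own and purely illustrative. It assumes that each symbol scales the interval by its probability, so the final width is the product of the probabilities, and it counts how many decimal digits are enough to name some number inside an interval of that width.

```c
#include <assert.h>
#include <stddef.h>

/* Width of the final interval: the product of the probabilities of the
   symbols in the message (hypothetical helper, not from the original post). */
double interval_width(const double *probs, size_t n)
{
    double width = 1.0;
    size_t i;

    assert(probs != NULL || n == 0);
    for (i = 0; i < n; i++)
        width *= probs[i];
    return width;
}

/* Smallest k such that 10^-k <= width. With k decimal digits the grid of
   representable values has spacing 10^-k, so at least one grid value falls
   inside any half-open interval of that width. */
int decimal_digits_needed(double width)
{
    int k = 0;
    double step = 1.0;

    assert(width > 0.0);
    while (step > width) {
        step /= 10.0;
        k++;
    }
    return k;
}
```

For instance, a 10-symbol message with four symbols of probability 0.1 and six of probability 0.3 gives a width of about 7.29e-8, so 8 decimal digits are enough. Every extra symbol shrinks the width further and pushes the digit count up.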
Encoding Process
Suppose we want to encode the string "state tree", which consists of 10 symbols, and that the symbols have the following probability distribution.
Symbol     Probability   Range
(space)    0.1           [0.0, 0.1)
a          0.1           [0.1, 0.2)
e          0.3           [0.2, 0.5)
r          0.1           [0.5, 0.6)
s          0.1           [0.6, 0.7)
t          0.3           [0.7, 1.0)
The interval [0, 1) is divided in proportion to the probability of each symbol, so each symbol owns one subinterval. These are the ranges allocated to the 10 characters used here.
The arithmetic coding process for "state tree" is as follows:
(1) Initialization: low = 0 and high = 1, so the current interval is [0, 1) and range = high - low = 1 - 0 = 1. The low and high values of the next interval are calculated by the following formulas:
low = low + range × range low
high = low + range × range high
On the right of the equal sign, low and range are the lower bound and the width of the interval left by the previously encoded character (both formulas use the old low). Range low and range high are the lower and upper bounds of the probability range assigned to the character being encoded.
(2) Encode the 1st character, s. The range low of s is 0.60 and its range high is 0.70, so the low and high of the next interval are:
low = low + range × range low = 0 + 1 × 0.6 = 0.6
high = low + range × range high = 0 + 1 × 0.7 = 0.7
range = high - low = 0.7 - 0.6 = 0.1
s narrows the interval: [0, 1) → [0.6, 0.7)
Note: the symbol "→" means "is narrowed to".
(3) Encode the 2nd character, t, within the interval [0.6, 0.7). The range low of t is 0.70 and its range high is 1.00, so the next low and high are:
low = low + range × range low = 0.6 + 0.1 × 0.7 = 0.67
high = low + range × range high = 0.6 + 0.1 × 1.0 = 0.70
range = high - low = 0.7 - 0.67 = 0.03
t narrows the interval: [0.6, 0.7) → [0.67, 0.70)
(4) Encode the 3rd character, a, within the newly generated interval [0.67, 0.70). The range low of a is 0.10 and its range high is 0.20, so the next low and high are:
low = low + range × range low = 0.67 + 0.03 × 0.1 = 0.673
high = low + range × range high = 0.67 + 0.03 × 0.2 = 0.676
range = high - low = 0.676 - 0.673 = 0.003
a narrows the interval: [0.67, 0.70) → [0.673, 0.676)
(5) Encode the 4th character, t, within the newly generated interval [0.673, 0.676). The range low of t is 0.70 and its range high is 1.00, so the next low and high are:
low = low + range × range low = 0.673 + 0.003 × 0.7 = 0.6751
high = low + range × range high = 0.673 + 0.003 × 1.0 = 0.676
range = high - low = 0.676 - 0.6751 = 0.0009
t narrows the interval: [0.673, 0.676) → [0.6751, 0.676)
Similarly, encoding the remaining characters e, space, t, r, e, and e yields the following intervals, one per character:
[0.67528, 0.67555),
[0.67528, 0.675307),
[0.6752989, 0.675307),
[0.67530295, 0.67530376),
[0.675303112, 0.675303355),
[0.6753031606, 0.6753032335).
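Putting the steps together, the final interval is [0.6753031606, 0.6753032335).
We use its lower bound, 0.6753031606, to represent the string "state tree", which gives an efficient code. This is the basic idea of arithmetic coding.
The interval subdivision above can be pictured as a sequence of nested subintervals of [0, 1), each one contained in the previous one.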
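The steps above can be sketched in C as follows. This is only an illustration under stated assumptions: the SymbolRange type, the MODEL table, and the encode function are names I made up, and the ranges for e, r, and the space are inferred from the intervals listed in the walkthrough. It uses plain double arithmetic, so it is suitable only for short strings like this one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model entry: one symbol and its slice [low, high) of [0, 1). */
typedef struct {
    char sym;
    double low;
    double high;
} SymbolRange;

/* Ranges used in the "state tree" walkthrough (e, r, and space inferred). */
static const SymbolRange MODEL[] = {
    {' ', 0.0, 0.1}, {'a', 0.1, 0.2}, {'e', 0.2, 0.5},
    {'r', 0.5, 0.6}, {'s', 0.6, 0.7}, {'t', 0.7, 1.0},
};
#define MODEL_SIZE (sizeof MODEL / sizeof MODEL[0])

/* Look up a symbol; returns NULL if it is not in the model. */
static const SymbolRange *find_symbol(char c)
{
    size_t k;

    for (k = 0; k < MODEL_SIZE; k++)
        if (MODEL[k].sym == c)
            return &MODEL[k];
    return NULL;
}

/* Narrow [*low, *high) once per character of msg, starting from [0, 1).
   Returns 0 on success, -1 if msg contains a symbol outside the model. */
int encode(const char *msg, double *low, double *high)
{
    assert(msg != NULL && low != NULL && high != NULL);
    *low = 0.0;
    *high = 1.0;
    for (; *msg != '\0'; msg++) {
        const SymbolRange *r = find_symbol(*msg);
        double range;

        if (r == NULL)
            return -1;
        range = *high - *low;
        /* Both updates use the old low, so high is computed first. */
        *high = *low + range * r->high;
        *low = *low + range * r->low;
    }
    return 0;
}
```

Calling encode("state tree", &low, &high) from a main function should leave low and high very close to 0.6753031606 and 0.6753032335, up to the rounding error of double.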
Decoding Process
After encoding, the lower bound 0.6753031606 of the last subinterval serves as the unique code for the string "state tree". To decode, we find which character's range contains the code value; that gives the first character of the message. Since 0.6753031606 falls in the interval [0.6, 0.7), the 1st symbol is s.
Once s is decoded, we know the lower bound and the width of its range. We first subtract the lower bound 0.6 of s from the code, giving 0.0753031606, and then divide by the range of s, 0.1, giving 0.753031606. This value falls in the interval [0.7, 1.0), so the 2nd symbol is t.
Next, we subtract 0.67, the lower bound of the interval reached after "st", from the code, giving 0.0053031606, and then divide by that interval's range, 0.03, giving 0.17677202. This value falls in [0.1, 0.2), the range of a, so the 3rd symbol is a.
Continuing in this way recovers the string "state tree" exactly.
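Here is a minimal C sketch of this decoding loop. It makes several assumptions. The SymbolRange type, the MODEL table, and the decode function are illustrative names of my own. The ranges for e, r, and the space are inferred from the encoding intervals. The decoder is told the message length, because the code value alone does not say where to stop. At each step it subtracts the decoded symbol's lower bound and divides by that symbol's width, which is algebraically the same as the subtract-and-divide steps described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model entry: one symbol and its slice [low, high) of [0, 1). */
typedef struct {
    char sym;
    double low;
    double high;
} SymbolRange;

/* Ranges used in the "state tree" walkthrough (e, r, and space inferred). */
static const SymbolRange MODEL[] = {
    {' ', 0.0, 0.1}, {'a', 0.1, 0.2}, {'e', 0.2, 0.5},
    {'r', 0.5, 0.6}, {'s', 0.6, 0.7}, {'t', 0.7, 1.0},
};
#define MODEL_SIZE (sizeof MODEL / sizeof MODEL[0])

/* Decode n symbols from code into out, which must hold n + 1 chars.
   Returns 0 on success, -1 if the value leaves [0, 1). */
int decode(double code, size_t n, char *out)
{
    size_t i, k;

    assert(out != NULL);
    for (i = 0; i < n; i++) {
        /* Find the symbol whose range contains the current value. */
        for (k = 0; k < MODEL_SIZE; k++)
            if (code >= MODEL[k].low && code < MODEL[k].high)
                break;
        if (k == MODEL_SIZE) {
            out[i] = '\0';
            return -1;
        }
        out[i] = MODEL[k].sym;
        /* Remove the symbol: rescale its range back to [0, 1). */
        code = (code - MODEL[k].low) / (MODEL[k].high - MODEL[k].low);
    }
    out[n] = '\0';
    return 0;
}
```

With double arithmetic, a code that sits exactly on the lower bound of the final interval (such as 0.6753031606) is fragile, because a tiny rounding error can push the last step across a range boundary. A value strictly inside the final interval, for example 0.67530318, is a safer input: decode(0.67530318, 10, buf) should fill buf with "state tree".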
To simulate this process in a simple way, I used only the 10 letters a-j and assumed that each has the same probability, 1/10. The encoded result is stored in a plain double. While debugging I found that double is not precise enough: the error grows large and decoding fails. Later I wrote the same thing with double in Java, and that did not work either. Because of this imprecision, the decoder cannot tell from the code how many characters were encoded, so after encoding I add the string length to the result as its integer part. The encoded value is therefore no longer between 0 and 1, but this is only a simulation meant to illustrate the idea.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

/** encoding (a-j); the string length is stored in the integer part **/
double bianma(const char *s)
{
    int len = (int)strlen(s);
    int i;
    double low = 0.0;

    for (i = 0; i < len; i++) {
        /* low = low + range * range low, with range = 0.1^i */
        low = low + ((s[i] - 'a') * 0.1) * pow(0.1, i);
    }
    return low + len;
}

/** decoding; the caller must free() the returned string **/
char *jiema(double s)
{
    int temp, i;
    int len = (int)s;              /* length of the original string */
    char *str;

    if (len < 0)
        len = 0;
    str = malloc((size_t)len + 1); /* +1 for the terminating '\0' */
    if (str == NULL)
        return NULL;
    /* Keep the fraction and add half a unit of the last digit, so that
       truncation is not thrown off by the rounding error of double.
       This replaces the original "+1 on the last character" correction. */
    s = s - len + 0.5 * pow(0.1, len);
    for (i = 0; i < len; i++) {
        s = s * 10;
        temp = (int)s;
        str[i] = (char)(temp + 'a');
        s = s - temp;
    }
    str[i] = '\0';
    return str;
}

/* Menu loop (a loop instead of the original recursive calls). */
int init(void)
{
    int c;
    char buf[50];
    char *str;
    double s;

    for (;;) {
        printf("========== arithmetic coding ==========\n");
        printf("Only the letters abcdefghij are supported. Select:\n");
        printf("1. encode\n2. decode\n3. exit\n");
        if (scanf("%d", &c) != 1)
            return 1;
        switch (c) {
        case 1:
            printf("Enter the string (a-j):\n");
            if (scanf("%49s", buf) != 1)
                return 1;
            s = bianma(buf);
            /* print one decimal digit per encoded character */
            printf("Encoded: %.*f\n", (int)strlen(buf), s);
            break;
        case 2:
            printf("Enter the code to be decoded:\n");
            if (scanf("%lf", &s) != 1)
                return 1;
            str = jiema(s);
            if (str != NULL) {
                printf("Decoded as: %s\n", str);
                free(str);
            }
            break;
        case 3:
            return 0;
        default:
            printf("Please enter 1, 2, or 3.\n");
            break;
        }
    }
}

int main(void)
{
    return init();
}