Huffman Coding in C
1. Huffman coding description
The Huffman tree is the binary tree with the minimum weighted path length (also called the optimal binary tree), and it is often used for data compression. In computer information processing, Huffman coding is an entropy encoding method used for lossless data compression. It encodes source symbols (such as the characters in a file) with a variable-length code table. What makes this table special is that it is built from the estimated probability of each source symbol: symbols with a high probability get short codes and symbols with a low probability get long codes. This reduces the expected average length of the encoded string and so achieves lossless compression. The method was developed by David A. Huffman. For example, in English text the letter e has a high probability while z has the lowest. When a piece of English is compressed with Huffman coding, e may well be represented by a single bit, while z may take 25 bits (not 26). In an ordinary encoding each English letter occupies one byte, that is, eight bits, so e would use 1/8 of the ordinary length and z more than three times as much. If the probability of each English letter is estimated accurately, the lossless compression ratio can be improved considerably.
2. Problem Description
Before Huffman coding, you must first count the frequency of each character, that is, the number of times it occurs. For example:
1. Sort all letters in ascending order of their occurrence counts, for example F: 2, O: 3, R: 4, G: 4, E: 5, T: 7.
2. Each letter is a terminal node (leaf node) whose value is its occurrence count: F, O, R, G, E and T. Add the two smallest frequencies to form a new node. F and O are the least frequent, so 2 + 3 = 5: F and O form a tree with F as the left child, O as the right child and (FO) as its root. The value of each node is its frequency of occurrence (the frequency of FO is 5).
3. Compare FO (5), R, G, E and T: R and G have the smallest frequencies, so 4 + 4 = 8 forms the new node RG.
4. Compare 5, 8, E and T: FO (5) and E (5) are the smallest, so 5 + 5 = 10. FO becomes the left child, E the right child, and FOE the root.
5. Compare 8, 10 and T: RG (8) and T (7) are the smallest, so 8 + 7 = 15. RG becomes the left child, T the right child, and RGT the root.
6. Only 10 and 15 remain, so there is nothing else to compare: 10 + 15 = 25, with FOE as the left child and RGT as the right child. This node is the root of the Huffman tree.
The root itself contributes no bit. Every edge to a left child is labeled 0 and every edge to a right child is labeled 1. Walking from the root down to each letter, the labels along the path form that letter's code: F = 000, O = 001, E = 01, R = 100, G = 101, T = 11.
First, take a sample text, count how many times each character appears, and store the results in an array of the following structure:
typedef struct FrequencyTreeNode {
    int freq;
    char c;
    struct FrequencyTreeNode *left;
    struct FrequencyTreeNode *right;
} FrequencyTreeNodeStruct, *pFrequencyTreeNodeStruct;
Next, sort the frequencies array and build a binary search tree from its FrequencyTreeNodeStruct nodes, keyed on freq in ascending order. Find the smallest node in the search tree and delete it, then do the same for the next smallest; these two nodes become the children of a new tree whose root has c = 0 and a freq equal to the sum of the two children's freq. Add the new root back into frequencies and sort again. Repeat this step until only one node remains in frequencies; that node is the root of the Huffman coding tree.
Each character's code is then stored in a short, built according to the rules above. Finally, the text is translated into Huffman codes and decoded with the Huffman coding tree to verify that the encoding is correct.
3. Code Implementation
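 #include <stdio.h>
 #include <string.h>

 #define n 5                /* number of leaves (characters) */
 #define m (2 * n - 1)      /* total number of nodes in the Huffman tree */
 #define maxval 10000.0     /* larger than any expected weight */
 #define maxsize 100        /* maximum length of an encoded input string */
 #define MAXALPABETNUM 52   /* 26 lowercase + 26 uppercase letters */
 #define MAXLINE 1024       /* line buffer size for reading the file (assumed value) */


 // Node used when counting letter frequencies from a file
 typedef struct FrequencyTreeNode {
     int freq;
     char c;
     struct FrequencyTreeNode *left;
     struct FrequencyTreeNode *right;
 } FrequencyTreeNodeStruct, *pFrequencyTreeNodeStruct;


 FrequencyTreeNodeStruct frequencies[MAXALPABETNUM];


 // Node of the array-based Huffman tree used by huffMan, huffmancode and decode
 typedef struct
 {
     char ch;                     // character (meaningful for leaves only)
     float weight;                // weight (frequency)
     int lchild, rchild, parent;  // array indices; -1 = no child, 0 = no parent
 } hufmtree;


 typedef struct
 {
     char bits[n];  // bit string
     int start;     // index of the first valid bit in bits[]
     char ch;       // character
 } codetype;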


 // Read a text file and count how often each letter (a-z, A-Z) occurs.
 // The non-zero counts are stored in frequencies[]; returns how many were stored.
 int readTxtStatistics(char *fileName)
 {
     unsigned int nArray[52] = {0};  // [0..25] = 'a'..'z', [26..51] = 'A'..'Z'
     unsigned int i, j;
     char szBuffer[MAXLINE];
     int k = 0;
     // Open the file for reading
     FILE *fp = fopen(fileName, "r");
     if (fp != NULL)
     {   /* First count each letter and its number of occurrences */
         while (fgets(szBuffer, MAXLINE, fp) != NULL)
         {
             for (i = 0; i < strlen(szBuffer); i++)
             {
                 if (szBuffer[i] <= 'z' && szBuffer[i] >= 'a')
                 {
                     j = szBuffer[i] - 'a';
                 }
                 else if (szBuffer[i] <= 'Z' && szBuffer[i] >= 'A')
                 {
                     j = szBuffer[i] - 'A' + 26;
                 }
                 else
                     continue;
                 nArray[j]++;
             }
         }
         fclose(fp);

         // Copy the non-zero counts into the frequencies array
         for (i = 0, j = 'a'; i < 52; i++, j++)
         {
             if (nArray[i] > 0)
             {
                 frequencies[k].c = (char)j;
                 frequencies[k].freq = (int)nArray[i];
                 frequencies[k].left = NULL;
                 frequencies[k].right = NULL;
                 k++;
                 printf("%c: %u\n", (int)j, nArray[i]);
             }
             if (j == 'z')
                 j = 'A' - 1;  // the j++ in the loop header moves on to 'A'
         }
     }
     return k;
 }


 // Build the Huffman tree: tree[0..n-1] are the leaves, tree[n..m-1] the merged nodes.
 // Returns 1 on success, 0 if the input could not be read.
 int huffMan(hufmtree tree[])
 {
     int i, j, p1, p2;  // p1, p2: indices of the roots with the smallest and second-smallest weights
     float small1, small2, f;
     char c;
     for (i = 0; i < m; i++)  // initialize all m nodes
     {
         tree[i].parent = 0;
         tree[i].lchild = -1;
         tree[i].rchild = -1;
         tree[i].weight = 0.0f;
     }
     printf("[Enter the characters and weights of the first %d nodes (separated by spaces)]\n", n);

     // Read the characters and weights of the first n nodes (the leaves)
     for (i = 0; i < n; i++)
     {
         printf("Enter character %d and its weight: ", i + 1);
         if (scanf(" %c %f", &c, &f) != 2)  // the leading space skips the previous newline
             return 0;
         tree[i].ch = c;
         tree[i].weight = f;
     }
     // Merge n - 1 times to create n - 1 new nodes
     for (i = n; i < m; i++)
     {
         p1 = 0; p2 = 0;
         // maxval is simply larger than any weight
         small1 = maxval; small2 = maxval;
         // Select the two roots (nodes without a parent) with the smallest weights
         for (j = 0; j < i; j++)
         {
             if (tree[j].parent == 0)
             {
                 if (tree[j].weight < small1)
                 {
                     small2 = small1;  // the old minimum becomes the second minimum
                     small1 = tree[j].weight;
                     p2 = p1;
                     p1 = j;
                 }
                 else if (tree[j].weight < small2)
                 {
                     small2 = tree[j].weight;  // update the second minimum and its position
                     p2 = j;
                 }
             }
         }
         tree[p1].parent = i;
         tree[p2].parent = i;
         tree[i].lchild = p1;  // the smallest root becomes the left child of the new node
         tree[i].rchild = p2;  // the second-smallest root becomes the right child
         tree[i].weight = tree[p1].weight + tree[p2].weight;
     }
     return 1;
 }


 // Derive the Huffman codes from the Huffman tree: code[] receives the codes, tree[] is the finished tree
 void huffmancode(codetype code[], hufmtree tree[])
 {
     int i, c, p;
     codetype cd;  // buffer variable
     for (i = 0; i < n; i++)
     {
         cd.start = n;
         cd.ch = tree[i].ch;
         c = i;               // walk upward starting from the leaf
         p = tree[i].parent;  // tree[p] is the parent of tree[c]
         while (p != 0)
         {
             cd.start--;
             if (tree[p].lchild == c)
                 cd.bits[cd.start] = '0';  // tree[c] is a left child: emit '0'
             else
                 cd.bits[cd.start] = '1';  // tree[c] is a right child: emit '1'
             c = p;
             p = tree[p].parent;
         }
         code[i] = cd;  // save the code of character i + 1 in code[i]
     }
 }




 // Decode a bit string with the Huffman tree
 void decode(hufmtree tree[])
 {
     int i, j = 0;
     char b[maxsize];
     char endflag = '2';  // '2' marks the end of the message
     i = m - 1;           // start from the root
     printf("Enter the received code (ending with '2'): ");
     if (scanf("%99s", b) != 1)  // 99 = maxsize - 1
         return;
     printf("Decoded characters: ");
     while (b[j] != endflag && b[j] != '\0')
     {
         if (b[j] == '0')
             i = tree[i].lchild;  // move to the left child
         else
             i = tree[i].rchild;  // move to the right child
         if (tree[i].lchild == -1)  // tree[i] is a leaf
         {
             printf("%c", tree[i].ch);
             i = m - 1;  // return to the root
         }
         j++;
     }
     printf("\n");
     if (i != m - 1)  // input ended before reaching a leaf
         printf("\nERROR\n");  // the input code is incorrect
 }




 int main(void)
 {
     hufmtree tree[m];
     codetype code[n];
     int i, j;  // loop variables
     printf("Huffman coding practice\n");
     printf("A total of %d characters\n", n);
     if (!huffMan(tree))  // build the Huffman tree
         return 1;
     huffmancode(code, tree);  // derive the Huffman codes from the tree
     printf("[Huffman code of each character]\n");
     for (i = 0; i < n; i++)
     {
         printf("%c: ", code[i].ch);
         for (j = code[i].start; j < n; j++)
             printf("%c", code[i].bits[j]);
         printf("\n");
     }
     printf("[Enter a code string to decode]\n");
     // Start decoding
     decode(tree);
     return 0;
 }