Huffman Coding in C

Source: Internet
Author: User

1. Huffman coding description
A Huffman tree is an optimal binary tree, that is, the binary tree with the minimum weighted path length, and it is often used for data compression. In computer information processing, Huffman coding is an entropy encoding method for lossless data compression. The term refers to the use of a special encoding table to encode source characters (such as the symbols in a file). What makes the table special is that it is built from the estimated probability of each source character: characters with a high probability get short codes and characters with a low probability get long codes, which reduces the average expected length of the encoded string and so achieves lossless compression. The method was developed by David A. Huffman. For example, in English text e has a high probability of occurring, while z has the lowest. When a piece of English text is compressed with Huffman coding, e is very likely to be represented by a single bit, while z may take 25 bits (not 26). In the ordinary representation each English letter occupies one byte, that is, 8 bits. Compared with that, e uses 1/8 of the usual length and z uses more than three times as much. If the probability of each letter in English is estimated accurately, the lossless compression ratio can be greatly increased.

2. Problem Description
Before Huffman coding a text, you must first count the frequency of each character, that is, the number of times it occurs. The example below uses six letters with these frequencies: F = 2, O = 3, R = 4, G = 4, E = 5, T = 7.

1. Sort the letters by number of occurrences in ascending order: F (2), O (3), R (4), G (4), E (5), T (7).

2. Each letter represents a terminal node (leaf node). Compare the frequencies of the six letters F, O, R, G, E and T, and add the two smallest to form a new node. F and O are the least frequent, so 2 + 3 = 5. F and O form a tree in which F is the left node, O is the right node and (FO) is the root node. The value of each node is its frequency (the frequency of FO is 5).

3. Compare 5, R, G, E and T. R and G have the smallest frequencies, so 4 + 4 = 8 and the two form a new node (RG).

4. Compare 5, 8, E and T. 5 (FO) and E are the smallest, so 5 + 5 = 10. FO is the left node, E is the right node and (FOE) is the root node.

5. Compare 8, 10 and T. 8 (RG) and T are the smallest, so 8 + 7 = 15. RG is the left node, T is the right node and (RGT) is the root node.

6. Only 10 and 15 remain, so there is nothing left to compare. Add 10 + 15 = 25, with FOE as the left node and RGT as the right node.
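The merge order in the steps above can be checked with a minimal sketch. It only tracks the node weights, not the tree shape, and the names merge_weights and MAX_SYMBOLS are made up for this illustration.

```c
#include <assert.h>

#define MAX_SYMBOLS 64   /* hypothetical upper bound for this sketch */

/* Repeatedly merges the two smallest weights, as in steps 2 to 6.
 * Stores the weight of each new node in sums[] and returns the
 * number of merges performed (count - 1). */
int merge_weights(const int *weights, int count, int *sums)
{
    int pool[MAX_SYMBOLS];
    int merges = 0;

    assert(count >= 1 && count <= MAX_SYMBOLS);
    for (int i = 0; i < count; i++)
        pool[i] = weights[i];

    while (count > 1) {
        /* Find the indices of the smallest (a) and second smallest (b). */
        int a = 0, b = 1;
        if (pool[b] < pool[a]) { a = 1; b = 0; }
        for (int i = 2; i < count; i++) {
            if (pool[i] < pool[a]) { b = a; a = i; }
            else if (pool[i] < pool[b]) { b = i; }
        }

        int sum = pool[a] + pool[b];
        sums[merges++] = sum;

        /* Keep the merged node in slot a, fill slot b from the end. */
        pool[a] = sum;
        pool[b] = pool[count - 1];
        count--;
    }
    return merges;
}
```

Calling merge_weights with the weights {2, 3, 4, 4, 5, 7} fills sums with 5, 8, 10, 15, 25, which are the node values produced in steps 2 to 6.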

 

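The root node carries no code value. Each branch to a left child is labelled 0 and each branch to a right child is labelled 1. To encode a letter, walk from the root node down to that letter's leaf; the values collected along the way form its code. For the tree built above this gives F = 000, O = 001, E = 01, R = 100, G = 101, T = 11.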
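As an illustration of the traversal rule, the sketch below writes the example tree out by hand and derives a letter's code by walking from the root. The Node type and the find_code function are hypothetical names for this example only, not part of the listing in section 3.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    char ch;                    /* letter for a leaf, 0 for an internal node */
    const struct Node *left;
    const struct Node *right;
} Node;

/* The tree from the worked example, written out by hand. */
static const Node nF = {'F', NULL, NULL};
static const Node nO = {'O', NULL, NULL};
static const Node nR = {'R', NULL, NULL};
static const Node nG = {'G', NULL, NULL};
static const Node nE = {'E', NULL, NULL};
static const Node nT = {'T', NULL, NULL};
static const Node nFO  = {0, &nF,  &nO};
static const Node nRG  = {0, &nR,  &nG};
static const Node nFOE = {0, &nFO, &nE};
static const Node nRGT = {0, &nRG, &nT};
static const Node root = {0, &nFOE, &nRGT};

/* Searches the subtree for ch. On success, out holds the path from
 * the starting node as a string of '0' (left) and '1' (right) and the
 * function returns 1. Returns 0 if the letter is not in the subtree. */
int find_code(const Node *node, char ch, char *out, int depth)
{
    assert(node != NULL);
    if (node->left == NULL) {           /* reached a leaf */
        out[depth] = '\0';
        return node->ch == ch;
    }
    out[depth] = '0';
    if (find_code(node->left, ch, out, depth + 1))
        return 1;
    out[depth] = '1';
    return find_code(node->right, ch, out, depth + 1);
}
```

With a buffer of at least 4 characters, find_code(&root, 'E', buf, 0) leaves "01" in buf. With the frequencies of the example, the encoded text needs 2*3 + 3*3 + 5*2 + 4*3 + 4*3 + 7*2 = 63 bits, against 25 * 8 = 200 bits at one byte per letter.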

 

 

First, choose a text and count the number of times each character appears. The counts are stored in an array, frequencies, whose elements have the following node type:
typedef struct FrequencyTreeNode {
    int freq;                          // number of occurrences
    char c;                            // the character
    struct FrequencyTreeNode *left;    // left child
    struct FrequencyTreeNode *right;   // right child
} FrequencyTreeNodeStruct, *pFrequencyTreeNodeStruct;

Then the array frequencies is sorted, and a binary search tree ordered by freq is formed from its FrequencyTreeNodeStruct entries. Find the smallest node in the binary search tree and delete it from the tree, then do the same for the next smallest. Make these two nodes the children of a new node whose c is 0 and whose freq is the sum of the two children's freq. Add the new node to frequencies and sort again. Repeat this step until only one node is left in frequencies; that node is the root of the Huffman coding tree.
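A minimal sketch of this loop follows, under one simplification: instead of a binary search tree it keeps an array of node pointers and re-sorts it with qsort before every merge. The helpers by_freq, new_node, build_tree, encoded_bits and free_tree are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct FrequencyTreeNode {
    int freq;
    char c;
    struct FrequencyTreeNode *left;
    struct FrequencyTreeNode *right;
} FrequencyTreeNodeStruct, *pFrequencyTreeNodeStruct;

/* qsort comparator: orders node pointers by ascending freq. */
static int by_freq(const void *a, const void *b)
{
    const FrequencyTreeNodeStruct *x = *(FrequencyTreeNodeStruct *const *)a;
    const FrequencyTreeNodeStruct *y = *(FrequencyTreeNodeStruct *const *)b;
    return (x->freq > y->freq) - (x->freq < y->freq);
}

/* Allocates one node; the tree is released later with free_tree(). */
static pFrequencyTreeNodeStruct new_node(char c, int freq,
                                         pFrequencyTreeNodeStruct left,
                                         pFrequencyTreeNodeStruct right)
{
    pFrequencyTreeNodeStruct node = malloc(sizeof *node);
    assert(node != NULL);
    node->c = c;
    node->freq = freq;
    node->left = left;
    node->right = right;
    return node;
}

/* Merges the two smallest nodes until one is left and returns it.
 * The nodes[] array is reordered and overwritten along the way. */
pFrequencyTreeNodeStruct build_tree(pFrequencyTreeNodeStruct *nodes, int count)
{
    assert(count >= 1);
    while (count > 1) {
        qsort(nodes, (size_t)count, sizeof nodes[0], by_freq);
        /* nodes[0] and nodes[1] are now the two smallest. */
        nodes[0] = new_node(0, nodes[0]->freq + nodes[1]->freq,
                            nodes[0], nodes[1]);
        nodes[1] = nodes[count - 1];
        count--;
    }
    return nodes[0];
}

/* Sum of freq * depth over all leaves: the size of the encoded text in bits. */
int encoded_bits(const FrequencyTreeNodeStruct *node, int depth)
{
    if (node->left == NULL)
        return node->freq * depth;
    return encoded_bits(node->left, depth + 1)
         + encoded_bits(node->right, depth + 1);
}

void free_tree(pFrequencyTreeNodeStruct node)
{
    if (node == NULL)
        return;
    free_tree(node->left);
    free_tree(node->right);
    free(node);
}
```

Because qsort is not stable, nodes with equal freq may end up on either side, so the individual codes can differ from the worked example. The root frequency and the total encoded length stay the same.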

 

Each character's code is then stored in a short according to the rules above. Finally, the text is translated into Huffman code and decoded again with the Huffman coding tree to verify that the encoding is correct.
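One way to hold a code in a short is to keep the bit pattern together with its length, since leading zeros matter. The sketch below assumes the codes of the worked example (F = 000, O = 001, E = 01, R = 100, G = 101, T = 11) and, for brevity, decodes by matching against the table rather than by walking the tree. ShortCode, table, encode and decode are names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char ch;
    unsigned short bits;   /* code value, first code bit is the highest used bit */
    unsigned char len;     /* number of valid bits in bits */
} ShortCode;

/* Codes from the worked example. */
static const ShortCode table[] = {
    {'F', 0x0, 3}, {'O', 0x1, 3}, {'E', 0x1, 2},
    {'R', 0x4, 3}, {'G', 0x5, 3}, {'T', 0x3, 2},
};
#define TABLE_SIZE (sizeof table / sizeof table[0])

/* Writes the code of each letter in text to out as '0'/'1' characters.
 * Returns the number of bits written, or -1 on an unknown letter or
 * when out is too small. */
int encode(const char *text, char *out, size_t out_size)
{
    size_t pos = 0;
    assert(out_size > 0);
    for (; *text != '\0'; text++) {
        const ShortCode *code = NULL;
        for (size_t i = 0; i < TABLE_SIZE; i++)
            if (table[i].ch == *text) { code = &table[i]; break; }
        if (code == NULL || pos + code->len >= out_size)
            return -1;
        for (int b = code->len - 1; b >= 0; b--)
            out[pos++] = ((code->bits >> b) & 1) ? '1' : '0';
    }
    out[pos] = '\0';
    return (int)pos;
}

/* Accumulates bits until they match a table entry, then emits the letter.
 * Returns the number of letters, or -1 on malformed input. */
int decode(const char *bits, char *out, size_t out_size)
{
    size_t pos = 0;
    unsigned short acc = 0;
    unsigned char len = 0;
    assert(out_size > 0);
    for (; *bits != '\0'; bits++) {
        if (strchr("01", *bits) == NULL || len >= 16)
            return -1;
        acc = (unsigned short)((acc << 1) | (unsigned)(*bits - '0'));
        len++;
        for (size_t i = 0; i < TABLE_SIZE; i++) {
            if (table[i].len == len && table[i].bits == acc) {
                if (pos + 1 >= out_size)
                    return -1;
                out[pos++] = table[i].ch;
                acc = 0;
                len = 0;
                break;
            }
        }
    }
    if (len != 0)
        return -1;          /* input ended in the middle of a code */
    out[pos] = '\0';
    return (int)pos;
}
```

Matching on the first hit is safe because no Huffman code is a prefix of another. Encoding a text and decoding the result should give back the original text, which is the verification described above.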


3. Code Implementation

#include <stdio.h>
#include <string.h>

#define n 5              // number of leaves
#define m (2 * n - 1)    // total number of nodes
#define maxval 10000.0   // larger than any weight that will be entered
#define maxsize 100      // maximum number of bits in an encoded message
// MAXALPABETNUM and MAXLINE are used below but are not defined in the
// original listing; the values here are assumptions.
#define MAXALPABETNUM 52
#define MAXLINE 1024

// Node type used to collect character frequencies
typedef struct FrequencyTreeNode {
    int freq;
    char c;
    struct FrequencyTreeNode *left;
    struct FrequencyTreeNode *right;
} FrequencyTreeNodeStruct, *pFrequencyTreeNodeStruct;

FrequencyTreeNodeStruct frequencies[MAXALPABETNUM];

// Array-based tree node used by huffMan(), huffmancode() and decode().
// The original listing uses these fields but omits the definition,
// so this type is reconstructed from how the fields are used.
typedef struct {
    float weight;
    char ch;
    int parent, lchild, rchild;
} hufmtree;

typedef struct {
    char bits[n];   // bit string
    int start;      // start position of the code within bits
    char ch;        // character
} codetype;

// Read the file and count how often each letter occurs
void readTxtStatistics(char *fileName)
{
    unsigned int nArray[52] = {0};
    unsigned int i, j;
    char szBuffer[MAXLINE];
    int k = 0;
    FILE *fp = fopen(fileName, "r");
    if (fp != NULL)
    {
        // Count the letters: a-z go to slots 0-25, A-Z to slots 26-51
        while (fgets(szBuffer, MAXLINE, fp) != NULL)
        {
            for (i = 0; i < strlen(szBuffer); i++)
            {
                if (szBuffer[i] <= 'z' && szBuffer[i] >= 'a')
                    j = szBuffer[i] - 'a';
                else if (szBuffer[i] <= 'Z' && szBuffer[i] >= 'A')
                    j = szBuffer[i] - 'A' + 26;
                else
                    continue;
                nArray[j]++;
            }
        }
        fclose(fp);

        // Copy the non-zero counts into the frequencies array
        for (i = 0, j = 'a'; i < 52; i++, j++)
        {
            if (nArray[i] > 0)
            {
                frequencies[k].c = j;
                frequencies[k].freq = nArray[i];
                frequencies[k].left = NULL;
                frequencies[k].right = NULL;
                k++;
                printf("%c: %u\n", j, nArray[i]);
            }
            if (j == 'z')
                j = 'A' - 1;   // continue with the capital letters
        }
    }
}

// Build the Huffman tree
void huffMan(hufmtree tree[])
{
    int i, j, p1, p2;   // p1, p2: indices of the roots with the smallest and second smallest weight
    float small1, small2, f;
    char c;
    for (i = 0; i < m; i++)   // initialise all nodes; parent 0 means "no parent yet"
    {
        tree[i].parent = 0;
        tree[i].lchild = -1;
        tree[i].rchild = -1;
        tree[i].weight = 0.0;
    }
    printf("[Enter the characters and weights of the first %d nodes in sequence (separated by a space)]\n", n);

    // Read the characters and weights of the first n nodes
    for (i = 0; i < n; i++)
    {
        printf("Enter character %d and its weight: ", i + 1);
        scanf("%c %f", &c, &f);
        getchar();   // consume the newline
        tree[i].ch = c;
        tree[i].weight = f;
    }
    // Merge n-1 times, producing n-1 new nodes
    for (i = n; i < m; i++)
    {
        p1 = 0; p2 = 0;
        small1 = maxval; small2 = maxval;
        // Select the two roots with the smallest weights
        for (j = 0; j < i; j++)
        {
            if (tree[j].parent == 0)
            {
                if (tree[j].weight < small1)
                {
                    small2 = small1;   // old minimum becomes the second smallest
                    small1 = tree[j].weight;
                    p2 = p1;
                    p1 = j;
                }
                else if (tree[j].weight < small2)
                {
                    small2 = tree[j].weight;   // new second smallest
                    p2 = j;
                }
            }
        }
        tree[p1].parent = i;
        tree[p2].parent = i;
        tree[i].lchild = p1;   // the smallest root becomes the left child
        tree[i].rchild = p2;   // the second smallest root becomes the right child
        tree[i].weight = tree[p1].weight + tree[p2].weight;
    }
}

// Derive the Huffman codes from the tree: code[] receives the codes,
// tree[] is the finished Huffman tree
void huffmancode(codetype code[], hufmtree tree[])
{
    int i, c, p;
    codetype cd;   // buffer variable
    for (i = 0; i < n; i++)
    {
        cd.start = n;
        cd.ch = tree[i].ch;
        c = i;                 // trace back from the leaf node
        p = tree[i].parent;    // tree[p] is the parent of tree[c]
        while (p != 0)
        {
            cd.start--;
            if (tree[p].lchild == c)
                cd.bits[cd.start] = '0';   // tree[c] is a left child: emit '0'
            else
                cd.bits[cd.start] = '1';   // tree[c] is a right child: emit '1'
            c = p;
            p = tree[p].parent;
        }
        code[i] = cd;   // store the code of character i + 1 in code[i]
    }
}

// Decode a message using the Huffman tree
void decode(hufmtree tree[])
{
    int i, j = 0;
    char b[maxsize];
    char endflag = '2';   // '2' marks the end of the message
    i = m - 1;            // start from the root node
    printf("Enter the encoded message (ending with '2'): ");
    if (fgets(b, maxsize, stdin) == NULL)
        b[0] = '\0';
    printf("Decoded characters: ");
    while (b[j] == '0' || b[j] == '1')
    {
        if (b[j] == '0')
            i = tree[i].lchild;   // move to the left child
        else
            i = tree[i].rchild;   // move to the right child
        if (tree[i].lchild == -1)   // tree[i] is a leaf node
        {
            printf("%c", tree[i].ch);
            i = m - 1;   // return to the root node
        }
        j++;
    }
    printf("\n");
    // The input is invalid if it stopped in the middle of a code
    // or did not end with the end flag
    if (i != m - 1 || b[j] != endflag)
        printf("\nERROR\n");
}

int main(void)
{
    hufmtree tree[m];
    codetype code[n];
    int i, j;   // loop variables
    printf("----------------- Huffman coding exercise -----------------\n");
    printf("A total of %d characters\n", n);
    huffMan(tree);             // build the Huffman tree
    huffmancode(code, tree);   // derive the codes from the tree
    printf("[Huffman code for each character]\n");
    for (i = 0; i < n; i++)
    {
        printf("%c: ", code[i].ch);
        for (j = code[i].start; j < n; j++)
            printf("%c", code[i].bits[j]);
        printf("\n");
    }
    printf("[Read and decode a message]\n");
    decode(tree);
    return 0;
}
     

     
