This is a programming problem I ran into. While studying another blogger's code I found parts I did not understand at first, so this post records my analysis.
First, what is a Huffman tree?
A Huffman tree, also known as an optimal binary tree, is the binary tree with the smallest weighted path length for a given set of leaf weights.
That is, the total length of the paths from the root to the leaves is minimal, provided each path is weighted by the weight of its leaf.
The path length of a leaf is the number of the layer it sits on (the root is layer 0, so the path length from the root to a leaf equals the leaf's layer number), and the path length of the tree is the sum of the path lengths of all its leaves. The weighted path length of the tree is written WPL = W1*L1 + W2*L2 + W3*L3 + ... + WN*LN, where Wi is the weight of leaf i and Li is its path length.
For example, for a tree whose leaves have weights 32, 24, 2 and 7 on layers 1, 2, 3 and 3, WPL = 32x1 + 24x2 + 2x3 + 7x3 = 107.
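The WPL formula can be checked with a few lines of C++. This is only a sketch: the function name weightedPathLength and the two parallel vectors are assumptions made for illustration and do not come from the original solution.

```cpp
#include <cstddef>
#include <vector>

// Weighted path length: the sum of weights[i] * depths[i] over all leaves.
// Assumes the two vectors describe the same leaves in the same order,
// with the root on layer 0. Extra entries in the longer vector are ignored.
long long weightedPathLength(const std::vector<int>& weights,
                             const std::vector<int>& depths) {
    long long wpl = 0;
    for (std::size_t i = 0; i < weights.size() && i < depths.size(); ++i) {
        wpl += static_cast<long long>(weights[i]) * depths[i];
    }
    return wpl;
}
```

Calling weightedPathLength({32, 24, 2, 7}, {1, 2, 3, 3}) evaluates 32x1 + 24x2 + 2x3 + 7x3.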
The general steps for building a Huffman tree are:
1. Treat each of the n weights as a tree of its own: a root node whose left and right subtrees are empty. Together these trees form a forest.
2. Select the two trees in the forest whose roots have the smallest weights and make them the left and right subtrees of a new tree. The weight of the new root is the sum of the weights of the roots of its left and right subtrees. Note that the weight of the left subtree should not be greater than the weight of the right subtree.
3. Remove the two selected trees from the forest and add the new tree to the forest.
4. Repeat steps 2 and 3 until only one tree is left in the forest. That tree is the Huffman tree.
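The four steps above can be sketched in C++ as follows. This version really builds the tree, which the solution analyzed later avoids. The names HuffNode, buildHuffmanTree and treeWeightedPathLength are hypothetical, and storing the nodes in a vector addressed by index is just one possible design.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct HuffNode {
    long long weight;
    int left;   // index of the left child, -1 for a leaf
    int right;  // index of the right child, -1 for a leaf
};

// Builds the tree inside `nodes` and returns the index of its root,
// or -1 when there are no weights at all.
int buildHuffmanTree(const std::vector<int>& weights, std::vector<HuffNode>& nodes) {
    using Entry = std::pair<long long, int>;  // (root weight, node index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> forest;
    nodes.clear();
    for (int w : weights) {  // step 1: every weight is a single-node tree
        nodes.push_back({w, -1, -1});
        forest.emplace(static_cast<long long>(w), static_cast<int>(nodes.size()) - 1);
    }
    if (forest.empty()) return -1;
    while (forest.size() > 1) {  // step 4: repeat until one tree remains
        Entry a = forest.top(); forest.pop();  // step 2: smallest root (left subtree)
        Entry b = forest.top(); forest.pop();  //         second smallest (right subtree)
        nodes.push_back({a.first + b.first, a.second, b.second});
        // step 3: the two old trees are gone from the forest, the new one goes in
        forest.emplace(a.first + b.first, static_cast<int>(nodes.size()) - 1);
    }
    return forest.top().second;
}

// Weighted path length computed by walking the tree; the root is on layer 0.
long long treeWeightedPathLength(const std::vector<HuffNode>& nodes, int root, int depth = 0) {
    if (root < 0) return 0;
    const HuffNode& n = nodes[root];
    if (n.left < 0 && n.right < 0) return n.weight * depth;
    return treeWeightedPathLength(nodes, n.left, depth + 1) +
           treeWeightedPathLength(nodes, n.right, depth + 1);
}
```

With a single weight the root is itself a leaf on layer 0, so this sketch reports a weighted path length of 0 for that case.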
The Taiyuan University of Technology website has an animated demonstration:
Http://www.tyut.edu.cn/kecheng1/site01/suanfayanshi/Huffman.asp
The steps above repeatedly pick the two trees with the smallest weights. A priority queue does exactly that, so a priority queue can be used when building a Huffman tree.
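As a small illustration of why a priority queue fits, std::priority_queue with std::greater as its comparator behaves as a min-heap, so the smallest weight is always on top. The helper name twoSmallest is an assumption for this sketch, and it assumes the input holds at least two weights.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Returns the two smallest weights, smallest first.
// Assumes `weights` contains at least two elements.
std::pair<int, int> twoSmallest(const std::vector<int>& weights) {
    // std::greater turns the default max-heap into a min-heap.
    std::priority_queue<int, std::vector<int>, std::greater<int>> minHeap(
        weights.begin(), weights.end());
    int first = minHeap.top();
    minHeap.pop();
    int second = minHeap.top();
    minHeap.pop();
    return {first, second};
}
```

Each top and pop pair hands back the current minimum, which is all that step 2 of the construction needs.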
Now look at the problem.
Input
The input file will contain a list of text strings, one per line. The text strings will consist only of uppercase alphanumeric characters and underscores (which are used in place of spaces). The end of the input will be signalled by a line containing only the word "END" as the text string. This line should not be processed.
Output
For each text string in the input, output the length in bits of the 8-bit ASCII encoding, the length in bits of an optimal prefix-free variable-length encoding, and the compression ratio accurate to one decimal point.
Sample Input
AAAAABCD
THE_CAT_IN_THE_HAT
END
Sample Output
64 13 4.9
144 51 2.8
Here is a reference solution found online, followed by an analysis.
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <functional>
#include <queue>
#include <vector>
using namespace std;

#define M 1000050
char str[M];   // global variables are zero-initialized by default
int list[27];  // counts for 'A'..'Z', plus slot 26 for every other character
priority_queue<int, vector<int>, greater<int> > que;  // min-heap of weights

int main()
{
    int ans, sum;
    int i, a, b, c;
    while (scanf("%s", str) == 1 && strcmp(str, "END"))
    {
        memset(list, 0, sizeof(list));
        for (i = 0; str[i]; i++)
        {
            if (isalpha(str[i]))
                list[str[i] - 'A']++;
            else
                list[26]++;  // underscore (any digit would also land here)
        }
        sum = i * 8;  // bits needed by the original fixed-length encoding
        ans = i;      // bits needed by the HFM encoding; stays i when c == 1
        c = 0;
        for (i = 0; i < 27; i++)
        {
            if (list[i])
            {
                que.push(list[i]);
                c++;
            }
        }
        if (c > 1)  // when c == 1 there is only one distinct character
        {
            ans = 0;
            while (que.size() != 1)
            {
                a = que.top();
                que.pop();
                b = que.top();
                que.pop();
                ans += a + b;
                que.push(a + b);
            }
        }
        while (!que.empty())  // empty the queue after use, also when c == 1
            que.pop();
        printf("%d %d %.1f\n", sum, ans, 1.0 * sum / ans);
    }
    return 0;
}
1. Counting the characters of the input string
for (i = 0; str[i]; i++)
{
    if (isalpha(str[i]))
        list[str[i] - 'A']++;
    else
        list[26]++;
}
isalpha is declared in ctype.h and tests whether a character is a letter. The letters in this problem's input are all uppercase.
list is an array, int list[27], that counts the 26 letters plus the underscore. The distance of a letter from 'A' is used as the array subscript, and the corresponding element holds the number of times that letter appears.
The code is very concise: a single ++ on the array element does the counting.
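The counting step can be tried on its own with a sketch like the one below. The function name countSymbols is hypothetical. Unlike the original code it uses std::isupper instead of isalpha, an assumption made here so that a lowercase letter cannot produce an index outside the 27 slots.

```cpp
#include <array>
#include <cctype>
#include <string>

// Slots 0..25 count 'A'..'Z'; slot 26 counts everything else (the underscore
// in this problem). std::isupper replaces the original isalpha so that a
// lowercase letter can never index past the end of the array.
std::array<int, 27> countSymbols(const std::string& text) {
    std::array<int, 27> counts{};  // value-initialized: all zeros
    for (unsigned char ch : text) {
        if (std::isupper(ch)) {
            counts[ch - 'A']++;
        } else {
            counts[26]++;
        }
    }
    return counts;
}
```

For THE_CAT_IN_THE_HAT this gives 4 for T (subscript 19), 3 for H (subscript 7) and 4 for the underscore (subscript 26).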
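2. Counting bits
sum = i * 8; ans = i; c = 0;
sum is the number of bits needed by the original fixed-length encoding (8 bits per character), ans is the number of bits needed by the Huffman (HFM) encoding, and i is the number of characters in the string. ans starts at i because a string made of a single distinct character is encoded with 1 bit per character.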
3. Using the priority queue to order the weights
for (i = 0; i < 27; i++)
{
    if (list[i])
    {
        que.push(list[i]);
        c++;
    }
}
The number of times each letter appears is its weight and is pushed into the queue. c records how many distinct letters appear.
4. Simulating the construction of the Huffman tree
if (c > 1)  // when c == 1 there is only one distinct character
{
    ans = 0;
    while (que.size() != 1)
    {
        a = que.top();
        que.pop();
        b = que.top();
        que.pop();
        ans += a + b;
        que.push(a + b);
    }
}
while (!que.empty())  // empty the queue after use, also when c == 1
    que.pop();
The first while loop follows exactly the construction steps listed earlier: take out the two smallest weights, merge them, and put the sum back.
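The merge loop can be isolated into a small function for experimenting. This is a sketch under assumptions: the name huffmanEncodedBits is hypothetical, and it takes a list of occurrence counts rather than reading the global array.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Returns the number of bits of an optimal prefix-free encoding, given how
// often each symbol occurs. Zero counts are skipped. With a single distinct
// symbol the answer is one bit per character, as in the original program.
long long huffmanEncodedBits(const std::vector<int>& counts) {
    std::priority_queue<long long, std::vector<long long>, std::greater<long long>> heap;
    long long total = 0;
    for (int c : counts) {
        if (c > 0) {
            heap.push(c);
            total += c;
        }
    }
    if (heap.size() <= 1) return total;
    long long bits = 0;
    while (heap.size() > 1) {
        long long a = heap.top(); heap.pop();  // smallest weight
        long long b = heap.top(); heap.pop();  // second smallest weight
        bits += a + b;                         // weight of the merged node
        heap.push(a + b);
    }
    return bits;
}
```

For the first sample string AAAAABCD the counts are 5, 1, 1, 1 and the merges are 1+1, 2+1 and 3+5, which add up to 13.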
For example, with the input AAAABBBCCD, following the steps above produces a tree with A on layer 1, B on layer 2, and C and D on layer 3.
The resulting codes are:
A: 0 (1 bit)
B: 10 (2 bits)
C: 110 (3 bits)
D: 111 (3 bits)
So the total number of bits in the encoding is the sum, over all letters, of occurrences x code length:
4x1 + 3x2 + 2x3 + 1x3 = 19
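The same total can be obtained by looking up each character's code and adding the code lengths. The function name encodedLength and the use of std::map for the code table are assumptions made for this sketch.

```cpp
#include <map>
#include <string>

// Sums the code length of every character in `text`.
// Characters missing from the code table are skipped.
long long encodedLength(const std::string& text,
                        const std::map<char, std::string>& codes) {
    long long bits = 0;
    for (char ch : text) {
        auto it = codes.find(ch);
        if (it != codes.end()) {
            bits += static_cast<long long>(it->second.size());
        }
    }
    return bits;
}
```

With the table {A: 0, B: 10, C: 110, D: 111} and the text AAAABBBCCD the loop adds 1 four times, 2 three times, 3 twice and 3 once.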
This is exactly the weighted path length: the number of occurrences is the weight, and the code length is the layer of the leaf, that is, its distance from the root.
The weighted path length can be obtained without building the tree at all. It is enough to add up the weight of every merged node, which is what the program does.
In the program,
ans = (1+2) + (3+3) + (4+6) = 19
Expanding each merged weight gives
(1+2) + ((1+2)+3) + (4+((1+2)+3))
(1+2) is added 3 times, which is exactly the layer number of those two leaves. Likewise 3 is added twice and 4 once.
So ans is the weighted path length of this Huffman tree.
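The same argument can be written in general form. This is a sketch of the reasoning, not something taken from the original solution; S(v) is notation introduced here for the weight of a merged (internal) node v, while W_i and L_i are the leaf weights and path lengths from the WPL formula.

```latex
\begin{aligned}
\text{ans} &= \sum_{v \,\in\, \text{internal}} S(v)
            = \sum_{v \,\in\, \text{internal}} \;\; \sum_{i \,\in\, \text{leaves}(v)} W_i \\
           &= \sum_{i=1}^{N} W_i \cdot \bigl|\{\, v \in \text{internal} : i \in \text{leaves}(v) \,\}\bigr|
            = \sum_{i=1}^{N} W_i L_i = \text{WPL}
\end{aligned}
```

The step from the first line to the second swaps the order of summation: leaf i is counted once for every internal node above it, and a leaf on layer L_i has exactly L_i such nodes. For AAAABBBCCD this reads 1x3 + 2x3 + 3x2 + 4x1 = 19.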
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
Computing the weighted path length of a Huffman coding