Computing the weighted path length of a Huffman code

Source: Internet
Author: User
Tags: uppercase letter

I ran into this programming problem and studied another blogger's solution. Some parts were not obvious to me at first, so this post works through them.
First, what is a Huffman tree?
A Huffman tree, also known as an optimal binary tree, is the binary tree with the smallest weighted path length for a given set of leaf weights.
In other words, once every leaf carries a weight, the weighted sum of the root-to-leaf path lengths is as small as possible.
The path length of a leaf is the number of edges between it and the root (the root node is at layer 0, so the path length of a leaf equals its layer number). The weighted path length of the tree is written WPL = W1*L1 + W2*L2 + W3*L3 + ... + Wn*Ln, where Wi is the weight of leaf i and Li is its path length.

For example, for a tree whose leaves have weights 32, 24, 2 and 7 at layers 1, 2, 3 and 3, WPL = 32x1 + 24x2 + 2x3 + 7x3 = 107.
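The formula can be checked with a few lines of C++. This is only a minimal sketch: the function name weightedPathLength and the two parallel vectors are assumptions made for illustration, not something from the original post.

```cpp
#include <cstddef>
#include <vector>

// Weighted path length: sum of weights[i] * depths[i] over all leaves.
// The root is at layer 0, so a leaf's layer equals its path length.
// If the two vectors differ in size, only the common prefix is used.
long long weightedPathLength(const std::vector<int>& weights,
                             const std::vector<int>& depths) {
    long long wpl = 0;
    for (std::size_t i = 0; i < weights.size() && i < depths.size(); ++i) {
        wpl += static_cast<long long>(weights[i]) * depths[i];
    }
    return wpl;
}
```

Calling weightedPathLength({32, 24, 2, 7}, {1, 2, 3, 3}) evaluates the same sum as the example above, 32x1 + 24x2 + 2x3 + 7x3.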

The general steps for building a Huffman tree are:
1. Treat each of the n weights as the root of its own tree, with empty left and right subtrees. These n trees form a forest.
2. Select the two trees in the forest whose roots have the smallest weights and make them the left and right subtrees of a new tree. The weight of the new root is the sum of the weights of the two subtree roots. By convention, the weight of the left subtree should not be greater than the weight of the right subtree.
3. Remove those two trees from the forest and add the new tree to the forest.
4. Repeat steps 2 and 3 until only one tree is left in the forest. That tree is the Huffman tree.
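The four steps can be sketched in C++ roughly as follows. This is an illustration under assumptions, not the code analyzed in this post: the Node struct, the index-based forest and the function name huffmanDepths are all hypothetical. It builds the tree explicitly and returns the layer of each leaf.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical node type for this sketch: a leaf has left == right == -1.
struct Node {
    long long weight;
    int left;
    int right;
};

// Builds a Huffman tree over `weights` and returns the layer (depth) of
// each leaf, in the same order as `weights`.
std::vector<int> huffmanDepths(const std::vector<int>& weights) {
    std::vector<Node> nodes;
    // The "forest": a min-heap of (root weight, node index).
    using Item = std::pair<long long, int>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> forest;

    // Step 1: every weight starts as a single-node tree.
    for (int w : weights) {
        nodes.push_back({w, -1, -1});
        forest.push({w, static_cast<int>(nodes.size()) - 1});
    }

    // Steps 2 to 4: merge the two lightest trees until one remains.
    while (forest.size() > 1) {
        Item a = forest.top(); forest.pop();  // lightest: left subtree
        Item b = forest.top(); forest.pop();  // next lightest: right subtree
        nodes.push_back({a.first + b.first, a.second, b.second});
        forest.push({a.first + b.first, static_cast<int>(nodes.size()) - 1});
    }

    std::vector<int> leafDepth(weights.size(), 0);
    if (forest.empty()) return leafDepth;     // no weights at all

    // Walk down from the root and record the layer of every leaf.
    // Leaves were created first, so they occupy indices 0..weights.size()-1.
    std::vector<int> depth(nodes.size(), 0);
    std::vector<int> stack = {forest.top().second};
    while (!stack.empty()) {
        int n = stack.back();
        stack.pop_back();
        if (nodes[n].left == -1) {
            leafDepth[n] = depth[n];
            continue;
        }
        depth[nodes[n].left] = depth[nodes[n].right] = depth[n] + 1;
        stack.push_back(nodes[n].left);
        stack.push_back(nodes[n].right);
    }
    return leafDepth;
}
```

A single weight yields a tree consisting only of the root, so its leaf sits at layer 0. That is why a program that needs at least one bit per character has to treat the one-symbol case separately.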
The Taiyuan University of Technology website hosts an animated demonstration:
http://www.tyut.edu.cn/kecheng1/site01/suanfayanshi/Huffman.asp

As described above, each step picks the two trees with the smallest weights. A priority queue (min-heap) does exactly this, so one can be used when building a Huffman tree.
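As a small illustration of that idea, std::priority_queue with std::greater behaves as a min-heap, so the two smallest weights are simply the first two values popped. The helper name twoSmallest is made up for this sketch, and it assumes at least two weights are given.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Returns the two smallest values in `weights` using a min-heap.
// Assumes weights.size() >= 2.
std::pair<int, int> twoSmallest(const std::vector<int>& weights) {
    // std::greater turns the default max-heap into a min-heap.
    std::priority_queue<int, std::vector<int>, std::greater<int>>
        que(weights.begin(), weights.end());
    int a = que.top();
    que.pop();
    int b = que.top();
    que.pop();
    return {a, b};
}
```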

Now for the problem statement.

Input
The input file will contain a list of text strings, one per line. The text strings will consist only of uppercase alphanumeric characters and underscores (which are used in place of spaces). The end of the input will be signalled by a line containing only the word "END" as the text string. This line should not be processed.

Output
For each text string in the input, output the length in bits of the 8-bit ASCII encoding, the length in bits of an optimal prefix-free variable-length encoding, and the compression ratio accurate to one decimal point.

Sample Input
AAAAABCD
THE_CAT_IN_THE_HAT
END

Sample Output
64 13 4.9
144 51 2.8

Here is a reference solution found online, followed by an analysis.

    #include <stdio.h>
    #include <string.h>
    #include <ctype.h>
    #include <functional>
    #include <queue>
    #include <vector>

    #define M 1000050
    char str[M];   // global variables are zero-initialized by default
    int list[27];  // counts for 'A'..'Z', plus one slot for everything else
    std::priority_queue<int, std::vector<int>, std::greater<int> > que;  // min-heap

    int main() {
        int ans, sum;
        int i, a, b, c;
        // Read strings until the terminating line "END" (or end of input).
        while (scanf("%s", str) == 1 && strcmp(str, "END") != 0) {
            memset(list, 0, sizeof(list));
            for (i = 0; str[i]; i++) {
                if (isalpha((unsigned char)str[i]))
                    list[str[i] - 'A']++;
                else
                    list[26]++;
            }
            sum = i * 8;  // bits needed by the fixed-length 8-bit encoding
            ans = i;      // bits needed by the HFM encoding (default: one bit per character)
            c = 0;        // number of distinct characters
            for (i = 0; i < 27; i++) {
                if (list[i]) {
                    que.push(list[i]);
                    c++;
                }
            }
            if (c > 1) {  // with c == 1 there is only one distinct character and ans stays i
                ans = 0;
                while (que.size() != 1) {
                    a = que.top();
                    que.pop();
                    b = que.top();
                    que.pop();
                    ans += a + b;
                    que.push(a + b);
                }
            }
            while (!que.empty())  // empty the queue after use, also when c == 1
                que.pop();
            printf("%d %d %.1f\n", sum, ans, 1.0 * sum / ans);
        }
        return 0;
    }

1. Counting the characters of the input string

    for (i = 0; str[i]; i++) {
        if (isalpha(str[i]))
            list[str[i] - 'A']++;
        else
            list[26]++;
    }

isalpha, declared in ctype.h, tests whether a character is a letter. Since the letters in the input are all uppercase, every character that passes the test is an uppercase letter.
list is an array declared as int list[27]. It holds the counts for the 26 letters, plus one extra slot, list[26], that collects every other character (here, the underscore). The offset of a letter from 'A' is used as the array index, and the corresponding element stores the number of times that letter appears.
The idiom is concise: the count is updated by applying ++ directly to the array element.
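One caveat: the problem statement allows alphanumeric characters, while the code above sends every non-letter to list[26], so digits and underscores would share a single counter. A possible variation, which is not the blogger's code, is to count by byte value so that every symbol gets its own slot. The function name countSymbols and the 256-entry table are assumptions made for this sketch.

```cpp
#include <array>
#include <string>

// Counts how often each byte value occurs in `text`.
// Indexing by unsigned char keeps letters, digits and '_' in separate slots.
std::array<int, 256> countSymbols(const std::string& text) {
    std::array<int, 256> freq{};  // value-initialized: all counters start at 0
    for (unsigned char ch : text) {
        ++freq[ch];
    }
    return freq;
}
```

The non-zero entries of the returned table can then be pushed into the priority queue in the same way the list array is used in the program.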
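2. Initializing the bit counts
    sum = i * 8; ans = i; c = 0;
After the counting loop, i is the number of characters in the string. sum is the number of bits used by the original fixed-length 8-bit encoding, ans will hold the number of bits used by the HFM (Huffman) encoding, and c will count the distinct characters.
3. Using a priority queue for the weights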

    for (i = 0; i < 27; i++) {
        if (list[i]) {
            que.push(list[i]);
            c++;
        }
    }

The number of occurrences of each character is pushed into the queue as its weight. c records how many distinct characters appear.
4. Simulating the construction of the Huffman tree

    if (c > 1) {  // with c == 1 there is only one distinct character
        ans = 0;
        while (que.size() != 1) {
            a = que.top();
            que.pop();
            b = que.top();
            que.pop();
            ans += a + b;
            que.push(a + b);
        }
    }
    while (!que.empty())  // empty the queue after use, also when c == 1
        que.pop();

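The while loop follows exactly the steps listed earlier.
For example, for the input AAAABBBCCD the weights are A=4, B=3, C=2 and D=1. Following the steps, D and C are merged into a node of weight 3, that node and B are merged into a node of weight 6, and finally A and that node are merged into the root of weight 10.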

Reading the codes off this tree gives:
A: 0 (1 bit)
B: 10 (2 bits)
C: 110 (3 bits)
D: 111 (3 bits)
So the number of bits in the encoded string is the sum, over all characters, of code length x number of occurrences:
1x4 + 2x3 + 3x2 + 3x1 = 19
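The count can also be confirmed by actually encoding the string with the code table above. This is a minimal sketch; the function name encode and the use of std::map for the table are assumptions made for illustration.

```cpp
#include <map>
#include <string>

// Encodes `text` with a fixed code table and returns the bit string.
// Characters that are missing from the table are skipped in this sketch.
std::string encode(const std::string& text,
                   const std::map<char, std::string>& codes) {
    std::string bits;
    for (char ch : text) {
        auto it = codes.find(ch);
        if (it != codes.end()) {
            bits += it->second;
        }
    }
    return bits;
}
```

With the table A=0, B=10, C=110, D=111, the string AAAABBBCCD becomes 0000 101010 110110 111, and the length of that bit string is the value computed above.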
This is exactly the weighted path length: the number of occurrences is the weight, and the code length is the layer of the leaf, that is, its distance from the root.
The weighted path length can be obtained without building the tree: it is enough to add up the weights of all the merged nodes, which is what the program does.
In the program:
ans = (1+2) + (3+3) + (4+6) = 19
Expanding each merged weight back into the original weights gives:
(1+2) + ((1+2)+3) + (4+((1+2)+3))
Here (1+2) is added 3 times, 3 is added twice and 4 is added once, which matches the layer of each leaf: every merge above a leaf adds that leaf's weight one more time.
So ans is the weighted path length of this Huffman tree.
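The same idea can be packaged as a standalone function. This is a sketch under assumptions rather than the blogger's program: the name huffmanBits is made up, and the frequencies are counted per byte value instead of with the 27-entry list array.

```cpp
#include <functional>
#include <queue>
#include <string>
#include <vector>

// Returns the Huffman-encoded length of `text` in bits, computed as the
// sum of all merged weights instead of building the tree explicitly.
long long huffmanBits(const std::string& text) {
    int freq[256] = {0};
    for (unsigned char ch : text) {
        ++freq[ch];
    }

    // Min-heap holding the weight of every tree currently in the forest.
    std::priority_queue<long long, std::vector<long long>,
                        std::greater<long long>> que;
    for (int f : freq) {
        if (f > 0) que.push(f);
    }

    if (que.empty()) return 0;
    // Only one distinct character: one bit per character, as in the program.
    if (que.size() == 1) return que.top();

    long long bits = 0;
    while (que.size() > 1) {
        long long a = que.top();
        que.pop();
        long long b = que.top();
        que.pop();
        bits += a + b;  // weight of the newly merged node
        que.push(a + b);
    }
    return bits;
}
```

For AAAABBBCCD this adds up (1+2) + (3+3) + (4+6), and for the two sample strings it gives the second number of each sample output line. Because the queue is local to the function, nothing has to be emptied between test cases.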

Copyright notice: This article is the blogger's original work and may not be reproduced without the blogger's permission.
