The Huffman tree is also known as the optimal full binary tree. Before getting started, let's look at several definitions.
1. Path
Simply put, a path is the sequence of branches leading from one specified node to another, for example the red branches in Figure 1 (A -> C -> B and C -> D -> E -> F).
Figure 1
2. Path length
The number of branches in a path. For example, the length of path (a) is 2, and the length of path (b) is 3.
3. Node weight
In some applications the nodes need to be distinguished by importance (or priority). For example, if node A is more important than node B, you can add an int attribute, weight, to the nodes to express this. That value is the node's weight.
Figure 2
4. Weighted path length of a node
The path length from the node to the root of the tree, multiplied by the node's weight.
In Figure 2:
The weighted path length of node 1 is 1*2 = 2;
The weighted path length of node 2 is 2*2 = 4;
The weighted path length of node 3 is 3*2 = 6;
The weighted path length of node 4 is 4*2 = 8;
5. Weighted path length of a tree
Compute the weighted path length of every node that carries a weight, as defined in 4, and add the results together. The sum is the weighted path length of the whole tree.
In Figure 2, the tree's weighted path length is 2 + 4 + 6 + 8 = 20.
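The sum above is easy to check with a few lines of C#. This is only a minimal sketch: the class name WeightedPath and the method Total are made up for illustration, and each leaf is given as a (weight, path length) pair read off Figure 2.

```csharp
using System;
using System.Collections.Generic;

public static class WeightedPath
{
    // Sum of weight * pathLength over all weighted leaves.
    public static int Total(IEnumerable<(int Weight, int PathLength)> leaves)
    {
        int total = 0;
        foreach (var leaf in leaves)
        {
            total += leaf.Weight * leaf.PathLength;
        }
        return total;
    }

    public static void Main()
    {
        // Four leaves with weights 1..4, each two branches away from the root.
        var figure2 = new[] { (1, 2), (2, 2), (3, 2), (4, 2) };
        Console.WriteLine(Total(figure2)); // 2 + 4 + 6 + 8 = 20
    }
}
```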
If four nodes have weights 1, 2, 3, and 4, there are many ways to build a binary tree with them as leaves, for example:
The comparison shows that tree (c) has the smallest total weighted path length (19), while the other trees come to 20. Tree (c) is the legendary Huffman tree, which can be understood as follows:
Given a group of weighted leaf nodes, build a binary tree from them; the tree whose total weighted path length is the smallest is the Huffman tree. (Of course, this is a homemade, informal definition based on my own understanding. The official definition in the book "Data Structures and Algorithms" comes with a pile of mathematical symbols that may make readers who dislike math dizzy.)
Construction algorithm for the Huffman tree:
1. Among the given weighted leaf nodes, find the two with the smallest weights (for convenience, first sort the leaf nodes by weight in ascending order, so that you only need to take the first two), then add a temporary node as the parent of these two nodes (its weight is the sum of the weights of the two nodes).
2. Remove the two nodes just processed, then treat the new temporary node together with the remaining leaf nodes in the same way: from the collection made up of the new node and the remaining leaf nodes, find the two with the smallest weights and apply step 1 to them.
3. Repeat the process until every leaf node has been processed and only one node, the root, remains.
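The three steps can be sketched compactly as follows. This is an illustration under simplifying assumptions, not a full implementation: it tracks weights only (no tree links), re-sorts a plain list on every round, and the names GreedyMerge and TotalCost are invented for this example. It relies on the fact that the weighted path length of the finished tree equals the sum of the weights of all the temporary parent nodes created along the way, since each leaf weight is counted once for every ancestor it has.

```csharp
using System;
using System.Collections.Generic;

public static class GreedyMerge
{
    // Repeatedly merges the two smallest weights and returns the sum of all
    // parent weights created, which equals the tree's weighted path length.
    public static int TotalCost(IEnumerable<int> leafWeights)
    {
        var pool = new List<int>(leafWeights);
        int total = 0;
        while (pool.Count > 1)
        {
            pool.Sort();                     // smallest weights first
            int parent = pool[0] + pool[1];  // step 1: new parent node
            Console.WriteLine($"merge {pool[0]} + {pool[1]} -> {parent}");
            pool.RemoveRange(0, 2);          // step 2: drop the two merged nodes
            pool.Add(parent);                //         and put the parent back
            total += parent;
        }                                    // step 3: repeat until one node is left
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(TotalCost(new[] { 1, 2, 4, 3 })); // 3 + 6 + 10 = 19
    }
}
```

For the weights 1, 2, 4, and 3 the merges are 1 + 2, then 3 + 3, then 4 + 6.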
If we have a group of leaf nodes with weights 1, 2, 4, and 3, the process looks like this:
C# implementation:
First, recall two important points from the previous article:
1. A mathematical property of binary trees:
For a non-empty binary tree, if the number of nodes of degree 0 is X and the number of nodes of degree 2 is Y, then X = Y + 1 (that is, Y = X - 1).
In a tree with no nodes of degree 1, such as a Huffman tree, the total number of nodes is therefore X + Y = X + (X - 1) = 2 * X - 1.
2. A complete binary tree lends itself to sequential storage (that is, storage in a linear structure such as an array or a List<T>).
Node.cs:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace HuffmanTreeDemo
{
    public class Node
    {
        private int weight;  // weight
        private int lChild;  // index of the left child node
        private int rChild;  // index of the right child node
        private int index;   // index of the current node

        public int Weight { get { return weight; } set { weight = value; } }
        public int LChild { get { return this.lChild; } set { lChild = value; } }
        public int RChild { get { return this.rChild; } set { rChild = value; } }
        public int Index { get { return this.index; } set { index = value; } }

        // A node without children; -1 means "no node"
        public Node()
        {
            weight = 0;
            lChild = -1;
            rChild = -1;
            index = -1;
        }

        public Node(int w, int lc, int rc, int p)
        {
            weight = w;
            lChild = lc;
            rChild = rc;
            index = p;
        }
    }
}
HuffmanTree.cs (note: the Create algorithm in the following code may not be the most efficient, but it is easy to understand)
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace HuffmanTreeDemo
{
    public class HuffmanTree
    {
        private List<Node> _tmp;
        private List<Node> _nodes;

        public HuffmanTree(params int[] weights)
        {
            if (weights.Length < 2)
            {
                throw new Exception("The number of leaf nodes cannot be less than 2!");
            }
            int n = weights.Length;
            Array.Sort(weights);
            // Create the leaf nodes, sorted by weight in ascending order
            List<Node> lstLeafs = new List<Node>(n);
            for (int i = 0; i < n; i++)
            {
                var node = new Node();
                node.Weight = weights[i];
                node.Index = i;
                lstLeafs.Add(node);
            }
            // Temporary working container (a tree with n leaves has 2 * n - 1 nodes)
            _tmp = new List<Node>(2 * n - 1);
            // Container that holds the nodes of the final tree
            _nodes = new List<Node>(_tmp.Capacity);
            _tmp.AddRange(lstLeafs);
            _nodes.AddRange(_tmp);
        }

        /// <summary>
        /// Construct the Huffman tree
        /// </summary>
        public void Create()
        {
            while (this._tmp.Count > 1)
            {
                // Parent of the two lightest nodes; its index is the next free one
                var tmp = new Node(this._tmp[0].Weight + this._tmp[1].Weight,
                                   _tmp[0].Index,
                                   _tmp[1].Index,
                                   this._tmp.Max(c => c.Index) + 1);
                this._tmp.Add(tmp);
                this._nodes.Add(tmp);
                // Delete the two processed nodes
                this._tmp.RemoveAt(0);
                this._tmp.RemoveAt(0);
                // Re-sort by weight in ascending order
                this._tmp = this._tmp.OrderBy(c => c.Weight).ToList();
            }
        }

        /// <summary>
        /// Output the key values of each node (for debugging)
        /// </summary>
        /// <returns></returns>
        public override string ToString()
        {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < _nodes.Count; i++)
            {
                var n = _nodes[i];
                sb.AppendLine("index:" + i
                    + ", weight:" + n.Weight.ToString().PadLeft(2, ' ')
                    + ", lchild_index:" + n.LChild.ToString().PadLeft(2, ' ')
                    + ", rchild_index:" + n.RChild.ToString().PadLeft(2, ' '));
            }
            return sb.ToString();
        }
    }
}
Test:
using System;

namespace HuffmanTreeDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            // Leaf weights from the example above; the constructor sorts them, so their order does not matter
            HuffmanTree tree = new HuffmanTree(1, 2, 4, 3);
            tree.Create();
            Console.WriteLine("The node value of the final tree is as follows:");
            Console.WriteLine(tree.ToString());
            Console.ReadLine();
        }
    }
}
The output result is as follows:
The node value of the final tree is as follows:
index:0, weight: 1, lchild_index:-1, rchild_index:-1
index:1, weight: 2, lchild_index:-1, rchild_index:-1
index:2, weight: 3, lchild_index:-1, rchild_index:-1
index:3, weight: 4, lchild_index:-1, rchild_index:-1
index:4, weight: 3, lchild_index: 0, rchild_index: 1
index:5, weight: 6, lchild_index: 2, rchild_index: 4
index:6, weight:10, lchild_index: 3, rchild_index: 5
The output result may not be intuitive. You can see the following figure.
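The listing can also be read programmatically. The sketch below hard-codes the three columns of the output above and walks the tree from the root (the last entry) down to the leaves to recompute the weighted path length. The class name FlatTree and the method WeightedPathLength are hypothetical helpers for this illustration, not part of the code above.

```csharp
using System;

public static class FlatTree
{
    // Columns copied from the listing: entry i describes the node with index i.
    static readonly int[] Weight = { 1, 2, 3, 4, 3, 6, 10 };
    static readonly int[] LChild = { -1, -1, -1, -1, 0, 2, 3 };
    static readonly int[] RChild = { -1, -1, -1, -1, 1, 4, 5 };

    // Weighted path length of the subtree rooted at index, where depth is
    // the number of branches between the tree's root and that node.
    public static int WeightedPathLength(int index, int depth)
    {
        if (LChild[index] == -1 && RChild[index] == -1)
        {
            return Weight[index] * depth; // leaf node
        }
        return WeightedPathLength(LChild[index], depth + 1)
             + WeightedPathLength(RChild[index], depth + 1);
    }

    public static void Main()
    {
        int root = Weight.Length - 1; // the last node created is the root
        Console.WriteLine(WeightedPathLength(root, 0)); // 4*1 + 3*2 + 1*3 + 2*3 = 19
    }
}
```

The result, 19, matches the minimum found earlier for the weights 1, 2, 3, and 4.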
Huffman encoding
First, a seemingly unrelated topic. In telegraph transmission, the content usually has to be encoded before it is sent (a telegraph can only use 0 and 1 to represent what is sent, so characters such as ABCDE must eventually be converted into combinations of 0 and 1, which raises the question of how to map the character set [A-Z] one-to-one onto strings over [0, 1]).
Suppose the message aaaabbbccd needs to be encoded before it is transmitted. We now need a coding scheme.
First, it is easy to think of the following fixed-length encoding scheme. Each character is represented by two digits, for example:
A-> 00
B-> 01
C-> 10
D-> 11
Then the final encoding of aaaabbbccd is 00, 00, 00, 00, 01, 01, 01, 10, 10, 11 (note: the commas are added here only for readability and are not needed in actual coding).
However, the telegraph experts proposed another, shorter variable-length encoding scheme:
A-> 0
B-> 10
C-> 111
D-> 110
According to this encoding scheme, the final encoding of aaaabbbccd is: 0, 0, 0, 0, 10, 10, 10, 111, 111, 110
Compare the codes of the two schemes:
00, 00, 00, 00, 01, 01, 01, 10, 10, 11 (20 digits, not counting the commas)
0, 0, 0, 0, 10, 10, 10, 111, 111, 110 (19 digits, not counting the commas)
The experts really are experts!
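The digit counts can be verified mechanically. Here is a small sketch that encodes the message with both tables and prints the lengths; the class CodeTables and its members are names chosen for this example only.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CodeTables
{
    // Fixed-length scheme: two digits per character.
    public static readonly Dictionary<char, string> Fixed = new Dictionary<char, string>
    {
        { 'a', "00" }, { 'b', "01" }, { 'c', "10" }, { 'd', "11" }
    };

    // Variable-length scheme: frequent characters get shorter codes.
    public static readonly Dictionary<char, string> Variable = new Dictionary<char, string>
    {
        { 'a', "0" }, { 'b', "10" }, { 'c', "111" }, { 'd', "110" }
    };

    // Concatenates the code of every character in the message.
    public static string Encode(string message, IReadOnlyDictionary<char, string> table)
    {
        var sb = new StringBuilder();
        foreach (char ch in message)
        {
            sb.Append(table[ch]);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string fixedBits = Encode("aaaabbbccd", Fixed);
        string variableBits = Encode("aaaabbbccd", Variable);
        Console.WriteLine($"{fixedBits} ({fixedBits.Length} digits)");       // 20 digits
        Console.WriteLine($"{variableBits} ({variableBits.Length} digits)"); // 19 digits
    }
}
```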
A closer look shows that this variable-length encoding scheme depends on an important premise: no code may be a prefix of any other code! Otherwise decoding becomes ambiguous.
For example, suppose C is encoded as 10, D as 101, A as 1, and B as 01.
If the message 10101 arrives, should it be decoded as CCA or as DB?
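The prefix condition and the resulting ambiguity can both be checked in code. The sketch below assumes that every code is a non-empty string of 0s and 1s; PrefixCheck, IsPrefixFree, and CountDecodings are illustrative names, and the brute-force counting is only meant for short inputs like this one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixCheck
{
    // True when no code is a prefix of a different code in the set.
    public static bool IsPrefixFree(IEnumerable<string> codes)
    {
        var list = codes.ToList();
        for (int i = 0; i < list.Count; i++)
        {
            for (int j = 0; j < list.Count; j++)
            {
                if (i != j && list[j].StartsWith(list[i], StringComparison.Ordinal))
                {
                    return false;
                }
            }
        }
        return true;
    }

    // Counts the different ways bits can be split into codes from the set.
    public static int CountDecodings(string bits, IEnumerable<string> codes)
    {
        if (bits.Length == 0) return 1; // everything consumed: one valid split
        var list = codes.ToList();
        int count = 0;
        foreach (string code in list)
        {
            if (bits.StartsWith(code, StringComparison.Ordinal))
            {
                count += CountDecodings(bits.Substring(code.Length), list);
            }
        }
        return count;
    }

    public static void Main()
    {
        var ambiguous = new[] { "1", "01", "10", "101" };   // A, B, C, D from the example
        var prefixFree = new[] { "0", "10", "111", "110" }; // the experts' scheme
        Console.WriteLine(IsPrefixFree(ambiguous));                           // False
        Console.WriteLine(CountDecodings("10101", ambiguous));                // 5, including CCA and DB
        Console.WriteLine(IsPrefixFree(prefixFree));                          // True
        Console.WriteLine(CountDecodings("0000101010111111110", prefixFree)); // 1
    }
}
```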
Now let's reveal the secret of the Huffman code:
In the aaaabbbccd example above, the message contains only the characters A, B, C, and D. Think of them as leaf nodes and use the number of occurrences as the weight (D occurs least often, so its weight is the lowest; C occurs more often than D, so its weight is higher than D's, and so on). This gives a set of weighted leaf nodes (A: weight 4, B: weight 3, C: weight 2, D: weight 1), which we use to construct a Huffman tree:
At the same time, we adopt a convention for the branches: a branch to the left corresponds to the digit 0 and a branch to the right corresponds to the digit 1. Following the path from the root node to each leaf node then yields a string of digits.
That is: A -> 0, B -> 10, C -> 110, D -> 111. This is an encoding!
In addition, note that in a binary tree each leaf node lies at the end of exactly one path from the root (a leaf cannot sit on one branch and also on another), and no leaf lies on the path to another leaf. This guarantees that the code obtained for one leaf can never be a prefix of the code of another.
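Putting the pieces together, a code table can be derived directly from a message. The following is a rough sketch rather than a polished implementation: it assumes the message contains at least two distinct characters, uses a small linked TreeNode class instead of index-based nodes, and all names in it (HuffmanCodes, Build, Assign) are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HuffmanCodes
{
    private sealed class TreeNode
    {
        public int Weight;
        public char? Symbol;   // set only on leaf nodes
        public TreeNode Left;
        public TreeNode Right;
    }

    // Builds a code table for the characters that occur in message.
    public static Dictionary<char, string> Build(string message)
    {
        // One leaf per distinct character, weighted by its number of occurrences.
        var pool = message.GroupBy(ch => ch)
                          .Select(g => new TreeNode { Weight = g.Count(), Symbol = g.Key })
                          .ToList();
        while (pool.Count > 1)
        {
            pool = pool.OrderBy(n => n.Weight).ToList(); // lightest nodes first
            var parent = new TreeNode
            {
                Weight = pool[0].Weight + pool[1].Weight,
                Left = pool[0],
                Right = pool[1]
            };
            pool.RemoveRange(0, 2);
            pool.Add(parent);
        }
        var table = new Dictionary<char, string>();
        if (pool.Count == 1) Assign(pool[0], "", table);
        return table;
    }

    // Walks the tree: a left branch appends 0, a right branch appends 1.
    private static void Assign(TreeNode node, string prefix, Dictionary<char, string> table)
    {
        if (node.Symbol.HasValue)
        {
            table[node.Symbol.Value] = prefix;
            return;
        }
        Assign(node.Left, prefix + "0", table);
        Assign(node.Right, prefix + "1", table);
    }

    public static void Main()
    {
        foreach (var pair in Build("aaaabbbccd").OrderBy(p => p.Key))
        {
            Console.WriteLine($"{pair.Key} -> {pair.Value}");
        }
    }
}
```

For aaaabbbccd the code lengths come out as 1, 2, 3, and 3 digits for a, b, c, and d. Which of c and d ends in 0 and which in 1 depends only on which of the two lightest leaves is placed on the left, so either assignment gives a valid encoding of the same total length.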
OK. The problem of finding a Huffman encoding has thus been turned into the problem of constructing a Huffman tree, and that problem is already solved. (Now that you know how Huffman coding works, maybe you can play an unusual confession game with some bright, clever girl: send her a string of digits and attach a picture. If she can't work it out, she doesn't understand your mind; but if she successfully figures out that the meaning behind it is Iloveyou and sends a string of auspicious digits back to you, then... congratulations!)