If you reprint this article, please cite the source: http://blog.csdn.net/ns_code/article/details/19174553
About Huffman Tree
The Huffman tree, also known as the optimal binary tree, is the tree with the minimum weighted path length. Suppose there are n weights {w1, w2, ..., wn}. If we construct a binary tree with n leaf nodes whose weights are {w1, w2, ..., wn}, then the binary tree with the minimum weighted path length is called the Huffman tree.
First, a note on the concept of a tree's weighted path length. The weighted path length of a tree is the sum, over all leaf nodes, of the path length from the leaf node to the root node multiplied by the weight of that leaf node. If a binary tree has n leaf nodes, wi denotes the weight of leaf node i, and Li denotes the path length from leaf node i to the root node, then the weighted path length is WPL = w1 * L1 + w2 * L2 + ... + wn * Ln.
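The WPL formula can be sketched in a few lines of C. The function name weighted_path_length and its parameters are assumptions made for this illustration; depth[i] plays the role of Li.

```c
#include <assert.h>
#include <stddef.h>

/* Weighted path length: the sum of weight[i] * depth[i] over all n leaves.
   depth[i] is the number of edges between leaf i and the root (Li in the text). */
long weighted_path_length(const int *weight, const int *depth, size_t n)
{
    assert(weight != NULL && depth != NULL);
    long wpl = 0;
    for (size_t i = 0; i < n; i++)
        wpl += (long)weight[i] * depth[i];
    return wpl;
}
```

For a hypothetical tree whose leaves have weights 5, 4, 3, 2, 1 at depths 2, 2, 2, 3, 3, this gives 5*2 + 4*2 + 3*2 + 2*3 + 1*3 = 33.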
The shape of the Huffman tree varies with the number of nodes and their weights. The Huffman tree has the following features:
1. For the same set of weights, the Huffman tree is not necessarily unique.
2. The left and right subtrees of a Huffman tree can be exchanged, because this does not affect the tree's weighted path length.
3. All nodes that carry weights are leaf nodes; the nodes without weights are the root nodes of subtrees.
4. A leaf node with a larger weight is never farther from the root node than a leaf node with a smaller weight.
5. A Huffman tree contains only leaf nodes and nodes of degree 2; there are no nodes of degree 1.
6. A Huffman tree with n leaf nodes has 2n-1 nodes in total.
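The node count of 2n-1 for n leaves follows from a short counting argument. As a sketch, write n_0 for the number of leaves and n_2 for the number of degree-2 nodes (symbols introduced here, not in the original), and use the fact that a tree with N nodes has N - 1 edges:

```latex
\begin{aligned}
N &= n_0 + n_2 && \text{(no node has degree 1)} \\
N - 1 &= 2 n_2 && \text{(every edge runs from a degree-2 node to one of its two children)} \\
\Rightarrow\; n_2 &= n_0 - 1, \qquad N = 2 n_0 - 1 .
\end{aligned}
```

With n_0 = n leaves this gives N = 2n - 1.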
The construction steps of the Huffman tree are as follows:
1. Regard the given n weights as n binary trees that each consist of a root node only (no left or right child), and form the set HT from them. The weight of each tree is the weight of its node.
2. Select the two binary trees with the smallest weights from the set HT and make them the subtrees of a new binary tree. The weight of the new tree is the sum of the weights of the two selected trees.
3. Delete the two binary trees selected in step 2 from the set HT, and add the new binary tree built in step 2 to the set HT.
4. Repeat steps 2 and 3 until the set HT contains only one tree, which is the Huffman tree.
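The steps above can be sketched without building any tree structure, by tracking only the weights in the set. The function below is an illustration, not the article's implementation: huffman_merge_cost is a hypothetical name, and it relies on the fact that the sum of all merged weights equals the weighted path length of the resulting tree.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Repeatedly merge the two smallest weights (steps 2 to 4) and return the
   sum of all merged weights, which equals the WPL of the resulting tree.
   Returns -1 on bad input or allocation failure. O(n^2), for illustration. */
long huffman_merge_cost(const int *weights, int n)
{
    if (weights == NULL || n <= 0) return -1;
    int *set = malloc((size_t)n * sizeof *set);
    if (!set) return -1;
    memcpy(set, weights, (size_t)n * sizeof *set);

    long cost = 0;
    int count = n;
    while (count > 1) {
        /* a: index of the smallest weight, b: index of the second smallest */
        int a = 0, b = 1;
        if (set[b] < set[a]) { a = 1; b = 0; }
        for (int i = 2; i < count; i++) {
            if (set[i] < set[a]) { b = a; a = i; }
            else if (set[i] < set[b]) { b = i; }
        }
        int merged = set[a] + set[b];
        cost += merged;
        /* Step 3: the merged tree replaces a; the last entry fills b's slot. */
        set[a] = merged;
        set[b] = set[count - 1];
        count--;
    }
    free(set);
    return cost;
}
```

For the weights 5, 4, 3, 2, 1 the merges produce 3, 6, 9, and 15, so the function returns 3 + 6 + 9 + 15 = 33.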
Assume that the following five weights are given: 5, 4, 3, 2, 1.
Following the steps above, we can construct the Huffman tree shown in the left figure below. We might equally construct the Huffman tree shown in the right figure below, since the Huffman tree is not unique.
Huffman encoding
The Huffman tree is widely used; a well-known application is the encoding of messages in communications. When sending a message, we want its total length to be as short as possible, so we can design a variable-length code for each character in which the characters that occur more often in the text get shorter codes. To avoid ambiguity during decoding, we can adopt the following encoding method:
That is, mark each left branch 0 and each right branch 1; the string of branch characters along the path from the root node to a leaf node is the code of that leaf node's character. This is Huffman encoding. Because every character sits at a leaf node, no code is a prefix of another, so decoding is unambiguous. From the left figure above, we get the following Huffman code for each leaf node:
Node with weight 5: 11
Node with weight 4: 10
Node with weight 3: 00
Node with weight 2: 011
Node with weight 1: 010
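As a sketch of why these codes decode without ambiguity, the function below reads a bit string from left to right and, at each position, looks for the one code that matches. The code table is the left-figure listing above; the names kWeights, kCodes, and decode_bits are assumptions made for this illustration, and each leaf is identified by its weight because the article does not assign characters to the leaves.

```c
#include <assert.h>
#include <string.h>

#define NUM_CODES 5

/* Code table from the left-figure listing: weight -> code. */
static const int   kWeights[NUM_CODES] = {5, 4, 3, 2, 1};
static const char *kCodes[NUM_CODES]   = {"11", "10", "00", "011", "010"};

/* Decode a string of '0'/'1' characters into leaf weights.
   Writes the decoded weights to out and returns how many were decoded.
   Returns -1 if the input is not a concatenation of codes or if more
   than max_out symbols would be produced. */
int decode_bits(const char *bits, int *out, int max_out)
{
    assert(bits != NULL && out != NULL);
    int count = 0;
    while (*bits != '\0') {
        int matched = 0;
        /* No code is a prefix of another, so at most one entry can match. */
        for (int i = 0; i < NUM_CODES && !matched; i++) {
            size_t len = strlen(kCodes[i]);
            if (strncmp(bits, kCodes[i], len) == 0) {
                if (count >= max_out) return -1;
                out[count++] = kWeights[i];
                bits += len;
                matched = 1;
            }
        }
        if (!matched) return -1;
    }
    return count;
}
```

For example, the bit string 1101000 splits only as 11, 010, 00, which decodes to the leaves with weights 5, 1, and 3.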
From the right figure above, we get the following Huffman code for each leaf node:
Node with weight 5: 00
Node with weight 4: 01
Node with weight 3: 10
Node with weight 2: 110
Node with weight 1: 111
C implementation of Huffman encoding
Because a Huffman tree has no nodes of degree 1, a Huffman tree with n leaf nodes has 2n-1 nodes in total (the last feature listed above), so the nodes can be stored in an array of size 2n-1. We can use the following data structures to represent the Huffman tree and the Huffman codes:
/* Storage structure of the Huffman tree, which is also a binary tree structure.
   This storage structure is suitable for both trees and forests. */
typedef struct Node
{
    int weight;          // weight
    int parent;          // index of the parent node; 0 indicates a root node
    int lchild, rchild;  // indices of the left and right child nodes; 0 indicates a leaf node
} HTNode, *HuffmanTree;  // used to store all nodes of the Huffman tree
typedef char **HuffmanCode;  // used to store the Huffman code of each leaf node
Based on the construction steps of the Huffman tree, we can write the code for building it as follows:
/* Construct a Huffman tree from the given n weights, which are stored in wet */
HuffmanTree create_HuffmanTree(int *wet, int n)
{
    // A Huffman tree with n leaf nodes has 2n-1 nodes in total
    int total = 2 * n - 1;
    HuffmanTree HT = (HuffmanTree)malloc(total * sizeof(HTNode));
    if (!HT)
    {
        printf("HuffmanTree malloc failed!");
        exit(-1);
    }
    int i;

    // HT[0], HT[1], ..., HT[n-1] store the n leaf nodes to be encoded
    for (i = 0; i
The code above calls the select_minium() function, which selects the two binary trees with the smallest weights from the set. Its code is as follows:
/* Select the two elements with the smallest weights and zero parent from the first k
   elements of the HT array, and save their indices in min1 and min2 */
// Note: int & is a C++ reference parameter, so this listing must be compiled as C++.
void select_minium(HuffmanTree HT, int k, int &min1, int &min2)
{
    min1 = min(HT, k);
    min2 = min(HT, k);
}
The min() function called here is as follows:
/* Select the element with the smallest weight and zero parent from the first k
   elements of the HT array, and return its index */
int min(HuffmanTree HT, int k)
{
    int i = 0;
    int min;         // index of the element with the smallest weight and zero parent
    int min_weight;  // weight of the element with the smallest weight and zero parent

    // First assign the weight of the first element whose parent is zero to min_weight, for later comparison.
    // Note that we cannot follow the usual practice of assigning HT[0].weight directly to min_weight,
    // because if HT[0].weight is small, it is selected when the first binary tree is constructed,
    // and the next round of selecting the minimum weight would still compare against HT[0].weight,
    // so it would be selected again, resulting in a logic error.
    while (HT[i].parent != 0)
        i++;
    min_weight = HT[i].weight;
    min = i;

    // Select the element with the smallest weight and zero parent, and assign its index to min
    for (; i
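Putting the pieces together, the following is a minimal self-contained sketch of array-based construction in plain C. It is not the author's code: SketchNode, pick_min, and build_tree are hypothetical names, it uses -1 rather than 0 as the marker for "no parent" and "no child", and it excludes the first pick by index instead of calling one selection function twice.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int weight;
    int parent;          /* -1 while the node is still a root in the forest */
    int lchild, rchild;  /* -1 for leaf nodes */
} SketchNode;            /* hypothetical name, kept distinct from HTNode */

/* Index of the parentless node with the smallest weight among the first k
   nodes, ignoring index skip (pass -1 to ignore nothing). -1 if none. */
static int pick_min(const SketchNode *t, int k, int skip)
{
    int best = -1;
    for (int i = 0; i < k; i++) {
        if (i == skip || t[i].parent != -1) continue;
        if (best == -1 || t[i].weight < t[best].weight) best = i;
    }
    return best;
}

/* Build a Huffman tree over n weights in an array of 2n-1 nodes.
   Leaves occupy [0, n), internal nodes [n, 2n-1); the root is the last node.
   Returns NULL on bad input or allocation failure; the caller frees it. */
SketchNode *build_tree(const int *weights, int n)
{
    if (weights == NULL || n <= 0) return NULL;
    int total = 2 * n - 1;
    SketchNode *t = malloc((size_t)total * sizeof *t);
    if (!t) return NULL;

    for (int i = 0; i < total; i++) {
        t[i].weight = (i < n) ? weights[i] : 0;
        t[i].parent = t[i].lchild = t[i].rchild = -1;
    }
    for (int i = n; i < total; i++) {
        int a = pick_min(t, i, -1);  /* smallest parentless node */
        int b = pick_min(t, i, a);   /* next smallest, excluding a */
        assert(a != -1 && b != -1);
        t[a].parent = t[b].parent = i;
        t[i].lchild = a;
        t[i].rchild = b;
        t[i].weight = t[a].weight + t[b].weight;
    }
    return t;
}
```

Called with the weights 5, 4, 3, 2, 1, this sketch fills the internal nodes at indices 5 to 8 with the weights 3, 6, 9, and 15, and the root at index 8 has the nodes at indices 6 and 7 as its left and right children.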
After building the Huffman tree, we can compute the Huffman codes. Obtaining a Huffman code means walking the path from the root node to a leaf node; here we instead traverse in reverse, from each leaf node up to the root node, to find the Huffman code of each character. The code is as follows:
/* Extract the Huffman codes of the n leaf nodes by traversing in reverse,
   from each leaf node to the root node, and save them in HC */
void HuffmanCoding(HuffmanTree HT, HuffmanCode &HC, int n)
{
    // Used to save the pointers to each Huffman code string
    HC = (HuffmanCode)malloc(n * sizeof(char *));
    if (!HC)
    {
        printf("HuffmanCode malloc failed!");
        exit(-1);
    }

    // Temporary space, used to hold the Huffman code string obtained each time
    char *code = (char *)malloc(n * sizeof(char));
    if (!code)
    {
        printf("code malloc failed!");
        exit(-1);
    }
    code[n - 1] = '\0';  // code terminator, which is also the end marker of the character array

    // Find the Huffman code of each character
    int i;
    for (i = 0; i
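The reverse traversal can be sketched for a single leaf as follows. This is an illustration under stated assumptions, not the author's code: CodeNode and leaf_code are hypothetical names, weights are omitted because they are not needed for coding, and -1 (not 0) marks the root's parent and a leaf's missing children.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int parent;          /* -1 for the root */
    int lchild, rchild;  /* -1 for leaf nodes */
} CodeNode;              /* hypothetical minimal node, weights omitted */

/* Return a newly allocated code string for leaf number leaf of a tree with
   n leaves, or NULL on allocation failure. The caller frees the result. */
char *leaf_code(const CodeNode *t, int n, int leaf)
{
    assert(t != NULL && n > 0 && leaf >= 0 && leaf < n);
    char *buf = malloc((size_t)n);  /* a code has at most n-1 bits, plus '\0' */
    if (!buf) return NULL;
    int start = n - 1;
    buf[start] = '\0';

    /* Walk upward, writing bits from the end of the buffer toward the front:
       '0' when the current node is a left child, '1' when it is a right child. */
    int cur = leaf;
    int up = t[cur].parent;
    while (up != -1) {
        buf[--start] = (t[up].lchild == cur) ? '0' : '1';
        cur = up;
        up = t[up].parent;
    }

    size_t len = (size_t)(n - 1 - start);
    char *code = malloc(len + 1);
    if (code) memcpy(code, buf + start, len + 1);
    free(buf);
    return code;
}
```

Filling the buffer from the back means the bits, which are discovered leaf-first, come out in root-first order without a separate reversal step.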
Taking the five weights 5, 4, 3, 2, and 1 given above as an example, the result is as follows: http://download.csdn.net/detail/mmc_maodun/6923741.