Today see a Huffman coding topic, given a string Abcdabaa, ask Huffman encoded binary string of the total length is how much, the answer is 14
For Huffman tree I do not understand ah, so a meal to find, summed up the following knowledge points, share with you: Of course, part of the content of reference to the next Baidu
Huffman tree, also known as the optimal binary tree, is a two-fork tree with the shortest weighted path. Huffman Tree is an application of two-fork tree, which is commonly used in information retrieval.
Some of the relevant concepts:
1. Path length between nodes: The number of branches from one node to another is called a path length between two nodes.
2. The path length of the tree: the sum of the path length from the root node to each node in the tree.
3. The length of the weighted path of the node: the product of the length of the path from the node to the root node and the right of the node.
4. The length of the tree's weighted path: the sum of the length of the weighted path of all the leaf nodes in the tree.
The minimum two-fork tree with weighted path is called Huffman tree or optimal binary tree.
For Huffman Tree, there is a very important theorem: for the Huffman tree with n leaf nodes, a total of 2*n-1 nodes are needed.
The interpretation of this theorem is as follows: For binary trees, there are three types of nodes, that is, the degree (calculated only) is 2 nodes, the degree of 1 node and the degree of 0 leaf node. The non-leaf node of Huffman tree is generated by two nodes, so the number of non-leaf nodes is reduced by the number of nodes which are 1, and this theorem is validated.
The algorithm for constructing Huffman tree is as follows:
1) for a given n weights {w1,w2,w3,..., Wi,..., Wn} constitute the initial set of n binary tree f={t1,t2,t3,..., ti,..., Tn}, where each binary tree ti has only one weight of the root node of Wi, its left and right subtree are empty.
2) in F, select the tree of two root node weights as the left and right subtree of the newly constructed two-fork tree, and the weight of the root node of the new binary tree is the sum of the weights of the root node of its left and right sub-tree.
3) Remove the two trees from F and add the new two-fork tree in ascending order to the set F.
4) Repeat 2) and 3) until there is only one binary tree in the set F.
The path length of the Huffman tree can be calculated wpl= (1+3) *3+2*5+1*7=26.
Huffman Code:
According to Huffman tree can solve the problem of message coding. Suppose a string "Abcdabcaba" is required to encode it into a unique binary code, but the length of the binary encoding required to convert it is minimal.
Assuming that each character appears in the string in the frequency of W, its encoded length is l, encoded characters N, then the total length of the encoded binary code is W1L1+W2L2+...+WNLN, which is exactly the Huffman tree processing principle. Therefore, the structure principle of Huffman tree can be used for binary coding, which makes the message length shortest.
For "Abcdabcaba", there are A, B, C, D4 characters, the number of occurrences are 4, 3, 2, 1, the equivalent of their weights, A, B, C, D to the occurrence of the number of times as the weight of the construction Huffman tree, to get the results of the left-hand image.
From the Haffman node, assign code "0" to the left dial hand tree, assign "1" to the right subtree, and reach the leaf node. Then, from the root of the tree along each path to the leaf node of the code arranged to get each leaf node Huffman code, as shown in the right.
As can be seen, A, B, C, D corresponds to the encoding of 0, 10, 110, 111, and then the string "Abcdabcaba" into the corresponding binary code is 0101101110101100100, the length is only 19. This is the shortest binary encoding, also known as Huffman coding.
According to the above-mentioned law is not difficult to find: Huffman coding has a rule: suppose there are n leaf nodes need to encode, the final Huffman tree must have n layer, Huffman code to get the maximum length of the binary code is N-1.
The problem of Huffman Tree and Huffman coding