13. Heman tree and Its Application
1. definitions and principles of the Heman tree1. Path Length: The branches between one node in the tree and the other constitute the path between two nodes. The number of branches in the path is called the path length;2. Tree path length: The sum of the path lengths from the root to each node;3. weighted path length of a node: The product of the length of the path from the node to the root of the node and the permission on the node;4. The length of the tree's weighted path: The sum of the weighted path lengths of all leaf nodes in the tree;5. Homan tree Definition: Assume there are n weights {w1, w2 ,...., wn}, construct a binary tree with n leaf nodes. Each leaf node has a weight of wk, and the path length of each leaf is lk, the binary tree with the minimum length of WPL is called the Heman tree, which is also called the optimal binary tree. As shown in Figure A, the length of the tree's weighted path WPL = 1*10 + 2*70 + 3*15 + 3*5 = 210.6. Create a Heman tree (algorithm description)(1) Based on the given n weights {w1, w2 ,...., wn} is a set of n Binary Trees F = {T1, T2 ,..., tn}. Each Binary Tree T1 has only one w1 root node, and its left and right Subtrees are empty. (2) in F, select the tree with the minimum weight of the two root nodes as the left and right subtree to construct a new binary tree, the weights of the root nodes of the new binary tree are the sum of the weights of the root nodes of the left and right Subtrees. (3) Delete these two shards in F, at the same time, add the new binary tree to F; (4) Repeat steps 2 and 3 until F contains only one tree, that is, the Heman tree.7. Example of building a Heman treeAssume that there is an ordered sequence {A5, B15, C40, D30, E10}. The binary tree consists of the following: Step 1: first, the leaf nodes with the right value are arranged in an ordered sequence in ascending order, that is, A5, E10, B15, D30, and C40.
Step 2: Take the two nodes with the minimum weights of the first node as the two children of the new node N1. The relatively small node is the left child. The weight of the new node is the sum of the two leaf weights. For example:
Step 3: replace N1 with A and E. The new sequence is N115, B15, D30, and C40.
Step 4: Repeat Step 2 and use N1 and B as the two children of the new node N2. The N2 weight is 15 + 15 = 30. For example:
Step 5: replace N2 with N1 and B. The new sequence is N230, D30, and C40.
Step 6: Repeat Step 2. Take N2 and D as two children of the new node N3. The weight of N3 is 30 + 30 = 60, for example:
Step 7: replace N3 with N2 and D. The new sequence is C40 and N360.
Step 8: Repeat Step 2 and use C and N3 as the two children of the new node T. T is the root node, so far the construction of the Heman tree has been completed. For example:
The WPL of the binary tree in the figure is 40*1 + 30*2 + 15*3 + 10*4 + 5*4 = 205. The binary tree constructed in the preceding steps is the optimal Heman tree.
Ii. Heman Encoding
1. Definition
Assume that the character set to be encoded is {d1, d2 ,... dn}, the number or frequency of occurrences of each character in the text is {w1, w2 ,..., wn}, with d1, d2 ,..., dn is used as the leaf node and w1, w2 ,..., wn is used as the weight of the corresponding leaf node to construct a Heman tree. Rule: The left branch of the Heman tree represents 0, and the right branch represents 1, then the 0 and 1 sequences composed of path branches from the root node to the leaf node are encoded for the corresponding characters of the node.
2. Example: Construct a Heman tree for compression encodingHis main purpose in studying this optimal binary tree is to solve the problem of optimal data transmission for long-distance communication (mainly Telegraph) in the current year. For example, the binary data is used to transmit a string of characters "BADCADFEED", as shown in the following table:
Letter |
A |
B |
C |
D |
E |
F |
Binary Character |
000 |
001 |
010 |
011 |
100 |
101 |
The encoded binary data stream is "001000011010000011101100100011", and the receiving result is decoded by a group of three digits. Assume that these six letters appear at different frequencies, A 27%, B % 8, C 15%, D 15%, E 30%, F 5%. The following uses 27, 8, 15, 15, 30, and 5 as the weights of A, B, C, D, E, and F to construct the Heman tree, for example:
In Figure B, change the weight of the left branch to 0 and the right branch to 1, for example, C:
The six letters are now encoded with 0 or 1 in the path from the root node to the leaf. The resulting encoding table is as follows:
Letter |
A |
B |
C |
D |
E |
F |
Binary Character |
01 |
1001 |
101 |
00 |
11 |
1000 |
"BADCADFEED" is re-encoded to "1001010010101001000111100", with a total of 25 characters, saving about 17% of storage and transmission costs compared with the 30 characters previously encoded.
During decoding, the sender and receiver use the same Heman encoding rules. When the receiver receives "1001010010101001000111100", it compares the Heman tree in Figure C from 1001 to B, for example:
And then 01, then go from the root node to the letter A, for example:
Other letters can be decoded.