Huffman Tree (Theory article)

Source: Internet
Author: User


I. Concepts related to the Huffman tree

Path: the branches leading from one node in the tree to another node make up the path between the two nodes.

Path length: the number of branches on a path is called the path length.

Path length of the tree: the sum of the path lengths from the root to each node.

Weighted path length of a node: if a node in the tree carries a weight, the product of the path length from the root to that node and the weight on the node is called the weighted path length of the node.

Weighted path length of the tree: if every leaf in the tree carries a weight, the sum of the weighted path lengths of all the leaves is called the weighted path length of the tree, abbreviated WPL.



What is a weight?


A weight is a value attached to a node or to a path. In general it can be understood as something like the distance between nodes. In Huffman coding it usually refers to the probability (or frequency) with which the character behind a binary code appears.

For the weights in a Huffman tree, a larger weight means a higher probability of occurrence.

The weight of a node is in effect the share of the whole tree's weight that the node accounts for.

Suppose the four leaf nodes A, B, C, D have weights 7, 5, 2, 4. These values come from the actual data, for example from counting that the letters A, B, C, D appear 7, 5, 2 and 4 times in a text. A weight of 7 for node A means that A accounts for 7 of the total weight of 18 in the system. The weights could also be expressed as percentages, but that is more trouble and gives the same result.

Given a binary tree with n weighted leaf nodes, the weighted path length of the binary tree is:

$$\mathrm{WPL} = \sum_{k=1}^{n} w_k \, l_k$$

In the formula, w_k is the weight of the k-th leaf node and l_k is the path length from the root to that leaf.

Example:
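As a minimal sketch of the WPL formula, the snippet below reuses the leaf weights 7, 5, 2, 4 for A, B, C, D from the text and evaluates two tree shapes. Both shapes, the dictionary names, and the helper `weighted_path_length` are assumptions made for illustration and do not come from the original article.

```python
def weighted_path_length(weights, depths):
    """Return WPL = sum of w_k * l_k over all leaves."""
    return sum(weights[leaf] * depths[leaf] for leaf in weights)

# Leaf weights taken from the text above.
weights = {"A": 7, "B": 5, "C": 2, "D": 4}

# Hypothetical shape 1: a complete binary tree, every leaf at depth 2.
balanced = {"A": 2, "B": 2, "C": 2, "D": 2}

# Hypothetical shape 2: heavier leaves sit closer to the root.
skewed = {"A": 1, "B": 2, "C": 3, "D": 3}

wpl_balanced = weighted_path_length(weights, balanced)  # 7*2 + 5*2 + 2*2 + 4*2
wpl_skewed = weighted_path_length(weights, skewed)      # 7*1 + 5*2 + 2*3 + 4*3
print(wpl_balanced, wpl_skewed)  # prints: 36 35
```

The same leaves give different WPL values depending on the shape of the tree, which is why the choice of shape matters.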

In general, n (n > 0) weighted leaves can be used to construct a binary tree in which every node other than the n leaves has degree 2. Many differently shaped binary trees satisfy these conditions. Among them, the binary tree with the minimum weighted path length is called the Huffman tree, or optimal binary tree.


II. Constructing the Huffman tree
According to the definition of the Huffman tree, to minimize the WPL of a binary tree, leaf nodes with larger weights must be placed closer to the root and leaf nodes with smaller weights farther from it. Based on this property, Huffman proposed a method for constructing the optimal binary tree. Its basic idea is to treat each of the n weighted leaves as a one-node tree, then repeatedly merge the two trees whose roots have the smallest weights into a new tree whose root weight is their sum, until a single tree remains.

[Figure: the process of constructing a Huffman tree using the Huffman algorithm]

Unequal-length encoding

When transmitting a message, to keep the number of bits as small as possible, the codes of the characters can be given unequal lengths: characters that are used more frequently are assigned shorter codes, and characters that are used less frequently are assigned longer codes. For example, the four characters a, b, c, d could be assigned the codes 0, 00, 1, 01, so that a message can be sent as the binary sequence 000011010, only 9 bits long. This creates a problem, however: the receiver cannot decode the message, because it cannot tell whether the leading four 0s stand for four a's, one b and two a's, or two b's. The decoding is not unique, so this code is unusable.

Therefore, when designing an unequal-length code to reduce the total length of a message, the uniqueness of decoding must also be considered: the code of any character must not be a prefix of the code of another character. An unequal-length code with this property is called a prefix code.

Such a code can be obtained from a Huffman tree as follows:
(1) construct a Huffman tree using the frequency of each character in the character set as its weight;
(2) starting from the root, label the left branch of every node 0 and the right branch 1; the code of each leaf is the sequence of labels read along the path from the root to that leaf.
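The ambiguity of the code 0, 00, 1, 01 can be checked mechanically. The sketch below is illustrative only: the function names `is_prefix_free` and `count_decodings`, and the alternative code in `prefix_code`, are assumptions that do not appear in the original article. It tests the prefix property and counts how many ways the bit string 000011010 can be split into codewords.

```python
def is_prefix_free(codes):
    """True if no codeword is a prefix of a different codeword."""
    words = list(codes.values())
    return not any(
        i != j and words[j].startswith(words[i])
        for i in range(len(words))
        for j in range(len(words))
    )

def count_decodings(bits, codes):
    """Count the ways `bits` can be split into codewords (dynamic programming)."""
    ways = [0] * (len(bits) + 1)
    ways[0] = 1  # the empty string has exactly one decoding
    for end in range(1, len(bits) + 1):
        for word in codes.values():
            if end >= len(word) and bits[end - len(word):end] == word:
                ways[end] += ways[end - len(word)]
    return ways[len(bits)]

ambiguous = {"a": "0", "b": "00", "c": "1", "d": "01"}        # code from the text
prefix_code = {"a": "0", "b": "10", "c": "110", "d": "111"}   # hypothetical alternative

print(is_prefix_free(ambiguous), count_decodings("000011010", ambiguous))
print(is_prefix_free(prefix_code), count_decodings("000011010", prefix_code))
```

Under the code from the text the bit string has 16 possible decodings, while under the hypothetical prefix code it has exactly one.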
Example: Suppose a text file Tfile contains only the 7 characters {a, b, c, d, e, f, g}, which occur in the text {5, 24, 7, 17, 34, 5, 13} times respectively. A Huffman tree is used to construct a code for Tfile that meets the prefix-code requirement:
1. Take the 7 characters of Tfile as leaf nodes, with the number of occurrences of each character as the weight of its leaf, and build the Huffman tree.
2. Let every left branch of the Huffman tree represent the bit 0 and every right branch the bit 1. The sequence of bits on the branches from the root to each leaf is the code of that leaf's character.
3. Because the path from the root to any leaf cannot pass through another leaf, this code must be a prefix code. The weighted path length of the Huffman tree is exactly the total length of the encoded file Tfile. A code constructed from a Huffman tree in this way is called a Huffman code.
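The Tfile example can be reproduced with a short sketch. It assumes a priority queue from Python's standard `heapq` module and an insertion-order tie-break for equal weights. The function name `huffman_codes` and the tie-breaking rule are choices made here, not part of the original article, so the individual codes may differ from those in another construction while the total length stays the same.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman tree from {symbol: weight} and return {symbol: bit string}."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    # Each heap entry is (weight, tiebreak, tree); a tree is a symbol or a (left, right) pair.
    heap = [(w, next(tiebreak), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # smallest weight
        w2, _, right = heapq.heappop(heap)   # second smallest weight
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    _, _, root = heap[0]

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")   # left branch is labelled 0
            walk(tree[1], prefix + "1")   # right branch is labelled 1
        else:
            codes[tree] = prefix or "0"   # a lone symbol still needs one bit

    walk(root, "")
    return codes

# Character counts of Tfile, taken from the example above.
freqs = {"a": 5, "b": 24, "c": 7, "d": 17, "e": 34, "f": 5, "g": 13}
codes = huffman_codes(freqs)

# Total encoded length = sum of count * code length = WPL of the tree.
total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
print(codes)
print(total_bits)  # prints: 267
```

For comparison, a fixed-length code for 7 characters needs 3 bits per character, or 315 bits for the 105 characters of Tfile, against 267 bits here.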
