Huffman Tree and its application

Source: Internet
Author: User

1. Basic concepts of the Huffman tree

A Huffman tree, also known as an optimal binary tree, is the binary tree with the smallest weighted path length among all binary trees that have the same weighted leaf nodes.

A "path" is the sequence of branches leading from one node in the tree to another, and the number of branches on it is the path length.

Path length of a tree: the sum of the path lengths from the root to each node.

For a node that carries a weight, the weighted path length of the node is the product of the path length from the root to that node and the node's weight.

Weighted path length of a tree, WPL (weighted path length): the sum of the weighted path lengths of all leaf nodes in the tree.

Suppose a binary tree has n weighted leaf nodes with weights {w1, w2, ..., wn}, where the k-th leaf has weight wk and the path from the root to it has length lk. The sum, over all leaves, of the path length multiplied by the corresponding weight is called the weighted path length of the binary tree, usually written as:

WPL = w1 x l1 + w2 x l2 + ... + wn x ln

The following figure shows three different weighted binary trees built from the same 4 leaf nodes (weights 9, 4, 5 and 2):

The weighted path lengths of the three binary trees are:

(a) WPL = 9x2 + 4x2 + 5x2 + 2x2 = 18 + 8 + 10 + 4 = 40

(b) WPL = 9x1 + 5x2 + 4x3 + 2x3 = 9 + 10 + 12 + 6 = 37

(c) WPL = 4x1 + 2x2 + 5x3 + 9x3 = 4 + 4 + 15 + 27 = 50

Tree (b) has the smallest WPL of the three, and it is the Huffman tree. The figure shows that, among the binary trees built from n weighted leaf nodes, a full binary tree or a complete binary tree is not necessarily the optimal binary tree. In an optimal binary tree, the larger a leaf's weight, the closer that leaf is to the root.
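The three WPL values above can be checked mechanically. The following is a minimal sketch in C; the function name `weighted_path_length` and the use of two parallel arrays (leaf weights and leaf path lengths) are assumptions made for this illustration and do not come from the original article.

```c
#include <assert.h>
#include <stddef.h>

/* Weighted path length of a tree, given each leaf's weight and the
 * length of the path from the root to that leaf.
 * Hypothetical helper, for illustration only. */
int weighted_path_length(const int weight[], const int depth[], size_t n)
{
    assert(weight != NULL && depth != NULL);
    int wpl = 0;
    for (size_t k = 0; k < n; k++)
        wpl += weight[k] * depth[k];    /* w_k x l_k */
    return wpl;
}
```

With the weights in the order {9, 4, 5, 2}, the leaf path lengths are {2, 2, 2, 2} for tree (a), {1, 3, 2, 3} for tree (b) and {3, 1, 3, 2} for tree (c), and the function returns 40, 37 and 50 respectively.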

2. Constructing a Huffman tree

1) First arrange the weighted leaf nodes in ascending order of weight, that is, D2, B4, C5, A9.

2) Take the two nodes with the smallest weights as the two children of a new node N, with the smaller one as the left child: here D is the left child of N and B is the right child of N.

As shown in Figure 2-1 (a), the weight of the new node N is the sum of the weights of the two leaf nodes, 2+4=6.

3) Replace D and B with N, inserting N into the ordered sequence so that it stays in ascending order. The sequence becomes C5, N6, A9.

4) Repeat step 2): C and N become the two children of a new node M. As shown in Figure 2-1 (b), the weight of M is 5+6=11.

5) Replace C and N with M, inserting M into the ordered sequence, which becomes A9, M11.

6) Repeat step 2): A and M become the two children of a new node T. T is the root node, so the construction of the Huffman tree is complete.
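The steps above can be sketched in C by tracking only the weights in the ordered sequence. This is an illustrative sketch, not the article's implementation: the function name `merge_sorted_weights` and its parameters are hypothetical, and the tree links themselves are left out to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Sorted-sequence construction, tracking weights only.
 * seq[0..n-1] holds the leaf weights and is overwritten while merging;
 * merged[] must have room for n-1 entries and receives the weight of
 * each new node in creation order. Returns the number of new nodes. */
size_t merge_sorted_weights(int seq[], size_t n, int merged[])
{
    assert(n >= 1);

    /* Step 1: insertion sort into ascending order. */
    for (size_t i = 1; i < n; i++) {
        int w = seq[i];
        size_t p = i;
        while (p > 0 && seq[p - 1] > w) {
            seq[p] = seq[p - 1];
            p--;
        }
        seq[p] = w;
    }

    size_t count = 0;
    size_t len = n;                 /* current length of the sequence */
    while (len > 1) {
        /* Step 2: the first two entries are the two smallest weights. */
        int sum = seq[0] + seq[1];
        merged[count++] = sum;

        /* Step 3: remove the two entries ... */
        for (size_t i = 2; i < len; i++)
            seq[i - 2] = seq[i];
        len -= 2;

        /* ... and insert the new weight, keeping ascending order. */
        size_t p = len;
        while (p > 0 && seq[p - 1] > sum) {
            seq[p] = seq[p - 1];
            p--;
        }
        seq[p] = sum;
        len++;
    }
    return count;
}
```

For the weights 9, 4, 5, 2 the new nodes get the weights 6, 11 and 20, matching N, M and T in the steps above. Their sum, 37, equals the WPL of the finished tree, because each leaf weight is counted once for every ancestor above it.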

The binary tree built this way is the Huffman tree. Generalizing the steps just described gives the Huffman algorithm:

1) From the n given leaf weights {w1, w2, ..., wn}, build a set F = {T1, T2, ..., Tn} of n binary trees, where each binary tree Ti consists of only a root node with weight wi, and its left and right subtrees are empty.

2) Select the two trees in F whose root nodes have the smallest weights and make them the left and right subtrees of a new binary tree. The weight of the root of the new binary tree is the sum of the weights of the roots of its left and right subtrees.

3) Delete the two trees from F and add the newly built binary tree to F.

4) Repeat steps 2) and 3) until F contains only one tree. This tree is the Huffman tree.

#include <limits.h>

// Huffman tree node type
typedef struct HTreeNode
{
	char data[5];	// character data of the node, at most 5 characters
	int weight;	// weight of the node
	int parent;	// index of the parent node, -1 if none
	int left;	// index of the left child, -1 if none
	int right;	// index of the right child, -1 if none
} HTNode;

// Build the Huffman tree. ht[0..n-1] hold the n leaf nodes (data and weight already set);
// the finished tree has 2n-1 nodes, so ht must have room for 2n-1 entries.
void CreateHTree(HTNode ht[], int n)
{
	int lchild, rchild;
	int min1, min2;
	for (int i = 0; i < 2 * n - 1; i++)
	{
		ht[i].parent = ht[i].left = ht[i].right = -1;
	}
	for (int j = n; j < 2 * n - 1; j++)	// the first n nodes are the known leaves; build the nodes after them
	{
		min1 = min2 = INT_MAX;
		lchild = rchild = -1;
		for (int k = 0; k < j; k++)	// scan every existing node, including node j-1
		{
			if (ht[k].parent == -1)	// look only at nodes that are not yet part of a larger tree
			{
				if (ht[k].weight < min1)
				{
					// new smallest weight: the old smallest becomes the second smallest
					min2 = min1; rchild = lchild;
					min1 = ht[k].weight;
					lchild = k;
				}
				else if (ht[k].weight < min2)
				{
					min2 = ht[k].weight;
					rchild = k;
				}
			}
		}
		// node j becomes the parent of the two smallest roots, smaller one on the left
		ht[lchild].parent = j;
		ht[rchild].parent = j;
		ht[j].weight = ht[lchild].weight + ht[rchild].weight;
		ht[j].left = lchild;
		ht[j].right = rchild;
	}
}
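Once the table has been filled by the construction routine above, the path length of each leaf can be read off by following the parent indices up to the root. The sketch below illustrates this under stated assumptions: the type `Node` and the functions `path_length` and `table_wpl` are hypothetical names, and the node type is re-declared (without the character data) only so that the block compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Field layout follows the article's node type; re-declared here under
 * a hypothetical name so that this block is self-contained. */
typedef struct {
    int weight;
    int parent;     /* index of the parent, -1 for the root */
    int left;       /* index of the left child, -1 for a leaf */
    int right;      /* index of the right child, -1 for a leaf */
} Node;

/* Path length of node i: the number of parent links up to the root. */
int path_length(const Node ht[], int i)
{
    assert(ht != NULL && i >= 0);
    int len = 0;
    while (ht[i].parent != -1) {
        i = ht[i].parent;
        len++;
    }
    return len;
}

/* WPL of a finished table whose first n entries are the leaves. */
int table_wpl(const Node ht[], int n)
{
    int wpl = 0;
    for (int i = 0; i < n; i++)
        wpl += ht[i].weight * path_length(ht, i);
    return wpl;
}
```

Assuming the table for the worked example holds the leaves A9, B4, C5, D2 at indices 0 to 3 and the new nodes N, M, T at indices 4, 5, 6, the leaf path lengths come out as 1, 3, 2, 3 and `table_wpl` returns 37, the WPL of tree (b).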
3. Huffman coding

Huffman coding is an application of the Huffman tree. In digital communications, the text to be transmitted often has to be converted into a binary string made up of the characters 0 and 1, a process called encoding. When transmitting a message, we want the encoded message to be as short as possible, and among codes that give each character its own prefix-free code word, Huffman coding produces the shortest total message length.

In general, the characters in a message do not all occur with the same probability. Suppose the four characters a, b, c, d occur in a message with probabilities 4/10, 1/10, 3/10 and 2/10. With variable-length coding, in which low-frequency characters get longer codes and high-frequency characters get shorter ones, the total length of the transmitted message can be reduced.

Variable-length coding must avoid ambiguity when decoding. Suppose 0 stands for c and 01 stands for d. When the encoded string 01 is received and the decoder has read the 0, it cannot tell whether to output c immediately or to combine the 0 with the following 1 and output d, so the string can be read in two ways. Therefore, when a character set is encoded with codes of unequal length, the code of any one character must not be a prefix of the code of any other character. A code that meets this requirement is called a prefix code.
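The prefix requirement is easy to test for a given set of codes. The following is a minimal sketch in C; the function name `is_prefix_code` and the representation of codes as strings of '0' and '1' characters are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if no code in codes[0..n-1] is a prefix of another one.
 * Two identical codes also count as a violation.
 * Hypothetical helper, for illustration only. */
bool is_prefix_code(const char *codes[], size_t n)
{
    assert(codes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        size_t len_i = strlen(codes[i]);
        for (size_t j = 0; j < n; j++) {
            /* codes[i] is a prefix of codes[j] if the first len_i
             * characters of both strings agree. */
            if (i != j && strncmp(codes[i], codes[j], len_i) == 0)
                return false;
        }
    }
    return true;
}
```

The pair {0, 01} from the example above is rejected, because 0 is a prefix of 01, while a set such as {0, 10, 110, 111} is accepted.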

To make a variable-length code a prefix code, each character in the character set can be used as a leaf node of a coding binary tree. To obtain the shortest message length, the frequency of each character is assigned as the weight of the corresponding leaf node; the tree with the minimum weighted path length then gives the shortest encoding of the message.

Such a tree T can be constructed with the Huffman algorithm. Let the character set to be encoded be D = {a, b, c, d}, with the set of frequencies in the message P = {4/10, 1/10, 3/10, 2/10}.

We construct a Huffman tree using the characters of the character set as leaf nodes and their frequencies as weights.

Each leaf node corresponds to one character. The edges in T are labeled: each left branch is labeled "0" and each right branch is labeled "1". The code of a character is defined by the path from the root node to the leaf node that represents the character: the sequence of labels on the edges along that path is the character's Huffman code.

Code of a: 0, code of c: 10, code of d: 110, code of b: 111.
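To see the effect on message length, the code table above can be applied to a whole message. The sketch below is illustrative only: the function names `code_for` and `encode_message` are hypothetical, and the output is written as a string of '0' and '1' characters rather than as packed bits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Code table from the article: a -> 0, c -> 10, d -> 110, b -> 111. */
static const char *code_for(char ch)
{
    switch (ch) {
    case 'a': return "0";
    case 'b': return "111";
    case 'c': return "10";
    case 'd': return "110";
    default:  return NULL;  /* character outside the set {a, b, c, d} */
    }
}

/* Encodes msg into out as a NUL-terminated string of '0'/'1' characters.
 * Returns the number of bits written, or -1 if msg contains an unknown
 * character or out is too small. */
int encode_message(const char *msg, char out[], size_t cap)
{
    assert(msg != NULL && out != NULL && cap > 0);
    size_t len = 0;
    for (; *msg != '\0'; msg++) {
        const char *code = code_for(*msg);
        if (code == NULL)
            return -1;
        size_t n = strlen(code);
        if (len + n + 1 > cap)
            return -1;
        memcpy(out + len, code, n);
        len += n;
    }
    out[len] = '\0';
    return (int)len;
}
```

A 10-character message with the stated frequencies (4 a, 1 b, 3 c, 2 d), for example "aaaabcccdd", encodes to 19 bits, compared with 20 bits for a fixed 2-bit code per character.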

Clearly, such a binary tree can be constructed for any character set. Since no other leaf node appears on the path from the root node to any leaf node, the code obtained by this method must be a prefix code. By traversing the binary tree, we can find the code of each character.
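The traversal can be sketched as a depth-first walk from the root that appends "0" when going left and "1" when going right. This is a sketch under stated assumptions: the type `TNode`, the function `assign_codes` and the bound `MAX_CODE` are hypothetical names, only the child indices of each node are stored, and every internal node is assumed to have two children, as is the case in a Huffman tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CODE 16     /* assumed upper bound on code length, plus NUL */

typedef struct {
    int left;           /* index of the left child, -1 for a leaf */
    int right;          /* index of the right child, -1 for a leaf */
} TNode;

/* Depth-first traversal starting at node i. prefix[0..depth-1] holds the
 * edge labels on the path from the root to node i. When a leaf is
 * reached, the path is copied into codes[i]. */
void assign_codes(const TNode t[], int i, char prefix[], size_t depth,
                  char codes[][MAX_CODE])
{
    assert(depth < MAX_CODE);
    if (t[i].left == -1 && t[i].right == -1) {
        prefix[depth] = '\0';
        strcpy(codes[i], prefix);
        return;
    }
    prefix[depth] = '0';    /* left branch */
    assign_codes(t, t[i].left, prefix, depth + 1, codes);
    prefix[depth] = '1';    /* right branch */
    assign_codes(t, t[i].right, prefix, depth + 1, codes);
}
```

As an example, take a table filled by hand with a, b, c, d at indices 0 to 3, an internal node at index 4 with children d and b, one at index 5 with children c and node 4, and the root at index 6 with children a and node 5. Starting the traversal at the root with an empty prefix yields the codes 0, 111, 10 and 110 for a, b, c and d, the same codes as listed above.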
