Huffman tree (optimal binary tree) and its Java implementation

Source: Internet
Author: User

First, definitions

Some definitions:

    • Path length between nodes: the number of branches on the path that leads through the tree from one node to another.
    • Path length of a tree: the sum of the path lengths from the root node to every node in the tree. Among binary trees with the same number of nodes, the complete binary tree has the shortest path length.
    • Weight of a node: in some applications, a real number with a particular meaning that is assigned to a node of the tree.
    • Weighted path length of a node: the product of the path length from the root to the node and the weight of the node.
    • Weighted path length of a tree (WPL): the sum of the weighted path lengths of all leaf nodes in the tree.

For a binary tree whose leaf nodes have the weights 5, 6, 2, 4 and 7, the weighted path length is computed by multiplying each leaf's weight by its path length from the root and summing the products.

    • Optimal binary tree: given a set of weighted nodes, the binary tree that uses them as leaves and has the smallest weighted path length. The shape of the optimal binary tree depends on the number of nodes and on their weights. What all optimal binary trees have in common is that every weighted node is a leaf, and that a leaf with a smaller weight is never closer to the root than a leaf with a larger weight.

For example, given four leaf nodes a, b, c and d with weights 7, 5, 2 and 4 respectively, three of the many binary trees that can be built from them have the following weighted path lengths:

(a) WPL = 7*2 + 5*2 + 2*2 + 4*2 = 36
(b) WPL = 7*3 + 5*3 + 2*1 + 4*2 = 46
(c) WPL = 7*1 + 5*2 + 2*3 + 4*3 = 35

Tree (c) has the smallest WPL; it can be verified that it is a Huffman tree.
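
The three sums can be checked mechanically. The following is a minimal sketch, assuming each leaf is described only by its weight and its depth (the number of branches between it and the root). The class name WplExample and the method wpl are hypothetical names chosen for illustration.

```java
class WplExample {

    /**
     * Returns the weighted path length: the sum of weight * depth over all leaves.
     * depths[i] is the number of branches from the root to leaf i.
     */
    static int wpl(int[] weights, int[] depths) {
        if (weights.length != depths.length) {
            throw new IllegalArgumentException("weights and depths must have the same length");
        }
        int total = 0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i] * depths[i];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] weights = {7, 5, 2, 4}; // leaves a, b, c, d
        System.out.println(wpl(weights, new int[] {2, 2, 2, 2})); // tree (a): 36
        System.out.println(wpl(weights, new int[] {3, 3, 1, 2})); // tree (b): 46
        System.out.println(wpl(weights, new int[] {1, 2, 3, 3})); // tree (c): 35
    }
}
```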

Note:
① When all leaves have the same weight, the complete binary tree is guaranteed to be an optimal binary tree; otherwise the complete binary tree is not necessarily optimal.
② In an optimal binary tree, the larger the weight of a leaf, the closer the leaf is to the root.
③ The shape of the optimal binary tree is not unique, but its WPL is always the minimum.

Second, constructing the Huffman tree

1) From the n given weights {w1, w2, w3, ..., wn}, build a forest F = {T1, T2, T3, ..., Tn} of n binary trees, where each tree Ti consists of a single root node with weight wi whose left and right subtrees are empty;

2) In the forest F, select the two trees whose root nodes have the smallest weights and make them the left and right subtrees of a new binary tree; the weight of the new root node is the sum of the root weights of its left and right subtrees;

3) Remove the two selected trees from F and add the new binary tree to F;

4) Repeat steps 2 and 3 until the forest contains only one binary tree; that tree is the Huffman tree.

The construction process is as follows:
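
As a small worked trace of steps 1 to 4, the sketch below tracks only the root weights in the forest and records the weight of each new parent node as it is created. It is an illustration under assumptions, not the article's implementation: it uses java.util.PriorityQueue to pick the two smallest weights, and the names MergeTrace and mergeSteps are hypothetical. For the weights 7, 5, 2 and 4 from the earlier example, the parents created are 6, 11 and 18. Their sum is 35, which equals the WPL of tree (c), because each leaf's weight is counted once for every ancestor it has.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class MergeTrace {

    /** Returns the weights of the parent nodes in the order in which they are created. */
    static List<Integer> mergeSteps(int... weights) {
        // Step 1: the forest initially holds one single-node tree per weight.
        PriorityQueue<Integer> forest = new PriorityQueue<>();
        for (int w : weights) {
            forest.add(w);
        }
        List<Integer> parents = new ArrayList<>();
        // Step 4: repeat until only one tree is left.
        while (forest.size() > 1) {
            // Steps 2 and 3: remove the two smallest roots and add their combined weight.
            int first = forest.poll();
            int second = forest.poll();
            int parent = first + second;
            parents.add(parent);
            forest.add(parent);
        }
        return parents;
    }

    public static void main(String[] args) {
        List<Integer> parents = mergeSteps(7, 5, 2, 4);
        System.out.println(parents); // [6, 11, 18]
        int sum = 0;
        for (int p : parents) {
            sum += p;
        }
        System.out.println(sum); // 35
    }
}
```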

Third, Java implementation

Create a Huffman tree from the specified nodes:

package com.liuhao.datastructures;

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class HuffmanTree {

    public static class Node<E> {
        E data;
        double weight;
        Node<E> leftChild;
        Node<E> rightChild;

        public Node(E data, double weight) {
            this.data = data;
            this.weight = weight;
        }

        @Override
        public String toString() {
            return "Node[data=" + data + ", weight=" + weight + "]";
        }
    }

    public static void main(String[] args) {
        List<Node<String>> nodes = new ArrayList<>();
        nodes.add(new Node<>("A", 40.0));
        nodes.add(new Node<>("B", 8.0));
        nodes.add(new Node<>("C", 10.0));
        nodes.add(new Node<>("D", 30.0));
        nodes.add(new Node<>("E", 10.0));
        nodes.add(new Node<>("F", 2.0));
        Node<String> root = HuffmanTree.createTree(nodes);
        System.out.println(breadthFirst(root));
    }

    /**
     * Constructs a Huffman tree.
     *
     * @param nodes the set of nodes
     * @return the root node of the constructed Huffman tree
     */
    private static <E> Node<E> createTree(List<Node<E>> nodes) {
        // Loop as long as more than one node remains in the list
        while (nodes.size() > 1) {
            // Sort by descending weight, so the two lightest nodes are at the end
            quickSort(nodes);
            // Take the two nodes with the lowest weights
            Node<E> left = nodes.get(nodes.size() - 1);
            Node<E> right = nodes.get(nodes.size() - 2);
            // Create a new node whose weight is the sum of the two children's weights
            Node<E> parent = new Node<>(null, left.weight + right.weight);
            // Make the new node the parent of the two lowest-weight nodes
            parent.leftChild = left;
            parent.rightChild = right;
            // Remove the two lowest-weight nodes
            nodes.remove(nodes.size() - 1);
            nodes.remove(nodes.size() - 1);
            // Add the new node to the list
            nodes.add(parent);
        }
        return nodes.get(0);
    }

    /**
     * Swaps the elements at indices i and j in the given list.
     */
    private static <E> void swap(List<Node<E>> nodes, int i, int j) {
        Node<E> tmp = nodes.get(i);
        nodes.set(i, nodes.get(j));
        nodes.set(j, tmp);
    }

    /**
     * Quicksort that orders the nodes between start and end by descending weight.
     */
    private static <E> void subSort(List<Node<E>> nodes, int start, int end) {
        if (start < end) {
            // Use the first element as the pivot
            Node<E> base = nodes.get(start);
            // i scans from the left for an element lighter than the pivot
            int i = start;
            // j scans from the right for an element heavier than the pivot
            int j = end + 1;
            while (true) {
                // Stop at an element lighter than the pivot, or when i reaches end
                while (i < end && nodes.get(++i).weight >= base.weight) {
                    // keep scanning
                }
                // Stop at an element heavier than the pivot, or when j reaches start
                while (j > start && nodes.get(--j).weight <= base.weight) {
                    // keep scanning
                }
                if (i < j) {
                    swap(nodes, i, j);
                } else {
                    break;
                }
            }
            swap(nodes, start, j);
            // Recursively sort the left sub-sequence
            subSort(nodes, start, j - 1);
            // Recursively sort the right sub-sequence
            subSort(nodes, j + 1, end);
        }
    }

    public static <E> void quickSort(List<Node<E>> nodes) {
        subSort(nodes, 0, nodes.size() - 1);
    }

    // Breadth-first traversal
    public static <E> List<Node<E>> breadthFirst(Node<E> root) {
        Queue<Node<E>> queue = new ArrayDeque<>();
        List<Node<E>> list = new ArrayList<>();
        if (root != null) {
            // Add the root element to the queue
            queue.offer(root);
        }
        while (!queue.isEmpty()) {
            // Add the element at the head of the queue to the list
            list.add(queue.peek());
            Node<E> p = queue.poll();
            // If the left child is not null, add it to the queue
            if (p.leftChild != null) {
                queue.offer(p.leftChild);
            }
            // If the right child is not null, add it to the queue
            if (p.rightChild != null) {
                queue.offer(p.rightChild);
            }
        }
        return list;
    }
}

The key steps in the above code include:

(1) Sort all the nodes in the list;

(2) Find the two nodes with the smallest weights in the list;

(3) Create a new node that has the two lowest-weight nodes as its children;

(4) Remove the two lowest-weight nodes from the list and add the new node to the list.

The program uses loops to perform the above steps until only one node is left in the list collection, and the last remaining node is the root node of the Huffman tree.

Fourth, Huffman coding

A Huffman tree can be used to solve the problem of message encoding. Suppose a string such as "abcdabcaba" has to be converted into a binary code that can be decoded unambiguously and that is as short as possible.

Suppose there are N distinct characters, and the i-th character occurs in the string with frequency Wi and has a code of length Li. The total length of the encoded binary string is then W1L1 + W2L2 + ... + WNLN, which is exactly the weighted path length that a Huffman tree minimizes. The construction principle of the Huffman tree can therefore be used for binary coding so that the encoded message is as short as possible.

The string "abcdabcaba" contains the four characters a, b, c and d, which occur 4, 3, 2 and 1 times respectively. Taking these occurrence counts as the weights of a, b, c and d, construct the Huffman tree.

Starting from the root of the Huffman tree, assign the code "0" to every left branch and "1" to every right branch until the leaf nodes are reached. Reading the codes along the path from the root to a leaf then gives the Huffman code of that leaf.

The resulting codes for a, b, c and d are 0, 10, 110 and 111. Converting the string "abcdabcaba" with them gives the binary code 0101101110101100100, which is only 19 bits long. No other encoding that assigns each character one fixed code, with no code being a prefix of another, is shorter; this encoding is known as Huffman coding.
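
The encoding procedure can be sketched in code as follows. This is a minimal sketch under assumptions rather than the article's implementation: it counts characters in the lowercase string "abcdabcaba", picks the two lightest trees with java.util.PriorityQueue, and uses the hypothetical names HuffmanCodeSketch, buildCodes and encode. Because ties between equal weights and the choice of left or right child can be resolved in more than one way, the exact bit patterns may differ from the ones listed above; the code lengths (1, 2, 3, 3) and the total of 19 bits do not.

```java
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

class HuffmanCodeSketch {

    /** Tree node: a leaf carries a character, an internal node only a combined weight. */
    static final class Node {
        final char symbol;
        final int weight;
        final Node left;
        final Node right;

        Node(char symbol, int weight, Node left, Node right) {
            this.symbol = symbol;
            this.weight = weight;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    /** Builds a table mapping each character of the text to its bit string. */
    static Map<Character, String> buildCodes(String text) {
        // Count how often each character occurs; the counts are the leaf weights.
        Map<Character, Integer> counts = new TreeMap<>();
        for (char ch : text.toCharArray()) {
            counts.merge(ch, 1, Integer::sum);
        }
        // The forest is ordered by root weight, lightest first.
        PriorityQueue<Node> forest =
                new PriorityQueue<>((x, y) -> Integer.compare(x.weight, y.weight));
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            forest.add(new Node(e.getKey(), e.getValue(), null, null));
        }
        // Repeatedly merge the two lightest trees under a new parent.
        while (forest.size() > 1) {
            Node first = forest.poll();
            Node second = forest.poll();
            forest.add(new Node('\0', first.weight + second.weight, first, second));
        }
        Map<Character, String> codes = new TreeMap<>();
        if (!forest.isEmpty()) {
            assign(forest.poll(), "", codes);
        }
        return codes;
    }

    /** Walks the tree, appending "0" for a left branch and "1" for a right branch. */
    private static void assign(Node node, String prefix, Map<Character, String> codes) {
        if (node.isLeaf()) {
            // A text with a single distinct character still needs a one-bit code.
            codes.put(node.symbol, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        assign(node.left, prefix + "0", codes);
        assign(node.right, prefix + "1", codes);
    }

    /** Concatenates the codes of all characters in the text. */
    static String encode(String text, Map<Character, String> codes) {
        StringBuilder bits = new StringBuilder();
        for (char ch : text.toCharArray()) {
            bits.append(codes.get(ch));
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        String text = "abcdabcaba";
        Map<Character, String> codes = buildCodes(text);
        System.out.println(codes);
        System.out.println(encode(text, codes).length()); // 19
    }
}
```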
