Java implementation of the Huffman coding algorithm

Source: Internet
Author: User

Introduction to Huffman coding

Huffman coding pairs characters with binary codes. It consists of encoding and decoding, and its purpose is to shorten the binary representation of characters. Characters are always stored and transmitted as binary (a computer only understands 0 and 1), so there must be a mapping between characters and binary. Characters belong to a character set (charset); to be stored or transmitted they must be encoded into binary, and to be displayed they must be decoded back into characters. One character set can have several encodings (Unicode, for example, can be encoded as UTF-8, UTF-16, and so on). Once you understand character sets, encoding, and decoding, the familiar problem of garbled text is close to solved. For example, in ASCII the character a is 97 in decimal and 01100001 in binary. Every ASCII character is stored in 8 bits (1 byte), so transmitting 1,000 characters takes 8,000 bits. The problem is that in English the letter E appears with a frequency of 12.702% and Z with only 0.074%: E is more than 170 times as frequent as Z, yet both use the same number of bits. We can do better with variable-length coding, whose guiding principle is to give frequent characters shorter codes and rare characters longer ones. The Huffman coding algorithm solves exactly this problem.

Java implementation of Huffman coding
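As a baseline for what Huffman coding improves on, the sketch below reproduces the fixed-length arithmetic from the introduction: a is 97, or 01100001 when padded to 8 bits, and 1,000 ASCII characters cost 8,000 bits. The class and method names (FixedLengthBaseline, toBinary8, fixedLengthBits) are hypothetical, and the input is assumed to be plain ASCII:

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: the fixed-length baseline (8 bits per ASCII character). */
final class FixedLengthBaseline {

    /** Returns the 8-bit binary form of a character, zero-padded on the left (assumes c < 256). */
    static String toBinary8(char c) {
        String bits = Integer.toBinaryString(c);
        return "0".repeat(8 - bits.length()) + bits;
    }

    /** Total bits needed to send the text with one byte per character (ASCII input assumed). */
    static int fixedLengthBits(String text) {
        return text.getBytes(StandardCharsets.US_ASCII).length * 8;
    }

    public static void main(String[] args) {
        System.out.println((int) 'a');                          // 97
        System.out.println(toBinary8('a'));                     // 01100001
        System.out.println(fixedLengthBits("a".repeat(1000)));  // 8000
    }
}
```

Every character costs the same 8 bits here no matter how often it occurs; that uniform cost is what a Huffman code replaces with frequency-dependent lengths.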

The main data structures in the Huffman coding algorithm are full binary trees (every internal node has exactly two children) and priority queues. The priority queue is java.util.PriorityQueue; the tree is implemented with the following static nested classes:

    static class Tree {
        private Node root;

        public Node getRoot() {
            return root;
        }

        public void setRoot(Node root) {
            this.root = root;
        }
    }

    static class Node implements Comparable<Node> {
        private String chars = "";
        private int frequence = 0;
        private Node parent;
        private Node leftNode;
        private Node rightNode;

        // Orders nodes by ascending frequency, so a PriorityQueue polls the rarest node first.
        @Override
        public int compareTo(Node n) {
            return Integer.compare(frequence, n.frequence);
        }

        // A leaf holds exactly one character; an internal node holds its leaves' characters concatenated.
        public boolean isLeaf() {
            return chars.length() == 1;
        }

        public boolean isRoot() {
            return parent == null;
        }

        public boolean isLeftChild() {
            return parent != null && this == parent.leftNode;
        }

        public int getFrequence() {
            return frequence;
        }

        public void setFrequence(int frequence) {
            this.frequence = frequence;
        }

        public String getChars() {
            return chars;
        }

        public void setChars(String chars) {
            this.chars = chars;
        }

        public Node getParent() {
            return parent;
        }

        public void setParent(Node parent) {
            this.parent = parent;
        }

        public Node getLeftNode() {
            return leftNode;
        }

        public void setLeftNode(Node leftNode) {
            this.leftNode = leftNode;
        }

        public Node getRightNode() {
            return rightNode;
        }

        public void setRightNode(Node rightNode) {
            this.rightNode = rightNode;
        }
    }

Statistical Data
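Frequency statistics matter because Node.compareTo orders nodes by frequence, so a PriorityQueue<Node> always releases the least frequent node first. The sketch below checks this with a trimmed-down, hypothetical Node (only chars and frequence) and made-up frequencies; NodeOrderingDemo and pollOrder are illustrative names, not part of the article's code:

```java
import java.util.PriorityQueue;

/** Hypothetical sketch: PriorityQueue<Node> polls nodes in ascending frequency order. */
final class NodeOrderingDemo {

    /** Minimal stand-in for the article's Node: just the characters and their frequency. */
    static final class Node implements Comparable<Node> {
        final String chars;
        final int frequence;

        Node(String chars, int frequence) {
            this.chars = chars;
            this.frequence = frequence;
        }

        @Override
        public int compareTo(Node n) {
            return Integer.compare(frequence, n.frequence); // smaller frequency sorts first
        }
    }

    /** Adds every node to a queue, then polls them all and concatenates their chars in poll order. */
    static String pollOrder(Node... nodes) {
        PriorityQueue<Node> queue = new PriorityQueue<>();
        for (Node n : nodes) {
            queue.add(n);
        }
        StringBuilder order = new StringBuilder();
        while (!queue.isEmpty()) {
            order.append(queue.poll().chars);
        }
        return order.toString();
    }

    public static void main(String[] args) {
        // Arbitrary example frequencies: e=12, z=1, t=9.
        System.out.println(pollOrder(new Node("e", 12), new Node("z", 1), new Node("t", 9))); // zte
    }
}
```

Insertion order does not matter; only the frequencies decide what comes out first. When two frequencies are equal, PriorityQueue makes no promise about which one is polled first.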

Since the code table is arranged by frequency, you first need frequency statistics. I wrote a method for this. If you already have statistics, simply convert them to a Map<Character, Integer>. If they are given as percentages, multiply by 100, 1000, or 10000 to turn them into integers; for example, 12.702% times 1000 gives 12702. Huffman coding only cares about the relative sizes of the weights. The statistics method is implemented as follows:

    import java.util.HashMap;
    import java.util.Map;

    public static Map<Character, Integer> statistics(char[] charArray) {
        Map<Character, Integer> map = new HashMap<Character, Integer>();
        for (char c : charArray) {
            Character character = c; // autoboxing; the Character(char) constructor is deprecated
            if (map.containsKey(character)) {
                map.put(character, map.get(character) + 1);
            } else {
                map.put(character, 1);
            }
        }
        return map;
    }

Build Tree
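The tree builder only needs a Map<Character, Integer> of weights. The sketch below shows both routes described in the statistics paragraph: counting characters (with Map.merge in place of the containsKey/put pair) and scaling published percentages to integers. FrequencyStats, count, and percentToWeight are hypothetical names:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: two ways to obtain integer weights for Huffman coding. */
final class FrequencyStats {

    /** Counts occurrences of each character; same result as statistics(char[]), via Map.merge. */
    static Map<Character, Integer> count(String text) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : text.toCharArray()) {
            counts.merge(c, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    /** Scales a percentage to an integer weight; only the relative order of weights matters. */
    static int percentToWeight(double percent, int scale) {
        return (int) Math.round(percent * scale);
    }

    public static void main(String[] args) {
        Map<Character, Integer> counts = count("abracadabra");
        System.out.println(counts.get('a') + " " + counts.get('b') + " " + counts.get('c')); // 5 2 1
        System.out.println(percentToWeight(12.702, 1000)); // 12702
        System.out.println(percentToWeight(0.074, 1000));  // 74
    }
}
```

Math.round is used rather than a plain cast so that floating-point results such as 12701.999... still map to 12702.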

Building the tree is the core step of the Huffman coding algorithm. The idea is to hang every character on a leaf of a single full binary tree such that, for every internal (non-leaf) node, the left child's frequency is no greater than the right child's. The algorithm wraps the statistics in nodes and stores them in a priority queue. Each iteration polls the two nodes with the lowest frequencies and creates a new parent (internal) node whose character content is the concatenation of the two polled nodes' characters and whose frequency is the sum of theirs. The node polled first becomes the left child, the second becomes the right child, and the new parent is put back into the queue. Repeat this N - 1 times, where N is the number of distinct characters (each iteration removes two nodes and adds one, so the queue shrinks by one). Afterwards exactly one node remains in the queue; it is polled and becomes the root of the tree. The code is as follows:

    import java.util.List;
    import java.util.Map;
    import java.util.PriorityQueue;

    private static Tree buildTree(Map<Character, Integer> statistics, List<Node> leafs) {
        Character[] keys = statistics.keySet().toArray(new Character[0]);

        PriorityQueue<Node> priorityQueue = new PriorityQueue<Node>();
        for (Character character : keys) {
            Node node = new Node();
            node.chars = character.toString();
            node.frequence = statistics.get(character);
            priorityQueue.add(node);
            leafs.add(node); // collect the leaves so encoding can walk upward from them
        }

        int size = priorityQueue.size();
        for (int i = 1; i <= size - 1; i++) {
            Node node1 = priorityQueue.poll(); // lowest frequency: left child
            Node node2 = priorityQueue.poll(); // second lowest: right child

            Node sumNode = new Node();
            sumNode.chars = node1.chars + node2.chars;
            sumNode.frequence = node1.frequence + node2.frequence;

            sumNode.leftNode = node1;
            sumNode.rightNode = node2;

            node1.parent = sumNode;
            node2.parent = sumNode;

            priorityQueue.add(sumNode);
        }

        Tree tree = new Tree();
        tree.root = priorityQueue.poll();
        return tree;
    }

Coding
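A character's codeword length is the depth of its leaf in the tree built above, so the total encoded size can be checked before any bit strings are produced. The sketch below rebuilds a tree with the same rule (poll the two smallest, merge them under a new parent, push it back) and sums frequency times depth. CodeLengths and its methods are hypothetical names, and the input map is assumed to be non-empty:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch: build a Huffman tree and measure codeword lengths as leaf depths. */
final class CodeLengths {

    /** Minimal node: a leaf stores one symbol, an internal node stores two children. */
    static final class Node implements Comparable<Node> {
        final char symbol;
        final int freq;
        final Node left;
        final Node right;

        Node(char symbol, int freq, Node left, Node right) {
            this.symbol = symbol;
            this.freq = freq;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null;
        }

        @Override
        public int compareTo(Node o) {
            return Integer.compare(freq, o.freq);
        }
    }

    /** Same merging rule as buildTree: poll the two smallest, join them under a new parent, repeat. */
    static Node build(Map<Character, Integer> freqs) {
        PriorityQueue<Node> queue = new PriorityQueue<>();
        freqs.forEach((c, f) -> queue.add(new Node(c, f, null, null)));
        while (queue.size() > 1) {
            Node first = queue.poll();  // smaller frequency, becomes the left child
            Node second = queue.poll(); // becomes the right child
            queue.add(new Node('\0', first.freq + second.freq, first, second));
        }
        return queue.poll();
    }

    /** Records each leaf's depth, which equals the length of its codeword. */
    static void depths(Node node, int depth, Map<Character, Integer> out) {
        if (node.isLeaf()) {
            out.put(node.symbol, depth);
            return;
        }
        depths(node.left, depth + 1, out);
        depths(node.right, depth + 1, out);
    }

    /** Encoded size in bits: sum over characters of frequency times codeword length. */
    static int encodedBits(Map<Character, Integer> freqs) {
        Map<Character, Integer> lengths = new HashMap<>();
        depths(build(freqs), 0, lengths);
        int total = 0;
        for (Map.Entry<Character, Integer> e : freqs.entrySet()) {
            total += e.getValue() * lengths.get(e.getKey());
        }
        return total;
    }

    public static void main(String[] args) {
        // Character counts of "abracadabra".
        Map<Character, Integer> freqs = Map.of('a', 5, 'b', 2, 'r', 2, 'c', 1, 'd', 1);
        System.out.println(encodedBits(freqs)); // 23
    }
}
```

With the abracadabra counts the total is 23 bits, compared with 11 times 8 = 88 bits for fixed 8-bit codes. A single distinct character sits at depth 0 in this sketch, so a real encoder has to give it a code of at least one bit.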

To find the code for a character, start at the character's leaf and walk up toward the root: if the current node is the left child of its parent, prepend 0 to the code; if it is the right child, prepend 1; stop on reaching the root. Once the mapping from characters to binary codes is known, encoding is straightforward. The code is as follows:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public static String encode(String originalStr, Map<Character, Integer> statistics) {
        if (originalStr == null || originalStr.equals("")) {
            return "";
        }

        char[] charArray = originalStr.toCharArray();
        List<Node> leafNodes = new ArrayList<Node>();
        buildTree(statistics, leafNodes); // return value unused: each leaf already links to its parent
        Map<Character, String> encodInfo = buildEncodingInfo(leafNodes);

        StringBuilder buffer = new StringBuilder();
        for (char c : charArray) {
            buffer.append(encodInfo.get(c));
        }

        return buffer.toString();
    }

    private static Map<Character, String> buildEncodingInfo(List<Node> leafNodes) {
        Map<Character, String> codewords = new HashMap<Character, String>();
        for (Node leafNode : leafNodes) {
            Character character = leafNode.getChars().charAt(0);
            String codeword = "";
            Node currentNode = leafNode;

            do {
                if (currentNode.isLeftChild()) {
                    codeword = "0" + codeword;
                } else {
                    codeword = "1" + codeword;
                }

                currentNode = currentNode.parent;
            // ... (the rest of this listing is cut off in the source)
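The listing above stops partway through buildEncodingInfo, and the article does not show decoding. For completeness, here is a self-contained sketch of the whole round trip that follows the same conventions (0 for a left child, 1 for a right child, codes read from leaf up to root). It is not the author's missing code: the class HuffmanRoundTrip, its reduced Node, and the methods codewords and decode are hypothetical, and giving a lone distinct character the code "0" is an assumption the original does not address:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch: build tree, derive codewords, encode, and decode. */
final class HuffmanRoundTrip {

    /** Reduced node with parent links, modeled on the article's Node. */
    static final class Node implements Comparable<Node> {
        final char symbol;
        final int freq;
        Node left, right, parent;

        Node(char symbol, int freq) {
            this.symbol = symbol;
            this.freq = freq;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }

        @Override
        public int compareTo(Node o) {
            return Integer.compare(freq, o.freq);
        }
    }

    /** Builds the tree and collects the leaves, like buildTree(statistics, leafs). */
    static Node buildTree(Map<Character, Integer> stats, List<Node> leaves) {
        PriorityQueue<Node> queue = new PriorityQueue<>();
        for (Map.Entry<Character, Integer> e : stats.entrySet()) {
            Node leaf = new Node(e.getKey(), e.getValue());
            queue.add(leaf);
            leaves.add(leaf);
        }
        while (queue.size() > 1) {
            Node left = queue.poll();  // lower frequency goes left
            Node right = queue.poll();
            Node sum = new Node('\0', left.freq + right.freq);
            sum.left = left;
            sum.right = right;
            left.parent = sum;
            right.parent = sum;
            queue.add(sum);
        }
        return queue.poll();
    }

    /** Walks each leaf up to the root, collecting 0 for a left child and 1 for a right child. */
    static Map<Character, String> codewords(List<Node> leaves) {
        Map<Character, String> codes = new HashMap<>();
        for (Node leaf : leaves) {
            StringBuilder code = new StringBuilder();
            for (Node n = leaf; n.parent != null; n = n.parent) {
                code.append(n == n.parent.left ? '0' : '1');
            }
            // Bits were collected leaf-to-root, so reverse them; a lone leaf gets "0" (assumption).
            codes.put(leaf.symbol, code.length() == 0 ? "0" : code.reverse().toString());
        }
        return codes;
    }

    static String encode(String text, Map<Character, String> codes) {
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) {
            bits.append(codes.get(c));
        }
        return bits.toString();
    }

    /** Walks from the root: 0 goes left, 1 goes right; reaching a leaf emits its symbol. */
    static String decode(String bits, Node root) {
        if (root.isLeaf()) {
            return String.valueOf(root.symbol).repeat(bits.length()); // one "0" per symbol
        }
        StringBuilder out = new StringBuilder();
        Node n = root;
        for (char b : bits.toCharArray()) {
            n = (b == '0') ? n.left : n.right;
            if (n.isLeaf()) {
                out.append(n.symbol);
                n = root;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "abracadabra";
        Map<Character, Integer> stats = new HashMap<>();
        for (char c : text.toCharArray()) {
            stats.merge(c, 1, Integer::sum);
        }
        List<Node> leaves = new ArrayList<>();
        Node root = buildTree(stats, leaves);
        String bits = encode(text, codewords(leaves));
        System.out.println(bits.length());                   // 23
        System.out.println(decode(bits, root).equals(text)); // true
    }
}
```

Because Huffman codes are prefix-free, decode needs no separators between codewords: it returns to the root each time it reaches a leaf. For "abracadabra" the encoded string is 23 bits long; the individual codewords depend on how the priority queue breaks frequency ties, but the total length does not.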
