A detailed Java implementation of the Huffman coding algorithm


Introduction to Huffman coding

Huffman coding maps characters to binary codewords. It consists of an encoding step and a decoding step, and its purpose is to reduce the number of bits needed to represent the characters. Characters are stored and transmitted in binary (a computer only understands 0 and 1), so there has to be a mapping between characters and binary values. Characters belong to a character set (charset); they are encoded into binary for storage and transmission and decoded back into characters for display. One character set can have several encodings (Unicode, for example, can be encoded as UTF-8, UTF-16, and so on). Once character sets, encoding, and decoding are understood, garbled-text problems become easy to resolve.

For example, in ASCII the letter a has the decimal value 97, which is 01100001 in binary. Each ASCII character is stored in 8 bits (1 byte), so transmitting 1,000 characters takes 8,000 bits. The problem is that in English text the letter E occurs with a frequency of 12.702% and Z with only 0.074%: E is more than 100 times as frequent as Z, yet both use the same number of bits. We can do better with variable-length coding, whose guiding principle is to give frequent characters shorter codewords and rare characters longer ones. The Huffman coding algorithm addresses exactly this problem.
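To make the fixed-length versus variable-length comparison concrete, here is a minimal sketch. The class name VariableLengthDemo and the three-symbol toy code are assumptions made up for illustration; they are not the code that the Huffman algorithm would produce for real English text.

```java
import java.util.HashMap;
import java.util.Map;

class VariableLengthDemo {
    /** Renders one byte as eight '0'/'1' characters, most significant bit first. */
    static String toBits(byte b) {
        StringBuilder sb = new StringBuilder();
        for (int i = 7; i >= 0; i--) {
            sb.append((b >> i) & 1);
        }
        return sb.toString();
    }

    /** Hypothetical toy code: the frequent 'e' gets the shortest codeword. */
    static Map<Character, String> toyCode() {
        Map<Character, String> code = new HashMap<>();
        code.put('e', "0");
        code.put('t', "10");
        code.put('z', "11");
        return code;
    }

    /** Sums the codeword lengths of every character in the text. */
    static int variableLengthBits(String text, Map<Character, String> code) {
        int bits = 0;
        for (char c : text.toCharArray()) {
            bits += code.get(c).length();
        }
        return bits;
    }

    public static void main(String[] args) {
        System.out.println(toBits((byte) 'a'));                   // 01100001 (decimal 97)
        String text = "eeetez";
        System.out.println(text.length() * 8);                    // 48 bits at 8 bits per character
        System.out.println(variableLengthBits(text, toyCode()));  // 8 bits with the toy code
    }
}
```

The saving comes entirely from the skewed frequencies: the more often the short codeword is used, the larger the gain.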

Java implementation of Huffman coding

The main data structures used by the Huffman coding algorithm are a full binary tree (every non-leaf node has exactly two children) and a priority queue. The latter is java.util.PriorityQueue; the former is implemented by hand as static nested classes. The code is as follows:

// All listings in this article are static members of one outer class, which needs:
// import java.nio.charset.Charset;
// import java.util.*;

static class Tree {
    private Node root;

    public Node getRoot() { return root; }
    public void setRoot(Node root) { this.root = root; }
}

static class Node implements Comparable<Node> {
    private String chars = "";   // all characters found in this node's subtree
    private int frequence = 0;   // total frequency of those characters
    private Node parent;
    private Node leftNode;
    private Node rightNode;

    // Orders nodes by frequency so that the priority queue yields the smallest first.
    @Override
    public int compareTo(Node n) { return frequence - n.frequence; }

    public boolean isLeaf() { return chars.length() == 1; }
    public boolean isRoot() { return parent == null; }
    public boolean isLeftChild() { return parent != null && this == parent.leftNode; }

    public int getFrequence() { return frequence; }
    public void setFrequence(int frequence) { this.frequence = frequence; }
    public String getChars() { return chars; }
    public void setChars(String chars) { this.chars = chars; }
    public Node getParent() { return parent; }
    public void setParent(Node parent) { this.parent = parent; }
    public Node getLeftNode() { return leftNode; }
    public void setLeftNode(Node leftNode) { this.leftNode = leftNode; }
    public Node getRightNode() { return rightNode; }
    public void setRightNode(Node rightNode) { this.rightNode = rightNode; }
}
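The reason Node implements Comparable is that java.util.PriorityQueue, when created without a comparator, uses compareTo to decide which element poll() returns next. The following sketch shows that behaviour in isolation; the Item class is a stripped-down, hypothetical stand-in for Node, and the frequencies are just sample values.

```java
import java.util.PriorityQueue;

class QueueOrderDemo {
    /** Hypothetical stand-in for the article's Node: a label and a frequency. */
    static class Item implements Comparable<Item> {
        final String chars;
        final int frequence;

        Item(String chars, int frequence) {
            this.chars = chars;
            this.frequence = frequence;
        }

        @Override
        public int compareTo(Item other) {
            return Integer.compare(frequence, other.frequence);
        }
    }

    /** Builds a small queue; insertion order is deliberately not sorted. */
    static PriorityQueue<Item> sample() {
        PriorityQueue<Item> queue = new PriorityQueue<>();
        queue.add(new Item("e", 12702));
        queue.add(new Item("z", 74));
        queue.add(new Item("t", 9056));
        return queue;
    }

    /** Polls every item and returns the labels in the order they came out. */
    static String pollOrder(PriorityQueue<Item> queue) {
        StringBuilder order = new StringBuilder();
        while (!queue.isEmpty()) {
            order.append(queue.poll().chars);
        }
        return order.toString();
    }

    public static void main(String[] args) {
        System.out.println(pollOrder(sample())); // zte: lowest frequency first
    }
}
```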

Frequency statistics

Since the code table is built from character frequencies, we first need frequency statistics. The method below counts them from the input. If you already have statistics, convert them to a Map<Character, Integer>. If your figures are percentages, multiply them by 100, 1000, or 10000 so that they become integers; for example, 12.702% multiplied by 1000 gives 12702. Huffman coding only cares about the relative sizes of the frequencies. The statistics method is implemented as follows:

public static Map<Character, Integer> statistics(char[] charArray) {
    Map<Character, Integer> map = new HashMap<Character, Integer>();
    for (char c : charArray) {
        Character character = c;
        if (map.containsKey(character)) {
            map.put(character, map.get(character) + 1);
        } else {
            map.put(character, 1);
        }
    }
    return map;
}
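If the frequencies are already available as percentages, the conversion to integer weights described above could look like the following sketch. The class and method names (FrequencyScaling, toIntegerWeights, sample) are assumptions introduced here, and rounding with Math.round is one possible choice rather than something the article prescribes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FrequencyScaling {
    /** Multiplies each percentage by the scale factor and rounds to the nearest integer. */
    static Map<Character, Integer> toIntegerWeights(Map<Character, Double> percentages, int scale) {
        Map<Character, Integer> weights = new LinkedHashMap<>();
        for (Map.Entry<Character, Double> entry : percentages.entrySet()) {
            weights.put(entry.getKey(), (int) Math.round(entry.getValue() * scale));
        }
        return weights;
    }

    /** Sample data: the two letter frequencies quoted in the introduction. */
    static Map<Character, Double> sample() {
        Map<Character, Double> percentages = new LinkedHashMap<>();
        percentages.put('e', 12.702);
        percentages.put('z', 0.074);
        return percentages;
    }

    public static void main(String[] args) {
        System.out.println(toIntegerWeights(sample(), 1000)); // {e=12702, z=74}
    }
}
```

The resulting map has the same type as the one returned by statistics, so it can be passed to the tree-building step unchanged.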

Building the tree

Building the tree is the core step of the Huffman coding algorithm. The idea is to attach every character to a leaf of a full binary tree such that, for every non-leaf node, the frequency of the left child is no greater than that of the right child. The algorithm wraps each statistic in a Node and stores the nodes in a priority queue. On each step it polls the two nodes with the smallest frequencies from the queue and creates a new parent (non-leaf) node whose character content is the concatenation of the two polled nodes' characters and whose frequency is the sum of theirs. The first node polled becomes the left child, the second becomes the right child, and the new parent is put back into the queue. This step is repeated N-1 times, where N is the number of distinct characters (each repetition shrinks the queue by one). After that, a single node remains in the queue; it is polled and becomes the root of the tree. The code is as follows:

private static Tree buildTree(Map<Character, Integer> statistics, List<Node> leafs) {
    Character[] keys = statistics.keySet().toArray(new Character[0]);

    // One leaf node per distinct character, all placed in the priority queue.
    PriorityQueue<Node> priorityQueue = new PriorityQueue<Node>();
    for (Character character : keys) {
        Node node = new Node();
        node.chars = character.toString();
        node.frequence = statistics.get(character);
        priorityQueue.add(node);
        leafs.add(node);
    }

    // Merge the two least frequent nodes, N-1 times in total.
    int size = priorityQueue.size();
    for (int i = 1; i <= size - 1; i++) {
        Node node1 = priorityQueue.poll();
        Node node2 = priorityQueue.poll();

        Node sumNode = new Node();
        sumNode.chars = node1.chars + node2.chars;
        sumNode.frequence = node1.frequence + node2.frequence;
        sumNode.leftNode = node1;
        sumNode.rightNode = node2;
        node1.parent = sumNode;
        node2.parent = sumNode;

        priorityQueue.add(sumNode);
    }

    // The single remaining node is the root.
    Tree tree = new Tree();
    tree.root = priorityQueue.poll();
    return tree;
}
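The merge loop can also be traced on bare numbers. The sketch below keeps only the frequencies and repeats the "poll the two smallest, push their sum" step. Every merge pushes all characters beneath the new parent one level deeper, that is, adds one bit to each of their codewords, so the sum of all merged weights equals the total number of bits in the encoded message. The class name MergeTrace and the sample frequencies are assumptions chosen for illustration.

```java
import java.util.PriorityQueue;

class MergeTrace {
    /** Returns the sum of the weights of all parent nodes created while merging. */
    static long totalEncodedBits(int[] frequencies) {
        PriorityQueue<Long> queue = new PriorityQueue<>();
        for (int f : frequencies) {
            queue.add((long) f);
        }
        long total = 0;
        while (queue.size() > 1) {
            long merged = queue.poll() + queue.poll(); // the two smallest weights
            total += merged;                           // weight of the new parent node
            queue.add(merged);
        }
        return total;
    }

    public static void main(String[] args) {
        // Merges: 5+9=14, 12+13=25, 14+16=30, 25+30=55, 45+55=100
        int[] frequencies = {45, 13, 12, 16, 9, 5};
        System.out.println(totalEncodedBits(frequencies)); // 224
    }
}
```

With six distinct characters there are five merges, which matches the N-1 repetitions described above.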

Encoding

To find the codeword for a character, start at the leaf node that holds the character and walk up to the root: each time the current node is the left child of its parent, prepend 0 to the codeword, and each time it is the right child, prepend 1. Once the mapping from characters to codewords is known, encoding is straightforward. The code is as follows:

public static String encode(String originalStr, Map<Character, Integer> statistics) {
    if (originalStr == null || originalStr.equals("")) {
        return "";
    }
    char[] charArray = originalStr.toCharArray();
    List<Node> leafNodes = new ArrayList<Node>();
    buildTree(statistics, leafNodes);
    Map<Character, String> encodInfo = buildEncodingInfo(leafNodes);

    StringBuilder buffer = new StringBuilder();
    for (char c : charArray) {
        buffer.append(encodInfo.get(c));
    }
    return buffer.toString();
}

private static Map<Character, String> buildEncodingInfo(List<Node> leafNodes) {
    Map<Character, String> codewords = new HashMap<Character, String>();
    for (Node leafNode : leafNodes) {
        Character character = leafNode.getChars().charAt(0);
        String codeword = "";
        Node currentNode = leafNode;
        // Walk up to the root, prepending one bit per edge.
        while (!currentNode.isRoot()) {
            codeword = (currentNode.isLeftChild() ? "0" : "1") + codeword;
            currentNode = currentNode.parent;
        }
        // A tree with a single leaf (one distinct character) has no edges; use a one-bit code.
        if (codeword.isEmpty()) {
            codeword = "0";
        }
        codewords.put(character, codeword);
    }
    return codewords;
}
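Note that encode returns a String of '0' and '1' characters, which is convenient for inspection but occupies a full Java char per bit. To obtain an actual size reduction, the bits would have to be packed into bytes. The following is one possible sketch of that step; the class BitPacking and its pack method are assumptions introduced here and are not part of the article's implementation.

```java
import java.util.Arrays;

class BitPacking {
    /**
     * Packs a string of '0'/'1' characters into bytes, most significant bit first.
     * The final byte is padded with zero bits if the length is not a multiple of 8.
     */
    static byte[] pack(String bits) {
        byte[] out = new byte[(bits.length() + 7) / 8];
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                out[i / 8] |= (byte) (0x80 >> (i % 8));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(pack("01100001"))); // [97]
        System.out.println(pack("111111111").length);          // 2 (nine bits need two bytes)
    }
}
```

Because of the padding, a decoder working on the packed form would also need to know the number of meaningful bits, for example by storing that count alongside the bytes.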

Decoding

Because the Huffman coding algorithm guarantees that no codeword is a prefix of another codeword, decoding is simple: take the bits one at a time and walk down from the root, going right on 1 and left on 0, until a leaf is reached (a hit). Output that leaf's character, return to the root, and repeat. The code is as follows:

public static String decode(String binaryStr, Map<Character, Integer> statistics) {
    if (binaryStr == null || binaryStr.equals("")) {
        return "";
    }
    char[] binaryCharArray = binaryStr.toCharArray();
    LinkedList<Character> binaryList = new LinkedList<Character>();
    int size = binaryCharArray.length;
    for (int i = 0; i < size; i++) {
        binaryList.addLast(binaryCharArray[i]);
    }

    // Rebuild the same tree that was used for encoding.
    List<Node> leafNodes = new ArrayList<Node>();
    Tree tree = buildTree(statistics, leafNodes);

    StringBuilder buffer = new StringBuilder();
    while (binaryList.size() > 0) {
        Node node = tree.root;
        do {
            char c = binaryList.removeFirst();
            if (node.isLeaf()) {
                break; // the tree is a single leaf: each bit stands for its character
            }
            node = (c == '0') ? node.leftNode : node.rightNode;
        } while (!node.isLeaf());
        buffer.append(node.chars);
    }
    return buffer.toString();
}
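The prefix property can be seen at work without a tree. The sketch below is a different, table-driven technique, not the tree walk used above: it accumulates bits until they match a codeword in a lookup table. Since no codeword is a prefix of another, the first match is always the right one. The class PrefixDecodeSketch and its three-entry table are hypothetical and serve only as an illustration.

```java
import java.util.HashMap;
import java.util.Map;

class PrefixDecodeSketch {
    /** Hypothetical prefix-free table: no codeword is the beginning of another. */
    static Map<String, Character> toyTable() {
        Map<String, Character> table = new HashMap<>();
        table.put("0", 'e');
        table.put("10", 't');
        table.put("11", 'z');
        return table;
    }

    /** Collects bits until they form a known codeword, emits its character, then starts over. */
    static String decode(String bits, Map<String, Character> codeToChar) {
        StringBuilder out = new StringBuilder();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < bits.length(); i++) {
            current.append(bits.charAt(i));
            Character hit = codeToChar.get(current.toString());
            if (hit != null) {
                out.append(hit);
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            throw new IllegalArgumentException("trailing bits do not form a codeword");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 0 0 0 10 0 11
        System.out.println(decode("00010011", toyTable())); // eeetez
    }
}
```

If the table contained both "1" and "10", the decoder could not tell after reading a 1 whether to stop or continue, which is exactly the ambiguity that the Huffman tree rules out.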

Testing and comparison

The following code tests the correctness of the Huffman coding (encoding first, then decoding; Chinese text works as well) and prints the Huffman-encoded bit string next to the binary strings produced by common character encodings for comparison. The code is as follows:

public static void main(String[] args) {
    String oriStr = "Huffman codes compress data very effectively: savings of 20% to 90% are typical, "
            + "depending on the characteristics of the data being compressed. China's rise";
    Map<Character, Integer> statistics = statistics(oriStr.toCharArray());
    String encodedBinariStr = encode(oriStr, statistics);
    String decodedStr = decode(encodedBinariStr, statistics);

    System.out.println("Original string: " + oriStr);
    System.out.println("Huffman encoded binary string: " + encodedBinariStr);
    System.out.println("Decoded string from binary string: " + decodedStr);

    System.out.println("Binary string of UTF-8: " + getStringOfByte(oriStr, Charset.forName("UTF-8")));
    System.out.println("Binary string of UTF-16: " + getStringOfByte(oriStr, Charset.forName("UTF-16")));
    System.out.println("Binary string of US-ASCII: " + getStringOfByte(oriStr, Charset.forName("US-ASCII")));
    System.out.println("Binary string of GB2312: " + getStringOfByte(oriStr, Charset.forName("GB2312")));
}

// Encodes the string with the given charset and renders every byte as 8 binary digits.
public static String getStringOfByte(String str, Charset charset) {
    if (str == null || str.equals("")) {
        return "";
    }
    byte[] byteArray = str.getBytes(charset);
    int size = byteArray.length;
    StringBuilder buffer = new StringBuilder();
    for (int i = 0; i < size; i++) {
        byte temp = byteArray[i];
        buffer.append(getStringOfByte(temp));
    }
    return buffer.toString();
}

// Renders one byte as 8 binary digits, most significant bit first.
public static String getStringOfByte(byte b) {
    StringBuilder buffer = new StringBuilder();
    for (int i = 7; i >= 0; i--) {
        byte temp = (byte) ((b >> i) & 0x1);
        buffer.append(String.valueOf(temp));
    }
    return buffer.toString();
}

That is all for this article; I hope it helps with your studies.
