[Data Compression] Huffman principle and code implementation

Do not build high towers on floating sand.

The Huffman algorithm is another lossless compression algorithm, but unlike the LZW algorithm in the previous article, Huffman coding requires prior knowledge of the probability of each character. By counting how often each character occurs in the sequence, a unique code is designed for each character so that frequently occurring characters get short codes and rarely occurring characters get long ones, which is how the compression is achieved. It can save roughly 20%–90% of the space, depending heavily on the characteristics of the data. Huffman coding is a variable-length code: different characters may have codes of different lengths.

Prefix code: no character's code is a prefix of any other character's code in the same character set. For example, with the codes a=0, b=10, c=110, d=111, the bit string 010110111 can only be split as 0|10|110|111, i.e., "abcd". Huffman coding is an optimal prefix code, that is, it minimizes the size of the compressed data.

----------------------------------------------------------------------

Huffman algorithm:

1. Count the frequency of each character in the character sequence and create a node for each character; the node's weight is its frequency.

2. Initialize a min-priority queue and insert all of the above nodes into it.

3. Extract the two nodes with the smallest weights from the priority queue and remove them from it.

4. Create a parent node and make the two extracted nodes its left and right children; the weight of the parent node is the sum of the weights of its children.

5. If the priority queue is now empty, stop and return the pointer to this parent node (the root); otherwise, insert the parent node into the priority queue and repeat from Step 3 (see the sketch below).
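
The following is a minimal sketch of these steps using std::priority_queue from the C++ standard library. It is not the article's implementation (that appears below under "Code implementation"); the node type, the function names, and the sample frequencies are assumptions chosen purely for illustration.

#include <cstdio>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// One tree node: weight (frequency), the character (leaves only), two children.
struct HNode {
    unsigned freq;
    char ch;
    HNode *left, *right;
};

// Comparator that turns std::priority_queue (a max-heap by default) into a min-heap on freq.
struct ByFreq {
    bool operator()(const HNode* a, const HNode* b) const { return a->freq > b->freq; }
};

HNode* BuildHuffman(const std::vector<std::pair<char, unsigned> >& freqs) {
    std::priority_queue<HNode*, std::vector<HNode*>, ByFreq> pq;
    for (const auto& p : freqs)                              // Steps 1-2: one leaf node per character
        pq.push(new HNode{p.second, p.first, nullptr, nullptr});
    while (pq.size() > 1) {                                  // Steps 3-5: repeatedly merge the two smallest
        HNode* l = pq.top(); pq.pop();
        HNode* r = pq.top(); pq.pop();
        pq.push(new HNode{l->freq + r->freq, '\0', l, r});   // parent weight = sum of the children
    }
    return pq.top();                                         // root of the Huffman tree (nodes are leaked; fine for a demo)
}

void PrintCodes(const HNode* n, const std::string& prefix) {
    if (!n->left && !n->right) {                             // leaf: the path walked so far is its code
        std::printf("%c : %s\n", n->ch, prefix.c_str());
        return;
    }
    PrintCodes(n->left, prefix + "0");                       // left edge contributes a '0'
    PrintCodes(n->right, prefix + "1");                      // right edge contributes a '1'
}

int main() {
    // Assumed sample frequencies, chosen only to demonstrate the procedure.
    std::vector<std::pair<char, unsigned> > freqs = {
        {'a', 45}, {'b', 13}, {'c', 12}, {'d', 16}, {'e', 9}, {'f', 5}
    };
    PrintCodes(BuildHuffman(freqs), "");
    return 0;
}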

----------------------------------------------------------------------

Looking at a Huffman tree built this way, we can see that every character node is a leaf. The encoding rule is: label each edge going to a left child with '0' and each edge going to a right child with '1'; the binary string collected on the path from the root to a leaf is that character's code. Because this is a prefix code, each character can be uniquely decoded during decompression by walking the Huffman tree. A disadvantage of prefix codes, however, is error propagation: if one codeword is corrupted, the decoding of everything that follows may also be wrong.
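
To make the decoding direction concrete, here is a small sketch (the tree, the sample bit strings, and the names are assumptions, not taken from the article's code) that decodes a bit string by walking a Huffman tree: go left on '0', go right on '1', emit a character at each leaf, and restart at the root. The second call also shows how a single corrupted bit desynchronizes everything that follows.

#include <cstdio>
#include <string>

// A bare tree node for decoding; the character is meaningful only at leaves.
struct TNode {
    char ch;
    TNode *left, *right;
};

// Walk the tree bit by bit: '0' goes left, '1' goes right; emit a character at a leaf
// and restart at the root for the next codeword.
std::string Decode(const TNode* root, const std::string& bits) {
    std::string out;
    const TNode* cur = root;
    for (char b : bits) {
        cur = (b == '0') ? cur->left : cur->right;
        if (!cur->left && !cur->right) {   // reached a leaf: one symbol decoded
            out += cur->ch;
            cur = root;
        }
    }
    return out;
}

int main() {
    // Hand-built tree for the prefix code a = 0, b = 10, c = 11 (assumed example).
    TNode a = {'a', nullptr, nullptr};
    TNode b = {'b', nullptr, nullptr};
    TNode c = {'c', nullptr, nullptr};
    TNode inner = {'\0', &b, &c};
    TNode root  = {'\0', &a, &inner};

    std::printf("%s\n", Decode(&root, "010110").c_str());   // prints "abca"
    // Flipping the second bit ("000110") loses the codeword boundaries: it now prints
    // "aaaca", which is how one corrupted bit propagates into later codewords.
    std::printf("%s\n", Decode(&root, "000110").c_str());
    return 0;
}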

Code implementation:

 

/*
    Do not build high towers on floating sand.
    http://blog.csdn.net/luoshixian099
    Data compression -- Huffman encoding
    December 21, 2015
*/
#include <iostream>
#include <vector>
#include "compress.h"
using namespace std;

void ShowCode(PNode root, vector<UCHAR>& code);

int main()
{
    char A[] = "xxznxznnvvccncvzzbzzvxxczbzvmnzvnnz";   // original data
    UINT Length = sizeof(A) - 1;
    Priority_Q queue(A, Length);                        // build the priority queue

    // print the frequency of each character
    for (UINT i = 0; i <= queue.Heap_Size; i++)
    {
        cout << (char)(queue.table[i]->key) << " Frequency: " << queue.table[i]->Frequency << endl;
    }
    cout << "------------------" << endl;

    PNode root = Build_Huffman_Tree(queue);             // construct the Huffman tree

    vector<UCHAR> code;
    ShowCode(root, code);                               // display the code of each character
    return 0;
}

void ShowCode(PNode root, vector<UCHAR>& code)
{
    if (root != NULL)
    {
        if (root->_left == NULL && root->_right == NULL)   // leaf node
        {
            cout << (char)(root->key) << " code: ";
            for (UINT i = 0; i < code.size(); i++)
            {
                cout << (int)code[i];
            }
            cout << endl;
            return;
        }
        code.push_back(0);                                 // go left: append a '0'
        ShowCode(root->_left, code);
        code[code.size() - 1] = 1;                         // go right: last bit becomes '1'
        ShowCode(root->_right, code);
        code.resize(code.size() - 1);                      // backtrack
    }
}
/* Compress.cpp */
#include "compress.h"

Priority_Q::Priority_Q(char* A, int Length)       // count the frequency of each character
{
    for (int i = 0; i < 256; i++)
    {
        table[i] = new Node;
    }
    Heap_Size = 0;
    for (int i = 0; i < Length; i++)              // character frequency statistics
    {
        bool Flag = true;
        for (int j = 0; j < Heap_Size; j++)
        {
            if (table[j]->key == *(A + i))        // character already seen: bump its count
            {
                table[j]->Frequency = table[j]->Frequency + 1;
                Flag = false;
                break;
            }
        }
        if (Flag)                                 // a new character
        {
            table[Heap_Size]->key = *(A + i);
            table[Heap_Size]->Frequency = table[Heap_Size]->Frequency + 1;
            Heap_Size++;
        }
    }
    Heap_Size--;                                  // Heap_Size is now the index of the last element
    Build_Min_Heap(Heap_Size);                    // build the priority queue
}

void Priority_Q::Build_Min_Heap(UINT Length)
{
    for (int i = (int)(Length / 2); i >= 0; i--)
    {
        Min_Heapify(i);
    }
}

void Priority_Q::Min_Heapify(UINT i)
{
    UINT Smaller = i;
    UINT Left = 2 * i + 1;
    UINT Right = 2 * i + 2;
    if (Left <= Heap_Size && table[Left]->Frequency < table[i]->Frequency)        // compare with the left child
    {
        Smaller = Left;
    }
    if (Right <= Heap_Size && table[Right]->Frequency < table[Smaller]->Frequency)
    {
        Smaller = Right;
    }
    if (Smaller != i)                             // a child is smaller: swap and keep sifting down
    {
        Swap(i, Smaller);
        Min_Heapify(Smaller);
    }
}

void Priority_Q::Swap(int x, int y)               // swap two elements
{
    PNode temp = table[x];
    table[x] = table[y];
    table[y] = temp;
}

PNode CopyNode(PNode _src, PNode _dst)            // copy node data
{
    _dst->Frequency = _src->Frequency;
    _dst->key = _src->key;
    _dst->_left = _src->_left;
    _dst->_right = _src->_right;
    return _dst;
}

PNode Priority_Q::Extract_Min()                   // pop the front node of the queue
{
    if (Heap_Size == EMPTY)
        return NULL;
    if (Heap_Size == 0)
    {
        Heap_Size = EMPTY;
        return table[0];
    }
    if (Heap_Size >= 0)
    {
        Swap(Heap_Size, 0);
        Heap_Size--;
        Min_Heapify(0);
    }
    return table[Heap_Size + 1];
}

void Priority_Q::Insert(PNode pnode)              // insert into the priority queue
{
    Heap_Size++;
    CopyNode(pnode, table[Heap_Size]);
    delete pnode;
    UINT i = Heap_Size;
    while (i > 0 && table[Parent(i)]->Frequency > table[i]->Frequency)
    {
        Swap(i, Parent(i));
        i = Parent(i);
    }
}

PNode Build_Huffman_Tree(Priority_Q& queue)       // build the Huffman tree
{
    PNode parent = NULL, left = NULL, right = NULL;
    while (queue.Heap_Size != EMPTY)
    {
        left = new Node;
        right = new Node;
        parent = new Node;
        CopyNode(queue.Extract_Min(), left);      // extract the two smallest nodes
        CopyNode(queue.Extract_Min(), right);     // and copy them into the left/right children
        parent->Frequency = right->Frequency + left->Frequency;   // parent weight = sum of the children
        parent->_left = left;
        parent->_right = right;
        if (queue.Heap_Size == EMPTY)
            break;
        queue.Insert(parent);                     // insert the parent back into the priority queue
    }
    return parent;
}
/* compress.h */
#ifndef COMPRESS
#define COMPRESS
#include <iostream>

#define UINT  unsigned int
#define UCHAR unsigned char
#define EMPTY 0xFFFFFFFF
#define Parent(i) ((UINT)(((i) - 1) / 2))

typedef struct Node                        // Huffman tree node
{
    Node() : key(EMPTY), Frequency(0), _left(NULL), _right(NULL) {}
    UINT key;
    UINT Frequency;
    struct Node* _left;
    struct Node* _right;
} Node, *PNode;

class Priority_Q                           // min-priority queue
{
public:
    Priority_Q(char* A, int Length);
    void Insert(PNode pnode);              // insert a node
    PNode Extract_Min();                   // extract the minimum-frequency node
    UINT Heap_Size;                        // index of the last element in the queue
    PNode table[256];                      // up to 256 distinct character nodes
private:
    void Build_Min_Heap(UINT Length);      // build the heap
    void Swap(int x, int y);               // swap two elements
    void Min_Heapify(UINT i);              // restore the min-heap property
};

PNode Build_Huffman_Tree(Priority_Q& queue);   // construct the Huffman tree

#endif // COMPRESS
 
Reference: http://wenku.baidu.com/view/04a8a13b580216fc700afd2e.html
