When reprinting, please cite the source: http://blog.csdn.net/luoshixian099/article/details/50374452

Do not build a high tower on shifting sand

The Huffman algorithm is another lossless compression algorithm. Unlike the LZW algorithm covered in the previous article, Huffman coding needs to know in advance how likely each character is to appear. It counts the frequency of each character in the input sequence and assigns every character its own code, so that frequent characters get short codes and rare characters get long ones; this is what achieves the compression. Savings of 20% to 90% are typical, depending heavily on the characteristics of the data. Huffman coding is a variable-length code: different characters may be encoded with different numbers of bits.
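The effect of giving frequent characters shorter codes can be checked with a little arithmetic. The following is only a sketch and not the author's code: the names CountFrequencies and EncodedBits, and the code lengths used in the note below, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Count how often each character occurs in the input.
std::map<char, std::size_t> CountFrequencies(const std::string &text)
{
    std::map<char, std::size_t> freq;
    for (char c : text)
        ++freq[c];
    return freq;
}

// Total encoded size in bits, given a code length (in bits) for every character.
std::size_t EncodedBits(const std::map<char, std::size_t> &freq,
                        const std::map<char, std::size_t> &codeLength)
{
    std::size_t bits = 0;
    for (const auto &entry : freq)
    {
        auto it = codeLength.find(entry.first);
        assert(it != codeLength.end());   // every character must have a code
        bits += entry.second * it->second;
    }
    return bits;
}
```

For the input "aaaabbc", a fixed 2-bit code costs 7 × 2 = 14 bits, while the variable-length assignment a: 1 bit, b: 2 bits, c: 2 bits costs 4 × 1 + 2 × 2 + 1 × 2 = 10 bits.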

Prefix code: no character's code word is a prefix of the code word of any other character in the same character set. Huffman coding is an optimal prefix code: among all prefix codes that encode one character at a time with the given frequencies, it produces the smallest total encoded size.
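The prefix property is easy to test mechanically. The sketch below assumes code words are given as strings of '0' and '1'; the helper names IsPrefixOf and IsPrefixCode are hypothetical, and the quadratic comparison is chosen for clarity rather than speed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns true if `a` is a prefix of `b`.
bool IsPrefixOf(const std::string &a, const std::string &b)
{
    return a.size() <= b.size() && b.compare(0, a.size(), a) == 0;
}

// Returns true if no code word is a prefix of a different code word.
bool IsPrefixCode(const std::vector<std::string> &codes)
{
    for (const std::string &c : codes)
        assert(!c.empty());   // code words are expected to be non-empty

    for (std::size_t i = 0; i < codes.size(); ++i)
        for (std::size_t j = 0; j < codes.size(); ++j)
            if (i != j && IsPrefixOf(codes[i], codes[j]))
                return false;
    return true;
}
```

The set {0, 10, 11} passes this check, while {0, 01, 11} fails because 0 is a prefix of 01.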

---------------------------------------------------------------------------------------------------------------

Huffman algorithm:

1. Count the frequency of each character in the character sequence and create a node for each character; the node's weight is the character's frequency.

2. Initialize a min-priority queue and insert all of these nodes into it.

3. Extract the two nodes with the smallest weights from the priority queue (this removes them from the queue).

4. Create a new parent node with these two nodes as its left and right children; the parent's weight is the sum of the weights of its two children.

5. If the priority queue is now empty, stop and return a pointer to the parent node, which is the root of the Huffman tree. Otherwise, insert the parent node into the priority queue and go back to step 3.
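The five steps above can also be sketched with the standard library's std::priority_queue instead of a hand-written heap. This is an illustrative variant, not the implementation given later in this article: HNode, BuildTree and AssignCodes are hypothetical names, nodes are stored by index in a vector, and the loop runs until one node is left, which is equivalent to the exit test in step 5.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct HNode
{
    std::size_t weight;
    int left;      // index of the left child, -1 for a leaf
    int right;     // index of the right child, -1 for a leaf
    char symbol;   // only meaningful for leaves
};

// Builds the tree inside `nodes` and returns the root index (-1 if freq is empty).
int BuildTree(const std::map<char, std::size_t> &freq, std::vector<HNode> &nodes)
{
    typedef std::pair<std::size_t, int> Item;   // (weight, node index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;

    // Steps 1-2: one leaf per character, all pushed into the min-priority queue.
    for (const auto &entry : freq)
    {
        nodes.push_back(HNode{entry.second, -1, -1, entry.first});
        pq.push(Item(entry.second, static_cast<int>(nodes.size()) - 1));
    }
    if (pq.empty())
        return -1;

    // Steps 3-5: merge the two lightest nodes until a single node remains.
    while (pq.size() > 1)
    {
        Item a = pq.top(); pq.pop();
        Item b = pq.top(); pq.pop();
        nodes.push_back(HNode{a.first + b.first, a.second, b.second, '\0'});
        pq.push(Item(a.first + b.first, static_cast<int>(nodes.size()) - 1));
    }
    return pq.top().second;
}

// Walks the tree: a left edge appends '0', a right edge appends '1'.
void AssignCodes(const std::vector<HNode> &nodes, int index,
                 const std::string &prefix, std::map<char, std::string> &codes)
{
    assert(index >= 0);
    const HNode &n = nodes[index];
    if (n.left < 0 && n.right < 0)   // leaf: the path so far is the code word
    {
        codes[n.symbol] = prefix;
        return;
    }
    AssignCodes(nodes, n.left, prefix + "0", codes);
    AssignCodes(nodes, n.right, prefix + "1", codes);
}
```

With the frequencies a: 45, b: 13, c: 12, d: 16, e: 9, f: 5, the merged weights are 14, 25, 30, 55 and 100, and the resulting code needs 224 bits in total. When the weights tie, the exact bit patterns depend on how ties are broken, but the total cost does not.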

---------------------------------------------------------------------------------------------------------------

Building the Huffman tree this way makes every character a leaf node. The encoding rule is: starting from the root, each step to a left child contributes a '0' and each step to a right child contributes a '1'; the binary string collected on the path to a leaf is that character's code word. Because the code words form a prefix code, the decoder can walk the Huffman tree and recover each character unambiguously. The drawback of a prefix code is that errors propagate: if a single bit of the encoded stream is corrupted, the decoding that follows it is likely to be wrong as well.
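Both the unambiguous decoding and the error propagation can be seen on a tiny example. The sketch below does not walk a tree; as a simplifying assumption it matches bits against a table of code words, which gives the same result for a prefix code. The function name Decode and the three-symbol code in the note are made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Decodes a string of '0'/'1' characters using a prefix code table
// that maps each code word to its character.
std::string Decode(const std::string &bits, const std::map<std::string, char> &table)
{
    std::string out, current;
    for (char bit : bits)
    {
        assert(bit == '0' || bit == '1');
        current += bit;
        auto it = table.find(current);
        if (it != table.end())   // a complete code word has been read
        {
            out += it->second;
            current.clear();
        }
    }
    return out;   // trailing bits that match no code word are dropped
}
```

With the code a = 0, b = 10, c = 11, the bit string 01011010 decodes to "abcab". Flipping only the second bit gives 00011010, which decodes to "aaacab": one corrupted bit changes several characters and even the length of the output.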

Code implementation:

/*
CSDN: Do not build a high tower on shifting sand  http://blog.csdn.net/luoshixian099
Data compression -- Huffman encoding   December 21, 2015
*/
#include <iostream>
#include <vector>
#include "compress.h"
using namespace std;

void ShowCode(PNode root, vector<char> &code);

int main()
{
    char A[] = "XXZNXZNNVVCCNCVZZBZZVXXCZBZVMNZVNNZ";   // raw data
    UINT Length = sizeof(A) - 1;
    Priority_Q queue(A, Length);   // build the priority queue
    // print the frequency of each distinct character
    for (UINT i = 0; i <= queue.Heap_Size; i++)
    {
        cout << (char)(queue.Table[i]->key) << " Frequency: " << queue.Table[i]->Frequency << endl;
    }
    cout << "--------------------" << endl;
    PNode root = Build_Huffman_Tree(queue);   // build the Huffman tree
    vector<char> code;
    ShowCode(root, code);   // print the code word of each character
    return 0;
}

void ShowCode(PNode root, vector<char> &code)
{
    if (root != NULL)
    {
        if (root->_left == NULL && root->_right == NULL)   // leaf node
        {
            cout << (char)(root->key) << " code: ";
            for (UINT i = 0; i < code.size(); i++)
            {
                cout << (int)code[i];
            }
            cout << endl;
            return;
        }
        code.push_back(0);               // going left appends a 0
        ShowCode(root->_left, code);
        code[code.size() - 1] = 1;       // going right replaces it with a 1
        ShowCode(root->_right, code);
        code.resize(code.size() - 1);    // restore the path before returning
    }
}

/*compress.cpp*/
#include "compress.h"

Priority_Q::Priority_Q(char *A, int Length)   // count the frequency of each character
{
    for (int i = 0; i < 256; i++)
    {
        Table[i] = new Node;
    }
    Heap_Size = 0;
    for (int i = 0; i < Length; i++)   // count character frequencies
    {
        bool Flag = true;
        for (UINT j = 0; j < Heap_Size; j++)
        {
            if (Table[j]->key == *(A + i))
            {
                Table[j]->Frequency = Table[j]->Frequency + 1;
                Flag = false;
                break;
            }
        }
        if (Flag)   // add a new character
        {
            Table[Heap_Size]->key = *(A + i);
            Table[Heap_Size]->Frequency = Table[Heap_Size]->Frequency + 1;
            Heap_Size++;
        }
    }
    Heap_Size--;                  // from here on Heap_Size is the index of the last node
    Build_Min_Heap(Heap_Size);    // build the priority queue
}

void Priority_Q::Build_Min_Heap(UINT Length)
{
    if (Length == EMPTY)   // empty input: nothing to heapify
        return;
    for (int i = (int)(Length / 2); i >= 0; i--)
    {
        Min_Heapify(i);
    }
}

void Priority_Q::Min_Heapify(UINT i)
{
    UINT Smaller = i;
    UINT left = 2 * i + 1;
    UINT right = 2 * i + 2;
    if (left <= Heap_Size && Table[left]->Frequency < Table[i]->Frequency)   // is the left child smaller?
    {
        Smaller = left;
    }
    if (right <= Heap_Size && Table[right]->Frequency < Table[Smaller]->Frequency)
    {
        Smaller = right;
    }
    if (Smaller != i)   // a child is smaller: swap with the smallest child
    {
        Swap(i, Smaller);
        Min_Heapify(Smaller);
    }
}

void Priority_Q::Swap(int x, int y)   // exchange two elements
{
    PNode temp = Table[x];
    Table[x] = Table[y];
    Table[y] = temp;
}

PNode CopyNode(PNode _src, PNode _dst)   // copy node data
{
    _dst->Frequency = _src->Frequency;
    _dst->key = _src->key;
    _dst->_left = _src->_left;
    _dst->_right = _src->_right;
    return _dst;
}

PNode Priority_Q::Extract_Min()   // remove and return the first (minimum) node
{
    if (Heap_Size == EMPTY)
        return NULL;
    if (Heap_Size == 0)   // fixed: the garbled listing had an assignment (=) here
    {
        Heap_Size = EMPTY;
        return Table[0];
    }
    Swap(Heap_Size, 0);
    Heap_Size--;
    Min_Heapify(0);
    return Table[Heap_Size + 1];
}

void Priority_Q::Insert(PNode pnode)   // insert into the priority queue
{
    Heap_Size++;
    CopyNode(pnode, Table[Heap_Size]);
    delete pnode;
    UINT i = Heap_Size;
    while (i > 0 && Table[Parent(i)]->Frequency > Table[i]->Frequency)
    {
        Swap(i, Parent(i));
        i = Parent(i);
    }
}

PNode Build_Huffman_Tree(Priority_Q &queue)   // build the Huffman tree
{
    PNode parent = NULL, left = NULL, right = NULL;
    while (queue.Heap_Size != EMPTY)
    {
        left = new Node;
        CopyNode(queue.Extract_Min(), left);    // take out the two smallest nodes
        if (queue.Heap_Size == EMPTY)           // only one distinct character: that leaf is the whole tree
            return left;
        right = new Node;
        parent = new Node;
        CopyNode(queue.Extract_Min(), right);   // and copy their data into the left and right children
        parent->Frequency = right->Frequency + left->Frequency;   // create the parent node
        parent->_left = left;
        parent->_right = right;
        if (queue.Heap_Size == EMPTY)
            break;
        queue.Insert(parent);   // insert the parent back into the priority queue
    }
    return parent;
}

/*compress.h*/
#ifndef COMPRESS
#define COMPRESS
#include <cstddef>
#include <vector>
#define UINT unsigned int
#define UCHAR unsigned char
#define EMPTY 0xFFFFFFFF
#define Parent(i) (UINT)(((i) - 1) / 2)

typedef struct Node   // tree node
{
    Node() : key(EMPTY), Frequency(0), _left(NULL), _right(NULL) {}
    UINT key;
    UINT Frequency;
    struct Node *_left;
    struct Node *_right;
} Node, *PNode;

class Priority_Q   // priority queue
{
public:
    Priority_Q(char *A, int Length);
    void Insert(PNode pnode);   // insert a node
    PNode Extract_Min();        // remove and return the minimum node
    UINT Heap_Size;             // index of the last node in the queue; EMPTY when the queue is empty
    PNode Table[256];           // room for 256 distinct characters
private:
    void Build_Min_Heap(UINT Length);   // build the queue
    void Swap(int x, int y);            // exchange two elements
    void Min_Heapify(UINT i);           // maintain the min-heap property
};

PNode Build_Huffman_Tree(Priority_Q &queue);   // build the Huffman tree
#endif // COMPRESS

