File compression and decompression: an application of Huffman coding

Source: Internet
Author: User
Tags: uncompress

File compression and decompression

I have recently been studying tree data structures, which led me to the Huffman tree and to Huffman coding. The commonly used zip format also relies in part on Huffman coding, so can we implement file compression ourselves? Of course we can. For this project I implemented a Huffman tree on top of my own heap. For an introduction to zip compression, see:

http://www.cricode.com/3481.html

Below I describe my approach to implementing file compression and the problems I ran into.

1). Count: read the file and count how many times each character occurs in it.

2). Build the tree: use the character counts as weights and build a Huffman tree with a greedy algorithm. By the properties of a Huffman tree, characters that occur often end up close to the root, and characters that occur rarely end up far from it.

3). Generate the Huffman codes: going left appends 0, going right appends 1.

4). Compress: read the file again and write out the Huffman code of each character.

5). Generate the configuration file: write each character and its number of occurrences to a configuration file.

6). Decompress: rebuild the Huffman tree from the configuration file, then restore the original file from the compressed file.

7). Test: decompression is correct only if the restored file is identical to the original. I compared the two with Beyond Compare.
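Steps 1 to 3 can be sketched in a few lines. This is only an illustration under my own assumptions, not the listing given later in this post: it uses `std::priority_queue` instead of a hand-written heap, and the names `HNode`, `countBytes`, `buildTree`, `assignCodes` and `huffmanCodes` are hypothetical. It also gives a file with a single distinct byte the code "0", which is a choice made for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <queue>
#include <string>
#include <vector>

// Hypothetical node type for this sketch.
struct HNode {
    std::uint64_t weight;
    int symbol;                          // 0..255 for leaves, -1 for internal nodes
    std::unique_ptr<HNode> left, right;
};

// Step 1: count how often each byte value occurs.
std::array<std::uint64_t, 256> countBytes(const std::string& data) {
    std::array<std::uint64_t, 256> counts{};
    for (unsigned char c : data) ++counts[c];
    return counts;
}

// Step 2: greedy construction. Repeatedly merge the two lightest nodes.
std::unique_ptr<HNode> buildTree(const std::array<std::uint64_t, 256>& counts) {
    auto heavier = [](const HNode* a, const HNode* b) { return a->weight > b->weight; };
    std::priority_queue<HNode*, std::vector<HNode*>, decltype(heavier)> heap(heavier);
    for (int s = 0; s < 256; ++s)
        if (counts[s] != 0) heap.push(new HNode{counts[s], s, nullptr, nullptr});
    if (heap.empty()) return nullptr;    // empty input: no tree
    while (heap.size() > 1) {
        HNode* l = heap.top(); heap.pop();
        HNode* r = heap.top(); heap.pop();
        heap.push(new HNode{l->weight + r->weight, -1,
                            std::unique_ptr<HNode>(l), std::unique_ptr<HNode>(r)});
    }
    return std::unique_ptr<HNode>(heap.top());
}

// Step 3: walk the tree, appending '0' when going left and '1' when going right.
void assignCodes(const HNode* node, const std::string& prefix,
                 std::array<std::string, 256>& codes) {
    if (!node) return;
    if (!node->left && !node->right) {
        assert(node->symbol >= 0);
        // Sketch-only choice: a lone symbol gets "0" instead of an empty code.
        codes[static_cast<std::size_t>(node->symbol)] =
            prefix.empty() ? std::string("0") : prefix;
        return;
    }
    assignCodes(node->left.get(), prefix + '0', codes);
    assignCodes(node->right.get(), prefix + '1', codes);
}

std::array<std::string, 256> huffmanCodes(const std::string& data) {
    std::array<std::string, 256> codes;
    std::unique_ptr<HNode> root = buildTree(countBytes(data));
    assignCodes(root.get(), "", codes);
    return codes;
}
```

For the input "aaaabbc" the most frequent byte `a` receives a 1-bit code, while `b` and `c` each receive 2-bit codes, which matches the property described in step 2.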

Here is a simple example I worked through that simulates the compression and decompression process; I hope readers find it helpful.

[Screenshot: comparison with Beyond Compare]

I ran into many problems during the implementation. Here are a few that are easy to run into, for reference only.

1). A character is stored in one byte (unsigned char), so there are 2^8 = 256 possible characters. Building the Huffman tree over all 256 of them is slower and wastes memory. So I introduced the concept of an illegal value: a character becomes a node of the Huffman tree only if its number of occurrences is not 0.

2). When writing the compressed file, we convert each character's Huffman code into bits and write a byte to the compressed file whenever 8 bits have been filled. If the last byte is not full, we shift the bits already collected to the high end, fill the remaining low bits with 0, and then write that byte to the compressed file.

3). If we decompress right after compressing, the Huffman tree and the Huffman codes are still available from the compression function. Decompression on its own has neither, so how can it restore the file? That is why compression generates a configuration file that stores each character and its number of occurrences. Decompression rebuilds the Huffman tree from the configuration file.

4). How do we know how many characters to restore from the compressed file? Because the last byte may have been padded with 0 bits, the decoder could mistake the padding for character codes. One idea is to keep a counter while counting occurrences, add 1 for every character read, and write that counter to the configuration file. Another idea, which the simple example above makes visible, is to use the weight of the root node: the _count of the root is the sum of all character counts, that is, the total number of characters in the file.
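Problems 2) and 4) are easier to see with a tiny round trip. The following is a minimal sketch under my own assumptions: the code table is passed in as a `std::map` (the hypothetical `CodeTable`), and decoding looks up the accumulated bits in a reversed map rather than walking a tree, purely to keep the example short. `packBits` and `unpackBits` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper type: byte value -> its prefix code as a string of '0'/'1'.
using CodeTable = std::map<unsigned char, std::string>;

// Pack the code of every input byte, most significant bit first.
std::vector<std::uint8_t> packBits(const std::string& data, const CodeTable& codes) {
    std::vector<std::uint8_t> out;
    std::uint8_t value = 0;
    int used = 0;                                  // bits currently held in value
    for (unsigned char c : data) {
        for (char bit : codes.at(c)) {
            value = static_cast<std::uint8_t>((value << 1) | (bit == '1' ? 1 : 0));
            if (++used == 8) { out.push_back(value); value = 0; used = 0; }
        }
    }
    if (used > 0)                                  // last byte: zero-fill the low bits
        out.push_back(static_cast<std::uint8_t>(value << (8 - used)));
    return out;
}

// Decode exactly symbolCount symbols, so the zero padding is never interpreted.
std::string unpackBits(const std::vector<std::uint8_t>& bytes, const CodeTable& codes,
                       std::size_t symbolCount) {
    std::map<std::string, unsigned char> byCode;   // reversed table: code -> byte
    for (const auto& kv : codes) byCode[kv.second] = kv.first;
    std::string out, pending;
    for (std::uint8_t byte : bytes) {
        for (int pos = 7; pos >= 0 && out.size() < symbolCount; --pos) {
            pending += (byte & (1 << pos)) ? '1' : '0';
            auto hit = byCode.find(pending);
            if (hit != byCode.end()) {
                out += static_cast<char>(hit->second);
                pending.clear();
            }
        }
    }
    assert(out.size() == symbolCount);             // fails if the input was too short
    return out;
}
```

With the codes a = "1", b = "01", c = "00", the text "aaaabbc" needs 10 bits and packs into two bytes, 0xF5 and 0x00. Decoding with the count 7 returns the original text. Without a count, the six padding bits would decode as three extra `c` characters, which is exactly the problem described in 4).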

With these problems solved, my program could compress the 256 characters and restore them correctly. But what about large files, Chinese text, pictures, audio and video?

1). Some files contain special byte values, so the character used as an index when counting occurrences should be an unsigned char. At first I detected the end of the file by comparing against EOF, a macro whose value is negative (typically -1). Once the character is stored in an unsigned char, that comparison can no longer detect the end of the file, so I used the feof function to decide whether the file had ended.

2). The occurrence counts should be of type long long, so that large files can be compressed and decompressed.

3). Chinese text, pictures and videos are stored as binary data, so files must be opened for reading and writing in binary mode rather than text mode.
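Points 1) to 3) can be illustrated together. The sketch below is an assumption of mine and differs from the feof approach described above: it keeps the return value of `fgetc` in an `int`, so that EOF stays distinguishable from the byte 0xFF, opens the file with "rb", and counts into `long long`. The function name `countFileBytes` is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdio>

// Hypothetical helper: counts every byte value in the file at path.
// Returns false if the file cannot be opened.
bool countFileBytes(const char* path, std::array<long long, 256>& counts) {
    assert(path != nullptr);
    counts.fill(0);
    std::FILE* in = std::fopen(path, "rb");    // binary mode: bytes are read unchanged
    if (!in) return false;
    int ch;                                    // int, so EOF differs from byte 0xFF
    while ((ch = std::fgetc(in)) != EOF)
        ++counts[static_cast<unsigned char>(ch)];
    std::fclose(in);
    return true;
}
```

Had `ch` been declared as `unsigned char`, the comparison with EOF would never be true and the loop would not terminate, which is the situation point 1) describes. In text mode on Windows, a 0x1A byte would also be treated as the end of the file, one reason binary mode matters for pictures and audio.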

To verify the compression of large files, I took an 8.09 MB file. It compressed to 6.50 MB and was restored correctly.

1). [Screenshot: timing test]

2). [Screenshot: Beyond Compare confirming that the restored file matches the original]
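The comparison can also be automated instead of being done by hand in Beyond Compare. The following is a minimal sketch of a byte-for-byte check, assuming both files fit the simple read loop shown; `filesIdentical` is a hypothetical name and is not part of the project code.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: true only if both files open and contain the same bytes.
bool filesIdentical(const char* pathA, const char* pathB) {
    assert(pathA != nullptr && pathB != nullptr);
    std::FILE* a = std::fopen(pathA, "rb");
    std::FILE* b = std::fopen(pathB, "rb");
    bool same = (a != nullptr && b != nullptr);
    while (same) {
        int ca = std::fgetc(a);
        int cb = std::fgetc(b);
        if (ca != cb) same = false;     // differing byte, or one file ended early
        else if (ca == EOF) break;      // both files ended at the same position
    }
    if (a) std::fclose(a);
    if (b) std::fclose(b);
    return same;
}
```

After a round trip one would call it on the original file and the restored file, for example `filesIdentical("2.png", "2.png.uncompress")` with the file names used in the test code.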

// HuffmanTree.h
#define _CRT_SECURE_NO_WARNINGS 1
#pragma once
#include <cstddef>
#include "Heap.h"

template<class T>
struct HuffmanTreeNode
{
    T _weight;
    HuffmanTreeNode<T>* _left;
    HuffmanTreeNode<T>* _right;
    HuffmanTreeNode<T>* _parent;

    HuffmanTreeNode(const T& w = T())
        : _weight(w), _left(NULL), _right(NULL), _parent(NULL)
    {}
};

template<class T>
class HuffmanTree
{
    typedef HuffmanTreeNode<T> Node;
public:
    HuffmanTree() : _root(NULL) {}

    HuffmanTree(const T* a, size_t size) : _root(NULL)
    {
        _root = _CreatHuffmanTree(a, size);
    }

    // Filters out characters that never appear; they are not put into the heap
    HuffmanTree(const T* a, size_t size, const T& invalid) : _root(NULL)
    {
        _root = _CreatHuffmanTree(a, size, invalid);
    }

    Node* GetRoot() { return _root; }

    ~HuffmanTree() { _Destroy(_root); }

protected:
    // Orders node pointers by weight, so the heap is a min-heap
    struct NodeLess
    {
        bool operator()(Node* l, Node* r) const
        {
            return l->_weight < r->_weight;
        }
    };

    Node* _CreatHuffmanTree(const T* a, size_t size)
    {
        Heap<Node*, NodeLess> minHeap;
        // Create a node for every weight and put it into the heap
        for (size_t i = 0; i < size; ++i)
            minHeap.Push(new Node(a[i]));
        return _Merge(minHeap);
    }

    // Same idea as the version without filtering
    Node* _CreatHuffmanTree(const T* a, size_t size, const T& invalid)
    {
        Heap<Node*, NodeLess> minHeap;
        for (size_t i = 0; i < size; ++i)
        {
            if (a[i] != invalid)
                minHeap.Push(new Node(a[i]));
        }
        return _Merge(minHeap);
    }

    // Take out the two smallest nodes as left and right children and
    // build their parent, until a single root remains
    Node* _Merge(Heap<Node*, NodeLess>& minHeap)
    {
        if (minHeap.Empty())          // no valid weights: empty tree
            return NULL;
        while (minHeap.Size() > 1)
        {
            Node* left = minHeap.Top();
            minHeap.Pop();
            Node* right = minHeap.Top();
            minHeap.Pop();
            Node* parent = new Node(left->_weight + right->_weight);
            parent->_left = left;
            parent->_right = right;
            left->_parent = parent;
            right->_parent = parent;
            minHeap.Push(parent);
        }
        return minHeap.Top();
    }

    void _Destroy(Node*& root)
    {
        if (root == NULL)
            return;
        _Destroy(root->_left);
        _Destroy(root->_right);
        delete root;
        root = NULL;
    }

protected:
    Node* _root;
};

void TestHuffmanTree()
{
    int a[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    int size = sizeof(a) / sizeof(a[0]);
    HuffmanTree<int> ht(a, size);
}


 

// Heap.h
#define _CRT_SECURE_NO_WARNINGS 1
#pragma once
#include <cassert>
#include <cstddef>
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

// Functors make the heap code reusable for both orderings
template<class T>
struct Small
{
    bool operator()(const T& l, const T& r) const { return l < r; }
};

template<class T>
struct Large
{
    bool operator()(const T& l, const T& r) const { return l > r; }
};

// With the default Large<T> the largest element is on top (a max-heap);
// pass Small<T> to get a min-heap
template<class T, class Compare = Large<T> >
class Heap
{
public:
    Heap() {}

    Heap(const T* a, int size)
    {
        assert(a);
        _a.reserve(size);
        for (int i = 0; i < size; ++i)
            _a.push_back(a[i]);
        // Build the heap starting from the last non-leaf node
        for (int j = (size - 2) / 2; j >= 0; --j)
            _AdjustDown(j);
    }

    void Push(const T& x)
    {
        _a.push_back(x);
        _AdjustUp(_a.size() - 1);
    }

    void Pop()
    {
        assert(!_a.empty());
        swap(_a[0], _a[_a.size() - 1]);
        _a.pop_back();
        _AdjustDown(0);
    }

    size_t Size() const { return _a.size(); }
    bool Empty() const { return _a.empty(); }

    const T& Top() const
    {
        assert(!_a.empty());
        return _a[0];
    }

    void Display() const
    {
        for (size_t i = 0; i < _a.size(); ++i)
            cout << _a[i] << " ";
        cout << endl;
    }

protected:
    void _AdjustDown(size_t parent)
    {
        Compare com;
        size_t child = 2 * parent + 1;   // child starts as the left child
        while (child < _a.size())
        {
            // Point child at whichever of the two children belongs nearer the top
            if (child + 1 < _a.size() && com(_a[child + 1], _a[child]))
                ++child;
            if (com(_a[child], _a[parent]))
            {
                swap(_a[child], _a[parent]);
                parent = child;
                child = 2 * parent + 1;
            }
            else
                break;
        }
    }

    void _AdjustUp(size_t child)
    {
        Compare com;
        while (child > 0)
        {
            size_t parent = (child - 1) / 2;
            if (com(_a[child], _a[parent]))
            {
                swap(_a[child], _a[parent]);
                child = parent;
            }
            else
                break;   // the inserted value is already in order with its parent
        }
    }

protected:
    vector<T> _a;
};

// A priority queue implemented with the heap
template<class T, class Compare = Large<T> >
class PriorityQueue
{
public:
    PriorityQueue(const T* a, int size) : _pq(a, size) {}
    void Push(const T& x) { _pq.Push(x); }
    void Pop() { _pq.Pop(); }
    const T& Top() const { return _pq.Top(); }
    void Display() { _pq.Display(); }
protected:
    Heap<T, Compare> _pq;
};


// FileCompress.h
#define _CRT_SECURE_NO_WARNINGS 1
#pragma once
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <algorithm>
#include <iostream>
#include <string>
#include <windows.h>   // GetTickCount, used only by the timing tests
#include "HuffmanTree.h"
using namespace std;

typedef long long Type;

struct CharInfo
{
    unsigned char _ch;   // the character
    Type _count;         // number of occurrences
    string _code;        // Huffman code

    CharInfo(Type count = 0) : _ch(0), _count(count), _code("") {}

    // Operators needed by HuffmanTree
    CharInfo operator+(const CharInfo& fc) const { return CharInfo(_count + fc._count); }
    bool operator!=(const CharInfo& fc) const { return _count != fc._count; }
    bool operator<(const CharInfo& fc) const { return _count < fc._count; }
};

class FileCompress
{
public:
    FileCompress()
    {
        for (size_t i = 0; i < 256; ++i)
            _infos[i]._ch = (unsigned char)i;
    }

    string Compress(const char* filename)
    {
        assert(filename);
        FILE* pf = fopen(filename, "rb");
        assert(pf);

        // Count character occurrences
        unsigned char ch = fgetc(pf);
        while (!feof(pf))
        {
            _infos[ch]._count++;
            ch = fgetc(pf);
        }

        // Build the Huffman tree from the counts; a count of 0 is the illegal value
        CharInfo invalid;
        HuffmanTree<CharInfo> ht(_infos, 256, invalid);

        // Generate the Huffman codes
        string code;
        _CreatHuffmanCode(ht.GetRoot(), code);
        //_CreatHuffmanCode(ht.GetRoot());

        // Compress the file
        fseek(pf, 0, SEEK_SET);            // back to the start of the file
        string compressFile = filename;
        compressFile += ".compress";       // name of the compressed file
        FILE* fin = fopen(compressFile.c_str(), "wb");
        assert(fin);

        size_t pos = 0;                    // number of bits collected in value
        unsigned char value = 0;
        ch = fgetc(pf);
        while (!feof(pf))
        {
            string& chCode = _infos[ch]._code;
            for (size_t i = 0; i < chCode.size(); ++i)
            {
                value <<= 1;
                if (chCode[i] == '1')
                    value |= 1;
                if (++pos == 8)            // a full byte
                {
                    fputc(value, fin);
                    value = 0;
                    pos = 0;
                }
            }
            ch = fgetc(pf);
        }
        if (pos)                           // fewer than 8 bits left: pad with 0
        {
            value <<= (8 - pos);
            fputc(value, fin);
        }

        // Configuration file, used to rebuild the Huffman tree
        string configFileName = filename;
        configFileName += ".config";
        FILE* finConfig = fopen(configFileName.c_str(), "wb");
        assert(finConfig);
        string line;
        char buff[128];
        for (size_t i = 0; i < 256; ++i)
        {
            if (_infos[i]._count)
            {
                // One "character,count" record per line
                line += _infos[i]._ch;
                line += ",";
                sprintf(buff, "%lld", _infos[i]._count);
                line += buff;
                line += "\n";
                fwrite(line.c_str(), 1, line.size(), finConfig);
                line.clear();
            }
        }
        fclose(pf);
        fclose(fin);
        fclose(finConfig);
        return compressFile;
    }

    string Uncompress(const char* filename)
    {
        assert(filename);
        string configFileName = filename;
        size_t index = configFileName.rfind(".");
        configFileName = configFileName.substr(0, index);
        configFileName += ".config";
        FILE* foutConfig = fopen(configFileName.c_str(), "rb");
        assert(foutConfig);

        // Read the configuration file to get the count of each character
        string line;
        unsigned char ch = 0;
        while (ReadLine(foutConfig, line))
        {
            if (line.empty())
            {
                // The record is for '\n' itself; keep it and read the rest
                line += '\n';
                continue;
            }
            ch = line[0];
            _infos[ch]._count = strtoll(line.substr(2).c_str(), NULL, 10);
            line.clear();
        }

        // Rebuild the Huffman tree
        CharInfo invalid;
        HuffmanTree<CharInfo> hft(_infos, 256, invalid);
        // The weight of the root is the sum of all character counts
        HuffmanTreeNode<CharInfo>* root = hft.GetRoot();
        Type charCount = root ? root->_weight._count : 0;

        // Name of the restored file
        string uncompressFileName = filename;
        index = uncompressFileName.rfind(".");
        uncompressFileName = uncompressFileName.substr(0, index);
        uncompressFileName += ".uncompress";
        FILE* fin = fopen(uncompressFileName.c_str(), "wb");
        assert(fin);

        // Restore the file from the compressed file
        FILE* fout = fopen(filename, "rb");
        assert(fout);
        HuffmanTreeNode<CharInfo>* cur = root;
        int pos = 7;
        ch = fgetc(fout);
        while (charCount > 0)
        {
            if (cur->_left == NULL && cur->_right == NULL)   // leaf node
            {
                fputc(cur->_weight._ch, fin);
                cur = root;
                --charCount;
                continue;
            }
            if (ch & (1 << pos))           // test each bit of the byte
                cur = cur->_right;         // 1: go right
            else
                cur = cur->_left;          // 0: go left
            if (--pos < 0)                 // all 8 bits of this byte consumed
            {
                ch = fgetc(fout);
                pos = 7;
            }
        }
        fclose(foutConfig);
        fclose(fin);
        fclose(fout);
        return uncompressFileName;
    }

    // Reads one line and appends it to line; returns false at end of file
    bool ReadLine(FILE* fout, string& line)
    {
        int ch = fgetc(fout);
        if (ch == EOF)
            return false;
        while (ch != EOF && ch != '\n')
        {
            line += (char)ch;
            ch = fgetc(fout);
        }
        return true;
    }

protected:
    // Recursive version: the code is passed down from the root
    void _CreatHuffmanCode(HuffmanTreeNode<CharInfo>* root, const string& code)
    {
        if (root == NULL)
            return;
        _CreatHuffmanCode(root->_left, code + '0');
        _CreatHuffmanCode(root->_right, code + '1');
        if (root->_left == NULL && root->_right == NULL)   // leaf node
            _infos[root->_weight._ch]._code = code;
    }

    // Parent-pointer version: each leaf climbs to the root to collect its code
    void _CreatHuffmanCode(HuffmanTreeNode<CharInfo>* root)
    {
        if (root == NULL)
            return;
        _CreatHuffmanCode(root->_left);
        _CreatHuffmanCode(root->_right);
        if (root->_left == NULL && root->_right == NULL)   // leaf node
        {
            string& code = _infos[root->_weight._ch]._code;
            HuffmanTreeNode<CharInfo>* cur = root;
            HuffmanTreeNode<CharInfo>* parent = root->_parent;
            while (parent)
            {
                if (parent->_left == cur)
                    code.push_back('0');   // left: 0
                else
                    code.push_back('1');   // right: 1
                cur = parent;
                parent = cur->_parent;
            }
            // Bits were collected from leaf to root, so reverse them
            reverse(code.begin(), code.end());
        }
    }

protected:
    CharInfo _infos[256];
};

void TestFileCompress()
{
    FileCompress fc;
    cout << "Start compression" << endl;
    cout << "Compress time: ";
    int start = GetTickCount();
    fc.Compress("2.png");                  // also tried: Input, Input.BIG, 3.mp3
    int end = GetTickCount();
    cout << end - start << endl;

    cout << "Start decompression" << endl;
    cout << "Uncompress time: ";
    start = GetTickCount();
    fc.Uncompress("2.png.compress");       // also tried: Input.compress, Input.BIG.compress
    end = GetTickCount();
    cout << end - start << endl;
}

void TestFileUncompress()
{
    FileCompress fc;
    cout << "Start decompression" << endl;
    cout << "Uncompress time: ";
    int start = GetTickCount();
    fc.Uncompress("2.png.compress");       // expects 2.png.config next to it
    int end = GetTickCount();
    cout << end - start << endl;
}


After testing, this small project can compress and restore a variety of files, and I have not found any major bugs. If you find one, please let me know right away.
