Huffman Tree Application--File compression

Source: Internet
Author: User

1. Introduction to the Huffman tree:
A Huffman tree, also called an optimal binary tree, is defined as follows: given n weights, build a binary tree whose n leaves carry those weights. If the tree's weighted path length (the sum, over all leaves, of each leaf's weight times the length of its path from the root) is the smallest possible, the tree is called an optimal binary tree, or Huffman tree. In other words, a Huffman tree is the binary tree with the shortest weighted path length, and nodes with larger weights lie closer to the root. Here a node's weight is simply a meaningful value assigned to that node.
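Written as a formula (the symbols are our own notation: leaf i has weight w_i and lies at depth l_i, the number of edges on its path from the root), the weighted path length that a Huffman tree minimizes is:

```latex
\mathrm{WPL}(T) = \sum_{i=1}^{n} w_i \, l_i
```

Since each leaf's weight is counted once for every internal node on its path to the root, and in the construction below every internal node's weight is the sum of the leaves beneath it, WPL also equals the sum of the weights of all internal nodes.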
2. A Huffman tree is constructed as follows:
Take, for example, the numbers 1, 4, 3, 5, 2, 6, 7. Following the definition, first pick the 2 smallest values and make them leaf nodes, then create a parent node for them whose weight is the sum of the two leaves' weights. Put the parent back into the set, again pick the 2 smallest values, and repeat until only one node, the root, remains. The tree built this way is shown in the figure.
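The merge loop can be traced with a short C++ sketch. It is an illustration under our own assumptions, not the article's code: it works on plain integers instead of tree nodes, uses std::priority_queue with std::greater as the min-heap, and the function name HuffmanMergeTrace is hypothetical. It records each new parent's weight and adds them up, which gives the weighted path length.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Runs the Huffman merge loop on plain integer weights (no tree nodes).
// Each new parent's weight is appended to *parents if it is non-null.
// Returns the weighted path length, which equals the sum of all parent weights.
long long HuffmanMergeTrace(const std::vector<long long>& weights,
                            std::vector<long long>* parents) {
    // std::greater makes priority_queue a min-heap: top() is the smallest weight
    std::priority_queue<long long, std::vector<long long>, std::greater<long long>>
        heap(weights.begin(), weights.end());
    long long wpl = 0;
    while (heap.size() > 1) {
        long long a = heap.top();  // smallest
        heap.pop();
        long long b = heap.top();  // second smallest
        heap.pop();
        long long parent = a + b;  // weight of the new parent node
        if (parents != nullptr)
            parents->push_back(parent);
        wpl += parent;
        heap.push(parent);         // the parent goes back into the set
    }
    return wpl;
}
```

For 1, 4, 3, 5, 2, 6, 7 the parents are 3, 6, 9, 12, 16 and 28, so the root weight is 28 (the sum of all inputs) and the weighted path length is 3 + 6 + 9 + 12 + 16 + 28 = 74.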

Each step takes the 2 smallest values out of the set, but how do we find them efficiently? With the heap I introduced earlier. A heap is either a min-heap or a max-heap, and the Huffman tree needs a min-heap: build a min-heap from the data, take the minimum twice to obtain the 2 smallest values, and use them to build the next part of the Huffman tree.
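The C++ standard library exposes the same min-heap operations as free functions: std::make_heap builds the heap, std::pop_heap moves the top element to the back, and std::push_heap restores the heap after a push_back; passing std::greater<int> makes them produce a min-heap. A minimal sketch of one construction step on plain integers (TakeTwoSmallest is a hypothetical helper, not part of the article's Heap class):

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// One Huffman construction step on a min-heap stored in a vector:
// remove the two smallest values, push back their sum, return the pair.
// Assumes `heap` already satisfies the min-heap property (std::make_heap with std::greater).
std::pair<int, int> TakeTwoSmallest(std::vector<int>& heap) {
    assert(heap.size() >= 2);
    std::greater<int> cmp;                          // greater<> as comparator gives a min-heap
    std::pop_heap(heap.begin(), heap.end(), cmp);   // smallest moves to the back
    int first = heap.back();
    heap.pop_back();
    std::pop_heap(heap.begin(), heap.end(), cmp);   // next smallest moves to the back
    int second = heap.back();
    heap.pop_back();
    heap.push_back(first + second);                 // weight of the new parent
    std::push_heap(heap.begin(), heap.end(), cmp);  // restore the heap property
    return {first, second};
}
```

After std::make_heap(v.begin(), v.end(), std::greater<int>()) on 1, 4, 3, 5, 2, 6, 7, the first call returns (1, 2) and pushes 3; the second call returns (3, 3).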
I have already explained the heap implementation; here is the code again:

#pragma once
#include <iostream>
#include <vector>
#include <cassert>
#include <utility>
using namespace std;

// Comparator for a max-heap: x1 belongs above x2 if it is larger
template<typename T>
struct Max {
    bool operator()(const T& x1, const T& x2) const { return x1 > x2; }
};

// Comparator for a min-heap: x1 belongs above x2 if it is smaller
template<typename T>
struct Min {
    bool operator()(const T& x1, const T& x2) const { return x1 < x2; }
};

template<typename T, typename Compare = Min<T>>
class Heap {
public:
    Heap() {}
    Heap(const T* a, size_t sz) {
        for (size_t i = 0; i < sz; i++)
            _a.push_back(a[i]);
        // Build the heap: sift down from the parent of the last leaf back to the root
        for (int i = ((int)sz - 2) / 2; i >= 0; --i)
            _AdjustDown(i);
    }
    void Push(const T& x) {
        _a.push_back(x);
        _AdjustUp((int)_a.size() - 1);
    }
    void Pop() {
        assert(!_a.empty());
        swap(_a[0], _a[_a.size() - 1]);  // move the top element to the end
        _a.pop_back();
        _AdjustDown(0);
    }
    const T& Top() const { return _a[0]; }
    size_t Size() const { return _a.size(); }
    bool Empty() const { return _a.empty(); }
protected:
    void _AdjustDown(int parent) {
        int child = parent * 2 + 1;
        int size = (int)_a.size();
        Compare com;
        while (child < size) {
            // Pick the child that should sit higher in the heap
            if (child + 1 < size && com(_a[child + 1], _a[child]))
                ++child;
            if (!com(_a[child], _a[parent]))
                break;
            swap(_a[child], _a[parent]);
            parent = child;
            child = parent * 2 + 1;
        }
    }
    void _AdjustUp(int child) {
        Compare com;
        // Only compare with the parent; the order of siblings does not matter
        while (child > 0) {
            int parent = (child - 1) / 2;
            if (!com(_a[child], _a[parent]))
                break;
            swap(_a[child], _a[parent]);
            child = parent;
        }
    }
    vector<T> _a;
};

Next is the code that builds the Huffman tree; it uses the heap above:

#pragma once
#include <iostream>
#include "Heap.h"
using namespace std;

template<typename T>
struct HuffmanTreeNode {
    T _weight;
    HuffmanTreeNode* _parent;
    HuffmanTreeNode* _left;
    HuffmanTreeNode* _right;
    HuffmanTreeNode(const T& weight)
        : _weight(weight), _parent(NULL), _left(NULL), _right(NULL) {}
};

template<typename T>
class HuffmanTree {
    typedef HuffmanTreeNode<T> Node;
public:
    HuffmanTree() : _root(NULL) {}
    // Build the tree from `size` weights; weights equal to `invalid` are skipped
    HuffmanTree(T* a, size_t size, const T& invalid) : _root(NULL) {
        // Functor that orders node pointers by the weights they hold
        struct Min {
            bool operator()(Node* t1, Node* t2) const { return t1->_weight < t2->_weight; }
        };
        Heap<Node*, Min> minHeap;  // min-heap of nodes
        for (size_t i = 0; i < size; i++) {
            if (a[i] != invalid)
                minHeap.Push(new Node(a[i]));
        }
        if (minHeap.Size() == 1)  // only one leaf: it is also the root
            _root = minHeap.Top();
        while (minHeap.Size() > 1) {
            // Take the two smallest nodes and join them under a new parent
            Node* left = minHeap.Top();
            minHeap.Pop();
            Node* right = minHeap.Top();
            minHeap.Pop();
            Node* parent = new Node(left->_weight + right->_weight);
            parent->_left = left;
            parent->_right = right;
            left->_parent = parent;
            right->_parent = parent;
            minHeap.Push(parent);
            _root = parent;
        }
    }
    Node* GetRoot() { return _root; }
    ~HuffmanTree() {
        _Destroy(_root);
        _root = NULL;
    }
protected:
    // Post-order traversal: free both subtrees before the node itself
    void _Destroy(Node* root) {
        if (root == NULL)
            return;
        _Destroy(root->_left);
        _Destroy(root->_right);
        delete root;
    }
    Node* _root;
};

3. Next, let's look at how file compression works.
File compression makes a file smaller to save storage space. The idea behind compression with a Huffman tree is to store each character in fewer bits than the 8 bits of a byte, with the most frequent characters getting the shortest codes. For example, suppose we want to store a file containing AAAABBBCCD.

1> First count the number of occurrences of each character, then use these counts as weights to build a Huffman tree. In the resulting tree, each leaf node holds a character's number of occurrences, as the figure illustrates.
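For AAAABBBCCD, step 1 yields A = 4, B = 3, C = 2 and D = 1. A minimal counting sketch, assuming the data is already in memory as a std::string (the article reads the file with fgetc instead) and using the hypothetical name CountBytes; like the article's 256-element array, it keeps one slot per possible byte value:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Count how often each byte value occurs, one slot per possible byte (256 in total).
std::vector<long long> CountBytes(const std::string& data) {
    std::vector<long long> counts(256, 0);
    for (unsigned char c : data)  // unsigned char keeps the index in 0..255
        ++counts[c];
    return counts;
}
```

These counts become the leaf weights: the construction merges 1 + 2 = 3, then 3 + 3 = 6, then 4 + 6 = 10, and the root weight 10 is the number of characters in the file.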

2> A character normally occupies one byte of storage. To make it take less space, we use its Huffman code: starting from the root, each step to the left is recorded as 0 and each step to the right as 1, so every edge on a path is either 0 or 1. The bits collected on the way from the root down to a character's leaf form that character's Huffman code, as shown in the figure.
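A self-contained sketch of steps 1 and 2 together, assuming at least one character is present. CodeNode, AssignCodes and HuffmanCodes are hypothetical names, and std::priority_queue stands in for the article's Heap class; the left-0/right-1 rule is the one described above.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <queue>
#include <string>
#include <vector>

// Minimal tree node for this sketch (not the article's HuffmanTreeNode).
struct CodeNode {
    long long weight;
    char ch;                                // only meaningful for leaves
    std::unique_ptr<CodeNode> left, right;  // children own their subtrees
};

// A step to the left appends '0', a step to the right appends '1';
// the path collected on reaching a leaf is that character's code.
void AssignCodes(const CodeNode* node, const std::string& path,
                 std::map<char, std::string>& codes) {
    if (!node->left && !node->right) {
        codes[node->ch] = path;
        return;
    }
    AssignCodes(node->left.get(), path + '0', codes);
    AssignCodes(node->right.get(), path + '1', codes);
}

// Builds a Huffman tree from (character, count) pairs and returns each code.
std::map<char, std::string> HuffmanCodes(const std::map<char, long long>& counts) {
    assert(!counts.empty());
    // Min-heap by weight: a heavier node ranks lower in the priority_queue
    auto heavier = [](const CodeNode* a, const CodeNode* b) { return a->weight > b->weight; };
    std::priority_queue<CodeNode*, std::vector<CodeNode*>, decltype(heavier)> heap(heavier);
    for (const auto& kv : counts)
        heap.push(new CodeNode{kv.second, kv.first, nullptr, nullptr});
    while (heap.size() > 1) {
        CodeNode* l = heap.top();  // the smaller weight becomes the left child
        heap.pop();
        CodeNode* r = heap.top();
        heap.pop();
        heap.push(new CodeNode{l->weight + r->weight, 0,
                               std::unique_ptr<CodeNode>(l), std::unique_ptr<CodeNode>(r)});
    }
    std::unique_ptr<CodeNode> root(heap.top());  // the root owns the whole tree
    std::map<char, std::string> codes;
    AssignCodes(root.get(), "", codes);
    return codes;
}
```

For the counts A = 4, B = 3, C = 2, D = 1, A receives the 1-bit code 0, B a 2-bit code, and C and D 3-bit codes, so AAAABBBCCD needs 4*1 + 3*2 + 2*3 + 1*3 = 19 bits instead of 80. When two weights are equal (here B and the C+D node, both 3), which one goes left depends on how the heap breaks ties, so the exact bit patterns may differ from the figure; the total number of bits does not.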

3> To compress the file, replace each character with its Huffman code. Each digit of a code occupies only 1 bit, so we group the bits 8 at a time and write each group to the compressed file as one byte. If the last group has fewer than 8 bits, shift it left to fill the byte, so that all the codes stay contiguous.
The resulting encoding for the example is shown in the figure.

The last group in the figure has 6 bits, so it is shifted left by 2 bits. The 2 padding bits cause no problem during decompression, because decompression stops as soon as it has restored as many characters as the original file contained.
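A small sketch of the packing step, assuming the codes have already been concatenated into a string of '0'/'1' characters (PackBits is a hypothetical name; the article writes each byte to the file with fputc instead of collecting them in a vector). The leftover check tests pos > 0 because pos is reset to 0 after every full byte.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Pack '0'/'1' characters into bytes, most significant bit first, 8 bits per byte.
// A final partial byte is shifted left so its bits stay contiguous with the
// previous ones; the unused low bits are zero padding.
std::vector<unsigned char> PackBits(const std::string& bits) {
    std::vector<unsigned char> out;
    unsigned char value = 0;
    int pos = 0;  // number of bits already placed in `value`
    for (char b : bits) {
        value = static_cast<unsigned char>((value << 1) | (b == '1' ? 1 : 0));
        if (++pos == 8) {
            out.push_back(value);
            value = 0;
            pos = 0;
        }
    }
    if (pos > 0)  // leftover bits: move them to the high end of the byte
        out.push_back(static_cast<unsigned char>(value << (8 - pos)));
    return out;
}
```

For example, the 6 bits 110110 become the single byte 0xD8 (binary 11011000), i.e. they are shifted left by 2 exactly as described above.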
4> We also need to generate a configuration file that stores each character and its number of occurrences, so that decompression can rebuild the same Huffman tree.
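The article writes one line per character that occurs: the raw byte, a comma, the decimal count and a newline. A hedged sketch of formatting and parsing such a line, using std::to_string and std::atoll for the number (MakeConfigLine and ParseConfigLine are hypothetical helpers):

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Format one config line: raw byte, ',', decimal count, '\n'.
std::string MakeConfigLine(unsigned char ch, long long count) {
    std::string line(1, static_cast<char>(ch));
    line += ',';
    line += std::to_string(count);
    line += '\n';
    return line;
}

// Parse a line without its trailing '\n': byte 0 is the character and the
// count starts at index 2, just after the comma.
void ParseConfigLine(const std::string& line, unsigned char& ch, long long& count) {
    assert(line.size() >= 3 && line[1] == ',');
    ch = static_cast<unsigned char>(line[0]);
    count = std::atoll(line.c_str() + 2);
}
```

Because the character is stored raw, it can itself be '\n', giving a line such as "\n,12\n"; a reader therefore has to treat a '\n' in the first position as data rather than as the end of the line.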
5> A byte can take only 256 different values, so the per-character data fits in an array of 256 elements. Instead of a plain array, it should be an array of structs: each element holds the character, its number of occurrences, and its Huffman code, as defined below:

#include <string>
using namespace std;

// Character information
typedef long long LongType;
struct CharInfo
{
    unsigned char _ch;  // the character
    LongType _count;    // number of occurrences in the file
    string _code;       // Huffman code, made of '0' and '1'

    CharInfo(LongType count = 0)
        : _ch(0)
        , _count(count)
    {}
    // The Huffman tree only adds and compares weights, i.e. the counts
    CharInfo operator+(const CharInfo& info) const
    {
        return CharInfo(_count + info._count);
    }
    bool operator<(const CharInfo& info) const
    {
        return _count < info._count;
    }
    bool operator>(const CharInfo& info) const
    {
        return _count > info._count;
    }
    bool operator!=(const CharInfo& info) const
    {
        return _count != info._count;
    }
    bool operator==(const CharInfo& info) const
    {
        return _count == info._count;
    }
};

6> Decompression is the reverse of compression. Read the character counts from the configuration file and rebuild the Huffman tree. Then read the compressed file bit by bit, starting from its first byte: for a 0, move one step to the left from the current node; for a 1, move one step to the right. When a leaf is reached, write its character to the decompressed file and go back to the root. Once all 8 bits of a byte have been used without reaching a leaf, read the next byte from the compressed file and continue, until as many characters as the root's count have been written.
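The decoding loop can be sketched on an in-memory tree. Everything here is an assumption for illustration: the tree is a std::vector of DecodeNode records linked by index (hypothetical types), the root is assumed to be an internal node, and the codes A = 0, B = 10, C = 110, D = 111 are one valid Huffman code for the AAAABBBCCD counts.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Internal nodes have both children; leaves have left == right == -1.
struct DecodeNode {
    int left, right;  // indices into the node vector, -1 for none
    char ch;          // meaningful only for leaves
};

// Decode `count` characters from packed bytes (most significant bit first):
// bit 0 moves left, bit 1 moves right, and reaching a leaf emits its character
// and restarts at the root. Stopping after `count` characters skips the padding.
std::string DecodeBits(const std::vector<DecodeNode>& tree, int root,
                       const std::vector<unsigned char>& bytes, long long count) {
    std::string out;
    int cur = root;
    for (unsigned char byte : bytes) {
        for (int pos = 7; pos >= 0 && count > 0; --pos) {
            cur = ((byte >> pos) & 1) ? tree[cur].right : tree[cur].left;
            if (tree[cur].left == -1 && tree[cur].right == -1) {
                out += tree[cur].ch;
                cur = root;
                --count;
            }
        }
        if (count == 0)
            break;
    }
    return out;
}
```

With that tree, AAAABBBCCD encodes to 19 bits, packed as the bytes 0x0A, 0xB6, 0xE0 (the last one padded with five 0 bits). DecodeBits(tree, 0, bytes, 10) returns AAAABBBCCD; with a count of 15, the five padding bits would decode as five extra A's, which is why the loop stops after the character count stored in the root.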

Next is the complete code.

#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <string>
#include "HuffmanTree.h"
using namespace std;

class FileCompress {
    typedef HuffmanTreeNode<CharInfo> Node;
public:
    FileCompress() { Reset(); }

    string Compress(const char* filename) {
        assert(filename);
        Reset();  // allow one object to compress several files
        // 1. Count the occurrences of each byte (binary mode read)
        FILE* fIn = fopen(filename, "rb");
        assert(fIn);
        int ch = fgetc(fIn);
        while (ch != EOF) {
            _info[ch]._count++;
            ch = fgetc(fIn);
        }
        // 2. Build the Huffman tree; bytes that never occur (count 0) are skipped
        CharInfo invalid;
        HuffmanTree<CharInfo> tree(_info, 256, invalid);
        Node* root = tree.GetRoot();
        // 3. Generate the Huffman code of every character
        string code;
        MakeHuffmanCode(root, code);
        cout << "Compressing " << (root ? root->_weight._count : 0) << " characters" << endl;
        // 4. Compress: write the codes bit by bit, 8 bits per output byte
        string compressFileName = filename;
        compressFileName.append(".Compress");
        FILE* fOut = fopen(compressFileName.c_str(), "wb");  // binary mode write
        assert(fOut);
        fseek(fIn, 0, SEEK_SET);  // read the source file again from the start
        unsigned char value = 0;
        int pos = 0;              // number of bits already stored in value
        ch = fgetc(fIn);
        while (ch != EOF) {
            const string& bits = _info[ch]._code;  // contains only '0' and '1'
            for (size_t i = 0; i < bits.size(); i++) {
                value <<= 1;
                if (bits[i] == '1')
                    value |= 1;
                if (++pos == 8) {  // a full byte: write it out
                    fputc(value, fOut);
                    pos = 0;
                    value = 0;
                }
            }
            ch = fgetc(fIn);
        }
        if (pos > 0) {  // leftover bits: shift them to the high end of the byte
            value <<= (8 - pos);
            fputc(value, fOut);
        }
        cout << "Name of compressed file: " << compressFileName << endl;
        fclose(fIn);
        fclose(fOut);
        // 5. Config file: one "<char>,<count>\n" line per character that occurs
        string configFileName = filename;
        configFileName.append(".config");
        FILE* fConfig = fopen(configFileName.c_str(), "wb");
        assert(fConfig);
        for (int i = 0; i < 256; i++) {
            if (_info[i]._count != 0) {
                fputc(_info[i]._ch, fConfig);  // raw byte; may itself be '\n'
                string line = ",";
                line += to_string(_info[i]._count);
                line += '\n';
                fputs(line.c_str(), fConfig);
            }
        }
        fclose(fConfig);
        return compressFileName;
    }

    // 6. Decompress "<name>.Compress" into "<name>.Uncompress"
    string Uncompress(const char* filename) {
        assert(filename);
        Reset();
        // Strip ".Compress" to get the base name, then open "<base>.config"
        string name = filename;
        size_t index = name.rfind('.');
        string baseName = name.substr(0, index);
        string configFileName = baseName + ".config";
        FILE* fConfig = fopen(configFileName.c_str(), "rb");
        assert(fConfig);
        cout << "Decompressing file: " << name << endl;
        // Rebuild the counts in _info from the config file
        string line;
        while (ReadLine(fConfig, line)) {
            unsigned char ch = line[0];
            _info[ch]._count = atoll(line.substr(2).c_str());  // skip "<char>,"
            line.clear();
        }
        fclose(fConfig);
        // Rebuild the same Huffman tree that was used for compression
        CharInfo invalid;
        HuffmanTree<CharInfo> tree(_info, 256, invalid);
        Node* root = tree.GetRoot();
        string uncompressFileName = baseName + ".Uncompress";
        FILE* fIn = fopen(filename, "rb");                     // file to decompress
        assert(fIn);
        FILE* fOut = fopen(uncompressFileName.c_str(), "wb");  // extracted file
        assert(fOut);
        LongType count = root ? root->_weight._count : 0;      // characters to restore
        cout << "Decompressing " << count << " characters" << endl;
        if (root && root->_left == NULL && root->_right == NULL) {
            // Only one distinct character: it has no code, so just repeat it
            while (count-- > 0)
                fputc(root->_weight._ch, fOut);
        } else if (root) {
            Node* cur = root;
            int readCh = fgetc(fIn);
            while (readCh != EOF && count > 0) {
                for (int pos = 7; pos >= 0 && count > 0; --pos) {
                    // bit 1: go right, bit 0: go left
                    cur = ((readCh >> pos) & 1) ? cur->_right : cur->_left;
                    if (cur->_left == NULL && cur->_right == NULL) {
                        fputc(cur->_weight._ch, fOut);  // leaf found
                        cur = root;
                        --count;  // stopping at 0 skips the padding bits
                    }
                }
                readCh = fgetc(fIn);
            }
        }
        fclose(fIn);
        fclose(fOut);
        return uncompressFileName;
    }

protected:
    // Reset the table: _info[i] describes the byte value i
    void Reset() {
        for (int i = 0; i < 256; i++) {
            _info[i]._ch = (unsigned char)i;
            _info[i]._count = 0;
            _info[i]._code.clear();
        }
    }
    // Walk the tree: going left appends '0', going right appends '1'
    void MakeHuffmanCode(Node* root, string code) {
        if (root == NULL)
            return;
        if (root->_left == NULL && root->_right == NULL) {
            _info[root->_weight._ch]._code = code;
        } else {
            MakeHuffmanCode(root->_left, code + '0');
            MakeHuffmanCode(root->_right, code + '1');
        }
    }
    // Read one config line; a '\n' in the first position is the stored
    // character itself, not the end of the line
    bool ReadLine(FILE* file, string& line) {
        int ch = 0;
        while ((ch = fgetc(file)) != EOF) {
            if (ch == '\n' && !line.empty())
                return true;
            line += (char)ch;
        }
        return false;
    }
    CharInfo _info[256];
};
void Test() {
    const char* files[] = { "haha.txt", "photo.jpg", "music.m4a" };
    // Compression
    FileCompress f1;
    clock_t begin = clock();
    for (const char* file : files)
        f1.Compress(file);
    clock_t end = clock();
    cout << "Compression time (ms): " << (end - begin) * 1000 / CLOCKS_PER_SEC << endl;
    // Decompression
    FileCompress f2;
    clock_t begin1 = clock();
    for (const char* file : files)
        f2.Uncompress((string(file) + ".Compress").c_str());
    clock_t end1 = clock();
    cout << "Decompression time (ms): " << (end1 - begin1) * 1000 / CLOCKS_PER_SEC << endl;
}

The result is:

for each input file, the corresponding compressed file, configuration file and decompressed file are generated.

The decompressed file has the same size as the source file, and the compressed file is smaller than the source file. Pictures, video and audio can be tested the same way; I have not tested each of them here.
The implementation relies on quite a bit of file handling and std::string: substr() copies part of a string, starting at a given position and optionally limited to a given number of characters; c_str() returns a pointer to the string's contents as a null-terminated C string; and a few other functions are used as well. File compression combines a lot of this knowledge. Once you understand the Huffman tree and the ideas behind the compression scheme, you should be able to rework the implementation and make it your own.
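As a small illustration of rfind() and substr(), here is how the config file name can be derived from the compressed file name (StripLastExtension is a hypothetical helper; the article inlines this logic in Uncompress). The result's c_str() is what gets passed to fopen.

```cpp
#include <cassert>
#include <string>

// Strip the last extension: "haha.txt.Compress" -> "haha.txt".
// rfind returns std::string::npos when there is no '.', and substr(0, npos)
// then returns the whole string unchanged.
std::string StripLastExtension(const std::string& name) {
    std::string::size_type dot = name.rfind('.');
    return name.substr(0, dot);  // copy `dot` characters starting at index 0
}
```

For example, fopen((StripLastExtension(name) + ".config").c_str(), "rb") opens haha.txt.config when name is haha.txt.Compress.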
