Huffman Coding compression algorithm

Source: Internet
Author: User

You have probably heard of David Huffman and his classic compression algorithm, Huffman coding, which compresses data using character frequencies, a priority queue, and a binary tree. This binary tree is also called a Huffman tree, a tree with weights. However, the Chinese-language Internet does not seem to have a very clear article on this algorithm, especially on how the tree is constructed. I came across a foreign article, "A Simple Example of Huffman Code on a String", whose example is easy to understand and quite good, so I am relaying it here. Note that this is not a full translation of that article.

Let's go straight to the example. Suppose we need to compress the following string:

"beep boop beer!"

First, we count the number of occurrences of each character, which gives a table like this:

character  count
'b'        3
'e'        4
'p'        2
' '        2
'o'        2
'r'        1
'!'        1
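The counting step can be sketched in a few lines of Python. This is only an illustration; the variable names are not from the original article.

```python
from collections import Counter

text = "beep boop beer!"

# Counter tallies how many times each character occurs.
freq = Counter(text)

# Counter keeps keys in order of first appearance.
for ch, n in freq.items():
    print(repr(ch), n)
```

The counts add up to 15, the length of the string.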


Then we put these characters into a priority queue, using the number of occurrences as the priority. As we can see, the priority queue is simply an array sorted by priority; characters with the same priority are kept in the order in which they appear. In the queue we get, 'r' and '!' (1 each) are at the head, followed by the three characters that occur twice, then 'b' (3) and 'e' (4).
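A minimal sketch of such a queue, assuming it is just a list sorted by count. Python's `sorted` is stable, so characters with equal counts stay in order of first appearance, as the text describes; the order of ties in the original article's figure may differ.

```python
from collections import Counter

text = "beep boop beer!"
freq = Counter(text)

# Sort by count only; the stable sort keeps ties in order of appearance.
queue = sorted(freq.items(), key=lambda item: item[1])
print(queue)
```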

The next step is the core of the algorithm: turning this priority queue into a binary tree. We always take the two elements at the head of the queue and make them the children of a new node (the first element becomes the left child, the second the right child). The new node's priority is the sum of the two elements' priorities, and we put it back into the queue in priority order (note again that the priority here is the number of occurrences of the character). The first two elements, 'r' and '!', each have priority 1, so the first new node has priority 1+1=2.

Again, we take the first two elements out to form a node with priority 2+2=4, and then put it back into the queue in priority order.

We continue in the same way until only one node is left in the queue. As we can see, the tree is built from the bottom up.

Eventually we get a single binary tree whose root has priority 15, the length of the string.
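For comparison, here is a sketch of the whole construction using Python's `heapq` module instead of a sorted array. This swaps in a binary heap for the article's sorted queue, and the tie-breaking differs, so the tree shape may not match the article's figure, although the total encoded length is the same. The function name `build_tree` and the tuple layout are illustrative, and the input is assumed to be non-empty.

```python
import heapq
from collections import Counter
from itertools import count

def build_tree(text):
    """Return the root of a Huffman tree as nested tuples.

    A leaf is (weight, char); an internal node is (weight, left, right).
    """
    tick = count()  # tie-breaker, so the heap never compares nodes directly
    heap = [(w, next(tick), (w, ch)) for ch, w in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # lowest priority
        w2, _, right = heapq.heappop(heap)   # second lowest
        merged = (w1 + w2, left, right)
        heapq.heappush(heap, (w1 + w2, next(tick), merged))
    return heap[0][2]

root = build_tree("beep boop beer!")
print(root[0])  # 15, the weight of the root
```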

At this point, we label each left branch of the tree 0 and each right branch 1, so that walking from the root down to a leaf gives that character's code. For example, 'b' is encoded as 00, 'p' as 101, and 'r' as 1000. We can see that characters with higher frequency sit nearer the top of the tree and get shorter codes, while less frequent characters sit lower and get longer codes.

Finally, we get the following code table:

character  code
'b'        00
'e'        11
'p'        101
' '        011
'o'        010
'r'        1000
'!'        1001
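The traversal that produces the code table can be sketched like this. The `tree` literal is an assumption: it is reconstructed from the codes and the encoded bit string quoted in this article, since the original figure is not available.

```python
# Assumed tree shape, reconstructed from the article's codes.
# A leaf is a one-character string; an internal node is a (left, right) pair.
tree = (
    ("b", ("o", " ")),
    ((("r", "!"), "p"), "e"),
)

def make_codes(node, prefix=""):
    """Walk the tree: a left branch appends '0', a right branch appends '1'."""
    if isinstance(node, str):
        return {node: prefix}
    left, right = node
    codes = make_codes(left, prefix + "0")
    codes.update(make_codes(right, prefix + "1"))
    return codes

codes = make_codes(tree)
for ch, code in codes.items():
    print(repr(ch), code)
```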

One thing to note here is that we encode and decode bit by bit. For example, the bit string "1011110111" decodes to "pepe". So we need to build our Huffman encoding and decoding dictionaries from this binary tree.
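A minimal sketch of the two dictionaries in use. Bits are kept as a string of '0' and '1' characters for readability; a real implementation would pack them into bytes. The `codes` table is taken from the codes quoted in this article, with the entries for ' ', 'o' and '!' inferred from the encoded bit string.

```python
codes = {"b": "00", "e": "11", "p": "101", " ": "011",
         "o": "010", "r": "1000", "!": "1001"}

def encode(text, codes):
    """Concatenate the code of each character."""
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    """Read bits one at a time until they match a complete code."""
    lookup = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:          # a complete code has been read
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)

print(decode("1011110111", codes))     # pepe
print(encode("beep boop beer!", codes))
```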

Another thing to note is that the Huffman codes of individual characters never conflict: no code is a prefix of another code. Otherwise decoding would be ambiguous, because the encoded output contains no delimiters between codes.
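The prefix property is easy to check by brute force. The sketch below uses the code table quoted in this article; `is_prefix_free` is an illustrative helper name.

```python
from itertools import permutations

codes = {"b": "00", "e": "11", "p": "101", " ": "011",
         "o": "010", "r": "1000", "!": "1001"}

def is_prefix_free(codes):
    """True if no code is a prefix of a different entry's code."""
    return not any(a.startswith(b)
                   for a, b in permutations(codes.values(), 2))

print(is_prefix_free(codes))                  # True
print(is_prefix_free({"a": "0", "b": "01"}))  # False: "0" is a prefix of "01"
```

In a Huffman tree this property holds by construction, because characters sit only at the leaves, so the path to one character never passes through another.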

So, for our original string "beep boop beer!":

Its plain binary (ASCII) representation is: 0110 0010 0110 0101 0110 0101 0111 0000 0010 0000 0110 0010 0110 1111 0110 1111 0111 0000 0010 0000 0110 0010 0110 0101 0110 0101 0111 0010 0010 0001

Its Huffman encoding is: 0011 1110 1011 0001 0010 1010 1100 1111 1000 1001

From the above example, we can see that the compression ratio is considerable: 40 bits instead of 120, one third of the original size.
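The two bit counts can be checked with a short calculation, again using the code table quoted in this article and assuming 8 bits per character for the uncompressed string.

```python
text = "beep boop beer!"
codes = {"b": "00", "e": "11", "p": "101", " ": "011",
         "o": "010", "r": "1000", "!": "1001"}

original_bits = len(text.encode("ascii")) * 8      # 8 bits per character
encoded_bits = sum(len(codes[ch]) for ch in text)  # Huffman code lengths

print(original_bits, encoded_bits)                 # 120 40
print(encoded_bits / original_bits)
```

Note that this count leaves out the code table (or the tree), which the decoder also needs, so on very short inputs the real saving is smaller.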
