Huffman Coding File Compression: Huffman Tree Construction and Encoding

Source: Internet
Author: User

"Problem description"

Write a program that compresses a text file using Huffman coding. The compression method is as follows:

1. Count how often each character occurs in the text file (the newline character '\n' is not counted).

2. Build the corresponding Huffman tree from the character frequencies (characters that do not occur are not included).

3. Generate the Huffman code of each character from the Huffman tree.

4. Compress the file using the characters' Huffman codes (that is, output the characters of the source file in order, each replaced by its Huffman code).

"Notes"

1. Generate Huffman codes only for the characters that appear in the file. Note: do not process '\n', that is, do not generate a Huffman code for it.

2. The character with ASCII value 0 ('\0') is used as the terminator of the compressed file; its occurrence count is set to 1 and it takes part in encoding.

3. When building the Huffman tree, the character weights are first sorted in ascending order, and characters with equal frequency are ordered by ASCII value (smaller first). Each newly generated weight is inserted into the ordered sequence; if equal weights already exist, the new one is placed after them (as in a stable sort).

4. When traversing the Huffman tree to generate the character codes, a left branch is 0 and a right branch is 1.

5. The source file is a text file whose characters are ASCII-encoded, 8 bits each. Huffman coding gives frequent characters shorter codes (fewer than 8 bits), so the program must use C bit operations to pack the characters' Huffman codes, in order, into successive bytes.

"Input Form"

Compress the file input.txt in the current directory.

"Output Form"

Write the compressed result to the file output.txt, and also print it to the screen in hexadecimal (printf("%x", ...)) so the result can be checked.

"Sample Input 1"

Suppose the file input.txt in the current directory contains:

aaabbc

"Sample Output 1"

15f0

At the same time, the program outputs the compressed results to the file output.txt.

"Sample description"

The character frequencies in the input file are a: 3, b: 2, c: 1. In addition, the '\0' character is used as the end marker of the compressed file, with its occurrence count set to 1. Building the Huffman codes as described above gives:

a:0

b:10

c:111

\0:110

Therefore, the final compressed result (as bits) is:

0001010111110000

Printed to the screen in hexadecimal, this result is 15f0: grouping the 16 bits into nibbles, 0001 0101 1111 0000, gives 1 5 f 0.

Note: the Huffman-coded output has length 1+1+1+2+2+3+3 = 13 bits. Since the smallest unit of output in C is a byte (8 bits), three 0 bits are appended at the end, so the compressed output is actually 2 bytes. A text viewer interprets a file as ASCII, so opening the compressed file as text shows garbled characters; it is best inspected with a binary (hex) viewer.

"Sample Input 2"

Suppose the file input.txt in the current directory contains:

Do isn't spend all to you have.do not sleep as long as your want.

"Sample Output 2"

ea3169146ce9eee6cff4b2a93fe1a5d462d21d9a87c0eb2f3eb2a9cfe6cae

At the same time, the program outputs the compressed results to the file output.txt.


The main idea for building the Huffman tree:

First, link all nodes into a list sorted by weight (number of occurrences) from small to large. There are only 128 ASCII codes, so the O(n) search cost of a linked list has little practical impact.

Then, each time, take the first two nodes and create a new node whose weight is the sum of their weights and whose two children are those two nodes.

Insert the new node into the list, keeping the list ordered as the problem requires. The first two nodes can then be treated as removed.

Continue from the third node and repeat until only one node remains; that node is the root of the Huffman tree.


The main idea of encoding (each node's code member is an unsigned integer; the high 16 bits store the code length and the low 16 bits store the code itself):

Starting at the root node, if a node has children, the left child's code is the parent's code shifted left by one bit with a 0 appended, and the right child's code is the parent's code shifted left with a 1 appended; then recurse into both children. A node with no children is a leaf, which stands for an ASCII character, and its depth is additionally stored in the high 16 bits of its code. The depth equals the length of the Huffman code, which tells the encoder how many bits to read.


The main idea of reading the codes and writing the output:

A C file stream cannot write a single bit, so the output has to be written 8 bits at a time with fputc. Keep a char variable writein as a buffer, and read the original file from the beginning, character by character.

(*1) For each character read, take the code member stored in its leaf node and extract the depth (high 16 bits) and the code (low 16 bits).

(*2) Write the code into writein one bit at a time, from the highest bit to the lowest:

a) If all depth bits of the code have been written and writein is not yet full (8 bits), go back to (*1).

b) If writein is full (8 bits), write it out, reset writein to 0 so filling starts again from the high bit, and go back to (*2).

In addition, to meet the problem's requirements, each byte is printed as soon as it is written. Note that a char passed to printf is promoted to int; where char is signed and the byte's most significant bit is 1, the value is negative, and sign extension fills all higher bits with 1. For example, the byte 0xF1 is printed as fffffff1, and specifying a field width does not help, because a width is only a minimum.

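Searching online shows that printing with "%hhx" solves this problem: the hh length modifier makes printf convert the value to unsigned char before printing it.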

#include <stdio.h>
#include <stdlib.h>

struct charnode {
    int count;            /* weight: number of occurrences */
    unsigned int code;    /* high 16 bits: code length (depth); low 16 bits: Huffman code */
    struct charnode *lchild, *rchild, *next;
};

/* One node per ASCII character. charnodes[0] is the '\0' terminator: its count is 1
   and its ASCII value is the smallest, so it is always first in the sorted list and
   also serves as the list head. */
struct charnode charnodes[128];

/* Insert newnode into the ascending list after every node whose count is <= its own,
   so a new node goes behind existing nodes of equal weight (stable insertion). */
void insert_huffmannode(struct charnode *newnode)
{
    struct charnode *p = charnodes;
    while (p->next != NULL && p->next->count <= newnode->count)
        p = p->next;
    newnode->next = p->next;
    p->next = newnode;
}

/* Merge node1 and node2 (the first two remaining nodes) into a new node, insert it
   into the linked list, and continue from the third node until one node is left. */
struct charnode *build_huffmantree(struct charnode *node1)
{
    struct charnode *node2 = node1->next;
    if (node2 == NULL)
        return node1;                         /* only one node left: the root */
    struct charnode *newnode = malloc(sizeof *newnode);
    if (newnode == NULL)
        exit(1);
    newnode->count = node1->count + node2->count;
    newnode->code = 0;
    newnode->lchild = node1;
    newnode->rchild = node2;
    newnode->next = NULL;
    insert_huffmannode(newnode);
    return build_huffmantree(node2->next);
}

/* Link up the characters that occur, by count in increasing order. Scanning from low
   to high ASCII and replacing the candidate only on a strictly smaller count breaks
   ties in favour of the smaller ASCII value. */
void linkup(struct charnode *root)
{
    struct charnode *p = NULL;
    int min = 0x3f3f3f3f;
    for (int i = 0; i < 128; i++) {
        /* candidate: occurs at least once, not linked yet, and not the current tail */
        if (charnodes[i].count != 0 && charnodes[i].next == NULL
                && &charnodes[i] != root && charnodes[i].count < min) {
            p = &charnodes[i];
            min = charnodes[i].count;
        }
    }
    root->next = p;
    if (p != NULL)
        linkup(p);
}

/* Assign codes top-down: left child = parent's code + 0, right child = parent's code + 1.
   A leaf also stores its depth (= code length) in the high 16 bits. This layout
   assumes no code is longer than 16 bits. */
void writecode(struct charnode *root, int depth)
{
    if (root == NULL)
        return;
    if (root->lchild == NULL && root->rchild == NULL) {   /* leaf node */
        root->code &= (1u << 16) - 1;
        root->code |= (unsigned int)depth << 16;
        return;
    }
    if (root->lchild != NULL)
        root->lchild->code |= root->code << 1;
    if (root->rchild != NULL)
        root->rchild->code |= (root->code << 1) | 1u;
    writecode(root->lchild, depth + 1);
    writecode(root->rchild, depth + 1);
}

unsigned char writein = 0;   /* byte being filled, from the high bit down */
int bits = 0;                /* number of bits already placed in writein */

/* Write the Huffman code of character c bit by bit, from high to low; whenever
   writein is full, output it to the file and to the screen, then reset it. */
void write_code(int c, FILE *fout)
{
    int depth = (int)(charnodes[c].code >> 16);   /* code is unsigned: logical shift */
    for (int i = depth - 1; i >= 0; i--) {
        unsigned int bit = (charnodes[c].code >> i) & 1u;
        writein |= (unsigned char)(bit << (7 - bits));
        bits++;
        if (bits == 8) {
            fputc(writein, fout);
            printf("%hhx", writein);
            bits = 0;
            writein = 0;
        }
    }
}

int main(void)
{
    FILE *fin = fopen("input.txt", "r");
    if (fin == NULL)
        exit(1);
    FILE *fout = fopen("output.txt", "wb");   /* binary mode: bytes are written as-is */
    if (fout == NULL)
        exit(1);

    int probe;   /* int, not char, so that EOF is distinguishable from data */
    for (int i = 0; i < 128; i++) {
        charnodes[i].count = 0;
        charnodes[i].code = 0;
        charnodes[i].next = NULL;
    }
    /* skip '\n'; bytes >= 128 are not ASCII and are skipped as well */
    while ((probe = fgetc(fin)) != EOF)
        if (probe != '\n' && probe < 128)
            charnodes[probe].count++;
    charnodes[0].count = 1;                   /* the '\0' terminator occurs once */

    linkup(charnodes);
    struct charnode *hft = build_huffmantree(charnodes);
    writecode(hft, 0);

    rewind(fin);
    while ((probe = fgetc(fin)) != EOF)
        if (probe != '\n' && probe < 128)
            write_code(probe, fout);
    write_code(0, fout);                      /* terminator */
    if (bits != 0) {                          /* pad the last byte with 0 bits */
        printf("%hhx", writein);
        fputc(writein, fout);
    }
    fclose(fin);
    fclose(fout);
    return 0;
}

