Overview of BMP, GIF, PNG, LZW, and LZ77

Source: Internet
Author: User

1. BMP

BMP uses a bitmap storage format and, apart from the choice of image depth, generally applies no compression.

The image depth of a BMP file can be 1 bit, 4 bits, 8 bits, or 24 bits. When a BMP file stores data, the image is scanned from left to right and from bottom to top.

When biBitCount = 1, 8 pixels occupy 1 byte;

When biBitCount = 4, 2 pixels occupy 1 byte;

When biBitCount = 8, 1 pixel occupies 1 byte;

When biBitCount = 24, 1 pixel occupies 3 bytes.
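
The packing rules above translate directly into a byte count per scan line. The sketch below is illustrative only: the function name bmp_row_bytes is made up, and it also applies the BMP rule that each scan line is padded to a multiple of 4 bytes, which the text above does not mention.

```python
def bmp_row_bytes(width_px, bits_per_pixel):
    """Return (packed, padded) byte counts for one BMP scan line."""
    if bits_per_pixel not in (1, 4, 8, 24):
        raise ValueError("this sketch only covers 1, 4, 8 and 24 bits per pixel")
    bits = width_px * bits_per_pixel
    packed = (bits + 7) // 8          # round up to whole bytes
    padded = (bits + 31) // 32 * 4    # scan lines are padded to a 4-byte boundary
    return packed, padded


# A row of 10 pixels at each depth listed above.
for bpp in (1, 4, 8, 24):
    print(bpp, bmp_row_bytes(10, bpp))
```

For a 10-pixel row this gives 2, 5, 10, and 30 packed bytes, which the padding rule rounds up to 4, 8, 12, and 32 bytes.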

2. GIF

GIF is a lossless compression format for continuous-tone images, based on the LZW algorithm. The compression ratio is generally about 50%.

The GIF image depth ranges from 1 bit to 8 bits, so GIF supports at most 256 colors.

Another feature of the GIF format is that a single GIF file can store multiple color images. Reading the images in a file one by one and displaying them on the screen produces the simplest kind of animation.

GIF has two main versions, GIF 87a and GIF 89a:

GIF 87a: developed in 1987.

GIF 89a: developed in 1989. This version adds four blocks to the GIF file (the graphic control extension, comment extension, plain text extension, and application extension) and provides support for transparency and multi-frame animation.

The GIF file format uses a modified LZW compression algorithm, usually called the GIF-LZW algorithm.

After compression with GIF-LZW, the image data is stored in the image data block. GIF-LZW is a modified LZW encoding and is lossless. It builds a string table for the strings that repeat in the raw data, then replaces each repeated string with its index in the string table, which is what achieves the compression. Because the decoder needs it, the GIF-LZW minimum code length is stored first, followed by the encoded image data.

The encoded image data is stored in data sub-blocks. Each sub-block is at most 256 bytes long: the first byte gives the number of data bytes that follow (at most 255), and the rest is the content of the sub-block. A sub-block whose first byte is 0 contains no data; it is called the block terminator and marks the end of the sequence of data sub-blocks.
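
The sub-block framing described above can be sketched as follows. This is a minimal illustration, not a GIF writer: the function names pack_sub_blocks and unpack_sub_blocks are made up, and the leading minimum code length byte and the LZW encoding itself are left out.

```python
def pack_sub_blocks(payload: bytes) -> bytes:
    """Split payload into length-prefixed sub-blocks and add the terminator."""
    out = bytearray()
    for start in range(0, len(payload), 255):
        chunk = payload[start:start + 255]
        out.append(len(chunk))   # first byte: number of data bytes (1..255)
        out += chunk
    out.append(0)                # block terminator: a sub-block of length 0
    return bytes(out)


def unpack_sub_blocks(stream: bytes) -> bytes:
    """Concatenate the data of all sub-blocks up to the block terminator."""
    out = bytearray()
    pos = 0
    while True:
        size = stream[pos]
        pos += 1
        if size == 0:
            return bytes(out)
        out += stream[pos:pos + size]
        pos += size


payload = bytes(range(256)) * 2          # 512 bytes of sample data
framed = pack_sub_blocks(payload)
print(len(framed), unpack_sub_blocks(framed) == payload)
```

The 512 sample bytes end up in sub-blocks of 255, 255, and 2 data bytes, so the framed stream is 516 bytes long including the three length bytes and the terminator.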

3. PNG

When PNG stores a grayscale image, the depth can be up to 16 bits. When it stores a color image, the depth can be up to 48 bits, and up to 16 bits of alpha channel data can be stored as well. PNG uses a lossless data compression algorithm derived from LZ77.

LZ77 is also known as sliding-window compression, because it uses a window that slides along with the compression process as its dictionary of terms. If the string to be compressed appears in this window, its position and length are output. Matching is done against a fixed-size window rather than against all the data already encoded, because matching is usually time-consuming and the dictionary size must be limited to keep the algorithm efficient. As the window slides with the compression process, it always contains the most recently encoded data, because for most data a string to be encoded is most likely to find a match in its recent context.

The core of PNG compression is the same Deflate algorithm used by zip. It first applies LZ77 to compress repeated phrases, producing unmatched (literal) bytes together with (match length, distance) pairs, and then compresses these with Huffman coding to obtain the compressed code stream. PNG decoding is the inverse of compression: the original image data is restored from the code table information and the compressed code stream.
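
To get a feel for this kind of compression without writing a PNG codec, one can use Python's standard zlib module, which implements Deflate (LZ77 followed by Huffman coding). The snippet below is only a round-trip sketch on made-up sample bytes; it does not read or write PNG files.

```python
import zlib

# Highly repetitive sample data, standing in for rows of similar pixels.
raw = b"abcabcabcabc" * 50

packed = zlib.compress(raw, 9)      # level 9: favor size over speed
restored = zlib.decompress(packed)  # decoding is the inverse process

print(len(raw), len(packed), restored == raw)
```

Because the sample repeats the same phrase many times, the compressed stream is far shorter than the input, and decompression restores the original bytes exactly.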

4. LZ77 algorithm

LZ77 is also known as sliding-window compression: it uses a window that slides along with the compression process as its dictionary of terms, and if the string to be compressed is found in this window, it outputs the string's position and length. The window has a fixed size, rather than covering everything already encoded, because matching is usually time-consuming and the dictionary size must be limited to keep the algorithm efficient. As the window slides, it always holds the most recently encoded data, which is where a string to be encoded is most likely to find a match.

Let's go through the basic process of the LZ77 algorithm.

1. Starting from the current compression position, examine the unencoded data and try to find the longest matching string in the sliding window. If one is found, go to step 2; otherwise, go to step 3.
2. Output the triple (off, len, c), where off is the offset of the matching string from the window edge, len is the length of the match, and c is the next character. Then slide the window forward by len + 1 characters and continue with step 1.
3. Output the triple (0, 0, c), where c is the next character. Then slide the window forward by 1 character and continue with step 1.

Let's illustrate with an example. Suppose the window is 10 characters long, the 10 characters just encoded are abcdbbccaa, and the characters still to be encoded are abaeaaabaee.

First, the longest string in the window that matches the data to be encoded is ab (off = 0, len = 2), and the character after ab is a, so we output the triple (0, 2, a).

The window now slides forward by three characters, and its content becomes dbbccaaaba.

The next character, e, has no match in the window, so we output the triple (0, 0, e).

The window slides forward by one character, and its content becomes bbccaaabae.

We immediately see that the aaabae to be encoded is in the window (off = 4, len = 6), followed by the character e, so we output (4, 6, e).

In this way, every string that can be matched is replaced by a pointer into the window, and that is how the data is compressed.

Decompression is very simple: as long as we maintain the sliding window the same way as during compression, then for each triple read from the input we find the matching string in the window and output it followed by the character c (if off and len are both 0, only c is output). This restores the original data.
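
The procedure and the worked example can be reproduced with a short sketch. The function names lz77_encode and lz77_decode and the history parameter (the characters assumed to be already in the window) are made up for illustration. Offsets are counted from the left edge of the window and matches are confined to the window, as in the example; a brute-force search is used, which a practical encoder would replace with something faster.

```python
def lz77_encode(data, window_size=10, history=""):
    """Encode data as (off, len, c) triples against a sliding window."""
    window = history[-window_size:]
    triples = []
    pos = 0
    while pos < len(data):
        best_off, best_len = 0, 0
        max_len = len(data) - pos - 1      # keep one character for c
        for off in range(len(window)):
            length = 0
            while (length < max_len
                   and off + length < len(window)
                   and window[off + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_off, best_len = off, length
        c = data[pos + best_len]
        triples.append((best_off, best_len, c))
        # Slide the window forward by len + 1 characters.
        window = (window + data[pos:pos + best_len + 1])[-window_size:]
        pos += best_len + 1
    return triples


def lz77_decode(triples, window_size=10, history=""):
    """Rebuild the data by maintaining the same window as the encoder."""
    window = history[-window_size:]
    pieces = []
    for off, length, c in triples:
        piece = window[off:off + length] + c
        pieces.append(piece)
        window = (window + piece)[-window_size:]
    return "".join(pieces)


triples = lz77_encode("abaeaaabaee", history="abcdbbccaa")
print(triples)   # [(0, 2, 'a'), (0, 0, 'e'), (4, 6, 'e')]
print(lz77_decode(triples, history="abcdbbccaa"))   # abaeaaabaee
```

The three triples are the ones derived by hand above, and decoding them with the same starting window yields the original characters.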

5. LZW

1. What is the full name of LZW?
Lempel-Ziv-Welch (LZW).
2. What is LZW, and what is its compression principle?
The LZW compression algorithm is a compression method created by Lempel, Ziv, and Welch and named after them. It is based on a string table: each string that appears for the first time is placed in the string table and represented by a number. The compressed file stores only the numbers, not the strings, which greatly improves the compression efficiency for image files. Remarkably, the string table can be built correctly during both compression and decompression, and it is discarded once compression or decompression is finished.
In the LZW algorithm, a string table is created first. Each string that appears for the first time is put into the table and represented by a number that depends on the string's position in the table, and this number is written to the compressed file. If the string appears again, it is replaced by its number in the file. For example, if the string "print" is represented by 266 during compression, then every later occurrence is written as 266, while the string "print" itself is kept in the string table. During decoding, when the number 266 is encountered, the string "print" it stands for can be looked up in the string table. During decompression, the string table is regenerated from the compressed data.
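
The claim that the string table never needs to be stored can be checked with a small round trip. This is a textbook-style sketch, not the GIF variant: the function names are made up, the table starts with the 256 single-character strings, new strings get numbers from 256 upward, and the table is allowed to grow without limit.

```python
def lzw_compress(text):
    """Return the list of numbers that replaces text (characters below 256)."""
    table = {chr(i): i for i in range(256)}
    next_code = 256
    prefix = ""
    codes = []
    for ch in text:
        candidate = prefix + ch
        if candidate in table:
            prefix = candidate               # keep extending a known string
        else:
            codes.append(table[prefix])      # write the number of the known part
            table[candidate] = next_code     # first appearance: add to the table
            next_code += 1
            prefix = ch
    if prefix:
        codes.append(table[prefix])
    return codes


def lzw_decompress(codes):
    """Rebuild the text, regenerating the string table along the way."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            entry = previous + previous[0]   # string defined by this very step
        else:
            raise ValueError("invalid LZW code")
        pieces.append(entry)
        table[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(pieces)


text = "print print print print"
codes = lzw_compress(text)
print(len(text), len(codes), lzw_decompress(codes) == text)
```

Only the list of numbers is passed to lzw_decompress, yet it recovers the text, because each new table entry can be inferred from the numbers already read.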
3. Before describing the algorithm in detail, here are some concepts and terms related to it.
1) 'Character': a basic data element. In an ordinary text file it occupies one byte; in an image it is an index value that represents the color of a given pixel.
2) 'CharStream': the stream of characters in the data file.
3) 'Prefix': as the word suggests, the part that immediately precedes a character. A prefix may contain 0 characters. A string is made up of a prefix and a character.
4) 'Suffix': a single character. A string can be written as (A, B), where A is the prefix and B is the suffix. When A is empty, the string is a root.
5) 'Code': a number used to encode the position of a string in the table.
6) 'Entry': a code together with the string it represents.
4. A simple example of the compression idea. It does not implement the full LZW algorithm; it only shows the idea behind LZW in the most intuitive way.
Apply LZW-style compression to the raw data abccaabcddaaccdb.
The raw data contains only four characters, a, b, c, and d, which can be represented by 2-bit numbers: 0 for a, 1 for b, 2 for c, and 3 for d. Looking at the string, some substrings repeat: abccaabcddaaccdb. Using 4 to represent ab and 5 to represent cc, the string can be rewritten as 45a4cddaa5db, which is a little shorter than the original data.
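
The substitution can be checked in a couple of lines. This is plain string replacement used only to illustrate the idea; it is not the LZW algorithm itself.

```python
raw = "abccaabcddaaccdb"

# Replace the repeated strings by the new symbols 4 (for ab) and 5 (for cc).
packed = raw.replace("ab", "4").replace("cc", "5")
print(packed, len(raw), len(packed))   # 45a4cddaa5db 16 12

# Undoing the substitution restores the raw data.
restored = packed.replace("4", "ab").replace("5", "cc")
print(restored == raw)
```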
5. Scope of application of the LZW algorithm
To tell the codes for strings apart from the original single-character values, their numeric ranges must not overlap. Above, 0-3 represent a-d, so ab must be given a value greater than 3. As another example, if the original values fit in 8 bits, their range is 0 to 255, so the codes generated by the compressor cannot lie in 0 to 255 (otherwise they would clash) and can only start from 256. That exceeds what 8 bits can represent, so the code length must be extended by at least one bit. Doesn't that increase the space taken by each character? It does, but one code can now stand for several characters. For example, 255 used to take 8 bits on its own, but if 256 now represents the two values 254, 255, the extra bit still pays off. It follows that the LZW algorithm is suited to data in which many substrings repeat many times. The more repetition, the better the compression; the less repetition, the worse the result, and the output may even grow instead of shrinking.
6. Special codes in the LZW algorithm
As new strings keep being discovered, the labels keep growing. If the original data is very large, the string table becomes larger and larger, and operating on it becomes inefficient. How can this be avoided? The LZW variant used by GIF stops growing the table once it is large enough and simply starts over, inserting a marker at that position: the clear code (Clear). It means that from here on the dictionary is rebuilt, all earlier labels are void, and new labels are used.
This raises another question: how large is large enough? What is a suitable size for the label set? In theory, the larger the label set, the higher the compression ratio, but also the higher the overhead, so the size is generally chosen by weighing processing speed against memory. The GIF specification sets the limit at 12 bits; once labels would exceed what 12 bits can express, the table is restarted. To improve the compression ratio, GIF also uses a variable code length. For example, if the original data is 8 bits wide, one bit is added first, so the initial code length is 9 bits, and new labels are then added. When the labels reach 512, that is, beyond the largest value expressible in 9 bits, subsequent labels must be expressed in 10 bits, so from that point on the code length is 10 bits. This continues until 2^12, that is, 4096, is reached; a clear code is inserted there, and the code length starts again from 9 bits.
The value of the clear code specified by GIF is the largest value expressible in the original data's character length, plus 1. If the character length is 8 bits, the clear code is 256; if it is 4 bits, the clear code is 16. GIF also specifies an end code, whose value is the clear code plus 1. The character lengths defined by GIF are 1 bit (monochrome), 4 bits (16 colors), and 8 bits (256 colors). With 1 bit, adding one bit gives only four states, and the clear code and the end code would use up what remains, so the initial code length for 1-bit data must be extended to 3 bits. In the other two cases, the initial code lengths are 5 bits and 9 bits. Reference: http://blog.csdn.net/whycadi/
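
The special codes and initial code lengths can be derived mechanically. The sketch below uses a made-up helper, gif_lzw_params. For 1-bit data it assumes the GIF89a convention of treating the data as 2 bits wide, so Clear comes out as 4 and the initial code length as 3 bits; treat that detail as an assumption of the sketch.

```python
def gif_lzw_params(bits_per_pixel):
    """Derive the special codes and initial code length for GIF-LZW."""
    # Assumption: 1-bit data is handled like 2-bit data so Clear and End fit.
    min_code_size = max(bits_per_pixel, 2)
    clear_code = 1 << min_code_size
    return {
        "clear": clear_code,
        "end": clear_code + 1,
        "first_free": clear_code + 2,      # first label available for strings
        "initial_bits": min_code_size + 1,
        "max_bits": 12,
    }


for depth in (1, 4, 8):
    print(depth, gif_lzw_params(depth))

# Bits needed to write a label: 511 still fits in 9 bits, 512 needs 10,
# and 4095 is the largest value that fits in 12 bits.
for label in (511, 512, 4095):
    print(label, label.bit_length())
```

For 8-bit data this gives Clear = 256, End = 257, and an initial code length of 9 bits; for 4-bit data, Clear = 16 and 5 bits.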
7. Worked example: compressing raw data with the LZW algorithm
The input stream, that is, the original data, is: 255, 24, 54, 255, 24, 255, ...
This shows how a pixel array in a GIF file is compressed.
Because the raw data can be expressed in 8 bits, the clear code is Clear = 255 + 1 = 256 and the end code is End = 256 + 1 = 257. At this point the label set is
0 1 2 3 ... 255 Clear End
Step 1: Read the first character, 255, and look it up in the label set. 255 is already there, so it is known and nothing is output.
Step 2: Read the second character, 24. The prefix is now 255 and the current entry is (255, 24). It is not in the label set, so it is unknown. Record it in the label set as 258, output the prefix 255, and keep the suffix 24 as the next prefix (the suffix becomes the prefix).
Step 3: Read the third character, 54. The current entry (24, 54) is unknown. Record (24, 54) as 259 and output 24. The suffix becomes the prefix.
Step 4: Read the fourth character, 255. The entry (54, 255) is unknown. Record (54, 255) as 260 and output 54. The suffix becomes the prefix.
Step 5: Read the fifth character, 24. The entry (255, 24) is already known: it is 258. So 258 becomes the prefix, and nothing is output.
Step 6: Read the sixth character, 255. The entry (258, 255) is unknown. Record (258, 255) as 261 and output 258. The suffix becomes the prefix.
.......
This continues until the last character has been processed.
The processing can be recorded in a table, with Clear = 256 and End = 257.
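
The six steps above can be replayed with a short script. This is a sketch only: the function name gif_lzw_trace and its parameter are made up for illustration, the input is the six values read in steps 1 to 6 (255, 24, 54, 255, 24, 255), and the script neither emits the Clear and End codes nor resets the table when it reaches 4096 labels.

```python
def gif_lzw_trace(pixels, min_code_size=8):
    """Replay the prefix/suffix procedure and return (output, label set)."""
    clear = 1 << min_code_size
    end = clear + 1
    next_label = end + 1            # 258 for 8-bit data
    table = {}                      # (prefix, suffix) -> label
    output = []
    prefix = None
    for suffix in pixels:
        if prefix is None:          # step 1: a single character is always known
            prefix = suffix
            continue
        entry = (prefix, suffix)
        if entry in table:          # known entry: its label becomes the prefix
            prefix = table[entry]
        else:                       # unknown entry: record it, output the prefix
            table[entry] = next_label
            print(f"record {entry} as {next_label}, output {prefix}")
            output.append(prefix)
            next_label += 1
            prefix = suffix         # the suffix becomes the prefix
    if prefix is not None:
        output.append(prefix)       # flush the last prefix at end of input
    return output, table


output, table = gif_lzw_trace([255, 24, 54, 255, 24, 255])
print(output)   # [255, 24, 54, 258, 255]
```

The recorded entries are 258, 259, 260, and 261, as in steps 2 to 6; the final 255 in the output is the prefix left over when the six-value input ends.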
The example above shows only part of the process. Here is another example.
The original input data is: a b a c d a c a b a b .....
The data is compressed with the LZW algorithm, and the process is described below.
Note that the original data contains only four characters: a, b, c, and d.
They can be expressed in two bits. Following the LZW rules above, the code length is first extended by one bit, with Clear = 2^2 = 4 and End = 4 + 1 = 5.
The initial label set is
0 (a)  1 (b)  2 (c)  3 (d)  4 (Clear)  5 (End)
The compression process is:

.....
When Step 1 is performed, the label set should be
