First of all, an apology. Yesterday, while I was writing this article, the lights were suddenly turned off, and the article, about 80% finished, was lost. My poor computer was shut down abruptly too. There was nothing I could do about it: the school has its rules. So today I had to write it again from the beginning.
Now, let's get down to business.
Before we start, I assume you already know what GIF is. If not, I suggest searching the Internet for an introduction to the GIF file format.
As an image format designed for the network, GIF follows the principle of being "easy to use on the network". On a network, the first concern is file size, so the primary problem GIF had to solve was how to reduce file size with an effective compression algorithm. To solve it, GIF adopts the LZW compression algorithm, which is fast and compresses data with many repeated strings well. The catch is that LZW was patented.
Therefore, the first part of this series introduces the LZW algorithm. It has nothing to do with the contents of the GIF file format as such, since it is only an algorithm, but it is the basis of the compression algorithm that GIF uses (LZW-GIF).
1. Overview
The basic idea of the LZW algorithm is very simple: an uncompressed input byte stream usually contains many repeated strings. If each repeated string is represented by a single code instead of being written out again, doesn't that achieve compression?
Next, let's take a look at several concepts. These concepts will appear repeatedly in the introduction below, so remember them.
Char stream (the uncompressed byte stream): the input of encoding and the output of decoding. This uncompressed data is the object we want to process.
Code stream (the encoded stream): the output of encoding and the input of decoding. I deliberately did not call it a byte stream, because its units are codes (indices into the string table), and a code is not necessarily one byte wide. In fact, most of the time it is not.
String table (encoding table): the dictionary used for both encoding and decoding. Each entry maps an index to a string. At initialization, the indices starting from 0 are assigned, in order, to the characters of the character set. The major advantage of LZW's string table is that it does not have to be stored with the compressed data: it is built dynamically, in the same way, during both encoding and decoding, and the only parameter needed to start it is the number of initial root items.
Root item (root entry of the string table): the single-character entries created when the string table is initialized. They have several properties that I will explain when describing the algorithm.
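A small illustration of why the code stream is not a byte stream: a code is an index into the string table, so the number of bits needed to write one depends on how big the table is. The sketch below assumes each code is written with just enough bits for the largest index currently in the table; `bits_for_table` is a hypothetical helper, and real formats (GIF included) define their own code-width rules.

```python
def bits_for_table(table_size: int) -> int:
    """Smallest number of bits that can hold every index 0 .. table_size - 1."""
    # Assumption: every code has the same width for a given table size, minimum 1 bit.
    return max(1, (table_size - 1).bit_length())


for size in (3, 7, 256, 257, 4096):
    print(f"table of {size:>4} entries -> {bits_for_table(size):>2}-bit codes")
```

Under this assumption, a table holding the 256 byte values plus even one new string already needs 9-bit codes, so a code is usually not a byte.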
That covers the concepts. Next, the encoding algorithm:
2. Encoding Algorithm
First, here is the algorithm in pseudo-code:
initial_string_table(root_number);
current_prefix = "";   // empty string
while (a char C can be read from the char stream) {
    current_string = current_prefix + C;
    if (current_string exists in string_table) {
        current_prefix = current_string;
    } else {
        add current_string to the end of string_table;
        output the index of current_prefix in string_table to the code stream;
        current_prefix = C;
    }
}
if (current_prefix is not empty)   // nothing to flush for an empty char stream
    output the index of current_prefix in string_table to the code stream;
// encoding ends
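The pseudo-code above translates almost line for line into Python. This is a sketch, not a reference implementation: it works on `str`, takes the ordered character set as a list instead of root_number, assumes every input character belongs to that set, and uses a dict (string -> index) so that "exists in string_table" is a fast lookup. A new entry gets the next free index, which is the same as appending it to the end of the table. The name `lzw_encode` is my own choice.

```python
def lzw_encode(chars, charset):
    """LZW-encode `chars`; `charset` lists the root items in their agreed order."""
    # Step 1: root items. The dict value is the string's index in string_table.
    string_table = {ch: index for index, ch in enumerate(charset)}
    current_prefix = ""                       # Step 2: empty string
    code_stream = []
    for c in chars:                           # Step 3: read one char C at a time
        current_string = current_prefix + c
        if current_string in string_table:    # "old" string: keep extending it
            current_prefix = current_string
        else:                                 # "new" string
            string_table[current_string] = len(string_table)  # add to the end
            code_stream.append(string_table[current_prefix])
            current_prefix = c
    if current_prefix:                        # flush the last prefix (skipped for empty input)
        code_stream.append(string_table[current_prefix])
    return code_stream


print(lzw_encode("bbbcbba", ["a", "b", "c"]))  # [1, 3, 2, 3, 0]
```

A dict is used only for convenience; a linear search over a list of strings would give the same codes, just more slowly.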
Now let's take a look at how each step is done:
Step 1: initialize the string table. As mentioned above, only one parameter is required to initialize it: root_number, the number of root items, which is simply the size of the character set of the char stream to be encoded. Initialization is also very simple: order all characters according to a previously agreed convention, number them with indices starting from 0 so that each character becomes one entry, and arrange the entries by ascending index to form the initial string table.
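Step 1 as a sketch, assuming the agreed convention is ascending ASCII (code-point) order and that the table is a Python list whose position is the index. `init_string_table` is a hypothetical name modeled on the pseudo-code; it takes the characters themselves, since root_number alone does not say which characters they are.

```python
def init_string_table(charset):
    """Build the root items: one single-character entry per character, index = position."""
    # Assumed convention: ascending ASCII / code-point order, duplicates removed.
    return sorted(set(charset))


string_table = init_string_table("cba")
root_number = len(string_table)          # 3 root items
for index, string in enumerate(string_table):
    print(f"{index:02d} {string!r}")     # 00 'a', 01 'b', 02 'c'

# For a raw byte stream the character set is every byte value, so root_number is 256.
byte_roots = [bytes([value]) for value in range(256)]
print(len(byte_roots))
```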
Step 2: Initialize some local variables: current_prefix and current_string, and set them to empty strings.
Step 3: encode in a loop, as follows:
Read a char C from the char stream and append it to current_prefix to form current_string.
Look up current_string in string_table.
Found: current_string is "old" (already in the table), so nothing is output; we only assign current_string to current_prefix.
Not found: current_string is "new". First make it "old" by appending it to the end of string_table; then output the index of current_prefix in string_table to the code stream; finally, assign C to current_prefix, so that the next string starts from C.
When no more chars can be read, the loop ends, and the index of the remaining current_prefix is output to the code stream.
Here is an example.
Assume our char stream is bbbcbba and the character set is {a, b, c}. We agree to order the root items by ascending ASCII code.
The initialized string_table is as follows:
-----------------------
Index   String
00      'a'
01      'b'
02      'c'
-----------------------
These three items are called root items.
Now start encoding. To keep it short, I will only list the value of each variable at every step, with a brief note.
-----------------------------------------------------------------------------
Step  Prefix  C     Current string  Exists?  Operation         Output
01    ''      b     b               yes      none              -
02    b       b     bb              no       insert bb (03)    index(b) = 01
03    b       b     bb              yes      none              -
04    bb      c     bbc             no       insert bbc (04)   index(bb) = 03
05    c       b     cb              no       insert cb (05)    index(c) = 02
06    b       b     bb              yes      none              -
07    bb      a     bba             no       insert bba (06)   index(bb) = 03
08    a       (end of char stream)                             index(a) = 00
-----------------------------------------------------------------------------
At step 08 no char is left, so the loop ends and the index of the remaining prefix a is output: index(a) = 00.
The encoding is complete. The code stream is: 01 03 02 03 00.
The final encoding table is as follows:
-------------------------
Index   String
00      'a'
01      'b'
02      'c'
03      'bb'
04      'bbc'
05      'cb'
06      'bba'
-------------------------
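To check the walk-through, here is a sketch that runs the same loop on bbbcbba and prints one line per step, then the code stream and the final string table. As before, the function name and the dict-based table are my own choices, and every input character is assumed to be in the character set.

```python
def lzw_encode_traced(chars, charset):
    """Run the LZW encoding loop, printing each step, the code stream and the final table."""
    table = {ch: index for index, ch in enumerate(charset)}  # root items
    prefix, codes = "", []
    for step, c in enumerate(chars, start=1):
        current = prefix + c
        if current in table:
            print(f"{step:02d}  prefix={prefix!r:7} C={c!r}  {current!r:7} exists  -> no output")
            prefix = current
        else:
            table[current] = len(table)                      # append to the end
            codes.append(table[prefix])
            print(f"{step:02d}  prefix={prefix!r:7} C={c!r}  {current!r:7} is new  "
                  f"-> insert as {table[current]:02d}, output {table[prefix]:02d}")
            prefix = c
    if prefix:                                               # end of the char stream
        codes.append(table[prefix])
        print(f"{len(chars) + 1:02d}  prefix={prefix!r:7} end of input -> output {table[prefix]:02d}")
    print("code stream:", " ".join(f"{code:02d}" for code in codes))
    for string, index in table.items():                      # dicts keep insertion order
        print(f"{index:02d} {string!r}")
    return codes, table


codes, table = lzw_encode_traced("bbbcbba", ["a", "b", "c"])
```

Its last lines should reproduce the code stream 01 03 02 03 00 and the seven-entry table above.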
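The compression ratio depends on how many repeated strings the input contains: the more repetition, the longer the strings in the table become, and the fewer codes are output. In this small example, 7 chars became 5 codes.
That covers encoding. Decoding comes next.
(To be continued...)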
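A rough way to see this, as a sketch: count how many codes a minimal encoder emits for inputs of about the same length but with different amounts of repetition. It counts codes only and ignores how many bits each code takes, so it illustrates the trend rather than measuring a real compression ratio. The helper name and sample inputs are made up for the demonstration, and the root items here are simply the distinct chars of each input in ascending order.

```python
import random


def lzw_code_count(chars):
    """Number of codes a plain LZW encoder emits for `chars`."""
    table = {ch: i for i, ch in enumerate(sorted(set(chars)))}  # assumed root items
    prefix, count = "", 0
    for c in chars:
        if prefix + c in table:
            prefix += c
        else:
            table[prefix + c] = len(table)
            count += 1
            prefix = c
    return count + (1 if prefix else 0)       # final flush of the last prefix


random.seed(0)
samples = {
    "one repeated char": "a" * 100,
    "repeated pattern": ("abc" * 34)[:100],
    "random over a-z": "".join(random.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(100)),
}
for name, data in samples.items():
    print(f"{name:<18} {len(data)} chars -> {lzw_code_count(data)} codes")
```

The highly repetitive input collapses to a handful of codes, while the random one needs many more codes for the same number of chars.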