Overview and mutual conversion of BMP, GIF, and JPEG file formats

Last Update:2018-12-04 Source: Internet

Author: User

Overview and mutual conversion of BMP, GIF, and JPEG file formats by Jiang Ping released at: 2000/11/27

Abstract:

This article mainly discusses the bitmap compression method.

Body:

An image file is a disk file that describes an image. Once digital image data has been created, it can be stored on a computer in one of two ways: as a bitmap or as vector graphics.
This article mainly discusses bitmaps. Different image software packages handle images in many different ways, and there is a wide variety of image file formats. Each format consists of a file identification header and image data. The identification header lets the computer determine the file format. The image data contains all the data describing the image, including the color palette and the bitmap itself. The formats differ mainly in the compression algorithm they use. The following sections describe the common compression algorithms.

I. Run-Length Encoding (RLE)
The principle is to replace a run of adjacent pixels with the same color value in a scan line with a count and the color value of those pixels. For example, aaabccccddeee can be replaced by 3a1b4c2d3e. RLE is very effective for images that contain large areas of a single color. Many specific compression methods are derived from the RLE principle:
1. PCX run-length compression: this algorithm converts the bitmap format into the compressed format. A byte ch that appears only once is output directly if ch < 0xC0; if ch >= 0xC0, the byte 0xC1 is written before it. A byte ch that appears n times in a row is compressed into the two bytes 0xC0 + n, ch. Because the count byte cannot exceed 0xFF, n is at most 0xFF - 0xC0 = 0x3F (63 in decimal); a run longer than 63 must be split into several runs.
2. BI_RLE8 compression: this method is used in Windows bitmap files. It also uses two bytes as the basic unit. The first byte gives the number of times the color specified by the second byte is repeated. For example, the encoding 05 04 means that five pixels with the color value 04 are drawn consecutively from the current position. When the first byte is zero, the second byte has a special meaning: 0 indicates the end of the line; 1 indicates the end of the bitmap; 2 is an escape for the next two bytes, which give the horizontal and vertical offset of the next pixel relative to the current position; any larger value n means that the next n bytes are literal pixel values. This method can only compress images with at most 8 bits per pixel (256 colors).
3. BI_RLE4 compression: this method is also used in Windows bitmap files and is similar to BI_RLE8. The only difference is that one byte in BI_RLE4 holds two pixel colors, so it can only compress images with at most 16 colors. Its range of application is therefore limited.
4. PackBits: this method is used to compress bitmap data on Apple Macintosh machines and is also adopted by the TIFF specification. It is similar to BI_RLE8: a signed header byte h is followed either by h + 1 literal bytes (h from 0 to 127) or by a single byte that is repeated 1 - h times (h from -1 to -127). For example, 1C 1C 1C 21 32 32 56 48 can be encoded as FE 1C 04 21 32 32 56 48, where FE (-2) repeats 1C three times and 04 introduces five literal bytes. The best case is a run of 128 identical bytes, which is compressed into just two bytes: the header 81 (-127) and the value. In that case the method is very effective.
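
The PCX rule in item 1 can be sketched in C as follows. This is a minimal illustration rather than a complete PCX writer; the function name pcx_encode_line and the requirement that the output buffer hold 2 * n bytes are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one scan line with the PCX run-length rule.
 * `out` must have room for 2 * n bytes (worst case: every byte escaped).
 * Returns the number of bytes written.
 * pcx_encode_line is a hypothetical name used only in this sketch. */
size_t pcx_encode_line(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t i = 0, o = 0;
    while (i < n) {
        uint8_t ch = in[i];
        size_t run = 1;
        /* Count identical bytes, capped at 63 (0xFF - 0xC0). */
        while (i + run < n && in[i + run] == ch && run < 63)
            run++;
        if (run > 1 || ch >= 0xC0) {
            out[o++] = (uint8_t)(0xC0 + run); /* count byte: top two bits set */
            out[o++] = ch;
        } else {
            out[o++] = ch;                    /* plain literal byte */
        }
        i += run;
    }
    return o;
}
```

With this sketch, the input 11 11 11 C5 22 becomes C3 11 C1 C5 22: a run of three, an escaped single byte, and a plain literal.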

II. Huffman Coding
This is also a common compression method. It was devised in 1952 for text files. Its basic principle is that frequently used data is replaced by short codes and rarely used data by longer codes, with a different code for each data value. The codes are binary and of variable length. For example, in the raw data sequence abaccdaa, a occurs four times, c twice, and b and d once each, so the codes a (0), c (10), b (110), and d (111) can be used, and the sequence is compressed to 01100101011100. Generating a Huffman code requires two passes over the original data. The first pass gathers exact statistics on the frequency of each value, from which the Huffman tree is built; the second pass encodes the data. Because a binary tree has to be built and traversed to generate the codes, compression and restoration are relatively slow, but the method is simple and effective and is therefore widely used.
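
The tree-building step can be sketched as below. This is a simplified version that only computes the code length of each symbol, by repeatedly merging the two least frequent nodes with a plain linear search; the name huffman_lengths and the MAX_SYMBOLS limit are assumptions for the example, and a real encoder would also assign the bit patterns.

```c
#include <stddef.h>

#define MAX_SYMBOLS 256

/* Compute Huffman code lengths for n symbols (1 <= n <= MAX_SYMBOLS),
 * all with non-zero frequency. len[i] receives the depth of symbol i
 * in the tree, which is the length of its code in bits.
 * huffman_lengths is a hypothetical name used only in this sketch. */
void huffman_lengths(const unsigned long *freq, size_t n, unsigned *len)
{
    unsigned long weight[2 * MAX_SYMBOLS];
    int parent[2 * MAX_SYMBOLS];
    int alive[2 * MAX_SYMBOLS];
    size_t nodes = n;

    for (size_t i = 0; i < n; i++) {
        weight[i] = freq[i];
        parent[i] = -1;
        alive[i] = 1;
    }
    /* Repeatedly merge the two lightest live nodes into a new node. */
    for (size_t remaining = n; remaining > 1; remaining--) {
        int a = -1, b = -1; /* a: lightest, b: second lightest */
        for (size_t i = 0; i < nodes; i++) {
            if (!alive[i])
                continue;
            if (a < 0 || weight[i] < weight[a]) {
                b = a;
                a = (int)i;
            } else if (b < 0 || weight[i] < weight[b]) {
                b = (int)i;
            }
        }
        weight[nodes] = weight[a] + weight[b];
        parent[nodes] = -1;
        alive[nodes] = 1;
        parent[a] = parent[b] = (int)nodes;
        alive[a] = alive[b] = 0;
        nodes++;
    }
    /* The code length of a symbol is its distance from the root. */
    for (size_t i = 0; i < n; i++) {
        unsigned depth = 0;
        for (int p = parent[i]; p >= 0; p = parent[p])
            depth++;
        len[i] = depth;
    }
}
```

For the frequencies of abaccdaa (a: 4, b: 1, c: 2, d: 1) this sketch yields code lengths of 1, 3, 2, and 3 bits, so the most frequent symbol gets the shortest code.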

III. LZW Compression
LZW is more complex than most other compression techniques, and its compression efficiency is also high. The basic principle is to assign a code value to each string when it first appears and to convert that value back into the original string in the decompression program. For example, if the value 0x100 is assigned to the string "abccddeee", then 0x100 is written to the output whenever that string appears again. The mapping between 0x100 and the string is generated dynamically during compression and is implicit in the compressed data: during decompression the code table is gradually rebuilt from the compressed data, and later data produces further mappings based on the data that came before it, until the compressed file ends. LZW is reversible, so all information is retained.
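
A minimal sketch of the compression side is shown below. It assumes byte input, codes 0 to 255 for single bytes, new strings numbered from 0x100, and a table capped at 4096 entries that simply stops growing when full; the name lzw_compress and the linear dictionary search are simplifications for the example. Real formats such as GIF additionally pack the codes into variable-width bit fields.

```c
#include <stddef.h>

#define LZW_MAX_CODES 4096

/* A dictionary string is stored as (code of its prefix, last byte). */
struct lzw_entry {
    int prefix;
    unsigned char ch;
};

/* Compress n bytes into LZW codes. `out` must have room for n codes.
 * Returns the number of codes written.
 * lzw_compress is a hypothetical name used only in this sketch. */
size_t lzw_compress(const unsigned char *in, size_t n, int *out)
{
    struct lzw_entry dict[LZW_MAX_CODES]; /* entries below 256 are unused */
    int next_code = 256;                  /* first dynamically assigned code */
    size_t o = 0;

    if (n == 0)
        return 0;
    int cur = in[0]; /* code of the string matched so far */
    for (size_t i = 1; i < n; i++) {
        unsigned char ch = in[i];
        int found = -1;
        /* Is "current string + ch" already in the table? */
        for (int c = 256; c < next_code; c++) {
            if (dict[c].prefix == cur && dict[c].ch == ch) {
                found = c;
                break;
            }
        }
        if (found >= 0) {
            cur = found; /* keep extending the match */
        } else {
            out[o++] = cur; /* emit the longest known string */
            if (next_code < LZW_MAX_CODES) {
                dict[next_code].prefix = cur; /* first appearance: new code */
                dict[next_code].ch = ch;
                next_code++;
            }
            cur = ch;
        }
    }
    out[o++] = cur;
    return o;
}
```

Note that the table itself is never written to the output; a decoder can rebuild it by applying the same rule to the codes it reads.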

IV. Arithmetic Coding
Arithmetic coding is similar to Huffman coding but more effective. It is well suited to files made up of the same recurring sequences, and its results come close to the theoretical limit of compression. In this method, different sequences are mapped to different sub-intervals between 0 and 1, and an interval is represented by a binary fraction of variable precision (number of digits): the less common the data, the higher the precision (the more digits) required. The method is complex and therefore not as commonly used.
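
The mapping of a sequence to a sub-interval of 0 to 1 can be illustrated with the short sketch below. It uses double arithmetic and a fixed probability table, both of which are simplifying assumptions (practical coders use integer arithmetic with renormalization), and the name arith_interval is hypothetical.

```c
#include <stddef.h>

/* Narrow the interval [0, 1) once per symbol.
 * cum[] holds cumulative probabilities: symbol s owns the range
 * cum[s] .. cum[s + 1], with cum[0] = 0 and cum[last] = 1.
 * On return, any number in [*low, *high) identifies the sequence.
 * arith_interval is a hypothetical name used only in this sketch. */
void arith_interval(const int *seq, size_t n, const double *cum,
                    double *low, double *high)
{
    double lo = 0.0, hi = 1.0;
    for (size_t i = 0; i < n; i++) {
        double width = hi - lo;
        hi = lo + width * cum[seq[i] + 1];
        lo = lo + width * cum[seq[i]];
    }
    *low = lo;
    *high = hi;
}
```

With three symbols of probability 1/2, 1/4, and 1/4, the sequence of symbols 0, 1, 2 ends up in the interval from 0.34375 to 0.375. Its width is 1/32, the product of the three probabilities, so about five binary digits are enough to name a point inside it; rarer sequences get narrower intervals and need more digits.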

V. JPEG (Joint Photographic Experts Group)
The JPEG standard differs from the other methods in that it defines several mutually incompatible encoding modes. In its most common mode it is lossy: an image recovered from a JPEG file always differs from the original image, although the reconstructed image usually still looks good. Another notable feature of JPEG is its high compression ratio: the compressed image can be anywhere from about 1% to 80-90% of the size of the original. The method works well and is suitable for multimedia systems.

Having introduced the compression algorithms, we now briefly describe the three bitmap formats, their differences, and the conversion between them.
1. BMP Image File Format
A BMP file consists of three parts:
· The BITMAPFILEHEADER data structure
· The BITMAPINFO data structure
· The bitmap array

1) The bitmap file header data structure contains information such as the type and size of the BMP file and the position of the bitmap array.
#include <stdint.h>

typedef struct {
    uint16_t bfType;       // must be "BM" (0x4D42)
    uint32_t bfSize;       // size of the file in bytes
    uint16_t bfReserved1;  // must be 0
    uint16_t bfReserved2;  // must be 0
    uint32_t bfOffBits;    // offset from the start of the file to the bitmap array
} BITMAPFILEHEADER;        // 14 bytes on disk; needs 2-byte packing to match
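
Because compilers may insert padding into such a structure, one portable way to read the header is to decode its 14 bytes field by field. The sketch below assumes the little-endian on-disk layout of the BMP format; the names bmp_file_header and bmp_parse_file_header are hypothetical.

```c
#include <stdint.h>

/* In-memory copy of the file header fields (hypothetical names). */
struct bmp_file_header {
    uint16_t type;
    uint32_t size;
    uint16_t reserved1;
    uint16_t reserved2;
    uint32_t off_bits;
};

/* Read little-endian 16-bit and 32-bit values from a byte buffer. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the first 14 bytes of a BMP file.
 * Returns 1 if the "BM" signature is present, 0 otherwise. */
int bmp_parse_file_header(const uint8_t buf[14], struct bmp_file_header *h)
{
    h->type      = rd16(buf);
    h->size      = rd32(buf + 2);
    h->reserved1 = rd16(buf + 6);
    h->reserved2 = rd16(buf + 8);
    h->off_bits  = rd32(buf + 10);
    return h->type == 0x4D42; /* 'B' = 0x42, 'M' = 0x4D */
}
```

A converter would typically call this first, reject the file if it returns 0, and then seek to off_bits to reach the pixel data.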

2) The bitmap information data structure is built from two data structures, BITMAPINFOHEADER and RGBQUAD:
typedef struct {
    BITMAPINFOHEADER bmiHeader;    // width, height, bit depth, compression
    RGBQUAD          bmiColors[];  // color table, one entry per palette color
} BITMAPINFO;

The BITMAPINFOHEADER data structure contains information about the width, height, and compression method of the BMP image.
The RGBQUAD data structure defines one color.

3) Bitmap array
The bitmap array records every pixel value of the image. The image is scanned row by row starting from its lower left corner: within each row from left to right, and row after row from bottom to top, the pixel values are recorded one by one. The bytes holding these pixel values make up the bitmap array.
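
Locating a given screen row in the bitmap array can be sketched as below. The sketch assumes the usual bottom-up BMP layout and the format's rule that every stored row is padded to a multiple of 4 bytes, which the text above does not spell out; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes occupied by one stored row: the pixel bits of the row,
 * rounded up to a multiple of 4 bytes (32 bits). */
size_t bmp_row_stride(uint32_t width, uint16_t bits_per_pixel)
{
    return (((size_t)width * bits_per_pixel + 31) / 32) * 4;
}

/* Offset inside the bitmap array of screen row y (0 = top row),
 * for a bottom-up bitmap whose first stored row is the bottom one. */
size_t bmp_row_offset(uint32_t height, uint32_t y, size_t stride)
{
    return (size_t)(height - 1 - y) * stride;
}
```

For example, a 24-bit image 3 pixels wide needs 9 bytes of pixel data per row but occupies 12 bytes per stored row, and the top screen row of a 4-row image starts at offset 36.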
Bitmap array data is stored in either compressed or uncompressed form.
1. Uncompressed format: each pixel of the bitmap corresponds to a fixed number of bits in the bitmap array. The number of bits per pixel is determined by the number of colors, and the total size of the array by the width, height, and number of colors of the image.
2. Compressed format: for BMP files, Windows supports two compressed storage formats, BI_RLE8 and BI_RLE4.
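
A decoder for BI_RLE8 data might look like the following sketch. It follows the two-byte scheme with the end-of-line, end-of-bitmap, and offset escapes, and also handles the format's literal ("absolute") runs, whose data is padded to an even number of bytes. The name rle8_decode, the return convention, and the choice to leave skipped pixels untouched are assumptions for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode BI_RLE8 data into dst, a width * height array of palette
 * indices in stored order (dst row 0 is the first row in the file).
 * The caller should zero dst first, since offset escapes skip pixels.
 * Returns 0 on success, -1 if the data is malformed.
 * rle8_decode is a hypothetical name used only in this sketch. */
int rle8_decode(const uint8_t *src, size_t n, uint8_t *dst,
                size_t width, size_t height)
{
    size_t i = 0, x = 0, y = 0;
    while (i + 1 < n) {
        uint8_t count = src[i++];
        uint8_t value = src[i++];
        if (count > 0) {                 /* run: `count` copies of `value` */
            if (y >= height || count > width - x)
                return -1;
            for (uint8_t k = 0; k < count; k++)
                dst[y * width + x++] = value;
        } else if (value == 0) {         /* end of line */
            x = 0;
            y++;
        } else if (value == 1) {         /* end of bitmap */
            return 0;
        } else if (value == 2) {         /* offset: next two bytes are dx, dy */
            if (i + 1 >= n)
                return -1;
            x += src[i++];
            y += src[i++];
            if (x > width)
                return -1;
        } else {                         /* literal run of `value` bytes */
            if (y >= height || value > width - x || value > n - i)
                return -1;
            for (uint8_t k = 0; k < value; k++)
                dst[y * width + x++] = src[i++];
            if (value & 1)
                i++;                     /* skip the padding byte */
        }
    }
    return -1; /* data ended without an end-of-bitmap marker */
}
```

Converting a compressed BMP to another format would usually start with a step like this, so that the rest of the conversion can work on plain uncompressed rows.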

2. GIF Image File Format
GIF stands for Graphics Interchange Format. GIF is a common image file format standard, but it is copyrighted by CompuServe.
A GIF file consists of a file header, a screen descriptor, a global color table, and one or more image data blocks.

The first thing in a GIF file is the GIF signature, which tells the decoder that this is a GIF file. The signature is the 3-byte string "GIF", followed by a 3-byte version string such as "87a" or "89a". A GIF file can store multiple images, but most files contain only one.
Next comes the screen descriptor, which gives the display resolution for which the images in the file were created, that is, the width and height of the screen.
The next byte is the global flag. Its lowest three bits give the size of the global color table, and its highest bit indicates whether a global color table is present.
The background color byte selects the color used for the background; it is an index into the global color table.
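struct global_data {
    unsigned short screen_width;
    unsigned short screen_height;
    unsigned char  global_flag;   // global flag described above
    unsigned char  background;    // index into the global color table
    unsigned char  tail;          // always 0
};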
Next comes the global color table, which stores all the colors in order. Each color is described by one entry of the table; each entry is 3 bytes long and gives the intensities of the three primary colors red, green, and blue. The length of the table is given by the lowest three bits of the global flag.
The data after that is local: it is a sequence of data blocks. The following is the structure that starts an image data block.
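
Reading these fields from the start of a file can be sketched as follows. The byte offsets assume the standard GIF layout, in which a 3-byte version string follows the signature and the multi-byte fields are little-endian, and the table length is taken as 2 to the power (n + 1) entries for a 3-bit value n; the names gif_screen and gif_parse_screen are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Decoded screen information (hypothetical names). */
struct gif_screen {
    uint16_t width;
    uint16_t height;
    int      has_global_table; /* highest bit of the global flag */
    unsigned table_entries;    /* number of 3-byte color entries */
    uint8_t  background;       /* index into the global color table */
};

/* Decode the first 13 bytes of a GIF file.
 * Returns 1 on success, 0 if the "GIF" signature is missing. */
int gif_parse_screen(const uint8_t buf[13], struct gif_screen *s)
{
    if (memcmp(buf, "GIF", 3) != 0)
        return 0;
    /* Bytes 3..5 hold the version string and are skipped here. */
    s->width  = (uint16_t)(buf[6] | (buf[7] << 8));
    s->height = (uint16_t)(buf[8] | (buf[9] << 8));
    uint8_t flag = buf[10];
    s->has_global_table = (flag & 0x80) != 0;
    s->table_entries = 1u << ((flag & 0x07) + 1);
    s->background = buf[11];
    return 1;
}
```

If has_global_table is set, the color table that follows occupies 3 * table_entries bytes, so a flag of F7 (hexadecimal) announces 256 entries, or 768 bytes.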
struct local_head {
    char           heading;       // always ','
    unsigned short image_left;    // position of the image on the screen
    unsigned short image_top;
    unsigned short image_width;
    unsigned short image_height;
    unsigned char  local_flag;    // local flag
};

The local flag differs from the global flag in its second-highest bit. If this bit is set to 1, the bitmap data of the image is stored interlaced rather than row by row. In the decompressed bitmap data, the first stored row is the 1st row on the screen, the second stored row is the 9th row on the screen, the third is the 17th, and so on in steps of 8; this is the first pass. The second pass starts from the 5th row on the screen and also advances in steps of 8. The third pass starts from the 3rd row and advances in steps of 4, and the fourth pass starts from the 2nd row and advances in steps of 2.

An interlaced GIF image can be displayed in four passes while it is being decoded. Although the first pass shows only 1/8 of the rows of the image, and the first two passes together only 1/4, this is already enough to show the image as a whole. When GIF images are displayed, an interlaced image therefore gives the impression of appearing faster than other images, which is an advantage of interlaced storage.
GIF uses the LZW compression algorithm: the encoder converts the pixel data stream into a code stream, and the decoder restores the code stream to the original data.
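
The four-pass row order can be generated with a short loop, sketched below. Rows are counted from 0 here, so the passes start at rows 0, 4, 2, and 1, which correspond to the 1st, 5th, 3rd, and 2nd rows on the screen; the name gif_interlace_order is hypothetical.

```c
#include <stddef.h>

/* Fill order[k] with the screen row (0-based) that the k-th stored
 * row of an interlaced GIF image belongs to.
 * `order` must have room for `height` entries.
 * gif_interlace_order is a hypothetical name used only in this sketch. */
void gif_interlace_order(size_t height, size_t *order)
{
    static const size_t start[4] = {0, 4, 2, 1}; /* first row of each pass */
    static const size_t step[4]  = {8, 8, 4, 2}; /* row increment per pass */
    size_t k = 0;
    for (int pass = 0; pass < 4; pass++)
        for (size_t row = start[pass]; row < height; row += step[pass])
            order[k++] = row;
}
```

For an image 10 rows high this gives the order 0, 8, 4, 2, 6, 1, 3, 5, 7, 9. A converter that writes a row-by-row format such as BMP can use the table to put each decoded row in its proper place.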

3. JPEG Image File Format
JPEG is the abbreviation of Joint Photographic Experts Group. JPEG is mainly a standard encoding technique for digital images. JPEG image files are pixel-based files, but they are much more complex than image files such as GIF and BMP. Fortunately, when a JPEG library is used, a general understanding of the file format is enough; there is no need to know it in detail.
JPEG is a lossy encoding format. Even so, the decoded and reconstructed image is often closer to the original than a GIF image is. JPEG encoding consists of color conversion, the DCT (discrete cosine transform), quantization, and entropy coding. In practice the compression is performed by a library, the best known being version 4.0 of the JPEG group's library.
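
As an illustration of the first of these stages, the sketch below converts one RGB pixel to the YCbCr color space. The coefficients are the ones commonly used with JPEG files (the JFIF convention), which is an assumption here since the text does not name a particular color conversion; the function names are hypothetical.

```c
#include <stdint.h>

/* Round to the nearest integer and clamp to the range 0..255. */
static uint8_t clamp_u8(double v)
{
    if (v < 0.0)
        return 0;
    if (v > 255.0)
        return 255;
    return (uint8_t)(v + 0.5);
}

/* Convert one RGB pixel to YCbCr (luminance plus two color differences).
 * rgb_to_ycbcr is a hypothetical name used only in this sketch. */
void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                  uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    *y  = clamp_u8( 0.299    * r + 0.587    * g + 0.114    * b);
    *cb = clamp_u8(-0.168736 * r - 0.331264 * g + 0.5      * b + 128.0);
    *cr = clamp_u8( 0.5      * r - 0.418688 * g - 0.081312 * b + 128.0);
}
```

Separating brightness from color in this way lets the later stages compress the two color channels more strongly than the brightness channel, to which the eye is more sensitive.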

Conversion between the three image formats is mainly implemented in C, C++, or assembly language, because these languages can operate directly on the underlying data: the image is extracted from one format and then compressed into another.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
