I. Run-length encoding (RLE) compression
The principle is to replace adjacent pixels that have the same color value in a scan line with a count and the color value of those pixels. For example, aaabccccddeee can be replaced by 3a1b4c2d3e. RLE is very effective for images with large areas of uniform color. Many specific compression methods are derived from the RLE principle:
1. PCX run-length compression: The algorithm converts the bitmap format to a compressed format. For a byte ch that appears only once in a row, it outputs 0xC1 followed by ch if ch >= 0xC0; otherwise it outputs ch directly. A byte ch that appears N times in a row is compressed into the two bytes 0xC0 + N and ch, so N can be at most 0xFF - 0xC0 = 0x3F (decimal 63). When N is greater than 63, the run has to be split into several such pairs.
2. BI_RLE8 compression: This method is used in Windows bitmap files. It also uses two bytes as the basic unit. The first byte specifies how many times the color given by the second byte is repeated. For example, the encoding 05 04 indicates that five pixels with the color value 04 are drawn consecutively from the current position. When the first byte is zero, the second byte has a special meaning: 0 indicates the end of the row; 1 indicates the end of the bitmap; 2 is an escape followed by two more bytes that give the horizontal and vertical displacement of the next pixel relative to the current position. Values from 3 to 255 mean that this many uncompressed pixel values follow. This method applies only to images with at most 8 bits per pixel (256 colors).
3. BI_RLE4 compression: This method is also used in Windows bitmap files and is similar to BI_RLE8 encoding. The only difference is that one byte in BI_RLE4 holds the colors of two pixels, so it can only compress images of up to 16 colors. Its scope of application is therefore limited.
4. PackBits: This method is used to compress bitmap data on Apple's Macintosh and is also adopted in the TIFF specification. It is similar to BI_RLE8. Each group starts with a header byte h: a value from 0x00 to 0x7F means that h + 1 literal bytes follow, and a value from 0x81 to 0xFF means that the next byte is repeated 257 - h times. For example, 1C 1C 1C 21 32 32 56 48 can be encoded as FE 1C 00 21 FF 32 01 56 48. The best case is a run of 128 identical bytes, which is compressed into just two bytes (the header 0x81 and the value), so the method is very effective on long runs.
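The basic count-and-value principle described at the start of this section can be sketched as follows. This is a minimal illustration rather than any of the file-format variants listed above; the function names are hypothetical, and it assumes the input text contains no digit characters.

```python
from itertools import groupby

def rle_encode(text):
    """Encode each run of equal characters as <count><char>."""
    return "".join(f"{len(list(run))}{ch}" for ch, run in groupby(text))

def rle_decode(encoded):
    """Invert rle_encode; a count may span several digits."""
    out, count = [], ""
    for ch in encoded:
        if ch.isdigit():
            count += ch              # still reading the run length
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)

print(rle_encode("aaabccccddeee"))   # 3a1b4c2d3e
```

Note that a single character such as b grows from one symbol to two (1b), which is why the format-specific variants add rules for literal data.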
II. Huffman coding compression
Huffman coding is also a common compression method. It was created in 1952 for text files. Its basic principle is that frequently used data is replaced by short codes and rarely used data by longer codes, with a different code for each data value. The codes are binary and of variable length. For example, in the raw data sequence abaccdaa the symbols can be encoded as a (0), c (10), b (110), d (111), so the sequence compresses to 01100101011100. Generating a Huffman encoding requires two passes over the original data. The first pass gathers exact statistics on the frequency of each value, from which the Huffman tree is built; the second pass encodes the data using that tree. Because a binary tree has to be built and traversed to generate the codes, compression and decompression are relatively slow, but the method is simple and effective and is therefore widely used.
Huffman coding is probably the best-known lossless compression method. It replaces each symbol with a binary prefix code whose length is determined by how frequently that symbol occurs. Common symbols are represented with few bits, while uncommon symbols need more bits.
The Huffman algorithm is optimal in the sense that changing the binary code of any symbol would produce a less compact representation. However, it does not take into account the order of symbols or repeated symbols and sequences.
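A minimal sketch of the two steps, counting frequencies and then merging the two least frequent nodes until one tree remains, might look like this. The function names are hypothetical, and codes are kept as strings of '0' and '1' for readability rather than packed into bits.

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Return {symbol: bit string} for a Huffman code built from data."""
    freq = Counter(data)
    if len(freq) == 1:                       # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)    # two least frequent subtrees
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]

def huffman_encode(data, codes):
    return "".join(codes[s] for s in data)

def huffman_decode(bits, codes):
    """Prefix codes can be decoded greedily, one bit at a time."""
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)

codes = huffman_codes("abaccdaa")
bits = huffman_encode("abaccdaa", codes)
```

For abaccdaa this sketch assigns a 1-bit code to a, a 2-bit code to c, and 3-bit codes to b and d, for 14 bits in total. The exact bit patterns depend on how ties are broken, so a real format must also store the code table or the frequencies.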
III. LZW Compression Method
LZW compression is more complex than most other compression techniques, and its compression efficiency is also high. The basic principle is to encode each string that appears for the first time with a numerical value; during decompression, the program converts the value back into the original string. For example, the string "abccddeee" can be replaced with the value 0x100, so that 0x100 is emitted whenever that string appears again, which achieves the compression. The mapping between 0x100 and the string is generated dynamically during compression and is implicit in the compressed data: during decompression the encoding table is gradually rebuilt from the compressed data, and later compressed data produces further mappings based on the preceding data, until the compressed file ends. LZW is reversible, and all information is retained.
LZW is a lossless compression encoding that is mainly used to compress image data (such as GIF). For sources consisting of simple, smooth, low-noise images it achieves a high compression ratio, and both compression and decompression are fast.
LZW uses simple codes to represent complex data in the data stream and builds a conversion table, also known as the "string table", that records the correspondence between codes and data.
The conversion table is generated dynamically during compression or decompression and is needed only while that process runs. Once compression or decompression is complete, the table is no longer needed.
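The way the string table is built during compression and rebuilt during decompression can be sketched as follows. This is a simplified illustration under stated assumptions: the table starts with the 256 single-byte strings so the first new code is 0x100, codes are returned as a list of integers instead of being packed into variable-width bit fields, and the table is never reset. Real formats such as GIF differ in these details.

```python
def lzw_compress(data):
    """Return a list of integer codes for a bytes object."""
    table = {bytes([i]): i for i in range(256)}
    next_code = 0x100
    current, out = b"", []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate              # keep extending the match
        else:
            out.append(table[current])
            table[candidate] = next_code     # new string gets the next code
            next_code += 1
            current = bytes([value])
    if current:
        out.append(table[current])
    return out

def lzw_decompress(codes):
    """Rebuild the same table from the codes alone."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 0x100
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Code refers to the entry that is about to be created.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)

codes = lzw_compress(b"abababababab")
```

The table itself is never transmitted: the decompressor adds each entry one step after the compressor did, which is why the special case for a code equal to next_code is needed.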
IV. Arithmetic Compression Method
Arithmetic compression is similar to Huffman coding but more effective. It is suitable for files composed of the same recurring sequences, and it comes close to the theoretical limit of compression. In this method, different sequences are mapped to sub-intervals between 0 and 1, and each interval is represented as a binary fraction of variable precision (number of digits): the less common the data, the higher the precision (the more digits) required. The method is complex and therefore less commonly used.
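The interval mapping can be illustrated with exact rational arithmetic. This is only a sketch of the idea: the function names are hypothetical, the symbol probabilities are taken from the message itself, and practical coders use fixed-width integers with renormalization instead of unbounded fractions.

```python
from collections import Counter
from fractions import Fraction

def build_ranges(message):
    """Give each symbol a sub-interval of [0, 1) sized by its frequency."""
    counts = Counter(message)
    total = len(message)
    ranges, low = {}, Fraction(0)
    for sym in sorted(counts):
        width = Fraction(counts[sym], total)
        ranges[sym] = (low, low + width)
        low += width
    return ranges

def arith_encode(message, ranges):
    """Narrow [low, high) once per symbol; any value inside identifies the message."""
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        span = high - low
        sym_low, sym_high = ranges[sym]
        low, high = low + span * sym_low, low + span * sym_high
    return low, high

def arith_decode(value, ranges, length):
    """Find the symbol whose interval holds value, then rescale and repeat."""
    out = []
    for _ in range(length):
        for sym, (sym_low, sym_high) in ranges.items():
            if sym_low <= value < sym_high:
                out.append(sym)
                value = (value - sym_low) / (sym_high - sym_low)
                break
    return "".join(out)

ranges = build_ranges("abaccdaa")
low, high = arith_encode("abaccdaa", ranges)
```

The width of the final interval equals the product of the symbol probabilities, here 1/16384, so about 14 binary digits are enough to name a value inside it. Rarer symbols shrink the interval faster and therefore cost more digits.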
V. Rice
For data consisting of large words (such as 16 or 32 bits) with mostly low values, Rice coding can achieve a good compression ratio. Audio and high-dynamic-range images are data of this type, especially after they have been pre-processed by some form of prediction (such as delta coding of adjacent samples).
Although Huffman coding is optimal for this kind of data, it is impractical for several reasons (for example, a 32-bit word size would require a 16 GB histogram buffer to build the Huffman tree). A more dynamic approach is therefore better suited to data composed of large words.
The basic idea behind Rice coding is to store as many words as possible with fewer bits than in the original representation (just as with Huffman coding). In fact, the Rice code can be thought of as a static Huffman code: the codes are not determined by the actual statistics of the data, but by the general assumption that small values are more common than large ones.
The encoding is very simple: the value X is represented by X '1' bits followed by a single '0' bit.
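The rule above can be sketched in a few lines. The parameter k is an addition for illustration and is not taken from the text: with k = 0 the code is exactly X '1' bits followed by a '0' bit, while the more general Rice code with k > 0 writes only the quotient X >> k in that form and appends the low k bits of X unchanged.

```python
def rice_encode(x, k=0):
    """Rice code of a non-negative integer, returned as a bit string."""
    quotient, remainder = x >> k, x & ((1 << k) - 1)
    bits = "1" * quotient + "0"              # unary part, terminated by '0'
    if k:
        bits += format(remainder, f"0{k}b")  # k low-order bits, fixed width
    return bits

def rice_decode(bits, k=0):
    """Decode one value from the start of bits; return (value, bits_used)."""
    quotient = bits.index("0")               # number of leading '1' bits
    end = quotient + 1 + k
    remainder = int(bits[quotient + 1:end], 2) if k else 0
    return (quotient << k) | remainder, end

print(rice_encode(3))      # 1110
print(rice_encode(9, 2))   # 11001
```

Small values get short codes, but a large value costs one bit per unit of its quotient, so the choice of k matters for data that is not concentrated near zero.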
VI. Lempel-Ziv (LZ77)
The Lempel-Ziv compression scheme has many different variants. The Basic Compression Library contains a clean implementation of the LZ77 algorithm (Lempel-Ziv, 1977) that performs well, and its source code is easy to understand.
The LZ coder can be used for general-purpose compression and works particularly well for text. It can also be combined with the RLE and Huffman coders (in the order RLE, LZ, Huffman) to gain extra compression in most cases.
The idea behind the LZ compression algorithm is to take the RLE algorithm a step further by replacing sequences of bytes with references to previous occurrences of the same sequences.
For simplicity, the LZ algorithm can be thought of as a string matching algorithm. For example, a certain string may occur often in a piece of text and can then be represented by a pointer to an earlier occurrence of that string. The premise, of course, is that the pointer is shorter than the string itself.
For example, the word "string" appears frequently in the previous paragraph; replacing every occurrence except the first with a reference to the first one would save some space.
A string reference is represented in the following way:
1. A unique marker
2. An offset count
3. A string length
Depending on the coding scheme, a reference has either a fixed or a variable length. The latter is often preferred because it lets the coder trade reference size against string size (for example, a longer reference may be worthwhile if the string it replaces is long enough).
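A minimal sketch of this string matching idea follows. It is not the Basic Compression Library implementation: the function names, the token layout, and the parameters window, min_match, and max_match are assumptions made for illustration. Literal bytes are emitted as integers, and a reference is a tuple whose first element "ref" plays the role of the unique marker, followed by the offset and the length.

```python
def lz77_compress(data, window=255, min_match=3, max_match=255):
    """Greedy LZ77: emit literal bytes or ("ref", offset, length) tokens."""
    tokens, i = [], 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Brute-force search of the window for the longest earlier match.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_match and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:            # reference only if it pays off
            tokens.append(("ref", best_off, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens

def lz77_decompress(tokens):
    out = bytearray()
    for token in tokens:
        if isinstance(token, tuple):
            _, offset, length = token
            for _ in range(length):
                out.append(out[-offset])     # byte by byte, so overlaps work
        else:
            out.append(token)
    return bytes(out)

tokens = lz77_compress(b"string, string, string")
```

For this input the sketch emits eight literal bytes followed by the single token ("ref", 8, 14): the match is allowed to overlap the position being encoded, which is how a short reference can reproduce a long repetition. The min_match threshold reflects the requirement that a reference be shorter than the string it replaces.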
VII. Deflate
Deflate is a lossless data compression algorithm that combines the LZ77 algorithm with Huffman coding. It was originally defined by Phil Katz for the second version of his PKZIP archiving tool and was later specified in RFC 1951.
Deflate is widely believed to be free of any patent claims. At a time when the patents related to LZW (used in the GIF file format) had not yet expired, this led to its use in gzip compressed files and PNG image files in addition to the ZIP file format.
Source code for Deflate compression and decompression can be found in zlib, a free general-purpose compression library.
7-Zip provides a Deflate implementation with a higher compression ratio. AdvanceCOMP also uses this implementation and can recompress gzip, PNG, MNG, and ZIP files to sizes smaller than zlib achieves. Ken Silverman's KZIP and PNGOUT use a Deflate encoder that is even more efficient but requires more user input.
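As a brief illustration of using Deflate through zlib, the following sketch relies on Python's standard zlib module, which wraps the zlib library. The sample data is made up for the example.

```python
import zlib

data = b"string, string, string, string, string" * 20

# zlib.compress produces a Deflate stream wrapped in a zlib header and checksum.
compressed = zlib.compress(data, 9)          # 9 = highest compression level
restored = zlib.decompress(compressed)

# A negative wbits value selects a raw Deflate stream (RFC 1951) with no wrapper.
raw = zlib.compressobj(level=9, wbits=-15)
raw_stream = raw.compress(data) + raw.flush()
raw_restored = zlib.decompress(raw_stream, wbits=-15)

print(len(data), len(compressed), len(raw_stream))
```

The raw stream is what ZIP archives store, while gzip and PNG wrap the same Deflate data in their own headers and checksums.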
PS: see Wikipedia http://zh.wikipedia.org/zh/DEFLATE.