Understanding and using the Zlib library: my personal redemption


Understanding and using the Zlib library
Author: Shu Rongwen
Date: 2016.6.2
0. Many years ago I wrote an article (http://blog.csdn.net/querw/article/details/1452041) giving a brief introduction to using zlib. Honestly, I did not really understand how zlib worked back then. Looking back, I was young, bold and thick-skinned... With this new article I hope to make up for the ignorance of those years.
1. Deflate algorithm, zlib format, gzip format
This is not an article about compression algorithms; for the details, please read up on the LZ77 algorithm. Deflate is an enhanced version of LZ77 (it adds Huffman coding on top of it) that provides lossless compression for any kind of data, and it is the only compression algorithm zlib currently implements.
Compressing a piece of data with the deflate algorithm yields output that is purely compressed data, with no extra information such as the length or a checksum. This raw compressed data can be stored directly or sent over the network and then decompressed with the inflate algorithm, but the user must take care of data integrity themselves.
Of course, we can also wrap this raw data in a zlib-format (RFC 1950) header and trailer, which uses the Adler-32 checksum. The format is defined as follows:
+---+---+
|CMF|FLG| (more-->)
+---+---+
(if FLG.FDICT set)
+---+---+---+---+
|     DICTID    | (more-->)
+---+---+---+---+
+=====================+---+---+---+---+
|...compressed data...|    ADLER32    |
+=====================+---+---+---+---+
That is: a 2-byte zlib header, an optional 4-byte dictionary identifier (DICTID, the Adler-32 of the preset dictionary), the raw deflate data, and a 4-byte Adler-32 checksum of the uncompressed data. It is a very compact wrapper format.
gzip (RFC 1952) is another header/trailer format, different from zlib; it uses the CRC-32 checksum and is defined as follows:
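The header rules above can be checked mechanically. The following sketch is plain C with no dependency on libzlib; the helper names (zlib_header_ok, adler32_simple, zlib_trailer) are hypothetical, and the sample bytes are the zlib output printed by the demo in section 5. Per RFC 1950, CM (the low 4 bits of CMF) must be 8 for deflate, CINFO (the high 4 bits) must not exceed 7, CMF*256 + FLG must be a multiple of 31, and the Adler-32 trailer is stored most significant byte first.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p starts with a valid RFC 1950 header for deflate data and
 * reports through *has_dictid whether a 4-byte DICTID follows it. */
static int zlib_header_ok(const uint8_t *p, size_t len, int *has_dictid)
{
    if (len < 2)
        return 0;
    unsigned cmf = p[0], flg = p[1];
    if ((cmf & 0x0f) != 8)              /* CM: 8 means deflate */
        return 0;
    if ((cmf >> 4) > 7)                 /* CINFO: window of at most 32 KB */
        return 0;
    if (((cmf << 8) | flg) % 31 != 0)   /* FCHECK makes CMF*256+FLG divisible by 31 */
        return 0;
    *has_dictid = (flg & 0x20) != 0;    /* FLG.FDICT */
    return 1;
}

/* Unoptimized Adler-32 exactly as defined in RFC 1950. */
static uint32_t adler32_simple(const uint8_t *data, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; ++i) {
        a = (a + data[i]) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

/* Reads the Adler-32 trailer: the last 4 bytes, most significant first. */
static uint32_t zlib_trailer(const uint8_t *p, size_t len)
{
    return ((uint32_t)p[len - 4] << 24) | ((uint32_t)p[len - 3] << 16) |
           ((uint32_t)p[len - 2] << 8) | (uint32_t)p[len - 1];
}

/* zlib output printed by the demo in section 5. */
static const uint8_t demo_zlib[] = {
    0x78, 0x9c, 0x33, 0x34, 0x32, 0x36, 0xd1, 0x49, 0x4c, 0x4a, 0x4e, 0xd1,
    0x71, 0x74, 0x72, 0x76, 0xd1, 0x89, 0x53, 0x76, 0x50, 0xd4, 0x03, 0x00,
    0x35, 0x78, 0x04, 0xf3
};
```

Recomputing adler32_simple() over the demo's 20-byte source string gives 0x357804f3, the same value as the last four bytes of the zlib output; libzlib's own adler32() computes the same checksum, only faster.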
+---+---+---+---+---+---+---+---+---+---+
|ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
+---+---+---+---+---+---+---+---+---+---+
(if FLG.FEXTRA set)
+---+---+=================================+
| XLEN  |...XLEN bytes of "extra field"...| (more-->)
+---+---+=================================+
(if FLG.FNAME set)
+=========================================+
|...original file name, zero-terminated...| (more-->)
+=========================================+
(if FLG.FCOMMENT set)
+===================================+
|...file comment, zero-terminated...| (more-->)
+===================================+
(if FLG.FHCRC set)
+---+---+
| CRC16 |
+---+---+
+=======================+
|...compressed blocks...| (more-->)
+=======================+
+---+---+---+---+---+---+---+---+
|     CRC32     |     ISIZE     |
+---+---+---+---+---+---+---+---+
A gzip member consists of a 10-byte header, several optional extra fields (signalled by bits in the FLG byte of the header), the compressed data, a 4-byte CRC-32 checksum and the 4-byte length of the original data.
Because the decompressor can tell by itself where the compressed data ends, there is no need for a compressed-length field in the header. ISIZE is the original length modulo 2^32 (so for input smaller than 4 GB it is exactly the original length).
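A similar sketch for the gzip wrapper, again in plain C without libzlib and with hypothetical names (struct gzip_info, gzip_parse, crc32_simple). It reads only the fixed 10-byte header and the 8-byte trailer, whose CRC-32 and ISIZE fields are stored little-endian (RFC 1952); the optional fields signalled by FLG are not skipped. The sample bytes are the gzip output of the demo in section 5.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a gzip member's fixed header and trailer. */
struct gzip_info {
    uint8_t  flags;   /* FLG: FTEXT=1, FHCRC=2, FEXTRA=4, FNAME=8, FCOMMENT=16 */
    uint32_t mtime;   /* MTIME */
    uint8_t  os;      /* OS */
    uint32_t crc32;   /* CRC-32 of the uncompressed data */
    uint32_t isize;   /* uncompressed length modulo 2^32 */
};

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and fills *out if p looks like a single gzip member, else 0. */
static int gzip_parse(const uint8_t *p, size_t len, struct gzip_info *out)
{
    if (len < 18)                               /* 10-byte header + 8-byte trailer */
        return 0;
    if (p[0] != 0x1f || p[1] != 0x8b)           /* ID1, ID2 */
        return 0;
    if (p[2] != 8)                              /* CM: deflate */
        return 0;
    out->flags = p[3];
    out->mtime = get_le32(p + 4);
    out->os    = p[9];
    out->crc32 = get_le32(p + len - 8);
    out->isize = get_le32(p + len - 4);
    return 1;
}

/* Bit-by-bit CRC-32 (reflected polynomial 0xEDB88320), the checksum gzip uses. */
static uint32_t crc32_simple(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xffffffffu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0xedb88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xffffffffu;
}

/* gzip output printed by the demo in section 5. */
static const uint8_t demo_gzip[] = {
    0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0b,
    0x33, 0x34, 0x32, 0x36, 0xd1, 0x49, 0x4c, 0x4a, 0x4e, 0xd1, 0x71,
    0x74, 0x72, 0x76, 0xd1, 0x89, 0x53, 0x76, 0x50, 0xd4, 0x03, 0x00,
    0x65, 0xd6, 0xb0, 0xc3, 0x14, 0x00, 0x00, 0x00
};
```

For that sample the trailer gives ISIZE = 20, the length of the demo's source string, and the header has OS = 0x0b (NTFS, i.e. produced on Windows). Comparing crc32_simple() of the decompressed data with the trailer's CRC-32 is the check inflate() performs at the end of a gzip stream.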
zip is also a wrapper format, or more precisely a set of format conventions, designed mainly to package multiple files; that is why the ZIP format carries a lot of information about files and directories. You can look up the exact format online, and see the minizip project in the zlib source package for reference.
In the zlib documentation the word "zlib" has two meanings (an odd naming choice): the zlib code library itself, and the "zlib" wrapper format for raw deflate data. To tell them apart, I use "libzlib" for the former and "zlib" for the latter.
2. libzlib design idea: the stream
The stream is defined as follows:
typedef struct z_stream_s {
    z_const Bytef *next_in;  /* next input byte */
    uInt     avail_in;       /* number of bytes available at next_in */
    uLong    total_in;       /* total number of input bytes read so far */
    Bytef    *next_out;      /* next output byte should be put there */
    uInt     avail_out;      /* remaining free space at next_out */
    uLong    total_out;      /* total number of bytes output so far */
    z_const char *msg;       /* last error message, NULL if no error */
    struct internal_state FAR *state; /* not visible by applications */
    alloc_func zalloc;       /* used to allocate the internal state */
    free_func  zfree;        /* used to free the internal state */
    voidpf     opaque;       /* private data object passed to zalloc and zfree */
    int     data_type;       /* best guess about the data type: binary or text */
    uLong   adler;           /* adler32 value of the uncompressed data */
    uLong   reserved;        /* reserved for future use */
} z_stream;
It consists of input fields (xxx_in) and output fields (xxx_out): raw data flows in at the input side and compressed data flows out at the output side (the other way round for decompression).
When programming, the user keeps "feeding" data to next_in, sets its length in avail_in and calls the compression function, then collects the compressed data from the buffer that next_out points to; the amount produced is the free space offered in avail_out minus what is left in avail_out after the call. That is the whole design idea behind the libzlib interface.
msg: error message (NULL if there is no error).
zalloc/zfree/opaque: similar in role to an allocator in the C++ STL. If you want custom memory management, you can supply your own allocation and deallocation functions; opaque is passed to them unchanged.
adler: the Adler-32 checksum (zlib format) or CRC-32 checksum (gzip format) of the uncompressed data.
3. libzlib interface
From the description of the wrapper formats above, the actual compressed data is the same in every case, since it is always produced by the deflate algorithm; only the wrapper differs. So the subtle part of the libzlib API is how to configure the compressor/decompressor to produce (or accept) each wrapper format.
3.1 Basic API
ZEXTERN int ZEXPORT deflateInit OF((z_streamp strm, int level));
Initializes a z_stream for compression. To use the default memory management functions, you must set zalloc/zfree/opaque to Z_NULL before calling it.
A compressor initialized with this function always outputs the zlib format, i.e. deflate data with a zlib header and trailer. What if we want gzip format or raw compressed data instead? That calls for another compressor initialization function with more options:
ZEXTERN int ZEXPORT deflateInit2 OF((z_streamp strm,
                                     int level,
                                     int method,
                                     int windowBits,
                                     int memLevel,
                                     int strategy));
The output wrapper format is selected by the windowBits parameter:
8 ~ 15: output with a zlib header/trailer. deflateInit() fixes this parameter at 15, the value of MAX_WBITS defined in zconf.h.
-8 ~ -15: output raw deflate data without any header or trailer. Without special requirements, use -15, which means the LZ77 sliding window is 32 KB (2^15 bytes).
24 ~ 31 (16 + 8 ~ 15): output gzip format. By default a minimal header is written (no file name, comment or modification time); to customize it, call deflateSetHeader() after initialization and before the first deflate().
level: compression level 0 ~ 9. 0 means no compression, 1 is the fastest, and 9 gives the highest compression ratio. Z_DEFAULT_COMPRESSION (-1) selects the default, currently equivalent to level 6.
method: Z_DEFLATED (8), the only supported compression method.
memLevel: controls how much internal memory libzlib uses for compression, from 1 to 9. Smaller values use less memory but are slower and compress less well; 9 uses the most memory for the best speed. The default is 8.
strategy: the encoding strategy of the compression algorithm. Without special requirements, set it to Z_DEFAULT_STRATEGY (if you do have special requirements, you will already know what the other options Z_FILTERED/Z_HUFFMAN_ONLY/Z_RLE/Z_FIXED mean).
ZEXTERN int ZEXPORT deflate OF((z_streamp strm, int flush));
flush: without special requirements, call deflate() with flush = Z_NO_FLUSH while you are still feeding input. After all input has been supplied, call it with flush = Z_FINISH, repeating the call (with fresh output space) until deflate() returns Z_STREAM_END, which indicates that all data has been written to the output buffer and the stream is complete. If you supply all the input at once, you can call deflate() with flush = Z_FINISH right away, which is essentially what compress() does.
The user sets the input next_in/avail_in in the z_stream, and deflate() compresses the data and advances next_out/avail_out. Both the input and output buffers are allocated by the user. For example, if the input buffer is Byte inbuf[1024], then next_in = inbuf and avail_in = 1024. Because the user cannot know the compressed length until compression is complete, the output buffer cannot be sized exactly (unless you compute an upper bound with deflateBound()). The user may allocate an output buffer of any length (more than 6 bytes), say Byte outbuf[128], and set next_out = outbuf, avail_out = 128.
Then call deflate() and check the results: avail_in is the length of input not yet consumed, so 1024 - avail_in bytes have been processed; avail_out is the space left in the output buffer, so 128 - avail_out is the length of compressed data produced by this call. As long as avail_in != 0, take the output, reset next_out/avail_out and continue compressing. Once avail_in == 0, all input has been submitted; then call deflate(strm, Z_FINISH) to tell the compressor that no more data is coming and that it should output the zlib or gzip trailer, again resetting the output buffer and repeating the call while it has not returned Z_STREAM_END. When deflate() returns Z_STREAM_END, the end of the stream has been written and the job is done. Even if the compressor is configured to output raw deflate data without any wrapper, we still follow this procedure to make sure the output is complete.
ZEXTERN int ZEXPORT deflateEnd OF((z_streamp strm));
Frees the internal state of the z_stream.
ZEXTERN uLong ZEXPORT deflateBound OF((z_streamp strm, uLong sourceLen));
Returns an upper bound on the compressed length of sourceLen bytes of input. If you want to compress an in-memory buffer in one go, call it (after deflateInit()/deflateInit2()) to size the output buffer.
ZEXTERN int ZEXPORT deflateSetHeader OF((z_streamp strm, gz_headerp head));
Sets a custom gzip header. It only applies to a gzip-mode compressor, so it must be called after deflateInit2() with a gzip windowBits value and before the first deflate().
ZEXTERN int ZEXPORT inflateInit OF((z_streamp strm));
The decompression counterpart of deflateInit(): it initializes a decompressor with default parameters, expecting the zlib format. The zlib or gzip header is normally discarded. If you need to keep the gzip header information, call inflateGetHeader() after inflateInit2() and before inflate(), passing it a gz_header structure; once libzlib has read a complete gzip header, it fills the information into that structure. After inflate() returns, check the done field of the gz_header: 1 means the header has been read completely; 0 means it is still being read; -1 means there is no gzip header, which is what you get when decoding a zlib-format stream.
ZEXTERN int ZEXPORT inflate OF((z_streamp strm, int flush));
Decompresses. The calling procedure mirrors deflate(): keep supplying input and output space and calling inflate() (the final call may pass flush = Z_FINISH) until it returns Z_STREAM_END, which means decompression is complete and, for zlib or gzip data, the checksum matched.
ZEXTERN int ZEXPORT inflateEnd OF((z_streamp strm));
Frees the internal state of the z_stream.
ZEXTERN int ZEXPORT inflateInit2 OF((z_streamp strm, int windowBits));
The counterpart of deflateInit2(), normally called with the same windowBits value. Adding 32 to windowBits (for example 15 + 32) makes the decompressor detect the zlib or gzip wrapper automatically; raw deflate data cannot be detected this way and needs a negative windowBits.
libzlib also provides inflateBackInit()/inflateBack()/inflateBackEnd(), which perform decompression through callbacks.
3.2 Utility functions
compress()/compress2()/compressBound()/uncompress() are built from the basic API and also serve as templates for how to call it; compress.c and uncompr.c are worth reading carefully.
compressBound() gives an upper bound on the compressed length, but there is no way to estimate the decompressed length from the compressed data, so the user must obtain the original length through some other channel and allocate a large enough buffer before calling uncompress().
3.3 Other APIs
The goal of this article is not to give you code to copy and paste, but to help you understand how to use libzlib (at least, that is what I am aiming for): to know not only which functions to call, but also why. For everything else, read the libzlib documentation while writing your code.
4. Compiling on the Windows platform
Since it is a free library, we download the zlib source code and compile it ourselves instead of using a prebuilt DLL: go to http://www.zlib.net/ and download the source package in ".zip" format.
Open README and you will see: "For Windows, use one of the special makefiles in win32/ or contrib/vstudio/." Switch to the contrib/vstudio/ directory, where readme.txt gives some details about the different Visual Studio versions, and open the project file for the VS version you have installed (it pays to read the readme patiently; it saves a lot of detours).
Method 1 uses the Visual Studio IDE. I already have Visual Studio 2013 installed, so I opened contrib/vstudio/vc11/zlibvc.sln directly with VS2013 (this is actually a VS2012 project file). Building "zlibvc", the most basic DLL project, produces 3 link errors:
1>match686.obj : error LNK2026: module unsafe for SAFESEH image.
1>inffas32.obj : error LNK2026: module unsafe for SAFESEH image.
1>.\zlibvc.def(4) : fatal error LNK1118: syntax error in 'VERSION' statement
First, the LNK1118 error. According to StackOverflow (http://stackoverflow.com/questions/20021950/def-file-syntax-error-in-visual-studio-2012), the syntax of the VERSION statement in .def files has changed (in fact, been corrected): only two numbers are allowed, the major and the minor version number. So either change "VERSION 1.2.8" to a two-part version number, or create a resource of type VERSION instead. The project actually already includes a VERSION resource for 1.2.8, so we can simply comment out the VERSION line in zlibvc.def.
Next, the LNK2026 error. SAFESEH literally means "safe SEH", i.e. safe structured exception handling. SEH is the structured exception handling mechanism of the Windows platform, which extends the C compiler with the __try/__except/__finally keywords to control program flow (the book <<Windows Core Programming>> covers it). libzlib probably does not use SEH at all; perhaps VS2013 changed the default value of this option, but I don't know exactly what makes these modules incompatible. In short, turn SAFESEH off: Project Properties > Linker > Advanced > Image Has Safe Exception Handlers = No, then rebuild. The testzlib project turned out to have the same problem; after turning SAFESEH off there as well, it builds fine.
Library files: zlibwapi.lib, zlibwapi.dll, zlibstat.lib (static library)
Header files: zconf.h, zlib.h
It is unreasonable for Microsoft to release a new version that makes old projects fail to link.
Method 2 uses NMAKE: copy win32/Makefile.msc to the source root directory (one level up), start "Developer Command Prompt for VS2013" (from the Start menu), use cd to switch to the zlib 1.2.8 source directory, and run "nmake /f makefile.msc" to build.
Library files: zdll.lib, zlib1.dll, zlib.lib (static library)
Header files: zconf.h, zlib.h
5. Demo
#include <stdio.h>
#include <string.h>
#include <assert.h>

extern "C" {
#include "zlib.h"
}

#pragma comment(lib, "zlib.lib") // MSVC only: link against the static library built above

// Print a buffer as lowercase hex
int dump_buffer(const Bytef* buf, size_t len)
{
    for (size_t i = 0; i < len; ++i) {
        printf("%02x", buf[i]);
    }
    return 0;
}

int main()
{
    const char* inbuf = "1234,abcd,ABCD,^#@!.";
    Bytef outbuf[1024] = {0};
    Bytef restorebuf[1024] = {0};
    int outlen = 0;
    int restorelen = 0;
    int err = 0;
    z_stream stream;
    int fmt = 2; // 0: zlib; 1: gzip; 2: raw

    printf("Source string:%s\r\n", inbuf);

    // Compression: all input is supplied at once, so one Z_FINISH call is enough
    stream.next_in = (z_const Bytef*)inbuf;
    stream.avail_in = (uInt)strlen(inbuf);
    stream.next_out = (Bytef*)outbuf;
    stream.avail_out = 1024;
    stream.zalloc = (alloc_func)0;
    stream.zfree = (free_func)0;
    stream.opaque = (voidpf)0;

    if (0 == fmt) {
        // zlib: deflateInit() always uses windowBits = 15
        err = deflateInit(&stream, Z_DEFAULT_COMPRESSION);
        assert(Z_OK == err);
        err = deflate(&stream, Z_FINISH);
        assert(err == Z_STREAM_END);
        outlen = (int)stream.total_out;
        err = deflateEnd(&stream);
        printf("zlib string (HEX):");
    } else if (1 == fmt) {
        // gzip: windowBits = 15 + 16
        err = deflateInit2(&stream, Z_DEFAULT_COMPRESSION, Z_DEFLATED, MAX_WBITS + 16, 8, Z_DEFAULT_STRATEGY);
        assert(Z_OK == err);
        err = deflate(&stream, Z_FINISH);
        assert(err == Z_STREAM_END);
        outlen = (int)stream.total_out;
        err = deflateEnd(&stream);
        printf("gzip string (HEX):");
    } else if (2 == fmt) {
        // raw deflate: windowBits = -15
        err = deflateInit2(&stream, Z_DEFAULT_COMPRESSION, Z_DEFLATED, MAX_WBITS * -1, 8, Z_DEFAULT_STRATEGY);
        assert(Z_OK == err);
        err = deflate(&stream, Z_FINISH);
        assert(err == Z_STREAM_END);
        outlen = (int)stream.total_out;
        err = deflateEnd(&stream);
        printf("Raw deflate string (HEX):");
    } else {
        assert(0);
    }
    dump_buffer(outbuf, outlen);
    printf("\r\n");

    // Decompression
    stream.next_in = (z_const Bytef*)outbuf;
    stream.avail_in = (uInt)outlen;
    stream.next_out = (Bytef*)restorebuf;
    stream.avail_out = 1024;
    stream.zalloc = (alloc_func)0;
    stream.zfree = (free_func)0;
    stream.opaque = (voidpf)0;

    if (0 == fmt) {
        // zlib
        err = inflateInit(&stream);
        assert(Z_OK == err);
        err = inflate(&stream, Z_FINISH);
        assert(err == Z_STREAM_END);
        restorelen = (int)stream.total_out;
        err = inflateEnd(&stream);
    } else if (1 == fmt) {
        // gzip
        err = inflateInit2(&stream, MAX_WBITS + 16);
        assert(Z_OK == err);
        err = inflate(&stream, Z_FINISH);
        assert(err == Z_STREAM_END);
        restorelen = (int)stream.total_out;
        err = inflateEnd(&stream);
    } else if (2 == fmt) {
        // raw deflate
        err = inflateInit2(&stream, MAX_WBITS * -1);
        assert(Z_OK == err);
        err = inflate(&stream, Z_FINISH);
        assert(err == Z_STREAM_END);
        restorelen = (int)stream.total_out;
        err = inflateEnd(&stream);
    } else {
        assert(0);
    }

    printf("Restored string:%s\r\n", (char*)restorebuf);
    printf("Press Enter to continue ...");
    getchar();
    return err;
}

Output with fmt set to 0, 1 and 2 respectively:
Source string:1234,abcd,ABCD,^#@!.
zlib string (HEX):789c33343236d1494c4a4ed171747276d189537650d40300357804f3
Restored string:1234,abcd,ABCD,^#@!.
Source string:1234,abcd,ABCD,^#@!.
gzip string (HEX):1f8b080000000000000b33343236d1494c4a4ed171747276d189537650d4030065d6b0c314000000
Restored string:1234,abcd,ABCD,^#@!.
Source string:1234,abcd,ABCD,^#@!.
Raw deflate string (HEX):33343236d1494c4a4ed171747276d189537650d40300
Restored string:1234,abcd,ABCD,^#@!.
As you can see, the deflate data in the middle is identical in all three cases; only the header and the trailer differ.
