Compression and decompression needs come up in many projects: the typical cases are compressing data to cut network transmission delay, or saving storage space. I recently ran into such a need myself. While building a crawler, the exact fields to extract had not been decided yet, so I considered compressing entire HTML pages and storing them in the database. After a round of Googling, I unsurprisingly landed on Google's own snappy. :-)

Brief Introduction
Snappy is a popular compression library developed by Google, designed to strike a good balance between speed and compression ratio. Although it is only a data compression library, Google uses it in many internal projects, including Bigtable, MapReduce, and RPC. Google states that the library and its algorithm are optimized for data processing speed, at the cost of output size and compatibility with other similar tools. Snappy is optimized for 64-bit x86 processors and can achieve at least 250 MB/s compression and 500 MB/s decompression on a single core of an Intel Core i7.
To summarize that pile of copy: snappy does not have the highest compression ratio, but its speed and overall performance are excellent. Moving on.

Use
In fact, there are many compression algorithms, and some beat snappy on compression ratio or speed; a typical example is LZ4 (a comparison of common compression and decompression algorithms can be found elsewhere).
As for why I chose snappy, just one word: simple. Haha.
Compared with the already very simple LZ4, snappy's API is simpler still, and the key difference is at the decompression step: snappy is intuitive to use because you don't need to track the compressed size or the original (uncompressed) size, which saves some hassle. Of course, I haven't evaluated the finer details, so I won't offer much guidance here; pick whatever fits your project's actual requirements.
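The reason snappy can decompress without being told the original size is that its format records the uncompressed length inside the compressed data itself. A minimal sketch of this, using the snappy-java library's `Snappy.uncompressedLength` (the sample string is made up):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.xerial.snappy.Snappy;

public class SnappyLengthDemo {
    public static void main(String[] args) throws IOException {
        byte[] original = "hello snappy, hello snappy, hello snappy"
                .getBytes(StandardCharsets.UTF_8);
        byte[] compressed = Snappy.compress(original);

        // The compressed data carries its own uncompressed length,
        // so the caller never has to store it separately.
        int expected = Snappy.uncompressedLength(compressed);
        byte[] restored = Snappy.uncompress(compressed);

        System.out.println(expected == original.length);        // true
        System.out.println(restored.length == original.length); // true
    }
}
```

This is exactly why, unlike with LZ4's fast decompressor, you don't have to ship the original length alongside the compressed bytes.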
Specific use:
public static byte[] compressHtml(String html) {
    try {
        return Snappy.compress(html.getBytes("UTF-8"));
    } catch (IOException e) {
        e.printStackTrace();
        return null;
    }
}

public static String decompressHtml(byte[] bytes) {
    try {
        // Decode with the same charset used when compressing
        return new String(Snappy.uncompress(bytes), "UTF-8");
    } catch (IOException e) {
        e.printStackTrace();
        return null;
    }
}
Yes, it really is that simple: basically one line each. Still, go look at the API docs yourself, especially the file-stream-related operations that are also available. Just another throwaway post; thanks for reading.
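For those stream-related operations, snappy-java also ships SnappyOutputStream and SnappyInputStream, which compress and decompress on the fly. A small sketch (the in-memory buffers stand in for real files):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.xerial.snappy.SnappyInputStream;
import org.xerial.snappy.SnappyOutputStream;

public class SnappyStreamDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = "<html><body>some page content</body></html>"
                .getBytes(StandardCharsets.UTF_8);

        // Compress while writing
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (SnappyOutputStream out = new SnappyOutputStream(buffer)) {
            out.write(data);
        }

        // Decompress while reading
        ByteArrayOutputStream restored = new ByteArrayOutputStream();
        try (SnappyInputStream in = new SnappyInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                restored.write(chunk, 0, n);
            }
        }

        System.out.println(new String(restored.toByteArray(),
                StandardCharsets.UTF_8));
    }
}
```

Note that the stream classes use their own framed format, so bytes written by SnappyOutputStream are read back with SnappyInputStream, not with the plain Snappy.uncompress call.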
Addendum: I measured snappy against LZ4 myself. For a string about 50,000 characters long, the compressed sizes were roughly:
Snappy: about 19,000; LZ4: about 16,000; LZ4 fast: about 19,000
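If you want to reproduce a measurement like this on your own data, here is a minimal sketch of the snappy side (the synthetic input string here is an assumption; real HTML will compress differently, so the numbers will not match the ones above):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.xerial.snappy.Snappy;

public class SnappySizeCheck {
    public static void main(String[] args) throws IOException {
        // Build a string of about 50,000 characters; repetitive content
        // like this compresses far better than typical pages.
        StringBuilder sb = new StringBuilder();
        while (sb.length() < 50_000) {
            sb.append("<div class=\"item\">some repetitive page content</div>\n");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] compressed = Snappy.compress(raw);
        System.out.println("raw: " + raw.length
                + ", compressed: " + compressed.length);
    }
}
```

Swapping in an LZ4 compressor over the same `raw` bytes would give the other two data points.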
The end!