While crawling web pages, I need to compress each page's content before storing it. For convenience, I used the GZipStream class that ships with .NET 2.0.
The code is as follows:
using System.IO;
using System.IO.Compression;
......
public static byte[] Compress(byte[] data)
{
    MemoryStream stream = new MemoryStream();
    GZipStream gZipStream = new GZipStream(stream, CompressionMode.Compress);
    gZipStream.Write(data, 0, data.Length);
    // .... I'll keep you in suspense about what belongs at this comment for now
    return stream.ToArray();
}
public static byte[] Decompress(byte[] data)
{
    MemoryStream stream = new MemoryStream();
    GZipStream gZipStream = new GZipStream(new MemoryStream(data), CompressionMode.Decompress);
    byte[] bytes = new byte[4096];
    int n;
    while ((n = gZipStream.Read(bytes, 0, bytes.Length)) != 0)
    {
        stream.Write(bytes, 0, n);
    }
    return stream.ToArray();
}
The code above looks fine (if you don't test it carefully), but when I tested it with pages of various sizes, I found a problem: if the compressed byte[] is shorter than 4 KB, it cannot be decompressed, and Read in the Decompress function always returns 0. As Mr. Yiduo once said in a speech: "Frustrating, oh so frustrating; this is one big heap of frustration, and it is the glory of Microsoft" (author's note: I think the credit should go to a Microsoft bug).
At first I suspected that the bytes array in the Decompress function was too long, so I made it very small, but Read still returned 0. I really wanted to go and argue it out with Gates.
Luckily, my tests had pinned down the 4 KB limit, so I searched Google for "GZipStream 4K" and, on a foreign forum (http://www.dotnetmonster.com/Uwe/Forum.aspx/Dotnet-framework/19787/problem-with-the-gzipstream-class-and-small-streams), finally found the answer: GZipStream processes data in 4 KB blocks. So when compressing, you must call gZipStream.Close() before returning stream.ToArray() (that is the spot I left in suspense above), because GZipStream only finishes writing its data when it is disposed. Isn't that unfair? I had already called Write, and yet the data only counts once I call Close.
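As a minimal sketch of the fix described above, here is one way Compress could be rewritten so that the GZipStream is closed before the bytes are read back. The class name GZipHelper is only a placeholder for illustration, and wrapping the stream in a using block (instead of calling Close() explicitly) is my own choice; either should have the same effect of flushing the remaining compressed data.

```csharp
using System.IO;
using System.IO.Compression;

// GZipHelper is a hypothetical container name, not part of the original code.
public static class GZipHelper
{
    public static byte[] Compress(byte[] data)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            // Disposing the GZipStream at the end of this block closes it,
            // which writes out any compressed data it is still buffering.
            using (GZipStream gZipStream = new GZipStream(stream, CompressionMode.Compress))
            {
                gZipStream.Write(data, 0, data.Length);
            }

            // MemoryStream.ToArray() still works after the stream has been closed,
            // so it is safe to read the result here.
            return stream.ToArray();
        }
    }
}
```

With this version, the output of Compress should round-trip through the Decompress function shown earlier, including for small inputs.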