Using .NET 2.0 compression and decompression to process large data

Source: Internet
Author: User

Summary: If your application has never needed compression, you are lucky. For developers who do need it, the good news is that .NET 2.0 now provides two classes that handle compression and decompression. This article discusses when and how to use them.

Introduction

A new namespace in .NET Framework 2.0 is System.IO.Compression. It provides two data compression classes: DeflateStream and GZipStream. Both classes support lossless compression and decompression, and both are designed to work on streaming data.

Compression is an effective way to reduce the size of data. For example, if you store a large amount of data in a SQL database, you can save a lot of disk space by compressing the data before saving it to a table. In addition, because you are writing smaller blocks of data to the database, disk I/O is reduced. The disadvantage of compression is that it costs additional processing time on your machine, and you need to account for that time before you decide to use compression in your program.

Compression is also extremely useful when you need to transfer data over a network, especially over slow or expensive links such as GPRS connections. There, compression can greatly reduce the data size and lower the overall communication cost. Web services are another area where compression offers a big advantage, because XML data is highly compressible.

Once you have decided that compression is worth its performance cost, you need to understand the two new compression classes in .NET 2.0, which is what this article covers.

Create a sample application

In this article, I will build a sample application to demonstrate how to use compression. The application lets you compress files, including plain text files. You can then reuse the code in your own applications.

First, use Visual Studio 2005 to create a new Windows application and populate the default form with the following controls (see Figure 1):

Figure 1. Populating the form: populate the default Form1 with all the controls shown.

· GroupBox control
· RadioButton control
· TextBox control
· Button control
· Label control

Switch to the code-behind of Form1 and import the following namespaces:

Imports System.IO
Imports System.IO.Compression

Before you start using the compression classes, it is important to understand how they work. For compression, the classes take data from a byte array, compress it, and store the result in a stream object. For decompression, the compressed data stored in one stream object is decompressed and stored in another stream object.

First, define the Compress() function. It takes two parameters: algo and data. The first parameter specifies which algorithm to use (GZip or Deflate); the second is a byte array containing the data to be compressed. A MemoryStream object is used to store the compressed data. Once compression is complete, the function calculates the compression ratio by dividing the size of the compressed data by the size of the uncompressed data.

The compressed data stored in the memory stream is then copied to another byte array and returned to the caller. In addition, a Stopwatch object tracks how long the compression takes. The Compress() function is defined as follows:

Public Function Compress(ByVal algo As String, ByVal data() As Byte) As Byte()
    Try
        Dim sw As New Stopwatch
        '--- ms is used to store the compressed data ---
        Dim ms As New MemoryStream()
        Dim zipStream As Stream = Nothing
        '--- Start the stopwatch ---
        sw.Start()
        If algo = "gzip" Then
            zipStream = New GZipStream(ms, CompressionMode.Compress, True)
        ElseIf algo = "deflate" Then
            zipStream = New DeflateStream(ms, CompressionMode.Compress, True)
        End If
        '--- Compress the contents of data into ms ---
        zipStream.Write(data, 0, data.Length)
        '--- Closing zipStream flushes the remaining compressed bytes to ms ---
        zipStream.Close()
        '--- Stop the stopwatch ---
        sw.Stop()
        '--- Calculate the compression ratio ---
        Dim ratio As Single = Math.Round((ms.Length / data.Length) * 100, 2)
        Dim msg As String = "Original size: " & data.Length & _
            ", Compressed size: " & ms.Length & _
            ", Compression ratio: " & ratio & "%" & _
            ", Time spent: " & sw.ElapsedMilliseconds & "ms"
        lblMessage.Text = msg
        ms.Position = 0
        '--- Used to store the compressed data (byte array) ---
        Dim c_data(CInt(ms.Length) - 1) As Byte
        '--- Read the content of the memory stream into the byte array ---
        ms.Read(c_data, 0, CInt(ms.Length))
        Return c_data
    Catch ex As Exception
        MsgBox(ex.ToString)
        Return Nothing
    End Try
End Function
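
The following console sketch isolates the compression step from the form so that the two classes can be compared on the same input. It is an illustration rather than part of the sample application: the module name, the CompressBytes() helper, and the repetitive sample data are assumptions made for the example. The GZip output is typically a little larger than the Deflate output, because the GZip format wraps the Deflate data in a header and a checksum trailer.

```vb
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module CompressionSketch
    ' Hypothetical helper (not from the article): compresses data with the
    ' chosen algorithm and returns the compressed bytes.
    Function CompressBytes(ByVal algo As String, ByVal data() As Byte) As Byte()
        Dim ms As New MemoryStream()
        Dim zipStream As Stream
        If algo = "gzip" Then
            ' Third argument True: leave ms open after zipStream is closed.
            zipStream = New GZipStream(ms, CompressionMode.Compress, True)
        Else
            zipStream = New DeflateStream(ms, CompressionMode.Compress, True)
        End If
        zipStream.Write(data, 0, data.Length)
        ' Close before reading ms: closing flushes the remaining compressed bytes.
        zipStream.Close()
        Return ms.ToArray()
    End Function

    Sub Main()
        ' Repetitive sample data, chosen only because it compresses well.
        Dim original() As Byte = Encoding.UTF8.GetBytes(New String("a"c, 10000))
        Dim gz() As Byte = CompressBytes("gzip", original)
        Dim df() As Byte = CompressBytes("deflate", original)
        ' Ratio as defined in the article: compressed size / original size.
        Dim ratio As Double = Math.Round(gz.Length / original.Length * 100, 2)
        Console.WriteLine("Original: " & original.Length & " bytes")
        Console.WriteLine("GZip: " & gz.Length & " bytes, ratio " & ratio & "%")
        Console.WriteLine("Deflate: " & df.Length & " bytes")
    End Sub
End Module
```

Passing True as the third constructor argument matters here: without it, closing the compression stream would also close the MemoryStream before its contents could be read.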

The Decompress() function decompresses data that was compressed by the Compress() function. The first parameter specifies the algorithm to use. The byte array containing the compressed data is passed as the second parameter and is copied into a MemoryStream object. The compression class then decompresses the data stored in the memory stream and exposes the decompressed data through another stream object. To obtain the decompressed data, you read it from that stream object. This is done by the RetrieveBytesFromStream() function, which is explained later.

The Decompress() function is defined as follows:

Public Function Decompress(ByVal algo As String, ByVal data() As Byte) As Byte()
    Try
        Dim sw As New Stopwatch
        '--- Copy the (compressed) data to ms ---
        Dim ms As New MemoryStream(data)
        Dim zipStream As Stream = Nothing
        '--- Start the stopwatch ---
        sw.Start()
        '--- Decompress the data stored in ms ---
        If algo = "gzip" Then
            zipStream = New GZipStream(ms, CompressionMode.Decompress)
        ElseIf algo = "deflate" Then
            zipStream = New DeflateStream(ms, CompressionMode.Decompress, True)
        End If
        '--- Used to store the decompressed data ---
        Dim dc_data() As Byte
        '--- The decompressed data is read from zipStream;
        '    extract it into a byte array ---
        dc_data = RetrieveBytesFromStream(zipStream, data.Length)
        '--- Stop the stopwatch ---
        sw.Stop()
        lblMessage.Text = "Decompression completed. Time spent: " & _
            sw.ElapsedMilliseconds & "ms" & _
            ", Original size: " & dc_data.Length
        Return dc_data
    Catch ex As Exception
        MsgBox(ex.ToString)
        Return Nothing
    End Try
End Function

The RetrieveBytesFromStream() function takes two parameters, a stream object and an integer, and returns a byte array containing the decompressed data. The integer parameter determines how many bytes are read from the stream object into the byte array on each pass. This is necessary because, while the data is being decompressed, you do not know the size of the decompressed data in the stream object. The byte array therefore has to be expanded dynamically, block by block, at run time to hold the decompressed data. A block that is too large wastes memory; a block that is too small wastes time, because the byte array has to be expanded over and over. For that reason, the block size is left for the calling routine to choose.

The RetrieveBytesFromStream() function is defined as follows:

Public Function RetrieveBytesFromStream( _
    ByVal stream As Stream, ByVal bytesBlock As Integer) As Byte()
    '--- Retrieve the bytes from a stream object ---
    Dim data() As Byte
    Dim totalCount As Integer = 0
    Try
        While True
            '--- Progressively increase the size of the data byte array ---
            ReDim Preserve data(totalCount + bytesBlock)
            Dim bytesRead As Integer = stream.Read(data, totalCount, bytesBlock)
            If bytesRead = 0 Then
                Exit While
            End If
            totalCount += bytesRead
        End While
        '--- Trim the byte array to the number of bytes actually read ---
        ReDim Preserve data(totalCount - 1)
        Return data
    Catch ex As Exception
        MsgBox(ex.ToString)
        Return Nothing
    End Try
End Function

Note: The Decompress() function calls the RetrieveBytesFromStream() function as follows:

dc_data = RetrieveBytesFromStream(zipStream, data.Length)

The block size here is the size of the compressed data (data.Length). In most cases the decompressed data is several times larger than the compressed data (as the compression ratio shows), so the byte array is expanded only a few times at run time. For example, if the compression ratio is 20% and the compressed data is 2 MB, the decompressed data is 10 MB (2 MB / 0.20), so the byte array is expanded about five times. Ideally, the byte array should not be expanded too often at run time, because each expansion slows the application down. Using the size of the compressed data as the block size is therefore a reasonable choice.
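
As an alternative to expanding the array with ReDim Preserve, the decompressed bytes can be copied through a fixed-size buffer into a MemoryStream, which grows its own internal storage as needed. The sketch below is not the article's method: the module name, the ReadAllBytes() helper, the sample data, and the 4 KB buffer size are all assumptions made for illustration.

```vb
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module ReadAllSketch
    ' Hypothetical alternative to RetrieveBytesFromStream(): copies the source
    ' stream through a fixed-size buffer into a MemoryStream.
    Function ReadAllBytes(ByVal source As Stream, ByVal bufferSize As Integer) As Byte()
        Dim output As New MemoryStream()
        Dim buffer(bufferSize - 1) As Byte
        Dim bytesRead As Integer = source.Read(buffer, 0, buffer.Length)
        ' Read() returns 0 once the end of the stream has been reached.
        While bytesRead > 0
            output.Write(buffer, 0, bytesRead)
            bytesRead = source.Read(buffer, 0, buffer.Length)
        End While
        Return output.ToArray()
    End Function

    Sub Main()
        Dim original() As Byte = Encoding.UTF8.GetBytes(New String("a"c, 10000))

        ' Compress into a MemoryStream, leaving it open so it can be read back.
        Dim compressed As New MemoryStream()
        Dim zipStream As New DeflateStream(compressed, CompressionMode.Compress, True)
        zipStream.Write(original, 0, original.Length)
        zipStream.Close()

        ' Rewind and decompress with a 4 KB buffer (an arbitrary choice).
        compressed.Position = 0
        Dim unzipStream As New DeflateStream(compressed, CompressionMode.Decompress)
        Dim restored() As Byte = ReadAllBytes(unzipStream, 4096)
        unzipStream.Close()

        Console.WriteLine("Restored " & restored.Length & " of " & original.Length & " bytes")
    End Sub
End Module
```

With this approach the buffer size only affects how many Read() calls are made, not how often the result has to be reallocated by the caller, so the choice of block size becomes less critical.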
