Using .NET 2.0 compression/decompression to handle large data: practical tips

If your application has never needed compression, you are lucky. For the developers who do need it, the good news is that .NET 2.0 now offers two classes that handle compression and decompression. This article explains when and how to use them.

Introduction

System.IO.Compression is a new namespace in .NET Framework 2.0. It provides two data compression classes: DeflateStream and GZipStream. Both classes support lossless compression and decompression, and both are designed to work with streaming data.
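As a minimal sketch of how the two classes are used, the console module below compresses the same bytes with each class and prints the resulting sizes. The module name, the CompressWith() helper, and the sample data are assumptions made for illustration; they are not part of the article's sample application.

```vb
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module CompressionClassesSketch
    ' Hypothetical helper: compresses data with GZipStream or DeflateStream.
    Function CompressWith(ByVal useGZip As Boolean, ByVal data() As Byte) As Byte()
        Dim ms As New MemoryStream()
        Dim zip As Stream
        If useGZip Then
            zip = New GZipStream(ms, CompressionMode.Compress, True)
        Else
            zip = New DeflateStream(ms, CompressionMode.Compress, True)
        End If
        zip.Write(data, 0, data.Length)
        zip.Close()          ' closing writes the final compressed bytes to ms
        Return ms.ToArray()
    End Function

    Sub Main()
        ' Repetitive sample text compresses well; real data will vary.
        Dim sample As New StringBuilder()
        For i As Integer = 1 To 500
            sample.Append("<item id=""" & i & """>value</item>")
        Next
        Dim data() As Byte = Encoding.UTF8.GetBytes(sample.ToString())

        Console.WriteLine("Original: " & data.Length & " bytes")
        Console.WriteLine("GZip:     " & CompressWith(True, data).Length & " bytes")
        Console.WriteLine("Deflate:  " & CompressWith(False, data).Length & " bytes")
    End Sub
End Module
```

Both classes share the same constructor shape, so switching between them is a one-line change. The gzip format wraps Deflate-compressed data in a header and a checksum trailer, so the GZipStream output is typically slightly larger.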

Compression is an effective way to reduce data size. For example, if you store a huge amount of data in a SQL database, compressing the data before saving it to a table can save a lot of disk space. And because you are now writing smaller chunks of data to the database, the cost of disk I/O drops as well. The disadvantage of compression is that it requires additional processing on your machine (and therefore additional processing time), and you need to account for this time before deciding to use compression in your program.

Compression is extremely useful when you need to transfer data over a network, especially a slow and costly one such as a GPRS connection. In this case, compression can greatly reduce the size of the data and lower the overall communication cost. Web services are another area where compression offers a large advantage, because XML data is highly compressible.

Once you have decided that compression is worth its performance cost, you need a solid understanding of the two new compression classes in .NET 2.0, which is what this article sets out to provide.

Creating the sample Application

In this article, I'll build a sample application to show the use of compression. The application allows you to compress files, including plain text files. You can then apply the code in the example to your own application.

First, create a new Windows application using Visual Studio 2005 and populate the default form with the following controls (see Figure 1):

· GroupBox control
· RadioButton control
· TextBox control
· Button control
· Label control

Figure 1. Populating the form: the default Form1 populated with all of the controls shown.

Switch to the code-behind of Form1 and import the following namespaces:

    Imports System.Diagnostics   ' for Stopwatch
    Imports System.IO
    Imports System.IO.Compression

Before you start using the compression classes, it is important to understand how they work. For compression, the classes read data from a byte array, compress it, and store the result in a Stream object. For decompression, they read the compressed data held in one Stream object, and the decompressed data is then stored in another Stream object.
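One practical consequence of this design is that the compression stream has to be closed before the compressed bytes are read back from the underlying stream, because the last part of the output is only written when the compression stream is closed. The sketch below records the length of the MemoryStream before and after closing the GZipStream. The module and function names are hypothetical and are not part of the sample application.

```vb
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module CloseBeforeReadSketch
    ' Hypothetical helper: returns the length of the underlying MemoryStream
    ' before and after the GZipStream is closed.
    Function LengthsAroundClose(ByVal data() As Byte) As Long()
        Dim ms As New MemoryStream()

        ' The third argument (leaveOpen) is True so that ms stays open
        ' and readable after the GZipStream itself has been closed.
        Dim zip As New GZipStream(ms, CompressionMode.Compress, True)
        zip.Write(data, 0, data.Length)

        Dim beforeClose As Long = ms.Length
        zip.Close()
        Dim afterClose As Long = ms.Length

        Return New Long() {beforeClose, afterClose}
    End Function

    Sub Main()
        Dim data() As Byte = Encoding.UTF8.GetBytes(New String("x"c, 5000))
        Dim lengths() As Long = LengthsAroundClose(data)
        Console.WriteLine("Bytes in ms before Close: " & lengths(0))
        Console.WriteLine("Bytes in ms after Close:  " & lengths(1))
    End Sub
End Module
```

Only the length measured after Close() reflects the complete compressed data. If leaveOpen were False, closing the GZipStream would close the MemoryStream as well.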

First, define the Compress() function, which takes two parameters: algo and data. The first parameter specifies which algorithm to use (Gzip or Deflate), and the second is a byte array containing the data to compress. A MemoryStream object is used to store the compressed data. Once compression is complete, the function calculates the compression ratio by dividing the size of the compressed data by the size of the original, uncompressed data.
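The ratio calculation can be sketched on its own as follows. The RatioPercent() helper is a hypothetical name introduced here, and the guard for empty input is an added assumption rather than something the article's code does.

```vb
Imports System

Module CompressionRatioSketch
    ' Hypothetical helper: compressed size as a percentage of the original size.
    Function RatioPercent(ByVal compressedSize As Long, ByVal originalSize As Long) As Double
        If originalSize = 0 Then
            Return 0   ' avoid dividing by zero for empty input
        End If
        ' The / operator in VB performs floating-point division.
        Return Math.Round(compressedSize / originalSize * 100, 2)
    End Function

    Sub Main()
        ' 2 MB of compressed data produced from 10 MB of original data.
        Console.WriteLine(RatioPercent(2 * 1024 * 1024, 10 * 1024 * 1024) & "%")
    End Sub
End Module
```

For these inputs the result is 20, meaning the compressed data is one fifth the size of the original. With this definition, a lower percentage means better compression.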

The compressed data stored in the memory stream is then copied into another byte array and returned to the caller. In addition, a Stopwatch object tracks how much time the compression takes. The Compress() function is defined as follows:

    Public Function Compress(ByVal algo As String, ByVal data() As Byte) As Byte()
        Try
            Dim sw As New Stopwatch
            '---ms is used to store the compressed data---
            Dim ms As New MemoryStream()
            Dim zipStream As Stream = Nothing
            '---start the stopwatch---
            sw.Start()
            If algo = "Gzip" Then
                zipStream = New GZipStream(ms, CompressionMode.Compress, True)
            ElseIf algo = "Deflate" Then
                zipStream = New DeflateStream(ms, CompressionMode.Compress, True)
            End If
            '---compress the contents of data into ms---
            zipStream.Write(data, 0, data.Length)
            zipStream.Close()
            '---stop the stopwatch---
            sw.Stop()
            '---compute the compression ratio---
            Dim ratio As Single = CSng(Math.Round((ms.Length / data.Length) * 100, 2))
            Dim msg As String = "Original size: " & data.Length & _
                ", Compressed size: " & ms.Length & _
                ", Compression ratio: " & ratio & "%" & _
                ", Time spent: " & sw.ElapsedMilliseconds & "ms"
            lblMessage.Text = msg
            ms.Position = 0
            '---used to store the compressed data (byte array)---
            Dim c_data(CInt(ms.Length) - 1) As Byte
            '---read the contents of the memory stream into the byte array---
            ms.Read(c_data, 0, CInt(ms.Length))
            Return c_data
        Catch ex As Exception
            MsgBox(ex.ToString)
            Return Nothing
        End Try
    End Function

The Decompress() function decompresses the data produced by the Compress() function. The first parameter specifies the algorithm to use. The byte array containing the compressed data is passed as the second parameter and is copied into a MemoryStream object. The compression class then decompresses the data held in the memory stream and exposes the result through another Stream object. To obtain the decompressed data, you read it from that Stream object. This is done with the RetrieveBytesFromStream() function (explained later).

The Decompress() function is defined as follows:

    Public Function Decompress(ByVal algo As String, ByVal data() As Byte) As Byte()
        Try
            Dim sw As New Stopwatch
            '---copy the compressed data into ms---
            Dim ms As New MemoryStream(data)
            Dim zipStream As Stream = Nothing
            '---start the stopwatch---
            sw.Start()
            '---decompress the data stored in ms---
            If algo = "Gzip" Then
                zipStream = New GZipStream(ms, CompressionMode.Decompress)
            ElseIf algo = "Deflate" Then
                zipStream = New DeflateStream(ms, CompressionMode.Decompress, True)
            End If
            '---used to store the decompressed data---
            Dim dc_data() As Byte
            '---the decompressed data is available from zipStream;
            ' read it into a byte array---
            dc_data = RetrieveBytesFromStream(zipStream, data.Length)
            '---stop the stopwatch---
            sw.Stop()
            lblMessage.Text = "Decompression completed. Time spent: " & _
                sw.ElapsedMilliseconds & "ms" & _
                ", Original size: " & dc_data.Length
            Return dc_data
        Catch ex As Exception
            MsgBox(ex.ToString)
            Return Nothing
        End Try
    End Function

The RetrieveBytesFromStream() function takes two parameters, a Stream object and an integer, and returns a byte array containing the decompressed data. The integer argument determines how many bytes are read from the stream into the byte array at a time. This is necessary because, when decompressing, you do not know the size of the uncompressed data in advance. The byte array therefore has to be expanded dynamically, block by block, at run time to hold the decompressed data. If the block is too large, memory is wasted; if it is too small, time is lost expanding the array repeatedly. Leaving the block size as a parameter lets the calling routine choose the most suitable value.

The RetrieveBytesFromStream() function is defined as follows:

    Public Function RetrieveBytesFromStream( _
        ByVal stream As Stream, ByVal bytesBlock As Integer) As Byte()
        '---retrieve the bytes from a Stream object---
        Dim data() As Byte = Nothing
        Dim totalCount As Integer = 0
        Try
            While True
                '---grow the byte array by one more block---
                ReDim Preserve data(totalCount + bytesBlock)
                Dim bytesRead As Integer = stream.Read(data, totalCount, bytesBlock)
                If bytesRead = 0 Then
                    Exit While
                End If
                totalCount += bytesRead
            End While
            '---trim the byte array to the number of bytes actually read---
            ReDim Preserve data(totalCount - 1)
            Return data
        Catch ex As Exception
            MsgBox(ex.ToString)
            Return Nothing
        End Try
    End Function
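A common alternative, shown here as a sketch rather than as the article's method, is to read the stream in fixed-size chunks and let a MemoryStream manage the growing buffer, which avoids calling ReDim Preserve by hand. The names ReadAllBytes() and chunkSize are hypothetical, and chunkSize is assumed to be greater than zero.

```vb
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module ReadAllBytesSketch
    ' Hypothetical alternative to RetrieveBytesFromStream: copies everything
    ' from source into a MemoryStream, at most chunkSize bytes at a time.
    ' Assumes chunkSize > 0.
    Function ReadAllBytes(ByVal source As Stream, ByVal chunkSize As Integer) As Byte()
        Dim buffer(chunkSize - 1) As Byte
        Dim result As New MemoryStream()
        Dim bytesRead As Integer = source.Read(buffer, 0, buffer.Length)
        While bytesRead > 0
            result.Write(buffer, 0, bytesRead)
            bytesRead = source.Read(buffer, 0, buffer.Length)
        End While
        Return result.ToArray()
    End Function

    Sub Main()
        Dim original() As Byte = Encoding.UTF8.GetBytes(New String("z"c, 20000))

        ' Compress into a byte array.
        Dim ms As New MemoryStream()
        Dim zip As New DeflateStream(ms, CompressionMode.Compress, True)
        zip.Write(original, 0, original.Length)
        zip.Close()
        Dim compressed() As Byte = ms.ToArray()

        ' Decompress without knowing the uncompressed size in advance.
        Dim unzip As New DeflateStream(New MemoryStream(compressed), CompressionMode.Decompress)
        Dim restored() As Byte = ReadAllBytes(unzip, 4096)
        unzip.Close()

        Console.WriteLine("Restored " & restored.Length & " of " & original.Length & " bytes")
    End Sub
End Module
```

The trade-off is one extra copy when ToArray() is called at the end, in exchange for a simpler loop and a read buffer whose size no longer affects how often the result grows.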

Notice that the Decompress() function calls RetrieveBytesFromStream() as follows:

    dc_data = RetrieveBytesFromStream(zipStream, data.Length)

The block size is the size of the compressed data (data.Length). In most cases, the uncompressed data is several times larger than the compressed data (as indicated by the compression ratio), so the byte array is expanded several times at run time. For example, if the compression ratio is 20% and the compressed data is 2MB, the uncompressed data is 10MB, so the byte array is expanded about five times. Ideally, the byte array should not be expanded too often at run time, because this can slow the application down severely. Using the size of the compressed data as the block size is a reasonable compromise.
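The arithmetic in this example can be checked with a small sketch. EstimateBlocks() is a hypothetical helper, and it assumes that every Read() call fills a whole block, which a stream is not required to do.

```vb
Imports System

Module BlockCountSketch
    ' Hypothetical helper: estimates how many blocks of blockSize bytes are
    ' needed to hold the uncompressed data, given the compression ratio
    ' (compressed size as a percentage of the original size).
    Function EstimateBlocks(ByVal compressedSize As Long, ByVal ratioPercent As Double, ByVal blockSize As Long) As Long
        Dim uncompressedSize As Double = compressedSize * 100 / ratioPercent
        Return CLng(Math.Ceiling(uncompressedSize / blockSize))
    End Function

    Sub Main()
        Dim twoMB As Long = 2 * 1024 * 1024
        ' Compressed size 2 MB, ratio 20%, block size equal to the compressed size.
        Console.WriteLine(EstimateBlocks(twoMB, 20, twoMB))
    End Sub
End Module
```

This prints 5, matching the figure in the text. The loop in RetrieveBytesFromStream() also resizes the array once more on the final pass that reads zero bytes and then trims it, so the figure is an estimate rather than an exact count of ReDim operations.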
