Summary
If your application has never needed compression, consider yourself lucky. For developers who do need it, the good news is that .NET 2.0 now provides two classes for compression and decompression. This article discusses when and how to use these useful tools.
Introduction
The namespace that is new in .NET Framework 2.0 is System.IO.Compression. It provides two data compression classes: DeflateStream and GZipStream. Both classes perform lossless compression and decompression, and both are designed to work on streams of data.
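To see how the two classes differ in their output, here is a minimal sketch; the module and function names (FormatComparison, CompressWith, HasGzipHeader) are hypothetical, chosen only for illustration. It relies on the fact that GZipStream wraps Deflate-compressed data in the GZip format, which adds a header beginning with the magic bytes 0x1F 0x8B and a CRC-32 trailer, while DeflateStream writes raw Deflate data with no header.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression

Module FormatComparison
    ' Compresses data with either GZipStream or DeflateStream and returns the bytes.
    Function CompressWith(ByVal useGzip As Boolean, ByVal data() As Byte) As Byte()
        Dim ms As New MemoryStream()
        Dim zip As Stream
        If useGzip Then
            zip = New GZipStream(ms, CompressionMode.Compress, True)
        Else
            zip = New DeflateStream(ms, CompressionMode.Compress, True)
        End If
        zip.Write(data, 0, data.Length)
        ' Closing the compression stream flushes the final block (and, for GZip, the trailer).
        zip.Close()
        Return ms.ToArray()
    End Function

    ' True if the buffer begins with the GZip magic number 0x1F 0x8B (RFC 1952).
    Function HasGzipHeader(ByVal buffer() As Byte) As Boolean
        Return buffer.Length >= 2 AndAlso buffer(0) = &H1F AndAlso buffer(1) = &H8B
    End Function
End Module
```

Choose GZipStream when the output must be readable by GZip tools; choose DeflateStream when both ends are your own code and you want to skip the extra header and trailer bytes.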
Compression is an effective way to reduce the size of data. For example, if you store a large amount of data in a SQL database, compressing the data before saving it to a table can save a lot of disk space. In addition, because smaller pieces of data are written to the database, disk I/O is greatly reduced. The disadvantage of compression is that it consumes additional processing time on your machine, so you need to weigh this cost before deciding to use compression in your program.
Compression is extremely useful when you need to transfer data over a network, especially a slow and expensive one such as a GPRS connection. In this case, compression can greatly reduce the size of the data and the overall cost of communication. Web services are another area where compression can provide a huge advantage, because XML data compresses very well.
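A quick way to check the claim about XML against your own data is to compress a sample and look at the ratio. The sketch below builds a hypothetical, repetitive XML document (the element names and the BuildSampleXml and GzipRatio helpers are made up for illustration) and reports the compressed size as a percentage of the original size. Real Web service payloads will compress more or less well depending on their content.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module XmlRatioDemo
    ' Builds a hypothetical, highly repetitive XML document, similar in shape to a Web service response.
    Function BuildSampleXml(ByVal rows As Integer) As String
        Dim sb As New StringBuilder("<Orders>")
        For i As Integer = 1 To rows
            sb.Append("<Order><Id>").Append(i).Append("</Id><Status>Shipped</Status></Order>")
        Next
        sb.Append("</Orders>")
        Return sb.ToString()
    End Function

    ' Returns the compressed size as a percentage of the original size (lower is better).
    Function GzipRatio(ByVal text As String) As Double
        Dim original() As Byte = Encoding.UTF8.GetBytes(text)
        Dim ms As New MemoryStream()
        Dim zip As New GZipStream(ms, CompressionMode.Compress, True)
        zip.Write(original, 0, original.Length)
        zip.Close() ' flush everything into ms before measuring it
        Return Math.Round(ms.Length / original.Length * 100, 2)
    End Function
End Module
```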
Once you have decided that the benefits of compression are worth its performance cost, you need a thorough understanding of the two new compression classes in .NET 2.0, which is exactly what this article discusses.
Create a sample application
In this article, I will build a sample application to demonstrate how to use compression. The application lets you compress files, including ordinary text files. You can then reuse the code from this example in your own applications.
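Since the sample application works on files, a minimal sketch of compressing a file to a .gz file and restoring it may help. It assumes the whole source file fits in memory (it uses File.ReadAllBytes), and the CompressFile and DecompressFile helper names are hypothetical, not part of the sample application.

```vbnet
Imports System.IO
Imports System.IO.Compression

Module FileCompressionSketch
    ' Hypothetical helper: reads sourcePath completely and writes a GZip-compressed copy to destPath.
    Sub CompressFile(ByVal sourcePath As String, ByVal destPath As String)
        Dim data() As Byte = File.ReadAllBytes(sourcePath)
        Using output As FileStream = File.Create(destPath)
            Using zip As New GZipStream(output, CompressionMode.Compress)
                zip.Write(data, 0, data.Length)
            End Using ' disposing the GZipStream flushes it and also closes output
        End Using
    End Sub

    ' Hypothetical helper: restores the original file from a .gz file produced by CompressFile.
    Sub DecompressFile(ByVal sourcePath As String, ByVal destPath As String)
        Using input As FileStream = File.OpenRead(sourcePath)
            Using zip As New GZipStream(input, CompressionMode.Decompress)
                Using output As FileStream = File.Create(destPath)
                    Dim buffer(4095) As Byte
                    ' Copy decompressed bytes until Read reports the end of the stream (0).
                    Dim n As Integer = zip.Read(buffer, 0, buffer.Length)
                    While n > 0
                        output.Write(buffer, 0, n)
                        n = zip.Read(buffer, 0, buffer.Length)
                    End While
                End Using
            End Using
        End Using
    End Sub
End Module
```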
First, use Visual Studio 2005 to create a new Windows application and populate the default form with the following controls (see Figure 1):
Figure 1. The populated form: the default Form1 with all of the controls in place.
· GroupBox control
· RadioButton control
· TextBox control
· Button control
· Label control
Switch to the code-behind of Form1 and import the following namespaces:
Imports System.Diagnostics
Imports System.IO
Imports System.IO.Compression
Before you start using the compression classes, it is important to understand how they work. To compress, you write data from a byte array into a compression stream, which compresses it and stores the result in an underlying stream object. To decompress, you wrap the stream object that holds the compressed data in a compression stream and read the decompressed data out of it into another stream or array.
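One detail of this pattern is worth seeing in isolation: closing a compression stream also closes the stream it writes to, unless you pass True as the third (leaveOpen) constructor argument. The hypothetical TargetStillOpen function below is a sketch showing that, with leaveOpen set to False, the MemoryStream can no longer be queried after the compression stream is closed (its Length property throws ObjectDisposedException). This is why the Compress() function passes True.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression

Module LeaveOpenDemo
    ' Compresses data into a MemoryStream and reports whether that MemoryStream
    ' is still usable after the compression stream has been closed.
    Function TargetStillOpen(ByVal leaveOpen As Boolean, ByVal data() As Byte) As Boolean
        Dim ms As New MemoryStream()
        Dim zip As New DeflateStream(ms, CompressionMode.Compress, leaveOpen)
        zip.Write(data, 0, data.Length)
        zip.Close() ' flushes compressed bytes into ms; also closes ms unless leaveOpen is True
        Try
            Dim size As Long = ms.Length ' throws ObjectDisposedException if ms was closed
            Return size > 0
        Catch ex As ObjectDisposedException
            Return False
        End Try
    End Function
End Module
```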
First, define the Compress() function. It takes two parameters: algo and data. The first parameter specifies the algorithm to use (GZip or Deflate); the second is a byte array containing the data to be compressed. A MemoryStream object is used to store the compressed data. Once compression is complete, the function calculates the compression ratio by dividing the size of the compressed data by the size of the original (uncompressed) data and expressing the result as a percentage.
The compressed data stored in the memory stream is then copied into another byte array and returned to the caller. The function also uses a Stopwatch object to measure how long the compression takes. The Compress() function is defined as follows:
Public Function Compress(ByVal algo As String, ByVal data() As Byte) As Byte()
    Try
        Dim sw As New Stopwatch
        '--- ms is used to store the compressed data ---
        Dim ms As New MemoryStream()
        Dim zipStream As Stream = Nothing
        '--- Start timing ---
        sw.Start()
        If algo = "Gzip" Then
            zipStream = New GZipStream(ms, CompressionMode.Compress, True)
        ElseIf algo = "Deflate" Then
            zipStream = New DeflateStream(ms, CompressionMode.Compress, True)
        Else
            Throw New ArgumentException("Unknown algorithm: " & algo)
        End If
        '--- Compress the bytes stored in data ---
        zipStream.Write(data, 0, data.Length)
        '--- Closing zipStream flushes the remaining compressed bytes into ms;
        '    ms stays open because leaveOpen was set to True ---
        zipStream.Close()
        '--- Stop timing ---
        sw.Stop()
        '--- Compression ratio: compressed size as a percentage of the original size ---
        Dim ratio As Single = CSng(Math.Round(ms.Length / data.Length * 100, 2))
        Dim msg As String = "Original size: " & data.Length & _
            ", Compressed size: " & ms.Length & _
            ", Compression ratio: " & ratio & "%" & _
            ", Time spent: " & sw.ElapsedMilliseconds & "ms"
        lblMessage.Text = msg
        ms.Position = 0
        '--- Byte array to hold the compressed data ---
        Dim c_data(CInt(ms.Length) - 1) As Byte
        '--- Read the contents of the memory stream into the byte array ---
        ms.Read(c_data, 0, CInt(ms.Length))
        Return c_data
    Catch ex As Exception
        MsgBox(ex.ToString)
        Return Nothing
    End Try
End Function
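The Compress() function above depends on the form (lblMessage and MsgBox). For reuse outside the sample application, a form-independent variant could look like the following sketch; the CompressBytes name and its ByRef ratio and elapsedMs parameters are hypothetical. It throws on an unknown algorithm name and uses MemoryStream.ToArray to copy the compressed bytes in one call.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.IO
Imports System.IO.Compression

Module CompressSketch
    ' Form-independent variant of Compress(): returns the compressed bytes and
    ' reports the ratio (compressed size / original size, in percent) through ByRef.
    Function CompressBytes(ByVal algo As String, ByVal data() As Byte, _
                           ByRef ratio As Double, ByRef elapsedMs As Long) As Byte()
        Dim sw As Stopwatch = Stopwatch.StartNew()
        Dim ms As New MemoryStream()
        Dim zipStream As Stream
        If algo = "Gzip" Then
            zipStream = New GZipStream(ms, CompressionMode.Compress, True)
        ElseIf algo = "Deflate" Then
            zipStream = New DeflateStream(ms, CompressionMode.Compress, True)
        Else
            Throw New ArgumentException("Unknown algorithm: " & algo)
        End If
        zipStream.Write(data, 0, data.Length)
        zipStream.Close() ' must close before reading ms, or the last block is missing
        sw.Stop()
        ratio = Math.Round(ms.Length / data.Length * 100, 2)
        elapsedMs = sw.ElapsedMilliseconds
        ' ToArray copies the stream's contents regardless of the current Position.
        Return ms.ToArray()
    End Function
End Module
```

A caller would pass two local variables for ratio and elapsedMs and display them however it likes, for example in a Label, a log file or a console.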
The Decompress() function decompresses data produced by the Compress() function. The first parameter specifies the algorithm to use. The byte array containing the compressed data is passed as the second parameter and copied into a memory stream object. The compression classes then decompress the data held in the memory stream as you read from them, so to obtain the decompressed data you read it out of the compression stream object. This is done by the RetrieveBytesFromStream() function (explained later).
The Decompress() function is defined as follows:
Public Function Decompress(ByVal algo As String, ByVal data() As Byte) As Byte()
    Try
        Dim sw As New Stopwatch
        '--- Copy the (compressed) data into ms ---
        Dim ms As New MemoryStream(data)
        Dim zipStream As Stream = Nothing
        '--- Start timing ---
        sw.Start()
        '--- Wrap ms in a stream that decompresses as it is read ---
        If algo = "Gzip" Then
            zipStream = New GZipStream(ms, CompressionMode.Decompress)
        ElseIf algo = "Deflate" Then
            zipStream = New DeflateStream(ms, CompressionMode.Decompress)
        Else
            Throw New ArgumentException("Unknown algorithm: " & algo)
        End If
        '--- Holds the decompressed data ---
        Dim dc_data() As Byte
        '--- Reading from zipStream yields the decompressed bytes;
        '    copy them into a byte array ---
        dc_data = RetrieveBytesFromStream(zipStream, data.Length)
        zipStream.Close()
        '--- Stop timing ---
        sw.Stop()
        lblMessage.Text = "Decompression completed. Time spent: " & _
            sw.ElapsedMilliseconds & "ms" & _
            ", Original size: " & dc_data.Length
        Return dc_data
    Catch ex As Exception
        MsgBox(ex.ToString)
        Return Nothing
    End Try
End Function
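Likewise, a form-independent sketch of Decompress() is shown below (DecompressBytes is a hypothetical name). Instead of growing an array block by block, it copies the decompressed bytes through a small fixed buffer into a MemoryStream, and it uses Using blocks so the streams are closed even if decompression fails. The algo argument must match the algorithm that was used to compress the data.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression

Module DecompressSketch
    ' Form-independent variant of Decompress(): algo must match the one used to compress.
    Function DecompressBytes(ByVal algo As String, ByVal data() As Byte) As Byte()
        Using input As New MemoryStream(data)
            Dim zipStream As Stream
            If algo = "Gzip" Then
                zipStream = New GZipStream(input, CompressionMode.Decompress)
            ElseIf algo = "Deflate" Then
                zipStream = New DeflateStream(input, CompressionMode.Decompress)
            Else
                Throw New ArgumentException("Unknown algorithm: " & algo)
            End If
            Using zipStream
                Dim output As New MemoryStream()
                Dim buffer(4095) As Byte
                ' Read may return fewer bytes than requested; only 0 means end of stream.
                Dim n As Integer = zipStream.Read(buffer, 0, buffer.Length)
                While n > 0
                    output.Write(buffer, 0, n)
                    n = zipStream.Read(buffer, 0, buffer.Length)
                End While
                Return output.ToArray()
            End Using
        End Using
    End Function
End Module
```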
The RetrieveBytesFromStream() function takes two parameters, a stream object and an integer, and returns a byte array containing the decompressed data. The integer parameter specifies how many bytes to read from the stream object into the byte array on each pass. This is necessary because, while the data is being decompressed, you do not know in advance how large the decompressed data will be. The byte array therefore has to be expanded dynamically, one block at a time, at run time. If the block is too large, you waste memory; if it is too small, you waste valuable time repeatedly expanding the array. For this reason, the caller decides the best block size to use.
The RetrieveBytesFromStream() function is defined as follows:
Public Function RetrieveBytesFromStream( _
    ByVal stream As Stream, ByVal bytesblock As Integer) As Byte()
    '--- Retrieves bytes from a stream object ---
    Dim data() As Byte = Nothing
    Dim totalCount As Integer = 0
    Try
        While True
            '--- Grow the data array by one block ---
            ReDim Preserve data(totalCount + bytesblock)
            Dim bytesRead As Integer = stream.Read(data, totalCount, bytesblock)
            If bytesRead = 0 Then
                Exit While
            End If
            totalCount += bytesRead
        End While
        '--- Trim the array so it holds exactly the number of bytes read ---
        ReDim Preserve data(totalCount - 1)
        Return data
    Catch ex As Exception
        MsgBox(ex.ToString)
        Return Nothing
    End Try
End Function
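To make the memory-versus-time trade-off concrete, the sketch below runs essentially the same ReDim Preserve loop but counts how many times the array is resized; ReadAllCounting and its ByRef resizes parameter are hypothetical names for illustration. Because Stream.Read never returns more than the requested block, decompressing N bytes with block size B needs at least N/B (rounded up) resizes plus one more for the final Read that returns 0. Read may also return fewer bytes than requested, which only increases the count.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression

Module BlockSizeDemo
    ' Runs essentially the same ReDim Preserve loop as RetrieveBytesFromStream, but also
    ' counts how many times the array is resized so different block sizes can be compared.
    Function ReadAllCounting(ByVal stream As Stream, ByVal bytesblock As Integer, _
                             ByRef resizes As Integer) As Byte()
        Dim data() As Byte = New Byte() {}
        Dim totalCount As Integer = 0
        resizes = 0
        While True
            ReDim Preserve data(totalCount + bytesblock - 1) ' room for one more block
            resizes += 1
            Dim bytesRead As Integer = stream.Read(data, totalCount, bytesblock)
            If bytesRead = 0 Then Exit While
            totalCount += bytesRead
        End While
        ReDim Preserve data(totalCount - 1) ' trim to the bytes actually read
        Return data
    End Function
End Module
```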
Note that the Decompress() function calls RetrieveBytesFromStream() as follows:
dc_data = RetrieveBytesFromStream(zipStream, data.Length)
The block size used here is the size of the compressed data (data.Length). In most cases the decompressed data is only a few times larger than the compressed data (as the compression ratio shows), so the byte array needs to be expanded only a few times at run time. For example, if the compression ratio is 20% and the compressed data is 2 MB, the decompressed data will be 10 MB, and the byte array will be expanded 5 times. Ideally the byte array should not be expanded too often at run time, because each ReDim Preserve copies the existing contents into a new array, which can seriously slow the application down. Using the size of the compressed data as the block size is therefore a reasonable compromise.
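The arithmetic in this example can be written generally. Here $S_c$ is the compressed size (which is also used as the block size $B$), $r$ is the compression ratio (compressed size divided by original size), $S_o$ is the original size, and $n$ is the number of blocks needed; these symbols are introduced only for this sketch, which also assumes that every Read call fills a whole block.

```latex
S_o = \frac{S_c}{r}, \qquad
n = \left\lceil \frac{S_o}{B} \right\rceil
  = \left\lceil \frac{S_c / r}{S_c} \right\rceil
  = \left\lceil \frac{1}{r} \right\rceil
\qquad\Rightarrow\qquad
r = 0.2:\quad S_o = \frac{2\ \text{MB}}{0.2} = 10\ \text{MB},\quad n = 5
```

With this choice of block size, the number of expansions depends only on the compression ratio, not on how much data is being decompressed. The loop in RetrieveBytesFromStream() also performs one extra ReDim Preserve for the final Read that returns 0, and one more to trim the array.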