Data Compression
Summary: If you have never needed compression in your applications, you are lucky. For developers who do, the good news is that .NET 2.0 now offers two classes that handle compression and decompression. This article explains when and how to use these useful tools.
Introduction
New in .NET Framework 2.0 is the System.IO.Compression namespace. It provides two data compression classes: DeflateStream and GZipStream. Both classes support lossless compression and decompression and are designed for compressing and decompressing streaming data.
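DeflateStream produces raw Deflate data (RFC 1951), while GZipStream wraps the same kind of Deflate data in a gzip header and CRC trailer (RFC 1952). As a language-neutral sketch, Python's standard zlib and gzip modules can produce both formats; the helper names deflate_raw and inflate_raw are hypothetical and are not part of .NET.

```python
import gzip
import zlib

def deflate_raw(data: bytes) -> bytes:
    # Negative wbits (-15) asks zlib for a raw Deflate stream with no header or
    # checksum, the same framing DeflateStream writes. Level 9 matches
    # gzip.compress's default so the two outputs differ only in framing.
    comp = zlib.compressobj(level=9, wbits=-15)
    return comp.compress(data) + comp.flush()

def inflate_raw(data: bytes) -> bytes:
    return zlib.decompress(data, wbits=-15)

text = b"The quick brown fox jumps over the lazy dog. " * 20

raw = deflate_raw(text)
gz = gzip.compress(text)

# Both formats are lossless: decompressing returns exactly the original bytes.
assert inflate_raw(raw) == text
assert gzip.decompress(gz) == text

# gzip output starts with the magic bytes 1f 8b and is larger than the raw
# stream because of its header and CRC-32/length trailer.
print(gz[:2].hex(), len(raw), len(gz))
```

Either format suits in-memory use; the gzip framing adds integrity checking and is what standalone gzip tools expect.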
Compression is an effective way to reduce the size of data. For example, if you store a huge amount of data in a SQL database, you can save a lot of disk space by compressing the data before saving it to a table. And because you are now writing smaller chunks of data to the database, you also reduce the amount of disk I/O. The drawback is that compression requires additional processing on your machine (and therefore additional time), and you need to factor in this cost before deciding to use compression in your application.
Compression is especially useful when you need to transmit data over a network, particularly a slow and costly one such as a GPRS connection. In that case, compression can greatly reduce the size of the data and lower the overall cost of communication. Web services are another area where compression offers a big advantage, because XML data compresses very well.
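To get a feel for why XML payloads benefit, here is a rough Python sketch (used only as a language-neutral illustration) that builds a repetitive XML document and measures its gzip compression ratio, defined as in this article: compressed size divided by original size. The XML content and the compression_ratio helper are invented for illustration; real payloads will compress differently.

```python
import gzip

def compression_ratio(original: bytes, compressed: bytes) -> float:
    # Percentage of the original size; smaller is better, above 100 means growth.
    return round(len(compressed) / len(original) * 100, 2)

# A hypothetical web-service response with many similar records.
records = "".join(
    f"<order><id>{i}</id><status>Shipped</status><carrier>Ground</carrier></order>"
    for i in range(200)
)
xml = f'<?xml version="1.0"?><orders>{records}</orders>'.encode("utf-8")

packed = gzip.compress(xml)
ratio = compression_ratio(xml, packed)
print(f"{len(xml)} bytes -> {len(packed)} bytes ({ratio}%)")
```

The repeated tag names are what make XML such a good candidate: Deflate replaces each repetition with a short back-reference.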
Once you have decided that the benefits of compression are worth the performance cost, you need a solid understanding of the two new compression classes in .NET 2.0, which is what this article sets out to provide.
Creating the Sample Application
In this article, I'll build a sample application that demonstrates compression. The application lets you compress and decompress text as well as the contents of files. You can then reuse the code from the example in your own applications.
First, create a new Windows application using Visual Studio 2005 and populate the default form with the following controls (see Figure 1):
Figure 1. Populating the form: Populate the default Form1 with all the controls shown.
· GroupBox control
· RadioButton control
· TextBox control
· Button control
· Label control
Switch to the code-behind of Form1 and import the following namespaces:
Imports System.IO
Imports System.IO.Compression
Before you start using the compression classes, it is important to understand how they work. The compression classes read data from a byte array, compress it, and write the result to a Stream object. For decompression, the compressed data stored in a Stream object is decompressed and made available through another Stream object, from which you read it.
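The same pattern, a compression stream layered over an in-memory stream, can be sketched with Python's io.BytesIO and gzip.GzipFile, which play roles similar to MemoryStream and GZipStream. This is an analogy for illustration, not the .NET API, and the function names are hypothetical.

```python
import gzip
import io

def compress_to_memory(data: bytes) -> bytes:
    buffer = io.BytesIO()                          # plays the role of MemoryStream
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        gz.write(data)                             # compressed bytes flow into buffer
    # Closing the GzipFile writes the trailer but does not close buffer,
    # much like passing leaveOpen = True to GZipStream.
    return buffer.getvalue()

def decompress_from_memory(data: bytes) -> bytes:
    buffer = io.BytesIO(data)                      # compressed bytes wrapped in a stream
    with gzip.GzipFile(fileobj=buffer, mode="rb") as gz:
        return gz.read()                           # reading yields decompressed bytes

original = b"hello, compression " * 50
packed = compress_to_memory(original)
assert decompress_from_memory(packed) == original
```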
First, define the Compress() function, which takes two parameters: algo and data. The first parameter specifies which algorithm to use (GZip or Deflate), and the second is a byte array containing the data to compress. A MemoryStream object stores the compressed data. Once compression is complete, you calculate the compression ratio by dividing the size of the compressed data by the size of the original (uncompressed) data and expressing the result as a percentage.
The compressed data in the memory stream is then copied into another byte array and returned to the caller. You also use a Stopwatch object to measure how long the compression algorithm takes. The Compress() function is defined as follows:
Public Function Compress(ByVal algo As String, ByVal data() As Byte) As Byte()
    Try
        Dim sw As New Stopwatch
        '---ms is used to store the compressed data---
        Dim ms As New MemoryStream()
        Dim zipStream As Stream = Nothing
        '---start the stopwatch---
        sw.Start()
        If algo = "Gzip" Then
            zipStream = New GZipStream(ms, CompressionMode.Compress, True)
        ElseIf algo = "Deflate" Then
            zipStream = New DeflateStream(ms, CompressionMode.Compress, True)
        Else
            Throw New ArgumentException("Unknown algorithm: " & algo)
        End If
        '---compress the data stored in the data array---
        zipStream.Write(data, 0, data.Length)
        '---closing the stream flushes the remaining compressed bytes into ms---
        zipStream.Close()
        '---stop the stopwatch---
        sw.Stop()
        '---calculate the compression ratio (compressed size / original size)---
        Dim ratio As Double = Math.Round((ms.Length / data.Length) * 100, 2)
        Dim msg As String = "Original size: " & data.Length & _
            ", Compressed size: " & ms.Length & _
            ", Compression ratio: " & ratio & "%" & _
            ", Time spent: " & sw.ElapsedMilliseconds & " ms"
        lblMessage.Text = msg
        ms.Position = 0
        '---used to store the compressed data (byte array)---
        Dim c_data(CInt(ms.Length) - 1) As Byte
        '---read the content of the memory stream into the byte array---
        ms.Read(c_data, 0, CInt(ms.Length))
        Return c_data
    Catch ex As Exception
        MsgBox(ex.ToString)
        Return Nothing
    End Try
End Function
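For comparison, here is a rough Python counterpart of Compress(): it picks an algorithm by name, times the work with time.perf_counter (standing in for Stopwatch), and reports the ratio as compressed size divided by original size. The name compress_with_stats and the message format are hypothetical; unlike the VB code, the sketch returns the message instead of writing it to a label.

```python
import gzip
import time
import zlib

def compress_with_stats(algo: str, data: bytes) -> tuple[bytes, str]:
    start = time.perf_counter()                    # like Stopwatch.Start()
    if algo == "Gzip":
        packed = gzip.compress(data)
    elif algo == "Deflate":
        comp = zlib.compressobj(wbits=-15)         # raw Deflate, like DeflateStream
        packed = comp.compress(data) + comp.flush()
    else:
        raise ValueError(f"Unknown algorithm: {algo!r}")
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Guard against empty input, which would otherwise divide by zero.
    ratio = round(len(packed) / len(data) * 100, 2) if data else float("inf")
    message = (f"Original size: {len(data)}, Compressed size: {len(packed)}, "
               f"Compression ratio: {ratio}%, Time spent: {elapsed_ms:.1f} ms")
    return packed, message

packed, message = compress_with_stats("Gzip", b"abc" * 100)
print(message)
```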
The Decompress() function decompresses data that was compressed by Compress(). The first parameter specifies the algorithm to use. The byte array containing the compressed data is passed as the second parameter and copied into a MemoryStream object. A compression stream is then layered over the memory stream, and reading from that stream yields the decompressed data. The reading is done by the RetrieveBytesFromStream() function, which is explained later.
The Decompress() function is defined as follows:
Public Function Decompress(ByVal algo As String, ByVal data() As Byte) As Byte()
    Try
        Dim sw As New Stopwatch
        '---copy the (compressed) data into ms---
        Dim ms As New MemoryStream(data)
        Dim zipStream As Stream = Nothing
        '---start the stopwatch---
        sw.Start()
        '---decompress the data stored in ms---
        If algo = "Gzip" Then
            zipStream = New GZipStream(ms, CompressionMode.Decompress)
        ElseIf algo = "Deflate" Then
            zipStream = New DeflateStream(ms, CompressionMode.Decompress)
        Else
            Throw New ArgumentException("Unknown algorithm: " & algo)
        End If
        '---to store the decompressed data---
        Dim dc_data() As Byte
        '---the decompressed data is read from zipStream
        ' into a byte array---
        dc_data = RetrieveBytesFromStream(zipStream, data.Length)
        zipStream.Close()
        '---stop the stopwatch---
        sw.Stop()
        lblMessage.Text = "Decompression completed. Time spent: " & _
            sw.ElapsedMilliseconds & " ms" & _
            ", Original size: " & dc_data.Length
        Return dc_data
    Catch ex As Exception
        MsgBox(ex.ToString)
        Return Nothing
    End Try
End Function
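Because Decompress() relies on the caller naming the same algorithm that was used for compression, it helps to see what happens on a mismatch. This Python sketch (an illustration of the two formats, not of .NET's exact exceptions) shows that gzip data is not valid raw Deflate and vice versa; try_decompress is a hypothetical helper.

```python
import gzip
import zlib

def try_decompress(algo: str, data: bytes):
    """Decompress data in the named format, or return None if it is not valid in that format."""
    if algo not in ("Gzip", "Deflate"):
        # A misspelled name such as "Deflat" is a programming error, so fail loudly.
        raise ValueError(f"Unknown algorithm: {algo!r}")
    try:
        if algo == "Gzip":
            return gzip.decompress(data)
        return zlib.decompress(data, wbits=-15)    # raw Deflate, no header
    except (OSError, zlib.error):
        # gzip raises BadGzipFile (a subclass of OSError) for a bad header;
        # zlib raises zlib.error for an invalid Deflate stream.
        return None

text = b"sample payload " * 10
gz = gzip.compress(text)
deflater = zlib.compressobj(wbits=-15)
raw = deflater.compress(text) + deflater.flush()

print(try_decompress("Gzip", gz) == text)    # True
print(try_decompress("Deflate", gz))         # None: gzip data is not raw Deflate
print(try_decompress("Gzip", raw))           # None: raw Deflate has no gzip header
```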
The RetrieveBytesFromStream() function takes two parameters, a Stream object and an integer, and returns a byte array containing the decompressed data. The integer argument determines how many bytes are read from the stream into the byte array at a time. This is necessary because, when decompressing, you do not know in advance how large the decompressed data in the stream will be, so the byte array must be expanded dynamically, one block at a time, at run time. If the block is too large, you waste memory; if it is too small, you waste time expanding the array over and over. The function therefore lets the caller decide the optimal block size.
The RetrieveBytesFromStream() function is defined as follows:
Public Function RetrieveBytesFromStream( _
    ByVal stream As Stream, ByVal bytesBlock As Integer) As Byte()
    '---retrieve the bytes from a stream object---
    Dim data() As Byte = Nothing
    Dim totalCount As Integer = 0
    Try
        While True
            '---progressively increase the size of the data byte array---
            ReDim Preserve data(totalCount + bytesBlock)
            Dim bytesRead As Integer = stream.Read(data, totalCount, bytesBlock)
            If bytesRead = 0 Then
                Exit While
            End If
            totalCount += bytesRead
        End While
        '---trim the byte array to exactly the number of bytes read---
        ReDim Preserve data(totalCount - 1)
        Return data
    Catch ex As Exception
        MsgBox(ex.ToString)
        Return Nothing
    End Try
End Function
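The read-until-empty loop of RetrieveBytesFromStream() can be sketched in Python against any readable stream. As in the VB code, the caller supplies the block size; the function name is hypothetical, and because a bytearray grows on its own, the explicit ReDim Preserve becomes an append.

```python
import gzip
import io

def retrieve_bytes_from_stream(stream, bytes_block: int) -> bytes:
    """Read a stream to the end, bytes_block bytes at a time."""
    if bytes_block <= 0:
        raise ValueError("bytes_block must be positive")
    data = bytearray()
    while True:
        chunk = stream.read(bytes_block)   # may return fewer bytes than requested
        if not chunk:                      # an empty result means end of stream
            break
        data += chunk                      # grow the buffer, like ReDim Preserve
    return bytes(data)                     # exactly the bytes read, nothing more

original = b"0123456789" * 1000
compressed = gzip.compress(original)
with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as zip_stream:
    # Use the compressed size as the block size, as Decompress() does.
    restored = retrieve_bytes_from_stream(zip_stream, len(compressed))
assert restored == original
```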
Notice that the Decompress() function calls RetrieveBytesFromStream() as follows:
dc_data = RetrieveBytesFromStream(zipStream, data.Length)
The block size used here is the size of the compressed data (data.Length). In most cases, the decompressed data is several times larger than the compressed data (as indicated by the compression ratio), so the byte array will be expanded several times at run time. For example, if the compression ratio is 20% and the compressed data is 2MB, the decompressed data will be 10MB, so the byte array will be expanded 5 times. Ideally, the byte array should not be expanded too often at run time, because this can severely slow down the application. Using the size of the compressed data as the block size is a reasonable compromise.
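The arithmetic in this example can be checked with a small calculation. Assuming each Read call fills a whole block (a real decompression stream may return fewer bytes), the number of blocks needed is the uncompressed size divided by the block size, rounded up; the VB loop then performs one extra iteration for the final read that returns 0. The function name blocks_needed is hypothetical.

```python
import math

MB = 1024 * 1024

def blocks_needed(uncompressed_size: int, block_size: int) -> int:
    """Number of whole-or-partial blocks needed to hold the uncompressed data."""
    return math.ceil(uncompressed_size / block_size)

compressed_size = 2 * MB
ratio_percent = 20                                   # compressed = 20% of original
uncompressed_size = compressed_size * 100 // ratio_percent

print(uncompressed_size // MB)                             # 10
print(blocks_needed(uncompressed_size, compressed_size))   # 5
print(blocks_needed(uncompressed_size, 64 * 1024))         # 160
```

A 64KB block would mean 160 expansions for the same data, which shows why a block tied to the compressed size is a sensible default.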
Wiring Up the Event Handlers
With the main compression and decompression routines defined, you can now code the event handlers for the various buttons. The event handler for the Compress button is as follows:
Private Sub btnCompress_Click(ByVal sender As System.Object, _
    ByVal e As System.EventArgs) Handles btnCompress.Click
    '---used to store the compressed data---
    Dim compressedData() As Byte
    '---compress the data---
    If rbGZipStream.Checked Then
        compressedData = Compress("Gzip", System.Text.Encoding.ASCII.GetBytes(txtBefore.Text))
    Else
        compressedData = Compress("Deflate", System.Text.Encoding.ASCII.GetBytes(txtBefore.Text))
    End If
    '---Compress() returns Nothing if an error occurred---
    If compressedData Is Nothing Then Return
    '---copy the compressed data into a string of space-separated byte values---
    Dim i As Integer
    Dim s As New System.Text.StringBuilder()
    For i = 0 To compressedData.Length - 1
        If i <> compressedData.Length - 1 Then
            s.Append(compressedData(i) & " ")
        Else
            s.Append(compressedData(i))
        End If
    Next
    '---display the compressed data as a string---
    txtAfter.Text = s.ToString
End Sub
The text in the txtBefore control is converted into a byte array and then compressed. The compressed data is then converted into a string of space-separated byte values so that it can be displayed in txtAfter.
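The display format used by the Compress button, each byte written as a decimal number with a single space between values, is easy to reproduce. Here is a Python sketch with a hypothetical helper name, encoding the text as ASCII as the VB code does.

```python
import gzip

def to_display_string(data: bytes) -> str:
    # One decimal number per byte, separated by single spaces (e.g. "31 139 8 ...").
    return " ".join(str(b) for b in data)

text = "Compression is useful. Compression is useful. Compression is useful."
compressed = gzip.compress(text.encode("ascii"))
shown = to_display_string(compressed)
print(shown.split()[:2])    # ['31', '139'], the gzip magic bytes
```

The displayed string is several times longer than the compressed bytes themselves, so it is only a convenient view of the data, not a compact representation.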
The event handler for the Decompress button is as follows:
Private Sub btnDecompress_Click(ByVal sender As System.Object, _
    ByVal e As System.EventArgs) Handles btnDecompress.Click
    '---convert the compressed string back into an array of bytes---
    Dim eachByte() As String = txtAfter.Text.Split(" "c)
    Dim data(eachByte.Length - 1) As Byte
    For i As Integer = 0 To eachByte.Length - 1
        data(i) = Convert.ToByte(eachByte(i))
    Next
    '---decompress the data and display the decompressed data---
    Dim result() As Byte
    If rbGZipStream.Checked Then
        result = Decompress("Gzip", data)
    Else
        result = Decompress("Deflate", data)
    End If
    If result IsNot Nothing Then
        txtBefore.Text = System.Text.Encoding.ASCII.GetString(result)
    End If
End Sub
It converts the string displayed in txtAfter back into a byte array and passes it to Decompress(). The decompressed data is then displayed in the txtBefore control.
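Parsing the display string back into bytes is the mirror image of the Compress button's formatting. In this Python sketch (from_display_string is a hypothetical name), values outside 0 to 255 make bytes() raise ValueError, much as Convert.ToByte throws for an out-of-range value.

```python
import gzip

def from_display_string(shown: str) -> bytes:
    # Split on whitespace and convert each decimal token back into one byte.
    return bytes(int(token) for token in shown.split())

original = "round trip through the text box"
shown = " ".join(str(b) for b in gzip.compress(original.encode("ascii")))
restored = gzip.decompress(from_display_string(shown)).decode("ascii")
assert restored == original
```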
The event handler for the "Select file to Compress" button is as follows:
Private Sub btnSelectFile_Click(ByVal sender As System.Object, _
    ByVal e As System.EventArgs) Handles btnSelectFile.Click
    '---let the user select a file to compress---
    Dim openFileDialog1 As New OpenFileDialog()
    ' openFileDialog1.InitialDirectory = "c:\"
    openFileDialog1.Filter = "All files (*.*)|*.*"
    openFileDialog1.RestoreDirectory = True
    If openFileDialog1.ShowDialog() = Windows.Forms.DialogResult.OK Then
        '---read the content of the file into a byte array---
        Dim fileContents As Byte()
        fileContents = My.Computer.FileSystem.ReadAllBytes(openFileDialog1.FileName)
        '---compress the content of the file---
        Dim compressed_data As Byte()
        If rbGZipStream.Checked Then
            compressed_data = Compress("Gzip", fileContents)
        Else
            compressed_data = Compress("Deflate", fileContents)
        End If
        If compressed_data IsNot Nothing Then
            '---create the .gzip file only after compression succeeds---
            Dim filename As String = openFileDialog1.FileName & ".gzip"
            If File.Exists(filename) Then File.Delete(filename)
            Dim fs As FileStream = New FileStream(filename, FileMode.CreateNew, FileAccess.Write)
            '---write the compressed content into the compressed file---
            fs.Write(compressed_data, 0, compressed_data.Length)
            fs.Close()
        End If
    End If
End Sub
It reads the content of the file selected by the user, compresses it, and creates a new file containing the compressed data (with the same file name plus a .gzip extension).
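Here is a Python sketch of the same file workflow, with compress_file as a hypothetical name. Unlike the VB handler, it always writes gzip format (there is no Deflate option), and it streams the file in chunks rather than reading it all into memory as ReadAllBytes does.

```python
import gzip
import os
import shutil
import tempfile

def compress_file(path: str, extension: str = ".gzip") -> str:
    """Write a gzip-compressed copy of path alongside it and return the new path."""
    target = path + extension
    # Mode "wb" replaces any existing file, like the File.Delete call in the VB code.
    with open(path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)    # copies in chunks, not all at once
    return target

# Usage: compress a throwaway file in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    source = os.path.join(folder, "notes.txt")
    with open(source, "wb") as f:
        f.write(b"line of text\n" * 500)
    packed_path = compress_file(source)
    packed_name = os.path.basename(packed_path)
    with gzip.open(packed_path, "rb") as f:
        round_trip_ok = f.read() == b"line of text\n" * 500
```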
The event handler for the "Select file to Decompress" button is as follows:
Private Sub btnDecompressFile_Click(ByVal sender As System.Object, _
    ByVal e As System.EventArgs) Handles btnDecompressFile.Click
    '---let the user select a file to decompress---
    Dim openFileDialog1 As New OpenFileDialog()
    ' openFileDialog1.InitialDirectory = "c:\"
    openFileDialog1.Filter = "All GZIP files (*.gzip)|*.gzip"
    openFileDialog1.RestoreDirectory = True
    If openFileDialog1.ShowDialog() = Windows.Forms.DialogResult.OK Then
        '---read the content of the compressed file into a byte array---
        Dim fileContents As Byte()
        fileContents = My.Computer.FileSystem.ReadAllBytes(openFileDialog1.FileName)
        '---decompress the content of the file---
        Dim uncompressed_data As Byte()
        If rbGZipStream.Checked Then
            uncompressed_data = Decompress("Gzip", fileContents)
        Else
            uncompressed_data = Decompress("Deflate", fileContents)
        End If
        If uncompressed_data IsNot Nothing Then
            '---create the decompressed file by removing the ".gzip" extension---
            Dim filename As String = openFileDialog1.FileName.Substring(0, openFileDialog1.FileName.Length - 5)
            If File.Exists(filename) Then File.Delete(filename)
            Dim fs As FileStream = New FileStream(filename, FileMode.CreateNew, FileAccess.Write)
            '---write the decompressed content to the file---
            fs.Write(uncompressed_data, 0, uncompressed_data.Length)
            fs.Close()
        End If
    End If
End Sub
It reads the content of the file selected by the user, decompresses it, and creates a new file containing the decompressed data (named by removing the .gzip extension).
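The reverse operation can be sketched in Python as well. The hypothetical decompress_file helper strips the extension the same way the VB code's Substring call removes the last five characters, but it first checks that the name really ends in ".gzip" instead of assuming it.

```python
import gzip
import os
import shutil
import tempfile

def decompress_file(path: str, extension: str = ".gzip") -> str:
    """Decompress path into a file without the extension and return the new path."""
    if not path.endswith(extension):
        raise ValueError(f"Expected a file ending in {extension!r}: {path}")
    target = path[: -len(extension)]       # strip the extension from the name
    with gzip.open(path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)       # "wb" overwrites any existing file
    return target

# Usage: create a small .gzip file, then decompress it.
with tempfile.TemporaryDirectory() as folder:
    packed = os.path.join(folder, "report.csv.gzip")
    with gzip.open(packed, "wb") as f:
        f.write(b"a,b,c\n1,2,3\n")
    restored_path = decompress_file(packed)
    restored_name = os.path.basename(restored_path)
    with open(restored_path, "rb") as f:
        restored_bytes = f.read()
```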
Testing the Application
Press F5 to test the application (see Figure 2).
Figure 2. Testing the application: Select the compression algorithm to use, and then compress a text string or the content of a file.
Note the following observations:
· Compressing a small amount of text can actually produce compressed output that is larger than the original text.
· Different text produces different compression ratios, even when the number of characters is the same.
· Text files compress best; they yield the best compression ratios.
· Binary files such as .exe and .jpg files usually do not compress well and may produce a compression ratio above 100 percent, in which case compression is pointless.
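These observations can be reproduced with a short Python experiment using gzip. The sample data is made up, and os.urandom bytes stand in for data that is already compressed (such as a JPEG), on the assumption that such data looks random to Deflate.

```python
import gzip
import os

def ratio_percent(data: bytes) -> float:
    """gzip-compressed size as a percentage of the original size."""
    return round(len(gzip.compress(data)) / len(data) * 100, 2)

samples = {
    "short text": b"Hi there",                     # 8 bytes: header overhead dominates
    "repetitive text": b"abcabcabc" * 100,          # 900 highly redundant bytes
    "95 x 'a'": b"a" * 95,                         # same length as the next sample...
    "95 distinct chars": bytes(range(32, 127)),   # ...but nothing repeats
    "random bytes": os.urandom(10_000),            # stand-in for already-compressed data
}
ratios = {name: ratio_percent(data) for name, data in samples.items()}
for name, value in ratios.items():
    print(f"{name:>18}: {value}%")
```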
Note that the .NET implementations of the GZip and Deflate algorithms are less efficient than the third-party gzip tools on the market. For example, you might use the .NET classes to compress 10MB of data down to 4MB, only to find that a third-party tool produces a smaller result. In addition, these compression classes cannot handle data larger than 4GB. However, the .NET implementation can decompress files that were compressed with other gzip tools on the market.
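One reason output sizes differ between tools is that the Deflate format leaves encoders free to trade speed for size, while any conforming decoder can read the result. This Python sketch compresses the same made-up text at gzip levels 1 and 9 and shows that both decompress to identical bytes. It illustrates the format only, not how .NET's encoder chooses its settings.

```python
import gzip

text = b" ".join(
    f"record {i}: status=ok, value={i * 37 % 101}".encode() for i in range(2000)
)

fast = gzip.compress(text, compresslevel=1)
small = gzip.compress(text, compresslevel=9)

# Different effort levels give different sizes, but the format is the same,
# so any gzip decoder restores identical bytes from either file.
assert gzip.decompress(fast) == gzip.decompress(small) == text
print(len(text), len(fast), len(small))
```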
Summary
In this article, you have seen how to use the compression classes in .NET 2.0. Although their implementation is not as efficient as those of non-Microsoft tools on the market, they give you an easy (and free) way to add compression to your .NET applications.