Tracking and Resuming Large File Downloads in ASP.NET

Handling large file downloads in Web applications has long been notoriously difficult, so on most sites a user whose download is interrupted is simply out of luck. That no longer has to be the case: you can make your ASP.NET application support large file downloads that can be resumed. Using the methods described in this article, you can track the download process, which lets you work with dynamically created files, and you can do so without legacy ISAPI dynamic-link libraries or unmanaged C++ code.

Providing clients with a service for downloading files from the Internet is easy, right? Simply copy the downloadable files to your Web application directory, publish the links, and let IIS do all the work. However, serving files can turn into more than a pain in the neck: you may not want the entire world to have access to your data, you may not want the server cluttered with hundreds of static files, and you may even want to offer temporary files that are built only when the client starts the download.

Unfortunately, the default way IIS responds to download requests cannot achieve any of this. So, to gain control over the download process, developers typically link to a custom .aspx page that checks the user's credentials, builds the downloadable file, and pushes the file to the client using the following code:

Response.WriteFile
Response.End()

And that is where the real trouble begins.

What's the problem?

The WriteFile method looks perfect: it streams the file's binary data to the client. Until recently, however, WriteFile was a notorious memory hog that loaded the entire file into the server's RAM in order to serve it (it actually used about twice the size of the file). For large files, this can cause memory problems on the server and may recycle the ASP.NET process. In June 2004, Microsoft released a hotfix to solve the problem. That hotfix is now part of the .NET Framework 1.1 Service Pack 1 (SP1).

The hotfix introduces the TransmitFile method, which reads the disk file into a small memory buffer and then begins transferring it. Although this solves the memory and recycling problems, it is still unsatisfactory. You cannot control the life cycle of the response: you cannot tell whether the download completed correctly, you cannot tell whether it was interrupted, and (if you created a temporary file) you do not know whether and when you can delete the file. Worse, if the download does fail, TransmitFile starts again from the beginning of the file on the client's next attempt.

One possible solution, implementing the Background Intelligent Transfer Service (BITS), is not feasible for most sites because it defeats any effort to stay independent of the client's browser and operating system.

A satisfactory solution builds on Microsoft's first attempt to solve the memory problems caused by WriteFile (see KB article 812406). That article demonstrates a chunked download process that reads data from a file stream. Before the server sends a block of bytes to the client, it checks the Response.IsClientConnected property to see whether the client is still connected. If the client is still connected, the server continues to stream bytes; otherwise it stops, so that it does not send data needlessly.

This is the approach we use, especially when downloading temporary files. When IsClientConnected returns False, you know that the download was interrupted and that you should keep the file; conversely, when the process completes successfully, you can delete the temporary file. In addition, to resume an interrupted download, you need to start sending from the point in the file at which the client connection failed during the previous attempt.
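
The block-by-block loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the KB article: SendRange, its parameters, and the isClientConnected delegate are hypothetical names. In a real handler the target stream would presumably be Response.OutputStream and the delegate would wrap Response.IsClientConnected.

```vbnet
Imports System
Imports System.IO

Module ChunkedSendDemo
    ' Copies bytes startPos..endPos (inclusive) from source to target in blocks.
    ' Stops early when isClientConnected() returns False.
    ' Returns the number of bytes actually sent.
    Function SendRange(source As Stream, target As Stream,
                       startPos As Long, endPos As Long,
                       isClientConnected As Func(Of Boolean),
                       Optional blockSize As Integer = 8192) As Long
        Dim buffer(blockSize - 1) As Byte
        Dim remaining As Long = endPos - startPos + 1
        Dim sent As Long = 0
        source.Seek(startPos, SeekOrigin.Begin)
        While remaining > 0 AndAlso isClientConnected()
            Dim toRead As Integer = CInt(Math.Min(CLng(blockSize), remaining))
            Dim bytesRead As Integer = source.Read(buffer, 0, toRead)
            If bytesRead = 0 Then Exit While ' unexpected end of file
            target.Write(buffer, 0, bytesRead)
            target.Flush()
            remaining -= bytesRead
            sent += bytesRead
        End While
        Return sent
    End Function

    Sub Main()
        Dim data(99) As Byte ' a 100-byte stand-in for the file
        Using src As New MemoryStream(data), dst As New MemoryStream()
            ' Client stays connected: all 50 requested bytes are sent.
            Console.WriteLine(SendRange(src, dst, 10, 59, Function() True, 16))
        End Using
        Using src As New MemoryStream(data), dst As New MemoryStream()
            ' Simulated client that disconnects after two blocks of 16 bytes.
            Dim checks As Integer = 0
            Dim stillConnected As Func(Of Boolean) =
                Function()
                    checks += 1
                    Return checks <= 2
                End Function
            Console.WriteLine(SendRange(src, dst, 0, 99, stillConnected, 16))
        End Using
    End Sub
End Module
```

Comparing the returned count with the requested length is one way to decide whether the download finished or broke; the first call above returns 50 and the second returns 32.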

HTTP protocol and header support

The HTTP protocol supports headers that can be used to handle interrupted downloads. With a handful of HTTP headers, you can enhance your download process so that it complies with the HTTP protocol specification. That specification, with its concept of ranges, provides all the information needed to resume an interrupted download.

Here is how it works. First, if the server supports resumable downloads, it sends the Accept-Ranges header in the initial response. The server also sends an entity tag (ETag) header, which contains a unique identifying string.

The following listing shows some of the headers that IIS sends to the client in response to an initial download request; they pass details of the requested file to the client.

HTTP/1.1 200 OK
Connection: close
Date: Tue, Oct 15:11:23 GMT
Accept-Ranges: bytes
Last-Modified: Sun, Sep 15:52:45 GMT
ETag: "47febb2cfd76c41:2062"
Cache-Control: private
Content-Type: application/x-zip-compressed
Content-Length: 2844011

If the download is interrupted after these headers have been received, Internet Explorer (IE) sends the ETag value and a Range header back to the server in subsequent download requests. The following listing shows some of the headers that IE sends to the server when it tries to resume an interrupted download.

GET http://192.168.100.100/download.zip HTTP/1.0
Range: bytes=822603-
Unless-Modified-Since: Sun, Sep 15:52:45 GMT
If-Range: "47febb2cfd76c41:2062"

These headers show that IE caches the entity tag provided by IIS and sends it back to the server in the If-Range header, which is one way to make sure that the download resumes from exactly the same file. Unfortunately, not all browsers work the same way. Other HTTP headers that a client may send to validate the file are If-Match, If-Unmodified-Since, and Unless-Modified-Since. The specification does not state explicitly which headers client software must support or use. As a result, some clients send no validation headers at all, while IE uses only If-Range and Unless-Modified-Since. It is best to check for all of them in code; that way your application follows the HTTP specification closely and works with a variety of browsers. The Range header specifies the requested byte range; in this example, it is the starting point from which the server should resume the file stream.
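
As a rough illustration of the If-Range check, the sketch below compares the value sent by the client with the file's current entity tag. CanResume and its parameter names are hypothetical and are not taken from the article's download code. It handles only the quoted entity-tag form shown above; any other value is treated as a mismatch, which makes the server fall back to a fresh download.

```vbnet
Imports System

Module IfRangeDemo
    ' Returns True when the partial download may be resumed.
    Function CanResume(ifRangeHeader As String, currentETag As String) As Boolean
        ' No If-Range header: nothing contradicts the Range request.
        If String.IsNullOrEmpty(ifRangeHeader) Then Return True
        ' The client sends the entity tag enclosed in double quotes.
        Dim quoted As String = """" & currentETag & """"
        Return String.Equals(ifRangeHeader.Trim(), quoted, StringComparison.Ordinal)
    End Function

    Sub Main()
        Dim etag As String = "47febb2cfd76c41:2062"
        Console.WriteLine(CanResume("""47febb2cfd76c41:2062""", etag)) ' same file: True
        Console.WriteLine(CanResume("""some-older-tag""", etag))       ' file changed: False
        Console.WriteLine(CanResume(Nothing, etag))                    ' header absent: True
    End Sub
End Module
```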

When IIS receives a resume request of this kind, it sends back a response that contains the following headers:

HTTP/1.1 206 Partial Content
Content-Range: bytes 822603-2844010/2844011
Accept-Ranges: bytes
Last-Modified: Sun, Sep 15:52:45 GMT
ETag: "47febb2cfd76c41:2062"
Cache-Control: private
Content-Type: application/x-zip-compressed
Content-Length: 2021408

Note that this response differs from the HTTP response to the original download request: the status code of the resumed download is 206, whereas the original download returned 200. This indicates that the content sent over the wire is only part of the file. This time the Content-Range header indicates the exact number and position of the bytes being sent.
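
The arithmetic behind these two headers is simple but easy to get wrong by one byte, because both ends of a range are inclusive. The following sketch reproduces the values from the listing above; the module and function names are hypothetical.

```vbnet
Imports System

Module ContentRangeDemo
    ' Number of bytes in an inclusive range firstByte..lastByte.
    Function RangeLength(firstByte As Long, lastByte As Long) As Long
        Return lastByte - firstByte + 1
    End Function

    ' Value of the Content-Range header for a single range.
    Function ContentRange(firstByte As Long, lastByte As Long, fileLength As Long) As String
        Return "bytes " & firstByte.ToString() & "-" & lastByte.ToString() &
               "/" & fileLength.ToString()
    End Function

    Sub Main()
        Dim fileLength As Long = 2844011
        Dim firstByte As Long = 822603          ' from "Range: bytes=822603-"
        Dim lastByte As Long = fileLength - 1   ' open-ended range runs to the last byte
        Console.WriteLine("Content-Range: " & ContentRange(firstByte, lastByte, fileLength))
        Console.WriteLine("Content-Length: " & RangeLength(firstByte, lastByte).ToString())
    End Sub
End Module
```

Running it prints `Content-Range: bytes 822603-2844010/2844011` and `Content-Length: 2021408`, matching the 206 response.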

IE is very picky about these headers. If the initial response does not contain an ETag header, IE never attempts to resume the download. The other clients I tested do not use the ETag header; they rely simply on the file name and the requested range, and use the Last-Modified header if they attempt to validate the file at all.

A closer look at the HTTP protocol

The headers shown in the previous section are enough to build a working resumable download, but they do not cover the HTTP specification completely.

In a single request, the Range header can ask for multiple ranges, a feature called "multipart ranges". Do not confuse this with segmented downloading, which almost all download tools use to increase download speed. These tools claim to speed up downloads by opening two or more concurrent connections, each of which requests a different range of the file.

Multipart ranges do not open multiple connections; instead, they let the client software request, for example, the first 10 and the last 10 bytes of a file in a single request/response cycle.

To be honest, I have never found a piece of software that uses this feature. But I refuse to write "it is not completely HTTP compliant" in the code's disclaimer, and omitting the feature would surely invoke Murphy's Law. In any case, multipart messages are used in e-mail transmission to separate header information, plain text, and attachments.



Sample code

Now that we know how the client and the server exchange headers to make a download resumable, we can combine this knowledge with the idea of streaming a file in blocks and add reliable download management to an ASP.NET application.

The way to gain control over the download process is to intercept the download request from the client, read the headers, and respond appropriately. Before .NET, you had to write an ISAPI (Internet Server API) application to do this, but the .NET Framework provides the IHttpHandler interface which, when implemented in a class, lets you intercept and process requests using only .NET code. This means that your application has full control over, and full responsibility for, the download process and never involves IIS's automatic functionality.

The sample code contains a custom HttpHandler class (ZipHandler) in the HttpHandler.vb file. ZipHandler implements the IHttpHandler interface and processes requests for all .zip files.

To test the sample code, create a new virtual directory in IIS and copy the source files there. Create a file called download.zip in this directory (note that IIS and ASP.NET cannot handle downloads larger than 2 GB, so make sure your file does not exceed that limit). Configure your IIS virtual directory to map the .zip extension to aspnet_isapi.dll.

HttpHandler class: ZipHandler

After the .zip extension has been mapped to ASP.NET, IIS invokes the ProcessRequest method of the ZipHandler class (see the download code) each time a client requests a .zip file from the server.

The ProcessRequest method first creates an instance of the custom FileInformation class (see the download code), which encapsulates the state of the download (for example, in progress, broken, and so on). The example hard-codes the path of the download.zip sample file. If you use this code in your own application, you need to modify it to open the requested file.

' Use objRequest to detect which file was requested, and open objFile with that file.
' For example: objFile = New Download.FileInformation(<full file name>)
objFile = New Download.FileInformation( _
   objContext.Server.MapPath("~/download.zip"))

Next, the program runs a series of validation checks using the HTTP headers described above (if the request supplies them). Each check is encapsulated in a small private function that returns True if the validation succeeds. If a check fails, the response ends immediately and the appropriate StatusCode value is sent.

' Reject the request unless the method is GET or HEAD
If Not objRequest.HttpMethod.Equals(HTTP_METHOD_GET) AndAlso _
   Not objRequest.HttpMethod.Equals(HTTP_METHOD_HEAD) Then
   ' Currently only the GET and HEAD methods are supported
   objResponse.StatusCode = 501 ' Not Implemented
ElseIf Not objFile.Exists Then
   ' The requested file could not be found
   objResponse.StatusCode = 404 ' Not Found
ElseIf objFile.Length > Int32.MaxValue Then
   ' The file is too large
   objResponse.StatusCode = 413 ' Request Entity Too Large
ElseIf Not ParseRequestHeaderRange(objRequest, alRequestedRangesBegin, alRequestedRangesEnd, _
   objFile.Length, bIsRangeRequest) Then
   ' The Range request contains unusable values
   objResponse.StatusCode = 400 ' Bad Request
ElseIf Not CheckIfModifiedSince(objRequest, objFile) Then
   ' The entity has not been modified
   objResponse.StatusCode = 304 ' Not Modified
ElseIf Not CheckIfUnmodifiedSince(objRequest, objFile) Then
   ' The entity has been modified since the requested date
   objResponse.StatusCode = 412 ' Precondition Failed
ElseIf Not CheckIfMatch(objRequest, objFile) Then
   ' The entity does not match the request
   objResponse.StatusCode = 412 ' Precondition Failed
ElseIf Not CheckIfNoneMatch(objRequest, objResponse, objFile) Then
   ' The entity does match the If-None-Match request.
   ' The response code is set inside the CheckIfNoneMatch function
Else
   ' Preliminary checks succeeded

Among these preliminary functions, ParseRequestHeaderRange (see the download code) checks whether the client has requested a range of the file, which means a partial download. The method sets bIsRangeRequest to True if a range was requested, and it returns False if the requested range is invalid (an invalid range is one that exceeds the file size or contains nonsensical numbers). If a range was requested, the CheckIfRange method verifies the If-Range header.
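
To make the range check more concrete, here is a minimal sketch of a Range header parser. It is not the ParseRequestHeaderRange function from the download code; TryParseRanges, TryParseDigits, and the list-of-pairs result are assumptions made for illustration. The sketch also supports the multipart form (several comma-separated ranges) and the suffix form "-N" for the last N bytes. Unlike the behavior described above, it clamps an end position that lies beyond the file instead of rejecting it, which is what the HTTP specification allows.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization

Module RangeParserDemo
    ' Accepts plain digits only (no sign, no whitespace).
    Private Function TryParseDigits(text As String, ByRef value As Long) As Boolean
        Return Long.TryParse(text, NumberStyles.None, CultureInfo.InvariantCulture, value)
    End Function

    ' Parses a header value such as "bytes=0-9,-10" into inclusive
    ' {first, last} pairs. Returns False if any range is malformed
    ' or starts beyond the end of the file.
    Function TryParseRanges(header As String, fileLength As Long,
                            ranges As List(Of Long())) As Boolean
        ranges.Clear()
        Const Prefix As String = "bytes="
        If header Is Nothing OrElse
           Not header.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase) Then
            Return False
        End If
        For Each part As String In header.Substring(Prefix.Length).Split(","c)
            Dim bounds As String() = part.Trim().Split("-"c)
            If bounds.Length <> 2 Then Return False
            Dim first As Long
            Dim last As Long
            If bounds(0).Length = 0 Then
                ' Suffix form "-N": the last N bytes of the file.
                Dim suffix As Long
                If Not TryParseDigits(bounds(1), suffix) OrElse suffix = 0 Then Return False
                first = Math.Max(0L, fileLength - suffix)
                last = fileLength - 1
            Else
                If Not TryParseDigits(bounds(0), first) Then Return False
                If bounds(1).Length = 0 Then
                    last = fileLength - 1 ' open-ended form "N-"
                ElseIf Not TryParseDigits(bounds(1), last) Then
                    Return False
                End If
                last = Math.Min(last, fileLength - 1)
            End If
            If first > last Then Return False
            ranges.Add(New Long() {first, last})
        Next
        Return True
    End Function

    Sub Main()
        Dim ranges As New List(Of Long())
        ' The first 10 and the last 10 bytes of a 2844011-byte file.
        If TryParseRanges("bytes=0-9,-10", 2844011, ranges) Then
            For Each r As Long() In ranges
                Console.WriteLine(r(0).ToString() & "-" & r(1).ToString())
            Next
        End If
    End Sub
End Module
```

For the sample header the sketch prints `0-9` and `2844001-2844010`; a request such as `bytes=5-2` is rejected.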

If the requested range is valid, the code calculates the size of the response message. If the client requested multiple ranges, the response size also includes the length of the headers sent with each part.

If the value of a validation header sent by the client cannot be verified, the program treats the download request as an initial request rather than a partial download and sends a new download stream from the beginning of the file.

If bIsRangeRequest AndAlso CheckIfRange(objRequest, objFile) Then
   ' This is a range request.
   ' If the range arrays contain more than one entry, it is also a multipart range request
   bMultipart = CBool(alRequestedRangesBegin.GetUpperBound(0) > 0)
   ' Loop through each range to get the entire response length
   For iLoop = alRequestedRangesBegin.GetLowerBound(0) To alRequestedRangesBegin.GetUpperBound(0)
      ' The length of the content (for this range)
      iResponseContentLength += Convert.ToInt32(alRequestedRangesEnd(iLoop) - _
         alRequestedRangesBegin(iLoop)) + 1
      If bMultipart Then
         ' For a multipart range request, add the length of the intermediate headers to be sent
         iResponseContentLength += MULTIPART_BOUNDARY.Length
         iResponseContentLength += objFile.ContentType.Length
         iResponseContentLength += alRequestedRangesBegin(iLoop).ToString.Length
         iResponseContentLength += alRequestedRangesEnd(iLoop).ToString.Length
         iResponseContentLength += objFile.Length.ToString.Length
         ' 49 is the length of the line breaks and other required characters in a multipart download
         iResponseContentLength += 49
      End If
   Next iLoop

   If bMultipart Then
      ' For a multipart range request,
      ' also add the length of the last intermediate header to be sent
      iResponseContentLength += MULTIPART_BOUNDARY.Length
      ' 8 is the length of the dashes and line breaks
      iResponseContentLength += 8
   Else
      ' Not a multipart download, so state the response range in the initial HTTP headers
      objResponse.AppendHeader(HTTP_HEADER_CONTENT_RANGE, "bytes " & _
         alRequestedRangesBegin(0).ToString & "-" & _
         alRequestedRangesEnd(0).ToString & "/" & _
         objFile.Length.ToString)
   End If
   ' Range response
   objResponse.StatusCode = 206 ' Partial Content
Else
   ' This is not a range request, or the requested range's entity ID does not match the current entity ID,
   ' so start a new download.
   ' The size of the response equals the full length of the file
   iResponseContentLength = Convert.ToInt32(objFile.Length)
   ' Return the normal OK status
   objResponse.StatusCode = 200
End If

The server must then send several important response headers, such as the content length, the ETag, and the content type of the file:

' Write the content length to the response
objResponse.AppendHeader(HTTP_HEADER_CONTENT_LENGTH, iResponseContentLength.ToString)
' Write the last-modified date to the response
objResponse.AppendHeader(HTTP_HEADER_LAST_MODIFIED, objFile.LastWriteTimeUtc.ToString("r"))
' Tell the client software that we accept range requests
objResponse.AppendHeader(HTTP_HEADER_ACCEPT_RANGES, HTTP_HEADER_ACCEPT_RANGES_BYTES)
' Write the file's entity tag to the response (enclosed in quotes)
objResponse.AppendHeader(HTTP_HEADER_ENTITY_TAG, """" & objFile.EntityTag & """")
' Write the content type to the response
If bMultipart Then
   ' Multipart messages have this special type.
   ' The actual MIME type of the file is written to the response later
   objResponse.ContentType = MULTIPART_CONTENTTYPE
Else
   ' A single-part message has the content type of the file
   objResponse.ContentType = objFile.ContentType
End If

Everything needed for the download is now in place, and the file transfer can begin. The code uses a FileStream object to read blocks of bytes from the file and sets the State property of the FileInformation instance objFile to fsDownloadInProgress. As long as the client remains connected, the server reads blocks of bytes from the file and sends them to the client. For a multipart download, the code also sends the part-specific headers. If the client disconnects, the server sets the file state to fsDownloadBroken. If the server finishes sending the requested ranges, it sets the state to fsDownloadFinished (see the download code).


FileInformation helper class

As the ZipHandler section showed, FileInformation is a helper class that encapsulates download state information (such as in progress, broken, and so on).

To create an instance of FileInformation, pass the path of the requested file to the class constructor:

Public Sub New(ByVal sPath As String)
   m_objFile = New System.IO.FileInfo(sPath)
End Sub

FileInformation uses a System.IO.FileInfo object to obtain information about the file and exposes it as properties of its own (such as whether the file exists, its name, its size, and so on). The class also exposes a DownloadState enumeration that describes the various states of a download request:

<Flags()> Enum DownloadState
   ' Clear: no download in progress; the file can be maintained
   fsClear = 1
   ' Locked: a dynamically created file must not be changed
   fsLocked = 2
   ' In Progress: the file is locked and a download is in progress
   fsDownloadInProgress = 6
   ' Broken: the file is locked; a download was in progress but was canceled
   fsDownloadBroken = 10
   ' Finished: the file is locked and the download completed
   fsDownloadFinished = 18
End Enum
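
The values 6, 10, and 18 are not arbitrary: each one combines the fsLocked bit (2) with a bit of its own (4, 8, or 16), so every download state also counts as locked. The sketch below shows how such a flag can be tested. The enumeration values are copied from the listing above, while the IsLocked helper is a hypothetical addition.

```vbnet
Imports System

Module DownloadStateDemo
    <Flags()> Enum DownloadState
        fsClear = 1
        fsLocked = 2
        fsDownloadInProgress = 6   ' 2 + 4
        fsDownloadBroken = 10      ' 2 + 8
        fsDownloadFinished = 18    ' 2 + 16
    End Enum

    ' True when the fsLocked bit is set in the given state.
    Function IsLocked(state As DownloadState) As Boolean
        Return (state And DownloadState.fsLocked) = DownloadState.fsLocked
    End Function

    Sub Main()
        Console.WriteLine(IsLocked(DownloadState.fsDownloadBroken)) ' True
        Console.WriteLine(IsLocked(DownloadState.fsClear))          ' False
    End Sub
End Module
```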

FileInformation also provides the EntityTag property. The sample code hard-codes this value because it uses only one download file, and that file never changes. In a real application, where you provide multiple files or even build files dynamically, your code must supply a unique EntityTag value for each file. The value must also change every time the file is changed or modified. This lets the client software verify that the blocks of bytes it has already downloaded are still up to date. The following is the part of the sample code that returns the hard-coded EntityTag value:

Public ReadOnly Property EntityTag() As String
   ' The EntityTag is used in the initial (200) response to the client and in resume requests from the client
   Get
      ' Create a unique string for the file.
      ' Note that the unique code must stay the same as long as the file has not changed.
      ' If the file does change or is modified, the code must change as well.
      Return "Myexamplefileid"
   End Get
End Property

A simple and reasonably safe EntityTag could consist of the file name and the date the file was last modified. Whatever method you use, you must make sure that the value is truly unique and cannot be confused with the EntityTag of another file. In my own applications, I name dynamically created files according to customer and postal code indexes and store the GUIDs used as EntityTag values in a database.
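
One way to derive such a value from the file name and the last-modified date is sketched below. This is an assumption-laden example rather than the article's code: BuildEntityTag is a hypothetical helper, the choice of SHA-256 and of a 16-character result is arbitrary, and lower-casing the path assumes a case-insensitive file system such as the one IIS typically runs on.

```vbnet
Imports System
Imports System.Security.Cryptography
Imports System.Text

Module EntityTagDemo
    ' Builds a tag that stays stable while the file is unchanged and
    ' differs as soon as the path or the last-write time differs.
    Function BuildEntityTag(fullName As String, lastWriteTimeUtc As DateTime) As String
        Dim raw As String = fullName.ToLowerInvariant() & "|" &
                            lastWriteTimeUtc.Ticks.ToString()
        Using hasher As SHA256 = SHA256.Create()
            Dim hash As Byte() = hasher.ComputeHash(Encoding.UTF8.GetBytes(raw))
            ' Keep the first 8 bytes as 16 hexadecimal characters.
            Return BitConverter.ToString(hash, 0, 8).Replace("-", "").ToLowerInvariant()
        End Using
    End Function

    Sub Main()
        Dim written As New DateTime(2024, 1, 1, 0, 0, 0, DateTimeKind.Utc)
        Dim tagBefore As String = BuildEntityTag("C:\files\download.zip", written)
        Dim tagAfter As String = BuildEntityTag("C:\files\download.zip", written.AddSeconds(1))
        Console.WriteLine(tagBefore)
        Console.WriteLine(tagBefore = tagAfter) ' False: the file was modified
    End Sub
End Module
```

In a handler, the two arguments would presumably come from the FileInfo object that FileInformation already wraps (FullName and LastWriteTimeUtc).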

The ZipHandler class reads and sets the public State property. When the download is complete, it sets State to fsDownloadFinished; at that point you can delete the temporary file. In general, you need to call the Save method to persist the state.

Public Property State() As DownloadState
   Get
      Return m_nState
   End Get
   Set(ByVal nState As DownloadState)
      m_nState = nState
      ' Optional action: you can delete the file automatically at this point.
      ' If the state is set to Finished, the file is no longer needed.
      ' If nState = DownloadState.fsDownloadFinished Then
      '    Clear()
      ' Else
      '    Save()
      ' End If
      Save()
   End Set
End Property

Whenever the file state changes, ZipHandler should call the Save method to persist the state of the file so that it can be shown to the user later. You can also use Save to store the EntityTag you have built. Do not keep the state and EntityTag values of a file in Application, Session, or Cache; the information must outlive the lifetime of those objects.

Private Sub Save()
   ' Save the download state of the file to a database or XML file.
   ' Of course, if you do not create files dynamically, you do not need to save this state.
End Sub

As mentioned earlier, the sample code handles only one existing file (download.zip), but you can extend the program to create the requested file on demand.

When you test the sample code, your local system or LAN may be too fast for you to interrupt the download, so I recommend using a slow connection (throttling the site's bandwidth in IIS is one way to simulate this) or putting the server on the Internet.

Downloading files on the client is still difficult. An ISP's faulty or misconfigured Web caching servers may cause large downloads to fail, for example through poor download performance or premature termination of the session. If the file size exceeds 255 MB, you should encourage customers to use third-party download manager software, although some of the latest browsers have a basic download manager built in.

If you want to extend the sample code further, it is useful to review the HTTP specification. You can create MD5 checksums for your downloads and send them in the Content-MD5 header, which gives clients a way to verify the integrity of the downloaded file. The sample code does not handle HTTP methods other than GET and HEAD.
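
As a starting point for the checksum idea, the sketch below computes a Content-MD5 value, which is the Base64 encoding of the 128-bit MD5 digest of the content. The ContentMd5 helper is hypothetical and not part of the sample code. It reads from a stream so that a large file would not have to be loaded into memory, and the demo uses an empty MemoryStream in place of a real file.

```vbnet
Imports System
Imports System.IO
Imports System.Security.Cryptography

Module ContentMd5Demo
    ' Returns the Base64-encoded MD5 digest of everything in the stream.
    Function ContentMd5(content As Stream) As String
        Using hasher As MD5 = MD5.Create()
            Return Convert.ToBase64String(hasher.ComputeHash(content))
        End Using
    End Function

    Sub Main()
        Using emptyContent As New MemoryStream(New Byte() {})
            ' The digest of empty content is a well-known constant.
            Console.WriteLine(ContentMd5(emptyContent)) ' 1B2M2Y8AsgTpgAmY7PhCfg==
        End Using
    End Sub
End Module
```

In a handler, the result could be sent with AppendHeader under the header name "Content-MD5". For dynamically created files it would make sense to compute the value once and store it together with the download state.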

