I have recently been working on the ePartner project, which involves file upload. I have implemented file upload before, but only for small files of no more than 2 MB. This time the uploads are much larger, and I could find little material on the subject to study. Web-based file upload can use either FTP or HTTP. FTP is stable in transmission, but security is a serious problem. In addition, having the FTP server read the user database to obtain permissions is inconvenient for users. That leaves HTTP, which offers three approaches: PUT, WebDAV, and RFC 1867. The first two are not suitable for uploading large files, so we currently use form-based file upload in HTML as defined by RFC 1867.
I. A brief introduction to RFC 1867 (Form-based File Upload in HTML)
1. HTML forms with file submission
The existing HTML specification defines eight possible values for the TYPE attribute of the INPUT element: CHECKBOX, HIDDEN, IMAGE, PASSWORD, RADIO, RESET, SUBMIT, and TEXT. In addition, when a form uses the POST method, its ENCTYPE attribute defaults to "application/x-www-form-urlencoded".
RFC 1867 makes two changes to HTML:
1) It adds a FILE option for the TYPE attribute of the INPUT element.
2) It allows the INPUT tag to carry an ACCEPT attribute, which specifies the list of file types or file formats that may be uploaded.
In addition, the standard defines a new MIME type, multipart/form-data, and specifies the behavior expected when processing a form with ENCTYPE="multipart/form-data" and/or a form containing an <INPUT TYPE="file"> tag.
For example, when the author of an HTML form wants the user to upload one or more files, they can write the following:
<FORM ENCTYPE="multipart/form-data" ACTION="_URL_" METHOD=POST>
File to process:
<INPUT NAME="userfile1" TYPE="file">
<INPUT TYPE="submit" VALUE="Send File">
</FORM>
The change to the HTML DTD is to add one option to the InputType entity. In addition, the RFC suggests that the ACCEPT attribute of the INPUT tag take a comma-separated list of file types.
... (other elements) ...
<!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
                       RADIO | SUBMIT | RESET |
                       IMAGE | HIDDEN | FILE)">
<!ELEMENT INPUT - O EMPTY>
<!ATTLIST INPUT
        TYPE %InputType TEXT
        NAME CDATA #IMPLIED  -- required for all but submit and reset --
        VALUE CDATA #IMPLIED
        SRC %URI #IMPLIED  -- for image inputs --
        CHECKED (CHECKED) #IMPLIED
        SIZE CDATA #IMPLIED  -- like NUMBERS,
                                but delimited with comma, not space --
        MAXLENGTH NUMBER #IMPLIED
        ALIGN (top|middle|bottom) #IMPLIED
        ACCEPT CDATA #IMPLIED  -- list of content types --
        >
... (other elements) ...
2. Deferring file transmission
In some cases it is advisable for the server to validate certain elements of the form data (such as user name and account) before it prepares to receive the file data. After some consideration, however, the RFC authors concluded that a server wishing to do this should use a series of forms: either the elements already validated are sent back to the client as "hidden" fields, or the form is arranged so that the elements needing validation come first. This way, only servers that build complex applications have to maintain transaction state themselves, while simple applications can stay simple.
The HTTP protocol may require the total content length of the whole transaction. Even where it does not, HTTP clients are encouraged to supply the total length of the uploaded files, so that a busy server can tell whether the content is too large to process and can return an error code and close the connection instead of waiting until all the data has arrived. Some existing CGI implementations require the content length for every POST transaction.
If the INPUT tag carries a MAXLENGTH attribute, the client should treat its value as the maximum number of bytes the server will accept for a transferred file. In this way the server can hint to the client, before the upload starts, how much space it has available. Note that this is only a hint: the server's actual requirements may change between the time the form is created and the time the file is uploaded.
In any case, an HTTP server may abort a transfer partway through if the file being received is too large.
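The early-rejection idea above can be sketched as follows. This is a minimal illustration in Python rather than ASP.NET, and the function name `exceeds_limit`, the plain-dictionary `headers` argument, and the 2 MB limit are all assumptions made for the example, not part of RFC 1867 or of any particular server.

```python
def exceeds_limit(headers, max_bytes):
    """Decide, from the declared Content-Length alone, whether to refuse an upload.

    `headers` is assumed to be a plain dict of request headers (hypothetical shape).
    Returns True when the request should be rejected before any body is read.
    """
    declared = headers.get("Content-Length")
    if declared is None:
        # Length unknown: the limit must instead be enforced while reading the body.
        return False
    try:
        length = int(declared)
    except ValueError:
        return True  # malformed header, refuse
    return length < 0 or length > max_bytes


# Usage: a hypothetical 2 MB limit
LIMIT = 2 * 1024 * 1024
print(exceeds_limit({"Content-Length": "3000000"}, LIMIT))
print(exceeds_limit({"Content-Length": "1024"}, LIMIT))
```

A check like this only covers clients that declare a length honestly, which is why the server may still have to abort mid-transfer, as noted above.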
3. Other approaches to binary data transmission
Some people have suggested using a new MIME top-level type "aggregate", such as aggregate/mixed, or a content-transfer-encoding of "packet", to describe binary data of indeterminate length instead of relying on multipart boundaries. The RFC authors do not oppose this, but it would require additional design and standardization work before "aggregate" is accepted and understood. The multipart mechanism, on the other hand, is well established, is easy to implement on both the sending client and the receiving server, and handles combinations of binary data as efficiently as the other methods.
4. Example
Assume that the server provides the following HTML:
<FORM ACTION="http://server.dom/cgi/handle"
      ENCTYPE="multipart/form-data"
      METHOD=POST>
What is your name? <INPUT TYPE=TEXT NAME=submitter>
What files are you sending? <INPUT TYPE=FILE NAME=pics>
</FORM>
Suppose the user enters "Joe Blow" in the name field and, for "What files are you sending?", selects a text file "file1.txt".
The client may then send back the following data:
Content-type: multipart/form-data, boundary=AaB03x

--AaB03x
content-disposition: form-data; name="field1"

Joe Blow
--AaB03x
content-disposition: form-data; name="pics"; filename="file1.txt"
Content-Type: text/plain

... contents of file1.txt ...
--AaB03x--
If the user also selects a second file, the image "file2.gif", the client may send the following data:
Content-type: multipart/form-data, boundary=AaB03x

--AaB03x
content-disposition: form-data; name="field1"

Joe Blow
--AaB03x
content-disposition: form-data; name="pics"
Content-type: multipart/mixed, boundary=BbC04y

--BbC04y
Content-disposition: attachment; filename="file1.txt"
Content-Type: text/plain

... contents of file1.txt ...
--BbC04y
Content-disposition: attachment; filename="file2.gif"
Content-type: image/gif
Content-Transfer-Encoding: binary

... contents of file2.gif ...
--BbC04y--
--AaB03x--
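To make the wire format concrete, here is a minimal sketch that parses the single-file example with the MIME parser in Python's standard library. It is an illustration only, not how ASP.NET or any upload component does it. The helper name `parse_form` is hypothetical, and the sketch writes the boundary parameter with a semicolon, which is the MIME parameter syntax the parser expects, where the RFC example above uses a comma.

```python
from email import message_from_bytes

# The single-file example, with explicit CRLF line endings and the blank
# line that separates each part's headers from its body.
RAW = (
    b"Content-Type: multipart/form-data; boundary=AaB03x\r\n"
    b"\r\n"
    b"--AaB03x\r\n"
    b'Content-Disposition: form-data; name="field1"\r\n'
    b"\r\n"
    b"Joe Blow\r\n"
    b"--AaB03x\r\n"
    b'Content-Disposition: form-data; name="pics"; filename="file1.txt"\r\n'
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"... contents of file1.txt ...\r\n"
    b"--AaB03x--\r\n"
)


def parse_form(raw_message):
    """Return a list of (field name, filename or None, body bytes) tuples."""
    message = message_from_bytes(raw_message)
    fields = []
    for part in message.get_payload():  # one entry per boundary-delimited part
        name = part.get_param("name", header="content-disposition")
        filename = part.get_filename()  # None for ordinary form fields
        body = part.get_payload(decode=True)
        fields.append((name, filename, body))
    return fields


for name, filename, body in parse_form(RAW):
    print(name, filename, len(body))
```

Note that this parser needs the whole message in memory, which is exactly the limitation the rest of the article is concerned with for large files.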
II. Two ways to process file uploads under RFC 1867
1. Read all the uploaded data at once, then parse and process it.
After reading a good deal of code, I found that component-free upload programs and some COM components currently use the Request.BinaryRead method: they obtain all the uploaded data in one go and then parse and process it. This is why uploading large files is slow. Leaving IIS timeouts aside, even when a file of several hundred MB does get uploaded, parsing it takes a long while.
2. Write to the hard disk while receiving the file.
I looked into commercial components from outside China. Popular ones include Power-Web, AspUpload, ActiveFile, ABCUpload, aspSmartUpload, and SA-FileUp. The best of them are AspUpload and SA-FileUp: they claim to handle 2 GB files (the EE edition of SA-FileUp has no file size limit at all), and they are also very efficient. Can the programming language really make that much difference in efficiency? From what I read, I believe they all operate directly on the file stream, so file size is not a constraint. The foreign components are not perfect, however. After AspUpload processes a large file, its memory usage is astonishing; around 1 GB is common. SA-FileUp is good, but a cracked copy is hard to find. I then found two .NET upload components, Lion.Web.UpLoadModule and AspnetUpload, which also operate on the file stream. However, their upload speed and CPU usage are not as good as those of the foreign commercial components.
I ran a test uploading a 1 GB file over the LAN. AspUpload averaged 4.4 Mb/s, with CPU usage of 10-15% and memory usage of 700 MB. SA-FileUp was about the same. AspnetUpload reached only 1.5 M/s at its fastest and averaged 700 K/s, with CPU usage of 15-39%. Test environment: PIII 800, 100 M of memory, and a LAN. I think AspnetUpload may be slow because it writes to the hard disk while receiving the file; the price of low resource usage is a lower transfer speed. Still, I have to admire the foreign programs for keeping CPU usage so low.
III. ASP.NET file upload problems
We run into all sorts of problems when uploading large files with ASP.NET. Setting maxRequestLength to a large value does not fully solve the problem, because ASP.NET blocks until the entire file has been loaded into memory and only then processes it. In fact, if the file is large, Internet Explorer often displays "The page cannot be displayed - Cannot find server or DNS Error". This error apparently cannot be caught. Why? Because it is a client-side error, so Application_Error on the server cannot handle it.
IV. ASP.NET large file upload solution
The solution is to use the underlying HttpWorkerRequest and its GetPreloadedEntityBody and ReadEntityBody methods to read the data in blocks from the pipe that IIS creates for ASP.NET. Chris Hynes provides such a solution (using an HttpModule), which lets you upload large files and display the upload progress in real time.
Lion.Web.UpLoadModule and AspnetUpload both use this solution.
Solution principle:
An HttpHandler implements functionality similar to an ISAPI extension: it processes the request (Request) and sends the response (Response).
Solution highlights:
1. HttpHandler or HttpModule
A. Intercept the request before the ASP.NET process handles it.
B. Read and write the data in blocks.
C. Track the upload progress in real time and update the meta information.
2. Use the underlying HttpWorkerRequest, processing the file stream with its GetPreloadedEntityBody and ReadEntityBody methods.
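// Obtain the HttpWorkerRequest behind the current request.
IServiceProvider provider = (IServiceProvider)HttpContext.Current;
HttpWorkerRequest wr = (HttpWorkerRequest)provider.GetService(typeof(HttpWorkerRequest));
// The part of the request body that was already read before our code ran.
byte[] bs = wr.GetPreloadedEntityBody();
....
// If more data is pending, read the rest in fixed-size blocks.
if (!wr.IsEntireEntityBodyIsPreloaded())
{
    int n = 1024;
    byte[] bs2 = new byte[n];
    // ReadEntityBody returns the number of bytes read; 0 means no more data.
    while (wr.ReadEntityBody(bs2, n) > 0)
    {
        .....
    }
}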
3. Custom multipart MIME parser
Automatically detect the MIME boundary delimiters.
Write the file out in blocks, for example to a temporary file.
Update the Application state in real time (ReceivingData, Error, Complete).
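The "read and write in blocks, tracking progress" idea behind the highlights above can be sketched in a few lines. This is a language-neutral illustration written in Python, not the HttpModule itself: the function name `save_in_chunks`, the `on_progress` callback, and the in-memory streams standing in for the request pipe and the temporary file are all assumptions made for the example.

```python
import io


def save_in_chunks(source, target, chunk_size=1024, on_progress=None):
    """Copy `source` to `target` one block at a time.

    Only one block is held in memory at any moment, so memory use does not
    grow with the size of the upload. Returns the total number of bytes copied.
    """
    total = 0
    while True:
        block = source.read(chunk_size)
        if not block:  # empty read: the sender has no more data
            break
        target.write(block)
        total += len(block)
        if on_progress is not None:
            on_progress(total)  # e.g. update shared upload status
    return total


# Usage: in-memory streams stand in for the request body and the temp file.
incoming = io.BytesIO(b"x" * 2500)
stored = io.BytesIO()
progress = []
written = save_in_chunks(incoming, stored, chunk_size=1024, on_progress=progress.append)
print(written, progress)
```

A real parser would additionally have to recognize boundary delimiters that straddle two blocks, which this sketch deliberately leaves out.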