While working on a download tool recently, I found that resources downloaded from CSDN were not being intercepted. After some analysis I finally worked out why, and in the process solved the garbled file name problem I had run into earlier when downloading files. I am reposting this article in the hope that it helps others, and as a memo for myself.
The contents of the header in the CSDN download are as follows:
Key                  Value
Content-Disposition  attachment; filename*=utf-8''Reflector.rar
To explain this notation, mainly the part that follows filename, I am reposting the article below:
************************************************************************
Recently I ran into the following requirement in a project: implement a forced download in the browser (that is, force the browser to pop up the download dialog), with the file name kept identical to the one the user originally uploaded (which may contain non-ASCII characters).
The first requirement is easy to implement: use the HTTP header Content-Disposition: attachment, optionally together with Content-Type: application/octet-stream to be safe. The second requirement is the painful one, because it involves the encoding of the header itself (the file name is placed inside Content-Disposition as the filename parameter). It is well known that Content-Type in the HTTP header can specify the encoding of the body, but how is the encoding of the header itself specified? Does the header even allow non-ASCII characters?
If you ignore the encoding problem, then congratulations: you will certainly get garbled file names when downloading on some system and browser combination. If you try to search for a solution, congratulations again: you will find a pile of contradictory answers (I can responsibly tell you that 99% of them are non-standard tricks). Let's see how to solve this problem cleanly and correctly.
I took a lot of detours while exploring this problem: from my own experiments, to Google and Baidu (searching in both Chinese and English), to reading the source code of Discuz and other classic projects. Opinions differed and there was no consensus. Finally I thought of going back to the RFCs to look for an answer in the standards themselves, and sure enough, that paid off. Because the investigation was so tortuous, I will write down the standard practice first.
The Content-Disposition header should be set like this:
Content-Disposition: attachment; filename="$encoded_fname"; filename*=utf-8''$encoded_fname
Here, $encoded_fname is the original file name, encoded as UTF-8 and then percent-encoded according to RFC 3986 (in PHP, use the rawurlencode() function). When the parameters are written on a single line, it is recommended to separate them with a space.
In addition, for compatibility with IE6, make sure the original file name includes an extension made of English (ASCII) letters!
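The construction described above can be sketched in Python. This is only an illustration under stated assumptions: the article itself uses PHP's rawurlencode(), and urllib.parse.quote with safe="" is used here as a rough equivalent. The function name content_disposition and the sample file name are hypothetical.

```python
from urllib.parse import quote


def content_disposition(original_name: str) -> str:
    """Build a Content-Disposition value carrying both filename and filename*.

    Hypothetical helper; not taken from the original article.
    """
    # Percent-encode the UTF-8 bytes of the name. With safe="" only the
    # RFC 3986 unreserved characters (letters, digits, "-", ".", "_", "~")
    # are left unencoded, so a space becomes %20.
    encoded = quote(original_name, safe="", encoding="utf-8")
    return (
        f'attachment; filename="{encoded}"; '
        f"filename*=utf-8''{encoded}"
    )


print(content_disposition("报告 2011.rar"))
```

Note that the ASCII extension (.rar) passes through unencoded, which is what the IE6 compatibility remark above relies on.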
Now let's see why this is necessary and why it works.
First, according to the HTTP 1.1 specification (RFC 2616 Section 4), the HTTP message format is based on the old ARPA Internet Text Messages format (RFC 822 Section 3), under which a message may only be ASCII-encoded. RFC 2616 Section 2.2 further states that to use another character set in TEXT, the string must be encoded into ASCII using the rules of RFC 2047 (a rule originally designed for MIME extensions; it commonly uses Base64 encoding, and its format differs greatly from percent-encoding). In short, by the standard, text data in an HTTP header must be ASCII-encoded.
Parameter                        Standard             Notes
filename="TEXT"                  RFC 2616             TEXT must be ASCII; it is taken literally as the "original" text
filename*=charset'lang'encoded   RFC 2047 extension   Note the subtle difference in format; uses Base64 encoding (the encoded result is also ASCII)
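To make the difference between the two encodings concrete, the following Python sketch encodes the same name once as an RFC 2047 encoded-word with Base64 and once with RFC 3986 percent-encoding. The sample file name is an assumption chosen for illustration; both results contain only ASCII characters.

```python
import base64
from urllib.parse import quote

name = "文件.txt"  # hypothetical sample name
raw = name.encode("utf-8")

# RFC 2047 encoded-word with Base64 ("B") encoding: =?charset?B?data?=
encoded_word = "=?utf-8?B?" + base64.b64encode(raw).decode("ascii") + "?="

# RFC 3986 percent-encoding of the same UTF-8 bytes
percent = quote(name, safe="")

print(encoded_word)  # =?utf-8?B?5paH5Lu2LnR4dA==?=
print(percent)       # %E6%96%87%E4%BB%B6.txt
```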
However, when the HTTP 1.1 standard was published in 1999, the Content-Disposition header was not part of the formal standard; it was borrowed directly from the MIME standard because it was already widely used (RFC 2616 Section 19.5.1). As a result, few browsers supported multilingual encoding in Content-Disposition, which amounted to an "extension of an extension" (in fact, the RFC 2047 multilingual encoding proposed in the HTTP 1.1 draft was never supported by mainstream browsers).
But the need is real, so the browsers came up with their own approaches:
- IE supports a hybrid of the two formats: filename="encoded_text" (where encoded_text is percent-encoded). According to RFC 2616, the content inside the quotation marks should be taken literally, even if it "looks like an encoded string", but IE "automatically" decodes such a file name, provided the file name has an extension that is not encoded (that is, an ordinary extension made of English letters)!
- Some other browsers support a more brutal approach: allowing UTF-8 encoded strings to be used directly in filename="TEXT"!
The behaviors of these two kinds of browsers are incompatible with each other. So you can inspect the User-Agent, use the former approach for IE and the latter for other browsers, and generally get a "just works" result (Discuz does this). For Opera and Safari, however, this is not guaranteed to work.
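For illustration only, the User-Agent workaround described above might look roughly like this in Python. This is a sketch of the legacy, non-standard approach, not Discuz's actual code; the function name is hypothetical, and detecting IE by the "MSIE" token is an assumption.

```python
from urllib.parse import quote


def legacy_content_disposition(original_name: str, user_agent: str) -> str:
    """Legacy workaround: choose the filename format from the User-Agent."""
    if "MSIE" in user_agent:  # assumption: old IE identifies itself with "MSIE"
        # IE: percent-encoded UTF-8 inside the quoted filename parameter
        encoded = quote(original_name, safe="")
        return f'attachment; filename="{encoded}"'
    # Other browsers: the raw name, to be sent on the wire as UTF-8 bytes
    return f'attachment; filename="{original_name}"'


ie6 = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
print(legacy_content_disposition("文件.txt", ie6))
```

The second branch produces a non-ASCII header value, which is exactly why this approach falls outside the standard.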
Times moved on. In 2010, RFC 5987 was released, formally specifying how multilingual encoding is handled in HTTP headers: it uses the parameter*=charset'lang'value format, similar to the MIME extension, except that value is percent-encoded according to RFC 3986 Section 2.1, and browsers are required to support at least ISO-8859-1 and UTF-8. Then, in 2011, RFC 6266 was released, formally incorporating Content-Disposition into the HTTP standard, reaffirming the multilingual encoding method of RFC 5987, and giving an example that solves the backward compatibility problem, which is the example I gave at the outset:
Content-Disposition: attachment; filename="encoded_text"; filename*=utf-8''encoded_text
In this example, newer versions of Firefox, Chrome, Opera, Safari, and other browsers support the new standard filename* parameter and use it in preference, so although they do not support filename="encoded_text", there is no problem. UTF-8 is used here simply because it is a charset the standard requires browsers to support. Older versions of IE do not recognize the trailing filename* parameter, so they ignore it and use the old filename. This solves the multi-browser, multi-language compatibility problem: it needs no User-Agent detection and it conforms to the standard.
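The preference order described above can be sketched from the client's point of view. The following Python is a simplified illustration, not a full RFC 6266 parser: the function name pick_filename and the regular expressions are assumptions, and cases such as escaped quotes inside the quoted string are ignored.

```python
import re
from typing import Optional
from urllib.parse import unquote


def pick_filename(header_value: str) -> Optional[str]:
    """Prefer filename* (charset'lang'value); fall back to plain filename."""
    ext = re.search(
        r"filename\*\s*=\s*([\w-]+)'([\w-]*)'([^;\s]+)", header_value, re.I
    )
    if ext:
        charset, _lang, value = ext.groups()
        # Percent-decode the value using the charset named in the header
        return unquote(value, encoding=charset)
    plain = re.search(r'filename\s*=\s*"([^"]*)"', header_value, re.I)
    # A client that predates filename* would only ever reach this branch
    return plain.group(1) if plain else None


print(pick_filename("attachment; filename*=utf-8''Reflector.rar"))  # Reflector.rar
```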
P.S. Why use the rawurlencode() function in PHP? Because it is the one that really performs "percent URL encoding" as defined by RFC 3986. For historical reasons, the urlencode() function implements the similar encoding rules used in HTTP POST, which is why the RFC-conformant function ended up with the odd name. The difference between the two is that the former encodes a space as %20, while the latter encodes it as a plus sign (+). If you use the latter, a file name containing spaces will have them turned into plus signs when downloaded in IE6. In general, you should not use urlencode() for this purpose (some versions of Discuz mistakenly use it for file name encoding, which causes the bug of spaces turning into plus signs).
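The same distinction exists in other languages. As a rough analogy (the correspondence to the PHP functions is an assumption for illustration, not an exact equivalence), Python's urllib.parse.quote behaves like rawurlencode() for spaces, while quote_plus behaves like urlencode():

```python
from urllib.parse import quote, quote_plus

name = "my file.txt"  # hypothetical file name containing a space

print(quote(name, safe=""))  # my%20file.txt  (space becomes %20)
print(quote_plus(name))      # my+file.txt    (space becomes +)
```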
Reprinted from: Correctly handling the HTTP header encoding problem when the browser downloads files (Content-Disposition)