Handling HTTP header encoding correctly for file downloads (Content-Disposition)

Last Update:2017-07-22 Source: Internet

Author: User

Tags http post rfc urlencode


Transferred from: https://blog.robotshell.org/2012/deal-with-http-header-encoding-for-file-download/

Recently I ran into this case in a project: a download has to be forced (that is, the browser must show the download dialog rather than try to render certain file formats itself), and the file name must stay exactly as the user uploaded it (so it may contain non-ASCII characters).

The first requirement is easy: send the HTTP headers Content-Disposition: attachment and Content-Type: application/octet-stream together and it is foolproof. The second requirement is the painful one, because it involves the encoding of the header itself (the file name goes into the filename parameter of Content-Disposition). As we all know, the Content-Type header can declare the encoding of the content (body), but how is the encoding of a header itself specified? Does a header even allow non-ASCII characters?

If you ignore the encoding problem, file names will certainly come out garbled on some system and browser combinations. If you try to solve it, you will likely find a pile of contradictory solutions (I can responsibly tell you that 99% of them are non-standard tricks). Let's see how to solve this problem gracefully and properly.

I took a lot of detours exploring this: from my own experiments, to Google (searching in both English and Chinese), to reading the source code of Discuz and other classic projects. Opinions differed everywhere and there was no consensus. Finally I thought of going back to the RFCs to find the answer in the standards, and sure enough it paid off. Since the path was so tortuous, I will state the standard approach first. Content-Disposition should be set like this:

Content-Disposition: attachment;
                     filename="$encoded_fname";
                     filename*=utf-8''$encoded_fname

Here $encoded_fname is the original file name, encoded as UTF-8 and then percent-encoded according to RFC 3986 (in PHP, use the rawurlencode() function). The header may also be written on a single line (separating the parts with a space is recommended).

In addition, for compatibility with IE6, make sure the original file name includes an ASCII extension!
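As a rough illustration of the recommended header, here is a minimal Python sketch. The helper name content_disposition is made up for this example, and urllib.parse.quote with safe="" is used as an approximate stand-in for PHP's rawurlencode(); treat it as a sketch under those assumptions, not a drop-in implementation.

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Build a Content-Disposition value carrying both filename forms.

    Hypothetical helper for illustration only.
    """
    # Percent-encode the UTF-8 bytes of the name.
    # safe="" also escapes "/", roughly matching PHP's rawurlencode().
    encoded = quote(filename, safe="", encoding="utf-8")
    return (
        "attachment; "
        f'filename="{encoded}"; '
        f"filename*=utf-8''{encoded}"
    )


header_value = content_disposition("€ rates.txt")
print(header_value)
```

The resulting value is pure ASCII, and the ".txt" extension stays unencoded because "." and ASCII letters are never escaped by quote().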

Getting to the bottom of it

Let's look at why we do it this way, and why it works.

First, HTTP 1.1 is defined by RFC 2616 (RFC 2068 was the earliest version; RFC 2616 replaced it and is the most widely used, and it was in turn replaced by other RFCs, as mentioned later). Its message format is based on the ancient ARPA Internet Text Messages format, and ARPA messages can only be ASCII (RFC 822 Section 3). RFC 2616 Section 2.2 stresses again that for TEXT (per Section 4.2, a header field value is TEXT) to use a different character set, the string must be encoded using the rules of RFC 2047. Note that this rule was originally an extension for MIME (e-mail), and its format is very different from percent-encoding. An example from MIME:

Subject: =?iso-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
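To see what such a MIME encoded-word contains, it can be decoded with Python's standard email.header module. This is only a small sketch to show how different the RFC 2047 format is from percent-encoding; the sample string is the well-known example from RFC 2047.

```python
from email.header import decode_header

# An RFC 2047 encoded-word: =?charset?encoding?payload?=
# "B" means the payload is Base64.
raw = "=?iso-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?="

# decode_header() returns a list of (bytes, charset) pairs.
decoded_bytes, charset = decode_header(raw)[0]
text = decoded_bytes.decode(charset)
print(charset, text)
```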

When RFC 2616 was published in 1999, the Content-Disposition header was not yet part of the formal HTTP protocol; it was borrowed directly from the MIME standard because it was so widely used (RFC 2616 Section 19.5.1). As a result, almost no browser supported multi-language encoding for Content-Disposition, an "extension of an extension". In fact, the RFC 2047 scheme that RFC 2616 recommends for multilingual encoding has never been supported by mainstream browsers, so we need not bother with this MIME scheme.

But the problem really did need solving, so browsers came up with their own approaches:

IE supports percent-encoding directly in filename: filename="$encoded_text" (not MIME encoding!). According to RFC 2616, a quoted value that is not MIME-encoded should be taken literally, even if it "looks like" a percent-encoded string, but IE decodes such a file name automatically, provided that the file name has an unencoded (that is, ASCII) extension!
Some other browsers support a cruder approach: they allow a UTF-8 encoded string to be used directly in filename="TEXT"! This, too, directly violates the RFC 2616 requirement that HTTP headers be ASCII.

The behavior of these two kinds of browsers is mutually incompatible. So you can sniff the UA, use the first approach for IE and the second for other browsers, and generally get something that just works (this is what Discuz does). For Opera and Safari, however, this does not necessarily work.
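For comparison, the UA-sniffing workaround described above might look roughly like the following Python sketch. The function name legacy_filename_param and the simple "MSIE" substring test are assumptions made for illustration; this is not Discuz's actual code, and it is shown only to make clear what the standard approach replaces.

```python
from urllib.parse import quote


def legacy_filename_param(filename: str, user_agent: str) -> str:
    """Pick a non-standard filename form from the User-Agent (hypothetical helper)."""
    if "MSIE" in user_agent:
        # IE: percent-encoded name; IE decodes it when an ASCII extension is present.
        encoded = quote(filename, safe="")
        return f'filename="{encoded}"'
    # Other browsers: raw UTF-8 text, which breaks the ASCII-only header rule.
    return f'filename="{filename}"'


ie6 = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
print(legacy_filename_param("€ rates.txt", ie6))
print(legacy_filename_param("€ rates.txt", "Mozilla/5.0 Firefox/3.6"))
```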

Times moved on, and in 2010 RFC 5987 was released. It formally specifies how HTTP header parameters carry multi-language encodings, using the format parameter*=charset'lang'value, where:

charset and lang are case-insensitive.
lang labels the language of the field, for use by screen-reading software or for language-specific rendering; it may be left empty.
value is percent-encoded according to RFC 3986 Section 2.1, and browsers are required to support at least ISO-8859-1 and UTF-8.
When parameter and parameter* both appear in an HTTP header, the browser should use the latter.
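The rules above can be sketched from the receiving side in Python. Both function names (parse_ext_value, pick_filename) are hypothetical, and the sketch assumes the parameters have already been split into a dict; it does no validation of the charset or of malformed input.

```python
from urllib.parse import unquote


def parse_ext_value(ext_value: str) -> tuple:
    """Split charset'lang'value and percent-decode the value (simplified)."""
    # Only the first two single quotes are delimiters; lang may be empty.
    charset, lang, encoded = ext_value.split("'", 2)
    # charset and lang are case-insensitive, so normalise them.
    return charset.lower(), lang.lower(), unquote(encoded, encoding=charset)


def pick_filename(params: dict) -> str:
    """Prefer filename* over filename when both are present."""
    if "filename*" in params:
        return parse_ext_value(params["filename*"])[2]
    return params["filename"]


parsed = parse_ext_value("UTF-8'en'%e2%82%ac%20rates")
chosen = pick_filename({"filename": "EURO rates",
                        "filename*": "utf-8''%e2%82%ac%20rates"})
print(parsed, chosen)
```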

The advantage is that backward compatibility is preserved: first, the HTTP header is still ASCII-only; second, older browsers that do not support this standard will, following the RFC 2616 of the day, treat parameter* as an unknown parameter name and ignore it. Then in 2011 RFC 6266 was released, formally incorporating Content-Disposition into the HTTP standard, reaffirming the multi-language encoding method of RFC 5987, and giving an example that addresses backward compatibility:

Content-Disposition: attachment;
                     filename="EURO rates";
                     filename*=utf-8''%e2%82%ac%20rates

In this example, the value of filename is an English phrase with the same meaning. This follows RFC 2616, under which an ordinary parameter should not be encoded, and UTF-8 is used simply because the standard makes its support mandatory. However, think again: the older browsers still on the market are mostly IE. So we can modify the example and put the percent-encoded string directly into the filename parameter:

Content-Disposition: attachment;
                     filename="%e2%82%ac%20rates.txt";
                     filename*=utf-8''%e2%82%ac%20rates.txt

Newer versions of Firefox, Chrome, Opera, Safari and other browsers support and prefer the new standard filename*, so it does not matter that they do not automatically decode filename. Older versions of IE do not recognize filename*, so they ignore it and use the old filename (the only minor flaw is that an ASCII extension is required). This solves multi-browser, multi-language compatibility cleanly: no UA sniffing is needed, and it follows the standard more closely.

P.S. Why use PHP's rawurlencode() function? Because it is the function that really implements percent-encoding as defined by RFC 3986. For historical reasons, the name urlencode() was already taken by a function implementing the similar encoding rules used in HTTP POST, hence the odd name. The difference is that the former encodes a space as %20, while the latter encodes it as +. If you use the latter, IE6 will turn the spaces in a downloaded file name into plus signs. In general, you should rarely need urlencode() (some versions of Discuz have exactly this bug: they use it for file name encoding, so spaces become plus signs).
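The same distinction exists in other languages. As a small sketch, Python's urllib.parse.quote (with safe="") behaves roughly like rawurlencode(), while quote_plus follows the form-encoding rules of urlencode(); the sample file name is made up for illustration.

```python
from urllib.parse import quote, quote_plus

name = "my report.txt"

# RFC 3986 style percent-encoding: a space becomes %20.
raw_style = quote(name, safe="")

# Form (HTTP POST) style encoding: a space becomes "+".
form_style = quote_plus(name)

print(raw_style)
print(form_style)
```

Only the first form is suitable for filename and filename*, since a "+" in a header parameter is kept as a literal plus sign.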


