This article is another discussion of the question "How to encode the filename parameter of the Content-Disposition header in HTTP?". The question was raised a long time ago, but in my opinion there is still no satisfactory answer, so today I am raising it again, together with my solution.
I wrote a C++-based CGI application that serves files whose names contain special characters, such as: weird # € = { } ; filename.txt.
There doesn't seem to be a single way to encode Content-Disposition in HTTP so that it works in every browser. I tested Internet Explorer, Firefox, Chrome, Opera and Safari.
But I'm happy to use a different encoding method for each browser.
Let's talk about the approach I'm using:
Internet Explorer (add double quotes and replace the # and ; characters with %23 and %3B):
Content-Disposition: attachment; filename="weird %23 € = { } %3B filename.txt"
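The Internet Explorer variant above can be sketched in a few lines. This is only an illustration in Python, not the C++ code of the CGI application; the function name ie_content_disposition is made up for this example, and it handles only the two characters discussed here (# and ;).

```python
def ie_content_disposition(filename: str) -> str:
    """Build the quoted header form described above for Internet Explorer.

    Only '#' and ';' are percent-encoded; everything else is passed through.
    Assumption: the name contains no double quote or literal '%' that would
    need extra handling.
    """
    escaped = filename.replace("#", "%23").replace(";", "%3B")
    return f'Content-Disposition: attachment; filename="{escaped}"'


print(ie_content_disposition("weird # € = { } ; filename.txt"))
```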
Firefox (double quotes are still useful, no other changes required):
Content-Disposition: attachment; filename="weird # € = { } ; filename.txt"
There is also a viable alternative:
Content-Disposition: attachment; filename*=UTF-8''weird%20%23%20%e2%82%ac%20%3D%20%7B%20%7D%20%3B%20filename.txt
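The filename*=UTF-8'' form percent-encodes the UTF-8 bytes of the file name. A minimal sketch in Python (the article's application is in C++, so this is not its actual code) could look like the following; the function name rfc5987_content_disposition is an assumption of this example.

```python
from urllib.parse import quote


def rfc5987_content_disposition(filename: str) -> str:
    """Build a filename*=UTF-8''... header line.

    quote() with safe="" percent-encodes the UTF-8 bytes of every character
    except ASCII letters, digits and "_", ".", "-", "~".
    """
    encoded = quote(filename, safe="", encoding="utf-8")
    return f"Content-Disposition: attachment; filename*=UTF-8''{encoded}"


print(rfc5987_content_disposition("weird # € = { } ; filename.txt"))
```

quote() emits uppercase hex digits. Percent-encoded hex digits are case-insensitive, so %E2%82%AC and %e2%82%ac denote the same bytes.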
Chrome:
The following problems occur when you use double quotes only:
- The = symbol in the file name is lost
- The € symbol is replaced by a - character
The following form works:
Content-Disposition: attachment; filename*=UTF-8''weird%20%23%20%e2%82%ac%20%3D%20%7B%20%7D%20%3B%20filename.txt
Opera:
Using double quotes or the filename*=UTF-8'' syntax produces the following problems:
- Multiple contiguous spaces in the file name are reduced to a single space
- Paired {} are lost: "Ab{}cd.txt" becomes "Abcd.txt"
- A semicolon in the file name truncates everything after it: "ABC; Def.txt" becomes "ABC"
This is due to file name length limitations; the following example works in Opera:
Content-Disposition: attachment; filename*=UTF-8''weird%20%23%20%e2%82%ac%20%3D%20%7B%20%7D%20%3B%20filename.txt
Safari:
If you use double quotes, the € symbol is replaced with an invisible character. Unfortunately, I found no suitable way to solve this small problem.
There is one approach, taken from the original question mentioned above:
Content-Disposition: attachment; filename*=UTF-8''weird%20%23%20%80%20%3D%20%7B%20%7D%20%3B%20filename.txt
But this scheme is useless to me: the escape sequences are not decoded, so the browser tries to save the file under the name of the CGI application. The reason is that I used an incorrect encoding here: I did not encode the name in accordance with RFC 5987. But Safari does not use that encoding method anyway, so the encoding of the € character remains unsolved for now.
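One plausible reading of the incorrect encoding, offered here as an assumption rather than something the author states, is that %80 is the single-byte Windows-1252 code for €, while the header declares UTF-8, where € takes three bytes. The Python sketch below shows the difference; the helper name decodes_as_utf8 is invented for this example.

```python
from urllib.parse import quote, unquote

euro = "€"

# One byte (0x80) in Windows-1252, three bytes (E2 82 AC) in UTF-8.
print(euro.encode("cp1252"))
print(euro.encode("utf-8"))
print(quote(euro, safe=""))


def decodes_as_utf8(escaped: str) -> bool:
    """Return True if the percent-encoded text is valid UTF-8 once decoded."""
    try:
        unquote(escaped, encoding="utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as_utf8("%E2%82%AC"))
print(decodes_as_utf8("%80"))
```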
By the way, here is a UTF-8 encoding converter: http://www.rishida.net/tools/conversion/
All the tests mentioned above were run with the latest version of each browser.
PS: I tried all the special characters on the keyboard, but I only discuss the ones that cause problems.
By the way, I also tried a file name that includes all possible special characters, and the results differ from those described above.
The complete test string:
0 ! § $ % & ( ) = ` ´ { }    [ ] ² ³ @ € µ ^ ° ~ + ' # - _ . , ; ü ä ö ß 9.jpg
After encoding:
0%20%21%20%C2%A7%20%24%20%25%20%26%20%28%20%29%20%3D%20%60%20%C2%B4%20%7B%20%7D%20%20%20%20%5B%20%5D%20%C2%B2%20%C2%B3%20%40%20%E2%82%AC%20%C2%B5%20%5E%20%C2%B0%20~%20%2B%20%27%20%23%20-%20_%20.%20%2C%20%3B%20%C3%BC%20%C3%A4%20%C3%B6%20%C3%9F%209.jpg
The Content-Disposition header is then written like this:
Content-Disposition: attachment; filename*=UTF-8''0%20%21%20%C2%A7%20%24%20%25%20%26%20%28%20%29%20%3D%20%60%20%C2%B4%20%7B%20%7D%20%20%20%20%5B%20%5D%20%C2%B2%20%C2%B3%20%40%20%E2%82%AC%20%C2%B5%20%5E%20%C2%B0%20~%20%2B%20%27%20%23%20-%20_%20.%20%2C%20%3B%20%C3%BC%20%C3%A4%20%C3%B6%20%C3%9F%209.jpg
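As a sanity check on the encoded test string, it can be decoded and re-encoded to see whether it round-trips. This is a small Python sketch, not part of the original test setup; it assumes Python 3.7 or later, where quote() leaves "~" unescaped.

```python
from urllib.parse import quote, unquote

# The encoded test string from the article, split only for readability.
ENCODED = (
    "0%20%21%20%C2%A7%20%24%20%25%20%26%20%28%20%29%20%3D%20%60%20%C2%B4"
    "%20%7B%20%7D%20%20%20%20%5B%20%5D%20%C2%B2%20%C2%B3%20%40%20%E2%82%AC"
    "%20%C2%B5%20%5E%20%C2%B0%20~%20%2B%20%27%20%23%20-%20_%20.%20%2C%20%3B"
    "%20%C3%BC%20%C3%A4%20%C3%B6%20%C3%9F%209.jpg"
)

# Decode strictly so that any invalid UTF-8 sequence raises an error.
decoded = unquote(ENCODED, encoding="utf-8", errors="strict")
print(decoded)

# Re-encode every character except letters, digits and "_.-~".
reencoded = quote(decoded, safe="")
print(reencoded == ENCODED)
```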
The following test results were obtained:
Firefox: works correctly
Chrome: works correctly
IE: displays the name starting at "$ % & ( ) = ..." and ending with "ü ä ö ß 9.jpg", that is, the first 6 characters are lost.
Explanation: this happens because the browser limits the length of the file name and discards characters from the beginning of the string. I didn't dig into this, but a normal file name can be about 200 characters long, and file names containing many escape sequences can be even longer, though not more than 250 characters. So that's really not a problem.
Opera: displays 0 ! § $ % & ( ) = ` ´ [ ] ² ³ @ € µ ^ ° ~ + ' # - _ . , ; ü ä ö ß 9.jpg, so like IE it also loses some characters.
Note: I shortened my test string because I suspected that Opera has a length limitation like IE's.
This encoding does not work correctly in Safari.
The test shows that the filename*=UTF-8'' syntax with a percent-encoded file name works in browsers other than Safari. In Safari, however, only the € character gets replaced when double quotes are used, so the problem is not a big one.
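Taken together, these findings suggest choosing the header form per browser. The following Python sketch shows one way to do that; the function name content_disposition_for and the browser labels are hypothetical, and detecting the browser from the User-Agent header is deliberately left out.

```python
from urllib.parse import quote


def content_disposition_for(browser: str, filename: str) -> str:
    """Pick a header form per browser family, following the test results.

    `browser` is a hypothetical label such as "ie", "firefox", "chrome",
    "opera" or "safari"; how it is derived is outside this sketch.
    """
    if browser == "safari":
        # Safari: quoted raw name; per the tests, € is still not preserved.
        return f'Content-Disposition: attachment; filename="{filename}"'
    # Other tested browsers: percent-encoded UTF-8 in the filename* form.
    encoded = quote(filename, safe="", encoding="utf-8")
    return f"Content-Disposition: attachment; filename*=UTF-8''{encoded}"


for name in ("firefox", "safari"):
    print(content_disposition_for(name, "weird # € = { } ; filename.txt"))
```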
About file name length:
Some problems with file name length were found in the test.
In Internet Explorer, a file name can be up to 147 characters long. If no escape sequence appears in the string, this is the total length of the file name. If the string contains escape sequences, the situation changes somewhat: the final file name is shorter than 147 characters. The rule is a little strange, and I could not work out exactly what it is. With two escape sequences, the file name was shortened by 5 characters, but with a lot of escape sequences it ended up only two characters shorter.
Other browsers do not have this length limitation. As long as the operating system can handle the file name, the file is saved successfully. I then tested a file name 250 characters long: Chrome prompted me to shorten the file name, Opera automatically shortened it to 220 characters, and Firefox automatically shortened it to 210 characters. Opera shortens from the end of the file name. Safari tried to save the long file name, but the save failed and the file name in the download list was shown as -1.
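Since each browser shortens long names in its own way, one option is to shorten the name on the server before building the header. The Python sketch below is an illustration only: the function name shorten_filename is made up, the 147-character default is the Internet Explorer limit observed above for names without escape sequences, and the irregular extra shortening seen with escape sequences is not modelled.

```python
import os

# Limit observed in the text for IE when the name has no escape sequences.
IE_MAX_NAME_LENGTH = 147


def shorten_filename(filename: str, max_length: int = IE_MAX_NAME_LENGTH) -> str:
    """Trim the base name so that base + extension fits in max_length characters."""
    if len(filename) <= max_length:
        return filename
    base, ext = os.path.splitext(filename)
    keep = max_length - len(ext)
    if keep <= 0:
        # Extension alone is too long: fall back to a plain cut.
        return filename[:max_length]
    return base[:keep] + ext


long_name = "a" * 250 + ".txt"
print(len(shorten_filename(long_name)))
```

Trimming the end of the base name, rather than the beginning as Internet Explorer does, keeps the extension intact.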