PHP and UTF-8 Best Practices in Detail
The article "PHP string, encoding, UTF-8" covered a series of background concepts, which made for rather dry reading. This article, the second part of that series, turns to something more practical: best practices for string handling in PHP. Let's begin with the conclusion: use UTF-8 encoding at every layer of a PHP application.
At the language level, PHP has no native Unicode support (its strings are plain byte sequences), but most problems can be solved by using UTF-8 consistently.
The best practice is to know the input encoding for certain (detect it if you do not know it), convert everything to UTF-8 internally, and output UTF-8 as well.
How to deal with UTF-8 at the PHP level
When working with Unicode text, be sure to install the mbstring extension and use its functions in place of the native string functions. For example, in a PHP source file saved as UTF-8, strlen() returns the number of bytes rather than the number of characters, so it gives the wrong result for multibyte text; use mb_strlen() instead.
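The difference between the two functions is easy to see on a short mixed string. This is a minimal sketch, assuming the source file is saved as UTF-8 and the mbstring extension is loaded; the sample string is only an illustration.

```php
<?php
// Assumes this source file is saved as UTF-8 and mbstring is available.
$text = "你好, PHP"; // 2 Chinese characters (3 bytes each) + 5 ASCII characters

$byteCount = strlen($text);             // counts bytes
$charCount = mb_strlen($text, "UTF-8"); // counts characters

echo "bytes: ", $byteCount, "\n";
echo "characters: ", $charCount, "\n";
```

The same pattern applies to the other native functions that have mbstring counterparts, such as substr() and mb_substr().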
Most mbstring functions operate relative to an encoding (the internal encoding), so make sure UTF-8 is used consistently. Most of the relevant settings can be configured in php.ini.
As of PHP 5.6, the default_charset setting replaces mbstring.http_input and mbstring.http_output.
Another important setting is mbstring.language, which defaults to neutral (UTF-8).
Note that the encoding of a source file and the internal encoding of the mbstring extension are two different concepts.
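When php.ini cannot be edited, the same settings can usually be applied at runtime. The sketch below is one possible way to do it, assuming mbstring is loaded; the php.ini lines in the comments are assumed values rather than a complete configuration.

```php
<?php
// php.ini equivalent (assumed values):
//   default_charset = "UTF-8"
//   mbstring.language = Neutral

// Runtime equivalent, placed at the start of a script or bootstrap file.
ini_set('default_charset', 'UTF-8');
mb_internal_encoding('UTF-8');

echo ini_get('default_charset'), "\n";
echo mb_internal_encoding(), "\n";
```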
In summary:
- Settings in php.ini that involve the mbstring extension should use UTF-8 wherever possible.
- Use mbstring functions instead of the native string functions.
- Always know which character encoding you are working with, and pass the encoding explicitly whenever a function accepts one; for example, pass "UTF-8" as the third parameter of htmlentities().
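As an illustration of the last point, here is a small sketch that passes the encoding explicitly to htmlentities(). The input string is a made-up sample, and the file is assumed to be saved as UTF-8.

```php
<?php
// Made-up sample input containing markup, an accented letter, and Chinese text.
$input = '<b>café & 中文</b>';

// The third parameter states the encoding explicitly instead of relying on defaults.
$entities = htmlentities($input, ENT_QUOTES, 'UTF-8');

echo $entities, "\n";
```

Stating the encoding at the call site keeps the behavior independent of whatever default_charset happens to be on a given server.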
How file I/O operations handle UTF-8
Suppose you need to open a file whose content is not UTF-8, or whose encoding you are not sure of. What should you do?
The best practice is to convert the content to UTF-8 as soon as it is read, modify it, then convert it back to the original encoding before saving it to the file. For example:
if (mb_internal_encoding() != "UTF-8") {
    mb_internal_encoding("UTF-8");
}
$file = "file.txt";       // a Chinese text file encoded as GBK
$srcEncoding = "GBK";     // the file's original encoding
$str = file_get_contents($file);
// Convert to UTF-8 for processing
if (mb_check_encoding($str, $srcEncoding)) {
    $str = mb_convert_encoding($str, "UTF-8", $srcEncoding);
}
$str = "modify content";  // modify the content here (as UTF-8)
// Convert back to the original encoding before saving
$str = mb_convert_encoding($str, $srcEncoding, "UTF-8");
file_put_contents($file, $str);
MySQL and UTF-8 best practices
This part is relatively simple. First make sure MySQL itself uses UTF-8. Then keep the client connection on UTF-8 as well: in PHP, whether you connect through the mysqli or the PDO extension, set UTF-8 as the connection encoding. As long as both sides agree, you generally will not run into problems.
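A minimal sketch of setting the connection encoding follows. The host, database name, and credentials are hypothetical, and buildMysqlDsn() is an illustrative helper, not part of any library. It assumes the server side uses utf8mb4, which is MySQL's name for full four-byte UTF-8. The connection calls are left as comments because they need a reachable server.

```php
<?php
// Hypothetical helper: builds a PDO DSN that declares the connection charset.
function buildMysqlDsn(string $host, string $dbName, string $charset = 'utf8mb4'): string
{
    return "mysql:host={$host};dbname={$dbName};charset={$charset}";
}

$dsn = buildMysqlDsn('localhost', 'example_db'); // hypothetical host and database
echo $dsn, "\n";

// With PDO (requires pdo_mysql and a running server):
//   $pdo = new PDO($dsn, 'example_user', 'example_password');
//
// With mysqli:
//   $mysqli = new mysqli('localhost', 'example_user', 'example_password', 'example_db');
//   $mysqli->set_charset('utf8mb4');
```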
Best practices for browsers and UTF-8
This part is also relatively simple. If your output is a web page, make sure the strings you finally output are UTF-8; also explicitly set default_charset to UTF-8 in php.ini, and declare UTF-8 in the HTML meta tag.
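The three places where the encoding is declared can be sketched together as below. renderPage() is a hypothetical helper used only for illustration, and the page content is sample text.

```php
<?php
// Hypothetical helper: renders a minimal page that declares UTF-8 in its meta tag.
function renderPage(string $title, string $body): string
{
    $t = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    $b = htmlspecialchars($body, ENT_QUOTES, 'UTF-8');
    return "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"UTF-8\">\n"
         . "<title>{$t}</title>\n</head>\n<body>{$b}</body>\n</html>\n";
}

// Declare the encoding in the HTTP header as well as in the markup.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

echo renderPage('示例', '你好，世界');
```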
Is everything done now? Not quite. Although the server and browser ask the user to use UTF-8, the user's behavior is not bound by this: they may submit characters in another encoding, or upload a file whose name is in another encoding. What then? You can use the mb_http_input() and mb_check_encoding() functions to check the encoding of the user's input, and then convert it to UTF-8 internally. Make sure that at every layer the data finally processed is UTF-8. In other words, you need to know what encoding your input is in, and you need to control the output so that it is UTF-8 once processing is complete.
We do not recommend the mbstring.encoding_translation directive or the mb_detect_encoding() function; they cost me half a day of grief.
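The validate-then-convert approach described above might look like the following sketch. toUtf8() is a hypothetical helper, and the fallback list (GBK only) is an assumption about which encodings your users are likely to send; adjust it to your own audience.

```php
<?php
// Hypothetical helper: returns $input as UTF-8, or null if it matches no candidate.
// $fallbacks is an assumed list of encodings to try when the input is not valid UTF-8.
function toUtf8(string $input, array $fallbacks = ['GBK']): ?string
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input; // already valid UTF-8, keep as-is
    }
    foreach ($fallbacks as $encoding) {
        if (mb_check_encoding($input, $encoding)) {
            return mb_convert_encoding($input, 'UTF-8', $encoding);
        }
    }
    return null; // reject input that cannot be interpreted
}

$gbkBytes = "\xD6\xD0\xCE\xC4"; // the word "中文" encoded as GBK
var_dump(toUtf8($gbkBytes) === "中文");
var_dump(toUtf8("中文") === "中文");
```

Checking against a short list of expected encodings is a deliberate choice here: it validates rather than guesses, which is the distinction the text draws between mb_check_encoding() and mb_detect_encoding().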
Best practices for operating systems and UTF-8
Because of operating system differences, PHP handles Unicode file names differently on each platform.
On Linux, file names are generally UTF-8 encoded, while on Chinese-language Windows, PHP's file functions see file names as GBK encoded; just keep this in mind. (PHP 7.1 and later can work with UTF-8 paths on Windows, so the conversion shown here mainly concerns older versions.)
The following is an example:
// Command-line functions, run on Chinese-language Windows 10; this source file is saved as UTF-8
function filenameexample() {
    $filename = "测试.txt"; // a Chinese file name ("test"), UTF-8 in this source file
    $gbk_filename = iconv("UTF-8", "GBK", $filename);
    file_put_contents($gbk_filename, "test");
    echo file_get_contents($gbk_filename);
}

function scandirexample() {
    $arr = scandir("./tmp");
    foreach ($arr as $v) {
        if ($v == "." || $v == "..") continue;
        $filename = iconv("GBK", "UTF-8", $v);       // name as UTF-8, for display or processing
        $content = file_get_contents("./tmp/" . $v); // access the file by its original GBK name
    }
}
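The platform check can be folded into one place so that calling code always works with UTF-8 names. The sketch below is only an illustration: toFilesystemName() is a hypothetical helper, it assumes a Chinese-language Windows where file names are GBK, and it takes the OS family as a parameter (defaulting to PHP_OS_FAMILY, available since PHP 7.2) so that the behavior can be checked on any machine.

```php
<?php
// Hypothetical helper: converts a UTF-8 file name to the encoding assumed for the platform.
// Assumption: Windows here means Chinese-language Windows with GBK file names.
function toFilesystemName(string $utf8Name, string $osFamily = PHP_OS_FAMILY): string
{
    if ($osFamily === 'Windows') {
        return mb_convert_encoding($utf8Name, 'GBK', 'UTF-8');
    }
    return $utf8Name; // Linux and others: keep UTF-8
}

$name = "测试.txt";
echo bin2hex(toFilesystemName($name, 'Windows')), "\n"; // GBK bytes, shown as hex
echo bin2hex(toFilesystemName($name, 'Linux')), "\n";   // UTF-8 bytes, shown as hex
```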
If you would rather not write separate handling for Windows and Linux, you can urlencode the file name instead, for example:
function urlencodeexample() {
    $filename = "测试2.txt"; // a Chinese file name, UTF-8 in this source file
    $urlencodefilename = urlencode($filename); // ASCII-only name, safe on both platforms
    file_put_contents($urlencodefilename, "test");
    echo file_get_contents($urlencodefilename);
}
When using PHP to send file downloads through the header() function, you also need to consider the browser and the operating system (most people use Windows). For Chrome, the file name in the header can be UTF-8, and Chrome automatically converts it to GBK.
Earlier versions of IE inherit the operating system environment, so a Chinese download file name must be transcoded from UTF-8 to GBK; otherwise the user sees a garbled file name during the download. Code:
$agent = $_SERVER["HTTP_USER_AGENT"];
if (strpos($agent, 'MSIE') !== false) {
    // Older IE: send the file name as GBK
    $filename = iconv("UTF-8", "GBK", "附件.txt"); // a Chinese file name ("attachment")
    header("Content-Disposition: attachment; filename=\"$filename\"");
}