Best practices for PHP and UTF-8

Source: Internet
Author: User
This article is the second part of a series on PHP, strings, encodings, and UTF-8. The conclusion first: use UTF-8 at every layer of a PHP application.

PHP does not support the Unicode character set at the language level, but most problems can be handled by working in UTF-8.

The best practice is to know the input encoding explicitly (and to check it when you do not), convert everything to UTF-8 internally, and produce all output in UTF-8 as well.

How the PHP layer handles UTF-8

When manipulating Unicode text, be sure to install the mbstring extension and use its functions instead of the native string functions. For example, in a PHP source file encoded as UTF-8, strlen() returns the number of bytes rather than the number of characters, so use mb_strlen() instead.
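The difference between byte-based and character-based functions can be seen in a minimal sketch. It assumes the source file is saved as UTF-8 and that mbstring is loaded; the sample string is only an illustration.

```php
<?php
// Assumes this source file is saved as UTF-8 and mbstring is loaded.
$s = "你好, PHP";

echo strlen($s), "\n";             // 11: counts bytes (each Chinese character is 3 bytes in UTF-8)
echo mb_strlen($s, "UTF-8"), "\n"; // 7: counts characters

// Byte-based slicing can cut a character in half; mb_substr() works on characters.
$broken = substr($s, 0, 4);             // 3 bytes of "你" plus 1 stray byte of "好"
$intact = mb_substr($s, 0, 2, "UTF-8"); // "你好"

var_dump(mb_check_encoding($broken, "UTF-8")); // bool(false): no longer valid UTF-8
var_dump($intact === "你好");                   // bool(true)
```

The truncated string produced by substr() is exactly the kind of value that later shows up as garbled output.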

Most mbstring functions operate relative to an encoding (the internal encoding). Make sure it is UTF-8 everywhere; most of the related settings can be configured in php.ini.

Starting with PHP 5.6, the default_charset setting can replace mbstring.http_input and mbstring.http_output.

Another important setting is mbstring.language, whose default value is neutral (UTF-8).

Note that the encoding of a source file and the internal encoding of the mbstring extension are two different concepts.
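As a rough sketch, the same settings can also be pinned at runtime, which is handy when php.ini is not under your control. Whether you need this at all depends on your deployment; the calls below are standard PHP functions, but placing them in a bootstrap file is only an assumption.

```php
<?php
// Runtime equivalent of setting default_charset = "UTF-8" in php.ini.
ini_set('default_charset', 'UTF-8');

// Pin the mbstring internal encoding explicitly as well.
mb_internal_encoding('UTF-8');

// Inspect the effective values.
echo ini_get('default_charset'), "\n"; // UTF-8
echo mb_internal_encoding(), "\n";     // UTF-8
echo mb_language(), "\n";              // current mbstring language setting
```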

Generally speaking:

    • In php.ini, set the mbstring-related options to UTF-8 wherever possible.

    • Use the mbstring functions instead of the native string functions.

    • When calling encoding-aware functions, know the encoding of the text you are manipulating and pass the encoding explicitly; for example, explicitly pass UTF-8 as the third argument of htmlentities().
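The last point, passing the encoding explicitly, might look like the following sketch. The sample strings are illustrative only.

```php
<?php
// Pass the encoding explicitly instead of relying on configuration defaults.
$text = '<b>咖啡和茶</b>';

// Third argument of htmlentities() names the encoding of $text.
$escaped = htmlentities($text, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n"; // &lt;b&gt;咖啡和茶&lt;/b&gt;

// mbstring functions take the encoding as their last argument.
$charPos = mb_strpos('咖啡和茶', '和', 0, 'UTF-8'); // 2 (character offset)
$bytePos = strpos('咖啡和茶', '和');                // 6 (byte offset)
$upper   = mb_strtoupper('café', 'UTF-8');         // "CAFÉ"
```

Spelling out the encoding keeps the code correct even if it later runs on a server whose php.ini uses a different default.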

How file IO operations handle UTF-8

Here is an example: suppose you need to open a file but do not know how its content is encoded. How should you handle it?

The best practice is to convert the content to UTF-8 when you read it, and to convert it back to the original encoding before saving it to the file after you modify it. Look at the code:

    if (mb_internal_encoding() != "UTF-8") {
        mb_internal_encoding("UTF-8");
    }
    $file = "file.txt"; // a Chinese file encoded as GBK
    $str = file_get_contents($file);
    // Whatever the source encoding is, explicitly convert it to UTF-8
    $srcEncoding = "UTF-8"; // remembers the original encoding for writing back
    if (mb_check_encoding($str, "GBK")) {
        $srcEncoding = "GBK";
        $str = mb_convert_encoding($str, "UTF-8", "GBK");
    }
    $str = "Modified content";
    $str = mb_convert_encoding($str, $srcEncoding, "UTF-8"); // convert back
    file_put_contents($file, $str);
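The read, convert, modify, convert back cycle can also be wrapped in two small helpers. This is only a sketch: readAsUtf8() and writeFromUtf8() are hypothetical names, not PHP built-ins, and the sketch assumes the caller already knows the file's encoding (GBK here).

```php
<?php
// Hypothetical helper: read a file in a known encoding and return UTF-8 text.
function readAsUtf8(string $path, string $sourceEncoding): string
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }
    if (!mb_check_encoding($raw, $sourceEncoding)) {
        throw new UnexpectedValueException("$path is not valid $sourceEncoding");
    }
    return mb_convert_encoding($raw, 'UTF-8', $sourceEncoding);
}

// Hypothetical helper: write UTF-8 text back in the file's original encoding.
function writeFromUtf8(string $path, string $utf8Text, string $targetEncoding): void
{
    $bytes = mb_convert_encoding($utf8Text, $targetEncoding, 'UTF-8');
    if (file_put_contents($path, $bytes) === false) {
        throw new RuntimeException("Cannot write $path");
    }
}

// Usage: create a GBK file in the temp directory, edit it as UTF-8, save it as GBK.
$path = tempnam(sys_get_temp_dir(), 'gbk');
file_put_contents($path, mb_convert_encoding('中文内容', 'GBK', 'UTF-8'));

$text = readAsUtf8($path, 'GBK');   // all in-memory processing happens in UTF-8
$text .= ' (已修改)';
writeFromUtf8($path, $text, 'GBK'); // the file on disk stays GBK

$roundTrip = readAsUtf8($path, 'GBK');
unlink($path);
echo $roundTrip, "\n";
```

Keeping the conversion at the file boundary means everything between the read and the write can assume UTF-8.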

Best practices for MySQL and UTF-8

This is relatively simple. First, make sure your MySQL databases use UTF-8. Then keep the MySQL client connection in UTF-8 as well; in PHP, this means setting UTF-8 as the connection charset when connecting through the mysqli or PDO extension. When both sides agree, you generally do not run into problems.
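A minimal sketch of setting the connection charset follows. The helper names, host, and database name are hypothetical. It uses utf8mb4, which is MySQL's name for full 4-byte UTF-8 (MySQL's own "utf8" charset stores at most 3 bytes per character); that choice is an addition here, not something the text above prescribes.

```php
<?php
// Hypothetical helper: build a PDO DSN that fixes the connection charset.
function buildMysqlDsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

// Hypothetical helper: open a PDO connection with the charset set in the DSN.
// Not called here because it needs a running MySQL server and credentials.
function connectPdo(string $dsn, string $user, string $password): PDO
{
    return new PDO($dsn, $user, $password, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

$dsn = buildMysqlDsn('127.0.0.1', 'app');
echo $dsn, "\n"; // mysql:host=127.0.0.1;dbname=app;charset=utf8mb4

// With mysqli, the equivalent step after connecting is:
//     $mysqli->set_charset('utf8mb4');
```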

Best practices for browsers and UTF-8

This is also relatively simple: if your output is a web page, keep every string you process and output in UTF-8, explicitly set default_charset to UTF-8 in php.ini, and declare UTF-8 in the HTML meta tag as well.
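Put together, a page that declares UTF-8 in both the HTTP header and the HTML might be sketched like this. renderPage() is a hypothetical helper used only for illustration.

```php
<?php
// Hypothetical helper: render a minimal page that declares UTF-8 in the markup.
function renderPage(string $title, string $body): string
{
    $t = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    $b = htmlspecialchars($body, ENT_QUOTES, 'UTF-8');
    return "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"UTF-8\">\n"
        . "<title>{$t}</title>\n</head>\n<body><p>{$b}</p></body>\n</html>\n";
}

// Declare the charset in the HTTP header too, before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

$html = renderPage('测试', '你好 <世界>');
echo $html;
```

The header and the meta tag should always name the same encoding; when they disagree, browsers give the HTTP header priority.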

Is everything fine now? Not quite. Although the server and the browser steer the user toward UTF-8, the user's behavior is not constrained: they may submit text in another encoding, or upload a file whose name is in another encoding. What then? You can check the user's encoding with the mb_http_input() and mb_check_encoding() functions, and then convert to UTF-8 internally. Make sure that at every layer, what is finally processed is UTF-8. In other words, you need a way to know the encoding of your input, and you must make sure the output is UTF-8 once processing is complete.

Using the mbstring.encoding_translation directive or the mb_detect_encoding() function is not recommended; they cost me half a day of debugging.
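The input check described in this section can be sketched as a small validation step. normalizeToUtf8() is a hypothetical helper. It assumes that input which is not valid UTF-8 can only be in one other encoding that you already know about (GBK in this example), so it validates with mb_check_encoding() rather than guessing with mb_detect_encoding().

```php
<?php
// Hypothetical helper: accept valid UTF-8 as is, convert from one known
// legacy encoding otherwise, and reject anything else.
function normalizeToUtf8(string $input, string $legacyEncoding): string
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }
    if (mb_check_encoding($input, $legacyEncoding)) {
        return mb_convert_encoding($input, 'UTF-8', $legacyEncoding);
    }
    throw new InvalidArgumentException('Input is neither UTF-8 nor ' . $legacyEncoding);
}

// Simulate a client that submitted GBK bytes instead of UTF-8.
$gbkBytes = mb_convert_encoding('中文', 'GBK', 'UTF-8');

$fromUtf8 = normalizeToUtf8('中文', 'GBK');    // already UTF-8, returned unchanged
$fromGbk  = normalizeToUtf8($gbkBytes, 'GBK'); // converted to UTF-8

var_dump($fromUtf8 === $fromGbk); // bool(true)
```

Checking UTF-8 first matters because many UTF-8 byte sequences also happen to be valid GBK.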

Best practices for operating systems and UTF-8

PHP handles Unicode file names differently depending on the operating system.

On Linux, file names are in practice UTF-8 encoded, while on Chinese-language Windows, PHP versions before 7.1 (which added UTF-8 path support on Windows) expect file names in GBK. Keeping this in mind is usually enough.

The following examples illustrate:

    // Command-line program, run on Chinese-language Windows 10; this source file is encoded as UTF-8
    function filenameExample() {
        $filename = "test.txt"; // stands in for a Chinese file name
        $gbkFilename = iconv("UTF-8", "GBK", $filename);
        file_put_contents($gbkFilename, "test");
        echo file_get_contents($gbkFilename);
    }

    function scandirExample() {
        $arr = scandir("./tmp");
        foreach ($arr as $v) {
            if ($v == "." || $v == "..") {
                continue;
            }
            $filename = iconv("GBK", "UTF-8", $v);       // UTF-8 name, for display or processing
            $content = file_get_contents("./tmp/" . $v); // open the file with its original GBK name
        }
    }

If you want code that works on both Windows and Linux without these conversions, you can encode the file name with urlencode(), for example:

    function urlencodeExample() {
        $filename = "Test 2.txt"; // stands in for a Chinese file name
        $urlencodeFilename = urlencode($filename);
        file_put_contents($urlencodeFilename, "test");
        echo file_get_contents($urlencodeFilename);
    }
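To see why the urlencode() approach sidesteps the operating system, here is a small sketch with a genuinely non-ASCII name. The file name and the use of the system temp directory are assumptions made for illustration.

```php
<?php
// The encoded name is plain ASCII, so it is stored identically on any OS.
$filename    = '测试2.txt';
$encodedName = urlencode($filename);
echo $encodedName, "\n"; // %E6%B5%8B%E8%AF%952.txt

$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . $encodedName;
file_put_contents($path, 'test');
$content = file_get_contents($path);

// Decode only when showing the name to the user.
$displayName = urldecode(basename($path));
unlink($path);

echo $displayName, "\n"; // 测试2.txt
```

The trade-off is that the names on disk are no longer human-readable.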

When using PHP to serve file downloads through the header() function, also consider the browser and the operating system (most people use Windows). For Chrome, the file name in the header can be UTF-8; Chrome automatically converts the file name to GBK.

Older versions of IE inherit the operating system's encoding, so a Chinese download file name must be transcoded from UTF-8 to GBK; otherwise the user sees a garbled file name. The code below illustrates this:

    $agent = $_SERVER["HTTP_USER_AGENT"];
    if (strpos($agent, 'MSIE') !== false) {
        $filename = iconv("UTF-8", "GBK", "Annex.txt"); // stands in for a Chinese file name
        header("Content-Disposition: attachment; filename=\"$filename\"");
    }
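As an alternative to sniffing the user agent, current browsers understand the filename* parameter defined in RFC 6266, which carries a percent-encoded UTF-8 name next to a plain ASCII fallback. This is a different technique from the one above, sketched here with a hypothetical helper, contentDisposition(); very old IE versions ignore filename* and use the fallback.

```php
<?php
// Hypothetical helper: build a Content-Disposition header with an ASCII
// fallback name plus an RFC 6266 filename* parameter for the UTF-8 name.
function contentDisposition(string $utf8Name, string $asciiFallback): string
{
    return sprintf(
        "Content-Disposition: attachment; filename=\"%s\"; filename*=UTF-8''%s",
        addcslashes($asciiFallback, "\"\\"), // escape quotes and backslashes
        rawurlencode($utf8Name)              // percent-encode the UTF-8 bytes
    );
}

$header = contentDisposition('附件.txt', 'attachment.txt');
echo $header, "\n";

// In a real download script, send it before any output:
//     header($header);
```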