PHP pages often come out garbled. What can you do about it? This article presents a way to detect a string's character encoding automatically and convert GBK- or GB2312-encoded strings to UTF-8. In PHP, character encoding conversion is usually done with iconv or mb_convert_encoding. Of the two, mb_convert_encoding is considerably slower than iconv.
string iconv(string in_charset, string out_charset, string str) Note: besides naming the target encoding, out_charset can carry one of two suffixes, //TRANSLIT and //IGNORE. //TRANSLIT replaces a character that cannot be converted directly with one or more similar-looking characters; //IGNORE skips characters that cannot be converted. Without either suffix, the result is cut off at the first illegal character.
Returns the converted string or FALSE on failure.
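As a minimal sketch of how the two suffixes are passed, the snippet below converts a UTF-8 string containing a character that plain ASCII cannot represent. The sample strings and variable names are assumptions made for illustration, and the exact results of //TRANSLIT and //IGNORE depend on the iconv implementation PHP was built against, so no output is shown for them.

```php
<?php
// "café" in UTF-8; the last character has no ASCII equivalent (sample input, assumed)
$utf8 = "caf\xC3\xA9";

// No suffix: iconv typically reports a failure at the unconvertible character.
// The @ only silences the notice; the return value should be checked instead.
$plain = @iconv('UTF-8', 'ASCII', $utf8);

// //TRANSLIT asks iconv to substitute a similar-looking character where it can.
$translit = @iconv('UTF-8', 'ASCII//TRANSLIT', $utf8);

// //IGNORE asks iconv to drop characters it cannot convert.
$ignore = @iconv('UTF-8', 'ASCII//IGNORE', $utf8);

var_dump($plain, $translit, $ignore);

// When every character is representable, no suffix is needed.
// "\xC4\xE3\xBA\xC3" is "你好" encoded as GBK.
$fromGbk = iconv('GBK', 'UTF-8', "\xC4\xE3\xBA\xC3");
echo bin2hex($fromGbk), "\n";
```

Writing the sample text as byte escapes keeps the example independent of the encoding the script file itself is saved in.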
string mb_convert_encoding(string str, string to_encoding [, mixed from_encoding])
Enable the mbstring extension first: in php.ini, remove the semicolon in front of extension=php_mbstring.dll.
mb_convert_encoding accepts several candidate input encodings and picks one automatically based on the content, but it runs much more slowly than iconv.
Usage:
It turns out that iconv hits an error when converting the character "-" to gb2312. Without the //IGNORE suffix, everything after that character is lost. Either way, the "-" itself cannot be converted or output. mb_convert_encoding does not have this bug.
In general, use iconv. Use mb_convert_encoding only when the source encoding cannot be determined, or when the text does not display correctly after conversion with iconv.
The code is as follows:
/**
 * Automatically converts a GBK- or GB2312-encoded string to UTF-8.
 * The encoding of the input string is detected automatically. If it is already UTF-8, no conversion is done; otherwise the string is converted to UTF-8.
 * Supported character encodings: UTF-8, GBK, and GB2312
 * @param string $str the input string
 * @return string
 */
function yang_gbk2utf8($str) {
    $charset = mb_detect_encoding($str, array('UTF-8', 'GBK', 'GB2312'));
    $charset = strtolower($charset);
    // mbstring reports GBK under its canonical name CP936
    if ('cp936' == $charset) {
        $charset = 'gbk';
    }
    // compare in lower case, because $charset was lower-cased above
    if ('utf-8' != $charset) {
        $str = iconv($charset, 'UTF-8//IGNORE', $str);
    }
    return $str;
}
Next, let's look at some problems that come up when converting character encodings.
Use the mb_detect_encoding($str) function. To use it, the mbstring extension must be enabled (extension=php_mbstring.dll in php.ini).
The code is as follows:
<?php
$str = "test ing";
$cha = mb_detect_encoding($str);
$s = iconv($cha, "UTF-8", $str);
var_dump($s);
?>
Result returned:
string(0) ""
That's strange. Why?
The code is as follows:
<?php
$str = "test ing";
$cha = mb_detect_encoding($str);
$s = iconv("GB2312", "UTF-8", $str);
var_dump($s);
?>
This time the result is correct, so the encoding reported by mb_detect_encoding($str) must have been wrong. I don't know why.
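One plausible explanation, assuming default mbstring settings, is that mb_detect_encoding() called without a candidate list only checks its default detection order, which may not include GB2312 or GBK at all. A sketch of passing the candidates explicitly follows; the sample bytes and variable names are assumptions, not part of the original test.

```php
<?php
// "你好" encoded as GBK, written as raw bytes so the example
// does not depend on the encoding of the script file (sample input, assumed)
$gbk = "\xC4\xE3\xBA\xC3";

// Explicit candidate list, tried in order; the third argument turns on strict checking
$detected = mb_detect_encoding($gbk, array('UTF-8', 'GBK'), true);

// Only convert when detection succeeded
$utf8 = ($detected === false)
    ? false
    : mb_convert_encoding($gbk, 'UTF-8', $detected);

var_dump($detected);
echo ($utf8 === false) ? "detection failed\n" : bin2hex($utf8) . "\n";
```

Putting UTF-8 first in the list matters, because many UTF-8 byte sequences are also valid GBK.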
The function string mb_convert_encoding(string $str, string $to_encoding [, mixed $from_encoding])
converts a string to the specified encoding. I wrote an example.
The code is as follows:
<?php
$a = "I'm fine";
echo mb_convert_encoding($a, 'UTF-8');
?>
The result is:
??
Why?
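A likely cause, stated here as an assumption about the original test, is that the source encoding was never given, so the input was read as mbstring's internal encoding. The sketch below names the source encoding explicitly; the sample bytes stand in for a GBK string.

```php
<?php
// "你好" as GBK bytes (sample input, assumed for illustration)
$a = "\xC4\xE3\xBA\xC3";

// Without the third argument the input is interpreted as the internal
// encoding, so GBK bytes are misread.
$guess = mb_convert_encoding($a, 'UTF-8');

// With the source encoding named, the bytes are decoded as intended.
$ok = mb_convert_encoding($a, 'UTF-8', 'GBK');

echo bin2hex($guess), "\n";
echo bin2hex($ok), "\n";
```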
The problem now is this: if I want to convert strings in different encodings to UTF-8 in a uniform way, I can use iconv when I know the source encoding beforehand, but what should I do when I do not know it?
Question 3: with iconv, if the first byte of the string being converted is above a certain value, an empty string is returned.
For example:
The code is as follows:
<?php
$str = chr(254) . "test ing" . chr(254);
$s = iconv("GB2312", "UTF-8", $str);
var_dump($s);
?>
It returns:
string(0) ""
For the usage of mb_convert_encoding, refer to the official manual:
http://cn.php.net/manual/en/function.mb-convert-encoding.php
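For input like the one in Question 3, where iconv gives up, one option is to check iconv's result and fall back to mb_convert_encoding. The helper below is only a sketch: the name to_utf8_or_fallback and its parameters are hypothetical, and the source encoding is assumed to be known.

```php
<?php
/**
 * Hypothetical helper: convert $str from encoding $from to UTF-8.
 * Tries iconv first and falls back to mb_convert_encoding if iconv fails.
 */
function to_utf8_or_fallback($str, $from)
{
    // @ silences the notice iconv raises on illegal input;
    // the return value is checked instead
    $out = @iconv($from, 'UTF-8', $str);

    // Treat FALSE, or an empty result for non-empty input, as a failure
    if ($out === false || ($out === '' && $str !== '')) {
        // mb_convert_encoding substitutes bytes it cannot decode
        // rather than abandoning the whole string
        $out = mb_convert_encoding($str, 'UTF-8', $from);
    }
    return $out;
}

// Valid GB2312 bytes for "你好"
$good = to_utf8_or_fallback("\xC4\xE3\xBA\xC3", 'GB2312');
// The input from Question 3, with an illegal byte at each end
$bad = to_utf8_or_fallback(chr(254) . "test ing" . chr(254), 'GB2312');

var_dump(bin2hex($good), $bad);
```

The fallback keeps as much of the text as it can; how the illegal bytes are substituted depends on the mbstring settings, so the second result is printed rather than predicted.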
PHP's other function, iconv, also converts string encodings and works much like the function above.
The manual summarizes the two functions as follows:
iconv - Convert string to requested character encoding
(PHP 4 >= 4.0.5, PHP 5)
mb_convert_encoding - Convert character encoding
(PHP 4 >= 4.0.6, PHP 5)
Usage:
from_encoding names the character encoding of the string before conversion. It can be an array or a comma-separated string of encoding names. If it is not specified, the internal encoding is used.
/* Auto detect encoding from JIS, eucjp-win, sjis-win, then convert str to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");
/* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");
Example:
The code is as follows:
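<?php
// with iconv
$content = iconv("GBK", "UTF-8", $content);
// or, alternatively, with mb_convert_encoding (use one of the two, not both)
$content = mb_convert_encoding($content, "UTF-8", "GBK");
?>
The function below converts data according to the detected input encoding and the requested output encoding.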
The code is as follows:
<?php
function phpcharset($data, $to) {
    if (is_array($data)) {
        // convert every element, recursing into nested arrays
        foreach ($data as $key => $val) {
            $data[$key] = phpcharset($val, $to);
        }
    } else {
        $encode_array = array('ASCII', 'UTF-8', 'GBK', 'GB2312', 'BIG5');
        $encoded = mb_detect_encoding($data, $encode_array);
        $to = strtoupper($to);
        // skip the conversion when detection failed or the data is already in the target encoding
        if ($encoded !== false && $encoded != $to) {
            $data = mb_convert_encoding($data, $to, $encoded);
        }
    }
    return $data;
}
?>
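As a usage sketch, the same idea can also be written with PHP's array_walk_recursive instead of explicit recursion. The function name convert_all_to, its default candidate list, and the sample data are all assumptions made for illustration; this is a variant, not the code above.

```php
<?php
/**
 * Hypothetical variant of the recursive converter, using array_walk_recursive.
 * $candidates is an assumed default list of encodings to try, in order.
 */
function convert_all_to(array $data, $to, array $candidates = array('ASCII', 'UTF-8', 'GBK'))
{
    array_walk_recursive($data, function (&$value) use ($to, $candidates) {
        if (!is_string($value)) {
            return; // leave numbers, booleans and null untouched
        }
        $from = mb_detect_encoding($value, $candidates, true);
        // convert only when detection succeeded and the encodings differ
        if ($from !== false && strcasecmp($from, $to) !== 0) {
            $value = mb_convert_encoding($value, $to, $from);
        }
    });
    return $data;
}

$rows = array(
    'title' => "\xC4\xE3\xBA\xC3",              // "你好" as GBK bytes
    'meta'  => array('id' => 7, 'tag' => 'php'),
);
$utf8Rows = convert_all_to($rows, 'UTF-8');
echo bin2hex($utf8Rows['title']), "\n";
```

Strict detection and the explicit check for a failed detection are additions in this sketch; they keep undetectable values unchanged instead of passing a bad encoding name to mb_convert_encoding.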