Converting PHP character encoding from gb2312 to utf8. In PHP, we usually use iconv and mb_convert_encoding for character encoding conversion. However, according to the author, mb_convert_encoding is considerably slower than iconv.
string iconv ( string in_charset, string out_charset, string str )
Note: besides the target encoding, out_charset accepts two suffixes, //TRANSLIT and //IGNORE. //TRANSLIT automatically replaces a character that cannot be converted directly with one or more similar-looking characters. //IGNORE skips characters that cannot be converted. Without either suffix, the result is truncated at the first invalid character (PHP 5.4 and later return FALSE instead of a partial string).
Returns the converted string or FALSE on failure.
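The suffixes described above can be sketched as follows. This is a minimal illustration, not code from the article: the helper name convert_or_null and the sample byte strings are assumptions, and the exact result of //TRANSLIT depends on the iconv implementation PHP was built against.

```php
<?php
// Hypothetical helper (not from the article): convert with iconv and
// report failure as null instead of silently losing text.
function convert_or_null(string $text, string $from, string $to): ?string
{
    // "@" suppresses the E_NOTICE that iconv raises on invalid input
    $result = @iconv($from, $to, $text);
    return $result === false ? null : $result;
}

$gbk = "\xD6\xD0\xCE\xC4"; // the two characters "中文" encoded as GBK bytes

// Plain conversion: null if any byte sequence is invalid.
$strict = convert_or_null($gbk, 'GBK', 'UTF-8');

// //IGNORE asks iconv to drop characters it cannot convert.
$lenient = convert_or_null($gbk, 'GBK', 'UTF-8//IGNORE');

// //TRANSLIT asks iconv to substitute similar-looking characters;
// what it substitutes is implementation-dependent.
$approx = convert_or_null("caf\xC3\xA9", 'UTF-8', 'ASCII//TRANSLIT');

var_dump($strict, $lenient, $approx);
```

Checking the return value against false (here mapped to null) matters because a failed conversion otherwise looks like an empty string when echoed.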
string mb_convert_encoding ( string str, string to_encoding [, mixed from_encoding] )
Enable the mbstring extension first: in php.ini, remove the semicolon (;) in front of extension=php_mbstring.dll.
mb_convert_encoding accepts several candidate input encodings and picks one automatically based on the content, but it runs much less efficiently than iconv.
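A minimal sketch of the two calling styles, assuming the mbstring extension is loaded. The sample bytes are an illustration and do not come from the article.

```php
<?php
// GBK bytes for the two characters "中文" (illustrative sample)
$gbk = "\xD6\xD0\xCE\xC4";

// Single, known source encoding.
$fromKnown = mb_convert_encoding($gbk, 'UTF-8', 'GBK');

// Several candidates: mbstring picks one based on the content.
// Order matters, because short strings can be valid in more than one encoding.
$fromGuess = mb_convert_encoding($gbk, 'UTF-8', ['UTF-8', 'GBK']);

var_dump($fromKnown === $fromGuess);
```

Passing a single encoding whenever the source is known avoids the guessing step entirely.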
Usage:
iconv has been found to fail when converting the dash character "-" to gb2312 (presumably a full-width dash in the original text, since an ASCII hyphen converts without trouble). Without the //IGNORE suffix, everything after this character is lost. Either way, the character itself cannot be converted or output. mb_convert_encoding does not have this bug.
In general, use iconv. Use mb_convert_encoding only when the source encoding cannot be determined, or when the text does not display correctly after conversion with iconv.
The code is as follows:

/**
 * Automatically converts GBK- or GB2312-encoded strings to UTF-8.
 * The encoding of the input string is detected automatically: if it is
 * already UTF-8, no conversion is done; otherwise it is converted to UTF-8.
 * Supported encodings: UTF-8, GBK, GB2312.
 * @param string $str
 */
function yang_gbk2utf8($str) {
    $charset = mb_detect_encoding($str, array('UTF-8', 'GBK', 'GB2312'));
    $charset = strtolower($charset);
    if ('cp936' == $charset) {
        // mbstring reports GBK under the name CP936
        $charset = 'gbk';
    }
    // compare in lower case, because $charset was lower-cased above
    if ('utf-8' != $charset) {
        $str = iconv($charset, "UTF-8//IGNORE", $str);
    }
    return $str;
}
Next, let's look at some problems that come up when converting character encodings.
The first attempt uses mb_detect_encoding($str). To use this function, the mbstring extension must be enabled (extension=php_mbstring.dll).
The code is as follows:

<?php
$str = "test ing";
$cha = mb_detect_encoding($str);
echo $cha;
?>
The string was typed in a gb2312-encoded page, yet the reported encoding is, strangely, UTF-8. I have not found the reason.
I want to convert everything to UTF-8 in a uniform way, using the following method:
The code is as follows:

<?php
$str = "test ing";
$cha = mb_detect_encoding($str);
$s = iconv($cha, "UTF-8", $str);
var_dump($s);
?>
Result returned:
string(0) ""
That is strange. Why?
Now use an explicit source encoding instead.
The code is as follows:

<?php
$str = "test ing";
$cha = mb_detect_encoding($str);
$s = iconv("GB2312", "UTF-8", $str);
var_dump($s);
?>
This time the returned result is correct. So mb_detect_encoding($str) turns out to be unreliable here, and I do not know why.
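One possible explanation, offered here as an assumption and not as the author's finding: when no candidate list is passed, mb_detect_encoding only considers mbstring's configured detect order, which by default does not include GB2312 or GBK, so those encodings can never be reported. A minimal sketch that names the candidates explicitly and enables strict mode (the sample bytes are illustrative):

```php
<?php
$gbk = "\xD6\xD0\xCE\xC4"; // "中文" as GBK bytes (illustrative sample)

// Candidates are listed explicitly; the third argument enables strict mode,
// so an encoding is only reported if the whole string is valid in it.
$detected = mb_detect_encoding($gbk, ['UTF-8', 'GBK'], true);

if ($detected === false) {
    echo "encoding not recognized\n";
} else {
    // mbstring may report its own canonical name (for example CP936 for GBK)
    echo $detected, "\n";
    echo mb_convert_encoding($gbk, 'UTF-8', $detected), "\n";
}
```

Detection remains a guess for short strings, so an explicit source encoding is still preferable whenever it is known.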
The function string mb_convert_encoding ( string $str, string $to_encoding [, mixed $from_encoding] )
converts a string to the specified encoding. I wrote an example.
The code is as follows:

<?php
$a = "I'm fine";
echo mb_convert_encoding($a, 'utf-8');
?>
The result is:
??
Why is that?
The question now is this: to convert strings in different encodings to UTF-8 in a uniform way, I can use iconv when I know the source encoding beforehand, but what should I do when I do not know it?
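One way to approach the unknown-encoding case, sketched under assumptions: try a short, ordered list of candidate encodings and accept the first one in which the whole string is valid. The helper name to_utf8_guess and the default candidate list are hypothetical, not part of the article.

```php
<?php
// Hypothetical helper: returns UTF-8 text, or null if no candidate fits.
function to_utf8_guess(string $text, array $candidates = ['UTF-8', 'GBK']): ?string
{
    foreach ($candidates as $encoding) {
        // mb_check_encoding() is true only if every byte sequence is valid
        if (mb_check_encoding($text, $encoding)) {
            return $encoding === 'UTF-8'
                ? $text
                : mb_convert_encoding($text, 'UTF-8', $encoding);
        }
    }
    return null; // caller decides what to do with undecodable input
}

var_dump(to_utf8_guess("\xD6\xD0\xCE\xC4")); // GBK bytes for "中文"
```

UTF-8 is tried first because its validity rules are strict, so text in another encoding rarely passes as UTF-8 by accident; the reverse is not true.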
Problem 3: with iconv, if the first byte of the string being converted is greater than a certain value, an empty result is returned.
For example:
The code is as follows:

<?php
$str = chr(254) . "test ing" . chr(254);
$s = iconv("GB2312", "UTF-8", $str);
var_dump($s);
?>
Returns:
string(0) ""
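A likely reason, stated as an assumption: byte 254 (0xFE) followed by "t" is not a valid GB2312 byte pair, so iconv stops at the very first byte. A minimal sketch that validates the input first and checks iconv's return value explicitly, using the same input as the example above:

```php
<?php
$str = chr(254) . "test ing" . chr(254); // same input as in the example above

// Validate first: the string is not well-formed GB2312.
$valid = mb_check_encoding($str, 'GB2312');

// iconv signals failure with false (plus an E_NOTICE, suppressed here by "@");
// older PHP versions returned the partial, here empty, string instead.
$converted = @iconv('GB2312', 'UTF-8', $str);

var_dump($valid);
var_dump($converted === false);
```

Comparing with === false distinguishes a failed conversion from a legitimately empty string.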
For the usage of mb_convert_encoding, refer to the official manual:
http://cn.php.net/manual/en/function.mb-convert-encoding.php
Another PHP function, iconv, also converts string encodings and works much like the function above.
The following are examples:
iconv - Convert string to requested character encoding
(PHP 4 >= 4.0.5, PHP 5)
mb_convert_encoding - Convert character encoding
(PHP 4 >= 4.0.6, PHP 5)
Usage:
from_encoding names the character encoding of the string before conversion. It can be an array or a comma-separated string of candidate encodings. If it is not specified, the internal encoding is used.
/* Auto detect encoding from JIS, eucjp-win, sjis-win, then convert str to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");
/* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");
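Example:
The code is as follows:

<?php
// Two equivalent ways to convert GBK to UTF-8.
// Use one or the other; running both in sequence would convert twice.
$content = iconv("GBK", "UTF-8", $content);
$content = mb_convert_encoding($content, "UTF-8", "GBK");
?>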
Example
This function converts a string, or every string in a nested array, from its detected encoding to the requested output encoding.
The code is as follows:

<?php
function phpcharset($data, $to) {
    if (is_array($data)) {
        // recurse into arrays so nested values are converted too
        foreach ($data as $key => $val) {
            $data[$key] = phpcharset($val, $to);
        }
    } else {
        $encode_array = array('ASCII', 'UTF-8', 'GBK', 'GB2312', 'BIG5');
        $encoded = mb_detect_encoding($data, $encode_array);
        $to = strtoupper($to);
        // skip conversion if detection failed or the encoding already matches
        if ($encoded !== false && $encoded != $to) {
            $data = mb_convert_encoding($data, $to, $encoded);
        }
    }
    return $data;
}
?>