In PHP, character encoding conversion is usually done with iconv or mb_convert_encoding, but mb_convert_encoding performs noticeably worse than iconv.
string iconv(string in_charset, string out_charset, string str)
Note: besides the target encoding, the second parameter can carry one of two suffixes, //TRANSLIT and //IGNORE. With //TRANSLIT, a character that cannot be represented in the target encoding is replaced by one or more approximate characters; with //IGNORE, such characters are dropped. Without a suffix, conversion stops at the first illegal character: older PHP versions return the string truncated at that point, while current versions raise an E_NOTICE and return FALSE.
Returns the converted string or FALSE on failure.
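To make the two suffixes concrete, here is a small sketch. The input string is my own illustration, and what //TRANSLIT and //IGNORE actually produce depends on the iconv implementation PHP was built against (glibc, libiconv and others differ), so only the first result is shown with its output.

```php
<?php
// Illustrative input: "café" written out as UTF-8 bytes.
$utf8 = "caf\xC3\xA9";

// "é" exists in ISO-8859-1, so a plain conversion succeeds.
$latin1 = iconv("UTF-8", "ISO-8859-1", $utf8);
var_dump(bin2hex($latin1)); // string(8) "636166e9"

// ASCII has no "é". The @ operator hides the notice iconv may raise.
$translit = @iconv("UTF-8", "ASCII//TRANSLIT", $utf8); // approximate it
$ignored  = @iconv("UTF-8", "ASCII//IGNORE", $utf8);   // drop it
$plain    = @iconv("UTF-8", "ASCII", $utf8);           // no suffix: may fail

// Always compare with === false: an empty string is also "falsy".
foreach (array($translit, $ignored, $plain) as $result) {
    var_dump($result === false ? "conversion failed" : $result);
}
```

Checking the return value with === false is the important part; the suffix only decides what happens to characters the target encoding cannot hold.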
string mb_convert_encoding(string str, string to_encoding [, mixed from_encoding])
This requires the mbstring extension: in php.ini, remove the ; in front of ;extension=php_mbstring.dll
mb_convert_encoding accepts several candidate input encodings and picks one automatically based on the content, but it runs much slower than iconv.
Usage notes:
iconv was found to fail when converting the character "-" to gb2312; without the //IGNORE suffix, everything after that character is lost. Either way, the "-" itself cannot be converted and is never output. mb_convert_encoding does not have this bug.
In general, use iconv. Fall back to mb_convert_encoding only when you cannot tell what the source encoding is, or when the result of the iconv conversion does not display properly.
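The rule of thumb above could be sketched roughly like this. The helper name to_utf8_with_fallback is hypothetical (it is not a PHP built-in), and the sketch assumes the source encoding is known and that both the iconv and mbstring extensions are loaded.

```php
<?php
// Hypothetical helper: try iconv first, and only if it reports failure
// fall back to mb_convert_encoding.
function to_utf8_with_fallback($str, $from)
{
    // @ hides the E_NOTICE iconv raises on input it cannot convert.
    $out = @iconv($from, "UTF-8", $str);
    if ($out === false) {
        $out = mb_convert_encoding($str, "UTF-8", $from);
    }
    return $out;
}

// "中文" written out as GBK bytes, so the file encoding does not matter.
$gbk = "\xD6\xD0\xCE\xC4";
echo bin2hex(to_utf8_with_fallback($gbk, "GBK")), "\n"; // e4b8ade69687
```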
The following function detects the encoding and converts to UTF-8:
/**
 * Convert a GBK or GB2312 encoded string to UTF-8, detecting the encoding automatically.
 * If the input is already UTF-8 it is returned unchanged; otherwise it is converted to UTF-8.
 * Supported encodings: UTF-8, GBK, GB2312
 * @param string $str the string to convert
 */
function yang_gbk2utf8($str) {
    $charset = mb_detect_encoding($str, array('UTF-8', 'GBK', 'GB2312'));
    $charset = strtolower($charset);
    // mbstring reports GBK as CP936; use the GBK name when calling iconv
    if ('cp936' == $charset) {
        $charset = 'GBK';
    }
    if ("utf-8" != $charset) {
        $str = iconv($charset, "UTF-8//IGNORE", $str);
    }
    return $str;
}
Next, let's look at some problems that come up when converting character encodings.
Using the mb_detect_encoding($str) function: to use it, the mbstring extension must be enabled (extension=php_mbstring.dll).
The code is as follows:
<?php
$str = "Test ing";
$cha = mb_detect_encoding($str);
$s = iconv($cha, "UTF-8", $str);
var_dump($s);
?>
Result returned:
string(0) ""
Strange. Why does this happen?
The code is as follows:
<?php
$str = "Test ing";
$cha = mb_detect_encoding($str);
$s = iconv("GB2312", "UTF-8", $str);
var_dump($s);
?>
This time the returned result is correct, so the problem lies in mb_detect_encoding($str); I don't know what the reason is.
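One possible explanation, offered as an assumption rather than a diagnosis: when no list of candidate encodings is passed, mb_detect_encoding only tries the encodings in the current detect order (see mb_detect_order), and that list often does not include GB2312. A minimal sketch that passes the candidates explicitly and turns on strict mode:

```php
<?php
// "测试" written out as GB2312 bytes, so the file encoding does not matter.
$str = "\xB2\xE2\xCA\xD4";

// Assumption: only these two encodings are plausible for this input.
$candidates = array("UTF-8", "GB2312");

// Third argument true = strict mode: an encoding is rejected as soon as
// the bytes are not valid in it.
$cha = mb_detect_encoding($str, $candidates, true);

if ($cha === false) {
    echo "could not detect the encoding\n";
} else {
    // mbstring may report the encoding under its own name, so the
    // conversion is done with mbstring as well.
    $s = mb_convert_encoding($str, "UTF-8", $cha);
    var_dump(bin2hex($s)); // string(12) "e6b58be8af95"
}
```

Detection is still a guess: short strings are often valid in several encodings at once, so keep the candidate list as short as the data allows.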
The function string mb_convert_encoding(string $str, string $to_encoding [, mixed $from_encoding])
converts a string to the specified encoding. I wrote an example:
The code is as follows:
<?php
$a = "I'm fine";
echo mb_convert_encoding($a, 'UTF-8');
?>
But the output is garbled:
?? Lu Lu?
Now the question is: when converting strings in various encodings to UTF-8, we can use iconv if we know the source encoding beforehand, but what if we don't know it?
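One way to approach the unknown-encoding case, as a sketch only: validate the bytes against a short, ordered list of encodings you consider plausible and convert from the first one that fits. The helper name guess_to_utf8 and the candidate list are my own assumptions; the mbstring extension is required.

```php
<?php
// Hypothetical helper: return $str as UTF-8, or false if none of the
// candidate encodings accepts the bytes.
function guess_to_utf8($str, array $candidates)
{
    // Valid UTF-8 already (plain ASCII included): nothing to do.
    if (mb_check_encoding($str, "UTF-8")) {
        return $str;
    }
    foreach ($candidates as $encoding) {
        if (mb_check_encoding($str, $encoding)) {
            return mb_convert_encoding($str, "UTF-8", $encoding);
        }
    }
    return false;
}

// "测试" as GBK bytes; the order of the candidates expresses a preference.
$result = guess_to_utf8("\xB2\xE2\xCA\xD4", array("GBK", "BIG5"));
var_dump(bin2hex($result)); // string(12) "e6b58be8af95"
```

The same bytes can be valid in more than one encoding, so the order of the list matters and the result is a best guess, not a proof.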
Problem 3: an iconv problem. If the first byte of the string to be converted is greater than a certain value, iconv returns an empty result.
For example:
The code is as follows:
<?php
$str = chr(254) . "Test ing" . chr(254);
$s = iconv("GB2312", "UTF-8", $str);
var_dump($s);
?>
Returns:
string(0) ""
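A defensive way to handle input like this, sketched under the assumption that losing the undecodable bytes is acceptable: check the iconv result with === false and, on failure, let mb_convert_encoding do the conversion, since it replaces bytes it cannot decode with a substitute character ("?" by default) instead of giving up.

```php
<?php
// Same shape of input as above: byte 254 is not followed by a valid
// GB2312 trail byte, so the sequence is illegal.
$str = chr(254) . "abc" . chr(254);

// @ hides the notice; the return value tells us whether iconv gave up.
$s = @iconv("GB2312", "UTF-8", $str);
if ($s === false) {
    // Fallback: undecodable bytes become the substitute character.
    $s = mb_convert_encoding($str, "UTF-8", "GB2312");
}

// Whichever path ran, the result should now be valid UTF-8.
var_dump(mb_check_encoding($s, "UTF-8"));
```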
For the usage of mb_convert_encoding, see the official manual:
http://cn.php.net/manual/en/function.mb-convert-encoding.php
Another PHP function, iconv, is also used to convert string encodings and works much like the function above.
Here are a few more examples:
iconv - Convert string to requested character encoding
(PHP 4 >= 4.0.5, PHP 5)
mb_convert_encoding - Convert character encoding
(PHP 4 >= 4.0.6, PHP 5)
Usage:
string mb_convert_encoding(string str, string to_encoding [, mixed from_encoding])
from_encoding is specified by character code names before conversion. It can be an array or a comma-separated string listing several encodings. If it is not specified, the internal encoding will be used.
/* Auto detect encoding from JIS, eucjp-win, sjis-win, then convert str to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");
/* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");
Example:
The code is as follows:
<?php
// Two equivalent ways to convert $content from GBK to UTF-8; use one of them
$content = iconv("GBK", "UTF-8", $content);
$content = mb_convert_encoding($content, "UTF-8", "GBK");
?>
Either call converts the string according to the input and output character encodings you specify.
The code is as follows:
<?php
// Convert a string, or every element of a (nested) array, to the encoding $to
function phpcharset($data, $to) {
    if (is_array($data)) {
        foreach ($data as $key => $val) {
            $data[$key] = phpcharset($val, $to);
        }
    } else {
        $encode_array = array('ASCII', 'UTF-8', 'GBK', 'GB2312', 'BIG5');
        $encoded = mb_detect_encoding($data, $encode_array);
        $to = strtoupper($to);
        // Convert only when the detected encoding differs from the target
        if ($encoded != $to) {
            $data = mb_convert_encoding($data, $to, $encoded);
        }
    }
    return $data;
}
?>
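When the source encoding is already known, a variation on the same idea can skip detection altogether. The sketch below swaps the hand-written recursion for array_walk_recursive; the function name convert_all and the sample data are my own, and closures need PHP 5.3 or later.

```php
<?php
// Hypothetical helper: convert every string in a nested array from
// $from to $to, leaving non-string values untouched.
function convert_all(array $data, $to, $from)
{
    array_walk_recursive($data, function (&$value) use ($to, $from) {
        if (is_string($value)) {
            $value = mb_convert_encoding($value, $to, $from);
        }
    });
    return $data;
}

// Sample data with GBK bytes written out explicitly.
$row = array(
    "title" => "\xD6\xD0\xCE\xC4",              // "中文" in GBK
    "tags"  => array("\xB2\xE2\xCA\xD4", "abc"), // "测试" in GBK, plus ASCII
    "id"    => 7,
);
$utf8Row = convert_all($row, "UTF-8", "GBK");
var_dump(bin2hex($utf8Row["title"])); // string(12) "e4b8ade69687"
```

Passing the source encoding explicitly avoids the guesswork of mb_detect_encoding, at the cost of assuming every string in the array uses the same encoding.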
PHP character encoding conversion: converting gb2312 to UTF-8 (repost)