PHP character encoding conversion: GB2312 to UTF-8 examples

Source: Internet
Author: User
Tags chr mixed translit

In PHP, character encoding conversion is usually done with iconv or mb_convert_encoding; of the two, mb_convert_encoding is considerably slower.
string iconv(string in_charset, string out_charset, string str)
Note: besides naming the target encoding, the second parameter can carry one of two suffixes, //TRANSLIT and //IGNORE. With //TRANSLIT, a character that cannot be represented in the target encoding is replaced by one or more similar-looking characters; with //IGNORE, such characters are silently dropped. With neither suffix, conversion stops at the first illegal character: older PHP versions return the string truncated at that point, while PHP 5.4 and later return FALSE.
Returns the converted string or FALSE on failure.
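Because iconv signals failure by returning FALSE, the result is best compared with `===` before it is used. A minimal sketch, assuming the iconv extension is available; the helper name `convert_or_null` is hypothetical, and the `@` operator only silences the notice that iconv raises on bad input:

```php
<?php
// Hypothetical helper: wraps iconv() and turns its FALSE failure value into null.
function convert_or_null(string $from, string $to, string $str): ?string
{
    // @ suppresses the E_NOTICE that iconv() raises for illegal or incomplete input.
    $result = @iconv($from, $to, $str);
    return $result === false ? null : $result;
}

// "\xC4\xE3\xBA\xC3" is the GBK byte sequence for the two characters 你好.
$utf8 = convert_or_null('GBK', 'UTF-8', "\xC4\xE3\xBA\xC3");
echo bin2hex($utf8), "\n"; // e4bda0e5a5bd, the UTF-8 bytes of 你好
```

Appending //IGNORE or //TRANSLIT to the `$to` argument changes how unconvertible characters are handled, but how well these suffixes are supported may depend on the iconv library PHP was built against.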
string mb_convert_encoding(string str, string to_encoding [, mixed from_encoding])
This requires the mbstring extension: in php.ini, remove the leading ";" from the line ";extension=php_mbstring.dll".
mb_convert_encoding can take several candidate input encodings and pick one automatically based on the content, but it runs much less efficiently than iconv.

Use:
In practice, iconv fails when converting the character "-" to gb2312, and without the //IGNORE suffix everything after that character is lost. Either way, this "-" cannot be converted and does not appear in the output. mb_convert_encoding does not have this bug.
In general, use iconv, and fall back to mb_convert_encoding only when the source encoding cannot be determined or when the iconv output does not display correctly.

The code is as follows:

/**
 * Convert a GBK or GB2312 encoded string to UTF-8, detecting the encoding automatically.
 * If the input is already UTF-8 it is returned unchanged; otherwise it is converted to UTF-8.
 * Supported encodings: UTF-8, GBK, GB2312.
 * @param string $str the input string
 */
function yang_gbk2utf8($str) {
    $charset = mb_detect_encoding($str, array('UTF-8', 'GBK', 'GB2312'));
    $charset = strtolower($charset);
    // mbstring reports GBK under the name CP936
    if ('cp936' == $charset) {
        $charset = 'GBK';
    }
    if ("utf-8" != $charset) {
        $str = iconv($charset, "UTF-8//IGNORE", $str);
    }
    return $str;
}



Now let's look at some problems that come up when converting character encodings.


Using the mb_detect_encoding($str) function requires PHP's mbstring extension to be enabled (extension=php_mbstring.dll).


The code is as follows:

<?php
// $str is assumed to hold GB2312-encoded bytes
$str = "Test ing";
$cha = mb_detect_encoding($str);
$s = iconv($cha, "UTF-8", $str);
var_dump($s);
?>



The result is:


string(0) ""


Strange. Why does this happen?


The code is as follows:

<?php
$str = "Test ing";
$cha = mb_detect_encoding($str);
// Name the source encoding explicitly instead of using the detected one
$s = iconv("GB2312", "UTF-8", $str);
var_dump($s);
?>



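This returns the correct result. It turns out that mb_detect_encoding($str) did not identify the encoding accurately. A likely reason is that, without an explicit candidate list, mb_detect_encoding only tries the encodings in the current detect order (typically ASCII and UTF-8), so GB2312 is never considered.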
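One way to make detection more predictable is to pass an explicit candidate list and enable strict mode, so that encodings in which the bytes are invalid are ruled out. A minimal sketch under those assumptions; the sample bytes and variable names are illustrative, and mbstring may report GBK under its canonical name CP936:

```php
<?php
// GBK bytes for 你好; this sequence is not valid UTF-8.
$str = "\xC4\xE3\xBA\xC3";

// Third argument true = strict mode: candidates with invalid byte sequences are rejected.
$charset = mb_detect_encoding($str, ['UTF-8', 'GBK'], true);

if ($charset !== false && strtoupper($charset) !== 'UTF-8') {
    $str = mb_convert_encoding($str, 'UTF-8', $charset);
}
echo bin2hex($str), "\n"; // e4bda0e5a5bd, the UTF-8 bytes of 你好
```

Detection remains a heuristic: byte strings that are valid in several candidate encodings can still be misidentified.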


The function string mb_convert_encoding(string $str, string $to_encoding [, mixed $from_encoding])


converts a string to the specified encoding. I wrote an example:


The code is as follows:

<?php
$a = "I'm fine";
// No from_encoding is given, so the internal encoding is assumed as the source
echo mb_convert_encoding($a, 'UTF-8');
?>



But the result is:


?? Lu Lu?


Now the question is: to convert strings in different encodings to UTF-8, I can use iconv when I know the source encoding in advance, but what if I don't know it?
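When the source encoding is unknown, one common workaround is to test whether the bytes are already valid UTF-8 and only otherwise fall back to an assumed legacy encoding. The sketch below assumes that any non-UTF-8 input is GBK, which is an assumption about the data rather than something PHP can verify; the function name `to_utf8` is hypothetical:

```php
<?php
// Hypothetical helper: returns $str as UTF-8, assuming non-UTF-8 input is $fallback-encoded.
function to_utf8(string $str, string $fallback = 'GBK'): string
{
    // Valid UTF-8 (including plain ASCII) is returned unchanged.
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str;
    }
    // Otherwise treat the bytes as the fallback encoding and convert.
    return mb_convert_encoding($str, 'UTF-8', $fallback);
}

$fromGbk  = to_utf8("\xC4\xE3\xBA\xC3");         // GBK bytes for 你好
$fromUtf8 = to_utf8("\xE4\xBD\xA0\xE5\xA5\xBD"); // already UTF-8, left as-is
var_dump($fromGbk === $fromUtf8); // bool(true)
```

Checking for UTF-8 first works reasonably well because text in legacy multibyte encodings rarely happens to form valid UTF-8, but short or unusual inputs can still be ambiguous.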


Problem 3: an iconv problem. If the first byte of the string to convert is greater than a certain value, iconv returns an empty result.


For example:


The code is as follows:

<?php
// chr(254) is the byte 0xFE, placed at both ends of the string
$str = chr(254) . "Test ing" . chr(254);
$s = iconv("GB2312", "UTF-8", $str);
var_dump($s);
?>



This returns:


string(0) ""
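To avoid an empty result like this, the input can be validated before it is handed to iconv. A minimal sketch, assuming the mbstring extension is enabled; `gb2312_to_utf8_checked` is a hypothetical name, and returning null for invalid input is one possible design, not the only one:

```php
<?php
// Hypothetical helper: converts GB2312 to UTF-8 only if the input is well-formed GB2312.
function gb2312_to_utf8_checked(string $str): ?string
{
    // Reject byte sequences that are not valid GB2312 (such as a stray 0xFE).
    if (!mb_check_encoding($str, 'GB2312')) {
        return null;
    }
    $result = iconv('GB2312', 'UTF-8', $str);
    return $result === false ? null : $result;
}

var_dump(gb2312_to_utf8_checked(chr(254) . "Test ing" . chr(254))); // NULL
echo bin2hex(gb2312_to_utf8_checked("\xC4\xE3\xBA\xC3")), "\n";    // e4bda0e5a5bd
```

The caller can then decide how to handle a null result, for example by logging the input or trying another encoding.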

For the usage of mb_convert_encoding, see the official manual:

http://cn.php.net/manual/en/function.mb-convert-encoding.php

Another PHP function, iconv, also converts string encodings and works much like the function above.

Here are a few more examples:


iconv - Convert string to requested character encoding


(PHP 4 >= 4.0.5, PHP 5)


mb_convert_encoding - Convert character encoding


(PHP 4 >= 4.0.6, PHP 5)


Usage:


from_encoding names the character encoding before conversion. It can be an array or a comma-separated string listing several candidates. If it is not specified, the internal encoding is used.


/* Auto-detect encoding from JIS, eucjp-win, sjis-win, then convert str to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");

/* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");
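The candidate list can be written either as a comma-separated string or as an array. A small sketch showing both forms side by side, assuming GBK-encoded sample bytes; which candidate is picked is decided by mbstring's detection, so no particular converted output is claimed here:

```php
<?php
$str = "\xC4\xE3\xBA\xC3"; // sample bytes (GBK for 你好)

// Comma-separated string form of from_encoding.
$a = mb_convert_encoding($str, 'UTF-8', 'UTF-8, GBK');

// Array form of from_encoding; the candidates are the same.
$b = mb_convert_encoding($str, 'UTF-8', ['UTF-8', 'GBK']);

var_dump($a === $b); // both forms go through the same detection
```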


Example:


The code is as follows:

<?php
// Either call converts $content from GBK to UTF-8; use one or the other
$content = iconv("GBK", "UTF-8", $content);
$content = mb_convert_encoding($content, "UTF-8", "GBK");
?>



The following function converts data to the requested encoding based on the detected encoding of the input:


The code is as follows:

<?php
/**
 * Convert a string, or every string in a (nested) array, to the encoding $to.
 */
function phpcharset($data, $to) {
    if (is_array($data)) {
        foreach ($data as $key => $val) {
            $data[$key] = phpcharset($val, $to);
        }
    } else {
        // Candidate encodings tried by mb_detect_encoding
        $encode_array = array('ASCII', 'UTF-8', 'GBK', 'GB2312', 'BIG5');
        $encoded = mb_detect_encoding($data, $encode_array);
        $to = strtoupper($to);
        if ($encoded != $to) {
            $data = mb_convert_encoding($data, $to, $encoded);
        }
    }
    return $data;
}
?>
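Because detection can misfire, a variant that takes the source encoding explicitly may be more dependable when that encoding is known. A minimal sketch, not the article's function; `convert_recursive` and its parameters are hypothetical names:

```php
<?php
// Hypothetical variant: converts a string, or every string in a nested array,
// from a known source encoding to the target encoding.
function convert_recursive($data, string $to, string $from)
{
    if (is_array($data)) {
        foreach ($data as $key => $val) {
            $data[$key] = convert_recursive($val, $to, $from);
        }
        return $data;
    }
    // Leave non-string values (integers, null, ...) untouched.
    return is_string($data) ? mb_convert_encoding($data, $to, $from) : $data;
}

$gbk  = ['title' => "\xC4\xE3\xBA\xC3", 'tags' => ["\xBA\xC3"], 'id' => 7];
$utf8 = convert_recursive($gbk, 'UTF-8', 'GBK');
echo bin2hex($utf8['title']), "\n"; // e4bda0e5a5bd, the UTF-8 bytes of 你好
```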
