PHP character encoding conversion: GB2312 to UTF-8

Source: Internet
Author: User
Tags translit

In PHP, character encoding conversion is generally done with iconv or mb_convert_encoding; mb_convert_encoding is said to perform considerably worse than iconv.
string iconv(string in_charset, string out_charset, string str)
Note: in addition to the target encoding, the second parameter can carry one of two suffixes, //TRANSLIT and //IGNORE. //TRANSLIT converts characters that cannot be converted directly into one or more approximate characters; //IGNORE drops characters that cannot be converted. Without a suffix, conversion stops at the first illegal character.
Returns the converted string or FALSE on failure.
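As a rough illustration of the iconv call and its two suffixes, here is a minimal sketch. The byte string is written with hex escapes so the example does not depend on the encoding of the source file. What //TRANSLIT substitutes, and how //IGNORE behaves on unconvertible input, depends on the iconv implementation PHP was built against, so no output is claimed for those two calls.

```php
<?php
// GBK/GB2312 bytes for the two characters U+4F60 U+597D.
$gbk = "\xC4\xE3\xBA\xC3";

// Plain conversion: returns the converted string, or false on failure.
$utf8 = iconv('GBK', 'UTF-8', $gbk);
var_dump(bin2hex($utf8));

// Suffixes go on the OUTPUT charset (second argument).
// @ hides the notice some iconv builds raise for unconvertible characters.
$translit = @iconv('UTF-8', 'ASCII//TRANSLIT', $utf8);
$ignored  = @iconv('UTF-8', 'ASCII//IGNORE', $utf8 . ' ok');
var_dump($translit, $ignored);
```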
string mb_convert_encoding(string str, string to_encoding [, mixed from_encoding])
Requires the mbstring extension: in php.ini, remove the leading ";" from the line ;extension=php_mbstring.dll.
mb_convert_encoding accepts several candidate input encodings and picks one automatically based on the content, but it is reported to run much slower than iconv.
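The two ways of passing from_encoding described above might look like this. It is only a sketch: the candidate list 'UTF-8, GBK' is an assumption chosen for illustration, and which candidate mbstring picks for ambiguous input can differ between PHP versions.

```php
<?php
// GBK/GB2312 bytes for the two characters U+4F60 U+597D.
$gbk = "\xC4\xE3\xBA\xC3";

// A single, known source encoding.
$a = mb_convert_encoding($gbk, 'UTF-8', 'GBK');

// Several candidates: mbstring chooses one of them from the bytes.
$b = mb_convert_encoding($gbk, 'UTF-8', 'UTF-8, GBK');

var_dump(bin2hex($a), $b);
```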

Use:
iconv was found to fail when converting the character "-" to gb2312; without the //IGNORE suffix, everything after that character is lost. Either way, this "-" cannot be converted and is not output. mb_convert_encoding does not have this bug.
In general, use iconv; fall back to mb_convert_encoding only when the source encoding cannot be determined, or when the iconv result does not display properly.
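The "iconv first, mb_convert_encoding as a fallback" advice could be sketched as follows. The helper name convert_with_fallback is hypothetical and not part of the original article.

```php
<?php
/**
 * Hypothetical helper: try iconv first and fall back to
 * mb_convert_encoding when iconv reports failure.
 */
function convert_with_fallback(string $str, string $from, string $to): string
{
    // @ suppresses the notice iconv raises on illegal input.
    $out = @iconv($from, $to, $str);
    if ($out !== false) {
        return $out;
    }
    // mbstring substitutes characters it cannot convert instead of failing.
    return mb_convert_encoding($str, $to, $from);
}

$gbk = "\xC4\xE3\xBA\xC3"; // GBK bytes for U+4F60 U+597D
echo bin2hex(convert_with_fallback($gbk, 'GBK', 'UTF-8')), "\n";
```

The fallback trades strictness for availability: a caller that must detect bad input should check the result of iconv directly instead.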

The code is as follows:
/**
 * Automatically convert a GBK or GB2312 encoded string to UTF-8.
 * Detects the encoding of the input string: if it is already UTF-8 it is
 * returned unchanged, otherwise it is converted to UTF-8.
 * Supported character encodings: UTF-8, GBK, GB2312
 * @param string $str the input string
 */
function yang_gbk2utf8($str) {
    $charset = mb_detect_encoding($str, array('UTF-8', 'GBK', 'GB2312'));
    if ($charset === false) {
        // Detection failed: return the input unchanged rather than guess.
        return $str;
    }
    $charset = strtolower($charset);
    // mbstring reports GBK under its code page name.
    if ('cp936' == $charset) {
        $charset = 'GBK';
    }
    if ("utf-8" != $charset) {
        $str = iconv($charset, "UTF-8//IGNORE", $str);
    }
    return $str;
}


Next, some issues that come up when converting character encodings.
The first involves mb_detect_encoding($str). To use this function, the mbstring extension (extension=php_mbstring.dll) must be enabled.

The code is as follows:
<?php
// Presumably a GB2312-encoded Chinese string in the original article; shown here in translation.
$str = "Test ing";
$cha = mb_detect_encoding($str);
$s = iconv($cha, "UTF-8", $str);
var_dump($s);
?>


Result:
string(0) ""
Strange. Why is that?

The code is as follows:
<?php
$str = "Test ing";
$cha = mb_detect_encoding($str); // no longer used: the source encoding is given explicitly below
$s = iconv("GB2312", "UTF-8", $str);
var_dump($s);
?>


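This time the result is correct, so the problem lies in mb_detect_encoding($str), though I do not know the cause. One likely factor: when no encoding list is passed, mb_detect_encoding only considers the encodings in mbstring's detect order (ASCII and UTF-8 by default), so GB2312 text cannot be identified.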
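For comparison, here is a sketch of calling mb_detect_encoding with an explicit candidate list and strict mode. The candidate list is an assumption for illustration. The bytes below are not valid UTF-8, so a strict check should rule UTF-8 out.

```php
<?php
// GBK/GB2312 bytes for the two characters U+4F60 U+597D.
$gbk = "\xC4\xE3\xBA\xC3";

// Third argument true = strict: a candidate is rejected if the
// bytes are not valid in it.
$detected = mb_detect_encoding($gbk, array('UTF-8', 'GBK'), true);
var_dump($detected);

if ($detected !== false) {
    // Use the detected name as the source encoding.
    $utf8 = mb_convert_encoding($gbk, 'UTF-8', $detected);
    var_dump(bin2hex($utf8));
}
```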
The function string mb_convert_encoding(string $str, string $to_encoding [, mixed $from_encoding])
converts a string to the specified encoding. I wrote an example:

The code is as follows:
<?php
$a = "I'm fine";
// No from_encoding given, so the internal encoding is assumed as the source.
echo mb_convert_encoding($a, 'UTF-8');
?>


But the output is garbled:
?? Lu Lu?
Now the question: when converting strings in different encodings to UTF-8, we can use iconv if we know the source encoding beforehand, but what if we do not know it?
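One common way to approach the unknown-encoding case is to test whether the bytes are already valid UTF-8 and convert only if they are not. The sketch below does that. The helper name ensure_utf8 and the default of GBK are assumptions, not something the original article defines, and the fallback encoding remains a guess the caller has to supply.

```php
<?php
/**
 * Hypothetical helper: return the input unchanged if it is already
 * valid UTF-8, otherwise treat it as $assumed and convert it.
 */
function ensure_utf8(string $str, string $assumed = 'GBK'): string
{
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str;
    }
    return mb_convert_encoding($str, 'UTF-8', $assumed);
}

$gbk = "\xC4\xE3\xBA\xC3"; // GBK bytes for U+4F60 U+597D
echo bin2hex(ensure_utf8($gbk)), "\n";
```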
Problem 3: an iconv problem. If the string being converted contains a byte above a certain value, iconv returns an empty result.
For example:

The code is as follows:
<?php
$str = chr(254) . "Test ing" . chr(254);
$s = iconv("GB2312", "UTF-8", $str);
var_dump($s);
?>


Returns:
string(0) ""
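Since iconv signals failure through its return value, a caller can test for it explicitly instead of printing an empty result. A minimal sketch, using an ASCII placeholder for the text in the middle:

```php
<?php
// 0xFE is not a valid GB2312 lead byte, so this input is malformed.
$str = chr(254) . "test" . chr(254);

// @ hides the notice; current PHP versions return false on illegal input.
$s = @iconv('GB2312', 'UTF-8', $str);
if ($s === false) {
    echo "iconv failed: input is not valid GB2312\n";
} else {
    var_dump($s);
}
```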

For the usage of mb_convert_encoding, see the official manual:

http://cn.php.net/manual/en/function.mb-convert-encoding.php

Another PHP function, iconv, also converts string encodings and works much like the function above.

Here are a few more examples:
iconv - Convert string to requested character encoding
(PHP 4 >= 4.0.5, PHP 5)
mb_convert_encoding - Convert character encoding
(PHP 4 >= 4.0.6, PHP 5)
Usage:
string mb_convert_encoding(string str, string to_encoding [, mixed from_encoding])
from_encoding specifies the character encoding name before conversion. It can be an array or a comma-separated string listing several encodings. If it is not specified, the internal encoding is used.
/* Auto detect encoding from JIS, eucjp-win, sjis-win, then convert str to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");
/* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");
Example:

The code is as follows:
<?php
// Two equivalent ways to convert GBK to UTF-8; use one or the other, not both in sequence.
$content = iconv("GBK", "UTF-8", $content);
$content = mb_convert_encoding($content, "UTF-8", "GBK");
?>


The following function converts data according to the detected input encoding and the requested output encoding.

The code is as follows:
<?php
function phpcharset($data, $to) {
    if (is_array($data)) {
        // Recurse into arrays so nested values are converted too.
        foreach ($data as $key => $val) {
            $data[$key] = phpcharset($val, $to);
        }
    } else {
        $encode_array = array('ASCII', 'UTF-8', 'GBK', 'GB2312', 'BIG5');
        $encoded = mb_detect_encoding($data, $encode_array);
        $to = strtoupper($to);
        // Skip conversion when detection failed or the data is already in the target encoding.
        if ($encoded !== false && $encoded != $to) {
            $data = mb_convert_encoding($data, $to, $encoded);
        }
    }
    return $data;
}
?>
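When the source encoding of every string is already known, the same walk over nested data can be written with array_walk_recursive. This is a sketch of an alternative, not the function above: the sample array and the fixed GBK source encoding are assumptions for illustration.

```php
<?php
// Hypothetical sample data: a nested array whose strings are GBK-encoded.
$rows = array(
    'title' => "\xC4\xE3\xBA\xC3",
    'tags'  => array("\xC4\xE3", 'ascii'),
);

// Convert every string leaf in place; non-string values are left alone.
array_walk_recursive($rows, function (&$value) {
    if (is_string($value)) {
        $value = mb_convert_encoding($value, 'UTF-8', 'GBK');
    }
});

var_dump(bin2hex($rows['title']));
```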
