Converting PHP character encoding from GB2312 to UTF-8

Source: Internet
Author: User
PHP output pages often show garbled text. What can you do about it? Here is a method that automatically detects a string's character encoding and converts GBK- or GB2312-encoded strings to UTF-8. In PHP, character encoding conversion is usually done with iconv or mb_convert_encoding, but mb_convert_encoding performs much worse than iconv.
string iconv(string in_charset, string out_charset, string str)
Note: besides the target encoding, out_charset can take one of two suffixes, //TRANSLIT and //IGNORE. //TRANSLIT replaces a character that cannot be represented in the target encoding with one or more similar-looking characters; //IGNORE silently drops characters that cannot be converted. Without either suffix, an E_NOTICE is raised and the string is cut off at the first invalid character (recent PHP versions return FALSE instead).
Returns the converted string, or FALSE on failure.
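A minimal sketch of a round trip with iconv(), assuming the iconv extension is available and the script file itself is saved as UTF-8. It also shows checking for the FALSE return value described above.

```php
<?php
// Round trip: UTF-8 -> GB2312 -> UTF-8 with iconv().
// Assumes this file is saved as UTF-8.
$utf8 = "中文"; // two Chinese characters

$gb = iconv('UTF-8', 'GB2312', $utf8);
if ($gb === false) {
    // iconv() signals failure with false
    echo "conversion failed\n";
} else {
    echo bin2hex($gb), "\n"; // d6d0cec4: two bytes per character in GB2312
    $back = iconv('GB2312', 'UTF-8', $gb);
    var_dump($back === $utf8); // bool(true)
}
```

Checking with `=== false` matters because an empty string is also a legitimate conversion result.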
string mb_convert_encoding(string str, string to_encoding [, mixed from_encoding])
The mbstring extension must be enabled first: in php.ini, remove the semicolon in front of extension=php_mbstring.dll.
mb_convert_encoding can be given several candidate input encodings and identifies the actual one from the content, but it runs much slower than iconv.
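A short sketch of passing several candidate input encodings to mb_convert_encoding(), assuming the mbstring extension is enabled. The input bytes are "中文" encoded as GBK, written as escapes so the example does not depend on the file's own encoding.

```php
<?php
// "中文" encoded as GBK.
$gbk = "\xD6\xD0\xCE\xC4";

// These bytes are not valid UTF-8, so GBK is picked from the candidates.
$utf8 = mb_convert_encoding($gbk, 'UTF-8', 'UTF-8, GBK');
echo $utf8, "\n";

// The candidates can also be given as an array.
$same = mb_convert_encoding($gbk, 'UTF-8', ['UTF-8', 'GBK']);
var_dump($same === $utf8);
```

Order the candidates from most to least strict; UTF-8 goes first because random GBK bytes rarely form valid UTF-8.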

Usage:
It turns out that iconv fails when converting the character "-" to GB2312. Without the //IGNORE suffix, everything after that character is lost, and even with it, the "-" itself is never converted or output. mb_convert_encoding does not have this bug.
In general, use iconv, and fall back to mb_convert_encoding only when the source encoding cannot be determined or iconv's output does not display correctly.
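The rule of thumb above can be sketched as a small helper. The name to_utf8 and the GBK candidate list are hypothetical choices for illustration, not part of either extension.

```php
<?php
// Hypothetical helper: use iconv() when the source encoding is known,
// fall back to mb_convert_encoding() when it is unknown or iconv() fails.
function to_utf8(string $str, ?string $from = null): string
{
    if ($from !== null) {
        // @ hides the notice iconv() raises on an illegal sequence
        $out = @iconv($from, 'UTF-8', $str);
        if ($out !== false) {
            return $out;
        }
    }
    // Assumed candidate list; adjust it to the encodings you actually receive.
    return mb_convert_encoding($str, 'UTF-8', 'UTF-8, GBK');
}

echo to_utf8("\xD6\xD0", 'GBK'), "\n"; // known encoding: iconv path
echo to_utf8("\xD6\xD0"), "\n";        // unknown encoding: mbstring path
```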

The code is as follows:


/**
 * Automatically converts a GBK or GB2312 encoded string to UTF-8.
 * The encoding of the input string is detected automatically: if it is already UTF-8, it is returned unchanged; otherwise it is converted to UTF-8.
 * Supported input encodings: UTF-8, GBK and GB2312
 * @param string $str the string to convert
 * @return string
 */
function yang_gbk2utf8($str) {
    $charset = mb_detect_encoding($str, array('UTF-8', 'GBK', 'GB2312'));
    if ($charset === false) {
        return $str; // encoding not recognized; leave the string as-is
    }
    $charset = strtolower($charset);
    if ('cp936' == $charset) {
        $charset = 'gbk'; // mbstring reports GBK as CP936
    }
    if ('utf-8' != $charset) {
        $str = iconv($charset, 'UTF-8//IGNORE', $str);
    }
    return $str;
}


Next, let's look at some problems I ran into when converting character encodings.
First, mb_detect_encoding($str). To use this function, the mbstring extension (extension=php_mbstring.dll) must be enabled.

The code is as follows:


<?php
$str = "test ing";
$cha = mb_detect_encoding($str);
$s = iconv($cha, "UTF-8", $str);
var_dump($s);
?>


The result:
string(0) ""
Strange. Why?

The code is as follows:


<?php
$str = "test ing";
$cha = mb_detect_encoding($str);
$s = iconv("GB2312", "UTF-8", $str);
var_dump($s);
?>


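This time the result is correct, so mb_detect_encoding($str) is not reliable here. The likely reason: called without a list of candidates, mb_detect_encoding only tries the encodings in the default detection order (see mb_detect_order(), by default usually just ASCII and UTF-8), so it cannot report GB2312 and iconv is given the wrong source encoding.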
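A sketch of the usual workaround, assuming the mbstring extension is enabled: pass the candidate encodings explicitly and use strict mode, then convert with whatever was detected. The GBK test bytes are an illustrative assumption.

```php
<?php
// "中文" encoded as GBK.
$gbk = "\xD6\xD0\xCE\xC4";

// Candidates used when no list is passed (depends on configuration).
print_r(mb_detect_order());

// Explicit candidate list plus strict mode (third argument).
$enc = mb_detect_encoding($gbk, ['UTF-8', 'GBK'], true);
var_dump($enc); // a canonical name such as "CP936", or false if nothing matched

$utf8 = null;
if ($enc !== false) {
    $utf8 = mb_convert_encoding($gbk, 'UTF-8', $enc);
    echo $utf8, "\n";
}
```

Detection is still a guess; with very short inputs several encodings can match the same bytes.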
The function string mb_convert_encoding(string $str, string $to_encoding [, mixed $from_encoding])
converts a string to the specified encoding. Here is an example I wrote:

The code is as follows:



<?php
$a = "I'm fine";
echo mb_convert_encoding($a, 'UTF-8');
?>


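The result is:
??
Why? Because from_encoding is omitted, the input is assumed to be in the internal encoding, which does not match the string's actual encoding.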
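A minimal sketch of the fix: name the real source encoding. It assumes the input bytes are GB2312 (written as escapes here) and that mbstring accepts "GB2312" as a name for its EUC-CN encoding.

```php
<?php
// Without from_encoding, mb_convert_encoding() assumes the internal
// encoding; naming the real source encoding avoids the "??" output.
$gb2312 = "\xD6\xD0\xCE\xC4"; // "中文" as GB2312 bytes

echo mb_internal_encoding(), "\n"; // usually UTF-8 on current PHP
$utf8 = mb_convert_encoding($gb2312, 'UTF-8', 'GB2312');
echo $utf8, "\n";
```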
The real problem: I want to convert strings in various encodings to UTF-8 in one uniform way. If I know the source encoding in advance I can use iconv, but what should I do when the encoding is unknown?
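One possible approach, sketched under the assumption that inputs are only ever UTF-8 or GBK/GB2312: validate the string as UTF-8 with mb_check_encoding() and convert from GBK only when that check fails. The helper name ensure_utf8 is hypothetical.

```php
<?php
// Hypothetical helper: keep valid UTF-8 as-is, otherwise assume GBK.
// GBK is a superset of GB2312, so GB2312 input is handled too.
function ensure_utf8(string $str): string
{
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str; // already valid UTF-8 (pure ASCII also passes)
    }
    return mb_convert_encoding($str, 'UTF-8', 'GBK');
}

echo ensure_utf8("中文"), "\n";             // UTF-8 input, unchanged
echo ensure_utf8("\xD6\xD0\xCE\xC4"), "\n"; // GBK input, converted
```

This is a heuristic: a short GBK string can happen to be valid UTF-8 and would then be left unconverted.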
Problem 3: iconv returns an empty result when the first byte of the string being converted is above a certain value, or more precisely, when the input starts with a byte sequence that is not valid in the source encoding.
For example:

The code is as follows:


<?php
$str = chr(254) . "test ing" . chr(254);
$s = iconv("GB2312", "UTF-8", $str);
var_dump($s);
?>


Returns:
string(0) ""
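Since mb_convert_encoding does not stop at invalid bytes, it can keep the rest of such a string. A sketch, assuming mbstring's default substitute character ('?', see mb_substitute_character()); exactly how many bytes around the invalid one get replaced may vary between PHP versions.

```php
<?php
// Same input as above: invalid bytes at both ends.
$str = chr(254) . "test ing" . chr(254);

// Invalid sequences are replaced by the substitute character
// instead of making the whole conversion fail.
$out = mb_convert_encoding($str, 'UTF-8', 'GB2312');
var_dump($out);
var_dump(mb_check_encoding($out, 'UTF-8')); // the result is valid UTF-8
```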

For the usage of mb_convert_encoding, see the official manual:

http://cn.php.net/manual/en/function.mb-convert-encoding.php

PHP also provides iconv for converting string encodings, which works much like the function above.

Reference summaries of the two functions:
iconv: Convert string to requested character encoding
(PHP 4 >= 4.0.5, PHP 5)
mb_convert_encoding: Convert character encoding
(PHP 4 >= 4.0.6, PHP 5)
from_encoding is the character encoding name of str before conversion. It can be an array or a comma-separated list of names. If it is omitted, the internal encoding is used.
/* Auto-detect the encoding from JIS, eucjp-win, sjis-win, then convert str to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");
/* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");
Example:

The code is as follows:


<?php
// Option 1: iconv
$content = iconv("GBK", "UTF-8", $content);
// Option 2: mb_convert_encoding (use one option or the other, not both)
$content = mb_convert_encoding($content, "UTF-8", "GBK");
?>
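To check the performance difference claimed earlier on your own setup, a rough timing sketch comparing the two calls above on the same GBK input. The input size and iteration count are arbitrary assumptions, and results depend heavily on PHP version, build and data.

```php
<?php
// Rough timing comparison; treat the numbers as indicative only.
$gbk = str_repeat("\xD6\xD0\xCE\xC4", 1000); // "中文" x 1000, as GBK bytes
$n = 1000;

$t0 = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $a = iconv('GBK', 'UTF-8', $gbk);
}
$t1 = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $b = mb_convert_encoding($gbk, 'UTF-8', 'GBK');
}
$t2 = microtime(true);

printf("iconv:               %.4f s\n", $t1 - $t0);
printf("mb_convert_encoding: %.4f s\n", $t2 - $t1);
var_dump($a === $b); // both should produce the same UTF-8 text
```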


The following function converts a string, or every string in a (nested) array, to the requested output encoding, detecting each input's encoding automatically.

The code is as follows:


<?php
function phpcharset($data, $to) {
    if (is_array($data)) {
        // convert every element, recursing into nested arrays
        foreach ($data as $key => $val) {
            $data[$key] = phpcharset($val, $to);
        }
    } else {
        $encode_array = array('ASCII', 'UTF-8', 'GBK', 'GB2312', 'BIG5');
        $encoded = mb_detect_encoding($data, $encode_array);
        $to = strtoupper($to);
        // skip if detection failed or the string is already in the target encoding
        if ($encoded !== false && $encoded != $to) {
            $data = mb_convert_encoding($data, $to, $encoded);
        }
    }
    return $data;
}
?>
