PHP character encoding conversion: GB2312 to UTF-8

Source: Internet
Author: User

string iconv(string $in_charset, string $out_charset, string $str)
Note: besides the target encoding, out_charset can carry one of two suffixes, //TRANSLIT or //IGNORE. //TRANSLIT replaces a character that cannot be represented in the target encoding with one or more similar-looking characters. //IGNORE silently drops characters that cannot be converted. Without either suffix, conversion stops at the first character that cannot be converted: older PHP versions return the result truncated at that point, while newer ones raise a notice and return FALSE.
Returns the converted string, or FALSE on failure.
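Because iconv signals failure with FALSE (plus a notice on newer PHP versions), the return value should be compared with `=== false` rather than tested for truthiness. A minimal sketch; the helper name `iconv_or_null` is hypothetical, and the sample bytes are an assumption for illustration ("\xC4\xE3\xBA\xC3" is "你好" in GB2312):

```php
<?php
// Hypothetical helper: wrap iconv() and turn its FALSE failure value into null.
function iconv_or_null(string $from, string $to, string $str): ?string
{
    // @ suppresses the notice iconv() raises on an illegal byte sequence.
    $result = @iconv($from, $to, $str);
    return $result === false ? null : $result;
}

// Valid GB2312 input converts normally.
var_dump(iconv_or_null('GB2312', 'UTF-8', "\xC4\xE3\xBA\xC3"));

// chr(254) followed by "t" is not a valid GB2312 sequence, so this yields null.
var_dump(iconv_or_null('GB2312', 'UTF-8', chr(254) . 'test'));
```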

string mb_convert_encoding(string $str, string $to_encoding [, mixed $from_encoding])
The mbstring extension must be enabled first: in php.ini, remove the semicolon in front of extension=php_mbstring.dll.
mb_convert_encoding can take several candidate input encodings and picks one automatically based on the content, but it runs much more slowly than iconv.
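A minimal sketch of passing several candidate input encodings, either as a comma-separated string or as an array. The sample bytes are an assumption for illustration: "\xC4\xE3\xBA\xC3" is "你好" in GBK and is not valid UTF-8, so GBK is the only candidate that fits.

```php
<?php
// GBK bytes for "你好"; invalid as UTF-8, valid as GBK.
$gbk = "\xC4\xE3\xBA\xC3";

// from_encoding may be a comma-separated string or an array of candidates;
// mbstring guesses which candidate fits the bytes before converting.
$fromString = mb_convert_encoding($gbk, 'UTF-8', 'UTF-8, GBK');
$fromArray  = mb_convert_encoding($gbk, 'UTF-8', ['UTF-8', 'GBK']);

var_dump($fromString === $fromArray); // bool(true)
echo $fromString, "\n";               // 你好
```

Keep in mind that short strings can be valid in several encodings at once, so the guess becomes less reliable as the candidate list grows.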

Usage:

I found that iconv fails when converting the dash character "-" to GB2312 (presumably a non-ASCII dash such as U+2014 EM DASH, since a plain ASCII hyphen converts fine). Without the //IGNORE suffix, everything after this character is lost, and in any case the dash itself can never be converted or output. mb_convert_encoding does not have this bug.

In general, use iconv. Use mb_convert_encoding only when the original encoding cannot be determined, or when the iconv output does not display correctly after conversion.
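That rule of thumb can be sketched as a helper that tries iconv first and falls back to mb_convert_encoding only if iconv fails. The name `to_utf8` is hypothetical, and this is one possible arrangement rather than the author's code:

```php
<?php
// Hypothetical helper: prefer iconv() when the source encoding is known,
// and fall back to mb_convert_encoding() only if iconv() fails.
function to_utf8(string $str, string $from): string
{
    $out = @iconv($from, 'UTF-8', $str);
    if ($out !== false) {
        return $out;
    }
    // mbstring replaces bytes it cannot convert with a substitute
    // character instead of failing.
    return mb_convert_encoding($str, 'UTF-8', $from);
}

echo to_utf8("\xC4\xE3\xBA\xC3", 'GBK'), "\n"; // 你好
```

The fast path covers valid input; the slower mbstring path only runs for the rare strings iconv rejects.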

 

The code is as follows:

/**
 * Automatically converts a GBK- or GB2312-encoded string to UTF-8.
 * The encoding of the input is detected automatically: a UTF-8 string is
 * returned unchanged; anything else is converted to UTF-8.
 * Supported input encodings: UTF-8, GBK and GB2312.
 * @param string $str The input string
 * @return string
 */
function yang_gbk2utf8($str) {
    // Strict mode: only report an encoding in which the whole string is valid
    $charset = mb_detect_encoding($str, array('UTF-8', 'GBK', 'GB2312'), true);
    if ($charset === false) {
        return $str; // encoding not recognized; leave the string unchanged
    }
    $charset = strtolower($charset);
    // mbstring reports GBK under its canonical name CP936
    if ('cp936' == $charset) {
        $charset = 'gbk';
    }
    // Compare in lowercase, because $charset was lowercased above
    if ('utf-8' != $charset) {
        $converted = iconv($charset, 'UTF-8//IGNORE', $str);
        if ($converted !== false) {
            $str = $converted;
        }
    }
    return $str;
}

Next, let's look at some problems I ran into when converting character encodings.

Detecting the encoding uses the function mb_detect_encoding($str). To use it, the mbstring extension (extension=php_mbstring.dll) must be enabled in PHP.

The code is as follows: Copy code
<? Php
$ Str = "test ing ";
$ Cha = mb_detect_encoding ($ str );
Echo $ cha;
?>

I typed this on a GB2312 page, yet the output is, strangely, UTF-8, and I haven't found the reason.

I want to convert everything to UTF-8 uniformly, so I used the following approach:

The code is as follows: Copy code
<? Php
$ Str = "test ing ";
$ Cha = mb_detect_encoding ($ str );
$ S = iconv ($ cha, "UTF-8", $ str );
Var_dump ($ s );
?>

The result is:
string(0) ""
Strange. Why?
Next I tried:

The code is as follows: Copy code
<? Php
$ Str = "test ing ";
$ Cha = mb_detect_encoding ($ str );
$ S = iconv ("GB2312", "UTF-8", $ str );
Var_dump ($ s );
?>

 
This time the result is correct, so mb_detect_encoding($str) evidently guesses the encoding wrong here, though I don't know why.
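A likely explanation, offered as an assumption rather than a confirmed diagnosis: when no encoding list is passed, mb_detect_encoding only considers the current detect order (typically just ASCII and UTF-8), and in its default non-strict mode it returns the closest candidate even when the bytes are not valid in it. GB2312 text is then reported as UTF-8, and iconv gives up at the first byte. The sketch below passes an explicit candidate list and enables strict mode; the GBK sample bytes are illustrative.

```php
<?php
// GBK/GB2312 bytes for "你好" (an illustrative sample).
$str = "\xC4\xE3\xBA\xC3";

// Explicit candidates plus strict mode: an encoding is only returned
// if the whole string is valid in it; otherwise the result is false.
$charset = mb_detect_encoding($str, ['ASCII', 'UTF-8', 'GBK'], true);
var_dump($charset); // mbstring's canonical name for GBK, e.g. "CP936"

if ($charset !== false) {
    // Convert with mbstring, which understands its own canonical names.
    var_dump(mb_convert_encoding($str, 'UTF-8', $charset));
}
```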
The function
string mb_convert_encoding(string $str, string $to_encoding [, mixed $from_encoding])
converts a string to the specified encoding. Here is an example I wrote:

The code is as follows:
<?php
$a = "I'm fine"; // the original used a Chinese string here
echo mb_convert_encoding($a, 'UTF-8');
?>

The result is:
??
Why?
So the remaining question is this: to convert strings in various encodings to UTF-8 uniformly, iconv works when I know the source encoding beforehand, but what should I do when I don't know it?
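The "??" most likely appears because from_encoding was omitted: mb_convert_encoding then assumes the internal encoding (usually UTF-8), so bytes that are not valid UTF-8 are replaced with the substitute character "?". One possible answer to the question, sketched under that assumption: return input that is already valid UTF-8 as-is, otherwise detect strictly from a short candidate list and convert. The helper name `any_to_utf8` and its default candidate list are hypothetical.

```php
<?php
// Hypothetical helper: convert a string of unknown encoding to UTF-8.
// Returns false when none of the candidate encodings fits the bytes.
function any_to_utf8(string $str, array $candidates = ['GBK', 'BIG5'])
{
    // Valid UTF-8 (including plain ASCII) is returned unchanged.
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str;
    }
    // Strict detection: only an encoding in which the whole string is valid.
    $from = mb_detect_encoding($str, $candidates, true);
    if ($from === false) {
        return false;
    }
    return mb_convert_encoding($str, 'UTF-8', $from);
}

var_dump(any_to_utf8('hello'));                     // string(5) "hello"
var_dump(any_to_utf8("\xC4\xE3\xBA\xC3", ['GBK'])); // "你好" as UTF-8
```

GBK and Big5 byte ranges overlap heavily, so when both are candidates the detected encoding is still only a guess; narrowing the list to encodings you actually expect makes the result more reliable.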

Problem 3: iconv returns an empty string when the first byte of the string being converted is above a certain value.

For example:

The code is as follows:
<?php
$str = chr(254) . "test ing" . chr(254);
$s = iconv("GB2312", "UTF-8", $str);
var_dump($s);
?>

It returns:
string(0) ""
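The empty result most likely has less to do with the byte's numeric value than with validity: 0xFE followed by "t" is not a valid GB2312 sequence (GB2312 uses lead bytes 0xA1 to 0xF7, each followed by a trail byte 0xA1 to 0xFE), and without //IGNORE iconv gives up at that first invalid sequence. A minimal sketch of checking the input first and dropping invalid bytes with mbstring when needed; the helper name is hypothetical, and mb_substitute_character('none') is used instead of //IGNORE because on some glibc-based builds iconv with //IGNORE may still return FALSE.

```php
<?php
// Hypothetical helper: convert GB2312 to UTF-8, dropping invalid bytes.
function gb2312_to_utf8_lenient(string $str): string
{
    if (mb_check_encoding($str, 'GB2312')) {
        // Input is valid, so a plain iconv() call is safe.
        return iconv('GB2312', 'UTF-8', $str);
    }
    // Temporarily tell mbstring to drop unconvertible bytes instead of
    // replacing them with "?", then restore the previous setting.
    $previous = mb_substitute_character();
    mb_substitute_character('none');
    $out = mb_convert_encoding($str, 'UTF-8', 'GB2312');
    mb_substitute_character($previous);
    return $out;
}

var_dump(gb2312_to_utf8_lenient(chr(254) . 'test ing' . chr(254)));
```

Exactly how many neighbouring bytes are lost around an invalid sequence depends on the mbstring version, so the lenient path is best kept for input you cannot fix at the source.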

For details on mb_convert_encoding, see the official manual:

http://cn.php.net/manual/en/function.mb-convert-encoding.php

PHP also has iconv, which likewise converts the encoding of a string, similar to the function above.

Here is a summary of the two:
iconv: Convert string to requested character encoding (PHP 4 >= 4.0.5, PHP 5)
mb_convert_encoding: Convert character encoding (PHP 4 >= 4.0.6, PHP 5)

from_encoding specifies the character encoding of str before conversion. It can be an array or a comma-separated list in a string. If it is not specified, the internal encoding is used. For example:
/* Auto detect encoding from JIS, eucjp-win, sjis-win, then convert str to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");
/* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");

Example:

The code is as follows:

<?php
// $content holds a GBK-encoded string; the two calls below are alternatives
$content = iconv("GBK", "UTF-8", $content);
// or, equivalently:
// $content = mb_convert_encoding($content, "UTF-8", "GBK");
?>
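For valid GBK input the two calls are interchangeable. A quick sketch to confirm this on your own system; the sample bytes are an assumption for illustration ("\xC4\xE3\xBA\xC3" is "你好" in GBK):

```php
<?php
$content = "\xC4\xE3\xBA\xC3"; // "你好" encoded in GBK

$viaIconv    = iconv('GBK', 'UTF-8', $content);
$viaMbstring = mb_convert_encoding($content, 'UTF-8', 'GBK');

// Both should yield the same UTF-8 bytes: E4 BD A0 E5 A5 BD.
var_dump($viaIconv === $viaMbstring); // bool(true)
echo bin2hex($viaIconv), "\n";        // e4bda0e5a5bd
```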

Example

The following function converts its input to the requested output encoding, detecting the input encoding automatically. It also works recursively on arrays.

The code is as follows:

<?php
// Recursively convert $data (a string or an array of strings) to the encoding $to
function phpcharset($data, $to) {
    if (is_array($data)) {
        foreach ($data as $key => $val) {
            $data[$key] = phpcharset($val, $to);
        }
    } else {
        $encode_array = array('ASCII', 'UTF-8', 'GBK', 'GB2312', 'BIG5');
        $encoded = mb_detect_encoding($data, $encode_array);
        $to = strtoupper($to);
        // Convert only if detection succeeded and the encoding differs from the target
        if ($encoded !== false && $encoded != $to) {
            $data = mb_convert_encoding($data, $to, $encoded);
        }
    }
    return $data;
}
?>
