Using PHP's encoding conversion functions mb_convert_encoding and iconv

Source: Internet
Author: User
Plain English text rarely runs into encoding problems; they show up mainly with Chinese data. For example, suppose you write a program in Zend Studio or EditPlus with the file saved as GBK, and the data has to be stored in a database that uses UTF-8. The data must then be converted to UTF-8 first, or it will end up garbled in the database.

For the usage of mb_convert_encoding, see the official manual:
http://cn.php.net/manual/zh/function.mb-convert-encoding.php

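An example that converts GBK to UTF-8:

<?php
// Tell the browser that the output is UTF-8.
header("Content-Type: text/html; charset=utf-8");
// Arguments: the string, the target encoding, the source encoding.
echo mb_convert_encoding("You Are my Friend", "UTF-8", "GBK");
?>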

And one that converts GB2312 to Big5:

<?php
// Tell the browser that the output is Big5.
header("Content-Type: text/html; charset=big5");
echo mb_convert_encoding("You Are my Friend", "Big5", "GB2312");
?>
Note that the function above is only available once the mbstring extension has been enabled.
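
As a minimal sketch of that requirement, the conversion can be guarded by a check that mbstring is loaded. The helper name to_utf8_from_gbk() and the sample bytes are assumptions made for illustration; they are not part of PHP or of the original examples.

```php
<?php
// Sketch only: to_utf8_from_gbk() is a hypothetical helper name, not a PHP built-in.
function to_utf8_from_gbk(string $gbkBytes): string
{
    // Fail with a clear message if mbstring is missing,
    // instead of hitting an "undefined function" error later.
    if (!extension_loaded('mbstring')) {
        throw new RuntimeException('The mbstring extension is not enabled.');
    }
    return mb_convert_encoding($gbkBytes, 'UTF-8', 'GBK');
}

// "\xC4\xE3\xBA\xC3" is the GBK byte sequence for 你好 (assumed sample input).
$utf8 = to_utf8_from_gbk("\xC4\xE3\xBA\xC3");
echo bin2hex($utf8), "\n"; // e4bda0e5a5bd
```

Writing the input as escaped bytes keeps the example independent of the encoding the source file itself is saved in.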

PHP has another function, iconv, that also converts string encodings and works much like the function above.

Here is some more detail on the two functions:
iconv: Convert string to requested character encoding
(PHP 4 >= 4.0.5, PHP 5)
mb_convert_encoding: Convert character encoding
(PHP 4 >= 4.0.6, PHP 5)

Usage:
string mb_convert_encoding(string str, string to_encoding [, mixed from_encoding])
This requires the mbstring extension: in php.ini, remove the leading semicolon from the line ;extension=php_mbstring.dll.
mb_convert_encoding accepts several candidate input encodings and picks the matching one automatically based on the content, but it runs much less efficiently than iconv.


string iconv(string in_charset, string out_charset, string str)
Note: besides naming the target encoding, the second parameter can carry one of two suffixes, //TRANSLIT and //IGNORE. With //TRANSLIT, a character that cannot be converted directly is replaced by one or more approximate characters. With //IGNORE, characters that cannot be converted are dropped. Without either suffix, the string is cut off at the first illegal character (newer PHP versions raise an E_NOTICE and return FALSE instead).
Returns the converted string, or FALSE on failure.
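
The effect of the two suffixes can be tried out with a short script like the one below. It is only a sketch: the sample string is an assumption, and no expected output is listed because the result depends on the iconv implementation PHP was built against (glibc, libiconv and others behave differently).

```php
<?php
// Sample input (assumed): "café 你好" written as UTF-8 bytes.
$utf8 = "caf\xC3\xA9 \xE4\xBD\xA0\xE5\xA5\xBD";

$results = [];
foreach (['ASCII', 'ASCII//TRANSLIT', 'ASCII//IGNORE'] as $target) {
    // @ silences the notice or warning raised for characters that cannot be converted.
    $results[$target] = @iconv('UTF-8', $target, $utf8);
}

// Print each target encoding next to what iconv returned for it.
foreach ($results as $target => $result) {
    printf("%-16s => %s\n", $target, $result === false ? 'FALSE' : '"' . $result . '"');
}
```

Always compare the return value with === false, since an empty string is a valid conversion result.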


In practice:

iconv was found to fail when converting the dash character "-" (presumably the full-width em dash, U+2014, since an ASCII hyphen converts without trouble) to GB2312. Without the //IGNORE suffix, everything in the string after that character is lost. Either way, the character itself cannot be converted and never appears in the output. mb_convert_encoding does not have this bug.

In general, use iconv, and switch to mb_convert_encoding only when the original encoding cannot be determined or when the result of iconv does not display properly.
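
One way to follow this rule of thumb is a small wrapper that tries iconv first and falls back to mb_convert_encoding. This is a sketch under assumptions: convert_with_fallback() is a hypothetical helper name, and the validity check with mb_check_encoding() is an extra safeguard that the original text does not mention.

```php
<?php
// Sketch only: convert_with_fallback() is a hypothetical helper name, not a PHP built-in.
function convert_with_fallback(string $input, string $to, string $from): string
{
    // Try iconv first; @ hides the notice it raises on characters it cannot convert.
    $converted = @iconv($from, $to, $input);

    // Accept the iconv result only if it succeeded and is valid in the target encoding.
    if ($converted !== false && mb_check_encoding($converted, $to)) {
        return $converted;
    }

    // Otherwise fall back to mbstring, which substitutes unconvertible characters.
    return mb_convert_encoding($input, $to, $from);
}

// "\xC4\xE3\xBA\xC3" is 你好 in GBK (assumed sample input).
$content = convert_with_fallback("\xC4\xE3\xBA\xC3", 'UTF-8', 'GBK');
echo bin2hex($content), "\n"; // e4bda0e5a5bd
```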

from_encoding names the character encoding of the string before conversion. It can be an array or a comma-separated string list. If it is not specified, the internal encoding is used.
/* Auto detect encoding from JIS, eucjp-win, sjis-win, then convert str to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");
/* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");
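
The two forms of the candidate list can be compared directly. The sketch below assumes a GBK sample string and the candidate list UTF-8, GBK; the bytes are not valid UTF-8, so detection should settle on GBK.

```php
<?php
// "\xC4\xE3\xBA\xC3" is 你好 in GBK and is not a valid UTF-8 sequence (assumed sample input).
$gbk = "\xC4\xE3\xBA\xC3";

// Candidate encodings given as a comma-separated string...
$fromList = mb_convert_encoding($gbk, 'UTF-8', 'UTF-8, GBK');
// ...and the same candidates given as an array.
$fromArray = mb_convert_encoding($gbk, 'UTF-8', ['UTF-8', 'GBK']);

var_dump($fromList === $fromArray);
echo bin2hex($fromArray), "\n";
```

Detection is a guess based on the bytes alone, so passing a single known source encoding is safer whenever the encoding is actually known.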

Example:

// Either call converts $content from GBK to UTF-8; use one of them, not both.
$content = iconv("GBK", "UTF-8", $content);
$content = mb_convert_encoding($content, "UTF-8", "GBK");

A small pitfall when transcoding with mb_convert_encoding in PHP
Most PHP developers know mb_convert_encoding() well and use it heavily, and in general it works well. In one project, however, we needed it for UTF-8 to GBK conversion and ran into a minor problem with some special characters: mbstring turns characters that exist in UTF-8 but have no encoding in GBK into \x00\x80, which corrupts the converted GBK string.
We would expect a transcoder that meets a character the target encoding cannot represent to simply discard it. Some data is lost that way, but the converted character sequence stays usable. It is not clear why mbstring takes the approach above instead of discarding the character.
There are two temporary workarounds. The first is to filter the string after transcoding and strip every \x00\x80 sequence. The second is to filter the UTF-8 string before transcoding and remove every character that UTF-8 can represent but GBK cannot. The first method is the easier one to implement.
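
Both ideas can be sketched in a few lines. The helper names below are hypothetical. The first function is the post-transcoding filter described above. The second is an alternative that the original text does not use: mb_substitute_character('none') tells mbstring to drop unconvertible characters by itself.

```php
<?php
// Sketch only: both helper names are hypothetical, not PHP built-ins.

// First method from the text: strip the placeholder bytes after transcoding.
function strip_placeholder_bytes(string $gbk): string
{
    return str_replace("\x00\x80", '', $gbk);
}

// Alternative (not from the original text): let mbstring discard
// characters that have no mapping in the target encoding.
function utf8_to_gbk_dropping(string $utf8): string
{
    $previous = mb_substitute_character();   // remember the current setting
    mb_substitute_character('none');         // drop unconvertible characters
    $gbk = mb_convert_encoding($utf8, 'GBK', 'UTF-8');
    mb_substitute_character($previous);      // restore the previous setting
    return $gbk;
}

// U+1F600 (an emoji, "\xF0\x9F\x98\x80" in UTF-8) has no GBK encoding.
echo utf8_to_gbk_dropping("a\xF0\x9F\x98\x80b"), "\n";        // ab
echo bin2hex(strip_placeholder_bytes("a\x00\x80b")), "\n";    // 6162
```

Restoring the previous substitute character matters because the setting is global to the request and would otherwise affect later conversions.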
