Using the PHP encoding conversion functions mb_convert_encoding and iconv

Source: Internet
Author: User
Tags: translit

The mb_convert_encoding function converts a string from one character encoding to another.

Plain English text rarely causes encoding problems; they typically show up with Chinese data. For example, if you write a program in Zend Studio or EditPlus using GBK encoding and the data has to be stored in a database whose encoding is UTF-8, the data must be converted first, or it will end up garbled in the database.

The official documentation for mb_convert_encoding is at:
http://cn.php.net/manual/zh/function.mb-convert-encoding.php

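An example that converts GBK to UTF-8

The code is as follows:
<?php
// The literal stands in for GBK-encoded text; pure ASCII is unchanged by the conversion.
header("Content-Type: text/html; charset=utf-8");
echo mb_convert_encoding("You Are my Friend", "UTF-8", "GBK");
?>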


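Another example, from GB2312 to Big5

The code is as follows:
<?php
// The literal stands in for GB2312-encoded text; pure ASCII is unchanged by the conversion.
header("Content-Type: text/html; charset=big5");
echo mb_convert_encoding("You Are my Friend", "Big5", "GB2312");
?>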

Note that the function above is only available when the mbstring extension is installed and enabled.
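Since the function depends on mbstring, a script can check for the extension before calling it. The following is a minimal sketch; the helper name mbstring_available() is hypothetical and only serves the illustration.

```php
<?php
// Hypothetical helper: reports whether mb_convert_encoding() can be called.
function mbstring_available(): bool
{
    return extension_loaded('mbstring') && function_exists('mb_convert_encoding');
}

if (!mbstring_available()) {
    // Typically the extension line in php.ini is still commented out.
    error_log('mbstring is not enabled; mb_convert_encoding() is unavailable');
}
```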

PHP has another function, iconv, which also converts string encodings and works much like the function above.

Here is a short summary of the two functions:
iconv: convert string to requested character encoding
(PHP 4 >= 4.0.5, PHP 5)
mb_convert_encoding: convert character encoding
(PHP 4 >= 4.0.6, PHP 5)

Usage:
string mb_convert_encoding(string str, string to_encoding [, mixed from_encoding])
This requires the mbstring extension: in php.ini, remove the leading ; from the line ;extension=php_mbstring.dll
mb_convert_encoding accepts several candidate input encodings and picks one automatically based on the content, but it runs much slower than iconv.
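The automatic identification described above can be sketched as follows. The helper name to_utf8() and the candidate list are assumptions for illustration; the sample bytes are written as hex escapes so the result does not depend on the encoding of the source file. Detection is heuristic, so short or ambiguous input may be identified incorrectly.

```php
<?php
// Hypothetical helper: convert text of uncertain origin to UTF-8.
// The candidates are tried in the given order; this list is only an example.
function to_utf8(string $bytes, array $candidates = ['UTF-8', 'GBK']): string
{
    $converted = mb_convert_encoding($bytes, 'UTF-8', $candidates);
    if ($converted === false) {
        throw new RuntimeException('Conversion to UTF-8 failed');
    }
    return $converted;
}

$gbk  = "\xC4\xE3\xBA\xC3";      // "你好" encoded as GBK (not valid UTF-8)
$utf8 = to_utf8($gbk);

echo bin2hex($utf8), "\n";                      // hex dump of the converted bytes
var_dump(mb_check_encoding($utf8, 'UTF-8'));    // whether the result is valid UTF-8
```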


string iconv(string in_charset, string out_charset, string str)
Note: besides naming the target encoding, the second parameter can carry one of two suffixes, //TRANSLIT and //IGNORE. With //TRANSLIT, a character that cannot be converted directly is replaced by one or more similar-looking characters; with //IGNORE, characters that cannot be converted are dropped. Without either suffix, the string is cut off at the first illegal character (newer PHP versions raise an E_NOTICE and return FALSE instead).
Returns the converted string, or FALSE on failure.
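A small sketch of the suffixes and the FALSE return value follows. The helper try_iconv() is hypothetical, and the sample string and the ASCII target are assumptions chosen so that some characters cannot be converted. The actual output of //TRANSLIT and //IGNORE depends on the iconv implementation PHP was built against, so no expected output is shown.

```php
<?php
// Hypothetical helper: wraps iconv() and turns its FALSE result into null.
function try_iconv(string $from, string $to, string $text): ?string
{
    // @ suppresses the notice iconv() raises for characters it cannot convert.
    $result = @iconv($from, $to, $text);
    return $result === false ? null : $result;
}

$utf8 = "caf\u{00E9} \u{20AC}5";   // contains two characters that ASCII cannot represent

foreach (['ASCII', 'ASCII//TRANSLIT', 'ASCII//IGNORE'] as $target) {
    $converted = try_iconv('UTF-8', $target, $utf8);
    echo $target, ': ', $converted === null ? '(conversion failed)' : $converted, "\n";
}
```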


Usage notes:

iconv was found to fail when converting the character "-" to GB2312: without the //IGNORE suffix, everything after that character is lost. Either way, the "-" itself cannot be converted and cannot be output. mb_convert_encoding does not have this bug.

In general, use iconv. Fall back to mb_convert_encoding only when you cannot determine the source encoding, or when the result of the iconv conversion does not display properly.
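The rule of thumb above might be sketched like this. The helper name convert_encoding() is hypothetical, and treating a FALSE return from iconv as the trigger for the fallback is an assumption about what "fails" means here.

```php
<?php
// Hypothetical helper: try iconv() first and fall back to mb_convert_encoding()
// only when iconv() is unavailable or reports failure.
function convert_encoding(string $text, string $to, string $from): string
{
    if (function_exists('iconv')) {
        $converted = @iconv($from, $to, $text);   // FALSE on failure
        if ($converted !== false) {
            return $converted;
        }
    }
    $converted = mb_convert_encoding($text, $to, $from);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $from to $to");
    }
    return $converted;
}

$gbk = "\xC4\xE3\xBA\xC3";        // "你好" encoded as GBK
echo bin2hex(convert_encoding($gbk, 'UTF-8', 'GBK')), "\n";
```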

from_encoding names the character encoding before conversion. It can be an array or a comma-separated string. If it is not specified, the internal encoding is used.
/* Auto detect encoding from JIS, eucjp-win, sjis-win, then convert str to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");
/* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");

Example:

The code is as follows:
// Use one of the two calls, not both: converting twice would corrupt the text.
$content = iconv("GBK", "UTF-8", $content);
// $content = mb_convert_encoding($content, "UTF-8", "GBK");


A small pitfall when transcoding with mb_convert_encoding in PHP
Most PHP developers are familiar with mb_convert_encoding() and use it heavily, and in general it works well. In one project, however, we used it to convert UTF-8 to GBK and ran into a minor problem with certain special characters. Specifically, mbstring converts characters that exist in UTF-8 but not in GBK to \x00\x80, which corrupts the converted GBK string.
We would expect a converter that meets a character the target encoding cannot represent to discard that character: some data is lost, but the converted byte sequence remains usable. It is unclear why mbstring uses the behavior above instead of discarding.
A temporary workaround is either to filter the string after transcoding and remove every \x00\x80 sequence, or to filter the UTF-8 string before transcoding and remove every character that UTF-8 can represent but GBK cannot. The first approach is easier to implement.
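The first workaround can be sketched in a few lines. The helper names below are hypothetical. The second function is not from the article: it assumes the mbstring setting mb_substitute_character('none'), which asks mbstring to drop unrepresentable characters during conversion, and may be a simpler route on PHP versions that support it.

```php
<?php
// First workaround from the text: remove the \x00\x80 byte pairs after transcoding.
function strip_unmapped(string $gbk): string
{
    return str_replace("\x00\x80", '', $gbk);
}

// Alternative (an assumption, not the article's method): have mbstring discard
// characters that GBK cannot represent instead of substituting something for them.
function utf8_to_gbk_dropping(string $utf8): string
{
    $previous = mb_substitute_character();   // remember the current setting
    mb_substitute_character('none');         // emit nothing for unmapped characters
    $gbk = mb_convert_encoding($utf8, 'GBK', 'UTF-8');
    mb_substitute_character($previous);      // restore the previous setting
    return $gbk;
}

echo bin2hex(strip_unmapped("ab\x00\x80cd")), "\n";
echo bin2hex(utf8_to_gbk_dropping("a\u{1F600}b")), "\n";
```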
