Using the PHP encoding conversion functions mb_convert_encoding and iconv

Last Update:2016-06-08 Source: Internet

Author: User

The mb_convert_encoding function converts a string from one character encoding to another.

English text generally has no encoding problems; the issue shows up with Chinese data. For example, if you write a program in Zend Studio or EditPlus using GBK encoding, and the data has to go into a database whose encoding is UTF-8, the data must be converted first. Otherwise it will be stored as garbled text.

The official documentation for mb_convert_encoding is here:
http://cn.php.net/manual/zh/function.mb-convert-encoding.php

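Converting GBK to UTF-8:

The code is as follows:
<?php
// Tell the browser that the output is UTF-8
header("Content-Type: text/html; charset=utf-8");
// The string literal is assumed to be stored as GBK in this source file
echo mb_convert_encoding("You Are my Friend", "UTF-8", "GBK");
?>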
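The example above only changes bytes when the input contains non-ASCII characters. As a minimal sketch, assuming the mbstring extension is enabled, the snippet below feeds in the GBK bytes for "你好" as hex escapes (so the result does not depend on how the source file is saved) and prints the bytes before and after conversion. The variable names are illustrative only.

```php
<?php
// "你好" as raw GBK bytes, written as hex escapes
$gbk = "\xC4\xE3\xBA\xC3";

// Convert from GBK to UTF-8
$utf8 = mb_convert_encoding($gbk, "UTF-8", "GBK");

echo bin2hex($gbk), "\n";   // c4e3bac3 (2 bytes per character in GBK)
echo bin2hex($utf8), "\n";  // e4bda0e5a5bd (3 bytes per character in UTF-8)
```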

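Another example, converting GB2312 to Big5:

The code is as follows:
<?php
// Tell the browser that the output is Big5
header("Content-Type: text/html; charset=big5");
// The string literal is assumed to be stored as GB2312 in this source file
echo mb_convert_encoding("You Are my Friend", "Big5", "GB2312");
?>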

Note that mb_convert_encoding is only available when the mbstring extension is enabled.
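A script can check for the extension before calling any mb_* function. The sketch below uses the standard extension_loaded() and function_exists() functions; the helper name mbstring_available is hypothetical.

```php
<?php
// Hypothetical helper: true when the mbstring conversion function can be called.
function mbstring_available(): bool
{
    return extension_loaded('mbstring') && function_exists('mb_convert_encoding');
}

if (!mbstring_available()) {
    echo "mbstring is not enabled; enable it in php.ini\n";
}
```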

PHP has another function, iconv, that also converts string encodings and works much like the function above.

Here are the details of both functions:
iconv - Convert string to requested character encoding
(PHP 4 >= 4.0.5, PHP 5)
mb_convert_encoding - Convert character encoding
(PHP 4 >= 4.0.6, PHP 5)

Usage:
string mb_convert_encoding ( string str, string to_encoding [, mixed from_encoding] )
This requires the mbstring extension. In php.ini, remove the leading ; from the line ;extension=php_mbstring.dll
mb_convert_encoding accepts several candidate input encodings and picks one automatically based on the content, but it runs much slower than iconv.

string iconv ( string in_charset, string out_charset, string str )
Note: besides naming the target encoding, the second parameter can carry one of two suffixes, //TRANSLIT and //IGNORE. With //TRANSLIT, characters that cannot be converted directly are replaced by one or more approximate characters. With //IGNORE, characters that cannot be converted are dropped. Without either suffix, the string is cut off at the first illegal character.
Returns the converted string, or FALSE on failure.
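The suffixes can be compared side by side with a small sketch. It assumes the iconv extension is available; the helper try_iconv is hypothetical, and the results for //TRANSLIT and //IGNORE depend on the iconv library PHP was built against (some builds reject the suffixes entirely), so no expected output is shown for them.

```php
<?php
// Hypothetical helper: run iconv() and report failure as null instead of false.
function try_iconv(string $from, string $to, string $text): ?string
{
    // @ hides the notice or warning iconv() raises when conversion fails
    $result = @iconv($from, $to, $text);
    return $result === false ? null : $result;
}

// "café €5" as UTF-8 bytes; neither "é" nor "€" exists in ASCII
$utf8 = "caf\xC3\xA9 \xE2\x82\xAC5";

// No suffix: conversion fails at the first unconvertible character
var_dump(try_iconv("UTF-8", "ASCII", $utf8));

// //TRANSLIT: unconvertible characters are approximated where possible
var_dump(try_iconv("UTF-8", "ASCII//TRANSLIT", $utf8));

// //IGNORE: unconvertible characters are dropped
var_dump(try_iconv("UTF-8", "ASCII//IGNORE", $utf8));
```

Checking for a null (or FALSE) result before storing the string avoids silently saving truncated data.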

Usage notes:

iconv was found to fail when converting the character "-" to gb2312. Without the //IGNORE suffix, everything after that character is lost and cannot be saved. Either way, the "-" itself cannot be converted or output. mb_convert_encoding does not have this bug.

In general, use iconv. Fall back to mb_convert_encoding only when you cannot tell what the original encoding is, or when the output of iconv does not display properly.
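That recommendation can be sketched as a wrapper that tries iconv first and falls back to mb_convert_encoding when iconv is missing or fails. The function name convert_encoding is hypothetical, and the sketch assumes the mbstring extension is enabled.

```php
<?php
// Hypothetical helper: prefer iconv, fall back to mbstring.
function convert_encoding(string $text, string $to, string $from): string
{
    if (function_exists('iconv')) {
        // @ hides the notice iconv() raises on an unconvertible character
        $converted = @iconv($from, $to, $text);
        if ($converted !== false) {
            return $converted;
        }
    }

    // Fallback path: mbstring conversion
    $converted = mb_convert_encoding($text, $to, $from);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $from to $to");
    }
    return $converted;
}

// "你好" in GBK, converted to UTF-8
echo bin2hex(convert_encoding("\xC4\xE3\xBA\xC3", "UTF-8", "GBK")), "\n";
```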

from_encoding names the character encoding of the string before conversion. It can be an array or a comma-separated string of candidate encodings. If it is not specified, the internal encoding is used.
/* Auto detect encoding from JIS, eucjp-win, sjis-win, then convert str to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");
/* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");
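The array form of from_encoding works the same way as the comma-separated form. As a minimal sketch, assuming mbstring is enabled, the snippet below passes UTF-8 and GBK as candidates for a byte string that is valid GBK but not valid UTF-8. The variable names are illustrative.

```php
<?php
// "你好" as GBK bytes; this byte sequence is not valid UTF-8
$input = "\xC4\xE3\xBA\xC3";

// Candidate source encodings, given as an array
$candidates = ["UTF-8", "GBK"];

// mbstring picks a source encoding from the candidates, then converts
$utf8 = mb_convert_encoding($input, "UTF-8", $candidates);

echo bin2hex($utf8), "\n";
```

Detection is a guess based on the bytes, so short or ambiguous input can be misidentified. When the source encoding is known, passing it as a single name is the safer choice.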

Example: either call converts $content from GBK to UTF-8.

The code is as follows:
$content = iconv("GBK", "UTF-8", $content);
$content = mb_convert_encoding($content, "UTF-8", "GBK");

A small trap when transcoding with mb_convert_encoding in PHP
Most PHP developers know mb_convert_encoding() well and use it heavily, and in general it does the job. But in one project we needed it for UTF-8 to GBK conversion, and we found a minor problem with some special characters. Specifically, mbstring converts characters that exist in UTF-8 but not in GBK to \x00\x80, which corrupts the converted GBK string.
Our expectation was that when a character cannot be represented in the target encoding, the transcoder should discard it: some data is lost, but the resulting byte sequence stays usable. It is unclear why mbstring uses the approach above instead of discarding.
A temporary workaround is either to filter the string after transcoding, removing every \x00\x80 sequence, or to filter the UTF-8 string before transcoding, removing every character that UTF-8 can represent but GBK cannot. The first method is easier to implement.
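The sketch below combines the article's first workaround (the post-filter) with a different technique that is not from the article: mb_substitute_character("none"), which asks mbstring to drop unconvertible characters during conversion. The helper name utf8_to_gbk_lossy is hypothetical, and the sketch assumes mbstring is enabled and PHP 7 or later.

```php
<?php
// Hypothetical helper: convert UTF-8 to GBK, dropping characters GBK cannot represent.
function utf8_to_gbk_lossy(string $utf8): string
{
    $previous = mb_substitute_character();    // remember the current setting
    mb_substitute_character("none");          // drop unconvertible characters
    try {
        $gbk = mb_convert_encoding($utf8, "GBK", "UTF-8");
    } finally {
        mb_substitute_character($previous);   // restore the caller's setting
    }

    // Post-filter from the article: remove any \x00\x80 sequence that remains
    return str_replace("\x00\x80", "", $gbk);
}

// U+1F600 (an emoji) has no GBK representation
$gbk = utf8_to_gbk_lossy("a\u{1F600}b");
echo bin2hex($gbk), "\n";
```

The str_replace step mirrors the article's workaround and may well be a no-op on current PHP versions. It also removes a genuine NUL byte followed by 0x80, so it suits text data only.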
