Using the encoding conversion functions mb_convert_encoding and iconv in PHP

Source: Internet
Author: User

English text rarely runs into encoding problems; they mostly show up with Chinese and other multibyte data. For example, a program written in Zend Studio or EditPlus may be saved in GBK. If its data then has to be imported into a database encoded as utf8, the data must be converted first. Otherwise it will be garbled in the database.
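
As a rough illustration of why the conversion matters, the sketch below checks whether the GBK bytes of a two-character Chinese string are valid UTF-8. The byte values and variable names are chosen for this example only, and the mbstring extension is assumed to be available.

```php
<?php
// "你好" encoded as GBK: four bytes, two per character.
$gbk = "\xC4\xE3\xBA\xC3";

// The same bytes are not a valid UTF-8 sequence, so a UTF-8 database or
// browser would show them as garbage.
$validAsUtf8 = mb_check_encoding($gbk, "UTF-8");

// After conversion the text is valid UTF-8 (six bytes, three per character).
$utf8 = mb_convert_encoding($gbk, "UTF-8", "GBK");
$validAfter = mb_check_encoding($utf8, "UTF-8");
```

Writing the bytes as hex escapes keeps the example independent of the encoding in which the script file itself is saved.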

For the usage of mb_convert_encoding, refer to the official manual:
http://cn.php.net/manual/zh/function.mb-convert-encoding.php

Convert GBK to UTF-8. The code is as follows:
<?php
// This file must be saved as GBK so that the string literal holds GBK bytes.
header("Content-Type: text/html; charset=utf-8");
echo mb_convert_encoding("my friends", "UTF-8", "GBK");
?>

Another example, GB2312 to Big5. The code is as follows:
<?php
// This file must be saved as GB2312 so that the string literal holds GB2312 bytes.
header("Content-Type: text/html; charset=big5");
echo mb_convert_encoding("you are my friend", "big5", "gb2312");
?>

To use the function above, the mbstring extension must be installed and enabled first.
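
A script can check for the extension before calling these functions. The following is a minimal sketch; missing_extensions is a hypothetical helper name, not part of PHP, and it relies only on the built-in extension_loaded().

```php
<?php
// Return the names from $required that are not loaded in this PHP build.
// missing_extensions is a hypothetical helper for this example.
function missing_extensions(array $required): array
{
    $missing = [];
    foreach ($required as $name) {
        if (!extension_loaded($name)) {
            $missing[] = $name;
        }
    }
    return $missing;
}

$missing = missing_extensions(["mbstring", "iconv"]);
if ($missing !== []) {
    echo "Missing extensions: " . implode(", ", $missing) . "\n";
}
```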

PHP has another function, iconv, that also converts string encodings and works much like the function above.

The manual summarizes the two functions as follows:
iconv: convert string to requested character encoding
(PHP 4 >= 4.0.5, PHP 5)
mb_convert_encoding: convert character encoding
(PHP 4 >= 4.0.6, PHP 5)

Usage:
string mb_convert_encoding(string str, string to_encoding [, mixed from_encoding])
Enable the mbstring extension first: in php.ini, remove the semicolon (;) in front of extension=php_mbstring.dll.
mb_convert_encoding accepts several candidate input encodings and picks one automatically based on the content, but according to the author it runs much slower than iconv.

string iconv(string in_charset, string out_charset, string str)
Note: besides naming the target encoding, out_charset can carry one of two suffixes, //TRANSLIT and //IGNORE. //TRANSLIT replaces a character that cannot be converted directly with one or more similar-looking characters. //IGNORE skips characters that cannot be converted. Without either suffix, the result is truncated at the first invalid character (newer PHP versions instead raise an E_NOTICE and return false).
Returns the converted string, or false on failure.
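
The sketch below shows where the suffixes go and how the false return value can be checked. try_iconv is a hypothetical helper name introduced for this example. What //TRANSLIT and //IGNORE actually produce depends on the iconv library PHP was built against, so no output is shown for those two calls.

```php
<?php
// Convert with iconv and report failure as null instead of passing false
// on to the caller. try_iconv is a hypothetical helper for this example.
function try_iconv(string $from, string $to, string $text): ?string
{
    // @ suppresses the notice that iconv raises on unconvertible input.
    $result = @iconv($from, $to, $text);
    return $result === false ? null : $result;
}

$utf8 = "caf\xC3\xA9";   // "café" in UTF-8

// Every character exists in ISO-8859-1, so a plain conversion works.
$latin1 = try_iconv("UTF-8", "ISO-8859-1", $utf8);

// The suffixes are appended to the target charset name.
$approx  = try_iconv("UTF-8", "ASCII//TRANSLIT", $utf8);
$dropped = try_iconv("UTF-8", "ASCII//IGNORE", $utf8);
```

Comparing against false with === matters here, because an empty string is a legitimate conversion result.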

Usage notes:

iconv has been found to fail when converting the character "-" (presumably a non-ASCII dash in the original text) to gb2312. Without the //IGNORE suffix, everything after this character is lost. Either way, the "-" itself cannot be converted or output. mb_convert_encoding does not have this bug.

In general, use iconv. Fall back to mb_convert_encoding only when the source encoding cannot be determined, or when the iconv result does not display correctly.
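
One way to express that rule of thumb in code is a small wrapper that tries iconv first and falls back to mb_convert_encoding. This is only a sketch under the assumption that both extensions are loaded; to_utf8 is a hypothetical name.

```php
<?php
// Try iconv first; fall back to mb_convert_encoding if iconv fails.
// to_utf8 is a hypothetical helper for this example.
function to_utf8(string $text, string $from): string
{
    // @ suppresses the notice iconv raises when it cannot convert the input.
    $converted = @iconv($from, "UTF-8", $text);
    if ($converted === false) {
        $converted = mb_convert_encoding($text, "UTF-8", $from);
    }
    return $converted;
}

$gbk = "\xC4\xE3\xBA\xC3";   // "你好" in GBK
$utf8 = to_utf8($gbk, "GBK");
```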

from_encoding names the character encoding before conversion. It can be an array or a comma-separated string listing several encodings. If it is not specified, the internal encoding is used.
/* Auto-detect the encoding from JIS, eucjp-win, sjis-win, then convert $str to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");
/* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");
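
For Chinese data the same idea can be sketched with an explicit detection step. The example below assumes the input is either UTF-8 or GBK; detect_and_convert is a hypothetical helper name, and mb_detect_encoding is called in strict mode so that candidates for which the bytes are invalid are rejected.

```php
<?php
// Detect the source encoding from a candidate list, then convert to UTF-8.
// Returns null when none of the candidates fits the input.
// detect_and_convert is a hypothetical helper for this example.
function detect_and_convert(string $text, array $candidates): ?string
{
    // Strict mode rejects candidates for which the bytes are invalid.
    $detected = mb_detect_encoding($text, $candidates, true);
    if ($detected === false) {
        return null;
    }
    return mb_convert_encoding($text, "UTF-8", $detected);
}

$gbk = "\xC4\xE3\xBA\xC3";   // "你好" in GBK; these bytes are invalid UTF-8
$utf8 = detect_and_convert($gbk, ["UTF-8", "GBK"]);
```

Detection is a guess: many byte sequences are valid in more than one encoding, so it is safer to pass the real source encoding whenever it is known.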

Example. The code is as follows:
$content = iconv("GBK", "UTF-8", $content);
// or, as an alternative to the line above (do not run both):
$content = mb_convert_encoding($content, "UTF-8", "GBK");

Small traps in using mb_convert_encoding for transcoding in PHP
Anyone who works with character encodings in PHP knows mb_convert_encoding(), and it is widely used. In general it works well. In one project, however, we had to use it to convert UTF-8 to GBK, and it caused real trouble with some special characters. Specifically, mb_convert_encoding turns characters that exist in UTF-8 but have no encoding in GBK into the bytes 0x00 0x80, which leaves the converted GBK string corrupted.
Our expectation was that when the target encoding cannot represent a character, the transcoder should discard that character. Some data is lost, but the converted character sequence stays usable. It is unclear why mb_convert_encoding behaves as described instead of discarding.
A temporary workaround is to filter the string after transcoding and strip every 0x00 0x80 byte pair, or to filter the UTF-8 string before transcoding and strip every character that UTF-8 can express but GBK cannot. The first filter is the easier one to implement.
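
A different route from the filters described above, offered here only as a sketch: mbstring has a substitute-character setting, mb_substitute_character(), and the value "none" makes the converter emit nothing for characters the target encoding cannot represent. utf8_to_gbk_dropping is a hypothetical helper name, and the emoji is just a convenient example of a character with no GBK encoding.

```php
<?php
// Convert UTF-8 to GBK, dropping characters GBK cannot represent.
// utf8_to_gbk_dropping is a hypothetical helper for this example.
function utf8_to_gbk_dropping(string $utf8): string
{
    $previous = mb_substitute_character();   // remember the current setting
    mb_substitute_character("none");         // emit nothing for unmappable characters
    $gbk = mb_convert_encoding($utf8, "GBK", "UTF-8");
    mb_substitute_character($previous);      // restore the setting
    return $gbk;
}

// U+1F600 (an emoji) has no GBK encoding, so only "a" and "b" survive.
$result = utf8_to_gbk_dropping("a\xF0\x9F\x98\x80b");
```

Restoring the previous setting keeps the change local, since the substitute character is global state shared by all later mbstring conversions in the request.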
