How to use the iconv functions in PHP. The iconv function library converts text between character sets and is an indispensable basic library in PHP programming.
1. Download the libiconv library: http://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.9.2.tar.gz
2. Decompress it: tar -zxvf libiconv-1.9.2.tar.gz
3. Install libiconv:
# ./configure --prefix=/usr/local/iconv
# make
# make install
4. Recompile PHP, adding the configure parameter --with-iconv=/usr/local/iconv
Recently I was working on a page-scraping program and needed iconv to convert captured UTF-8 pages to GB2312. I found that whenever I converted the captured data with iconv, part of the data went missing for no apparent reason. I was stuck on this for a while, then searched online and learned that it is a known problem with iconv: it hits an error when converting the dash character "-" to GB2312 (apparently the full-width dash U+2014 rather than the ASCII hyphen, which converts without trouble).
The solution is simple: append "//IGNORE" to the target encoding, that is, to the second parameter of the iconv function.
The code is as follows:
iconv("UTF-8", "GB2312//IGNORE", $data)
//IGNORE tells iconv to skip characters that fail to convert. Without it, everything in the string after the offending character is lost.
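A minimal sketch of the //IGNORE approach, assuming the input is UTF-8. The helper name utf8_to_gb2312 is hypothetical and not part of PHP. Some iconv builds reportedly still return FALSE (and raise a notice) when characters had to be dropped, so the sketch checks the return value rather than assuming success.

```php
<?php
// Hypothetical helper (not a PHP built-in): convert UTF-8 text to GB2312,
// asking iconv to skip characters that GB2312 cannot represent.
function utf8_to_gb2312(string $data)
{
    // "//IGNORE" is appended to the target encoding (the second argument).
    // The @ operator hides the notice iconv raises on unconvertible input.
    return @iconv('UTF-8', 'GB2312//IGNORE', $data);
}

// UTF-8 bytes of two common Chinese characters, written as escapes so the
// example does not depend on how this file itself is saved.
$sample = "\xE4\xB8\xAD\xE6\x96\x87";

$gb = utf8_to_gb2312($sample);
if ($gb === false) {
    echo "conversion failed\n";
} else {
    echo strlen($gb), " bytes after conversion\n";
}
```

Checking for FALSE matters because a failed conversion would otherwise be treated as an empty string further down the pipeline.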
The code is as follows:
<?php
echo $str = "Hi, it's coffee sale!";
echo '<br />';
echo iconv('gb2312', 'utf-8', $str); // convert the string from GB2312 to UTF-8
echo '<br />';
echo iconv_substr($str, 1, 1, 'utf-8'); // extract by number of characters rather than bytes
print_r(iconv_get_encoding()); // show the current iconv encoding settings
echo iconv_strlen($str, 'utf-8'); // string length in characters for the given encoding
// This form also works:
$content = iconv("UTF-8", "gbk//TRANSLIT", $content);
?>
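The difference between counting bytes and counting characters is easiest to see on a multibyte string. The sketch below assumes UTF-8 input and writes the sample as byte escapes; the variable names are illustrative only.

```php
<?php
// UTF-8 bytes for two Chinese characters followed by "abc".
$s = "\xE4\xB8\xAD\xE6\x96\x87abc";

$byteLength = strlen($s);                      // counts bytes
$charLength = iconv_strlen($s, 'UTF-8');       // counts characters
$second     = iconv_substr($s, 1, 1, 'UTF-8'); // one character starting at offset 1

echo $byteLength, "\n";      // 9
echo $charLength, "\n";      // 5
echo bin2hex($second), "\n"; // e69687
```

Each of the two Chinese characters takes three bytes in UTF-8, which is why strlen reports 9 while iconv_strlen reports 5.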
In older PHP builds such as the PHP 4 setup described here, iconv is neither a built-in function nor a module installed by default. It must be installed before use.
For Windows + PHP, edit the php.ini file, remove the ";" before extension=php_iconv.dll, and copy iconv.dll from your PHP installation directory to winnt/system32 (if your DLL path points to that directory).
On Linux with a static build, add the --with-iconv option to configure. Afterwards phpinfo() shows the iconv entries. (Linux7.3 + Apache4.06 + php4.3.2)
Download: ftp://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.8.tar.gz
Installation:
# cp libiconv-1.8.tar.gz /usr/local/src
# cd /usr/local/src
# tar zxvf lib*
# cd libiconv-1.8
# ./configure --prefix=/usr/local/libiconv
# make
# make install
Compile PHP:
# ./configure --prefix=/usr/local/php4.3.2 --with-iconv=/usr/local/libiconv/
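After installing, it can help to confirm from PHP itself that the extension is actually loaded. The following is a small sketch of such a check; the message text is illustrative, while ICONV_IMPL and ICONV_VERSION are constants provided by the iconv extension.

```php
<?php
// Check that the iconv extension is available before relying on it.
$hasIconv = extension_loaded('iconv') && function_exists('iconv');

if ($hasIconv) {
    // These constants report which iconv implementation PHP was built against.
    echo 'iconv implementation: ', ICONV_IMPL, "\n";
    echo 'iconv version: ', ICONV_VERSION, "\n";
} else {
    echo "iconv is not available; rebuild PHP with --with-iconv or enable the extension\n";
}
```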
Simple example:
<?php
echo iconv("gb2312", "ISO-8859-1", "we");
?>
Introduction to the mb_convert_encoding and iconv functions in PHP
mb_convert_encoding converts a string from one encoding to another. I never really understood the concept of program encoding before, but now it is starting to make sense.
English text generally does not run into encoding problems; it is mainly Chinese data that does. For example, suppose you write a program in Zend Studio or Editplus using GBK encoding. If the data is then imported into a database that is encoded as utf8, you must convert the encoding first, otherwise the data in the database will be garbled.
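The GBK-to-utf8 scenario above can be sketched as a small guard that converts text only when it is not already valid UTF-8. The helper name ensure_utf8 is hypothetical, the sketch assumes the mbstring extension is enabled, and it assumes that anything that is not valid UTF-8 was written as GBK, which will not hold for every data source.

```php
<?php
// Hypothetical helper (not a PHP built-in): make sure text is UTF-8
// before it is stored. Assumes non-UTF-8 input is GBK.
function ensure_utf8(string $text): string
{
    if (!function_exists('mb_check_encoding')) {
        return $text; // mbstring missing: leave the text untouched
    }
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, so do not convert twice
    }
    return mb_convert_encoding($text, 'UTF-8', 'GBK');
}

// GBK bytes, as a string literal in a GBK-saved source file would contain.
$fromEditor = "\xD6\xD0\xCE\xC4";
$forDatabase = ensure_utf8($fromEditor);
echo bin2hex($forDatabase), "\n";
```

Checking first avoids converting text twice, which is a common way to end up with garbled output.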
For the usage of mb_convert_encoding, refer to the official website:
http://cn.php.net/manual/zh/function.mb-convert-encoding.php
An example converting GBK to UTF-8:
<?php
header("Content-Type: text/html; charset=utf-8");
echo mb_convert_encoding("my friends", "UTF-8", "GBK");
?>
Another example, converting GB2312 to Big5:
<?php
header("Content-Type: text/html; charset=big5");
echo mb_convert_encoding("You are my friend", "big5", "GB2312");
?>
However, to use the function above you must first install and enable the mbstring extension.
Another PHP function, iconv, also converts string encodings and works much like the function above.
The manual describes the two functions as follows:
iconv - Convert string to requested character encoding
(PHP 4 >= 4.0.5, PHP 5)
mb_convert_encoding - Convert character encoding
(PHP 4 >= 4.0.6, PHP 5)
Usage:
string mb_convert_encoding(string str, string to_encoding [, mixed from_encoding])
Enable the mbstring extension first: in php.ini, remove the ";" before extension=php_mbstring.dll.
mb_convert_encoding accepts several candidate input encodings and picks one automatically based on the content, but it runs much more slowly than iconv.
string iconv(string in_charset, string out_charset, string str)
Note: besides naming the target encoding, the second parameter can carry one of two suffixes: //TRANSLIT and //IGNORE. //TRANSLIT automatically replaces a character that cannot be converted directly with one or more similar-looking characters. //IGNORE skips characters that cannot be converted. With neither suffix, the result is truncated at the first illegal character.
Returns the converted string or FALSE on failure.
Usage:
iconv hits an error when converting the dash character "-" to GB2312. Without the //IGNORE suffix, everything in the string after this character is lost. Either way, the "-" itself cannot be converted or output. mb_convert_encoding does not have this problem.
In general, use iconv. Use mb_convert_encoding only when the source encoding cannot be determined, or when the iconv output does not display correctly.
from_encoding specifies the character encoding of the string before conversion. It can be an array or a comma-separated string. If it is not specified, the internal encoding is used.
/* Auto detect encoding from JIS, eucjp-win, sjis-win, then convert str to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");
/* "auto" is expanded to "ASCII, JIS, UTF-8, EUC-JP, SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");
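To illustrate the two forms of from_encoding, here is a short sketch that passes the same candidate list once as an array and once as a comma-separated string. The choice of UTF-8 and GBK as candidates is an assumption for the example; the sample bytes are deliberately not valid UTF-8, so detection should settle on GBK.

```php
<?php
// GBK bytes for two Chinese characters; this byte sequence is not valid UTF-8.
$input = "\xD6\xD0\xCE\xC4";

if (function_exists('mb_convert_encoding')) {
    // from_encoding given as an array of candidate encodings.
    $fromArray = mb_convert_encoding($input, 'UTF-8', array('UTF-8', 'GBK'));
    // The same candidates given as a comma-separated string.
    $fromList = mb_convert_encoding($input, 'UTF-8', 'UTF-8, GBK');

    echo bin2hex($fromArray), "\n";
    echo bin2hex($fromList), "\n";
} else {
    echo "mbstring is not enabled\n";
}
```

When the input is valid in more than one candidate encoding, the detected encoding may not be the one you expect, so an explicit single source encoding is safer whenever it is known.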
Example:
$content = iconv("GBK", "UTF-8", $content);
$content = mb_convert_encoding($content, "UTF-8", "GBK");
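Note that the two calls take their arguments in different orders, which is an easy source of mistakes. The sketch below runs both on the same GBK bytes and compares the results; it guards each call because either extension may be missing, and the variable names are illustrative.

```php
<?php
$gbk = "\xD6\xD0\xCE\xC4"; // GBK bytes for two Chinese characters

// iconv: (source encoding, target encoding, string)
$viaIconv = function_exists('iconv')
    ? @iconv('GBK', 'UTF-8', $gbk)
    : false;

// mb_convert_encoding: (string, target encoding, source encoding)
$viaMb = function_exists('mb_convert_encoding')
    ? mb_convert_encoding($gbk, 'UTF-8', 'GBK')
    : false;

if ($viaIconv !== false && $viaMb !== false) {
    echo $viaIconv === $viaMb ? "same result\n" : "results differ\n";
} else {
    echo "one of the converters is unavailable or failed\n";
}
```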
An easily overlooked parameter of the iconv function in PHP
While processing scraped content today, I used iconv for encoding conversion and found that the result was cut off partway through. I guessed it was a character set problem: how could I skip characters that do not exist in the target character set? The manual shows that iconv takes only three parameters, and none of them seemed to do this. Someone online said it was possible, but did not explain how. Finally I found in the English description that a marker can be appended to the target encoding: "TRANSLIT". To my disappointment, simply appending it did not work. It turned out that "//" has to come first, which was frustrating to discover.
Prototype: $txtContent = iconv("UTF-8", 'gbk', $txtContent);
Special parameter: iconv("UTF-8", "GB2312//IGNORE", $data)
Two optional suffixes: TRANSLIT and IGNORE.
Description
string iconv(string in_charset, string out_charset, string str)
Performs a character set conversion on the string str from in_charset to out_charset. Returns the converted string or FALSE on failure.
If you append the string //TRANSLIT to out_charset, transliteration is activated. This means that when a character can't be represented in the target charset, it can be approximated through one or several similarly looking characters. If you append the string //IGNORE, characters that cannot be represented in the target charset are silently discarded. Otherwise, str is cut from the first illegal character.
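The three modes described above can be compared side by side. This is a sketch only: the sample text and variable names are assumptions, and the actual output depends on the iconv implementation PHP was built against, so no expected output is shown.

```php
<?php
// UTF-8 text containing U+2014, a dash that the GB2312 converter may reject.
$text = "A\xE2\x80\x94B";

$results = array();

// Try the bare target encoding and both suffixes. Note the leading "//".
foreach (array('', '//TRANSLIT', '//IGNORE') as $suffix) {
    $label = $suffix === '' ? '(no suffix)' : $suffix;

    // @ hides the notice raised when a character cannot be converted.
    $result = @iconv('UTF-8', 'GB2312' . $suffix, $text);
    $results[$label] = $result;

    if ($result === false) {
        echo $label, ": conversion failed\n";
    } else {
        echo $label, ': ', bin2hex($result), "\n";
    }
}
```

Printing the result as hex makes it easy to see exactly which bytes survived in each mode, regardless of the terminal's own encoding.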