This is what I used at first:
$str = iconv('utf-8', 'gb2312', unescape(isset($_GET['str']) ? $_GET['str'] : ''));
After it went live, a lot of errors like this were reported: iconv(): Detected an illegal character in input string
Since the GB2312 character set is relatively small, I switched to a larger one, GBK:
$str = iconv('utf-8', 'gbk', unescape(isset($_GET['str']) ? $_GET['str'] : ''));
The same error was reported after it went live!
I read the manual carefully and found this paragraph:
If you append the string //TRANSLIT to out_charset transliteration is activated. This means that when a character can't be represented in the target charset, it can be approximated through one or several similarly looking characters. If you append the string //IGNORE, characters that cannot be represented in the target charset are silently discarded. Otherwise, str is cut from the first illegal character.
So I changed it to:
$str = iconv('utf-8', 'gbk//IGNORE', unescape(isset($_GET['str']) ? $_GET['str'] : ''));
In local testing, //IGNORE skips the characters it cannot convert and converts the rest without an error. //TRANSLIT cuts the string off at the first character it cannot convert, drops everything after it, and reports an error. //IGNORE is what I need.
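To reproduce the comparison locally, a minimal sketch along these lines may help. The sample string is hypothetical (two Chinese characters that exist in GBK, followed by an emoji that does not), and the "\u{...}" escapes assume PHP 7 or later. What each variant returns depends on the iconv implementation PHP was built against, so no expected output is shown.

```php
<?php
// Hypothetical sample: U+4E2D U+6587 are representable in GBK, U+1F600 is not.
$input = "\u{4E2D}\u{6587}\u{1F600}abc";

$variants = ['GBK', 'GBK//IGNORE', 'GBK//TRANSLIT'];
$results = [];

foreach ($variants as $target) {
    // @ suppresses the notice so the return value can be inspected directly.
    $out = @iconv('UTF-8', $target, $input);
    $results[$target] = $out;

    // Print the raw bytes as hex, or "false" when the conversion failed.
    printf("%-14s => %s\n", $target, $out === false ? 'false' : bin2hex($out));
}
```

Running this on both the development machine and the production server is a cheap way to check whether the two environments treat the suffixes the same way before deploying.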
Now to wait for the release and see the results (not a good practice; I should keep studying the manual and searching online), haha...
I found the following article online. According to it, mb_convert_encoding also works, but it is less efficient than iconv.
Differences between iconv and mb_convert_encoding
iconv - Convert string to requested character encoding (PHP 4 >= 4.0.5, PHP 5)
mb_convert_encoding - Convert character encoding (PHP 4 >= 4.0.6, PHP 5)
Usage:
string mb_convert_encoding(string str, string to_encoding [, mixed from_encoding])
The mbstring extension must be enabled first: in php.ini, remove the leading ; from the line ;extension=php_mbstring.dll.
string iconv(string in_charset, string out_charset, string str)
Note:
Besides naming the target encoding, the second parameter can also carry one of two suffixes: //TRANSLIT and //IGNORE,
Where:
//TRANSLIT automatically replaces a character that cannot be converted directly with one or more similar-looking characters,
//IGNORE skips characters that cannot be converted. By default (with no suffix), the string is truncated at the first illegal character.
Returns the converted string or FALSE on failure.
Usage notes:
1. iconv was found to fail when converting the em dash character (U+2014) to gb2312. Without the //IGNORE suffix, everything after this character is lost. Either way, the character itself cannot be converted or output. mb_convert_encoding does not have this bug.
2. mb_convert_encoding can take several candidate input encodings and pick one automatically based on the content, but it runs much slower than iconv. For example: $str = mb_convert_encoding($str, "euc-jp", "ASCII,JIS,EUC-JP,SJIS,UTF-8"); changing the order of "ASCII,JIS,EUC-JP,SJIS,UTF-8" can change the result.
3. In general, use iconv. Use mb_convert_encoding only when the source encoding cannot be determined, or when the output of iconv does not display correctly.
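Point 3 could be sketched as a small wrapper like the one below. The function name convert_encoding is hypothetical (it is not part of PHP), and the sketch assumes both the iconv and mbstring extensions are loaded.

```php
<?php
// Hypothetical helper, not a PHP built-in: try iconv first and
// fall back to mb_convert_encoding only when iconv fails.
function convert_encoding(string $text, string $from, string $to): string
{
    // @ suppresses the notice; a failed conversion is signalled by false.
    $out = @iconv($from, $to, $text);
    if ($out !== false) {
        return $out;
    }

    // mbstring substitutes characters it cannot convert instead of
    // aborting, so the rest of the string survives.
    return mb_convert_encoding($text, $to, $from);
}

// U+4E2D U+6587 written with PHP 7+ escapes, converted from UTF-8 to GBK.
echo bin2hex(convert_encoding("\u{4E2D}\u{6587}", 'UTF-8', 'GBK')), "\n";
```

The fast path stays on iconv, and the slower mbstring call is only paid for on inputs that iconv rejects.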
from_encoding names the character encoding of the string before conversion. It can be an array or a comma-separated list of candidates. If it is not specified, the internal encoding is used.
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");
$str = mb_convert_encoding($str, "EUC-JP", "auto");
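The array form of from_encoding can be tried with a sketch like the following. The input bytes are a hypothetical sample (the GBK encoding of U+4E2D U+6587), and the candidate list is only an illustration; with other inputs or another candidate order the detection may choose differently.

```php
<?php
// Hypothetical sample: the GBK bytes for U+4E2D U+6587.
$gbkBytes = "\xD6\xD0\xCE\xC4";

// Candidates are passed as an array. These bytes are not valid ASCII or
// UTF-8, so GBK is the only candidate that can decode them.
$utf8 = mb_convert_encoding($gbkBytes, 'UTF-8', ['ASCII', 'UTF-8', 'GBK']);

// Print the converted bytes as hex for inspection.
echo bin2hex($utf8), "\n";
```

When the source encoding is actually known, passing that single encoding name avoids the detection step entirely.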
Example:
$content = iconv("GBK", "UTF-8", $content);
$content = mb_convert_encoding($content, "UTF-8", "GBK");