When using PHP's mb_detect_encoding function to identify a string's encoding, many people have run into misdetection, for example between UTF-8 and GBK (here mainly the CP936 check). The usual explanation online is that mb_detect_encoding gets it wrong when the string is short.
For example:
$encode = mb_detect_encoding($keytitle, array("ASCII", "UTF-8", "GB2312", "GBK", "BIG5"));
if ($encode == "UTF-8") {
    $keytitle = iconv("UTF-8", "GBK", $keytitle);
}
This code checks whether the string is encoded as UTF-8 and, if it is, converts it to GBK.
However, when $keytitle holds the bytes %D0%BE%C6%AC (shown URL-encoded), the detection result is UTF-8. This is not really a bug: those four bytes are valid GBK, but they also happen to form a valid UTF-8 sequence. When writing a program you should not rely too heavily on mb_detect_encoding; the shorter the string, the more likely the detection result is wrong.
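To see why, the sketch below (assuming PHP 7.2+ with the mbstring extension, since it uses mb_ord) decodes the four bytes and checks them against both encodings with mb_check_encoding. They form two valid GBK (CP936) double-byte characters and, at the same time, two valid UTF-8 two-byte sequences, so byte validity alone cannot tell the two encodings apart.

```php
<?php
// The four bytes from the example, decoded from their URL-encoded form.
$keytitle = urldecode('%D0%BE%C6%AC');

// Each byte pair is a valid GBK (CP936) double-byte character...
$validGbk = mb_check_encoding($keytitle, 'CP936');

// ...and also a valid UTF-8 two-byte sequence:
// 0xD0 0xBE decodes to U+043E, 0xC6 0xAC decodes to U+01AC.
$validUtf8 = mb_check_encoding($keytitle, 'UTF-8');

var_dump($validGbk, $validUtf8);

// Code point of the first character when the bytes are read as UTF-8.
printf("U+%04X\n", mb_ord($keytitle, 'UTF-8'));
```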
How can this be avoided? My solution is:
$encode = mb_detect_encoding($keytitle, array('ASCII', 'GB2312', 'GBK', 'UTF-8'));
mb_detect_encoding takes three parameters: the string to check, the list of encodings to try in order (once one matches, the ones after it are skipped), and an optional strict-mode flag.
The fix is to adjust the detection order so that wrong conversions become less likely.
As a rule, put GB2312 first; when both GBK and UTF-8 are candidates, put whichever is more common in your input ahead of the other.
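Putting the reordered list to work, a minimal sketch of the original "convert to GBK if it is UTF-8" logic might look like the following. The function name toGbk is hypothetical, passing true for strict mode is my choice rather than the article's, and mb_convert_encoding is used in place of iconv (both convert UTF-8 to GBK here). Newer PHP releases reworked mb_detect_encoding's internals, so list order may not be the only factor on your version; test with your own data.

```php
<?php
/**
 * Hypothetical helper, not taken from the article: detect the encoding of
 * $text by trying the candidates in the caller's priority order, and return
 * the text as GBK. The third argument (true) enables strict mode, so an
 * encoding is only accepted if the whole string is valid in it.
 */
function toGbk(string $text, array $order = ['ASCII', 'GB2312', 'GBK', 'UTF-8']): string
{
    $encode = mb_detect_encoding($text, $order, true);

    // ASCII and GB2312 are subsets of GBK, so only UTF-8 input needs converting.
    if ($encode === 'UTF-8') {
        return mb_convert_encoding($text, 'GBK', 'UTF-8');
    }

    // Unchanged for ASCII/GB2312/GBK, and also when detection fails (false).
    return $text;
}

// Plain ASCII passes through untouched.
echo toGbk('hello'), PHP_EOL;

// "\xE2\x82\xAC" (the euro sign in UTF-8) is not valid GB2312 or GBK,
// so it is detected as UTF-8 and converted.
echo bin2hex(toGbk("\xE2\x82\xAC")), PHP_EOL;
```

The trade-off of putting GB2312/GBK first is that genuine UTF-8 input whose bytes also happen to be valid GB2312 or GBK may be reported as such and left unconverted, which is why the order should follow what your input most often contains.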