We used the mb_detect_encoding() function to detect character encoding.
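The code is as follows:

    // Determine the encoding of a string: convert UTF-8 -> GB2312 -> UTF-8
    // and check whether the round trip returns the original value.
    if ($tag === mb_convert_encoding(mb_convert_encoding($tag, "GB2312", "UTF-8"), "UTF-8", "GB2312")) {
        // The string survived the round trip, so it is treated as UTF-8 already.
    } else {
        // Otherwise it is treated as GB2312 and converted to UTF-8.
        $tag = mb_convert_encoding($tag, 'UTF-8', 'GB2312');
    }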
For example, for $keytitle = "%D0%BE%C6%AC" (shown here URL-encoded), the detection result is UTF-8. This is not really a bug in mb_detect_encoding(): the bytes D0 BE and C6 AC also form two well-formed two-byte UTF-8 sequences, so the input is genuinely ambiguous. A program should not depend too heavily on mb_detect_encoding(); the shorter the string, the more likely the detection result is wrong.
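The ambiguity of the example string can be checked directly. The following is a minimal sketch, assuming the mbstring extension is loaded; the helper name valid_encodings is hypothetical and does not come from the original code. It decodes the example string and reports which candidate encodings accept the resulting bytes.

```php
<?php
// Hypothetical helper: list the candidate encodings in which $bytes is well-formed.
function valid_encodings(string $bytes, array $candidates): array
{
    $valid = [];
    foreach ($candidates as $encoding) {
        // mb_check_encoding() validates the byte sequence against one encoding.
        if (mb_check_encoding($bytes, $encoding)) {
            $valid[] = $encoding;
        }
    }
    return $valid;
}

$keytitle = "%D0%BE%C6%AC";
$bytes = urldecode($keytitle); // four raw bytes: D0 BE C6 AC

echo strtoupper(bin2hex($bytes)), "\n";
echo implode(', ', valid_encodings($bytes, ['ASCII', 'UTF-8', 'GB2312'])), "\n";
```

When more than one multibyte encoding appears in the output, the byte values alone do not identify the encoding, and any detector has to guess.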
How can this problem be solved? My solution is:
The code is as follows:

    $encode = mb_detect_encoding($keytitle, array('ASCII', 'GB2312', 'GBK', 'UTF-8'));
The parameters are: the string to examine, the ordered list of candidate encodings (once one encoding matches, the ones after it are ignored), and the strict-mode flag.
Adjusting the order of the candidate encodings reduces the chance of a wrong detection and, with it, a wrong conversion.
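The three parameters can be put together as follows. This is only a sketch, assuming the mbstring extension is loaded; the function name to_utf8 and its default candidate order are illustrative assumptions, and the best order depends on the data being handled.

```php
<?php
// Hypothetical helper: detect the encoding of $text using an explicit candidate
// order and strict mode, then convert it to UTF-8. Returns null when no
// candidate encoding matches or the conversion fails.
function to_utf8(string $text, array $order = ['ASCII', 'GB2312', 'GBK', 'UTF-8']): ?string
{
    // Third argument true = strict mode.
    $detected = mb_detect_encoding($text, $order, true);
    if ($detected === false) {
        return null;
    }
    $converted = mb_convert_encoding($text, 'UTF-8', $detected);
    return $converted === false ? null : $converted;
}

$gbBytes = "\xD6\xD0\xCE\xC4"; // "中文" encoded as GB2312
$utf8 = to_utf8($gbBytes);
echo $utf8 === null ? "no match" : bin2hex($utf8), "\n";
```

Passing a different array as the second argument makes it easy to try another detection order against the same input.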
If the approach above still does not solve the problem, another solution is given below.
Example 1
The code is as follows:

    // Returns true if $word looks like UTF-8, that is, if it starts with, ends
    // with, or contains at least two consecutive three-byte sequences whose lead
    // byte is 228-233 (U+4000 to U+9FFF, which includes the common CJK
    // ideographs); returns false otherwise.
    function is_utf8($word)
    {
        // One three-byte sequence: lead byte 228-233, then two continuation bytes 128-191.
        $seq = "[" . chr(228) . "-" . chr(233) . "][" . chr(128) . "-" . chr(191) . "][" . chr(128) . "-" . chr(191) . "]";
        return preg_match("/^(" . $seq . ")/", $word) === 1
            || preg_match("/(" . $seq . ")$/", $word) === 1
            || preg_match("/(" . $seq . "){2,}/", $word) === 1;
    } // function is_utf8
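The function above is a heuristic aimed at Chinese text rather than a full validity check. For comparison, here is a sketch of a stricter test that relies on PCRE's /u modifier instead; it is not part of the original solution, and the name is_valid_utf8 is hypothetical.

```php
<?php
// Hypothetical helper: true when $text is a well-formed UTF-8 byte sequence.
function is_valid_utf8(string $text): bool
{
    // With the /u modifier, preg_match() returns false (not 0) when the
    // subject is not valid UTF-8; the empty pattern matches any valid subject.
    return preg_match('//u', $text) === 1;
}

var_dump(is_valid_utf8("\xE4\xB8\xAD\xE6\x96\x87")); // "中文" in UTF-8
var_dump(is_valid_utf8("\xD6\xD0\xCE\xC4"));         // "中文" in GB2312
var_dump(is_valid_utf8("\xD0\xBE\xC6\xAC"));         // the bytes of the earlier example
```

The last call is worth noting: the bytes D0 BE C6 AC pass a strict validity check as well, so validity alone cannot settle which encoding a short, ambiguous string was written in.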