PHP code to determine whether a string is UTF-8 encoded
This article introduces PHP code for determining whether a string's encoding is UTF-8. If you are interested, read on.
In the past, we used the mb_detect_encoding() function to detect a string's character encoding.
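The code is as follows:

```php
// Determine the encoding of a string: if converting it UTF-8 -> GB2312 -> UTF-8
// gives back exactly the same bytes, treat it as UTF-8.
if ($tag === mb_convert_encoding(mb_convert_encoding($tag, 'GB2312', 'UTF-8'), 'UTF-8', 'GB2312')) {
    // Already UTF-8: nothing to do.
} else {
    // Otherwise assume it is GB2312 and convert it to UTF-8.
    $tag = mb_convert_encoding($tag, 'UTF-8', 'GB2312');
}
```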
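The round-trip check can be wrapped in a small helper. This is a minimal sketch: the function name `to_utf8()` is hypothetical, it requires the mbstring extension, and, like the snippet, it assumes that any input that is not UTF-8 is GB2312. One caveat follows directly from the technique: a genuine UTF-8 string containing characters that GB2312 lacks (Korean Hangul or emoji, for example) does not survive the round trip, so it would be wrongly "converted".

```php
<?php
// Hypothetical helper wrapping the round-trip check.
// Assumption: the input is either UTF-8 or GB2312 (requires mbstring).
function to_utf8($tag)
{
    // Convert UTF-8 -> GB2312 -> UTF-8; an unchanged result means the
    // input was most likely already UTF-8.
    $roundTrip = mb_convert_encoding(
        mb_convert_encoding($tag, 'GB2312', 'UTF-8'),
        'UTF-8',
        'GB2312'
    );
    if ($tag === $roundTrip) {
        return $tag;
    }
    // Otherwise treat the input as GB2312 and convert it to UTF-8.
    return mb_convert_encoding($tag, 'UTF-8', 'GB2312');
}

// "\xD6\xD0" is the GB2312 encoding of 中; it is not valid UTF-8,
// so the round trip fails and the bytes are converted.
echo to_utf8("\xD6\xD0"), "\n";
```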
However, mb_detect_encoding() is not always reliable. For example, when $keytitle contains the bytes "%D0%BE%C6%AC" (shown URL-encoded), a short GB2312 string, the detection result is UTF-8. This is not really a bug: a program should not depend too heavily on mb_detect_encoding(), because when the string is short, the detection result is quite likely to be wrong.
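To see why the detector answers UTF-8 here, note that these four bytes also happen to be well-formed UTF-8: 0xD0 0xBE and 0xC6 0xAC are each a valid two-byte sequence (a lead byte of the form 110xxxxx followed by a continuation byte of the form 10xxxxxx). A minimal sketch using the standard urldecode() and mb_check_encoding() functions, assuming the mbstring extension is loaded:

```php
<?php
// Decode the URL-encoded example into its four raw bytes: D0 BE C6 AC.
$keytitle = urldecode('%D0%BE%C6%AC');

// Both byte pairs are valid two-byte UTF-8 sequences, so the whole
// string passes a UTF-8 validity check even though it was meant as GB2312.
var_dump(mb_check_encoding($keytitle, 'UTF-8')); // bool(true)

// Read as UTF-8, the bytes decode to U+043E (Cyrillic small letter o)
// and U+01AC (Latin capital letter T with hook).
echo bin2hex(mb_convert_encoding($keytitle, 'UTF-16BE', 'UTF-8')), "\n"; // 043e01ac
```

Since the input is valid in both encodings, no detector can be certain which one was intended; it can only guess.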
How can this problem be solved? My solution is:
The code is as follows:

```php
$encode = mb_detect_encoding($keytitle, array('ASCII', 'GB2312', 'GBK', 'UTF-8'));
```
The parameters are the string to be checked and the order in which candidate encodings are tried (once one matches, the remaining candidates are skipped); an optional third parameter enables strict mode.
Ordering the candidates this way minimizes the chance of a wrong conversion.
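The detection call can be combined with the conversion step. This is a minimal sketch: the helper name `convert_to_utf8()` is hypothetical, it requires the mbstring extension, and it passes `true` for the strict-mode parameter, whereas the snippet above leaves it at its default.

```php
<?php
// Hypothetical helper: detect the encoding using an explicit candidate
// order, then convert the string to UTF-8 if needed (requires mbstring).
function convert_to_utf8($str)
{
    // Third argument true = strict mode.
    $encode = mb_detect_encoding($str, array('ASCII', 'GB2312', 'GBK', 'UTF-8'), true);

    if ($encode === false) {
        // None of the candidates matched: leave the string unchanged.
        return $str;
    }
    if ($encode === 'ASCII' || $encode === 'UTF-8') {
        // ASCII is a subset of UTF-8, so no conversion is needed.
        return $str;
    }
    // The returned name may be mbstring's canonical name rather than the
    // alias passed in; mb_convert_encoding() accepts it as the source encoding.
    return mb_convert_encoding($str, 'UTF-8', $encode);
}

echo convert_to_utf8('hello'), "\n";
```

The result still depends on the candidate order, so this helper inherits the limitation described above for short strings.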
The approach above still does not fully solve the problem, so here is another solution.
Example 1
The code is as follows:

```php
<?php
// Heuristic check: returns true if $word starts with, ends with, or contains
// two consecutive UTF-8 characters whose lead byte is 0xE4-0xE9
// (U+4000-U+9FFF, which covers most common Chinese characters).
// It does not validate the rest of the string.
function is_utf8($word)
{
    // One 3-byte UTF-8 sequence: lead byte 0xE4-0xE9 followed by two
    // continuation bytes 0x80-0xBF (the original built these ranges with chr()).
    $char = '[\xE4-\xE9][\x80-\xBF][\x80-\xBF]';

    return preg_match('/^' . $char . '/', $word) === 1         // starts with one
        || preg_match('/' . $char . '$/', $word) === 1         // ends with one
        || preg_match('/(?:' . $char . '){2,}/', $word) === 1; // two or more in a row
} // function is_utf8
```
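Note that is_utf8() is a heuristic: it only looks for UTF-8 sequences with lead bytes 0xE4-0xE9 and never checks the rest of the string. If you need to know whether the whole string is well-formed UTF-8, a different technique (not the article's) is to validate it, for example with preg_match() and the `/u` modifier, or with mb_check_encoding(). A minimal sketch; the name `is_valid_utf8()` is hypothetical:

```php
<?php
// Alternative to the heuristic: validate the entire string as UTF-8.
function is_valid_utf8($str)
{
    // With the /u modifier, preg_match() returns false if the subject is not
    // valid UTF-8; otherwise the empty pattern always matches and it returns 1.
    return preg_match('//u', $str) === 1;
}

var_dump(is_valid_utf8('中文'));             // bool(true)
var_dump(is_valid_utf8("\xD6\xD0"));         // bool(false): GB2312 bytes for 中
var_dump(is_valid_utf8("\xD0\xBE\xC6\xAC")); // bool(true): the ambiguous bytes from the earlier example
```

As the last line shows, even a full validity check cannot tell that those bytes were meant as GB2312; it can only rule out strings that are not valid UTF-8.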