Chinese characters and English letters take up different numbers of bytes, and we can use that difference to classify a string. We compute two lengths for the same string, one with mb_strlen and one with strlen, and then apply a simple rule to the two numbers to decide what type of string it is.
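The rule can be written out as a short derivation. This is a sketch under stated assumptions: the string is UTF-8, every Chinese character takes 3 bytes, and every other character is single-byte ASCII. The symbols b, n, c and e are introduced here for illustration.

```latex
\begin{aligned}
b &= \texttt{strlen}(s) = 3c + e && \text{(bytes)}\\
n &= \texttt{mb\_strlen}(s, \texttt{'UTF-8'}) = c + e && \text{(characters)}\\
c &= \frac{b - n}{2}, \qquad e = n - c
\end{aligned}
```

So b = n means the string is pure English (c = 0), b = 3n means it is pure Chinese (e = 0), and anything else is mixed. For example, 中文a字1符 has b = 14 and n = 6, which gives c = 4 Chinese characters and e = 2 ASCII characters.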
First, a quick look at strlen and mb_strlen, the two functions that return the length of a string.
<?php
// The test file is saved as UTF-8.
// '中文a字1符' = four Chinese characters plus 'a' and '1'
$str = '中文a字1符';
echo strlen($str) . '<br>';               // 14
echo mb_strlen($str, 'UTF-8') . '<br>';   // 6
echo mb_strlen($str, 'GBK') . '<br>';     // 8
echo mb_strlen($str, 'gb2312') . '<br>';  // 10
?>
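PHP's built-in strlen does not count Chinese characters correctly: it returns the number of bytes in the string, not the number of characters. For GB2312-encoded Chinese, strlen returns twice the number of Chinese characters; for UTF-8-encoded Chinese it returns three times as many, because in UTF-8 a common Chinese character occupies 3 bytes. mb_strlen counts characters, but only if it is told the encoding the string actually uses: the GBK and GB2312 results above are wrong because UTF-8 bytes are being decoded as a different encoding.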
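A small sketch makes the byte-versus-character difference concrete. It assumes the file is saved as UTF-8 and that the mbstring extension is loaded; byte_and_char_length is a hypothetical helper name, and the GB2312 copy of the text is produced with mb_convert_encoding purely for comparison.

```php
<?php
// Minimal sketch: compare byte length (strlen) with character length (mb_strlen).
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.

// Hypothetical helper: returns [bytes, characters] for a string in a given encoding.
function byte_and_char_length(string $s, string $encoding): array
{
    return [strlen($s), mb_strlen($s, $encoding)];
}

$utf8 = '中文';                                           // two Chinese characters in UTF-8
$gb2312 = mb_convert_encoding($utf8, 'GB2312', 'UTF-8');  // the same text re-encoded as GB2312

[$bytes, $chars] = byte_and_char_length($utf8, 'UTF-8');
echo "UTF-8:  $bytes bytes, $chars characters\n";         // UTF-8:  6 bytes, 2 characters

[$bytes, $chars] = byte_and_char_length($gb2312, 'GB2312');
echo "GB2312: $bytes bytes, $chars characters\n";         // GB2312: 4 bytes, 2 characters
```

The character count stays at 2 in both encodings; only the byte count changes, by a factor of 3 for UTF-8 and 2 for GB2312.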
Example
<?php
/**
 * PHP: determine whether a string is pure Chinese, pure English, or mixed Chinese and English
 */
echo '<meta charset="utf-8" />';

function utf8_str($str)
{
    $mb = mb_strlen($str, 'utf-8'); // number of characters
    $st = strlen($str);             // number of bytes
    if ($st == $mb) {
        return 'pure English';      // every character is a single byte
    }
    if ($st == 3 * $mb) {
        return 'pure Chinese characters'; // every character is 3 bytes in UTF-8
    }
    return 'Chinese-English mix';
}

$str = 'Jones blog';
echo 'String: <span style="color:red">' . $str . '</span>, <span style="color:red">' . utf8_str($str) . '</span>';
?>
Note that mb_strlen is not a core PHP function: it belongs to the mbstring extension. Before using it, make sure the extension is enabled in php.ini. On Windows that means the line "extension=php_mbstring.dll" must be present and not commented out (PHP 7.2 and later also accept the shorter form "extension=mbstring"). Otherwise PHP will stop with a "Call to undefined function mb_strlen()" error.
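To fail gracefully when the extension is missing, a script can check for it at runtime. This is a sketch under assumptions: utf8_char_count is a hypothetical helper name, and the preg_match_all fallback is a different technique (PCRE in UTF-8 mode) swapped in for when mbstring is unavailable; it only counts correctly for valid UTF-8 input.

```php
<?php
// Minimal sketch, assuming UTF-8 input: count characters with mb_strlen when
// mbstring is available, otherwise fall back to PCRE. Hypothetical helper name.

function utf8_char_count(string $str): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($str, 'UTF-8');
    }
    // With the /u modifier each "." matches one UTF-8 character;
    // /s lets "." match newlines too. preg_match_all returns false on invalid UTF-8.
    $count = preg_match_all('/./us', $str);
    if ($count === false) {
        throw new InvalidArgumentException('String is not valid UTF-8');
    }
    return $count;
}

if (!extension_loaded('mbstring')) {
    echo "mbstring is not loaded; check the extension line in php.ini\n";
}
echo utf8_char_count('中文a字1符'), "\n"; // 6
```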