Tonight, while writing a form validation class for a framework, I needed to check whether a string's length falls within a specified range, and naturally thought of PHP's strlen function.

The code is as follows

$str = 'Hello world!';
echo strlen($str); // Outputs 12

Now test with Chinese

The code is as follows

$str = '你好，世界！';
echo strlen($str); // Outputs 12 under GBK or GB2312, 18 under UTF-8

PHP's built-in strlen function does not handle Chinese strings correctly: it returns only the number of bytes in the string. For GB2312-encoded Chinese, strlen returns twice the number of Chinese characters; for UTF-8-encoded Chinese, it returns three times the number (in UTF-8, a Chinese character occupies 3 bytes).
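
The difference between bytes and characters can be seen directly. The following is a minimal sketch, assuming the mbstring extension is enabled (mb_strlen is not used in the original text). The string is written as explicit UTF-8 bytes so the result does not depend on the encoding of the source file.

```php
<?php
// "你好" written as raw UTF-8 bytes (3 bytes per character).
$utf8 = "\xE4\xBD\xA0\xE5\xA5\xBD";

$bytes = strlen($utf8);             // counts bytes
$chars = mb_strlen($utf8, 'UTF-8'); // counts characters (requires mbstring)

echo $bytes, "\n"; // 6
echo $chars, "\n"; // 2
```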

The following function comes from the well-known WordPress and is quite accurate. Note that it only applies to UTF-8 encoded strings.

The code is as follows

function utf8_strlen($string = null) {
    // Split the string into individual characters
    preg_match_all("/./us", $string, $match);
    // Return the number of characters
    return count($match[0]);
}

However, the code above cannot handle a GBK/GB2312 Chinese string, because its bytes are not valid UTF-8 and the character count comes out wrong (in my test each GBK/GB2312 character was counted as two, so the total doubled). So I came up with this approach:

The code is as follows

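// Try to convert from GBK to UTF-8; errors are suppressed if the input is not GBK
$tmp = @iconv('GBK', 'UTF-8', $str);
if (!empty($tmp)) {
    $str = $tmp;
}
// Count the characters of the (now UTF-8) string
preg_match_all('/./us', $str, $match);
echo count($match[0]);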

This is compatible with both GBK/GB2312 and UTF-8. It passed a test with a small amount of data, but I have not yet confirmed that it is completely correct.
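
One reason for caution is that a UTF-8 string can also happen to be a valid GBK byte sequence, in which case iconv succeeds and the count changes. A possible alternative is to check for valid UTF-8 first and only then try the GBK conversion. The sketch below assumes the mbstring and iconv extensions are available. The function name mixed_strlen is hypothetical and does not come from WordPress or from the code above.

```php
<?php
// Hypothetical helper: count characters in a string that is either
// UTF-8 or GBK/GB2312. Assumes mbstring and iconv are enabled.
function mixed_strlen(string $str): int
{
    // Valid UTF-8 is counted as-is, without conversion.
    if (mb_check_encoding($str, 'UTF-8')) {
        return mb_strlen($str, 'UTF-8');
    }
    // Otherwise try to interpret the bytes as GBK (a superset of GB2312).
    $converted = @iconv('GBK', 'UTF-8', $str);
    if ($converted !== false && $converted !== '') {
        return mb_strlen($converted, 'UTF-8');
    }
    // Neither encoding fits: fall back to the byte count.
    return strlen($str);
}

$utf8 = "\xE4\xBD\xA0\xE5\xA5\xBD"; // "你好" as UTF-8 bytes
$gbk  = "\xC4\xE3\xBA\xC3";         // "你好" as GBK bytes

echo mixed_strlen($utf8), "\n"; // 2
echo mixed_strlen($gbk), "\n";  // 2
```

This is still a heuristic: a GBK string whose bytes happen to form valid UTF-8 would be counted as UTF-8, so knowing the encoding of the input in advance remains the more reliable option.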

