Tonight, while writing a form validation class for a framework, I needed to check whether a string's length falls within a specified range, and naturally thought of PHP's strlen function.
The code is as follows:
$str = 'Hello world!';
echo strlen($str); // Output: 12
However, PHP's built-in strlen computes length as the number of bytes in the string, and mb_strlen counts characters only according to the encoding it is given (or the configured internal encoding), so it is also wrong when that encoding does not match the string. The number of bytes a Chinese character occupies differs between encodings: under GBK/GB2312 a Chinese character takes 2 bytes, while in UTF-8 it typically takes 3 bytes.
The code is as follows:
$str = '你好,世界!'; // six full-width characters
echo strlen($str); // Output: 12 under GBK or GB2312, 18 under UTF-8
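To see the byte counts side by side without depending on how the source file is saved, here is a minimal sketch. It assumes PHP 7 or later (for the "\u{...}" escapes) and that the iconv extension is available with GBK support; the variable names are only for illustration.

```php
<?php
// "\u{...}" escapes always produce UTF-8 bytes, whatever the file encoding.
$utf8 = "\u{4F60}\u{597D}";          // two Chinese characters ("ni hao")
$gbk  = iconv('UTF-8', 'GBK', $utf8); // the same text re-encoded as GBK

echo strlen($utf8), "\n"; // 6: three bytes per character in UTF-8
echo strlen($gbk), "\n";  // 4: two bytes per character in GBK
```

The text is the same two characters in both cases; only the byte representation that strlen measures has changed.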
When we check the length of a string, we usually want the number of characters, not the number of bytes. Consider the following PHP code under UTF-8:
The code is as follows:
$name = '张歌畅'; // a three-character Chinese name
$len = strlen($name);
// Outputs FALSE because under UTF-8 three Chinese characters take 9 bytes
if ($len >= 3 && $len <= 8) {
    echo 'TRUE';
} else {
    echo 'FALSE';
}
So what is a convenient and practical way to get the length of a string that contains Chinese? You could use a regular expression to find the Chinese characters, divide their byte count by 2 under GBK/GB2312 or by 3 under UTF-8, and then add the length of the non-Chinese part, but this is too much trouble.
WordPress contains a piece of code for this, shown here for reference:
The code is as follows:
$str = 'Hello,世界!';
preg_match_all('/./us', $str, $match);
echo count($match[0]); // Output: 9
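The idea is to use a regular expression to split the string into individual characters and then count the matches directly with count(), which gives the result we want. The u modifier makes the pattern and subject be treated as UTF-8, and the s modifier lets the dot match newlines as well.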
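The same idea can be wrapped in a small function. This is only a sketch: the name utf8_char_count is hypothetical, and returning null for input that is not valid UTF-8 is my own choice, not something the WordPress snippet does. It relies on preg_match_all returning the number of matches, or false when the subject is not valid UTF-8 under the u modifier.

```php
<?php
// Hypothetical helper: counts the characters in a UTF-8 string.
function utf8_char_count(string $str): ?int
{
    // With the u modifier, each match of "." is one UTF-8 character.
    $count = preg_match_all('/./us', $str, $match);

    // preg_match_all returns false if $str is not valid UTF-8.
    return $count === false ? null : $count;
}

$str = "Hi \u{4E16}\u{754C}"; // "Hi " followed by two Chinese characters
echo strlen($str), "\n";          // 9 bytes
echo utf8_char_count($str), "\n"; // 5 characters
```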
But the code above, which assumes UTF-8, cannot handle a Chinese string encoded in GBK/GB2312: the bytes of a GBK/GB2312 character are not read as one UTF-8 character, so the match fails or the count comes out wrong (counted byte by byte, each character would be counted twice). So I thought of the following approach:
The code is as follows:
$tmp = @iconv('GBK', 'UTF-8', $str);
if (!empty($tmp)) {
    $str = $tmp;
}
preg_match_all('/./us', $str, $match);
echo count($match[0]);
This is intended to be compatible with both GBK/GB2312 and UTF-8. It passed tests on a small amount of data, but I have not confirmed that it is completely correct, and I welcome corrections from more experienced developers.
The above is meant to be compatible with several encodings, but in day-to-day development the encoding used by a project is usually already known, so you can use the following function to get the string length easily:
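One possible weakness of trying the GBK conversion first is that some UTF-8 byte sequences may also happen to decode as GBK, in which case the string would be converted into different characters and miscounted. A variation, sketched below, checks for valid UTF-8 first and only then falls back to GBK. This is a different technique from the code above, the function name is hypothetical, and treating every non-UTF-8 string as GBK is an assumption.

```php
<?php
// Hypothetical helper, not the original author's method.
function char_count_utf8_or_gbk(string $str): int
{
    // An empty pattern with the u modifier matches only if $str is valid UTF-8.
    if (preg_match('//u', $str) !== 1) {
        // Assumption: anything that is not valid UTF-8 is GBK/GB2312.
        $converted = @iconv('GBK', 'UTF-8', $str);
        if ($converted === false) {
            return -1; // could not be decoded as either encoding
        }
        $str = $converted;
    }

    // $str is valid UTF-8 here, so every match is one character.
    return (int) preg_match_all('/./us', $str);
}

$utf8 = "\u{4F60}\u{597D}";           // two Chinese characters in UTF-8
$gbk  = iconv('UTF-8', 'GBK', $utf8); // the same two characters in GBK

echo char_count_utf8_or_gbk($utf8), "\n"; // 2
echo char_count_utf8_or_gbk($gbk), "\n";  // 2
```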
The code is as follows:
int iconv_strlen ( string $str [, string $charset = ini_get("iconv.internal_encoding") ] )
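As a usage sketch, the earlier range check can be redone with iconv_strlen. The example assumes the project uses UTF-8 and passes the charset explicitly rather than relying on the ini setting; the sample name is arbitrary.

```php
<?php
$name = "\u{5F20}\u{4E09}\u{4E30}"; // three Chinese characters, UTF-8 encoded
$len  = iconv_strlen($name, 'UTF-8'); // 3 characters, although strlen($name) is 9

// The same range check as before now succeeds.
echo ($len >= 3 && $len <= 8) ? 'TRUE' : 'FALSE'; // TRUE
```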