While writing the form validation class for my framework tonight, I needed to check whether the length of a string fell within a specified range. Naturally, I thought of PHP's strlen function.
The code is as follows:

    $str = 'Hello world!';
    echo strlen($str); // outputs 12
Now test it with a Chinese string.
The code is as follows:

    $str = '你好，世界！';
    echo strlen($str); // outputs 12 under GBK or GB2312, 18 under UTF-8
PHP's built-in strlen function does not count Chinese characters correctly, because it returns only the number of bytes the string occupies. For a GB2312-encoded Chinese string, strlen returns twice the number of Chinese characters; for a UTF-8 encoded one, it returns three times the number (in UTF-8, a common Chinese character occupies 3 bytes).
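The difference between bytes and characters can be checked with a small sketch. It assumes the mbstring extension is available (mb_strlen is not used in the original post), and it writes the two-character string "你好" as escaped bytes so the result does not depend on the encoding the file is saved in.

```php
<?php
// "你好" as raw bytes in each encoding.
$utf8 = "\xE4\xBD\xA0\xE5\xA5\xBD"; // UTF-8: 3 bytes per character
$gbk  = "\xC4\xE3\xBA\xC3";         // GBK/GB2312: 2 bytes per character

// strlen counts bytes.
echo strlen($utf8), "\n"; // 6
echo strlen($gbk), "\n";  // 4

// mb_strlen counts characters when told the encoding (assumes mbstring).
echo mb_strlen($utf8, 'UTF-8'), "\n"; // 2
echo mb_strlen($gbk, 'CP936'), "\n";  // 2
```

CP936 is the name mbstring uses for the GBK code page.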
The following function is taken from WordPress and counts characters accurately. Note that it only works on UTF-8 encoded strings.
The code is as follows:

    function utf8_strlen($string = null) {
        // Split the string into individual characters
        preg_match_all("/./us", $string, $match);
        // Return the number of characters
        return count($match[0]);
    }
However, this function does not handle GBK/GB2312 Chinese strings. Their byte sequences are generally not valid UTF-8, so the pattern with the u modifier cannot match them correctly and the count comes out wrong. Without the u modifier, each GBK/GB2312 Chinese character would be matched as two separate bytes and the count would double. So I came up with the following approach:
The code is as follows:

    // Try to convert from GBK to UTF-8; keep the original string if that fails
    $tmp = @iconv('gbk', 'utf-8', $str);
    if (!empty($tmp)) {
        $str = $tmp;
    }
    preg_match_all('/./us', $str, $match);
    echo count($match[0]);
This is intended to work with both GBK/GB2312 and UTF-8 strings. It passed a test on a small amount of data, but I am not sure whether it is completely correct.
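One reason for caution: many UTF-8 byte sequences are also valid GBK, so iconv may succeed on a string that is already UTF-8 and change its character count. A possible variant, sketched below, checks for valid UTF-8 first and converts only when that check fails. The function names char_count and length_in_range are hypothetical, the mbstring extension is assumed, and the sketch assumes any input that is not valid UTF-8 is GBK/GB2312. A GBK string that happens to be valid UTF-8 would still be miscounted, so this is not a definitive solution.

```php
<?php
// Hypothetical helper: count characters in a UTF-8 or GBK/GB2312 string.
// Returns -1 when the string cannot be interpreted in either encoding.
function char_count(string $str): int
{
    // Only convert when the bytes are not already valid UTF-8.
    if (!mb_check_encoding($str, 'UTF-8')) {
        // Assumption: non-UTF-8 input is GBK/GB2312.
        $converted = @iconv('GBK', 'UTF-8', $str);
        if ($converted === false) {
            return -1;
        }
        $str = $converted;
    }
    // Count one match per character; "u" treats the subject as UTF-8.
    $count = preg_match_all('/./us', $str);
    return $count === false ? -1 : $count;
}

// Hypothetical range check of the kind a form validation class might use.
function length_in_range(string $str, int $min, int $max): bool
{
    $len = char_count($str);
    return $len >= $min && $len <= $max;
}
```

For example, length_in_range($username, 2, 20) would accept a two-character Chinese name in either encoding, where a plain strlen check would see 4 or 6 bytes.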