Chinese websites generally use one of two encodings: GBK/GB2312 or UTF-8. Under GBK, each Chinese character occupies 2 bytes. For example:
    // Source file saved as GBK
    $zhStr = '您好,中国!';
    echo strlen($zhStr); // output: 12 (6 characters × 2 bytes)
Under UTF-8, each commonly used Chinese character occupies 3 bytes:
    // Source file saved as UTF-8
    $zhStr = '您好,中国!';
    echo strlen($zhStr); // output: 18 (6 characters × 3 bytes)
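To see both byte counts from a single script, one option is to convert the string before measuring it. This is only a sketch: it assumes the file is saved as UTF-8 and that the mbstring extension is loaded, and `byte_length_in` is a hypothetical helper name, not something from the original code.

```php
<?php
// Assumes this file is saved as UTF-8 and mbstring is available.
$zhStr = '您好,中国!'; // 6 characters, all full-width

// Hypothetical helper: byte length of a UTF-8 string once converted to $encoding.
function byte_length_in(string $utf8, string $encoding): int
{
    return strlen(mb_convert_encoding($utf8, $encoding, 'UTF-8'));
}

echo byte_length_in($zhStr, 'GBK'), "\n";   // 12
echo byte_length_in($zhStr, 'UTF-8'), "\n"; // 18
```

strlen always counts bytes, so the same six characters give different results depending on the encoding they are stored in.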
So how do you calculate the length of a Chinese string? One might say: for GBK, take the byte length and divide by 2; for UTF-8, divide by 3. Isn't that enough? However, you have to remember that strings are rarely that well-behaved: in 99% of cases, Chinese and English characters are mixed together.
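A quick sketch of why the division trick breaks down on mixed text. It assumes the file is saved as UTF-8; `naive_utf8_length` is a hypothetical name used only for illustration.

```php
<?php
// Assumes the source file is saved as UTF-8.

// Hypothetical helper: the naive "bytes divided by 3" estimate.
function naive_utf8_length(string $s): float
{
    return strlen($s) / 3;
}

$pure  = '您好,中国!';  // 6 full-width characters, 18 bytes
$mixed = 'Hello,中国!'; // 5 ASCII + 4 full-width characters, 17 bytes

echo naive_utf8_length($pure), "\n";  // 6, which happens to be right
echo strlen($mixed), "\n";            // 17, not a multiple of 3
echo naive_utf8_length($mixed), "\n"; // about 5.67, but the string has 9 characters
```

ASCII characters take a single byte in both GBK and UTF-8, so any fixed divisor is wrong as soon as one of them appears.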
Here is a piece of code from WordPress. The main idea is to use a regular expression to split the string into individual units and then count them; the number of units is the length of the string. The code is as follows (it only works for UTF-8 encoded strings):
    $zhStr = '您好,中国!';
    $str = 'Hello,中国!';

    // Calculate the length of a Chinese string
    function utf8_strlen($string = null) {
        // Split the string into individual units (one per character)
        preg_match_all("/./us", $string, $match);
        // Return the number of units
        return count($match[0]);
    }

    echo utf8_strlen($zhStr); // output: 6
    echo utf8_strlen($str);   // output: 9
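If the mbstring extension is available, mb_strlen with an explicit encoding should give the same count as the regex approach. The sketch below compares the two; `regex_char_count` is a hypothetical name, and returning 0 when the match fails (for example on invalid UTF-8) is a choice made for this example only.

```php
<?php
// Assumes the source file is saved as UTF-8 and mbstring is loaded.

// Hypothetical helper: count characters with the same pattern as above.
function regex_char_count(string $s): int
{
    // /u treats pattern and subject as UTF-8; /s lets "." match newlines too.
    $n = preg_match_all('/./us', $s);
    return $n === false ? 0 : $n; // 0 on failure is an assumption of this sketch
}

$str = 'Hello,中国!';

echo regex_char_count($str), "\n";   // 9
echo mb_strlen($str, 'UTF-8'), "\n"; // 9
```

Passing the encoding to mb_strlen explicitly avoids depending on the server's default internal encoding.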
Below, I have wrapped up a function that accurately calculates the length of a Chinese string:
    function count_strlen($string = null) {
        // Detect the encoding of the string
        $fileType = mb_detect_encoding($string, array('UTF-8', 'GBK', 'LATIN1', 'BIG5'));
        // Calculate the string length based on the detected encoding
        $length = iconv_strlen($string, $fileType);
        return $length;
    }

    $str = "中45文莱";
    $len = count_strlen($str);
    echo $len; // output: 5
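mb_detect_encoding returns false when none of the candidate encodings fits, and passing that straight to iconv_strlen is not safe. A more defensive variant might look like the sketch below. The name `safe_count_strlen`, the `null` return value, and the shorter candidate list are all assumptions for illustration; LATIN1 is left out here because a single-byte encoding accepts almost any byte sequence, which can mask a failed detection.

```php
<?php
// Hypothetical variant; assumes the mbstring and iconv extensions are loaded.
function safe_count_strlen(string $string, array $candidates = ['UTF-8', 'GBK', 'BIG5']): ?int
{
    // Strict mode (third argument) rejects encodings in which the bytes are invalid.
    $encoding = mb_detect_encoding($string, $candidates, true);
    if ($encoding === false) {
        return null; // no candidate matched
    }
    $length = iconv_strlen($string, $encoding);
    return $length === false ? null : $length;
}

var_dump(safe_count_strlen('中45文莱')); // int(5) when this file is saved as UTF-8
```

Encoding detection is a guess based on the bytes alone, so when the encoding is already known (for example from the page's charset), passing it directly to mb_strlen or iconv_strlen is the more reliable route.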