In PHP, strlen() is the usual way to find the length of a string. Its usage is as follows:
PHP strlen() function: definition and usage
The strlen() function returns the length of a string in bytes.
Syntax
strlen(string)
Parameter: string
Description: Required. Specifies the string to be checked.
Example
    <?php echo strlen("Hello world!"); ?>
Output: 12
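As a small illustrative sketch (the sample strings are made up for this example), the following shows that strlen() counts every byte, including spaces and control characters:

```php
<?php
echo strlen('Hello world!'), "\n"; // 12
echo strlen(' Hello '), "\n";      // 7: leading and trailing spaces count
echo strlen(''), "\n";             // 0
echo strlen("a\nb"), "\n";         // 3: the newline is one byte
```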
How to handle Chinese
PHP's built-in strlen() measures a string by counting the bytes it occupies. mb_strlen() gives the same result only when it falls back to a single-byte encoding; given the correct encoding name, it counts characters instead. An English character occupies 1 byte. Example:
    $enStr = 'Hello,China!';
    echo strlen($enStr); // output: 12
Chinese is different. Chinese websites generally use one of two encodings: GBK/GB2312 or UTF-8. UTF-8 covers far more characters, so many webmasters prefer it. GBK and UTF-8 encode Chinese characters differently, so the same character occupies a different number of bytes in each.
Each Chinese character occupies 2 bytes in GBK encoding, for example:
    // Assumes the file is saved as GBK
    $zhStr = '你好,中国!'; // "Hello, China!" in Chinese, 6 full-width characters
    echo strlen($zhStr); // output: 12
Each Chinese character occupies 3 bytes in UTF-8 encoding, for example:
    // Assumes the file is saved as UTF-8
    $zhStr = '你好,中国!'; // "Hello, China!" in Chinese, 6 full-width characters
    echo strlen($zhStr); // output: 18
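The two byte counts can be checked from a single file by converting the text between encodings. This is only a sketch: it assumes the file is saved as UTF-8, that the iconv extension is available, and the variable $gbkStr is introduced here purely for illustration.

```php
<?php
// Assumes this file is saved as UTF-8 and the iconv extension is available.
$zhStr = '你好,中国!';                    // 6 characters, all full-width

// Re-encode the same text as GBK so both byte lengths can be compared.
$gbkStr = iconv('UTF-8', 'GBK', $zhStr);

echo strlen($zhStr), "\n";   // 18: 6 characters x 3 bytes in UTF-8
echo strlen($gbkStr), "\n";  // 12: 6 characters x 2 bytes in GBK
```

The text is the same in both cases; only the stored bytes, and therefore the strlen() result, differ.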
So how can we calculate the length of such a Chinese string? One suggestion is to divide the strlen() result by 2 under GBK, or by 3 under UTF-8. But real strings are rarely that tidy: in the vast majority of cases Chinese and English characters are mixed, and the single-byte characters make the division come out wrong.
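To see why the division breaks down, here is a minimal sketch with a hypothetical mixed string, assuming the file is saved as UTF-8:

```php
<?php
// Assumes this file is saved as UTF-8.
$str = 'Hello,中国!';   // 5 ASCII letters + 4 full-width characters = 9 characters

$bytes = strlen($str);   // 5 x 1 + 4 x 3 bytes
echo $bytes, "\n";       // 17
echo $bytes / 3, "\n";   // roughly 5.67, not the real character count of 9
```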
The following code comes from WordPress. The main idea is to use a regular expression to split the string into individual characters and then count them; that count is the length of the string. Note that it can only process UTF-8 encoded strings:
    $zhStr = '你好,中国!';
    $str = 'Hello,中国!';
    // Calculate the length of a Chinese string
    function utf8_strlen($string = null) {
        // Split the string into individual characters
        // (u: treat the input as UTF-8; s: let the dot match newlines too).
        preg_match_all("/./us", $string, $match);
        // Return the number of characters.
        return count($match[0]);
    }
    echo utf8_strlen($zhStr); // output: 6
    echo utf8_strlen($str); // output: 9
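For comparison, the regex approach can be checked against mb_strlen() with an explicit encoding. This is a sketch rather than a recommendation from the original author; it assumes the file is saved as UTF-8 and that the mbstring extension is enabled, and the sample strings are illustrative.

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is enabled.
$samples = ['你好,中国!', 'Hello,中国!', 'Hello'];

foreach ($samples as $s) {
    // Regex approach: one match per UTF-8 character.
    // preg_match_all() returns the number of matches.
    $viaRegex = preg_match_all('/./us', $s, $m);
    // mbstring approach: count characters in the named encoding.
    $viaMb = mb_strlen($s, 'UTF-8');
    echo $s, ': ', $viaRegex, ' / ', $viaMb, "\n";
}
```

For valid UTF-8 input both counts agree (6, 9 and 5 for the three samples).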
Supplement: the difference between counting characters accurately and counting bytes
The code is as follows: |
Copy code |
/Assume that the current page is encoded as GBK <? Php $ Str = "CHINA abc "; Echo strlen ($ str); // returns 7. Because GBK encodes two Chinese bytes, strlen is the length of the returned string. Echo "Echo iconv_strlen ($ str, "GBK"); // return 5. iconv_strlen is the number of characters in the statistical string. ?> Or write it like this. <? Php $ Biaoti = "People's Republic of China "; $ Zijie = strlen ($ biaoti ); Echo $ zijie. "<br>"; // <br> line feed ?> // Assume that the current page is encoded as a UTF-8 <? Php $ Str = "CHINA abc "; Echo strlen ($ str); // return 9, because the UTF-8 encodes three bytes in each Chinese, strlen is the length of the byte returned by the string. Echo "Echo iconv_strlen ($ str, "UTF-8"); // return 5. iconv_strlen is the number of characters in the statistical string ?> Iconv_strlen can calculate the exact number of characters regardless of the encoding. // Assume that the current page is encoded as GBK <? Php $ Str = "CHINA abc "; Echo strlen ($ str); // returns 7. Because GBK encodes two Chinese bytes, strlen is the length of the returned string. Echo "Echo iconv_strlen ($ str, "GBK"); // return 5. iconv_strlen is the number of characters in the statistical string. ?> // Assume that the current page is encoded as a UTF-8 <? Php $ Str = "CHINA abc "; Echo strlen ($ str); // return 9, because the UTF-8 encodes three bytes in each Chinese, strlen is the length of the byte returned by the string. Echo "Echo iconv_strlen ($ str, "UTF-8"); // return 5. iconv_strlen is the number of characters in the statistical string ?>
|
iconv_strlen() returns the exact number of characters in any encoding, as long as the encoding name passed to it matches the string's actual encoding.