The strlen() and mb_strlen() functions
In PHP, the strlen() function returns the length of a string. The function prototype is as follows:
int strlen(string string_input);
The parameter string_input is the string to be processed.
The strlen() function returns the length of a string in bytes. English letters, digits, and ASCII symbols each occupy one byte, so each counts as 1. In a double-byte encoding such as GB2312, a Chinese character occupies two bytes, so its length is 2. For example:
<?php
echo strlen("www.sunchis.com");
echo strlen("三知开发网");
?>
echo strlen("www.sunchis.com"); outputs: 15
echo strlen("三知开发网"); outputs: 15
This raises a question: doesn't a Chinese character occupy 2 bytes? The second string is clearly only five Chinese characters, so how can the result be 15?
Here's why: strlen() counts bytes, and in UTF-8 each Chinese character occupies 3 bytes, so it is counted as length 3. How, then, can the length of a string that mixes Chinese and English be calculated accurately? This is where another function, mb_strlen(), comes in. mb_strlen() is used in almost the same way as strlen(), except that it takes one additional argument specifying the character encoding. The function prototype is:
int mb_strlen(string string_input, string encode);
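As a quick illustration of the prototype, here is a minimal sketch that calls both functions on the same strings. It assumes the file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only for illustration.

```php
<?php
// Assumption: this source file is saved as UTF-8 and mbstring is available.
$chinese = "三知开发网";        // five Chinese characters
$ascii   = "www.sunchis.com";   // fifteen ASCII characters

echo strlen($chinese), "\n";             // byte count: 15 (5 characters x 3 bytes)
echo mb_strlen($chinese, "UTF-8"), "\n"; // character count: 5
echo strlen($ascii), "\n";               // 15
echo mb_strlen($ascii, "UTF-8"), "\n";   // 15, since ASCII is one byte per character
```

For pure ASCII text the two functions agree; they diverge only once multi-byte characters appear.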
PHP's built-in strlen() does not handle Chinese strings correctly; it returns only the number of bytes in the string. For GB2312-encoded Chinese, strlen() returns twice the number of Chinese characters; for UTF-8-encoded Chinese, it returns three times the number (in UTF-8, a Chinese character occupies 3 bytes). The following code therefore computes the length of a mixed Chinese and English string, counting each Chinese character as 2 and each English letter as 1:
<?php
$str = "三知sunchis开发网";
echo strlen($str) . "<br>";             // Result: 22
echo mb_strlen($str, "UTF-8") . "<br>"; // Result: 12
$strlen = (strlen($str) + mb_strlen($str, "UTF-8")) / 2;
echo $strlen;                           // Result: 17
?>
Principle analysis:
strlen() counts each UTF-8 Chinese character as 3, so the length of the example string (five Chinese characters plus the seven letters of "sunchis") is 5×3 + 7×1 = 22.
mb_strlen() with the encoding set to UTF-8 counts each Chinese character as 1, so the length of the same string is 5×1 + 7×1 = 12.
Averaging the two gives (22 + 12) / 2 = 17, which equals 5×2 + 7×1: each Chinese character counts as 2 and each English letter as 1.
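The averaging step can be wrapped in a small helper. The sketch below is only an illustration: the name mixed_length is hypothetical (it is not part of PHP or mbstring), and the formula only holds when every non-ASCII character takes exactly 3 bytes in UTF-8, as common Chinese characters do. As a cross-check, it also converts the string to EUC-CN, which is mbstring's name for the GB2312 encoding, and counts the bytes.

```php
<?php
// Hypothetical helper: counts each 3-byte UTF-8 character as 2 and each ASCII character as 1.
// Assumption: $str is valid UTF-8 containing only ASCII and 3-byte characters.
function mixed_length(string $str): int
{
    $bytes = strlen($str);              // 3 per Chinese character, 1 per ASCII character
    $chars = mb_strlen($str, "UTF-8");  // 1 per character
    return intdiv($bytes + $chars, 2);  // (3c + a + c + a) / 2 = 2c + a
}

$str = "三知sunchis开发网";
echo mixed_length($str), "\n";                                    // 17
echo strlen(mb_convert_encoding($str, "EUC-CN", "UTF-8")), "\n";  // 17 bytes in GB2312
```

For text containing 2-byte or 4-byte UTF-8 characters (accented Latin letters or emoji, for example) the sum is no longer guaranteed to be even, so this helper should not be used there.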
Note: In mb_strlen($str, 'UTF-8'), if the second argument is omitted, PHP's internal encoding is used. The internal encoding can be obtained with the mb_internal_encoding() function. Also note that mb_strlen() is not a PHP core function: before using it, make sure the mbstring extension is loaded, that is, that the line "extension=php_mbstring.dll" exists in php.ini and is not commented out. Otherwise, calls to it fail with an undefined function error.
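One way to guard against the missing-extension case is to check for mb_strlen() at run time. The following is a sketch, not a definitive implementation: the helper name utf8_char_count is hypothetical, and the fallback assumes the input is well-formed UTF-8.

```php
<?php
// Hypothetical helper: count UTF-8 characters even when mbstring is not loaded.
function utf8_char_count(string $str): int
{
    if (function_exists("mb_strlen")) {
        return mb_strlen($str, "UTF-8");
    }
    // Fallback: in well-formed UTF-8, every character has exactly one byte
    // that is not a continuation byte (continuation bytes look like 10xxxxxx).
    $count = 0;
    $bytes = strlen($str);
    for ($i = 0; $i < $bytes; $i++) {
        if ((ord($str[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}

if (extension_loaded("mbstring")) {
    // The encoding mb_strlen() uses when its second argument is omitted.
    echo mb_internal_encoding(), "\n";
}
echo utf8_char_count("三知sunchis开发网"), "\n"; // 12
```

Passing the encoding explicitly, as the helper does, avoids depending on whatever internal encoding the server happens to be configured with.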