The difference between strlen and mb_strlen in PHP
In PHP, strlen and mb_strlen both return the length of a string, but beginners who have not read the manual may not be clear on how they differ.
The difference between the two is explained in the following example.
<?php
// Save this file as UTF-8, otherwise the results below will differ.
$str = '中文a字1符';
echo strlen($str);              // 14
echo mb_strlen($str, 'UTF-8');  // 6: with UTF-8 selected, each Chinese character counts as 1
echo mb_strlen($str, 'GBK');    // 8
echo mb_strlen($str, 'GB2312'); // 10
Results analysis: strlen counts bytes, and a Chinese character in UTF-8 takes 3 bytes, so the length of "中文a字1符" is 3*4+2=14. With mb_strlen and UTF-8 as the selected encoding, each Chinese character counts as length 1, so the length of "中文a字1符" is 6.
As for the GBK and GB2312 results, I am not sure how they arise; corrections are welcome.
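The 3*4+2=14 arithmetic can be checked character by character. The following is a minimal sketch, assuming the source file is saved as UTF-8; utf8_byte_lengths is a hypothetical helper name, not a built-in function.

```php
<?php
// Hypothetical helper: return the byte length of each character in a UTF-8 string.
function utf8_byte_lengths(string $str): array
{
    // The /u modifier makes PCRE split on UTF-8 characters rather than on bytes.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return []; // preg_split() fails on invalid UTF-8
    }
    $lengths = [];
    foreach ($chars as $ch) {
        $lengths[] = strlen($ch); // strlen() counts bytes
    }
    return $lengths;
}

$lengths = utf8_byte_lengths('中文a字1符');
echo implode('+', $lengths), '=', array_sum($lengths), "\n"; // 3+3+1+3+1+3=14
echo count($lengths), "\n";                                  // 6
```

The sum of the byte lengths is what strlen reports, and the number of entries is what mb_strlen reports for UTF-8.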
The two functions can be combined to calculate the display width of a mixed Chinese-English string (a Chinese character occupies 2 columns, an English character 1):
echo (strlen($str) + mb_strlen($str, 'UTF8')) / 2;
For example, for "中文a字1符" the value of strlen($str) is 14 and the value of mb_strlen($str, 'UTF8') is 6, so the display width of "中文a字1符" is (14+6)/2 = 10.
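The width formula can be wrapped in a small function. This is only a sketch: display_width is a hypothetical name, and it assumes valid UTF-8 input in which every non-ASCII character takes 3 bytes and is double-width, which holds for common Chinese characters but not for 2-byte or 4-byte characters.

```php
<?php
// Hypothetical helper: display width of a mixed Chinese-English UTF-8 string.
// Assumption: every non-ASCII character is a 3-byte, double-width character.
function display_width(string $str): int
{
    $bytes = strlen($str);             // ASCII counts 1, a Chinese character counts 3
    $chars = mb_strlen($str, 'UTF-8'); // every character counts 1
    // ASCII contributes 1+1=2 and a Chinese character 3+1=4; halving gives 1 and 2.
    return intdiv($bytes + $chars, 2);
}

echo display_width('中文a字1符'), "\n"; // 10
echo display_width('abc'), "\n";        // 3
```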
The internal encoding can be obtained with the mb_internal_encoding() function:
echo mb_internal_encoding();
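mb_internal_encoding() can also change the setting when it is given an encoding name. A minimal sketch of reading, changing, and restoring it follows; with_internal_encoding is a hypothetical helper introduced only for illustration.

```php
<?php
// Hypothetical helper: run $fn with a temporary internal encoding, then restore the old one.
function with_internal_encoding(string $encoding, callable $fn)
{
    $previous = mb_internal_encoding(); // no argument: returns the current setting
    mb_internal_encoding($encoding);    // one argument: changes the setting
    try {
        return $fn();
    } finally {
        mb_internal_encoding($previous); // always put the old setting back
    }
}

$len = with_internal_encoding('UTF-8', function () {
    return mb_strlen('中文a字1符'); // no encoding argument, so the internal encoding applies
});
echo $len, "\n"; // 6
```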
PHP's built-in string length function strlen cannot handle Chinese strings correctly: it only returns the number of bytes the string occupies. For Chinese text encoded as GB2312, strlen returns twice the number of Chinese characters; for Chinese text encoded as UTF-8, it returns three times the number (in UTF-8, a Chinese character occupies 3 bytes).
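The two byte counts can be compared by re-encoding the same text. The sketch below assumes the mbstring extension is available, the source file is saved as UTF-8, and every character in the input exists in GB2312; byte_counts is a hypothetical helper name.

```php
<?php
// Hypothetical helper: byte counts of the same text in UTF-8 and in GB2312.
function byte_counts(string $utf8): array
{
    // Re-encode the UTF-8 input as GB2312 (2 bytes per Chinese character).
    $gb = mb_convert_encoding($utf8, 'GB2312', 'UTF-8');
    return ['UTF-8' => strlen($utf8), 'GB2312' => strlen($gb)];
}

$counts = byte_counts('中文字符');  // four Chinese characters
echo $counts['UTF-8'], "\n";  // 12 = 4 characters * 3 bytes
echo $counts['GB2312'], "\n"; // 8  = 4 characters * 2 bytes
```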
The mb_strlen function solves this problem. It is used like strlen, except that it takes an optional second parameter specifying the character encoding. For example, to get the length of a UTF-8 string $str, use mb_strlen($str, 'UTF-8'). If the second argument is omitted, PHP's internal encoding is used.
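To see the effect of the optional second parameter, the sketch below temporarily switches the internal encoding to a single-byte one and compares the two call forms. compare_lengths is a hypothetical helper, and ISO-8859-1 is chosen only for contrast.

```php
<?php
// Hypothetical helper: length of $str with and without the encoding argument,
// measured while the internal encoding is temporarily set to $internal.
function compare_lengths(string $str, string $internal): array
{
    $previous = mb_internal_encoding();
    mb_internal_encoding($internal);
    $result = [
        'omitted'  => mb_strlen($str),          // falls back to the internal encoding
        'explicit' => mb_strlen($str, 'UTF-8'), // ignores the internal encoding
    ];
    mb_internal_encoding($previous);            // restore the original setting
    return $result;
}

print_r(compare_lengths('中文a字1符', 'ISO-8859-1'));
// omitted => 14 (one byte per character), explicit => 6
```

Passing the encoding explicitly keeps the result independent of how the server happens to be configured.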
Note that mb_strlen is not a PHP core function; it is provided by the mbstring extension. On Windows, make sure php_mbstring.dll is loaded in php.ini before using it, that is, make sure the line "extension=php_mbstring.dll" is present and not commented out. Otherwise, calling the function fails with an undefined function error.
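A script can check for the extension at run time instead of failing. The following is a sketch under stated assumptions: utf8_length and utf8_length_fallback are hypothetical names, and the fallback is only correct for valid UTF-8 input.

```php
<?php
// Hypothetical fallback: count UTF-8 characters without mbstring by counting
// every byte that is not a continuation byte (0x80-0xBF).
function utf8_length_fallback(string $str): int
{
    return (int) preg_match_all('/[^\x80-\xBF]/', $str);
}

// Hypothetical wrapper: use mb_strlen() when mbstring is loaded, else the fallback.
function utf8_length(string $str): int
{
    if (extension_loaded('mbstring')) {
        return mb_strlen($str, 'UTF-8');
    }
    return utf8_length_fallback($str);
}

echo utf8_length('中文a字1符'), "\n";          // 6
echo utf8_length_fallback('中文a字1符'), "\n"; // 6
```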
About mb_internal_encoding(): the internal encoding is the default text encoding PHP assumes when it handles wide characters.
A wide character is a character that requires a multibyte representation. mb_internal_encoding() returns the default encoding setting of the mb extension. The mb (mbstring) extension is a PHP function library for handling wide characters (for example Chinese, Japanese, and Korean).
Functions such as mb_strlen() and mb_substr() use the internal encoding (mb_internal_encoding()) when no encoding argument is given, whereas strlen() and substr() always operate on bytes.
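The same byte versus character distinction applies when slicing strings: substr() counts bytes, while mb_substr() counts characters in the given encoding. A short sketch, assuming the source file is saved as UTF-8:

```php
<?php
$str = '中文a字1符';

echo substr($str, 0, 3), "\n";             // 中    (3 bytes = 1 Chinese character)
echo mb_substr($str, 0, 3, 'UTF-8'), "\n"; // 中文a (3 characters)

// Cutting at a byte offset inside a character leaves a broken sequence:
echo bin2hex(substr($str, 0, 4)), "\n";    // e4b8ade6 (中 plus only the first byte of 文)
```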