It is often said that checking a string's length in PHP only takes strlen() and mb_strlen(). It is not quite that simple. When a string mixes Chinese and English text, these functions fall short. This article walks through them and some alternatives.
strlen()
PHP strlen() function
Definition and usage
The strlen() function returns the length of a string in bytes, not in characters.
Syntax
strlen(string)
Parameter: string
Description: Required. Specifies the string to be checked.
The code is as follows:

```php
<?php
// The file is saved as UTF-8
$str = '中文a字1字';
echo strlen($str);
echo ' ';
echo mb_strlen($str, 'UTF-8');
// Output:
// 14 6
?>
```
Result analysis: strlen() counts bytes, and in UTF-8 each of these Chinese characters takes 3 bytes. The string '中文a字1字' has four Chinese characters plus 'a' and '1', so its length is 3*4 + 2 = 14.
mb_strlen() counts characters. When UTF-8 is the declared encoding, each Chinese character counts as 1, so the length of '中文a字1字' is 6.
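The sketch below shows where the 14 comes from. It splits the string into characters and prints the byte length of each one. It assumes the source file is saved as UTF-8 and that you are running PHP 7.4 or later with mbstring enabled, because mb_str_split() was added in 7.4. The variable names are illustrative.

```php
<?php
// Show how many bytes each character of a UTF-8 string occupies.
// Assumes PHP 7.4+ (for mb_str_split) and a UTF-8 source file.
$str = '中文a字1字';

$widths = [];
foreach (mb_str_split($str, 1, 'UTF-8') as $ch) {
    $widths[] = strlen($ch); // byte length of this single character
}

echo implode(' + ', $widths), ' = ', array_sum($widths), "\n"; // 3 + 3 + 1 + 3 + 1 + 3 = 14
echo count($widths), "\n";                                      // 6 characters
```

The byte total is what strlen() reports. The number of entries is what mb_strlen() reports.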
mb_strlen() function
Note that mb_strlen() is not a PHP core function. It belongs to the mbstring extension, so make sure the extension is loaded before you use it. On Windows, php.ini must contain the line "extension=php_mbstring.dll", and that line must not be commented out. Otherwise PHP reports a call to an undefined function.
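Some scripts have to run on servers where mbstring may be missing. A guard like the one below avoids the undefined-function error there. This is a minimal sketch. utf8_length() is a hypothetical helper name. The fallback assumes the input is valid UTF-8 and counts every byte that is not a UTF-8 continuation byte (0x80 to 0xBF).

```php
<?php
// Count characters, using mbstring when it is available.
// utf8_length() is a hypothetical helper; the fallback assumes valid UTF-8 input.
function utf8_length(string $str): int
{
    if (extension_loaded('mbstring')) {
        return mb_strlen($str, 'UTF-8');
    }
    // Every byte outside 0x80-0xBF begins a new UTF-8 character,
    // so counting those bytes counts the characters
    return preg_match_all('/[^\x80-\xBF]/', $str);
}

echo utf8_length('中文a字1字'); // 6
```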
The code is as follows:

```php
<?php
$str = '中文a字1字';
// Display width: (byte count + character count) / 2
echo (strlen($str) + mb_strlen($str, 'UTF-8')) / 2;
// Output: 10
?>
```
For '中文a字1字', strlen($str) is 14 and mb_strlen($str) is 6. Each Chinese character adds 3 + 1 = 4 to the sum, which halves to 2 columns. Each English character adds 1 + 1 = 2, which halves to 1 column. The display width of the string is therefore (14 + 6) / 2 = 10.
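The (strlen + mb_strlen) / 2 trick only works because every Chinese character here is exactly 3 bytes in UTF-8. mbstring also offers mb_strwidth(), which measures display width directly from the characters: full-width East Asian characters count as 2 columns and ASCII characters as 1. The comparison below assumes a UTF-8 source file.

```php
<?php
// Compare the byte/character formula with mb_strwidth().
// Assumes the source file is saved as UTF-8.
$str = '中文a字1字';

$byFormula = (strlen($str) + mb_strlen($str, 'UTF-8')) / 2; // (14 + 6) / 2
$byWidth   = mb_strwidth($str, 'UTF-8');                    // 4 * 2 + 2 * 1

echo $byFormula, ' ', $byWidth; // 10 10
```

Characters of other byte lengths, such as 4-byte UTF-8 characters, make the formula produce a fraction. mb_strwidth() does not have that problem.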
The difference between the two functions
The code is as follows:

```php
<?php
// During the test, the file is saved as UTF-8
$str = '中文a字1字';
echo strlen($str) . ' ';               // 14
echo mb_strlen($str, 'UTF-8') . ' ';   // 6
echo mb_strlen($str, 'GBK') . ' ';     // 8
echo mb_strlen($str, 'GB2312') . ' ';  // 10
?>
```
Result analysis: strlen() counts 14 bytes, as before. mb_strlen() with UTF-8 returns 6 because each Chinese character counts as 1. With 'GBK' or 'GB2312', mb_strlen() reads the same UTF-8 bytes by the rules of the other encoding and splits them at the wrong character boundaries, which is why it returns 8 and 10. Those numbers are meaningless. The encoding passed to mb_strlen() must match the real encoding of the string.
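The fix for the 8 and 10 results is to pass the encoding the bytes are actually in. The sketch below assumes a UTF-8 source file. It converts the string to GBK with mb_convert_encoding(), after which mb_strlen(..., 'GBK') agrees with the UTF-8 count. strlen() drops to 10 bytes because GBK stores each of these Chinese characters in 2 bytes. mb_check_encoding() can confirm what the input really is before counting.

```php
<?php
// mb_strlen() is only meaningful when the declared encoding matches the bytes.
// Assumes the source file is saved as UTF-8.
$utf8 = '中文a字1字';
$gbk  = mb_convert_encoding($utf8, 'GBK', 'UTF-8');

echo strlen($gbk), "\n";           // 10: 2 bytes per Chinese character in GBK
echo mb_strlen($gbk, 'GBK'), "\n"; // 6
var_dump(mb_check_encoding($utf8, 'UTF-8')); // bool(true)
```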
These functions handle some mixed Chinese and English cases, but they are not always practical. Next I will introduce some other useful methods.
The PHP code for obtaining the length of a mix of Chinese and English strings is as follows: 1 Chinese = 1 bit, 2 English = 1 bit, you can modify it yourself
The Code is as follows: |
|
/** * PHP obtains the mixed length of strings in Chinese and English. * @ Param $ str string * @ Param $ charset string Encoding * @ Return returns the length of 1 Chinese = 1 bits, 2 English = 1 bits */ Function strLength ($ str, $ charset = 'utf-8 '){ If ($ charset = 'utf-8') $ str = iconv ('utf-8', 'gb2312 ', $ str ); $ Num = strlen ($ str ); $ CnNum = 0; For ($ I = 0; $ I <$ num; $ I ++ ){ If (ord (substr ($ str, $ I + 127)> ){ $ CnNum ++; $ I ++; } } $ EnNum = $ num-($ cnNum * 2 ); $ Number = ($ enNum/2) + $ cnNum; Return ceil ($ number ); } // The length of the test output is 15. $ Str1 = 'test test test '; $ Str2 = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa '; $ Str3 = 'aa test aa test aaaaa '; Echo strLength ($ str1, 'gb2312 '); Echo strLength ($ str2, 'gb2312 '); Echo strLength ($ str3, 'gb2312 '); |
String truncation functions
UTF-8 encoding: in UTF-8, a Chinese character occupies three bytes.
The Code is as follows: |
|
Function msubstr ($ str, $ start, $ len ){ $ Tmpstr = ""; $ Strlen = $ start + $ len; For ($ I = 0; $ I <$ strlen; $ I ++ ){ If (ord (substr ($ str, $ I, 1) & gt; 127 ){ $ Tmpstr. = substr ($ str, $ I, 3 ); $ I + = 2; } Else $ Tmpstr. = substr ($ str, $ I, 1 ); } Return $ tmpstr; } Echo msubstr (" english ); |
GB2312 encoding: in GB2312, a Chinese character occupies 2 bytes.
The code is as follows:

```php
<?php
// Truncate a GB2312-encoded string by character position.
// In GB2312 a Chinese character is 2 bytes, and its first byte is above 0xA0.
function msubstr($str, $start, $len)
{
    $tmpstr = '';
    $bytes  = strlen($str);
    for ($i = 0, $char = 0; $i < $bytes && $char < $start + $len; $char++) {
        $w = ord($str[$i]) > 0xA0 ? 2 : 1; // 2 bytes for Chinese, 1 for ASCII
        if ($char >= $start) {
            $tmpstr .= substr($str, $i, $w);
        }
        $i += $w;
    }
    return $tmpstr;
}
?>
```
A function with good encoding compatibility
The code is as follows:

```php
<?php
// Substring by character that works across encodings.
// Prefers mbstring, then iconv, and falls back to regular expressions.
function cc_msubstr($str, $start, $length, $charset = 'utf-8', $suffix = true)
{
    if (function_exists('mb_substr')) {
        $slice = mb_substr($str, $start, $length, $charset);
    } elseif (function_exists('iconv_substr')) {
        $slice = iconv_substr($str, $start, $length, $charset);
    } else {
        // One pattern per encoding; each match is one whole character
        $re['utf-8']  = '/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xff][\x80-\xbf]{3}/';
        $re['gb2312'] = '/[\x01-\x7f]|[\xb0-\xf7][\xa0-\xfe]/';
        $re['gbk']    = '/[\x01-\x7f]|[\x81-\xfe][\x40-\xfe]/';
        $re['big5']   = '/[\x01-\x7f]|[\x81-\xfe]([\x40-\x7e]|[\xa1-\xfe])/';
        preg_match_all($re[strtolower($charset)], $str, $match);
        $slice = join('', array_slice($match[0], $start, $length));
    }
    // Append an ellipsis only when part of the string was cut off
    if ($suffix && $slice !== $str) {
        return $slice . '...';
    }
    return $slice;
}
?>
```