This article summarizes several functions for truncating strings, all built on PHP's substr function. Calling substr directly on text that contains Chinese characters can cut a character in half and produce garbled output. The functions below truncate Chinese and other multibyte encodings without that problem.
The code is as follows:

<?php
/**
 * String truncation, supporting Chinese and other encodings
 *
 * @static
 * @access public
 * @param string $str     string to be truncated
 * @param int    $start   start position, counted in characters
 * @param int    $length  truncation length, counted in characters
 * @param string $charset encoding format (utf-8, gb2312, gbk or big5)
 * @param bool   $suffix  whether to append "..." to the truncated string
 * @return string
 */
function msubstr($str, $start, $length, $charset = "utf-8", $suffix = true)
{
    // Prefer the multibyte extensions when they are available.
    // Note that these two branches return without appending the suffix.
    if (function_exists("mb_substr")) {
        return mb_substr($str, $start, $length, $charset);
    } elseif (function_exists("iconv_substr")) {
        return iconv_substr($str, $start, $length, $charset);
    }
    // Fallback: split the string into whole characters with a per-encoding regex.
    $re['utf-8']  = "/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xff][\x80-\xbf]{3}/";
    $re['gb2312'] = "/[\x01-\x7f]|[\xb0-\xf7][\xa0-\xfe]/";
    $re['gbk']    = "/[\x01-\x7f]|[\x81-\xfe][\x40-\xfe]/";
    $re['big5']   = "/[\x01-\x7f]|[\x81-\xfe]([\x40-\x7e]|[\xa1-\xfe])/";
    // The keys above are lowercase, so normalize the charset name before the lookup.
    preg_match_all($re[strtolower($charset)], $str, $match);
    $slice = join("", array_slice($match[0], $start, $length));
    if ($suffix) return $slice . "...";
    return $slice;
}
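The regex fallback is easier to follow in isolation. The sketch below is a minimal illustration restricted to UTF-8. The helper name utf8_slice is hypothetical (it is not part of the original code), and the file is assumed to be saved as UTF-8. It splits the string into whole characters with preg_match_all and then slices the resulting array, so offsets are counted in characters rather than bytes.

```php
<?php
// Hypothetical helper illustrating the regex fallback, for UTF-8 only.
// Each alternative in the pattern matches one complete character:
// 1 byte (ASCII), 2 bytes, 3 bytes (most Chinese characters) or 4 bytes.
function utf8_slice($str, $start, $length)
{
    $re = '/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xff][\x80-\xbf]{3}/';
    // $match[0] becomes an array with one entry per character.
    preg_match_all($re, $str, $match);
    // Slice by character index, then glue the characters back together.
    return join("", array_slice($match[0], $start, $length));
}

echo utf8_slice("这是1个字符串", 0, 5), "\n"; // 这是1个字
echo utf8_slice("这是1个字符串", 2, 3), "\n"; // 1个字
```

Because the array holds whole characters, no slice can end in the middle of one.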
If we call PHP's substr directly on a string that mixes English and Chinese characters, the following problem can occur. The byte counts below assume a double-byte encoding such as GBK, in which each Chinese character occupies 2 bytes.
Suppose we have the string
$str = "这是一个字符串";
To keep the first 10 bytes of the string, use
if (strlen($str) > 10) $str = substr($str, 0, 10) . "...";
echo $str then outputs "这是一个字...".
Now suppose
$str = "这是1个字符串";
This string contains one half-width character, "1". Run the same truncation:
if (strlen($str) > 10) $str = substr($str, 0, 10);
The 10th and 11th bytes of the original $str together form the Chinese character "符".
The cut falls between those two bytes, so the character is split in half and the truncated string ends in a garbled character.
Truncating with the msubstr function above instead of substr avoids this problem.
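The half-character problem can be reproduced directly. The sketch below assumes the file is saved as UTF-8, where each Chinese character in the sample string is 3 bytes, so the byte offsets differ from a double-byte encoding. The helper is_valid_utf8 is a hypothetical name. It relies on preg_match returning false when the /u modifier is given malformed UTF-8.

```php
<?php
// Assumption: this file is saved as UTF-8.
$str = "这是1个字符串";   // 6 Chinese characters (3 bytes each) + "1" = 19 bytes

// Hypothetical helper: true when $s is well-formed UTF-8.
function is_valid_utf8($s)
{
    // With /u, preg_match returns false (not 0) on malformed UTF-8 input.
    return preg_match('//u', $s) === 1;
}

$whole = substr($str, 0, 7);  // 3 + 3 + 1 bytes: ends on a character boundary
$split = substr($str, 0, 8);  // one byte into the next character

var_dump(is_valid_utf8($whole)); // bool(true)
var_dump(is_valid_utf8($split)); // bool(false)
```

A check like this can serve as a quick test that a truncation routine never returns a broken string.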
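Here is another, simpler way to truncate Chinese strings. It is an alternative to the function above, not an addition to it, and it assumes a double-byte encoding such as GB2312.

The code is as follows:

// $start and $len are byte offsets. A double-byte character that begins
// inside the requested range is always copied whole.
function msubstr($str, $start, $len)
{
    $tmpstr = "";
    $strlen = min($start + $len, strlen($str));
    for ($i = 0; $i < $strlen; $i++) {
        if (ord(substr($str, $i, 1)) > 0xa0) {
            // Lead byte of a double-byte character: copy both bytes together.
            if ($i >= $start) $tmpstr .= substr($str, $i, 2);
            $i++;
        } else {
            if ($i >= $start) $tmpstr .= substr($str, $i, 1);
        }
    }
    return $tmpstr;
}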
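To make the byte walk concrete, the sketch below compares a plain substr cut with a boundary-aware cut on a double-byte string. The helper name gb_prefix is hypothetical. The byte string is intended to be the GBK encoding of 这是1个字符串, which is an assumption. Only the byte pattern matters for the demonstration: two-byte pairs whose lead byte is above 0xa0, with one ASCII byte in between.

```php
<?php
// Hypothetical helper: longest prefix of at most $maxBytes bytes that does
// not end in the middle of a double-byte character.
// Assumption: a GB2312-style encoding where every byte above 0xa0 starts a
// two-byte character and every other byte stands alone.
function gb_prefix($str, $maxBytes)
{
    $i = 0;
    $n = strlen($str);
    while ($i < $n) {
        $step = (ord($str[$i]) > 0xa0) ? 2 : 1;
        if ($i + $step > $maxBytes) break; // next character would not fit
        $i += $step;
    }
    return substr($str, 0, $i);
}

// Five byte pairs, then "1" after the second pair: 13 bytes in total.
$str = "\xd5\xe2\xca\xc7" . "1" . "\xb8\xf6\xd7\xd6\xb7\xfb\xb4\xae";

echo bin2hex(substr($str, 0, 10)), "\n"; // d5e2cac731b8f6d7d6b7 (ends on a lone lead byte)
echo bin2hex(gb_prefix($str, 10)), "\n"; // d5e2cac731b8f6d7d6   (stops before it)
```

Unlike the function above, this variant drops a character that does not fit rather than running past the requested length. Which behavior is preferable depends on whether the limit is a hard one.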
Program 2: truncating a UTF-8 string in PHP without cutting a character in half
/******************************************************************
 * Truncates a UTF-8 string without splitting a character in half.
 * English letters and digits (half-width) take 1 byte (8 bits); most Chinese (full-width) characters take 3 bytes.
 * @param  $str source string
 * @param  $len length of the substring taken from the left
 * @return the truncated string; when $len is less than or equal to 0, the entire string is returned
 ******************************************************************/
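The code is as follows:

function utf_substr($str, $len)
{
    // As documented above, a non-positive length returns the whole string.
    if ($len <= 0) return $str;
    $new_str = array();
    // $len is a display width: an ASCII character counts as 1, any other character as 2.
    for ($i = 0; $i < $len && strlen($str) > 0; $i++) {
        $temp_str = substr($str, 0, 1);
        if (ord($temp_str) > 127) {
            $i++;
            if ($i < $len) {
                // Assumes a 3-byte sequence, which holds for common Chinese characters.
                $new_str[] = substr($str, 0, 3);
                $str = substr($str, 3);
            }
        } else {
            $new_str[] = substr($str, 0, 1);
            $str = substr($str, 1);
        }
    }
    return join("", $new_str);
}
?>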
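The fixed 3-byte step in utf_substr only holds for characters encoded in three bytes. The sketch below is one possible generalization, not the original author's method: it reads the sequence length from the lead byte, so 2-byte and 4-byte characters are also kept whole. Both helper names, utf8_seq_len and utf8_width_cut, are hypothetical, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: byte length of the UTF-8 sequence starting with $byte.
function utf8_seq_len($byte)
{
    $o = ord($byte);
    if ($o < 0x80) return 1;   // ASCII
    if ($o >= 0xf0) return 4;  // 4-byte sequences
    if ($o >= 0xe0) return 3;  // most Chinese characters
    if ($o >= 0xc0) return 2;  // e.g. accented Latin letters
    return 1;                  // stray continuation byte, treated as one byte
}

// Hypothetical variant of utf_substr: $width is a display width in which an
// ASCII character counts as 1 and any multibyte character counts as 2.
function utf8_width_cut($str, $width)
{
    $out = "";
    $used = 0;
    $i = 0;
    $n = strlen($str);
    while ($i < $n) {
        $len = utf8_seq_len($str[$i]);
        $cost = ($len === 1) ? 1 : 2;
        if ($used + $cost > $width) break; // next character would not fit
        $out .= substr($str, $i, $len);
        $used += $cost;
        $i += $len;
    }
    return $out;
}

echo utf8_width_cut("这是1个字符串", 6), "\n"; // 这是1
echo utf8_width_cut("é这a", 4), "\n";          // é这
```

Counting a multibyte character as width 2 mirrors the i++ step in utf_substr and is only an approximation of display width.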