In PHP, substr() counts bytes rather than characters, so it only cuts cleanly through single-byte text such as English letters. When a string contains Chinese characters, which take two bytes each in GBK and usually three in UTF-8, substr() can stop in the middle of a character and produce garbled output. This article introduces two functions that cut GBK- and UTF-8-encoded strings by character instead.
Example 1
The code is as follows:
function csubStrPro($str, $start, $length, $charset = "UTF-8", $suffix = false)
{
    if (function_exists("mb_substr")) {
        // mbstring is available: let it count characters in $charset
        $slice = mb_substr($str, $start, $length, $charset);
    } else {
        // Fallback: one pattern per charset that matches exactly one character
        $re = array(
            'utf-8'  => "/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xff][\x80-\xbf]{3}/",
            'gb2312' => "/[\x01-\x7f]|[\xb0-\xf7][\xa0-\xfe]/",
            'gbk'    => "/[\x01-\x7f]|[\x81-\xfe][\x40-\xfe]/",
            'big5'   => "/[\x01-\x7f]|[\x81-\xfe]([\x40-\x7e]|[\xa1-\xfe])/",
        );
        // Split $str into whole characters, then take the requested slice
        preg_match_all($re[strtolower($charset)], $str, $match);
        $slice = join("", array_slice($match[0], $start, $length));
    }
    // Optionally mark the result as truncated
    return $suffix ? $slice . "..." : $slice;
}
Example 2
The code is as follows:
function subString_UTF8($str, $start, $length)
{
    $len = strlen($str);
    $r = array();   // characters collected so far
    $n = 0;         // characters skipped before $start
    $m = 0;         // number of characters collected
    for ($i = 0; $i < $len; $i++) {
        // Binary form of the current byte, left-padded to 8 bits
        $a = str_pad(decbin(ord($str[$i])), 8, '0', STR_PAD_LEFT);
        // Byte length of this character, judged from its lead byte
        if ($a[0] === '0') {
            $bytes = 1;                 // 0xxxxxxx: ASCII
        } elseif (substr($a, 0, 3) === '110') {
            $bytes = 2;                 // 110xxxxx
        } elseif (substr($a, 0, 4) === '1110') {
            $bytes = 3;                 // 1110xxxx: most Chinese characters
        } elseif (substr($a, 0, 5) === '11110') {
            $bytes = 4;                 // 11110xxx: e.g. emoji
        } else {
            $bytes = 0;                 // stray continuation or invalid byte
        }
        if ($n < $start) {
            $n++;                       // still skipping up to $start
        } else {
            $r[] = $bytes > 0 ? substr($str, $i, $bytes) : '';
            if (++$m >= $length) {
                break;
            }
        }
        if ($bytes > 1) {
            $i += $bytes - 1;           // jump over the continuation bytes
        }
    }
    return $r;
}
Because subString_UTF8 returns an array of characters rather than a string, join the array back together before displaying it:
echo join('', subString_UTF8($str, $start, $length));
When the result is shown on a page, you can append "..." to the joined string to indicate that it was truncated.
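Appending "..." every time also marks strings that were never shortened. Below is a minimal sketch, assuming UTF-8 input, that appends the suffix only when characters were actually cut off. The name utf8_excerpt() is hypothetical. The function splits characters with preg_split() rather than calling subString_UTF8, so the block runs on its own.

```php
<?php
// Hypothetical helper: shorten $str to at most $length characters and
// append $suffix only when something was actually removed.
// Assumes UTF-8 input; returns '' for malformed UTF-8.
function utf8_excerpt(string $str, int $length, string $suffix = '...'): string
{
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return '';
    }
    if (count($chars) <= $length) {
        return $str;                 // short enough: no suffix needed
    }
    return join('', array_slice($chars, 0, $length)) . $suffix;
}

echo utf8_excerpt("中文字符串截取", 4), "\n"; // 中文字符...
echo utf8_excerpt("中文", 4), "\n";           // 中文
```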