Solving garbled output when truncating Chinese strings in PHP
Garbled characters after truncation usually appear when a string mixes Chinese and English text. Truncating pure English text causes no problem, but Chinese text can break, because substr() cuts by bytes rather than by characters: in a UTF-8 string a Chinese character occupies three bytes, while in a GB2312 string it occupies two. If a cut falls in the middle of a character, the leftover bytes display as garbage. Let's look at some examples.
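To make the byte widths concrete, here is a minimal sketch. The character 中 is written as escaped bytes so the example does not depend on the encoding of the source file; the variable names are illustrative only.

```php
<?php
// The character 中 (U+4E2D) in each encoding, written as raw bytes.
$utf8   = "\xE4\xB8\xAD"; // UTF-8: three bytes
$gb2312 = "\xD6\xD0";     // GB2312: two bytes

echo strlen($utf8), "\n";   // 3
echo strlen($gb2312), "\n"; // 2

// substr() counts bytes, so cutting "a中" (UTF-8) after 2 bytes splits the character.
$mixed  = "a" . $utf8;
$broken = substr($mixed, 0, 2);

// preg_match() with the /u modifier fails on invalid UTF-8, which exposes the damage.
var_dump(preg_match('//u', $mixed) === 1);  // bool(true): intact
var_dump(preg_match('//u', $broken) === 1); // bool(false): truncated sequence
```

The second check fails because the cut keeps only the first of the three bytes of 中, which is exactly the kind of leftover that shows up as garbled output.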
When the string is encoded as GB2312, a Chinese character occupies two bytes:
The code is as follows:
// Method of a utility class. $str is the GB2312 string, $start is the start
// position in bytes, and $len is the length to extract in bytes.
public static function chinesesubstr($str, $start, $len) {
    // End position: start plus requested length, but never past the end of the string
    $strlen = min($start + $len, strlen($str));
    $tmpstr = ''; // initialise the result before appending to it
    for ($i = $start; $i < $strlen;) {
        if (ord(substr($str, $i, 1)) > 0xa0) { // a byte value above 0xa0 is the first byte of a Chinese character
            $tmpstr .= substr($str, $i, 2); // take two bytes, i.e. one Chinese character
            $i = $i + 2;                    // advance by two bytes
        } else {
            $tmpstr .= substr($str, $i, 1); // not a Chinese character: take a single byte
            $i++;
        }
    }
    return $tmpstr; // return the truncated string
}
When the string is encoded as UTF-8, a Chinese character occupies three bytes:
The code is as follows:
// Method of a utility class. $str is the UTF-8 string, $start is the start
// position in bytes, and $len is the length to extract in bytes.
public static function chinesesubstr($str, $start, $len) {
    // End position: start plus requested length, but never past the end of the string
    $strlen = min($start + $len, strlen($str));
    $tmpstr = ''; // initialise the result before appending to it
    for ($i = $start; $i < $strlen;) {
        // A first byte of 0xe0 or above starts a multi-byte sequence; this function
        // assumes it is a three-byte Chinese character (first byte 0xe0 to 0xef)
        if (ord(substr($str, $i, 1)) >= 0xe0) {
            $tmpstr .= substr($str, $i, 3); // take three bytes, i.e. one Chinese character
            $i = $i + 3;                    // advance by three bytes
        } else {
            $tmpstr .= substr($str, $i, 1); // not a Chinese character: take a single byte
            $i++;
        }
    }
    return $tmpstr; // return the truncated string
}
The functions above solve the problem, but having to keep track of which encoding is in use is tedious. Here is a single function that handles both UTF-8 and GB2312.
The code is as follows:
/**
 * Chinese string truncation function supporting UTF-8 and GB2312.
 * cut_str(string, truncation length, start position, encoding);
 * The default encoding is UTF-8.
 * The default start position is 0.
 * Length and start position are counted in characters.
 */
function cut_str($string, $sublen, $start = 0, $code = 'UTF-8')
{
    if ($code == 'UTF-8') {
        // Match one complete UTF-8 character (one to four bytes) at a time
        $pa = "/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|\xe0[\xa0-\xbf][\x80-\xbf]|[\xe1-\xef][\x80-\xbf][\x80-\xbf]|\xf0[\x90-\xbf][\x80-\xbf][\x80-\xbf]|[\xf1-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf]/";
        preg_match_all($pa, $string, $t_string);
        // Append an ellipsis only when text remains after the cut
        if (count($t_string[0]) - $start > $sublen) {
            return join('', array_slice($t_string[0], $start, $sublen)) . "...";
        }
        return join('', array_slice($t_string[0], $start, $sublen));
    } else {
        // GB2312: a Chinese character is two bytes, so convert character counts to bytes
        $start = $start * 2;
        $sublen = $sublen * 2;
        $strlen = strlen($string);
        $tmpstr = '';
        for ($i = 0; $i < $strlen; $i++) {
            if ($i >= $start && $i < ($start + $sublen)) {
                if (ord(substr($string, $i, 1)) > 129) {
                    $tmpstr .= substr($string, $i, 2); // double-byte character
                } else {
                    $tmpstr .= substr($string, $i, 1); // single-byte character
                }
            }
            // Skip the second byte of a double-byte character
            if (ord(substr($string, $i, 1)) > 129) $i++;
        }
        if (strlen($tmpstr) < $strlen) $tmpstr .= "...";
        return $tmpstr;
    }
}
...