This article describes how to use PHP's string truncation function substr on strings that contain Chinese characters, and how to avoid garbled output.

The PHP string truncation function substr:

string substr ( string $string , int $start [, int $length ] )

It returns the part of $string that starts at position $start and is at most $length bytes long.

The substr function counts in bytes. A Chinese character occupies two bytes in GB2312 and three bytes in UTF-8. If the cut at the specified length falls in the middle of a Chinese character, the returned string ends with an incomplete character and is displayed as garbled text. Two solutions are given below for reference.

1. Use the mb_substr function

string mb_substr ( string $str , int $start [, int $length [, string $encoding ]] )

mb_substr works like substr, except that $start and $length are counted in characters rather than bytes. A multibyte character is therefore never split, and no garbled characters appear.

Disadvantage: the length is counted in characters instead of bytes. When the result is used for display, Chinese and English results of the same length differ greatly in display width.

2. Enhance substr with a custom function

The function counts each Chinese character as two units of length, so the display width of the truncated result is close for Chinese and English text. An incomplete trailing character is discarded, which ensures that no garbled characters are displayed. The function supports both UTF-8 and GB2312, the encodings commonly used for Chinese, so it is broadly applicable. The complete code is as follows (it uses the strtolower function):
$ Length) {// truncation character $ wordscut = ''; if (strtolower ($ encoding) = 'utf-8') {// utf8 encoding $ n = 0; $ tn = 0; $ noc = 0; while ($ n <strlen ($ string) {$ t = ord ($ string [$ n]); if ($ t = 9 | $ t = 10 | (32 <= $ t & $ t <= 126) {$ tn = 1; $ n ++; $ noc ++;} elseif (194 <=$ t & $ t <= 223) {$ tn = 2; $ n + = 2; $ noc + = 2;} elseif (224 <= $ t & $ t <239) {$ tn = 3; $ n + = 3; $ noc + = 2 ;} elseif (240 <= $ t & $ t <= 247) {$ tn = 4; $ n + = 4; $ noc + = 2 ;} elseif (248 <= $ t & $ t <= 251) {$ tn = 5; $ n + = 5; $ noc + = 2 ;} elseif ($ t = 252 | $ t = 253) {$ tn = 6; $ n + = 6; $ noc + = 2 ;} else {$ n ++;} if ($ noc >=$ length) {break;} if ($ noc >$ length) {$ n-= $ tn ;} $ wordscut = substr ($ string, 0, $ n);} else {for ($ I = 0; $ I <$ length-1; $ I ++) {if (ord ($ string [$ I])> 127) {$ wordscut. = $ string [$ I]. $ string [$ I + 1]; $ I ++;} else {$ wordscut. = $ string [$ I] ;}}$ string = $ wordscut;} return trim ($ string) ;}// example echo getstr ("0 February 5, 1234 ", 1 ).' '; // 0 echo getstr ("0 February 5, 1234", 2 ).' '; // 0 echo getstr ("0 February 5, 1234", 3 ).' '; // 0 echo getstr ("0 February 5, 1234", 4 ).' '; // 0 echo getstr ("0 February 5, 1234", 5 ).' '; // 0 one or two echo getstr ("0 one a two B three Four Six Seven", 1 ).' '; // 0 echo getstr ("0 a Two B three Four Six Seven", 2 ).' '; // 0 echo getstr ("0 a Two B three Four Six Seven", 3 ).' '; // 0 echo getstr ("0 a Two B three Four Six Seven", 4 ).' '; // 0 aecho getstr ("0 a Two B three Four Six Seven", 5 ).' '; // 0 a // This function is modified by the getstr () function in UCHome 1.5.?> |
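
For comparison, the first solution and its disadvantage can be illustrated with a short sketch. It assumes that the mbstring extension is loaded and that the file is saved as UTF-8. The sample string and the variable names are illustrative and do not come from the original article.

```php
<?php
// Assumption: mbstring is available and this source file is UTF-8 encoded.
$text = "0一二三"; // 1 ASCII byte + three 3-byte Chinese characters = 10 bytes

// substr() counts bytes: 3 bytes = "0" plus the first two bytes of "一".
$byteCut = substr($text, 0, 3);

// mb_substr() counts characters: 3 characters = "0一二".
$charCut = mb_substr($text, 0, 3, 'UTF-8');

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false): a character was split
var_dump($charCut === "0一二");                  // bool(true)

// The disadvantage: the same character count gives a different display width.
var_dump(mb_strwidth(mb_substr("一二三四", 0, 3, 'UTF-8'), 'UTF-8')); // int(6)
var_dump(mb_strwidth(mb_substr("abcd", 0, 3, 'UTF-8'), 'UTF-8'));     // int(3)
```

The byte-based cut leaves an invalid UTF-8 sequence at the end of the string, which is what shows up as garbled characters. The character-based cut is always valid, but three Chinese characters take about twice the width of three Latin letters, which is the mismatch the custom function tries to even out.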