Today I ran into a problem truncating strings that mix Chinese and English. In GBK, each Chinese character occupies two bytes. If the string is entirely Chinese, substr() works as long as you cut at an even byte offset. If Chinese and English are mixed, a byte-based cut can land in the middle of a character and produce garbage. I found a good function among my saved code snippets that handles the truncation correctly:
function get_word($string, $length, $dot = '..', $charset = 'gbk') {
    // Short enough already: return the string unchanged.
    if (strlen($string) <= $length) {
        return $string;
    }
    // Decode HTML entities first so that each one counts as a single character.
    $string = str_replace(array('&amp;', '&quot;', '&lt;', '&gt;'), array('&', '"', '<', '>'), $string);
    $strcut = '';
    if (strtolower($charset) == 'utf-8') {
        // $n: byte offset, $tn: byte length of the last character read,
        // $noc: width counted so far (ASCII counts 1, multibyte counts 2).
        $n = $tn = $noc = 0;
        while ($n < strlen($string)) {
            $t = ord($string[$n]);
            if ($t == 9 || $t == 10 || (32 <= $t && $t <= 126)) {
                $tn = 1; $n++; $noc++;
            } elseif (194 <= $t && $t <= 223) {
                $tn = 2; $n += 2; $noc += 2;
            } elseif (224 <= $t && $t <= 239) {
                // 239 (0xEF) is also a valid 3-byte lead byte, so the bound is inclusive.
                $tn = 3; $n += 3; $noc += 2;
            } elseif (240 <= $t && $t <= 247) {
                $tn = 4; $n += 4; $noc += 2;
            } elseif (248 <= $t && $t <= 251) {
                $tn = 5; $n += 5; $noc += 2;
            } elseif ($t == 252 || $t == 253) {
                $tn = 6; $n += 6; $noc += 2;
            } else {
                $n++;
            }
            if ($noc >= $length) {
                break;
            }
        }
        // If the last character overshot the limit, step back over it.
        if ($noc > $length) {
            $n -= $tn;
        }
        $strcut = substr($string, 0, $n);
    } else {
        // GBK: a byte above 127 starts a two-byte character, so copy both bytes together.
        for ($i = 0; $i < $length; $i++) {
            $strcut .= ord($string[$i]) > 127 ? $string[$i] . $string[++$i] : $string[$i];
        }
    }
    return $strcut . $dot;
}

$str = "welcome visit concise bkjia";
$str_result = get_word($str, 12);
echo $str_result;
Test result:
welcome visi..
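
For comparison, here is a minimal sketch of the same idea using PHP's mbstring extension, which is not what the function above uses. It assumes mbstring is loaded and that 'GBK' is accepted as an encoding name. The helper name cut_bytes is hypothetical and only serves this illustration. The sample string is written as byte escapes so the result does not depend on the encoding of the source file.

```php
<?php
// Hypothetical helper (not from the original post): truncate to at most
// $length bytes without splitting a multibyte character.
// Assumption: the mbstring extension is available.
function cut_bytes($string, $length, $dot = '..', $charset = 'GBK') {
    if (strlen($string) <= $length) {
        return $string;
    }
    // mb_strcut() cuts by byte count but backs off to a character boundary.
    return mb_strcut($string, 0, $length, $charset) . $dot;
}

// "ab" followed by two Chinese characters, encoded in GBK (2 bytes each).
$gbk = "ab\xD6\xD0\xCE\xC4";

// Plain substr() keeps the lead byte 0xD6 without its second byte.
echo bin2hex(substr($gbk, 0, 3)), "\n";

// cut_bytes() drops the incomplete character instead of splitting it.
echo bin2hex(cut_bytes($gbk, 3)), "\n";

// With room for the whole character, it is kept intact.
echo bin2hex(cut_bytes($gbk, 4)), "\n";
```

One difference in behavior: when the limit falls inside a two-byte character, the GBK branch of get_word completes that character and may return one byte more than $length, whereas mb_strcut never returns more than $length bytes.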