mb_strwidth($str, $encoding) returns the width of the string
$str: the string to be evaluated
$encoding: the encoding to use, such as UTF8 or GBK
mb_strimwidth($str, $start, $width, $tail, $encoding) truncates a string by width
$str: the string to truncate
$start: the character offset to start from (0 for the beginning of the string)
$width: the width to truncate to
$tail: the string appended to the truncated result, commonly ...
$encoding: the encoding to use
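As a quick illustration of the parameters above, here is a minimal sketch that wraps mb_strimwidth. The helper name fit_title is hypothetical and UTF-8 input is assumed; the point is that the tail is counted inside the requested width, and that a string which already fits is returned unchanged.

```php
<?php
// Hypothetical helper: shorten a title to at most $maxWidth columns,
// appending "..." when something was cut off. Assumes UTF-8 input.
function fit_title(string $title, int $maxWidth): string
{
    return mb_strimwidth($title, 0, $maxWidth, '...', 'UTF-8');
}

echo fit_title('Hello World', 10), "\n"; // Hello W...
echo fit_title('Hello', 10), "\n";       // Hello (already fits, no tail added)
```

Here "Hello W" has width 7 and the tail has width 3, so the result is exactly 10 wide.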
The code is as follows:
. PHP /** * UTF8 encoded format * 1 Chinese 3 bytes * We want 1 Chinese to occupy 2 bytes, * Because the position of 2 letters from the width is equivalent to 1 Chinese */ //test string $str = ' aaaa ah aaaa ah ah aaa '; Echo strlen ($STR); Output is only strlen to 25 bytes ///must specify encoding, or it will use PHP's inner Code mb_internal_encoding () to see the inner code //////Use Mb_strwidth output string width of 20 using UTF8 encoding Echo mb_strwidth ($str, ' utf8 '); //Only UTF8 width greater than 10 if (Mb_strwidth ($str, ' ') >10) { //Here set to intercept from 0, take 10 append ..., using UTF8 encoding //Note the additional ... will also be computed to a length of $str = mb_strimwidth ($str, 0, ' ... ', ' UTF8 '); //FINAL output aaaa ah ... 4 A, 4, 1, 2 3, 3 4+2+3=9 //is not very simple ah, some people say why 9 is not 10? //Because the right "ah" behind or "ah", Chinese 2, 9+2=11 exceeded the set, so remove one is 9 Echo $str; |
This works fine when the text is all Chinese, but problems appear when there are symbols in the middle. For example, when using mb_strimwidth and mb_strwidth I found that if the title contains curly double quotes (“ ”), PHP's mb_strwidth treats each of them as width 1. I wondered about this: these are Chinese double quotes, so logically they should be wide characters with a width of 2. After looking them up, the Unicode code points of “ and ” are U+201C and U+201D, which are not in the Chinese character range. I then checked the code charts on unicode.org and found that U+2000-U+206F is the General Punctuation block. Although the characters in this range are displayed as wide characters in Chinese text, PHP's mb_ functions treat them as width 1, so there is no way around it: we have to handle them ourselves.
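To see which characters in a title are reported narrower than expected, a small inspection routine can help. The following is only a sketch: the function name char_widths is hypothetical, UTF-8 input is assumed, and the widths printed depend on the mbstring version in use.

```php
<?php
// Hypothetical helper: report the mb_strwidth() of every character in a
// UTF-8 string, so characters that come out narrower than expected stand out.
function char_widths(string $str): array
{
    $report = [];
    // Split into single characters; fall back to an empty list on invalid UTF-8.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    foreach ($chars as $char) {
        $codePoint = sprintf('U+%04X', mb_ord($char, 'UTF-8'));
        $report[] = [$char, $codePoint, mb_strwidth($char, 'UTF-8')];
    }
    return $report;
}

// Print one line per character: the character, its code point, its width.
foreach (char_widths('“啊”a') as [$char, $codePoint, $width]) {
    echo "$char $codePoint width=$width\n";
}
```

Any character listed with a code point in U+2000-U+206F and a width of 1 is a candidate for the special handling described above.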
The code is as follows:
// Truncate $str so that its width does not exceed $length
function truncString($str, $length) {
    $countLen = 0;
    for ($i = 0; $i < mb_strlen($str, 'UTF-8'); $i++) {
        $countLen += amb_strwidth(mb_substr($str, $i, 1, 'UTF-8'));
        if ($countLen > $length)
            return mb_substr($str, 0, $i, 'UTF-8');
    }
    return $str;
}

// Like mb_strwidth, but counts characters in U+2000-U+206F as width 2
function amb_strwidth($str_width) {
    $count = 0;
    for ($i = 0; $i < mb_strlen($str_width, 'UTF-8'); $i++) {
        $char = mb_substr($str_width, $i, 1, 'UTF-8');
        // if ($char == "\xE2\x80\x9C" || $char == "\xE2\x80\x9D")
        // If the character is within U+2000-U+206F, add 2 to the counter
        if (preg_match('/[\x{2000}-\x{206F}]/u', $char))
            $count += 2;
        else
            $count += mb_strwidth($char, 'UTF-8');
    }
    return $count;
}
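The same idea can also be written as a single pass over the characters, which avoids calling mb_substr repeatedly inside the loop. This is only a sketch under stated assumptions: the names char_display_width and trunc_by_display_width are hypothetical, the input is assumed to be valid UTF-8, and only U+2000-U+206F is widened.

```php
<?php
// Hypothetical helper: width of one character, counting U+2000-U+206F as 2.
function char_display_width(string $char): int
{
    if (preg_match('/^[\x{2000}-\x{206F}]$/u', $char)) {
        return 2;
    }
    return mb_strwidth($char, 'UTF-8');
}

// Hypothetical helper: cut $str so its display width does not exceed $maxWidth.
function trunc_by_display_width(string $str, int $maxWidth): string
{
    // Split into single characters; fall back to an empty list on invalid UTF-8.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $width = 0;
    $out = '';
    foreach ($chars as $char) {
        $width += char_display_width($char);
        if ($width > $maxWidth) {
            break; // adding this character would exceed the limit
        }
        $out .= $char;
    }
    return $out;
}

echo trunc_by_display_width('“啊啊”aaaa', 6), "\n"; // “啊啊
```

In the usage line the opening quote and the two Chinese characters each count as 2, giving 6; the closing quote would bring the total to 8, so the string is cut before it.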
In summary, after all this work it feels like we are back where we started: in the end you still have to loop over the string and work out each character's width from its code point yourself.