PHP's built-in string functions can produce question marks or garbled output when they truncate Chinese text. Below are several functions that truncate Chinese strings accurately.
Handling such strings in PHP comes down to two steps:
1. Determine whether the string is encoded as GBK or as Unicode (UTF-8).
2. Use the truncation method that matches that encoding.
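The first step can be sketched as follows. This is a minimal heuristic rather than a full detector: it relies on the PCRE `u` modifier rejecting byte sequences that are not valid UTF-8, and it assumes that anything else is GBK. The function name `guess_encoding` is hypothetical and does not come from the original article.

```php
<?php
// Hypothetical helper: guess whether $str is UTF-8 or GBK.
// Assumption: the input is known to be in one of these two encodings.
function guess_encoding($str)
{
    // With the /u modifier, preg_match() returns false when the subject
    // is not valid UTF-8, and 1 when the empty pattern matches.
    return preg_match('//u', $str) === 1 ? 'UTF-8' : 'GBK';
}

echo guess_encoding("\xE4\xB8\xAD"), "\n"; // UTF-8 bytes of one Chinese character
echo guess_encoding("\xD6\xD0"), "\n";     // GBK bytes of the same character
```

Pure ASCII text is valid in both encodings, so the sketch reports it as UTF-8; either truncation method handles it correctly.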
Truncating Chinese text with substr often produces garbled output. substr counts bytes, not characters, and a Chinese character occupies two bytes in GBK (three in UTF-8). When the cut falls inside a character, the leftover byte cannot be displayed and the character is lost.
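A short demonstration of the problem, using explicit GBK bytes so that the result does not depend on how the source file is saved. The variable names are illustrative only.

```php
<?php
// "\xD6\xD0\xCE\xC4" is the GBK byte sequence for two Chinese characters.
$gbk = "abc\xD6\xD0\xCE\xC4";

echo strlen($gbk), "\n";    // 7 (3 ASCII bytes + 2 characters x 2 bytes)

// Cutting at byte 4 splits the first Chinese character in half.
$cut = substr($gbk, 0, 4);
echo bin2hex($cut), "\n";   // 616263d6 (ends with a dangling lead byte 0xD6)
```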
A simple workaround is the following truncation function:
// Truncate an over-long string
function curtstr($str, $len = 30)
{
    if (strlen($str) > $len) {
        $str = substr($str, 0, $len);
        // chr(0) absorbs a lead byte that the cut may have left dangling
        $str .= chr(0) . "...";
    }
    return $str;
}
Note that chr(0) is not NULL.
NULL means no value at all, while chr(0) is a character whose value is 0: 0x00 in hexadecimal, 00000000 in binary.
Although chr(0) displays nothing, it is still a character.
When a Chinese character is cut in half, the decoder always pairs the leftover lead byte with the byte that follows it and interprets the two as one Chinese character, which is what causes the garbling. A lead byte in the range 0x81 to 0xFF followed by 0x00 is always displayed as "empty".
Appending chr(0) to the result of substr relies on this behavior to prevent the garbled character.
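The difference between chr(0) and NULL, and the bytes that the workaround produces, can be checked directly. This is only an illustration with hand-picked GBK bytes; how a given browser or terminal renders the 0xD6 0x00 pair is not something the code can show.

```php
<?php
$nul = chr(0);

var_dump($nul === null);    // bool(false): it is a string, not null
var_dump(strlen($nul));     // int(1): exactly one byte
var_dump(bin2hex($nul));    // string(2) "00"

// After a cut that leaves the GBK lead byte 0xD6 dangling, the appended
// 0x00 is what follows it, instead of the first "." of the ellipsis.
$cut = substr("abc\xD6\xD0", 0, 4) . chr(0) . "...";
echo bin2hex($cut), "\n";   // 616263d6002e2e2e
```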
The functions below cover both steps and truncate Chinese strings exactly on character boundaries:
Truncating UTF-8 encoded multibyte strings
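// Truncate a UTF-8 string: skip up to $from characters, then keep up to $len characters
function utf8Substr($str, $from, $len)
{
    // A character is either one ASCII byte or a lead byte (0xC0-0xFF)
    // followed by one or more continuation bytes (0x80-0xBF).
    return preg_replace(
        '#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){0,' . $from . '}' .
        '((?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){0,' . $len . '}).*#s',
        '$1',
        $str
    );
}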
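As an alternative to the hand-written byte classes above, the same character-based slicing can be sketched with the PCRE `u` modifier, which makes the regex engine itself split the string into whole UTF-8 characters. The helper name `utf8_slice` is hypothetical, and the sketch assumes the input is valid UTF-8 (it returns an empty string otherwise).

```php
<?php
// Hypothetical helper: character-based substring for UTF-8 input.
function utf8_slice($str, $from, $len)
{
    // Split into an array of whole characters; /u makes PCRE treat
    // the subject as UTF-8 instead of as a sequence of single bytes.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return ''; // the input was not valid UTF-8
    }
    return implode('', array_slice($chars, $from, $len));
}

// "ab" + two Chinese characters (3 bytes each) + "cd"
$s = "ab\xE4\xB8\xAD\xE6\x96\x87cd";
echo bin2hex(utf8_slice($s, 1, 3)), "\n"; // 62e4b8ade69687
```

Building an array of every character costs more memory than the single preg_replace call, so this form is better suited to short strings such as titles and summaries.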
Truncation function for Chinese text in UTF-8 and GB2312
/*
 * Chinese character truncation function supporting UTF-8 and GB2312.
 * cut_str(string, length to keep, start offset, encoding)
 * The encoding defaults to UTF-8; the start offset defaults to 0.
 */
function cut_str($string, $sublen, $start = 0, $code = 'UTF-8')
{
    if ($code == 'UTF-8') {
        // One alternative per UTF-8 sequence length (1 to 4 bytes)
        $pa = '/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|\xe0[\xa0-\xbf][\x80-\xbf]|[\xe1-\xef][\x80-\xbf][\x80-\xbf]|\xf0[\x90-\xbf][\x80-\xbf][\x80-\xbf]|[\xf1-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf]/';
        preg_match_all($pa, $string, $t_string);
        // Append an ellipsis only if characters remain after the cut
        if (count($t_string[0]) - $start > $sublen) {
            return join('', array_slice($t_string[0], $start, $sublen)) . "...";
        }
        return join('', array_slice($t_string[0], $start, $sublen));
    } else {
        // GB2312 branch: $start and $sublen are byte positions here
        $strlen = strlen($string);
        $tmpstr = '';
        for ($i = 0; $i < $strlen; $i++) {
            if ($i >= $start && $i < ($start + $sublen)) {
                if (ord(substr($string, $i, 1)) > 129) {
                    // Lead byte of a double-byte character: copy both bytes
                    $tmpstr .= substr($string, $i, 2);
                } else {
                    $tmpstr .= substr($string, $i, 1);
                }
            }
            // Skip the second byte of a double-byte character
            if (ord(substr($string, $i, 1)) > 129) {
                $i++;
            }
        }
        if (strlen($tmpstr) < $strlen) {
            $tmpstr .= "...";
        }
        return $tmpstr;
    }
}

$str = "ABCD string to intercept";
echo cut_str($str, 8, 0, 'gb2312');
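Where the mbstring extension is available, the same result can be had without hand-written byte logic, because mb_substr and mb_strlen count characters in a named encoding. The sketch below is not part of the original article: the wrapper name `cut_str_mb` is hypothetical, and it assumes mbstring is enabled, so the demo guards for that.

```php
<?php
// Hypothetical wrapper around mb_substr(); requires the mbstring extension.
// Unlike the byte loop above, $start and $sublen are always character counts.
function cut_str_mb($string, $sublen, $start = 0, $code = 'UTF-8')
{
    $cut = mb_substr($string, $start, $sublen, $code);
    // Append an ellipsis only when characters remain after the cut.
    if (mb_strlen($string, $code) - $start > $sublen) {
        $cut .= '...';
    }
    return $cut;
}

if (function_exists('mb_substr')) {
    $utf8 = "ab\xE4\xB8\xAD\xE6\x96\x87cd";  // UTF-8 bytes
    echo bin2hex(cut_str_mb($utf8, 3)), "\n";

    $gbk = "ab\xD6\xD0\xCE\xC4cd";           // the same text as GB2312 bytes
    echo bin2hex(cut_str_mb($gbk, 3, 0, 'GB2312')), "\n";
}
```

In both calls the cut keeps "ab" plus one whole Chinese character and then appends the ellipsis, regardless of how many bytes that character occupies.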