When we use substr to truncate UTF-8 Chinese strings, the result often comes out garbled. This article explains why that happens.
Look at this piece of code (character encoding is UTF-8):
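<?php
// The original sample was a Chinese sentence containing the two function names; it is shown here in translation.
$str = 'We all know that strlen and mb_strlen are functions that return the length of a string';
echo strlen($str) . '<br/>' . mb_strlen($str, 'utf-8');
?>
Run against the original Chinese string, the code above prints the following values: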
66
34
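What does this tell us? strlen counts bytes: in UTF-8, each Chinese character takes three bytes and each English letter takes one. mb_strlen counts characters, so every character counts as one regardless of how many bytes it occupies. Here, 66 bytes against 34 characters is what 16 three-byte characters plus 18 single-byte characters produce. substr also works on bytes, so it can cut a Chinese character in the middle, and that is why truncating UTF-8 Chinese strings with substr so often yields garbled text.
A function for truncating UTF-8 strings is provided below: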
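function cutstr($sourcestr, $cutlength) {
    $returnstr = '';
    $i = 0; // current byte offset into the string
    $n = 0; // accumulated display length
    $str_length = strlen($sourcestr); // length in bytes
    while (($n < $cutlength) && ($i < $str_length)) {
        $temp_str = substr($sourcestr, $i, 1);
        $ascnum = ord($temp_str); // value of the byte at offset $i
        if ($ascnum >= 240) {
            // lead byte of a four-byte character: copy it whole, count it as 1
            $returnstr = $returnstr . substr($sourcestr, $i, 4);
            $i = $i + 4;
            $n++;
        }
        elseif ($ascnum >= 224) {
            // lead byte of a three-byte character (most Chinese characters)
            $returnstr = $returnstr . substr($sourcestr, $i, 3);
            $i = $i + 3;
            $n++;
        }
        elseif ($ascnum >= 192) {
            // lead byte of a two-byte character
            $returnstr = $returnstr . substr($sourcestr, $i, 2);
            $i = $i + 2;
            $n++;
        }
        elseif (($ascnum >= 65) && ($ascnum <= 90)) {
            // uppercase letter A-Z: one byte, but counted as a full-width character
            $returnstr = $returnstr . substr($sourcestr, $i, 1);
            $i = $i + 1;
            $n++;
        }
        else {
            // lowercase letters, digits, half-width punctuation: counted as half a character
            $returnstr = $returnstr . substr($sourcestr, $i, 1);
            $i = $i + 1;
            $n = $n + 0.5;
        }
    }
    // append an ellipsis only if part of the string was actually cut off
    if ($i < $str_length) {
        $returnstr = $returnstr . "...";
    }
    return $returnstr;
}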
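The thresholds 224 and 192 in the function come from how UTF-8 marks the first byte of each character: a value of 224 or more starts a three-byte sequence (240 or more starts a four-byte one), 192 to 223 starts a two-byte sequence, and values below 128 are single-byte ASCII. A minimal sketch for checking this follows; the helper name `leadByteInfo` and the sample characters are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper (not from the article): reports the first byte of a
// UTF-8 character and the sequence length that byte announces.
function leadByteInfo(string $char): array
{
    $lead = ord($char[0]);
    if ($lead >= 240) {
        $bytes = 4;
    } elseif ($lead >= 224) {
        $bytes = 3;
    } elseif ($lead >= 192) {
        $bytes = 2;
    } else {
        $bytes = 1;
    }
    // 'actual' is the real byte length, for comparison with the prediction.
    return ['lead' => $lead, 'bytes' => $bytes, 'actual' => strlen($char)];
}

// "A", U+00E9 (e with acute accent), U+4E2D (the Chinese character "middle").
foreach (["A", "\u{00E9}", "\u{4E2D}"] as $char) {
    $info = leadByteInfo($char);
    echo $char, ': lead byte ', $info['lead'], ', ', $info['bytes'], " byte(s)\n";
}
// A: lead byte 65, 1 byte(s)
// é: lead byte 195, 2 byte(s)
// 中: lead byte 228, 3 byte(s)
```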
Usage example:
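<?php
// The original sample was a Chinese sentence; it is shown here in translation.
$str = 'Valid for a maximum of three months, after which the system automatically deletes this article';
//echo strlen($str);
// The remaining echo statements were garbled in the source. A call of this
// shape prints the truncated string (the length 10 is only an illustrative value):
echo cutstr($str, 10);
?>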