Using substr to cut a UTF-8 string that contains Chinese characters often produces garbled output. This article explains why that happens and how to avoid it.
Take a look at this piece of code (the file is encoded as UTF-8):
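<?php
// In the original article this sample was Chinese text mixed with the two
// function names; only its English translation survives here.
$str = 'all know that strlen and mb_strlen are functions for string lengths';
echo strlen($str) . '<br/>' . mb_strlen($str, 'utf-8');
?>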
With the original Chinese sample string, the code prints the following values:
66
34
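Why the difference? strlen counts bytes: in UTF-8 each Chinese character takes three bytes, while each English letter takes one. mb_strlen counts characters, so a Chinese character and an English letter each count as one. Assuming the sample contains only those two kinds of characters, the numbers correspond to 16 three-byte characters and 18 single-byte characters: 16 × 3 + 18 = 66 bytes and 16 + 18 = 34 characters. substr works on bytes, just like strlen, so cutting a UTF-8 string with it can stop in the middle of a three-byte Chinese character, and the leftover bytes show up as garbled text.
The following function cuts a UTF-8 string without splitting a character: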
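function cutstr($sourcestr, $cutlength) {
    $returnstr = '';
    $i = 0; // current byte offset in $sourcestr
    $n = 0; // length collected so far, in display units
    $str_length = strlen($sourcestr); // length in bytes
    while (($n < $cutlength) && ($i < $str_length)) {
        $temp_str = substr($sourcestr, $i, 1);
        $ascnum = ord($temp_str); // value of the leading byte
        if ($ascnum >= 240) {
            // Leading byte of a four-byte character
            // (branch added; the original did not handle this case)
            $returnstr = $returnstr . substr($sourcestr, $i, 4);
            $i = $i + 4;
            $n++;
        } elseif ($ascnum >= 224) {
            // Leading byte of a three-byte character, e.g. Chinese
            $returnstr = $returnstr . substr($sourcestr, $i, 3);
            $i = $i + 3;
            $n++;
        } elseif ($ascnum >= 192) {
            // Leading byte of a two-byte character
            $returnstr = $returnstr . substr($sourcestr, $i, 2);
            $i = $i + 2;
            $n++;
        } elseif (($ascnum >= 65) && ($ascnum <= 90)) {
            // Uppercase ASCII letter: counts as one full unit
            $returnstr = $returnstr . substr($sourcestr, $i, 1);
            $i = $i + 1;
            $n++;
        } else {
            // Any other single-byte character counts as half a unit
            $returnstr = $returnstr . substr($sourcestr, $i, 1);
            $i = $i + 1;
            $n = $n + 0.5;
        }
    }
    // Append an ellipsis only if part of the string was actually cut off.
    // (The original compared mb_strlen with $cutlength, which also added the
    // ellipsis to short strings made of half-unit characters.)
    if ($i < $str_length) {
        $returnstr = $returnstr . "...";
    }
    return $returnstr;
}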
Example:
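<?php
// Translated from the Chinese sample string in the original article.
$str = 'validity period: up to three months. If the validity period is exceeded, the system will automatically delete this message';
// echo strlen($str);
// The rest of the original example did not survive extraction. A call like
// the one below is assumed; the cut length 10 is an illustrative value only.
echo cutstr($str, 10);
?>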