This function cuts a substring according to the UTF-8 encoding rules. Unlike a plain substr(), which counts bytes and can split a character in half, this UTF-8 version handles characters of 1 to 6 bytes, so it works for any multi-byte character, not only Chinese.
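As a rough illustration of the encoding rules mentioned above, the sketch below classifies a single byte by its high bits. The helper name `utf8_byte_role` is hypothetical and is not part of the original function. It follows the older 1 to 6 byte scheme that this article assumes; current UTF-8 (RFC 3629) stops at 4 bytes.

```php
<?php
// Hypothetical helper (not part of the original function): report the role of
// one byte under the older 1-6 byte UTF-8 scheme.
// Returns 0 for a continuation byte, 1-6 for the length of the sequence that a
// lead byte starts, and -1 for bytes that never occur in UTF-8.
function utf8_byte_role($byte) {
    $ord = ord($byte);
    if ($ord < 0x80) return 1;  // 0xxxxxxx: single-byte (ASCII) character
    if ($ord < 0xC0) return 0;  // 10xxxxxx: continuation byte
    if ($ord < 0xE0) return 2;  // 110xxxxx
    if ($ord < 0xF0) return 3;  // 1110xxxx (most Chinese characters)
    if ($ord < 0xF8) return 4;  // 11110xxx
    if ($ord < 0xFC) return 5;  // 111110xx (obsolete in RFC 3629)
    if ($ord < 0xFE) return 6;  // 1111110x (obsolete in RFC 3629)
    return -1;                  // 0xFE and 0xFF
}

// "测" is encoded as the three bytes E6 B5 8B.
foreach (str_split("a\xE6\xB5\x8B") as $byte) {
    echo bin2hex($byte), ' => ', utf8_byte_role($byte), "\n";
}
// Prints:
// 61 => 1
// e6 => 3
// b5 => 0
// 8b => 0
```

A cut is safe only when the byte right after it is not a continuation byte (role 0), which is the condition the function checks at both ends.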
The code is as follows:
/*
 * Function: works like substr(), but never returns half of a UTF-8 character.
 * Parameters: $str    - UTF-8 encoded string
 *             $start  - byte offset; a negative value counts from the end
 *             $length - byte length; a negative value stops that many bytes
 *                       before the end, null means "to the end of the string"
 * Return value: the substring, widened at both ends to whole characters
 */
function utf8_substr($str, $start, $length = null) {
    $strlen = strlen($str);
    // Convert $start and $length to absolute byte offsets, the way substr() interprets them.
    $start = $start >= 0 ? min($start, $strlen) : max($strlen + $start, 0);
    if ($length === null) {
        $next_start = $strlen;
    } elseif ($length >= 0) {
        $next_start = min($start + $length, $strlen);
    } else {
        $next_start = max($strlen + $length, $start);
    }
    if ($next_start <= $start) {
        return '';
    }
    // Cut normally first.
    $res = substr($str, $start, $next_start - $start);
    /* Check whether the characters at the start and end of the cut are complete. */
    // Up to 6 bytes following the cut.
    $next_segm = substr($str, $next_start, 6);
    // Up to 6 bytes preceding the cut.
    $prev_start = $start - 6 > 0 ? $start - 6 : 0;
    $prev_segm = substr($str, $prev_start, $start - $prev_start);
    // If the following bytes begin with continuation bytes, the last character was cut: append them.
    if (preg_match('@^[\x80-\xBF]{1,5}@', $next_segm, $bytes)) {
        $res .= $bytes[0];
    }
    // If the result begins with a continuation byte, the first character was cut:
    // take its lead byte (and any continuation bytes) from the preceding segment and prepend them.
    $ord0 = ord($res[0]);
    if (128 <= $ord0 && 191 >= $ord0) {
        if (preg_match('@[\xC0-\xFD][\x80-\xBF]{0,5}$@', $prev_segm, $bytes)) {
            $res = $bytes[0] . $res;
        }
    }
    return $res;
}
Test data:
The code is as follows:
$str = 'dfjdjf test 13f test 65 & 2 data fdj (1 for mfe &...... ';
var_dump(utf8_substr($str, 22, 12)); echo '<br />';
var_dump(utf8_substr($str, 22, -6)); echo '<br />';
var_dump(utf8_substr($str, 9, 12)); echo '<br />';
var_dump(utf8_substr($str, 19, 12)); echo '<br />';
var_dump(utf8_substr($str, 28, -6)); echo '<br />';
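Result (no character is cut in half; you are welcome to test it and report bugs). The lengths reported by var_dump are byte counts, so they exceed the number of visible characters whenever the result contains multi-byte characters:
string(12) "fdj"
string(26) "fdj (1 is mfe &... "
string(13) "13f trial 65 & 2"
string(12) "Data fd"
string(20) "dj (1 is mfe &... "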