A function that extracts substrings according to the UTF-8 encoding rules (utf8_substr)

Source: Internet
Author: User
This function extracts a substring according to the UTF-8 encoding rules. It is a UTF-8-aware version of substr that handles characters of 1 to 6 bytes, so it is not limited to Chinese characters. The code is as follows:


/*
 * Function: Similar to substr(), but never cuts a multi-byte UTF-8 character
 *           in half, so it does not produce garbled characters.
 * Parameters:
 *   $str    - the UTF-8 encoded input string
 *   $start  - byte offset; a negative value counts from the end of the string
 *   $length - byte length; a negative value stops that many bytes before the
 *             end, and null means "up to the end of the string"
 * Return value: the substring, extended at either edge to whole characters
 */
function utf8_substr($str, $start, $length = null) {
    $strlen = strlen($str);
    // Resolve the byte range [$start, $end) the same way substr() does.
    if ($start < 0) {
        $start = max($strlen + $start, 0);
    }
    if ($length === null) {
        $end = $strlen;
    } elseif ($length < 0) {
        $end = $strlen + $length;
    } else {
        $end = min($start + $length, $strlen);
    }
    // Nothing to extract.
    if ($start >= $strlen || $end <= $start) {
        return '';
    }
    // Cut by bytes first.
    $res = substr($str, $start, $end - $start);
    // Take up to 6 bytes after the cut and up to 6 bytes before it.
    $next_segm = substr($str, $end, 6);
    $prev_start = $start - 6 > 0 ? $start - 6 : 0;
    $prev_segm = substr($str, $prev_start, $start - $prev_start);
    // If the bytes after the cut begin with continuation bytes (0x80-0xBF),
    // the last character was truncated: append the missing bytes.
    if (preg_match('@^([\x80-\xBF]{1,5})@', $next_segm, $bytes)) {
        $res .= $bytes[1];
    }
    // If the result begins with a continuation byte, the first character was
    // truncated: prepend its lead byte (0xC0-0xFD) and any continuation bytes
    // that follow it, taken from the bytes before the cut.
    $ord0 = ord($res[0]);
    if (128 <= $ord0 && 191 >= $ord0) {
        if (preg_match('@[\xC0-\xFD][\x80-\xBF]{0,5}$@', $prev_segm, $bytes)) {
            $res = $bytes[0] . $res;
        }
    }
    return $res;
}
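
The two regular expressions in the function depend on how UTF-8 bytes are divided into ranges. As a minimal sketch of that classification, the helper below (utf8_byte_kind is a hypothetical name, not part of the original code) labels each byte of a short sample string. The sample string and its byte escapes are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper (not part of the original function): classify one byte
// using the same ranges that the regular expressions in utf8_substr() rely on.
function utf8_byte_kind(string $byte): string
{
    $ord = ord($byte);
    if ($ord < 0x80) {
        return 'ascii';          // 0x00-0x7F: a complete one-byte character
    }
    if ($ord <= 0xBF) {
        return 'continuation';   // 0x80-0xBF: a non-first byte of a character
    }
    if ($ord <= 0xFD) {
        return 'lead';           // 0xC0-0xFD: first byte of a multi-byte character
    }
    return 'invalid';            // 0xFE and 0xFF never appear in UTF-8
}

// "\xE4\xB8\xAD" is the three-byte UTF-8 encoding of U+4E2D.
$kinds = array_map('utf8_byte_kind', str_split("a\xE4\xB8\xAD"));
echo implode(' ', $kinds), "\n";   // ascii lead continuation continuation
```

A byte-wise cut is safe only when the first byte of the result is not a continuation byte and the first byte after the result is not one either, which is exactly what the function checks at its two edges.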


Test data:

The code is as follows:


$str = 'dfjdjf test 13f test 65 & 2 data fdj (1 for mfe &...... ';
var_dump(utf8_substr($str, 22, 12)); echo "\n";
var_dump(utf8_substr($str, 22, -6)); echo "\n";
var_dump(utf8_substr($str, 9, 12)); echo "\n";
var_dump(utf8_substr($str, 19, 12)); echo "\n";
var_dump(utf8_substr($str, 28, -6)); echo "\n";


Result (no character is cut into garbled bytes; you are welcome to test it and report bugs):
string(12) "fdj"
string(26) "fdj (1 is mfe &... "
string(13) "13f trial 65 & 2"
string(12) "Data fd"
string(20) "dj (1 is mfe &... "
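
For a reproducible view of the problem the function addresses, the following sketch cuts a short sample string with plain substr() and checks whether the result is still valid UTF-8. The sample string is an assumption written with explicit byte escapes (it is not the test string used above), and is_valid_utf8 is a hypothetical helper name. The check relies on preg_match() returning false when the /u modifier is given a subject that is not valid UTF-8.

```php
<?php
// Hypothetical helper: with the /u modifier, preg_match() returns false for a
// subject that is not valid UTF-8, so the empty pattern works as a validity check.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// "ab" + U+4E2D (three bytes: E4 B8 AD) + "cd", written with explicit escapes.
$str = "ab\xE4\xB8\xADcd";

$broken = substr($str, 0, 3);   // "ab" plus only the lead byte E4
$whole  = substr($str, 0, 5);   // "ab" plus all three bytes of the character

var_dump(is_valid_utf8($broken)); // bool(false)
var_dump(is_valid_utf8($whole));  // bool(true)
var_dump(bin2hex($broken));       // string(6) "6162e4"
```

Extending the 3-byte cut to 5 bytes, as in the second call, is the kind of adjustment utf8_substr makes automatically when it finds continuation bytes right after the cut.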