For a project I needed to truncate mixed Chinese/English UTF-8 strings in PHP to an equal display length, but most of the code I found online either produced garbled output or did not give equal lengths. Here one Chinese character counts as length 1 and two English letters count as length 1, so a two-character Chinese word has length 2 and 'UTF8' also has length 2.
In UTF-8 a Chinese character takes three bytes and an English letter takes one byte. substr counts bytes, so it can cut a character in half and produce garbled text; mb_substr counts characters, so it never garbles, but it has the unequal-length problem described above.
I work at the byte level and wrote a small, simple function.
It only works with UTF-8 encoded strings.
PHP code
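<?php
/* Truncate mixed Chinese/English UTF-8 strings to an equal display length. */
// English punctuation, for reference: [., \"\\?!:_ '] (not covered by the pattern below)
// Bytes are counted from the beginning of the string, so $start is assumed to be 0.
function substr_utf8($string, $start, $length) { // by Aiou
    $m = 0; // bytes that are ASCII letters or digits
    $n = 0; // all other bytes (three per Chinese character)
    $k = 0; // length so far, in Chinese-character widths
    $l = 0; // number of characters to take
    $total = strlen($string);
    for ($i = 0; $i < $total && $k < $length; $i++) {
        if (preg_match('/[0-9a-zA-Z]/', $string[$i])) { // pure English
            $m++;
        } else { // non-English byte
            $n++;
        }
        $k = $n / 3 + $m / 2;
        $l = $n / 3 + $m; // final truncation length in characters
    }
    // mb_substr() cuts on character boundaries, so nothing is garbled.
    return mb_substr($string, $start, (int) $l, 'UTF-8');
}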
Test results:
PHP code
// The original test strings are Chinese; they appear here in English translation.
$string = "first intercept, mb_substr returns the string width is calculated by 'word'";
$string1 = "first intercept, return the string width is calculated by 'word'";
$string2 = "A A D intercept, the 12345 returned is the string width is calculated by the word";
1.
PHP code
echo substr_utf8($string, 0, 1) . '<br/>';
echo substr_utf8($string, 0, 2) . '<br/>';
echo substr_utf8($string, 0, 3) . '<br/>';
echo substr_utf8($string, 0, 4) . '<br/>';
echo substr_utf8($string, 0, 5) . '<br/>';
echo substr_utf8($string, 0, 6) . '<br/>';
echo substr_utf8($string, 0, 7) . '<br/>';
echo substr_utf8($string, 0, 8) . '<br/>';
echo substr_utf8($string, 0, 9) . '<br/>';
echo substr_utf8($string, 0, 10) . '<br/>';
echo substr_utf8($string, 0, 11) . '<br/>';
echo substr_utf8($string, 0, 12) . '<br/>';
echo substr_utf8($string, 0, 13) . '<br/>';
echo substr_utf8($string, 0, 14) . '<br/>';
echo substr_utf8($string, 0, 15) . '<br/>';
echo substr_utf8($string, 0, 16) . '<br/>';
echo substr_utf8($string, 0, 17) . '<br/>';
echo substr_utf8($string, 0, 18) . '<br/>';
echo substr_utf8($string, 0, 19) . '<br/>';
echo substr_utf8($string, 0, 20) . '<br/>';
The
First
First time
First time cut
First time intercept
First intercept,
First intercept, mb
First intercept, mb_s
First intercept, mb_sub
First intercept, mb_subst
First intercept, mb_substr
First intercept, mb_substr return
First intercept, mb_substr return
First intercept, mb_substr return
First intercept, mb_substr returns
First intercept, mb_substr returns the word
First intercept, mb_substr returns the character
First intercept, mb_substr returns a string
First intercept, mb_substr returns the string width
First intercept, mb_substr returns the string width
2.
PHP code
$ss = '1234567890abcdefghijklmnopqrst';
echo Utf8helper::substr_utf8($ss, 0, 1) . '<br/>';
echo Utf8helper::substr_utf8($ss, 0, 2) . '<br/>';
echo Utf8helper::substr_utf8($ss, 0, 3) . '<br/>';
echo Utf8helper::substr_utf8($ss, 0, 4) . '<br/>';
echo Utf8helper::substr_utf8($ss, 0, 5) . '<br/>';
echo Utf8helper::substr_utf8($ss, 0, 6) . '<br/>';
echo Utf8helper::substr_utf8($ss, 0, 7) . '<br/>';
echo Utf8helper::substr_utf8($ss, 0, 8) . '<br/>';
echo Utf8helper::substr_utf8($ss, 0, 9) . '<br/>';
echo Utf8helper::substr_utf8($ss, 0, 10);
12
1234
123456
12345678
1234567890
1234567890ab
1234567890abcd
1234567890abcdef
1234567890abcdefgh
1234567890abcdefghij
The length is measured in Chinese-character widths.
Roughly speaking, every two English letters, digits, or English punctuation marks count as one Chinese character. The result looks good.
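The same counting rule can also be sketched by walking whole characters instead of bytes. The function name `substr_equal_width` and its cost rule are illustrative assumptions, not the author's code: unlike the function above, it charges every single-byte character (punctuation included) half a unit, and it starts counting at `$start`.

```php
<?php
// Hypothetical helper: $length is measured in Chinese-character widths.
// A multi-byte character costs 2 half-units, a single-byte character costs 1.
function substr_equal_width(string $string, int $start, int $length): string
{
    // Split into whole UTF-8 characters so none is ever cut in half.
    $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return ''; // the input was not valid UTF-8
    }

    $budget = $length * 2; // total half-units available
    $used = 0;
    $result = '';
    foreach (array_slice($chars, $start) as $char) {
        $cost = strlen($char) === 1 ? 1 : 2; // one byte means ASCII
        if ($used + $cost > $budget) {
            break; // the next character would not fit
        }
        $result .= $char;
        $used += $cost;
    }
    return $result;
}

echo substr_equal_width('UTF8等长', 0, 2), "\n"; // UTF8
echo substr_equal_width('UTF8等长', 0, 3), "\n"; // UTF8等
echo substr_equal_width('a等长', 0, 1), "\n";    // a
```

In the last call the Chinese character does not fit into the remaining half unit, so this sketch returns a slightly shorter string rather than overshooting the requested width.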
With some changes it could also be made to work with other encodings.
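One possible way to support another encoding is to convert to UTF-8, truncate, and convert back. The sketch below assumes the mbstring extension and, in place of the function above, uses its built-in `mb_strimwidth()`, which counts full-width characters as 2 and half-width characters as 1. The wrapper name `truncate_in_encoding` and the choice of GB18030 are illustrative assumptions.

```php
<?php
// Hypothetical wrapper: $length is measured in Chinese-character widths.
function truncate_in_encoding(string $string, int $length, string $encoding): string
{
    // Work in UTF-8 internally, whatever the input encoding is.
    $utf8 = mb_convert_encoding($string, 'UTF-8', $encoding);
    // One Chinese-character width equals two half-width columns.
    $cut = mb_strimwidth($utf8, 0, $length * 2, '', 'UTF-8');
    // Hand the result back in the caller's encoding.
    return mb_convert_encoding($cut, $encoding, 'UTF-8');
}

// Build a GB18030 sample from a UTF-8 literal, truncate it, and show it as UTF-8.
$gb = mb_convert_encoding('UTF8等长', 'GB18030', 'UTF-8');
$out = truncate_in_encoding($gb, 3, 'GB18030');
echo mb_convert_encoding($out, 'UTF-8', 'GB18030'), "\n";
```

The same wrapper shape would work with any truncation function that expects UTF-8 input.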
I have not tested its efficiency and have no feel for how it performs.