When it comes to laying out and truncating strings that mix Chinese and English, the first things that come to mind are usually ASCII checks, hexadecimal ranges, regular expressions, and counting characters in a loop.
Today I want to show how PHP's mb (mbstring) extension makes this kind of string handling easy.
First, the functions we will use:
mb_strwidth($str, $encoding) returns the display width of a string: a half-width character such as an English letter counts as 1, and a full-width character such as a Chinese character counts as 2.
$str: the string to measure
$encoding: the encoding of the string, for example utf8 or gbk
mb_strimwidth($str, $start, $width, $tail, $encoding) truncates a string to a given width.
$str: the string to truncate
$start: the position to start from; 0 means the beginning of the string.
$width: the maximum width of the result.
$tail: a string appended to the result when the input has been cut; '...' is the usual choice.
$encoding: the encoding of the string
The following is an example:
<? Php/*** utf8 encoding format * 1 Chinese Character occupies 3 bytes * What we want is that 1 Chinese Character occupies 2 bytes, * From the aspect of width, the positions occupied by the two English letters are equivalent to one Chinese character * // test string $ str = 'aaaa, aaaa, aaa '; echo strlen ($ str); // only use strlen to output 25 bytes. // The encoding must be specified. Otherwise, the php internal code mb_internal_encoding () will be used () you can check the internal code // use mb_strwidth to output the string width to 20 using utf8 encoding echo mb_strwidth ($ str, 'utf8 '); // if (mb_strwidth ($ str, 'utf8')> 10) is intercepted only when the width is greater than 10. {// The value starts from 0 and 10 are appended ..., use utf8 encoding // note the append... it is also calculated to be within the length $ str = mb_strimwidth ($ str, 0, 10 ,'... ', 'utf8');} // finally output aaaa... 4 a, 4, 1, 2, 3, 3, 4, 2, 3, 3, 4, 2, 3, = 9 // is it easy, some people have said why 9, not 10? // Because "ah" is followed by "ah", the number of Chinese characters is counted as 2. 9 + 2 = 11 has exceeded the setting. Therefore, if one character is removed, the value of echo $ str is 9;
Here are two more functions:
mb_strlen($str, $encoding) returns the length of a string in characters.
$str: the string to measure
$encoding: the encoding of the string
mb_substr($str, $start, $length, $encoding) extracts part of a string.
$str: the string to extract from
$start: the position to start from
$length: the number of characters to extract
$encoding: the encoding of the string
These two functions work much like strlen() and substr(). The difference is that you can specify an encoding, so they count characters instead of bytes.
Here is an example:
<? Php/*** utf8 encoding format * 1 Chinese Character occupies 3 bytes */$ str = 'aa12 Ah a'; echo strlen ($ str ); // The direct output length is 9 // The output length is 7. Why is it 7? // Note that after encoding is set, whether it is Chinese or English, each length is 1 // a 1 2 Ah a // 1 + 1 + 1 + 1 + 1 + 1 + 1 = 7 // is it exactly 7? echo mb_strlen ($ str, 'utf8'); // The same is true for the same mb_substr // I only want 5 characters to echo mb_substr ($ str, 0, 5, 'utf8 '); // output aa12
The mb extension has many other useful functions that I will not list here.
If you are interested, see the official manual:
http://www.php.net/manual/zh/ref.mbstring.php
That is all for today.