When it comes to laying out and truncating mixed Chinese and English text, the first things that come to mind are ASCII codes, hexadecimal values, regular-expression matching, and counting characters in a loop. Today I want to share how to handle such strings easily with PHP's mb (mbstring) extension. First, the functions used:

mb_strwidth($str, $encoding) returns the width of a string.
$str: the string to measure
$encoding: the encoding to use, such as utf8 or gbk

mb_strimwidth($str, $start, $width, $tail, $encoding) truncates a string to a given width.
$str: the string to truncate
$start: the position at which truncation starts, usually 0 (the beginning of the string)
$width: the width to truncate to
$tail: a string appended to the truncated result, commonly '...'
$encoding: the encoding to use

Here is an example:
- /**
- * Utf8 encoding format
- * One Chinese character occupies 3 bytes
- * What we want is that one Chinese character occupies 2 bytes,
- * Because the two English letters occupy the same position as one Chinese character in width
- */
- // Test string
- $ Str = 'aaaa, aaaa, aaa ';
- Echo strlen ($ str); // only use strlen to output 25 bytes
- // The encoding must be specified. Otherwise, the php internal code mb_internal_encoding () will be used to view the internal code.
- // Use mb_strwidth to output a string whose width is 20 and utf8 encoding
- Echo mb_strwidth ($ str, 'utf8 ');
- // Intercept only when the width is greater than 10
- If (mb_strwidth ($ str, 'utf8')> 10 ){
- // The screenshot starts from 0, and 10 appends are obtained. UTF-8 encoding is used.
- // Note that the append... will be calculated within the length.
- $ Str = mb_strimwidth ($ str, 0, 10, '...', 'utf8 ');
- }
- // Output aaaa at the end... 4 a counts 4 and 1. calculate 2 and 3. calculate 3. 4 + 2 + 3 = 9.
- // Isn't it easy? some people have said why 9 is not 10?
- // Because "ah" is followed by "ah", the number of Chinese characters is counted as 2. 9 + 2 = 11 has exceeded the setting, so the number of Chinese characters removed is 9.
- Echo $ str;
Here are a few other functions:

mb_strlen($str, $encoding) returns the length of a string in characters.
$str: the string to measure
$encoding: the encoding to use

mb_substr($str, $start, $length, $encoding) extracts part of a string.
$str: the string to extract from
$start: the position of the first character to take
$length: the number of characters to take
$encoding: the encoding to use

These two functions are very similar to strlen() and substr(). The only difference is that an encoding can be set, so they count characters instead of bytes. Here is an example:
- /**
- * Utf8 encoding format
- * One Chinese character occupies 3 bytes
- */
- $ Str = 'aa12 Ah ';
- Echo strlen ($ str); // The direct output length is 9.
- // The output length is 7. why is it 7?
- // Note that after encoding is set, each length is 1 in both Chinese and English.
- // A 1 2 AH
- // 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 = 7
- // Is it exactly 7 characters long?
- Echo mb_strlen ($ str, 'utf8 ');
- // The same is true for mb_substr.
- // I only want 5 characters
- Echo mb_substr ($ str, 0, 5, 'utf8'); // output aa12
The mb extension has many more useful functions, and I will not list them all here. If you are interested, see the official manual: http://www.php.net/manual/zh/ref.mbstring.php That is all for today.