PHP Chinese string truncation question
Because using the substr() function to truncate Chinese strings causes problems, I found a function online, shown below:
PHP code
// Chinese string truncation function
function cut_str($string, $start, $length) {
    if (strlen($string) > $length) {
        $str = null;
        $len = $start + $length;
        for ($i = $start; $i < $len; $i++) {
            if (ord(substr($string, $i, 1)) > 0xa0) {
                $str .= substr($string, $i, 2);
                $i++;
            } else {
                $str .= substr($string, $i, 1);
            }
        }
        return $str . '...';
    } else {
        return $string;
    }
}
However, I still get problems after using it. For example, I truncate the string "use filters and layer styles to create realistic stone words":
PHP code
$str = "use filters and layer styles to create realistic stone words";
cut_str($str);
However, the output looks like this: "use filters and layer styles to create realistic stone?...", with a question mark where a character should be, which is frustrating. I looked it up online: a Chinese character generally occupies 3 bytes in UTF-8, but this function uses "$str .= substr($string, $i, 2);", which takes 2 bytes. What does that mean? I never figured it out. If I change the 2 to 3, the sentence gets even worse, with question marks after almost every character: "profit? Why? Why?? Mirror? And? Figure? Mountains?? ... Shi? Dam? The word...". I'm really stumped. Could one of the experts here help me out?
------ Solution --------------------
Why not use the mb_substr() function?
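A minimal sketch of that suggestion, assuming the mbstring extension is loaded and the text is UTF-8. The sample string and variable names are made up for illustration.

```php
<?php
// Assumption: this file is saved as UTF-8 and ext/mbstring is available.
$text = "中文字符串截取"; // hypothetical sample: 7 characters, 21 bytes

// substr() counts bytes, so 4 bytes ends in the middle of the 2nd character.
$byByte = substr($text, 0, 4);

// mb_substr() counts characters in the encoding you name.
$byChar = mb_substr($text, 0, 4, 'UTF-8');

var_dump(mb_check_encoding($byByte, 'UTF-8')); // bool(false): broken sequence
echo $byChar, "\n";                            // 中文字符
```

Passing the encoding explicitly avoids depending on whatever mb_internal_encoding() happens to be set to.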
------ Solution --------------------
You have to confirm which encoding you are using, and specify that encoding when truncating.
------ Solution --------------------
It counts in bytes. In GBK encoding, a Chinese character takes 2 bytes.
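To see the byte counts for yourself, something like the following should do. It assumes mbstring is available and that it accepts 'GBK' as an encoding name; the sample text is hypothetical.

```php
<?php
// Assumption: UTF-8 source file, ext/mbstring available.
$utf8 = "中文";                                      // hypothetical sample
$gbk  = mb_convert_encoding($utf8, 'GBK', 'UTF-8'); // same text as GBK bytes

echo strlen($utf8), "\n";             // 6: three bytes per character
echo strlen($gbk), "\n";              // 4: two bytes per character
echo mb_strlen($utf8, 'UTF-8'), "\n"; // 2
echo mb_strlen($gbk, 'GBK'), "\n";    // 2
```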
------ Solution --------------------
Of course it's mb_substr. I don't know much about encoding...
In UTF-8, 2- and 3-byte characters are very common, and every byte of a non-ASCII character has its highest bit set to 1, so there is no conflict with single-byte ASCII. GBK is similar in that its lead bytes also have the high bit set.
Use mb_substr, which splits on multi-byte character boundaries according to the encoding you specify (UTF-8 here).
------ Solution --------------------
This function is only applicable to gbk encoding.
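A rough way to see why, assuming a UTF-8 source file. The loop below copies only the byte test from cut_str() and prints the pieces as hex; the sample text is made up.

```php
<?php
// Assumption: UTF-8 source file. Mirrors only the byte test used by cut_str().
$text = "中文"; // hypothetical sample; bytes: e4 b8 ad  e6 96 87

$pieces = [];
for ($i = 0; $i < strlen($text); $i++) {
    if (ord($text[$i]) > 0xa0) {
        // cut_str() assumes "lead byte + 1 trail byte" (the GBK layout).
        $pieces[] = bin2hex(substr($text, $i, 2));
        $i++;
    } else {
        $pieces[] = bin2hex($text[$i]);
    }
}
echo implode(' ', $pieces), "\n"; // e4b8 ade6 96 87
```

None of the pieces lines up with the 3-byte characters, and $length is counted in bytes, so the cut usually lands mid-character. Changing only the 2 to 3 does not help either: $i still advances by two bytes per iteration, and UTF-8 continuation bytes above 0xa0 are mistaken for lead bytes.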
Discussion
Haha, I studied the manual and just got it working. As you said, it was a matter of confirming the encoding. But I still want to know why that function doesn't work. It looks like something from a PHP interview question. Could the more experienced people here explain, especially that 2? In UTF-8 a character can take 3 to 4 bytes, and the common ones take 3. Sorry for the trouble.
------ Solution --------------------
PHP code
/*****************************
 * subCNchar() truncates Chinese strings
 * [$str]     [string to be truncated]
 * [$start]   [starting position of the truncation]
 * [$length]  [length to be truncated]
 * [$charset] [string encoding]
 *****************************/
function subCNchar($str, $start = 0, $length, $charset = "utf-8") {
    if (strlen($str) <= $length) return $str;
    // One pattern per encoding; each alternative matches one whole character.
    $re['utf-8'] = "/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xff][\x80-\xbf]{3}/";
    $re['gb2312'] = "/[\x01-\x7f]|[\xb0-\xf7][\xa0-\xfe]/";
    $re['gbk'] = "/[\x01-\x7f]|[\x81-\xfe][\x40-\xfe]/";
    $re['big5'] = "/[\x01-\x7f]|[\x81-\xfe]([\x40-\x7e]|[\xa1-\xfe])/";
    // Split into characters, then slice by character index.
    preg_match_all($re[$charset], $str, $match);
    $slice = join("", array_slice($match[0], $start, $length));
    return $slice;
}
------ Solution --------------------
Why can't you add the "..."?
echo mb_strlen($str, 'utf-8') > 10 ? mb_substr($str, 0, 10, 'utf-8') . '...' : $str;
------ Solution --------------------
For adding "...", see the reply on the 12th floor.
If you really have to modify this function: UTF-8 encoding is quite regular. Apart from ASCII,
the first byte starts with 11, the number of consecutive 1 bits gives the total number of bytes, and each subsequent byte starts with 10.
Chinese characters are mostly in the three-byte range.
Knowing this rule, isn't it easy to write a function?
U+007F      0xxxxxxx
U+07FF      110xxxxx 10xxxxxx
U+FFFF      1110xxxx 10xxxxxx 10xxxxxx
U+1FFFFF    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
U+3FFFFFF   111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
U+7FFFFFFF  1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
Discussion
Can this function be changed to work with UTF-8? mb_substr() does not seem to be able to append "..." to the end when the string has been cut short, which spoils the result. Please help me solve this.
------ Solution --------------------
You can use the mb_strimwidth() function.
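A short sketch of how that might look, assuming mbstring is loaded and the text is UTF-8. The sample title is made up. Note that mb_strimwidth() measures display width rather than character count.

```php
<?php
// Assumption: UTF-8 source file, ext/mbstring available.
$title = "中文字符串截取函数"; // hypothetical sample, 9 full-width characters

// Width is measured in columns: a CJK character counts as 2, ASCII as 1.
// The trim marker's own width is included in the limit.
$short = mb_strimwidth($title, 0, 10, '...', 'UTF-8');

echo $short, "\n";
echo mb_strimwidth("abc", 0, 10, '...', 'UTF-8'), "\n"; // abc (fits, no marker)
```

The marker is only appended when the string actually had to be shortened, which is the behaviour asked for above.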