Question about Chinese character string truncation in PHP?

Last Update:2017-05-14 Source: Internet

Author: User


Using the substr() function to truncate Chinese strings causes problems, so I found a function online, shown below:

PHP code

  // Chinese string truncation function
  function cut_str($string, $start, $length) {
      if (strlen($string) > $length) {
          $str = null;
          $len = $start + $length;
          for ($i = $start; $i < $len; $i++) {
              if (ord(substr($string, $i, 1)) > 0xa0) {
                  $str .= substr($string, $i, 2);
                  $i++;
              } else {
                  $str .= substr($string, $i, 1);
              }
          }
          return $str . '...';
      } else {
          return $string;
      }
  }

However, problems still occur when I use it. For example, I truncate the string "use filters and layer styles to create realistic stone words":

PHP code

  $str = "use filters and layer styles to create realistic stone words";
  cut_str($str);

However, the result is: "How to use filters and layer styles to create realistic stones ?...". Everything is fine except the last character, which comes out as a question mark, and that is depressing. I looked it up online: a Chinese character generally occupies 3 bytes in UTF-8, but this function uses "$str .= substr($string, $i, 2);", which takes 2 bytes. What does that 2 mean? I never figured it out. If I change the 2 to 3, the sentence becomes "profit? Why? Why ?? Mirror? And? Figure? Mountains ?? ? Why ?? ? Why ?? Force? Why? ? Shi? Dam? The word...", which is even worse. I give up. Could some expert please help me?

------ Solution --------------------
Why not use the mb_substr() function?
------ Solution --------------------
You have to confirm which encoding your string uses, and specify that encoding when you truncate it.
------ Solution --------------------
The function counts in bytes. In GBK encoding, one Chinese character is 2 bytes.
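The byte counts mentioned here can be checked directly. This is a small sketch, assuming a UTF-8 source file and an mbstring build that knows the 'GBK' encoding name; the sample string is only an illustration.

```php
<?php
$utf8 = "中文";                                      // two Chinese characters
$gbk  = mb_convert_encoding($utf8, 'GBK', 'UTF-8');  // same text, re-encoded as GBK

echo strlen($utf8), "\n";   // 6: three bytes per character in UTF-8
echo strlen($gbk), "\n";    // 4: two bytes per character in GBK
echo bin2hex($gbk), "\n";   // d6d0cec4: both lead bytes are above 0xa0
```

Because both GBK lead bytes here are greater than 0xa0, a test such as `ord(...) > 0xa0` followed by "take 2 bytes" splits this GBK text correctly, which is not the case for the 3-byte UTF-8 form.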
------ Solution --------------------
Of course, use mb_substr. I don't know much about encoding...

In UTF-8, non-ASCII characters commonly take 2 to 3 bytes (Chinese characters usually 3), and every byte of a multi-byte character has its highest bit set to 1, so it never conflicts with single-byte ASCII. GBK is similar, at least for the first byte of each character.

Use mb_substr, which recognizes multi-byte characters according to the encoding you give it, for example UTF-8.
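As a brief illustration of the advice above, mb_substr() takes its start and length in characters, not bytes. The sketch assumes a UTF-8 source file and the mbstring extension; the sample string is made up.

```php
<?php
$s = "中文字符串截取";   // 7 characters, 21 bytes in UTF-8

// Start at character index 2 and take 3 characters.
$part = mb_substr($s, 2, 3, 'UTF-8');
echo $part, "\n";                       // 字符串

// Passing the encoding explicitly avoids depending on mb_internal_encoding().
echo mb_strlen($part, 'UTF-8'), "\n";   // 3
```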
------ Solution --------------------
This function (cut_str) only works for GBK encoding.

Discussion

Haha, I studied the manual and just got it working. As you said, confirming the encoding is enough, but I still want to know why that function does not work. It looks like the kind of thing asked in a PHP interview. Could the experts give some advice, especially about that 2, given that a Chinese character in UTF-8 takes 3 to 4 bytes and the commonly used ones take 3? Sorry for the trouble.
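One way to see why the function misbehaves is to replay its loop and record each chunk it copies. The helper below is a hypothetical tracing sketch, not part of the original function: it mirrors the loop of cut_str() and takes the chunk width (2 or 3) as a parameter. The input is written as byte escapes so the result does not depend on the file encoding.

```php
<?php
// Mirrors the loop of cut_str(), returning each copied chunk as hex.
function trace_chunks(string $s, int $start, int $length, int $width): array
{
    $chunks = [];
    $len = $start + $length;
    for ($i = $start; $i < $len; $i++) {
        if (ord(substr($s, $i, 1)) > 0xa0) {
            $chunks[] = bin2hex(substr($s, $i, $width));
            $i++;   // together with the loop's $i++ this advances 2 bytes
        } else {
            $chunks[] = bin2hex(substr($s, $i, 1));
        }
    }
    return $chunks;
}

// Two Chinese characters in UTF-8: e4 b8 ad | e6 96 87
$s = "\xe4\xb8\xad\xe6\x96\x87";

print_r(trace_chunks($s, 0, 6, 2));   // e4b8, ade6, 96, 87
print_r(trace_chunks($s, 0, 6, 3));   // e4b8ad, ade696, 96, 87
```

With a width of 2 the chunks are contiguous, so the bytes are copied unchanged and only the cut point can land in the middle of a character, which matches the single "?" at the end. With a width of 3 the function copies 3 bytes but still advances by 2, so bytes are duplicated (`ad` and `96` appear twice) and the whole string turns into garbage.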

------ Solution --------------------

PHP code

/**
 * subCNchar(): truncate a string that contains Chinese characters
 *
 * $str      string to be truncated
 * $start    starting position of the cut, in characters
 * $length   number of characters to keep
 * $charset  string encoding
 */
function subCNchar($str, $start, $length, $charset = "UTF-8")
{
    if (strlen($str) <= $length) {
        return $str;
    }
    // One pattern per encoding; each match is exactly one whole character.
    $re['utf-8']  = "/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xff][\x80-\xbf]{3}/";
    $re['gb2312'] = "/[\x01-\x7f]|[\xb0-\xf7][\xa0-\xfe]/";
    $re['gbk']    = "/[\x01-\x7f]|[\x81-\xfe][\x40-\xfe]/";
    $re['big5']   = "/[\x01-\x7f]|[\x81-\xfe]([\x40-\x7e]|[\xa1-\xfe])/";
    // The keys are lowercase, so normalize the charset name before the lookup.
    preg_match_all($re[strtolower($charset)], $str, $match);
    // Slice the list of characters, then glue the pieces back together.
    $slice = join("", array_slice($match[0], $start, $length));
    return $slice;
}
------ Solution --------------------
Why can't you add "..."?
echo mb_strlen($str, 'utf-8') > 10 ? mb_substr($str, 0, 10, 'utf-8') . '...' : $str;
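The one-liner above can also be wrapped in a small function so the limit is not hard-coded. The name `truncate_with_ellipsis` and its parameters are made up for this sketch, which assumes a UTF-8 source file and the mbstring extension.

```php
<?php
// Returns $text unchanged if it fits, otherwise the first $max characters plus "...".
function truncate_with_ellipsis(string $text, int $max, string $encoding = 'UTF-8'): string
{
    if (mb_strlen($text, $encoding) <= $max) {
        return $text;
    }
    return mb_substr($text, 0, $max, $encoding) . '...';
}

echo truncate_with_ellipsis("中文字符串", 4), "\n";   // 中文字符...
echo truncate_with_ellipsis("中文", 4), "\n";         // 中文
```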
------ Solution --------------------
For adding "...", see the reply on the 12th floor.

If you insist on fixing this function yourself: apart from ASCII, the UTF-8 encoding is quite regular.
The first byte of a multi-byte character starts with 11; the number of consecutive 1 bits gives the total number of bytes, and each following byte starts with 10.
Chinese characters are almost all in the three-byte range.
Once you know this rule, writing the function is easy, isn't it?
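Up to U+007F      0xxxxxxx
Up to U+07FF      110xxxxx 10xxxxxx
Up to U+FFFF      1110xxxx 10xxxxxx 10xxxxxx
Up to U+1FFFFF    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
Up to U+3FFFFFF   111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
Up to U+7FFFFFFF  1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
(The five- and six-byte forms belong to the original UTF-8 design; the current definition in RFC 3629 stops at four bytes and U+10FFFF.)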




  
Discussion

Can this function be changed to work with UTF-8? mb_substr() does not seem to be able to add "..." to the end when the string has been cut short, and that spoils the result. Please help me solve this.

------ Solution --------------------
You can use the mb_strimwidth function
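A short sketch of that suggestion follows, assuming a UTF-8 source file and the mbstring extension. Note that mb_strimwidth() measures display width rather than character count: a Chinese character counts as 2 columns, and the width of the trim marker is included in the limit.

```php
<?php
$long  = "中文字符串截取";   // 7 characters, display width 14
$short = "中文";             // 2 characters, display width 4

// Limit both strings to 9 columns, appending "..." only when trimming happens.
$a = mb_strimwidth($long, 0, 9, '...', 'UTF-8');
$b = mb_strimwidth($short, 0, 9, '...', 'UTF-8');

echo $a, "\n";   // trimmed, ends with "...", at most 9 columns wide
echo $b, "\n";   // 中文 (already fits, returned unchanged)
```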

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
