PHP: extracting 100 characters from a Chinese string, starting at the first occurrence of a given substring

Last Update:2016-06-13 Source: Internet

Author: User


How can PHP extract 100 characters from a Chinese string, starting at the first occurrence of a given substring?
As the title says: I tried the following two approaches, but neither gives the correct result. Please point out what is wrong.
Here $word is the string to extract from and $key_word is the given substring.
Method One:

PHP Code

  
   mb_substr($word, strpos($word, $key_word) / 3, 100, 'utf-8');

Method Two:

PHP Code

  
   $start_key = mb_strpos($word, $key_word);
   $start_key = $start_key > 0 ? $start_key : 0;
   mb_substr($word, $start_key, 100, 'utf-8');

------Solution--------------------
I found a very useful function: mb_strimwidth($str, 0, ..., '', 'UTF8'), which truncates a string by character width.
------Solution--------------------
Honestly, code written by people who do not understand character encodings is painful to read.

Remember that strstr/strpos are designed for ASCII strings: they compare one byte at a time and ignore the encoding, and strpos returns a byte offset. With GBK or UTF-8 text they can still find the right match in some cases, because the bytes of a non-ASCII character generally have the high bit set and so do not collide with ASCII. GBK is more error-prone, though: the second byte of one 2-byte character and the first byte of the next can together look like a different character and produce a false match.

The mb_* functions are encoding-aware, so the offsets and lengths you pass to them and the positions they return are counted in characters, not bytes.
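
To make the byte/character distinction concrete, here is a small sketch, assuming the mbstring extension is loaded and the file is saved as UTF-8. The sample strings are illustrative and not taken from the thread.

```php
<?php
// Illustrative sample: four Chinese characters (3 bytes each in UTF-8) and one ASCII comma.
$word = '你好,世界';
$key  = '世界';

$byteLength = strlen($word);                    // counts bytes: 3 + 3 + 1 + 3 + 3
$charLength = mb_strlen($word, 'UTF-8');        // counts characters

$bytePos = strpos($word, $key);                 // byte offset of the match
$charPos = mb_strpos($word, $key, 0, 'UTF-8');  // character offset of the match

echo "$byteLength bytes, $charLength characters\n";              // 13 bytes, 5 characters
echo "byte offset $bytePos, character offset $charPos\n";        // byte offset 7, character offset 3
```

Offsets from the byte-oriented functions and the mb_* functions are not interchangeable; mixing them is the source of both bugs in the question.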

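Your first snippet uses strpos. With UTF-8 the match itself is fine; with other encodings that is far less certain. But you then divide the byte offset by 3, which assumes every UTF-8 character is 3 bytes long. That is a mistake: ASCII characters take 1 byte, and other characters take 2 to 4.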

The second snippet is more reliable, but you did not tell mb_strpos which encoding to use, so it falls back to the internal encoding and can still return the wrong position.

------Solution--------------------
That is not how the mbstring functions are meant to be used. Set the internal encoding first:

   mb_internal_encoding("UTF-8");
   mb_substr($word, mb_strpos($word, $key_word), 100);
------Solution--------------------

PHP Code

   // String truncation: every character counts as length 1.
   // The encoding is fixed to UTF-8 here; pass "GBK" instead for GBK input.
   function cut($str, $len, $dot = '...') {
       if (mb_strlen($str, "utf-8") > $len + 1) {
           // Too long: keep the first $len characters and append the marker.
           $str = mb_substr($str, 0, $len, "utf-8") . $dot;
       }
       return $str;
   }

