PHP: truncate a Chinese string starting at the first occurrence of a given substring, taking 100 characters.
As the title says: I tried the following two approaches and found that neither gives the correct result. Please point out what is wrong.
Here $word is the string to be truncated and $key_word is the given substring.
Method One:
PHP Code
mb_substr($word, strpos($word, $key_word) / 3, 100, 'utf-8');
Method Two:
PHP Code
$start_key = mb_strpos($word, $key_word);
$start_key = $start_key > 0 ? $start_key : 0;
mb_substr($word, $start_key, 100, 'utf-8');
------Solution--------------------
I found a very useful function: mb_strimwidth($str, 0, $width, '', 'UTF8'), which truncates a string by display width.
------Solution--------------------
Honestly, code written by people who do not understand character encodings is painful to read. Let me spell it out.
Remember: strstr/strpos are meant for ASCII strings. They compare byte by byte and know nothing about encodings. With GBK or UTF-8 they can still work in some cases, because the lead byte of every non-ASCII character (and in UTF-8, every byte of it) has the high bit (bit 7) set to 1, so it cannot be confused with an ASCII character. GBK, however, is prone to problems: the second byte of one 2-byte character and the first byte of the next can together produce a false match.
The mb_* functions are encoding-aware, so the offsets and lengths you pass to them and the numbers they return count characters, not bytes.
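Look at your first snippet, which uses strpos. If the data is UTF-8 that is tolerable; with other encodings there is no telling what happens. And even for UTF-8 you assume that every character is 3 bytes... That's a mistake: ASCII characters take 1 byte, and other characters take 2 to 4.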
The second snippet is more reliable, but unfortunately you did not tell mb_strpos which encoding to use (it is the fourth argument, after the offset), so it is still not right.
------Solution--------------------
That is not how the mbstring functions are meant to be used. Set the internal encoding once, then call them without an encoding argument:
mb_internal_encoding("UTF-8");
mb_substr($word, mb_strpos($word, $key_word), 100);
------Solution--------------------
PHP Code
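// String truncation in which every character counts as length 1.
// Written for UTF-8; for GBK, pass "GBK" as the encoding instead.
function cut($str, $len, $dot = '...') {
    if (mb_strlen($str, "utf-8") <= ($len + 1)) {
        // Short enough: leave the string unchanged.
        $str = $str;
    } else {
        // Keep the first $len characters and append the marker.
        $str = mb_substr($str, 0, $len, "utf-8") . $dot;
    }
    return $str;
}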