Summary of methods for truncating Chinese strings in PHP


Last Update:2016-07-21 Source: Internet

Author: User


Method 1: Truncating a Chinese string in PHP

Truncating Chinese text with substr often produced garbled characters on our home page and in vtiger CRM, because substr cuts by bytes and can split a multi-byte character. Today I found a better way to truncate Chinese strings, which I am sharing here.

function msubstr($str, $start, $len) {
    // Intended for double-byte encodings such as GB2312/GBK, not UTF-8.
    $tmpstr = "";
    $strlen = $start + $len;
    for ($i = 0; $i < $strlen; $i++) {
        if (ord(substr($str, $i, 1)) > 0xa0) {
            // Lead byte of a two-byte character: keep both bytes together.
            if ($i >= $start) $tmpstr .= substr($str, $i, 2);
            $i++;
        } else {
            if ($i >= $start) $tmpstr .= substr($str, $i, 1);
        }
    }
    return $tmpstr;
}
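
The garbling described above comes from substr() counting bytes rather than characters. For UTF-8 text, one way to stay within a byte budget without splitting a character is to step back from the cut point until it no longer falls on a continuation byte. The sketch below is illustrative only: `utf8_byte_cut` is a hypothetical helper name, not part of the original article, and it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: cut to at most $maxBytes bytes without splitting a UTF-8 character.
function utf8_byte_cut(string $str, int $maxBytes): string
{
    if ($maxBytes <= 0) {
        return '';
    }
    if (strlen($str) <= $maxBytes) {
        return $str;
    }
    $end = $maxBytes;
    // Bytes 0x80-0xBF are continuation bytes. If the first excluded byte is one,
    // the cut lies inside a character, so move the cut point back.
    while ($end > 0 && (ord($str[$end]) & 0xC0) === 0x80) {
        $end--;
    }
    return substr($str, 0, $end);
}

$text = "\u{4e2d}\u{6587}abc";                    // two 3-byte characters followed by "abc"
$naive = substr($text, 0, 4);                     // ends inside the second character
$naiveIsValid = preg_match('//u', $naive) === 1;  // the /u flag rejects invalid UTF-8
$safe = utf8_byte_cut($text, 4);                  // keeps only the first whole character
```

Here `$naiveIsValid` is false because the naive cut leaves one stray byte, while `$safe` holds just the first character.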

Method 2: Truncating a UTF-8 string in PHP without splitting a character in half

<?php
/******************************************************************
 * Truncates a UTF-8 string without cutting a character in half.
 * ASCII letters and digits (half-width) take 1 byte; Chinese (full-width) characters take 3 bytes.
 * @param  $str  Source string
 * @param  $len  Length of the left substring (an ASCII character counts as 1, a Chinese character as 2)
 * @return The extracted string; when $len is less than or equal to 0, the entire string is returned
 ******************************************************************/
function utf_substr($str, $len)
{
    if ($len <= 0) return $str;
    $new_str = array();
    for ($i = 0; $i < $len && strlen($str) > 0; $i++)
    {
        $temp_str = substr($str, 0, 1);
        if (ord($temp_str) > 127)
        {
            // Every non-ASCII character is assumed to be 3 bytes long and counts as 2.
            $i++;
            if ($i < $len)
            {
                $new_str[] = substr($str, 0, 3);
                $str = substr($str, 3);
            }
        }
        else
        {
            $new_str[] = substr($str, 0, 1);
            $str = substr($str, 1);
        }
    }
    return implode('', $new_str);
}
?>
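
When the goal is simply "the first N characters", PHP's mbstring extension offers mb_substr(), which counts characters instead of bytes. The following is a minimal sketch, not the article's method: `utf8_left` is a hypothetical name, and the fallback branch assumes valid UTF-8 input and relies on PCRE's /u modifier when mbstring is not loaded.

```php
<?php
// Hypothetical helper: return the first $count characters of a UTF-8 string.
function utf8_left(string $str, int $count): string
{
    if ($count <= 0) {
        return '';
    }
    if (function_exists('mb_substr')) {
        // mb_substr() counts characters, so a multi-byte character is never split.
        return mb_substr($str, 0, $count, 'UTF-8');
    }
    // Fallback: with /u, "." matches one whole code point; /s lets it match newlines.
    // PCRE limits a repeat count to 65535, so the count is capped.
    $count = min($count, 65535);
    if (preg_match('/^.{0,' . $count . '}/us', $str, $m) !== 1) {
        return ''; // invalid UTF-8 input
    }
    return $m[0];
}
```

Unlike utf_substr above, this counts every character as 1 regardless of its display width.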

Truncating a UTF-8 string in PHP with a regular expression

<?php
function cutstr($string, $length) {
    // Each alternative in the pattern matches one complete UTF-8 character (1 to 4 bytes).
    preg_match_all("/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|\xe0[\xa0-\xbf][\x80-\xbf]|[\xe1-\xef][\x80-\xbf][\x80-\xbf]|\xf0[\x90-\xbf][\x80-\xbf][\x80-\xbf]|[\xf1-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf]/", $string, $info);
    $wordscut = "";
    $j = 0;
    for ($i = 0; $i < count($info[0]); $i++) {
        $wordscut .= $info[0][$i];
        // A multi-byte character counts as 2 units of width, an ASCII character as 1.
        $j = ord($info[0][$i]) > 127 ? $j + 2 : $j + 1;
        if ($j > $length - 3) {
            return $wordscut . "...";
        }
    }
    return join('', $info[0]);
}
$string = "242432 Opposition is 456 committed to the extensive embassy place 7890";
for ($i = 0; $i < strlen($string); $i++)
{
    echo cutstr($string, $i) . "\n";
}
?>
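
The same idea can be written more compactly with PCRE's /u modifier, which makes "." match one whole UTF-8 character. The sketch below is an illustration under assumptions, not a drop-in replacement for cutstr: `truncate_width` is a hypothetical name, multi-byte characters are assumed to be 2 units wide and ASCII characters 1, and, unlike cutstr, the ellipsis is appended only when something was actually cut.

```php
<?php
// Hypothetical helper: limit a UTF-8 string to $maxWidth display units.
function truncate_width(string $text, int $maxWidth, string $ellipsis = '...'): string
{
    // Split into whole characters; returns '' for empty or invalid UTF-8 input.
    if (!preg_match_all('/./us', $text, $m)) {
        return '';
    }
    $chars = $m[0];
    // Assumed width rule: multi-byte characters are 2 units wide, ASCII is 1.
    $width = function (string $ch): int {
        return strlen($ch) > 1 ? 2 : 1;
    };
    if (array_sum(array_map($width, $chars)) <= $maxWidth) {
        return $text; // already fits, no ellipsis needed
    }
    $budget = $maxWidth - strlen($ellipsis); // reserve room for the ellipsis
    $out = '';
    $used = 0;
    foreach ($chars as $ch) {
        $w = $width($ch);
        if ($used + $w > $budget) {
            break;
        }
        $out .= $ch;
        $used += $w;
    }
    return $out . $ellipsis;
}
```

For example, `truncate_width("abcdefgh", 5)` keeps two letters and appends the three-character ellipsis, so the result is exactly 5 units wide.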

A UTF-8 string truncation function

To support multiple languages, strings in a database are often stored as UTF-8, and web development frequently requires extracting part of such a string. To avoid garbled output, I wrote the following UTF-8 truncation function.

For how UTF-8 works, see the UTF-8 FAQ.

A UTF-8 character may consist of 1 to 3 bytes, and the exact number can be determined from the first byte. (In theory a character can be longer, but this discussion assumes no more than 3 bytes.)
If the first byte is 224 or greater, it and the 2 bytes after it form one UTF-8 character.
If the first byte is 192 or greater but less than 224, it and the 1 byte after it form one UTF-8 character.
Otherwise, the first byte is itself an English character (including digits and a small number of punctuation marks).
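
The rules above can be sketched as a small lookup on the first byte. This is an illustration only: `utf8_sequence_length` is a hypothetical name, and the 4-byte case (first byte 240 or greater) and the continuation-byte case go beyond the 3-byte assumption made in the text.

```php
<?php
// Hypothetical helper: number of bytes in the UTF-8 character that starts with $byte.
function utf8_sequence_length(string $byte): int
{
    $value = ord($byte);
    if ($value >= 240) {
        return 4; // 4-byte sequence, outside the 3-byte assumption in the text
    }
    if ($value >= 224) {
        return 3; // first byte plus 2 following bytes
    }
    if ($value >= 192) {
        return 2; // first byte plus 1 following byte
    }
    if ($value >= 128) {
        return 0; // continuation byte: not the start of a character
    }
    return 1;     // plain ASCII
}
```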

The following code was originally written for a website (it is also the function now used to limit text length on the home page).

// $sourcestr is the string to be processed
// $cutlength is the length to keep (that is, the number of characters)
function cut_str($sourcestr, $cutlength)
{
    $returnstr = '';
    $i = 0;
    $n = 0;
    $str_length = strlen($sourcestr); // number of bytes in the string
    while (($n < $cutlength) and ($i < $str_length))
    {
        $temp_str = substr($sourcestr, $i, 1);
        $ascnum = ord($temp_str); // value of the byte at offset $i
        if ($ascnum >= 224) // first byte is 224 or higher
        {
            $returnstr = $returnstr . substr($sourcestr, $i, 3); // per UTF-8, these 3 consecutive bytes form one character
            $i = $i + 3; // actual byte count is 3
            $n++; // counts as 1 character
        }
        elseif ($ascnum >= 192) // first byte is 192 or higher
        {
            $returnstr = $returnstr . substr($sourcestr, $i, 2); // per UTF-8, these 2 consecutive bytes form one character
            $i = $i + 2; // actual byte count is 2
            $n++; // counts as 1 character
        }
        elseif ($ascnum >= 65 && $ascnum <= 90) // uppercase letter
        {
            $returnstr = $returnstr . substr($sourcestr, $i, 1);
            $i = $i + 1; // actual byte count is still 1
            $n++; // for visual balance, an uppercase letter counts as one full character
        }
        else // everything else, including lowercase letters and half-width punctuation
        {
            $returnstr = $returnstr . substr($sourcestr, $i, 1);
            $i = $i + 1; // actual byte count is 1
            $n = $n + 0.5; // lowercase letters, half-width punctuation and the like count as half a character
        }
    }
    if ($i < $str_length) {
        $returnstr = $returnstr . "..."; // append an ellipsis when the string was actually cut
    }
    return $returnstr;
}
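
The counting rule used above (multi-byte characters and uppercase letters count as 1, all other single bytes as 0.5) can be isolated into its own function, which makes it easy to check whether a string needs truncating at all. This is a sketch under the same 3-byte assumption as the text; `display_weight` is a hypothetical name that does not appear in the original code.

```php
<?php
// Hypothetical helper: total display weight of a UTF-8 string under the rule above.
function display_weight(string $text): float
{
    $weight = 0.0;
    $i = 0;
    $bytes = strlen($text);
    while ($i < $bytes) {
        $lead = ord($text[$i]);
        if ($lead >= 224) {           // 3-byte character
            $weight += 1;
            $i += 3;
        } elseif ($lead >= 192) {     // 2-byte character
            $weight += 1;
            $i += 2;
        } elseif ($lead >= 65 && $lead <= 90) { // uppercase A-Z
            $weight += 1;
            $i += 1;
        } else {                      // lowercase, digits, half-width punctuation
            $weight += 0.5;
            $i += 1;
        }
    }
    return $weight;
}
```

A caller could compare `display_weight($title)` with the intended limit before deciding whether to truncate and add an ellipsis.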

A UTF-8 string truncation function that also handles HTML entities

function Fsubstr($title, $start, $len = "", $magic = true)
{
    // With $magic enabled, $len is measured in full-width characters: multi-byte characters,
    // numeric entities, and A-Z count as 1; other ASCII characters and named entities count as 1/2.
    if ($len == "") $len = strlen($title);

    // If $start points at a UTF-8 continuation byte (128-191), move back to the lead byte.
    if ($start != 0)
    {
        $startv = ord(substr($title, $start, 1));
        if ($startv >= 128)
        {
            if ($startv < 192)
            {
                for ($i = $start - 1; $i > 0; $i--)
                {
                    $tempv = ord(substr($title, $i, 1));
                    if ($tempv >= 192) break;
                }
                $start = $i;
            }
        }
    }

    if (strlen($title) <= $len) return substr($title, $start, $len);

    $alen = 0;    // half-width characters seen
    $blen = 0;    // full-width characters seen
    $realnum = 0; // characters seen, regardless of width
    $length = 0;  // number of bytes to return

    for ($i = $start; $i < strlen($title); $i++)
    {
        $ctype = 0; // 1 when the current character is full-width
        $cstep = 0; // byte length of the current character

        $cur = substr($title, $i, 1);
        if ($cur == "&")
        {
            // An HTML entity counts as a single character.
            if (substr($title, $i, 4) == "&lt;")
            {
                $cstep = 4; $length += 4; $i += 3; $realnum++;
                if ($magic) $alen++;
            }
            else if (substr($title, $i, 4) == "&gt;")
            {
                $cstep = 4; $length += 4; $i += 3; $realnum++;
                if ($magic) $alen++;
            }
            else if (substr($title, $i, 5) == "&amp;")
            {
                $cstep = 5; $length += 5; $i += 4; $realnum++;
                if ($magic) $alen++;
            }
            else if (substr($title, $i, 6) == "&quot;")
            {
                $cstep = 6; $length += 6; $i += 5; $realnum++;
                if ($magic) $alen++;
            }
            else if (preg_match("/^&#(\d+);/i", substr($title, $i, 8), $match))
            {
                $cstep = strlen($match[0]);
                $length += strlen($match[0]);
                $i += strlen($match[0]) - 1;
                $realnum++;
                if ($magic) { $blen++; $ctype = 1; }
            }
            else
            {
                // A bare ampersand counts as an ordinary half-width character.
                $cstep = 1; $length += 1; $realnum++;
                if ($magic) $alen++;
            }
        }
        else
        {
            if (ord($cur) >= 252)
            {
                $cstep = 6; $length += 6; $i += 5; $realnum++;
                if ($magic) { $blen++; $ctype = 1; }
            }
            elseif (ord($cur) >= 248)
            {
                $cstep = 5; $length += 5; $i += 4; $realnum++;
                if ($magic) { $blen++; $ctype = 1; }
            }
            elseif (ord($cur) >= 240)
            {
                $cstep = 4; $length += 4; $i += 3; $realnum++;
                if ($magic) { $blen++; $ctype = 1; }
            }
            elseif (ord($cur) >= 224)
            {
                $cstep = 3; $length += 3; $i += 2; $realnum++;
                if ($magic) { $blen++; $ctype = 1; }
            }
            elseif (ord($cur) >= 192)
            {
                $cstep = 2; $length += 2; $i += 1; $realnum++;
                if ($magic) { $blen++; $ctype = 1; }
            }
            elseif (ord($cur) >= 128)
            {
                // Stray continuation byte: keep it but do not count it as a character.
                $length += 1;
            }
            else
            {
                $cstep = 1; $length += 1; $realnum++;
                if ($magic)
                {
                    if (ord($cur) >= 65 && ord($cur) <= 90)
                    {
                        $blen++; // uppercase letters count as full-width
                    }
                    else
                    {
                        $alen++;
                    }
                }
            }
        }

        if ($magic)
        {
            if (($blen * 2 + $alen) == ($len * 2)) break;
            if (($blen * 2 + $alen) == ($len * 2 + 1))
            {
                // A full-width character overshot the limit by half a unit: drop it.
                if ($ctype == 1) $length -= $cstep;
                break;
            }
        }
        else
        {
            if ($realnum == $len) break;
        }
    }

    unset($cur, $alen, $blen, $realnum, $ctype, $cstep);

    return substr($title, $start, $length);
}
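
The entity handling above treats a sequence such as `&lt;` as one character so that a cut never lands inside it. A compact way to express the same idea is to split the text into units with a single regular expression and then keep the first few units. This is a minimal sketch, not the author's implementation: `split_units` and `entity_safe_left` are hypothetical names, only the entities recognised by the function above are handled, and every unit counts as 1 (there is no width weighting).

```php
<?php
// Hypothetical helper: split text into units, where a unit is either one of the
// recognised HTML entities or a single UTF-8 character.
function split_units(string $html): array
{
    // Alternatives are tried left to right: a known entity first, then any one character.
    preg_match_all('/&(?:lt|gt|amp|quot|#\d+);|./us', $html, $m);
    return $m[0] ?? []; // empty array for empty or invalid UTF-8 input
}

// Hypothetical helper: keep the first $count units without splitting an entity.
function entity_safe_left(string $html, int $count): string
{
    return implode('', array_slice(split_units($html), 0, max(0, $count)));
}
```

With this approach, `entity_safe_left("a&lt;b", 2)` keeps the letter and the whole entity rather than stopping after the ampersand.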

Original article: http://www.bkjia.com/PHPjc/364371.html



This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
