Summary of PHP methods for truncating Chinese strings

Source: Internet
Author: User
Program one: truncating a Chinese string in PHP

The website home page and vtiger CRM often produced garbled text when truncating Chinese strings with substr(). Today I found a better way to truncate Chinese strings and am sharing it here.
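To see why a plain substr() call garbles Chinese text, consider the short sketch below. It assumes the source file and the string are UTF-8, where each common Chinese character occupies 3 bytes. The helper name is_valid_utf8 is made up for this illustration.

```php
<?php
// Hypothetical helper: preg_match() with the /u modifier returns false
// when the subject is not valid UTF-8, so it doubles as a validity check.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$text = "中文字符";            // 4 characters, 12 bytes in UTF-8

$broken = substr($text, 0, 4); // byte-based: cuts inside the second character
$whole  = substr($text, 0, 6); // happens to end on a character boundary

var_dump(is_valid_utf8($broken)); // bool(false)
var_dump(is_valid_utf8($whole));  // bool(true)
```

The half character left at the end of $broken is what shows up as garbage in the browser.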

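function msubstr($str, $start, $len) {
    // Assumes a double-byte encoding in the GB2312 style, where a byte
    // above 0xA0 is the first half of a two-byte character.
    $tmpstr = "";
    $strlen = $start + $len;
    // Scan from byte 0 so two-byte characters stay aligned, but copy
    // only from $start onward.
    for ($i = 0; $i < $strlen; $i++) {
        if (ord(substr($str, $i, 1)) > 0xa0) {
            if ($i >= $start) $tmpstr .= substr($str, $i, 2);
            $i++;
        } elseif ($i >= $start) {
            $tmpstr .= substr($str, $i, 1);
        }
    }
    return $tmpstr;
}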

Program two: truncating a UTF-8 string in PHP without splitting a character

<?php
/******************************************************************
 * Truncates a UTF-8 string in PHP without leaving half a character.
 * English letters and digits (half-width) take 1 byte (8 bits);
 * Chinese (full-width) characters take 3 bytes.
 * @return The extracted string; when $len is less than or equal to 0,
 *         the entire string is returned
 * @param $str Source string
 * @param $len Length of the left substring, in half-width units
 *             (a Chinese character counts as 2)
 ******************************************************************/
function utf_substr($str, $len)
{
    if ($len <= 0) {
        return $str;
    }
    $new_str = array();
    for ($i = 0; $i < $len && strlen($str) > 0; $i++) {
        $temp_str = substr($str, 0, 1);
        if (ord($temp_str) > 127) {
            // Multi-byte character: counts as 2 units, assumed 3 bytes long.
            $i++;
            if ($i < $len) {
                $new_str[] = substr($str, 0, 3);
                $str = substr($str, 3);
            }
        } else {
            $new_str[] = substr($str, 0, 1);
            $str = substr($str, 1);
        }
    }
    return join('', $new_str);
}
?>

Truncating a UTF-8 string in PHP

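<?php
function cutstr($string, $length) {
    // Match one well-formed UTF-8 character (1 to 4 bytes) at a time.
    preg_match_all('/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|\xe0[\xa0-\xbf][\x80-\xbf]|[\xe1-\xef][\x80-\xbf][\x80-\xbf]|\xf0[\x90-\xbf][\x80-\xbf][\x80-\xbf]|[\xf1-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf]/', $string, $info);
    $wordscut = '';
    $j = 0;
    for ($i = 0; $i < count($info[0]); $i++) {
        $wordscut .= $info[0][$i];
        // Multi-byte characters count as width 2, single-byte ones as 1.
        $j = ord($info[0][$i]) > 127 ? $j + 2 : $j + 1;
        if ($j > $length - 3) {
            return $wordscut . "...";
        }
    }
    return join('', $info[0]);
}
// Sample text, shown here in translation (the original presumably mixed
// digits with Chinese characters).
$string = "242432 Opposition is 456 committed to the extensive embassy place 7890";
// The loop bound was lost in the source; strlen($string) is assumed here.
for ($i = 0; $i < strlen($string); $i++) {
    echo cutstr($string, $i) . "\n";
}
?>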

A UTF-8 string truncation function

To support multiple languages, strings in the database may be stored as UTF-8, and web development often needs to take only part of such a string. To avoid garbled output, I wrote the UTF-8 string truncation function below.

See the UTF-8 FAQ for how UTF-8 works.

A UTF-8 encoded character may consist of 1 to 3 bytes, and the exact number can be determined from the first byte. (In theory a character can be longer, but this assumes no more than 3 bytes.)
If the first byte is 224 or greater, it and the 2 bytes after it form one UTF-8 character.
If the first byte is 192 or greater and less than 224, it and the 1 byte after it form one UTF-8 character.
Otherwise, the first byte is itself a single character: an English letter, a digit, or one of a small set of punctuation marks.
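The three rules above can be sketched as a small lookup on the first byte. The function name utf8_char_bytes is hypothetical, and like the text it assumes no character is longer than 3 bytes.

```php
<?php
// Hypothetical helper: number of bytes in the UTF-8 character that starts
// with $byte, following the three rules above (at most 3 bytes assumed).
function utf8_char_bytes(string $byte): int
{
    $code = ord($byte);
    if ($code >= 224) {
        return 3; // lead byte plus 2 following bytes
    }
    if ($code >= 192) {
        return 2; // lead byte plus 1 following byte
    }
    return 1;     // single-byte character
}

$s = "aé中"; // 1-byte, 2-byte and 3-byte characters
for ($i = 0; $i < strlen($s); $i += $n) {
    $n = utf8_char_bytes($s[$i]);
    echo substr($s, $i, $n), " => ", $n, " byte(s)\n";
}
```

Stepping through the string by these lengths is what keeps every cut on a character boundary.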

This is code I wrote earlier for a website (it is also the length-truncation function currently used on the home page):

// $sourcestr is the string to be processed
// $cutlength is the length to keep (that is, the number of characters)
function cut_str($sourcestr, $cutlength)
{
    $returnstr = '';
    $i = 0;
    $n = 0;
    $str_length = strlen($sourcestr); // number of bytes in the string
    while (($n < $cutlength) and ($i < $str_length))
    {
        $temp_str = substr($sourcestr, $i, 1);
        $ascnum = ord($temp_str); // value of the byte at position $i
        if ($ascnum >= 224) // byte value of 224 or more
        {
            $returnstr = $returnstr . substr($sourcestr, $i, 3); // per the UTF-8 encoding rules, these 3 consecutive bytes count as one character
            $i = $i + 3; // the actual byte count is 3
            $n++;        // the string length counts 1
        }
        elseif ($ascnum >= 192) // byte value of 192 or more
        {
            $returnstr = $returnstr . substr($sourcestr, $i, 2); // per the UTF-8 encoding rules, these 2 consecutive bytes count as one character
            $i = $i + 2; // the actual byte count is 2
            $n++;        // the string length counts 1
        }
        elseif ($ascnum >= 65 && $ascnum <= 90) // an uppercase letter
        {
            $returnstr = $returnstr . substr($sourcestr, $i, 1);
            $i = $i + 1; // the actual byte count is still 1
            $n++;        // but for overall appearance, an uppercase letter counts as one full-width character
        }
        else // all other cases, including lowercase letters and half-width punctuation
        {
            $returnstr = $returnstr . substr($sourcestr, $i, 1);
            $i = $i + 1;   // the actual byte count is 1
            $n = $n + 0.5; // lowercase letters, half-width punctuation and the like are half as wide as a full-width character
        }
    }
    if ($i < $str_length) {
        $returnstr = $returnstr . "..."; // add an ellipsis at the end when the string was cut short
    }
    return $returnstr;
}

A UTF-8 string truncation function that handles HTML entities

function fsubstr($title, $start, $len = "", $magic = true)
{
    // $len is the number of characters to keep. With $magic on it is a
    // display width: single-byte characters other than A-Z count as half.
    if ($len == "") $len = strlen($title);

    // If $start falls on a continuation byte, move back to the lead byte.
    if ($start != 0)
    {
        $startv = ord(substr($title, $start, 1));
        if ($startv >= 128)
        {
            if ($startv < 192)
            {
                for ($i = $start - 1; $i > 0; $i--)
                {
                    $tempv = ord(substr($title, $i, 1));
                    if ($tempv >= 192) break;
                }
                $start = $i;
            }
        }
    }

    if (strlen($title) <= $len) return substr($title, $start, $len);

    $alen = 0;    // half-width units counted so far
    $blen = 0;    // full-width units counted so far
    $length = 0;  // number of bytes to return

    $realnum = 0; // number of characters counted so far

    for ($i = $start; $i < strlen($title); $i++)
    {
        $ctype = 0;
        $cstep = 0;

        $cur = substr($title, $i, 1);
        if ($cur == "&")
        {
            if (substr($title, $i, 4) == "&lt;")
            {
                $cstep = 4;
                $length += 4;
                $i += 3;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
            else if (substr($title, $i, 4) == "&gt;")
            {
                $cstep = 4;
                $length += 4;
                $i += 3;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
            else if (substr($title, $i, 5) == "&amp;")
            {
                $cstep = 5;
                $length += 5;
                $i += 4;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
            else if (substr($title, $i, 6) == "&quot;")
            {
                $cstep = 6;
                $length += 6;
                $i += 5;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
            else if (preg_match("/&#(\d+);/i", substr($title, $i, 8), $match))
            {
                $cstep = strlen($match[0]);
                $length += strlen($match[0]);
                $i += strlen($match[0]) - 1;
                $realnum++;
                if ($magic)
                {
                    $blen++;
                    $ctype = 1;
                }
            }
            else
            {
                // A bare "&" that starts no recognized entity counts as one
                // ordinary single-byte character.
                $cstep = 1;
                $length += 1;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
        }else{
            if (ord($cur) >= 252)
            {
                $cstep = 6;
                $length += 6;
                $i += 5;
                $realnum++;
                if ($magic)
                {
                    $blen++;
                    $ctype = 1;
                }
            }elseif (ord($cur) >= 248) {
                $cstep = 5;
                $length += 5;
                $i += 4;
                $realnum++;
                if ($magic)
                {
                    $ctype = 1;
                    $blen++;
                }
            }elseif (ord($cur) >= 240) {
                $cstep = 4;
                $length += 4;
                $i += 3;
                $realnum++;
                if ($magic)
                {
                    $blen++;
                    $ctype = 1;
                }
            }elseif (ord($cur) >= 224) {
                $cstep = 3;
                $length += 3;
                $i += 2;
                $realnum++;
                if ($magic)
                {
                    $ctype = 1;
                    $blen++;
                }
            }elseif (ord($cur) >= 192) {
                $cstep = 2;
                $length += 2;
                $i += 1;
                $realnum++;
                if ($magic)
                {
                    $blen++;
                    $ctype = 1;
                }
            }elseif (ord($cur) >= 128) {
                // Stray continuation byte: keep it but do not count it.
                $length += 1;
            }else{
                $cstep = 1;
                $length += 1;
                $realnum++;
                if ($magic)
                {
                    if (ord($cur) >= 65 && ord($cur) <= 90)
                    {
                        $blen++;
                    }else{
                        $alen++;
                    }
                }
            }
        }

        if ($magic)
        {
            if (($blen * 2 + $alen) == ($len * 2)) break;
            if (($blen * 2 + $alen) == ($len * 2 + 1))
            {
                // Overshot by half a unit: drop the last full-width character.
                if ($ctype == 1)
                {
                    $length -= $cstep;
                    break;
                }else{
                    break;
                }
            }
        }else{
            if ($realnum == $len) break;
        }
    }

    unset($cur);
    unset($alen);
    unset($blen);
    unset($realnum);
    unset($ctype);
    unset($cstep);

    return substr($title, $start, $length);
}
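A different way to keep HTML entities intact, shown here only as a sketch and not as a replacement for fsubstr(), is to decode the entities first, cut by characters, and encode again. The name entity_safe_cut is hypothetical, and the input is assumed to be valid UTF-8 text that contains entities but no HTML tags.

```php
<?php
// Hypothetical helper: truncate entity-encoded text to $len characters.
function entity_safe_cut(string $html, int $len): string
{
    // Turn &lt;, &amp;, &#20013; and similar into the characters they stand for.
    $plain = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Take the first $len characters; /u makes "." match a whole UTF-8
    // character and /s lets it match newlines too.
    preg_match('/^.{0,' . max(0, $len) . '}/us', $plain, $m);
    // Re-encode so the result is safe to put back into HTML.
    return htmlspecialchars($m[0] ?? '', ENT_QUOTES, 'UTF-8');
}

echo entity_safe_cut("a &lt; b &amp;&amp; 中文标题", 8), "\n"; // a &lt; b &amp;&amp;
```

Each entity counts as one character here, so an entity can never be cut in half. Unlike fsubstr(), this sketch does not weight characters by display width.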
