Program one: truncating Chinese strings in PHP
On the site's home page and in vTiger CRM, truncating Chinese text with substr often produces garbled characters, because substr counts bytes and can cut a double-byte character in half. Today I found a better way to truncate Chinese strings, and I am sharing it here.
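function msubstr($str, $start, $len) {
    $tmpstr = "";
    // $start and $len are byte offsets; never read past the end of the string.
    $strlen = min($start + $len, strlen($str));
    // Scan from byte 0 so that double-byte characters before $start are stepped over whole.
    for ($i = 0; $i < $strlen; $i++) {
        if (ord(substr($str, $i, 1)) > 0xa0) {
            // A lead byte above 0xa0 starts a two-byte Chinese character.
            if ($i >= $start) $tmpstr .= substr($str, $i, 2);
            $i++;
        } else {
            if ($i >= $start) $tmpstr .= substr($str, $i, 1);
        }
    }
    return $tmpstr;
}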
Program two: truncating a UTF-8 string in PHP without splitting a character in half
/******************************************************************
 * Truncates a UTF-8 string in PHP without leaving half a character.
 * English letters and digits (half-width) take 1 byte (8 bits); Chinese (full-width) characters take 3 bytes.
 * @param  $str Source string
 * @param  $len Length of the left substring; a half-width character counts as 1, a multi-byte character as 2
 * @return The extracted string; when $len is less than or equal to 0, the entire string is returned
 ******************************************************************/
function utf_substr($str, $len)
{
    if ($len <= 0) {
        return $str;
    }
    $new_str = array();
    for ($i = 0; $i < $len && strlen($str) > 0; $i++)
    {
        $temp_str = substr($str, 0, 1);
        if (ord($temp_str) > 127)
        {
            // Multi-byte character: assumed to be 3 bytes long and counted as 2 units.
            $i++;
            if ($i < $len)
            {
                $new_str[] = substr($str, 0, 3);
                $str = substr($str, 3);
            }
        }
        else
        {
            $new_str[] = substr($str, 0, 1);
            $str = substr($str, 1);
        }
    }
    return join($new_str);
}
PHP UTF-8 string truncation
function cutstr($string, $length) {
    // Split the string into UTF-8 characters of 1 to 4 bytes each.
    preg_match_all('/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|\xe0[\xa0-\xbf][\x80-\xbf]|[\xe1-\xef][\x80-\xbf][\x80-\xbf]|\xf0[\x90-\xbf][\x80-\xbf][\x80-\xbf]|[\xf1-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf]/', $string, $info);
    $wordscut = '';
    $j = 0;
    for ($i = 0; $i < count($info[0]); $i++) {
        $wordscut .= $info[0][$i];
        // A multi-byte character counts as 2 units of width, an ASCII character as 1.
        $j = ord($info[0][$i]) > 127 ? $j + 2 : $j + 1;
        // Leave room for the three dots of the ellipsis.
        if ($j > $length - 3) {
            return $wordscut . "...";
        }
    }
    return join('', $info[0]);
}
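$string = "242432 Opposition is 456 committed to the extensive embassy place 7890";
// Print the result for every target length from 0 up to the byte length of the string.
for ($i = 0; $i < strlen($string); $i++)
{
    echo cutstr($string, $i) . "\n";
}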
UTF-8 string truncation function
To support multiple languages, strings in the database may be stored as UTF-8, and web development often needs to cut out part of such a string. To avoid garbled output, I wrote the following UTF-8 string truncation function.
For background on how UTF-8 works, see the UTF-8 FAQ.
A UTF-8 character is treated here as one to three bytes long, and the exact number can be determined from the first byte. (Sequences can be longer, up to four bytes in the current standard, but this function assumes no more than three.)
If the first byte is greater than or equal to 224, it forms one UTF-8 character together with the 2 bytes after it.
If the first byte is greater than or equal to 192 and less than 224, it forms one UTF-8 character together with the 1 byte after it.
Otherwise, the first byte is a character by itself, such as an English letter, a digit, or a common punctuation mark.
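The three rules above can be sketched as a small lookup on the first byte. The function name utf8_char_bytes is made up for illustration, and like the rules themselves it assumes no character is longer than three bytes and that the byte passed in really is the first byte of a character.

```php
<?php
// Hypothetical helper: number of bytes in the UTF-8 character that starts
// with the given lead byte, following the three rules above.
function utf8_char_bytes($leadByte)
{
    $value = ord($leadByte);
    if ($value >= 224) {
        return 3; // lead byte plus 2 following bytes
    }
    if ($value >= 192) {
        return 2; // lead byte plus 1 following byte
    }
    return 1;     // single-byte character
}

foreach (array('A', 'é', '中') as $char) {
    // $char[0] is the first byte of the character.
    echo $char, ' => ', utf8_char_bytes($char[0]), "\n";
}
```

With the file saved as UTF-8, the loop reports 1, 2 and 3 bytes for the three sample characters.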
The code below was written earlier for a website (it is also the function now used to truncate text on the home page).
// $sourcestr is the string to be processed
// $cutlength is the length to keep (that is, the number of characters)
function cut_str($sourcestr, $cutlength)
{
    $returnstr = '';
    $i = 0;
    $n = 0;
    $str_length = strlen($sourcestr); // number of bytes in the string
    while (($n < $cutlength) and ($i < $str_length))
    {
        $temp_str = substr($sourcestr, $i, 1);
        $ascnum = ord($temp_str); // value of the byte at position $i
        if ($ascnum >= 224) // the byte value is 224 or higher
        {
            $returnstr = $returnstr . substr($sourcestr, $i, 3); // per the UTF-8 encoding rules, 3 consecutive bytes form a single character
            $i = $i + 3; // the actual byte count is 3
            $n++; // the string length counts 1
        }
        elseif ($ascnum >= 192) // the byte value is 192 or higher
        {
            $returnstr = $returnstr . substr($sourcestr, $i, 2); // per the UTF-8 encoding rules, 2 consecutive bytes form a single character
            $i = $i + 2; // the actual byte count is 2
            $n++; // the string length counts 1
        }
        elseif ($ascnum >= 65 && $ascnum <= 90) // an uppercase letter
        {
            $returnstr = $returnstr . substr($sourcestr, $i, 1);
            $i = $i + 1; // the actual byte count is still 1
            $n++; // but for the sake of appearance, an uppercase letter counts as one full-width character
        }
        else // everything else, including lowercase letters and half-width punctuation
        {
            $returnstr = $returnstr . substr($sourcestr, $i, 1);
            $i = $i + 1; // the actual byte count is 1
            $n = $n + 0.5; // lowercase letters and half-width punctuation are about half as wide as a full-width character
        }
    }
    if ($i < $str_length) { // compare byte positions, so the ellipsis appears only when something was cut off
        $returnstr = $returnstr . "..."; // add an ellipsis at the end when the length is exceeded
    }
    return $returnstr;
}
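The idea of weighting half-width characters less than full-width ones also exists in the mbstring extension as mb_strimwidth. The following sketch is an alternative to the function above, not a reproduction of it: it assumes mbstring is loaded, measures in display columns (1 for half-width characters, 2 for full-width East Asian characters, with no special case for uppercase letters), and the wrapper name width_cut is hypothetical.

```php
<?php
// Width-based truncation with mbstring (assumes the extension is loaded).
// The trim marker "..." counts toward the requested width.
function width_cut($text, $columns)
{
    return mb_strimwidth($text, 0, $columns, '...', 'UTF-8');
}

echo width_cut('Hello World', 10), "\n"; // Hello W...

$cut = width_cut('中文字符串截取示例', 10);
var_dump(mb_strwidth($cut, 'UTF-8') <= 10); // the result fits in 10 columns
```

A string that already fits is returned unchanged, without the marker.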
UTF-8 string truncation function with HTML entity handling
function fsubstr($title, $start, $len = "", $magic = true)
{
    if ($len == "") $len = strlen($title);
    if ($start != 0)
    {
        $startv = ord(substr($title, $start, 1));
        if ($startv >= 128)
        {
            if ($startv < 192)
            {
                // $start points at a continuation byte: back up to the lead byte.
                for ($i = $start - 1; $i > 0; $i--)
                {
                    $tempv = ord(substr($title, $i, 1));
                    if ($tempv >= 192) break;
                }
                $start = $i;
            }
        }
    }
    if (strlen($title) <= $len) return substr($title, $start, $len);
    $alen = 0;    // narrow characters seen (1 unit each)
    $blen = 0;    // wide characters seen (2 units each)
    $realnum = 0; // characters seen, whatever their width
    $length = 0;  // number of bytes to return
    for ($i = $start; $i < strlen($title); $i++)
    {
        $ctype = 0; // 1 when the current character is a wide one that may be dropped on overflow
        $cstep = 0; // byte length of the current character
        $cur = substr($title, $i, 1);
        if ($cur == "&")
        {
            if (substr($title, $i, 4) == "&lt;")
            {
                $cstep = 4;
                $length += 4;
                $i += 3;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
            else if (substr($title, $i, 4) == "&gt;")
            {
                $cstep = 4;
                $length += 4;
                $i += 3;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
            else if (substr($title, $i, 5) == "&amp;")
            {
                $cstep = 5;
                $length += 5;
                $i += 4;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
            else if (substr($title, $i, 6) == "&quot;")
            {
                $cstep = 6;
                $length += 6;
                $i += 5;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
            else if (preg_match("/^&#(\d+);/i", substr($title, $i, 8), $match))
            {
                // Numeric entity such as &#20013;, treated as a wide character.
                $cstep = strlen($match[0]);
                $length += strlen($match[0]);
                $i += strlen($match[0]) - 1;
                $realnum++;
                if ($magic)
                {
                    $blen++;
                    $ctype = 1;
                }
            }
            else
            {
                // A bare "&" that starts no known entity counts as one narrow character.
                $cstep = 1;
                $length += 1;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
        }else{
            if (ord($cur) >= 252)
            {
                $cstep = 6;
                $length += 6;
                $i += 5;
                $realnum++;
                if ($magic)
                {
                    $blen++;
                    $ctype = 1;
                }
            }elseif (ord($cur) >= 248) {
                $cstep = 5;
                $length += 5;
                $i += 4;
                $realnum++;
                if ($magic)
                {
                    $ctype = 1;
                    $blen++;
                }
            }elseif (ord($cur) >= 240) {
                $cstep = 4;
                $length += 4;
                $i += 3;
                $realnum++;
                if ($magic)
                {
                    $blen++;
                    $ctype = 1;
                }
            }elseif (ord($cur) >= 224) {
                $cstep = 3;
                $length += 3;
                $i += 2;
                $realnum++;
                if ($magic)
                {
                    $ctype = 1;
                    $blen++;
                }
            }elseif (ord($cur) >= 192) {
                $cstep = 2;
                $length += 2;
                $i += 1;
                $realnum++;
                if ($magic)
                {
                    $blen++;
                    $ctype = 1;
                }
            }elseif (ord($cur) >= 128) {
                // Stray continuation byte: keep it but do not count it as a character.
                $length += 1;
            }else{
                $cstep = 1;
                $length += 1;
                $realnum++;
                if ($magic)
                {
                    // Uppercase letters (65 to 90) are counted as wide, other ASCII as narrow.
                    if (ord($cur) >= 65 && ord($cur) <= 90)
                    {
                        $blen++;
                    }else{
                        $alen++;
                    }
                }
            }
        }
        if ($magic)
        {
            // $len is measured in wide characters, so the budget is $len * 2 units.
            if (($blen * 2 + $alen) == ($len * 2)) break;
            if (($blen * 2 + $alen) == ($len * 2 + 1))
            {
                // Overshot by one unit: drop the last character if it was a wide one.
                if ($ctype == 1)
                {
                    $length -= $cstep;
                }
                break;
            }
        }else{
            if ($realnum == $len) break;
        }
    }
    return substr($title, $start, $length);
}
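A much shorter route to entity-aware truncation may be to decode the entities first, cut by characters, and escape the result again. The sketch below is an alternative under stated assumptions, not the function above: it needs the mbstring extension, expects UTF-8 input, counts every character as 1 (no wide or narrow weighting), and the name entity_safe_cut is hypothetical.

```php
<?php
// Hypothetical alternative: decode entities, cut by characters, re-encode.
function entity_safe_cut($html, $start, $len)
{
    // Turn "&lt;", "&amp;", "&#20013;" and so on into real characters first,
    // so each one counts as exactly one character.
    $plain = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
    $cut   = mb_substr($plain, $start, $len, 'UTF-8');
    // Escape again so the result is safe to print in HTML.
    return htmlspecialchars($cut, ENT_QUOTES, 'UTF-8');
}

echo entity_safe_cut('a&lt;b&amp;c', 0, 3), "\n"; // a&lt;b
```

One visible difference is that numeric entities come back as real characters rather than as entities, which is fine for a UTF-8 page.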