Comparing the Discuz and Ecshop String Truncation Functions (PHP)

Below is the source code of both functions, together with some simple tests. At the end I give a more practical string truncation function of my own. Note that the truncation problem discussed here concerns UTF-8 encoded Chinese strings.
Discuz version
The code is as follows:
/**
 * [Discuz] Truncates a string without relying on extensions such as mb_substr.
 * Each Chinese (multibyte) character counts as 2 characters.
 * @param string $string the string to truncate
 * @param int    $length number of characters to keep
 * @param string $dot    string appended to the truncated result
 * @return string the truncated string
 */
function cutstr($string, $length, $dot = ' ...') {
    // If the string is not longer than the requested length, return it directly.
    // Using strlen() here is a serious flaw: it counts bytes. To keep, say, 4 Chinese
    // characters, you would have to know how many bytes those 4 characters occupy;
    // otherwise the result may be the whole string followed by " ...".
    if (strlen($string) <= $length) {
        return $string;
    }

    // Temporarily replace HTML entities with placeholders wrapped in chr(1),
    // so that an entity is never cut in half.
    $pre = chr(1);
    $end = chr(1);
    $string = str_replace(array('&amp;', '&quot;', '&lt;', '&gt;'), array($pre . '&' . $end, $pre . '"' . $end, $pre . '<' . $end, $pre . '>' . $end), $string);

    $strcut = ''; // initialize the return value

    // UTF-8 branch. The check is incomplete: the charset could also be spelled "utf8".
    // CHARSET is a constant defined elsewhere in Discuz.
    if (strtolower(CHARSET) == 'utf-8') {
        // $n: byte pointer, $tn: byte length of the last character, $noc: characters counted
        $n = $tn = $noc = 0;
        while ($n < strlen($string)) {
            $t = ord($string[$n]);
            if ($t == 9 || $t == 10 || (32 <= $t && $t <= 126)) {
                // Tab, newline or printable ASCII: move 1 byte, the last character is 1 byte
                $tn = 1;
                $n++;
                $noc++;
            } elseif (194 <= $t && $t <= 223) {
                // Lead byte of a 2-byte character: move 2 bytes, the last character is 2 bytes
                $tn = 2;
                $n += 2;
                $noc += 2;
            } elseif (224 <= $t && $t <= 239) {
                // Lead byte of a 3-byte character (most Chinese characters): move 3 bytes
                $tn = 3;
                $n += 3;
                $noc += 2;
            } elseif (240 <= $t && $t <= 247) {
                // 4-byte character
                $tn = 4;
                $n += 4;
                $noc += 2;
            } elseif (248 <= $t && $t <= 251) {
                // 5-byte form (obsolete since RFC 3629 limited UTF-8 to 4 bytes)
                $tn = 5;
                $n += 5;
                $noc += 2;
            } elseif ($t == 252 || $t == 253) {
                // 6-byte form (also obsolete)
                $tn = 6;
                $n += 6;
                $noc += 2;
            } else {
                // Any other byte: skip it without counting
                $n++;
            }
            // Leave the loop once the requested number of characters is reached
            if ($noc >= $length) {
                break;
            }
        }
        // If the last character overshot the limit, drop it to make room for $dot
        if ($noc > $length) {
            $n -= $tn;
        }
        $strcut = substr($string, 0, $n);
    } else {
        // Not UTF-8 (e.g. GBK): a byte above 127 starts a 2-byte full-width character
        for ($i = 0; $i < $length; $i++) {
            $strcut .= ord($string[$i]) > 127 ? $string[$i] . $string[++$i] : $string[$i];
        }
    }

    // Restore the original HTML entities
    $strcut = str_replace(array($pre . '&' . $end, $pre . '"' . $end, $pre . '<' . $end, $pre . '>' . $end), array('&amp;', '&quot;', '&lt;', '&gt;'), $strcut);

    // A remaining chr(1) means a placeholder was cut in half: drop it
    $pos = strrpos($strcut, chr(1));
    if ($pos !== false) {
        $strcut = substr($strcut, 0, $pos);
    }
    return $strcut . $dot; // finally, append $dot to the truncated result
}
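The lead-byte ranges tested in cutstr determine how many bytes each UTF-8 character occupies. The sketch below pulls that mapping into a hypothetical helper, utf8_seq_len(), which is limited to the 1 to 4 byte forms allowed by RFC 3629. It then uses the helper to count characters, which shows why strlen() and a character count disagree for Chinese text. Unlike cutstr, it counts every ASCII byte (control characters included) as one character.

```php
<?php
// Hypothetical helper: byte length of the UTF-8 sequence whose lead byte is $t.
// The multibyte ranges match cutstr(); 5- and 6-byte forms are omitted.
// Returns 0 for a continuation byte or an invalid lead byte.
function utf8_seq_len(int $t): int {
    if ($t < 128) {
        return 1;                       // ASCII
    } elseif ($t >= 194 && $t <= 223) {
        return 2;
    } elseif ($t >= 224 && $t <= 239) {
        return 3;                       // most Chinese characters
    } elseif ($t >= 240 && $t <= 247) {
        return 4;
    }
    return 0;
}

// Hypothetical helper: count characters by jumping from lead byte to lead byte.
function utf8_char_count(string $s): int {
    $count = 0;
    $n = 0;
    $bytes = strlen($s);
    while ($n < $bytes) {
        $len = utf8_seq_len(ord($s[$n]));
        if ($len > 0) {
            $count++;
            $n += $len;
        } else {
            $n++;                       // skip a stray byte without counting it
        }
    }
    return $count;
}

$s = "欲穷千里目";
echo strlen($s), "\n";          // 15 (bytes)
echo utf8_char_count($s), "\n"; // 5 (characters)
```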

The biggest flaw in the Discuz version is that it uses strlen() to get the length of the original string and then compares that result with the $length parameter. strlen() counts bytes, but cutstr counts each Chinese character as 2 characters. On top of that, the number of bytes a UTF-8 character occupies is not fixed (most Chinese characters take 3). This leaves you with a dilemma: to keep 4 Chinese characters, should you pass 8 or 12? There is no reliable answer. This mismatch is exactly what makes Discuz's cutstr buggy, as the following test shows:
The code is as follows:
define('CHARSET', 'utf-8'); // cutstr() reads the CHARSET constant, which Discuz defines elsewhere
$str1 = "欲穷千里目"; // 5 Chinese characters, 15 bytes in UTF-8
echo cutstr($str1, 10, "...") . "\n"; // Output: 欲穷千里目... [This is a bug; can you see what causes it?]
echo cutstr($str1, 15, "...") . "\n"; // Output: 欲穷千里目

The bug arises because cutstr counts each Chinese character as 2 characters, so 5 Chinese characters count as 10, while the original string is 15 bytes long. cutstr therefore believes it has "successfully" cut 10 characters out of a 15-character string, and it appends the tail. To fix this bug, check whether the returned substring is identical to the original string, and append the tail only if it is not.
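Here is a minimal sketch of that fix. It assumes UTF-8 input only. The hypothetical cutstr_utf8() counts ASCII as 1 and any multibyte character as 2, as cutstr does, and appends $dot only when the result differs from the original string. It leaves out Discuz's HTML-entity placeholders, the GBK branch and the early strlen() check, because the loop already handles short strings.

```php
<?php
// Hypothetical UTF-8-only variant of cutstr() with the tail fix applied.
function cutstr_utf8(string $string, int $length, string $dot = ' ...'): string {
    $n = $tn = $noc = 0;                // byte pointer, last char bytes, chars counted
    $bytes = strlen($string);
    while ($n < $bytes) {
        $t = ord($string[$n]);
        if ($t < 128) {
            $tn = 1; $noc += 1;         // ASCII counts as 1
        } elseif ($t >= 194 && $t <= 223) {
            $tn = 2; $noc += 2;         // multibyte characters count as 2
        } elseif ($t >= 224 && $t <= 239) {
            $tn = 3; $noc += 2;
        } elseif ($t >= 240 && $t <= 247) {
            $tn = 4; $noc += 2;
        } else {
            $tn = 1;                    // stray byte: skip without counting
        }
        $n += $tn;
        if ($noc >= $length) {
            break;
        }
    }
    // Drop the last character if it pushed the count past $length
    if ($noc > $length) {
        $n -= $tn;
    }
    $strcut = substr($string, 0, $n);
    // The fix: append $dot only if something was actually cut off
    return $strcut === $string ? $strcut : $strcut . $dot;
}

echo cutstr_utf8("欲穷千里目", 10, "..."), "\n"; // 欲穷千里目 (no tail)
echo cutstr_utf8("欲穷千里目", 5, "..."), "\n";  // 欲穷...
```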
Ecshop version
The code is as follows:
/**
 * [Ecshop] Truncates a string using the mb_substr or iconv_substr extension;
 * each Chinese character counts as 1 character.
 * This function only applies to UTF-8 encoded Chinese strings.
 *
 * @param string $str    the original string
 * @param int    $length number of characters to keep
 * @param string $append string appended to the truncated result
 * @return string the truncated string
 */
function sub_str($str, $length = 0, $append = '...') {
    $str = trim($str);
    $strlength = strlen($str); // note: bytes, while $length is in characters
    if ($length == 0 || $length >= $strlength) {
        return $str;
    } elseif ($length < 0) {
        // A negative length counts back from the end
        $length = $strlength + $length;
        if ($length < 0) {
            $length = $strlength;
        }
    }
    if (function_exists('mb_substr')) {
        $newstr = mb_substr($str, 0, $length, 'utf-8');
    } elseif (function_exists('iconv_substr')) {
        $newstr = iconv_substr($str, 0, $length, 'utf-8');
    } else {
        // Fallback without either extension: substr() counts bytes and can split a
        // multibyte character. (The original listing also calls trim_right(), which is
        // not a PHP built-in and whose result is immediately overwritten, so it is omitted.)
        $newstr = substr($str, 0, $length);
    }
    // Append the tail only if something was actually cut off
    if ($append && $str != $newstr) {
        $newstr .= $append;
    }
    return $newstr;
}

The defining trait of the Ecshop version, and its drawback, is that each Chinese character counts as 1 character, the same as a Latin letter or a digit. Suppose the original string contains no Chinese, e.g. abcd1234, and you want to show the width of 4 Chinese characters, i.e. 8 English characters. The Ecshop version will not give the desired result; it returns abcd. Here are the simple test results:
The code is as follows:
$str1 = "白日依山尽，黄河入海流"; // "The white sun sets behind the mountains; the Yellow River flows into the sea"
echo $str1 . "\n";
echo sub_str($str1, 4, "...") . "\n"; // Output: 白日依山...
$str2 = "白1日2依3山4";
echo $str2 . "\n";
echo sub_str($str2, 4, "...") . "\n"; // Output: 白1日2...
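If neither mbstring nor iconv is installed, the Ecshop version falls back to substr(). substr() counts bytes and can cut a UTF-8 Chinese character in the middle. Below is a sketch of a character-based fallback that needs only PCRE, which is bundled with PHP. It splits on the empty pattern with the /u modifier, which yields one array element per UTF-8 character. The function name sub_str_pcre is hypothetical. For simplicity it returns the string unchanged for a non-positive length, whereas sub_str handles negative lengths. Like sub_str, it counts every character as 1, so abcd1234 truncated to 4 still yields abcd.

```php
<?php
// Hypothetical character-based variant of sub_str() that needs neither
// mbstring nor iconv. Every character counts as 1.
function sub_str_pcre(string $str, int $length, string $append = '...'): string {
    $str = trim($str);
    // One array element per UTF-8 character; false if $str is not valid UTF-8
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false || $length <= 0) {
        return $str; // invalid UTF-8 or no usable limit: return unchanged
    }
    $newstr = implode('', array_slice($chars, 0, $length));
    // Append the tail only if something was actually cut off
    if ($append !== '' && $newstr !== $str) {
        $newstr .= $append;
    }
    return $newstr;
}

echo sub_str_pcre("白日依山尽，黄河入海流", 4), "\n"; // 白日依山...
echo sub_str_pcre("abcd1234", 4), "\n";               // abcd...
```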

Optimized version
In most applications, truncating a Chinese string has this requirement: the original string may mix Chinese, English letters and digits, a Chinese character counts as 2 characters, and an English letter or digit counts as 1. Here is an implementation that meets this requirement:
The code is as follows:
/**
 * Truncates a string, counting each Chinese character as 2 characters;
 * supports GBK and UTF-8 encoding.
 * @param string       $string the string to truncate
 * @param int          $length number of characters to keep
 * @param string|false $append tail added to the truncated substring
 * @return string the truncated string
 */
function substring($string, $length, $append = false) {
    if ($length <= 0) {
        return '';
    }
    $original = $string; // keep the input for the final comparison
    // Detect whether the original string is UTF-8 encoded: if a
    // UTF-8 -> GBK -> UTF-8 round trip is lossless, treat it as UTF-8
    $is_utf8 = false;
    $str1 = @iconv("UTF-8", "GBK", $string);
    $str2 = @iconv("GBK", "UTF-8", $str1);
    if ($string == $str2) {
        $is_utf8 = true;
        // For UTF-8 input, work on the GBK copy, where each Chinese character is 2 bytes
        $string = $str1;
    }
    $newstr = '';
    $strlen = strlen($string);
    for ($i = 0; $i < $length && $i < $strlen; $i++) {
        // A byte above 127 starts a 2-byte GBK character: copy both bytes
        $newstr .= ord($string[$i]) > 127 ? $string[$i] . $string[++$i] : $string[$i];
    }
    if ($is_utf8) {
        // Convert the result back to UTF-8
        $newstr = @iconv("GBK", "UTF-8", $newstr);
    }
    // Compare with the original input, not the GBK copy, before appending the tail
    if ($append && $newstr !== $original) {
        $newstr .= $append;
    }
    return $newstr;
}

The test results are as follows (GBK and UTF-8 give identical results):
The code is as follows:
$str1 = "白日依山尽，黄河入海流";
echo substring($str1, 4, "...") . "\n"; // Output: 白日...
echo substring($str1, 5, "...") . "\n"; // Output: 白日依...
$str2 = "12白34日56依78山";
echo substring($str2, 4, "...") . "\n"; // Output: 12白...
echo substring($str2, 5, "...") . "\n"; // Output: 12白3...
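The optimized version relies on a UTF-8 to GBK round trip. A UTF-8 string that contains a character GBK cannot represent (an emoji, for instance) may therefore fail the detection and then be cut as if it were GBK. Below is a sketch of a UTF-8-only alternative. It uses a different technique from the article's: it walks the characters directly, giving ASCII a width of 1 and every other character a width of 2. The function name substring_utf8 is hypothetical. To reproduce the outputs above, it keeps adding characters while the accumulated width is below $length. Like substring(), it can therefore overshoot by one when a 2-wide character starts at the last position.

```php
<?php
// Hypothetical UTF-8-only variant of substring(): ASCII characters have
// width 1, all other characters width 2. No GBK conversion is involved.
function substring_utf8(string $string, int $length, string $append = ''): string {
    if ($length <= 0) {
        return '';
    }
    // One array element per UTF-8 character; false if $string is not valid UTF-8
    $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return '';
    }
    $newstr = '';
    $width = 0;
    foreach ($chars as $ch) {
        if ($width >= $length) {
            break; // same stopping rule as substring(), so it may overshoot by 1
        }
        $newstr .= $ch;
        $width += strlen($ch) === 1 ? 1 : 2;
    }
    // Append the tail only if something was actually cut off
    if ($append !== '' && $newstr !== $string) {
        $newstr .= $append;
    }
    return $newstr;
}

echo substring_utf8("12白34日56依78山", 5, "..."), "\n"; // 12白3...
```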

Author: edwardlost's blog
