Comparing the Discuz and ECShop string truncation functions (PHP)

Below are the source code and a simple test for each of the two versions, followed by a more practical string truncation function. Note that the truncation problem discussed here concerns UTF-8 encoded strings that contain Chinese characters.
Discuz version

/**
 * [Discuz] Truncates a string without relying on PHP extensions such as mb_substr; a Chinese character counts as 2 characters
 * @param $string The string to truncate
 * @param $length The number of characters to keep
 * @param $dot The string that replaces the truncated part at the end
 * @return The truncated string
 */
function cutstr($string, $length, $dot = ' ...') {
    // If the string is no longer than the requested length, return it as is.
    // Using strlen here is a serious drawback because it counts bytes: to keep the
    // 4 characters of a four-character Chinese greeting ("Happy New Year") you would
    // have to know how many bytes they occupy, otherwise the result may be the whole
    // string followed by the tail.
    if (strlen($string) <= $length) {
        return $string;
    }
    // Protect the htmlspecialchars entities in the original string
    $pre = chr(1);
    $end = chr(1);
    $string = str_replace(array('&amp;', '&quot;', '&lt;', '&gt;'), array($pre . '&' . $end, $pre . '"' . $end, $pre . '<' . $end, $pre . '>' . $end), $string);
    $strcut = ''; // Initialize the return value
    // UTF-8 branch (this check is incomplete: CHARSET could also be spelled 'utf8')
    if (strtolower(CHARSET) == 'utf-8') {
        // $n: byte offset into the string; $tn: byte width of the last character read;
        // $noc: number of characters counted so far
        $n = $tn = $noc = 0;
        while ($n < strlen($string)) {
            $t = ord($string[$n]);
            if ($t == 9 || $t == 10 || (32 <= $t && $t <= 126)) {
                // Half-width (ASCII) character: advance $n by 1 byte; the last character is 1 byte wide
                $tn = 1;
                $n++;
                $noc++;
            } elseif (194 <= $t && $t <= 223) {
                // Two-byte character: advance $n by 2 bytes; the last character is 2 bytes wide
                $tn = 2;
                $n += 2;
                $noc += 2;
            } elseif (224 <= $t && $t <= 239) {
                // Three-byte character (in practice, a Chinese character): advance $n by 3 bytes
                $tn = 3;
                $n += 3;
                $noc += 2;
            } elseif (240 <= $t && $t <= 247) {
                $tn = 4;
                $n += 4;
                $noc += 2;
            } elseif (248 <= $t && $t <= 251) {
                $tn = 5;
                $n += 5;
                $noc += 2;
            } elseif ($t == 252 || $t == 253) {
                $tn = 6;
                $n += 6;
                $noc += 2;
            } else {
                $n++;
            }
            // Leave the loop once the requested number of characters is reached
            if ($noc >= $length) {
                break;
            }
        }
        // If the last character overshot the limit, drop it to make room for $dot
        if ($noc > $length) {
            $n -= $tn;
        }
        $strcut = substr($string, 0, $n);
    } else {
        // Not UTF-8: a full-width character occupies 2 bytes, so advance by 2
        for ($i = 0; $i < $length; $i++) {
            $strcut .= ord($string[$i]) > 127 ? $string[$i] . $string[++$i] : $string[$i];
        }
    }
    // Restore the original htmlspecialchars entities
    $strcut = str_replace(array($pre . '&' . $end, $pre . '"' . $end, $pre . '<' . $end, $pre . '>' . $end), array('&amp;', '&quot;', '&lt;', '&gt;'), $strcut);
    $pos = strrpos($strcut, chr(1));
    if ($pos !== false) {
        $strcut = substr($strcut, 0, $pos);
    }
    return $strcut . $dot; // Finally, return the truncated string followed by $dot
}
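
The byte ranges that cutstr tests correspond to UTF-8 lead bytes: the first byte of a character announces how many bytes the character occupies. The following is a minimal sketch of that mapping, not Discuz code; utf8_seq_len is a hypothetical helper name. It assumes the current UTF-8 definition (RFC 3629), which stops at 4-byte sequences, so the 5- and 6-byte branches of cutstr have no counterpart here.

```php
<?php
// Hypothetical helper: byte length of the UTF-8 sequence that starts with $lead.
// Returns 0 for a byte that cannot start a sequence (e.g. a continuation byte).
function utf8_seq_len(int $lead): int {
    if ($lead <= 0x7F) { return 1; }                   // 0xxxxxxx: ASCII
    if ($lead >= 0xC2 && $lead <= 0xDF) { return 2; }  // 110xxxxx
    if ($lead >= 0xE0 && $lead <= 0xEF) { return 3; }  // 1110xxxx: covers common Chinese characters
    if ($lead >= 0xF0 && $lead <= 0xF4) { return 4; }  // 11110xxx
    return 0;
}

// Walk a string character by character: "a", U+00E9 (2 bytes), U+4E2D (3 bytes).
$s = "a\u{00E9}\u{4E2D}";
$i = 0;
while ($i < strlen($s)) {
    $len = max(1, utf8_seq_len(ord($s[$i])));  // skip 1 byte on invalid input
    echo bin2hex(substr($s, $i, $len)), " => ", $len, " byte(s)\n";
    $i += $len;
}
```

Because the width of each character is only known after reading its lead byte, a byte count such as the one returned by strlen cannot be compared directly with a character count.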

The biggest flaw in the Discuz version is that it uses strlen to get the length of the original string and compares that byte count with the $length parameter. Because a UTF-8 character occupies a variable number of bytes, this creates a dilemma: if you want to keep 4 Chinese characters, should you pass 8 or 12? There is no reliable answer, and this mismatch causes an actual bug in Discuz's cutstr, as the following test results show:

$str1 = "欲穷千里目"; // 5 Chinese characters, 15 bytes in UTF-8
echo my_cutstr($str1, 10, "...") . "\n"; // Output: 欲穷千里目... [This is the bug. What causes it?]
echo my_cutstr($str1, 15, "...") . "\n"; // Output: 欲穷千里目

The bug arises because cutstr counts each Chinese character as 2 characters while truncating, so the 5 Chinese characters count as 10, whereas strlen reports the original string as 15 bytes long. cutstr therefore believes it has "successfully" cut 10 characters out of a 15-character string and appends the tail. To fix the bug, check whether the returned substring is identical to the original string and, if it is, do not append the tail.
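
The fix described above can be sketched as follows. This is an illustration under stated assumptions rather than a patch to Discuz: cut_by_width is a hypothetical stand-in for the truncation loop (it splits the string with a /u regular expression instead of scanning lead bytes and handles UTF-8 only), and cutstr_fixed is a hypothetical wrapper that adds the comparison.

```php
<?php
// Hypothetical stand-in for the truncation loop: keeps whole UTF-8 characters
// while the running count (1 per ASCII character, 2 per multi-byte character)
// stays within $length.
function cut_by_width(string $string, int $length): string {
    $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $out = '';
    $count = 0;
    foreach ($chars as $ch) {
        $w = strlen($ch) === 1 ? 1 : 2;
        if ($count + $w > $length) {
            break;
        }
        $out .= $ch;
        $count += $w;
    }
    return $out;
}

// Append the tail only when something was actually cut off.
function cutstr_fixed(string $string, int $length, string $dot = ' ...'): string {
    $cut = cut_by_width($string, $length);
    return $cut === $string ? $string : $cut . $dot;
}

$str = "欲穷千里目";
echo cutstr_fixed($str, 10, "..."), "\n"; // nothing is cut, so no tail is added
echo cutstr_fixed($str, 8, "..."), "\n";  // four characters are kept, then the tail
```

Comparing the result with the input avoids any dependence on strlen, so the decision to add the tail no longer mixes byte counts with character counts.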
ECShop version

/**
 * [ECShop] Truncates a string using the mb_substr or iconv_substr extension functions; a Chinese character counts as 1 character.
 * This function is only applicable to UTF-8 encoded Chinese strings.
 *
 * @param $str The original string
 * @param $length The number of characters to keep
 * @param $append The string that replaces the truncated part at the end
 * @return The truncated string
 */
function sub_str($str, $length = 0, $append = '...') {
    $str = trim($str);
    $strlength = strlen($str);
    if ($length == 0 || $length >= $strlength) {
        return $str;
    } elseif ($length < 0) {
        $length = $strlength + $length;
        if ($length < 0) {
            $length = $strlength;
        }
    }
    if (function_exists('mb_substr')) {
        $newstr = mb_substr($str, 0, $length, 'utf-8');
    } elseif (function_exists('iconv_substr')) {
        $newstr = iconv_substr($str, 0, $length, 'utf-8');
    } else {
        // $newstr = trim_right(substr($str, 0, $length));
        $newstr = substr($str, 0, $length);
    }
    if ($append && $str != $newstr) {
        $newstr .= $append;
    }
    return $newstr;
}

The defining trait of the ECShop version, and also its drawback, is that every Chinese character counts as one character. If the original string contains no Chinese, for example abcd1234, and the intention is to keep 4 Chinese characters or 8 English characters, the ECShop version does not give the desired result: with a length of 4 the return value is abcd. Here are the simple test results:

$str1 = "白日依山尽，黄河入海流";
echo $str1 . "\n";
echo my_sub_str($str1, 4, "...") . "\n"; // Output: 白日依山...
$str2 = "白1日2依3山4";
echo $str2 . "\n";
echo my_sub_str($str2, 4, "...") . "\n"; // Output: 白1日2...
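
The three ways of measuring a string that are in play here (bytes, characters, and a count in which a Chinese character is worth 2) can be compared side by side. The sketch below is illustrative only; char_count and display_width are hypothetical helper names and are not part of ECShop, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: number of UTF-8 characters, which is what
// mb_substr-style counting works with.
function char_count(string $s): int {
    return (int) preg_match_all('/./us', $s);
}

// Hypothetical helper: 1 per single-byte (ASCII) character,
// 2 per multi-byte character such as a Chinese character.
function display_width(string $s): int {
    $width = 0;
    foreach (preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) ?: [] as $ch) {
        $width += strlen($ch) === 1 ? 1 : 2;
    }
    return $width;
}

foreach (["abcd1234", "白日依山尽", "白1日2"] as $s) {
    printf("%s: bytes=%d chars=%d width=%d\n",
        $s, strlen($s), char_count($s), display_width($s));
}
```

For abcd1234 all three measures agree, while for 白日依山尽 they give 15, 5 and 10. A character-based limit of 4 therefore keeps very different amounts of visible text depending on how much of the string is Chinese.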

Optimized version
In most scenarios for truncating Chinese strings, the requirement is: the original string may mix Chinese, English, and digits; a Chinese character counts as 2 characters, and an English letter or a digit counts as 1. An implementation that meets this requirement is given below:

/**
 * String truncation; a Chinese character counts as 2 characters. Supports both GBK and UTF-8 encoding.
 * @param $string The string to truncate
 * @param $length The number of characters to keep
 * @param $append The tail appended to the substring
 * @return The truncated string
 */
function substring($string, $length, $append = false) {
    if ($length <= 0) {
        return '';
    }
    // Keep the input in its original encoding for the final comparison
    $original = $string;
    // Detect whether the original string is UTF-8 encoded
    $is_utf8 = false;
    $str1 = @iconv("UTF-8", "GBK", $string);
    $str2 = @iconv("GBK", "UTF-8", $str1);
    if ($string == $str2) {
        $is_utf8 = true;
        // UTF-8 input: work on the GBK-encoded copy
        $string = $str1;
    }
    $newstr = '';
    // Stop at the end of the string so that short inputs are not read out of range
    $bytes = strlen($string);
    for ($i = 0; $i < $length && $i < $bytes; $i++) {
        // A byte above 127 starts a 2-byte GBK character: take both bytes
        $newstr .= ord($string[$i]) > 127 ? $string[$i] . $string[++$i] : $string[$i];
    }
    if ($is_utf8) {
        $newstr = @iconv("GBK", "UTF-8", $newstr);
    }
    // Compare in the original encoding so the tail is added only after a real cut
    if ($append && $newstr != $original) {
        $newstr .= $append;
    }
    return $newstr;
}

The test results are shown below (the results are the same for GBK and UTF-8 input):

$str1 = "白日依山尽，黄河入海流";
echo substring($str1, 4, "...") . "\n"; // Output: 白日...
echo substring($str1, 5, "...") . "\n"; // Output: 白日依...
$str2 = "12白34日56依78山";
echo substring($str2, 4, "...") . "\n"; // Output: 12白...
echo substring($str2, 5, "...") . "\n"; // Output: 12白3...
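
The substring function detects UTF-8 by converting the input to GBK and back and checking that the round trip reproduces it, which requires the iconv extension. As a hedged alternative sketch (looks_like_utf8 is a hypothetical helper name, not something the function above uses), well-formedness can also be tested with a regular expression, because PCRE refuses subjects that are not valid UTF-8 when the /u modifier is set.

```php
<?php
// Hypothetical helper: true when $s is a well-formed UTF-8 byte sequence.
// With the /u modifier, preg_match() returns false for invalid UTF-8
// instead of 1, so the strict comparison rejects such input.
function looks_like_utf8(string $s): bool {
    return preg_match('//u', $s) === 1;
}

var_dump(looks_like_utf8("白日依山尽"));  // UTF-8 source text
var_dump(looks_like_utf8("abc123"));     // plain ASCII is also valid UTF-8
var_dump(looks_like_utf8("\xB0\xD7"));   // a double-byte sequence that is not valid UTF-8
```

Either approach is a heuristic: some byte sequences are valid in more than one encoding, so a positive answer means the input can be read as UTF-8, not that it was written as UTF-8.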

Author: edwardlost's blog
