Comparing the Discuz and Ecshop string truncation functions (PHP)


Last Update: 2016-07-21 Source: Internet

Author: User



Below is the source code of both functions, with simple tests. At the end I give a more practical string truncation function of my own. Note that the truncation problem discussed here concerns UTF-8 encoded Chinese strings.
Discuz version
The code is as follows:
/**
 * [Discuz] Truncates a string without relying on extensions such as mbstring;
 * each Chinese character is counted as 2 characters.
 * @param $string the string to truncate
 * @param $length number of characters to keep
 * @param $dot    suffix appended to the truncated string
 * @return the truncated string
 */
function cutstr($string, $length, $dot = ' ...') {
    // If the string is not longer than the requested length, return it directly.
    // Using strlen() (a byte count) here is a serious drawback: to keep the 4 Chinese
    // characters of "Happy New Year", you must know how many bytes they occupy,
    // otherwise the returned string may be "Happy New Year ...".
    if (strlen($string) <= $length) {
        return $string;
    }
    // Temporarily turn the entities &amp; &quot; &lt; &gt; into single characters
    // wrapped in chr(1) markers, so that each entity counts as one character
    $pre = chr(1);
    $end = chr(1);
    $string = str_replace(array('&amp;', '&quot;', '&lt;', '&gt;'), array($pre . '&' . $end, $pre . '"' . $end, $pre . '<' . $end, $pre . '>' . $end), $string);
    $strcut = ''; // initialize the return value
    // UTF-8 input (this check is incomplete: the charset could also be spelled "utf8");
    // CHARSET is a constant defined by Discuz
    if (strtolower(CHARSET) == 'utf-8') {
        // $n: byte pointer, $tn: byte length of the last character, $noc: characters counted
        $n = $tn = $noc = 0;
        while ($n < strlen($string)) {
            $t = ord($string[$n]);
            if ($t == 9 || $t == 10 || (32 <= $t && $t <= 126)) {
                // Tab, newline or printable ASCII: advance 1 byte, count 1 character
                $tn = 1;
                $n++;
                $noc++;
            } elseif (194 <= $t && $t <= 223) {
                // Lead byte of a 2-byte character: advance 2 bytes, count 2 characters
                $tn = 2;
                $n += 2;
                $noc += 2;
            } elseif (224 <= $t && $t <= 239) {
                // Lead byte of a 3-byte character (most Chinese characters): advance 3 bytes, count 2
                $tn = 3;
                $n += 3;
                $noc += 2;
            } elseif (240 <= $t && $t <= 247) {
                // Lead byte of a 4-byte character
                $tn = 4;
                $n += 4;
                $noc += 2;
            } elseif (248 <= $t && $t <= 251) {
                // 5-byte sequence (obsolete in current UTF-8)
                $tn = 5;
                $n += 5;
                $noc += 2;
            } elseif ($t == 252 || $t == 253) {
                // 6-byte sequence (obsolete in current UTF-8)
                $tn = 6;
                $n += 6;
                $noc += 2;
            } else {
                // Any other byte: skip it without counting
                $n++;
            }
            // Leave the loop once the requested number of characters is reached
            if ($noc >= $length) {
                break;
            }
        }
        // If the last character pushed the count past $length, drop it
        if ($noc > $length) {
            $n -= $tn;
        }
        $strcut = substr($string, 0, $n);
    } else {
        // Not UTF-8 (e.g. GBK): a byte above 127 starts a 2-byte full-width character
        for ($i = 0; $i < $length; $i++) {
            $strcut .= ord($string[$i]) > 127 ? $string[$i] . $string[++$i] : $string[$i];
        }
    }
    // Restore the original HTML entities
    $strcut = str_replace(array($pre . '&' . $end, $pre . '"' . $end, $pre . '<' . $end, $pre . '>' . $end), array('&amp;', '&quot;', '&lt;', '&gt;'), $strcut);
    // If the cut fell inside a marker pair, remove the dangling chr(1) and what follows it
    $pos = strrpos($strcut, chr(1));
    if ($pos !== false) {
        $strcut = substr($strcut, 0, $pos);
    }
    return $strcut . $dot; // finally, append $dot to the truncated string
}

The biggest flaw of the Discuz version is that it uses strlen to get the length of the original string in bytes. It then compares that byte count with the incoming $length, even though the loop counts each Chinese character as 2 rather than by its actual byte length. A UTF-8 Chinese character does not occupy a fixed number of bytes, so you face a dilemma: to keep 4 Chinese characters, should you pass 8 or 12? There is no way to know in advance. Precisely because of this mismatch, Discuz's cutstr has a real bug, as the following test shows:
$str1 = "欲穷千里目"; // a 5-character Chinese line (15 bytes in UTF-8)
echo cutstr($str1, 10, "...") . "\n"; // Output: 欲穷千里目... [This is a bug; what causes it?]
echo cutstr($str1, 15, "...") . "\n"; // Output: 欲穷千里目
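To put numbers on the mismatch, the sketch below compares strlen() with a character count for the same test string. Counting characters with preg_match_all() and the /u modifier is my own choice for illustration and is not part of Discuz. The helper name char_count is hypothetical.

```php
<?php
// Hypothetical helper: counts UTF-8 characters (code points) without mbstring.
// The /u modifier makes "." match one whole UTF-8 character; /s lets it match newlines.
function char_count(string $s): int
{
    $n = preg_match_all('/./us', $s);
    // preg_match_all() returns false on invalid UTF-8; treat that as 0 here
    return $n === false ? 0 : $n;
}

$str1 = "欲穷千里目";              // 5 Chinese characters
echo strlen($str1) . "\n";         // 15: each of these characters takes 3 bytes in UTF-8
echo char_count($str1) . "\n";     // 5
echo strlen("abcd") . "\n";        // 4: ASCII characters take 1 byte each
```

Neither number equals the 10 that cutstr counts for this string, which is why a single $length cannot be chosen reliably.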

The bug arises because cutstr counts each Chinese character as 2 characters. The 5 Chinese characters therefore count as 10, while the original string is 15 bytes long. cutstr believes it has "successfully" cut 10 characters out of a 15-character string, so it appends the tail even though nothing was removed. To fix this bug, check whether the returned substring equals the original string, and append the tail only when it does not.
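A minimal sketch of that fix follows, assuming UTF-8 input only. It keeps the counting rule of the UTF-8 branch above: ASCII counts as 1 and every multibyte character counts as 2. It drops the HTML-entity handling, the CHARSET check and the obsolete 5- and 6-byte branches. The tail is appended only when the result is shorter than the original. The name cutstr_fixed is hypothetical.

```php
<?php
// Hypothetical fixed variant of Discuz's cutstr(), UTF-8 only.
// ASCII (tab, newline, printable) counts as 1 character, any multibyte
// UTF-8 character counts as 2, as in the original UTF-8 branch.
function cutstr_fixed(string $string, int $length, string $dot = ' ...'): string
{
    $n = $tn = $noc = 0;
    $bytes = strlen($string);
    while ($n < $bytes) {
        $t = ord($string[$n]);
        if ($t == 9 || $t == 10 || (32 <= $t && $t <= 126)) {
            $tn = 1; $noc += 1;          // ASCII character
        } elseif (194 <= $t && $t <= 223) {
            $tn = 2; $noc += 2;          // 2-byte character
        } elseif (224 <= $t && $t <= 239) {
            $tn = 3; $noc += 2;          // 3-byte character (most Chinese)
        } elseif (240 <= $t && $t <= 247) {
            $tn = 4; $noc += 2;          // 4-byte character
        } else {
            $tn = 1;                     // other byte: skip it without counting
        }
        $n += $tn;
        if ($noc >= $length) {
            break;
        }
    }
    if ($noc > $length) {
        $n -= $tn; // the last character pushed the count past $length: drop it
    }
    $strcut = substr($string, 0, $n);
    // The fix: append the tail only if something was actually cut off
    return $strcut === $string ? $strcut : $strcut . $dot;
}

echo cutstr_fixed("欲穷千里目", 10, "...") . "\n"; // 欲穷千里目 (no tail: nothing removed)
echo cutstr_fixed("欲穷千里目", 5, "...") . "\n";  // 欲穷...
```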
Ecshop version
The code is as follows:
/**
 * [Ecshop] Truncates a string with the mb_substr or iconv_substr extension;
 * each Chinese character counts as 1 character.
 * This function only applies to UTF-8 encoded Chinese strings.
 *
 * @param $str    the original string
 * @param $length number of characters to keep
 * @param $append suffix appended to the truncated string
 * @return the truncated string
 */
function sub_str($str, $length = 0, $append = '...') {
    $str = trim($str);
    $strlength = strlen($str);
    if ($length == 0 || $length >= $strlength) {
        return $str;
    } elseif ($length < 0) {
        // A negative length is taken relative to the byte length of the string
        $length = $strlength + $length;
        if ($length < 0) {
            $length = $strlength;
        }
    }
    if (function_exists('mb_substr')) {
        $newstr = mb_substr($str, 0, $length, 'utf-8');
    } elseif (function_exists('iconv_substr')) {
        $newstr = iconv_substr($str, 0, $length, 'utf-8');
    } else {
        // Fallback without either extension: a byte-based substr() that may split a
        // multibyte character (trim_right() is not defined here, so that call stays disabled)
        // $newstr = trim_right(substr($str, 0, $length));
        $newstr = substr($str, 0, $length);
    }
    // Append the tail only if something was actually cut off
    if ($append && $str != $newstr) {
        $newstr .= $append;
    }
    return $newstr;
}

The defining trait of the Ecshop version, and also its drawback, is that every character counts as 1, whether it is Chinese or not. Suppose the original string contains no Chinese, for example abcd1234, and you intend to keep the width of 4 Chinese characters, that is, 8 English characters. The Ecshop version will not give the desired result: it returns only abcd. Here are the simple test results:
$str1 = "白日依山尽，黄河入海流"; // "The white sun sets behind the mountains, the Yellow River flows into the sea"
echo $str1 . "\n";
echo sub_str($str1, 4, "...") . "\n"; // Output: 白日依山...
$str2 = "白1日2依3山4";
echo $str2 . "\n";
echo sub_str($str2, 4, "...") . "\n"; // Output: 白1日2...
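Ecshop's counting rule can also be reproduced without mbstring, which makes the drawback easy to see. The sketch below is my own illustration, not Ecshop code. The hypothetical helper first_chars() takes the first N UTF-8 characters using a PCRE pattern with the /u modifier. Both calls keep 4 characters, yet the Chinese result is twice as wide on screen as abcd.

```php
<?php
// Hypothetical helper mimicking Ecshop's rule: every character counts as 1,
// whether it is Chinese or ASCII. Uses PCRE's /u mode instead of mb_substr().
function first_chars(string $s, int $n): string
{
    // ^.{0,N} greedily grabs up to N whole UTF-8 characters from the start;
    // preg_match() returns false for invalid UTF-8, in which case we return ''
    if ($n <= 0 || preg_match('/^.{0,' . $n . '}/us', $s, $m) !== 1) {
        return '';
    }
    return $m[0];
}

echo first_chars("abcd1234", 4) . "\n";   // abcd
echo first_chars("白日依山尽", 4) . "\n"; // 白日依山
```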

Optimized version
In most applications, the requirement for Chinese strings is: "the original string may mix Chinese, English and digits; a Chinese character counts as 2 characters, and an English letter or a digit counts as 1." Here is an implementation of that requirement:
/**
 * String truncation; each Chinese character counts as 2 characters. Supports GBK and UTF-8.
 * @param $string the string to truncate
 * @param $length number of characters to keep
 * @param $append tail added to the substring (false for none)
 * @return the truncated string
 */
function substring($string, $length, $append = false) {
    if ($length <= 0) {
        return '';
    }
    // Detect whether the original string is UTF-8 encoded:
    // convert UTF-8 -> GBK -> UTF-8 and check that the round trip is lossless
    $is_utf8 = false;
    $str1 = @iconv("UTF-8", "GBK", $string);
    $str2 = @iconv("GBK", "UTF-8", $str1);
    if ($string == $str2) {
        $is_utf8 = true;
        // For UTF-8 input, work on the GBK copy (2 bytes per Chinese character)
        $string = $str1;
    }
    $newstr = '';
    $bytes = strlen($string);
    for ($i = 0; $i < $length && $i < $bytes; $i++) {
        // A byte above 127 starts a 2-byte GBK character: copy both bytes
        $newstr .= ord($string[$i]) > 127 ? $string[$i] . $string[++$i] : $string[$i];
    }
    // Compare while both strings are still in the same (GBK) encoding
    $truncated = ($newstr != $string);
    if ($is_utf8) {
        // Convert the result back to UTF-8
        $newstr = @iconv("GBK", "UTF-8", $newstr);
    }
    if ($append && $truncated) {
        $newstr .= $append;
    }
    return $newstr;
}
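The UTF-8 check in substring() works by round-tripping through GBK. As a result, UTF-8 text containing a character with no GBK equivalent (an emoji, for instance) fails the check and is then split byte by byte as if it were GBK. A lighter validity check relies on PCRE: with the /u modifier, preg_match() returns false when the subject is not valid UTF-8. This check is my own suggestion, not part of the original code, and the helper name is_utf8 is hypothetical. It only answers "is this valid UTF-8?"; the GBK conversion that substring() needs for counting would still be a separate step.

```php
<?php
// Hypothetical helper: true if $s is valid UTF-8.
// With the /u modifier, PCRE validates the subject first, and preg_match()
// returns false (instead of 0 or 1) when the bytes are not valid UTF-8.
function is_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump(is_utf8("白日依山尽"));       // bool(true)
var_dump(is_utf8("\xB0\xD7\xC8\xD5")); // bool(false): a GBK byte sequence, not valid UTF-8
```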

The test results are as follows (GBK and UTF-8 input give the same results):
$str1 = "白日依山尽，黄河入海流";
echo substring($str1, 4, "...") . "\n"; // Output: 白日...
echo substring($str1, 5, "...") . "\n"; // Output: 白日依...
$str2 = "12白34日56依78山";
echo substring($str2, 4, "...") . "\n"; // Output: 12白...
echo substring($str2, 5, "...") . "\n"; // Output: 12白3...
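For comparison, the sketch below meets the same requirement for UTF-8 input only, without iconv or mbstring, by reading UTF-8 lead bytes directly. It is my own variant, not the author's function, and it makes three assumptions. First, every multibyte character has width 2, which is an approximation. Second, the result never exceeds $length, so an odd length such as 5 yields 白日 rather than 白日依. Third, the name utf8_substring is hypothetical.

```php
<?php
// Hypothetical sketch: truncate a UTF-8 string so that ASCII counts as 1
// and every multibyte character counts as 2, without iconv or mbstring.
function utf8_substring(string $string, int $length, string $append = ''): string
{
    $bytes = strlen($string);
    $i = 0;     // byte offset of the next character
    $width = 0; // width consumed so far
    while ($i < $bytes) {
        $t = ord($string[$i]);
        // Byte length of this character, derived from its UTF-8 lead byte
        if ($t < 0x80) {
            $size = 1;
        } elseif ($t >= 0xF0) {
            $size = 4;
        } elseif ($t >= 0xE0) {
            $size = 3;
        } else {
            $size = 2; // 0xC0-0xDF lead byte (stray continuation bytes also land here)
        }
        $w = $size === 1 ? 1 : 2;
        if ($width + $w > $length) {
            break; // the next character would exceed the limit
        }
        $width += $w;
        $i += $size;
    }
    $result = substr($string, 0, $i);
    // Append the tail only if something was cut off
    return ($append !== '' && $result !== $string) ? $result . $append : $result;
}

echo utf8_substring("12白34日56依78山", 5, "...") . "\n"; // 12白3...
```

On the test strings above, this gives the same results as the author's function except for utf8_substring($str1, 5, "..."), which returns 白日... instead of 白日依....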

Author: edwardlost's blog

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
