Comparison of the Discuz and ECShop String Truncation Functions in PHP

Source: Internet
Author: User
This article compares the PHP string truncation functions of Discuz and ECShop, presenting the source code of each together with a simple test, and finally offers a more practical truncation function. Note: all string truncation discussed here concerns Chinese strings in UTF-8 encoding.
Discuz version

The code is as follows:


/**
 * [Discuz] Truncates a string without requiring PHP extensions such as mbstring (mb_substr).
 * Each Chinese character is counted as 2 characters.
 * @param string $string the string to truncate
 * @param int    $length the number of characters to keep
 * @param string $dot    the suffix that replaces the truncated part
 * @return string the truncated string
 */
function cutstr($string, $length, $dot = '...') {
    // If the string is not longer than $length, return it unchanged.
    // strlen() returns a byte count: to keep all four Chinese characters of a phrase
    // such as "Happy New Year", you must know how many bytes they occupy,
    // otherwise the result may be "Happy New Year..." with a needless suffix.
    if (strlen($string) <= $length) {
        return $string;
    }
    // Replace the entities produced by htmlspecialchars with the raw character
    // wrapped in chr(1) markers, so that each entity counts as a single character
    $pre = chr(1);
    $end = chr(1);
    $string = str_replace(array('&amp;', '&quot;', '&lt;', '&gt;'), array($pre . '&' . $end, $pre . '"' . $end, $pre . '<' . $end, $pre . '>' . $end), $string);
    $strcut = ''; // initialize the return value
    // UTF-8 input (CHARSET is defined by Discuz; this check is incomplete, e.g. it misses 'utf8')
    if (strtolower(CHARSET) == 'utf-8') {
        // $n: byte pointer, $tn: byte length of the last character, $noc: characters counted so far
        $n = $tn = $noc = 0;
        while ($n < strlen($string)) {
            $t = ord($string[$n]);
            if ($t == 9 || $t == 10 || (32 <= $t && $t <= 126)) {
                // Tab, newline or printable ASCII: advance 1 byte, count 1 character
                $tn = 1;
                $n++;
                $noc++;
            } elseif (194 <= $t && $t <= 223) {
                // Lead byte of a 2-byte character: advance 2 bytes, count 2 characters
                $tn = 2;
                $n += 2;
                $noc += 2;
            } elseif (224 <= $t && $t <= 239) {
                // Lead byte of a 3-byte character (most Chinese characters): advance 3 bytes, count 2
                $tn = 3;
                $n += 3;
                $noc += 2;
            } elseif (240 <= $t && $t <= 247) {
                // 4-byte character
                $tn = 4;
                $n += 4;
                $noc += 2;
            } elseif (248 <= $t && $t <= 251) {
                // 5-byte form (obsolete)
                $tn = 5;
                $n += 5;
                $noc += 2;
            } elseif ($t == 252 || $t == 253) {
                // 6-byte form (obsolete)
                $tn = 6;
                $n += 6;
                $noc += 2;
            } else {
                // Any other byte (e.g. the chr(1) markers) is skipped without being counted
                $n++;
            }
            // Stop once the requested number of characters has been reached
            if ($noc >= $length) {
                break;
            }
        }
        // If the last character overshot $length, drop it
        if ($noc > $length) {
            $n -= $tn;
        }
        $strcut = substr($string, 0, $n);
    } else {
        // Non-UTF-8 (e.g. GBK): a byte above 127 starts a 2-byte character, so copy both bytes
        for ($i = 0; $i < $length; $i++) {
            $strcut .= ord($string[$i]) > 127 ? $string[$i] . $string[++$i] : $string[$i];
        }
    }
    // Restore the original entities
    $strcut = str_replace(array($pre . '&' . $end, $pre . '"' . $end, $pre . '<' . $end, $pre . '>' . $end), array('&amp;', '&quot;', '&lt;', '&gt;'), $strcut);
    // A leftover marker means an entity was cut in half: drop it and everything after it
    $pos = strrpos($strcut, chr(1));
    if ($pos !== false) {
        $strcut = substr($strcut, 0, $pos);
    }
    return $strcut . $dot; // append the suffix
}
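The chr(1) marker step in cutstr is easy to misread, so here is a minimal standalone sketch of it. It assumes the input has already been escaped with htmlspecialchars; the constants MARK and ENTITIES and the functions mark_entities and unmark_entities are hypothetical names used only for illustration. Each entity becomes its raw character wrapped in chr(1) bytes, which the UTF-8 loop skips without counting, and a marker left over after truncation signals an entity that was cut in half.

```php
<?php
// Hypothetical names; this mirrors the two str_replace() calls and the strrpos() cleanup in cutstr.
const MARK = "\x01";
const ENTITIES = ['&amp;', '&quot;', '&lt;', '&gt;'];

// Replace each entity with its raw character wrapped in MARK bytes.
function mark_entities(string $s): string
{
    return str_replace(ENTITIES, [MARK . '&' . MARK, MARK . '"' . MARK, MARK . '<' . MARK, MARK . '>' . MARK], $s);
}

// Restore intact entities, then drop a half-cut one (a MARK with no partner) and everything after it.
function unmark_entities(string $s): string
{
    $s = str_replace([MARK . '&' . MARK, MARK . '"' . MARK, MARK . '<' . MARK, MARK . '>' . MARK], ENTITIES, $s);
    $pos = strrpos($s, MARK);
    return $pos === false ? $s : substr($s, 0, $pos);
}

$marked = mark_entities('a&lt;b');                   // "a\x01<\x01b"
echo unmark_entities(substr($marked, 0, 3)), "\n";   // a      (cut inside the entity)
echo unmark_entities(substr($marked, 0, 4)), "\n";   // a&lt;  (entity kept whole)
```

In cutstr itself the cut between these two steps is made by the byte-counting loop rather than by a fixed substr() as in this demo.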


The biggest drawback of the Discuz version is that it uses strlen to get the length of the original string, which is a byte count, and compares it with the $length parameter, which counts characters (each Chinese character counted as 2). In UTF-8 a Chinese character usually takes 3 bytes, not 2, so you face a dilemma: to keep 4 Chinese characters, should you pass 8 or 12? There is no reliable answer, and it is precisely this mismatch that causes a bug in Discuz's cutstr, demonstrated by the test below.
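To make the 8-versus-12 question concrete, the short check below measures the same four Chinese characters three ways. It counts code points with preg_match_all and the u modifier, so it does not depend on mbstring; the width rule (2 per Chinese character) is the one cutstr uses.

```php
<?php
$s = "白日依山";                           // four Chinese characters
$bytes = strlen($s);                       // what strlen() in cutstr sees
$chars = preg_match_all('/./us', $s);      // code points (u = UTF-8 mode, s = dot matches newlines)
$width = 2 * $chars;                       // cutstr's count: 2 per Chinese character (all-Chinese input only)
printf("%d bytes, %d characters, width %d\n", $bytes, $chars, $width);
// prints: 12 bytes, 4 characters, width 8
```

So passing $length = 8 asks for four Chinese characters, yet cutstr's early strlen() check compares that 8 against 12 bytes.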

The code is as follows:


$str1 = "thousands of miles away"; // stands in for a five-character Chinese string (15 bytes in UTF-8)
echo cutstr($str1, 10, "...") . "\n"; // output: the whole string followed by "..." [this is a bug; why?]
echo cutstr($str1, 15, "...") . "\n"; // output: the whole string, without "..."


The bug occurs because cutstr counts each Chinese character as 2 characters, so the five Chinese characters count as 10, while strlen reports the original string as 15 bytes. cutstr therefore believes it has successfully taken 10 characters out of a 15-character string and appends the suffix, even though nothing was removed. To fix the bug, simply check whether the returned substring is identical to the original string, and skip the suffix if it is.
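A minimal sketch of that fix, assuming UTF-8 input and leaving out the entity markers and the CHARSET check. cutstr_fixed is a hypothetical name; its byte loop follows the UTF-8 branch of cutstr (the obsolete 5- and 6-byte forms are omitted because RFC 3629 no longer allows them), and the suffix is appended only when the result differs from the input. The early strlen() return is dropped because the final comparison makes it unnecessary.

```php
<?php
/**
 * Hypothetical cutstr_fixed(): UTF-8 branch of Discuz's cutstr with the fix applied.
 * ASCII counts as 1, every multi-byte character counts as 2.
 */
function cutstr_fixed(string $string, int $length, string $dot = '...'): string
{
    $n = $tn = $noc = 0;               // byte pointer, last char's byte length, width so far
    $bytes = strlen($string);
    while ($n < $bytes) {
        $t = ord($string[$n]);
        if ($t == 9 || $t == 10 || (32 <= $t && $t <= 126)) {
            $tn = 1; $n += 1; $noc += 1;   // tab, newline, printable ASCII
        } elseif (194 <= $t && $t <= 223) {
            $tn = 2; $n += 2; $noc += 2;   // 2-byte character
        } elseif (224 <= $t && $t <= 239) {
            $tn = 3; $n += 3; $noc += 2;   // 3-byte character (most Chinese characters)
        } elseif (240 <= $t && $t <= 247) {
            $tn = 4; $n += 4; $noc += 2;   // 4-byte character
        } else {
            $n++;                          // any other byte: skip without counting
        }
        if ($noc >= $length) {
            break;
        }
    }
    // The last character overshot the limit: drop it
    if ($noc > $length) {
        $n -= $tn;
    }
    $strcut = substr($string, 0, $n);
    // The fix: append $dot only when something was actually removed
    return $strcut === $string ? $strcut : $strcut . $dot;
}

echo cutstr_fixed("白日依山尽", 10), "\n"; // 白日依山尽 (no spurious "...")
echo cutstr_fixed("白日依山尽", 5), "\n";  // 白日...
```

With $length = 5 the third character would raise the count to 6, so it is dropped and only two characters remain before the suffix.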
ECShop version

The code is as follows:


/**
 * [ECShop] Truncates a string with PHP's mb_substr or iconv_substr (mbstring / iconv extensions);
 * every Chinese character counts as 1 character.
 * This function only works for UTF-8 encoded strings.
 *
 * @param string $str    the original string
 * @param int    $length the number of characters to keep
 * @param string $append the suffix that replaces the truncated part
 * @return string the truncated string
 */
function sub_str($str, $length = 0, $append = '...') {
    $str = trim($str);
    $strlength = strlen($str);
    if ($length == 0 || $length >= $strlength) {
        return $str;
    } elseif ($length < 0) {
        $length = $strlength + $length;
        if ($length < 0) {
            $length = $strlength;
        }
    }
    if (function_exists('mb_substr')) {
        $newstr = mb_substr($str, 0, $length, 'utf-8');
    } elseif (function_exists('iconv_substr')) {
        $newstr = iconv_substr($str, 0, $length, 'utf-8');
    } else {
        // Fallback: byte-based substr, which can split a multi-byte character
        // $newstr = trim_right(substr($str, 0, $length));
        $newstr = substr($str, 0, $length);
    }
    if ($append && $str != $newstr) {
        $newstr .= $append;
    }
    return $newstr;
}


The defining feature, and the drawback, of the ECShop version is that every Chinese character counts as 1 character. If the original string contains no Chinese characters, for example abcd1234, and you want a width equivalent to 4 Chinese characters, i.e. 8 English characters, the ECShop version does not give the expected result: it returns abcd. Here is a simple test:

The code is as follows:


$str1 = "白日依山尽黄河入海流"; // "The sun sets behind the mountains; the Yellow River flows into the sea"
echo $str1 . "\n";
echo sub_str($str1, 4, "...") . "\n"; // output: 白日依山...
$str2 = "白1日2依3山4";
echo $str2 . "\n";
echo sub_str($str2, 4, "...") . "\n"; // output: 白1日2...
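The counting rule behind these results (one character per code point, whatever its display width) can be reproduced without mbstring. The helper below is a hypothetical sketch that takes the first $n code points using PCRE's u modifier, which matches what mb_substr($s, 0, $n, 'UTF-8') returns for valid UTF-8; it assumes $n is a non-negative integer small enough for a PCRE repeat count.

```php
<?php
/**
 * Hypothetical helper: first $n code points of a UTF-8 string.
 * /u makes "." match one code point instead of one byte; /s lets it match newlines.
 */
function first_chars(string $s, int $n): string
{
    if (preg_match('/^.{0,' . $n . '}/us', $s, $m) !== 1) {
        return '';  // preg_match() returns false for malformed UTF-8
    }
    return $m[0];
}

echo first_chars("abcd1234", 4), "\n";             // abcd
echo first_chars("白日依山尽黄河入海流", 4), "\n";  // 白日依山
```

Both calls return 4 characters, even though the Chinese result is twice as wide on screen, which is exactly the drawback described above.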


Optimized version
In most real-world cases, the requirement for truncating Chinese strings is: the original string may contain Chinese characters, English letters and digits; each Chinese character counts as 2 characters and each English letter or digit as 1. Here is an implementation for that requirement:

The code is as follows:


/**
 * String truncation: each Chinese character counts as 2 characters; both GBK and UTF-8 input are supported.
 * @param string       $string the string to truncate
 * @param int          $length the number of characters to keep
 * @param string|false $append the suffix appended after the substring (none if false)
 * @return string the truncated string
 */
function substring($string, $length, $append = false) {
    if ($length <= 0) {
        return '';
    }
    $original = $string; // kept for the final comparison
    // Check whether the original string is UTF-8: convert it to GBK and back,
    // and treat it as UTF-8 if the round trip returns the same string
    $is_utf8 = false;
    $str1 = @iconv("UTF-8", "GBK", $string);
    $str2 = @iconv("GBK", "UTF-8", $str1);
    if ($string == $str2) {
        $is_utf8 = true;
        // Work on the GBK form, in which every Chinese character is exactly 2 bytes
        $string = $str1;
    }
    $newstr = '';
    $bytes = strlen($string);
    // A byte above 127 starts a 2-byte GBK character, so copy both bytes; stop at the end of the string
    for ($i = 0; $i < $length && $i < $bytes; $i++) {
        $newstr .= ord($string[$i]) > 127 ? $string[$i] . $string[++$i] : $string[$i];
    }
    // Convert the result back to UTF-8 if the input was UTF-8
    if ($is_utf8) {
        $newstr = @iconv("GBK", "UTF-8", $newstr);
    }
    // Append the suffix only if something was cut off
    // (compare with the untouched input, not with its GBK copy)
    if ($append && $newstr != $original) {
        $newstr .= $append;
    }
    return $newstr;
}
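The GBK round trip used above to detect UTF-8 is a heuristic. For instance, UTF-8 text containing a character that GBK cannot represent (many emoji, for example) fails the round trip, so the function falls back to treating it as GBK bytes. If all you need is to know whether a string is valid UTF-8, a common alternative, shown here as a sketch with the hypothetical name is_valid_utf8, is preg_match with the u modifier, which fails on malformed UTF-8. It cannot tell GBK from UTF-8 when the GBK bytes happen to form valid UTF-8, so it is not a drop-in replacement for the encoding guess.

```php
<?php
// Hypothetical helper: true if $s is well-formed UTF-8 (plain ASCII included).
// With the u modifier, preg_match() returns false instead of 1 for malformed UTF-8.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump(is_valid_utf8("白日依山"));   // bool(true)
var_dump(is_valid_utf8("abc"));        // bool(true)
var_dump(is_valid_utf8("\xB0\xD7"));   // bool(false): 0xB0 cannot start a UTF-8 sequence
```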


The test results are shown below (GBK and UTF-8 input give identical results):

The code is as follows:


$str1 = "白日依山尽黄河入海流"; // "The sun sets behind the mountains; the Yellow River flows into the sea"
echo substring($str1, 4, "...") . "\n"; // output: 白日...
echo substring($str1, 5, "...") . "\n"; // output: 白日依...
$str2 = "12白3456依78山";
echo substring($str2, 4, "...") . "\n"; // output: 12白...
echo substring($str2, 5, "...") . "\n"; // output: 12白3...


Author: edwardlost's blog

