Truncating Chinese strings in PHP without garbled characters


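A handy string truncation function for double-byte (GBK/GB2312) text:

function substring($str, $start, $length) { // byte-oriented truncation that never splits a double-byte character
  $len = $length;
  if ($length < 0) {
    // A negative length means "count from the end": work on the reversed string
    $str = strrev($str);
    $len = -$length;
  }
  $len = ($len < strlen($str)) ? $len : strlen($str);
  $tmpstr = "";
  // Note: the loop runs from byte $start up to byte $len, so $length acts as an end offset
  for ($i = $start; $i < $len; $i++) {
    if (ord(substr($str, $i, 1)) > 0xa0) {
      // Lead byte of a double-byte character: copy both bytes together
      $tmpstr .= substr($str, $i, 2);
      $i++;
    } else {
      $tmpstr .= substr($str, $i, 1);
    }
  }
  if ($length < 0) $tmpstr = strrev($tmpstr);
  return $tmpstr;
}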

Examples of using methods:

$str 1 = ' I am a long string of Chinese without English ';
$str 2 = ' I am a long string of Chinese with Yingwen ';


$len = strlen ($str 1);
echo ' <br/> '. $len; return

$len = strlen ($str 2);
echo ' <br/> '. $len; Return

echo ' <br/> ';  
echo substring ($str 1, 0, one);  
echo ' <br/> ';
echo substring ($str 2, 0, one);    
echo ' <br/> ';
echo substring ($str 1);  
echo ' <br/> ';
echo substring ($str 2, 16, 29);  

The results show:

28
29
I'm a bunch of comparisons
I'm a bunch of comparisons
Chinese not with English
Chinese with Yingwen

This function is handy, for example, for shortening a long file name by replacing its middle with "...":

function FormatName($str, $size) {
  $len = strlen($str);
  if ($len > $size) {
    $half = (int) ($size / 2); // substr() expects integer offsets
    $part1 = substring($str, 0, $half);
    $part2 = substring($str, $len - $half, $len);
    return $part1 . "..." . $part2;
  } else {
    return $str;
  }
}

I also came across an extremely simple truncation trick online. In a quick test it worked well:

echo substr($str1, 0, 10) . chr(0);

How it works:

chr(0) is not NULL.
NULL means "no value", whereas chr(0) is a character whose value is 0: 0x00 in hexadecimal, 00000000 in binary.
Although chr(0) displays nothing, it is still a character.
When a Chinese character is cut in half, the encoding rules make the orphaned lead byte pair up with the byte that follows it, and the pair is interpreted as one Chinese character. This is what causes the garbled output. A lead byte in the range 0x81 to 0xFF combined with 0x00 is always displayed as "empty".
So appending chr(0) to the result of substr prevents the garbled output.
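The points above can be checked at the byte level. The following is a minimal sketch, again using a GBK string written as byte escapes (an assumption made for the illustration, not part of the original example). It only shows which bytes are produced. How a browser or terminal renders the final pair is up to that decoder.

```php
<?php
$nul = chr(0);
var_dump($nul === null); // bool(false): chr(0) is a one-byte string, not NULL
var_dump(strlen($nul));  // int(1)
var_dump(bin2hex($nul)); // string(2) "00"

// "中文" in GBK: D6D0 = 中, CEC4 = 文
$gbk = "\xD6\xD0\xCE\xC4";

// Cutting at 3 bytes leaves the lead byte CE; chr(0) gives it a partner byte
$cut = substr($gbk, 0, 3) . chr(0);
echo bin2hex($cut), "\n"; // d6d0ce00
```

Note that the orphaned lead byte is still present in the output. The trick hides it rather than removing it.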

----------------------------

20120705 Update:

The method above works well, but it still produces garbled output occasionally. I have not looked into why. For UTF-8 text, you can use the following functions instead.
Note: these functions count a Chinese character as 1 unit of length and an English letter as 1 unit of length as well, so choose the truncation length with that in mind.
Function for calculating the length:

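function strlen_utf8($str)
{
  $len = strlen($str);
  $n = 0;
  for ($i = 0; $i < $len; $i++) {
    $x = substr($str, $i, 1);
    // Binary form of the byte, left-padded to 8 digits
    $a = base_convert(ord($x), 10, 2);
    $a = substr('00000000' . $a, -8);
    if (substr($a, 0, 1) == 0) {
      // 0xxxxxxx: single-byte (ASCII) character
    } elseif (substr($a, 0, 3) == 110) {
      $i += 1; // 110xxxxx: 2-byte character, skip 1 continuation byte
    } elseif (substr($a, 0, 4) == 1110) {
      $i += 2; // 1110xxxx: 3-byte character, skip 2 continuation bytes
    }
    $n++;
  }
  return $n;
} // End strlen_utf8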

String truncation function:

function substring_utf8($str, $start, $length) {
  $len = strlen($str);
  $r = array();
  $n = 0; // characters skipped so far
  $m = 0; // characters collected so far
  for ($i = 0; $i < $len; $i++) {
    $x = substr($str, $i, 1);
    $a = base_convert(ord($x), 10, 2);
    $a = substr('00000000' . $a, -8);
    if ($n < $start) {
      // Still before the start position: skip the character
      if (substr($a, 0, 1) == 0) {
      } elseif (substr($a, 0, 3) == 110) {
        $i += 1;
      } elseif (substr($a, 0, 4) == 1110) {
        $i += 2;
      }
      $n++;
    } else {
      // Collect one whole character (1, 2, or 3 bytes)
      if (substr($a, 0, 1) == 0) {
        $r[] = substr($str, $i, 1);
      } elseif (substr($a, 0, 3) == 110) {
        $r[] = substr($str, $i, 2);
        $i += 1;
      } elseif (substr($a, 0, 4) == 1110) {
        $r[] = substr($str, $i, 3);
        $i += 2;
      } else {
        $r[] = '';
      }
      if (++$m >= $length) {
        break;
      }
    }
  }
  return join($r);
} // End substring_utf8

Usage is the same as before. For example, FormatName can be implemented as follows (with a small adjustment to the size to account for the width of Chinese characters):

function FormatName($str, $size) {
  $len = strlen_utf8($str);  // length in characters
  $one_len = strlen($str);   // length in bytes
  if ($one_len == 0) {
    return $str; // nothing to truncate, and avoids dividing by zero below
  }
  // Scale the size by the ratio of characters to bytes
  $size = $size * 1.5 * $len / ($one_len);
  if ($len > $size) {
    $part1 = substring_utf8($str, 0, $size / 2);
    $part2 = substring_utf8($str, $len - ($size / 2), $len);
    return $part1 . "..." . $part2;
  } else {
    return $str;
  }
}
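The effect of the size adjustment is easiest to see with numbers. The sketch below isolates only that one formula. `adjusted_size` is a hypothetical helper, and the character and byte counts are illustrative values for a 10-character string, assuming 3 bytes per Chinese character in UTF-8.

```php
<?php
// Hypothetical helper reproducing only the size adjustment used in FormatName
function adjusted_size($size, $charCount, $byteCount) {
    return $size * 1.5 * $charCount / $byteCount;
}

// 10 ASCII characters: 10 bytes, so the limit grows to 1.5 times the size
echo adjusted_size(20, 10, 10), "\n"; // 30

// 10 Chinese characters: 30 bytes, so the limit shrinks to half the size
echo adjusted_size(20, 10, 30), "\n"; // 10
```

In other words, the same nominal size allows about three times as many ASCII characters as Chinese characters, which roughly compensates for Chinese characters being displayed wider.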

