Original address: Truncating Chinese strings without garbled characters
Author: Showan
UTF-8 Chinese substring function
In PHP, truncating a Chinese string with substr() can produce garbled characters. Chinese and Western characters occupy different numbers of bytes, but substr() counts its length argument in bytes. In GB2312, a Chinese character takes 2 bytes and an English letter takes 1. In UTF-8, most Chinese characters take 3 bytes (a few rare ones take 4), while English letters and half-width punctuation take 1. If the cut falls in the middle of a multibyte character, the leftover partial bytes show up as garbage.
Workaround
A UTF-8 character consists of 1 to 4 bytes, and its first (lead) byte determines how many. For Chinese text, these are the cases that matter:
If the first byte is 224 (0xE0) or greater (up to 239), the character is 3 bytes long: this byte plus the 2 bytes that follow.
If the first byte is at least 192 (0xC0) and less than 224, the character is 2 bytes long: this byte plus the 1 byte that follows. Otherwise the byte is a complete single-byte ASCII character (an English letter, a digit, or basic punctuation). Values from 128 to 191 only ever appear as continuation bytes, never at the start of a character.
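The lead-byte rules can be captured in a small helper. This is a sketch, and `utf8_char_len` is a hypothetical name; it also covers the 4-byte case (lead bytes from 240, 0xF0), which is needed for emoji and rare CJK extension characters.

```php
<?php
// Hypothetical helper: return the byte length of the UTF-8 character whose
// lead byte is $byte. Assumes $byte really is a lead byte (not 128-191).
function utf8_char_len(int $byte): int
{
    if ($byte >= 240) {
        return 4;   // 11110xxx: emoji, rare CJK extension characters
    }
    if ($byte >= 224) {
        return 3;   // 1110xxxx: most Chinese characters
    }
    if ($byte >= 192) {
        return 2;   // 110xxxxx: e.g. accented Latin letters
    }
    return 1;       // 0xxxxxxx: ASCII
}

// ord() returns the first byte of a string, i.e. the lead byte here.
foreach (["A", "é", "中", "😀"] as $ch) {
    printf("%s: lead byte %d -> %d bytes\n", $ch, ord($ch), utf8_char_len(ord($ch)));
}
```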
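<?php
$a = "I am a programmer";

class Dx
{
    private $str;

    // Truncate $string to roughly $length width units without splitting a UTF-8
    // character. Multibyte characters count as 2 units, ASCII characters as 1,
    // so a final multibyte character may overshoot the limit by 1 unit.
    // Assumption: $start is a byte offset on a character boundary (the original
    // listing always began reading at byte 0 regardless of $start).
    public function msubstr($string, $start, $length)
    {
        if (strlen($string) > $length) {
            $n = $start;              // current byte position in $string
            $str = "";
            $len = $start + $length;
            for ($i = $start; $i < $len; $i++) {
                $byte = ord(substr($string, $n, 1));
                if ($byte >= 240) {
                    // 4-byte character (emoji, rare CJK extensions)
                    $str .= substr($string, $n, 4);
                    $n += 4;
                    $i++;
                } elseif ($byte >= 224) {
                    // 3-byte character (most Chinese characters)
                    $str .= substr($string, $n, 3);
                    $n += 3;
                    $i++;
                } elseif ($byte >= 192) {
                    // 2-byte character
                    $str .= substr($string, $n, 2);
                    $n += 2;
                    $i++;
                } else {
                    // single-byte ASCII character
                    $str .= substr($string, $n, 1);
                    // The source listing breaks off here; the lines below are reconstructed.
                    $n++;
                }
            }
            return $str;
        }
        return $string;
    }
}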