PHP: Tricks for Correctly Parsing UTF-8 Strings

Source: Internet
Author: User
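In "Learning PHP & MySQL: Character Encoding, Part 1" I introduced the conversion between Unicode and UTF-8 and summarized the UTF-8 encoding rules. The key rule is that the lead byte of each character announces the length of its sequence: 0xxxxxxx is a single byte, 110xxxxx starts a 2-byte sequence, 1110xxxx a 3-byte sequence, and so on, while every following byte has the form 10xxxxxx. Following these rules, we can write a UTF-8 parser. Here is an implementation in PHP: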
The code is as follows:
<?php
/*
 * Program function: $str is a UTF-8 encoded string mixing English and Chinese.
 * The string is decoded and displayed correctly according to the UTF-8 encoding rules.
 */

$str = 'Today I am very happy; we all decided to go to KFC for cola chicken wings!!!';

/*
 * $str is the string to cut
 * $len is the number of characters to take
 */
function utf8sub($str, $len) {
    if ($len <= 0) {
        return '';
    }

    $offset = 0;          // byte offset of the current lead (high) byte
    $chars = 0;           // number of characters taken so far
    $res = '';            // the resulting substring
    $total = strlen($str);

    while ($chars < $len) {
        // Stop once the whole string has been consumed.
        if ($offset >= $total) {
            break;
        }

        // Take the lead byte of the next character as an integer (0-255).
        $high = ord(substr($str, $offset, 1));

        echo '$high = ' . $high . PHP_EOL;

        if (($high >> 2) === 0x3F) {        // shift right 2 bits; 111111 => 6-byte sequence
            $count = 6;
        } else if (($high >> 3) === 0x1F) { // shift right 3 bits; 11111 => 5-byte sequence
            $count = 5;
        } else if (($high >> 4) === 0xF) {  // shift right 4 bits; 1111 => 4-byte sequence
            $count = 4;
        } else if (($high >> 5) === 0x7) {  // shift right 5 bits; 111 => 3-byte sequence
            $count = 3;
        } else if (($high >> 6) === 0x3) {  // shift right 6 bits; 11 => 2-byte sequence
            $count = 2;
        } else if (($high >> 7) === 0x0) {  // shift right 7 bits; 0 => 1 byte (ASCII)
            $count = 1;
        } else {
            // 10xxxxxx is a continuation byte and cannot start a character: skip it.
            $offset += 1;
            continue;
        }
        echo '$count = ' . $count . PHP_EOL;

        $res .= substr($str, $offset, $count); // append one whole character to $res
        $chars += 1;                           // one more character taken
        $offset += $count;                     // move the offset past this character
    }
    return $res;
}

echo utf8sub($str, 100);
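The lead-byte checks in utf8sub() follow the original UTF-8 definition, which allowed sequences of up to 6 bytes. RFC 3629 (2003) restricts UTF-8 to at most 4 bytes per character (code points up to U+10FFFF), so a lead byte of the form 11111xxx is never valid today. The following is a minimal sketch of the same idea under RFC 3629, using bit masks instead of repeated shifts. utf8_seq_len() and utf8_sub_safe() are hypothetical helper names, and the mask test only checks the lead-byte pattern: it does not reject overlong or out-of-range leads such as 0xC0, 0xC1 or 0xF5-0xF7.

```php
<?php
// A minimal sketch, assuming RFC 3629 UTF-8 (at most 4 bytes per character).
// utf8_seq_len() and utf8_sub_safe() are hypothetical helpers, not PHP built-ins.

/**
 * Return the length of the UTF-8 sequence that starts with lead byte $byte
 * (an integer 0-255), or 0 if $byte cannot start a sequence.
 */
function utf8_seq_len(int $byte): int
{
    if (($byte & 0x80) === 0x00) {
        return 1; // 0xxxxxxx: ASCII
    }
    if (($byte & 0xE0) === 0xC0) {
        return 2; // 110xxxxx
    }
    if (($byte & 0xF0) === 0xE0) {
        return 3; // 1110xxxx
    }
    if (($byte & 0xF8) === 0xF0) {
        return 4; // 11110xxx
    }
    return 0; // 10xxxxxx continuation byte, or 11111xxx (invalid since RFC 3629)
}

/**
 * Return the first $len characters of $str without splitting a multibyte
 * sequence. Bytes that cannot start a character are skipped one at a time.
 */
function utf8_sub_safe(string $str, int $len): string
{
    $res = '';
    $offset = 0;
    $chars = 0;
    $total = strlen($str);

    while ($chars < $len && $offset < $total) {
        $count = utf8_seq_len(ord($str[$offset]));
        if ($count === 0) {
            $offset++; // not a lead byte: skip it
            continue;
        }
        $res .= substr($str, $offset, $count);
        $offset += $count;
        $chars++;
    }
    return $res;
}

// Takes "KFC " plus the first two Chinese characters.
echo utf8_sub_safe("KFC \u{53EF}\u{4E50}\u{9E21}\u{7FC5}!", 6), PHP_EOL;
```

Masking with `& 0xE0`, `& 0xF0` and `& 0xF8` tests the same leading bits as shifting right and comparing, but reads directly as the bit patterns from the encoding rules.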

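utf8sub() separates the string into whole characters but never recovers their Unicode values. To reverse the Unicode-to-UTF-8 conversion itself, the payload bits of each sequence can be reassembled: keep the low bits of the lead byte, then shift in the low 6 bits of every continuation byte. This is a minimal sketch assuming RFC 3629 input; utf8_code_points() is a hypothetical helper name, and the sketch does not reject overlong forms or verify that continuation bytes really start with 10.

```php
<?php
// A minimal sketch: decode UTF-8 into Unicode code points (RFC 3629, up to 4 bytes).
// utf8_code_points() is a hypothetical helper, not a PHP built-in.
// It does not reject overlong forms or check the 10xxxxxx prefix of continuation bytes.

function utf8_code_points(string $str): array
{
    $points = [];
    $offset = 0;
    $total = strlen($str);

    while ($offset < $total) {
        $byte = ord($str[$offset]);
        if ($byte < 0x80) {                // 0xxxxxxx: the byte is the code point
            $count = 1;
            $cp = $byte;
        } elseif (($byte >> 5) === 0x6) {  // 110xxxxx: keep the low 5 bits
            $count = 2;
            $cp = $byte & 0x1F;
        } elseif (($byte >> 4) === 0xE) {  // 1110xxxx: keep the low 4 bits
            $count = 3;
            $cp = $byte & 0x0F;
        } elseif (($byte >> 3) === 0x1E) { // 11110xxx: keep the low 3 bits
            $count = 4;
            $cp = $byte & 0x07;
        } else {
            throw new InvalidArgumentException(
                sprintf('Invalid lead byte 0x%02X at offset %d', $byte, $offset)
            );
        }
        if ($offset + $count > $total) {
            throw new InvalidArgumentException("Truncated sequence at offset $offset");
        }
        // Each continuation byte contributes its low 6 bits.
        for ($i = 1; $i < $count; $i++) {
            $cp = ($cp << 6) | (ord($str[$offset + $i]) & 0x3F);
        }
        $points[] = $cp;
        $offset += $count;
    }
    return $points;
}

foreach (utf8_code_points("KFC \u{53EF}\u{4E50}") as $cp) {
    printf('U+%04X ', $cp);
}
echo PHP_EOL; // prints: U+004B U+0046 U+0043 U+0020 U+53EF U+4E50
```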