The previous article, "Learning PHP & MySQL: Character Encoding (Part 1)", introduced the conversion between Unicode and UTF-8 and summarized the UTF-8 encoding rules. Following those rules, we can write a small UTF-8 parser in PHP that cuts a string by characters instead of by bytes.
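As a quick illustration of the encoding rule the parser relies on, the sketch below reads only the first byte of a character and derives the character's byte length from its leading bits. The helper name `utf8LeadLength` is hypothetical and not part of the original article, the sketch assumes PHP 7 or later, and it covers only the 1 to 4 byte forms allowed by RFC 3629 (the parser in this article also accepts the older 5 and 6 byte forms).

```php
<?php
// Hypothetical helper (not from the original article): returns how many
// bytes a UTF-8 character occupies, judged from its first byte alone.
function utf8LeadLength(int $byte): int
{
    if (($byte & 0x80) === 0x00) { return 1; } // 0xxxxxxx: ASCII
    if (($byte & 0xE0) === 0xC0) { return 2; } // 110xxxxx
    if (($byte & 0xF0) === 0xE0) { return 3; } // 1110xxxx
    if (($byte & 0xF8) === 0xF0) { return 4; } // 11110xxx
    return 0; // 10xxxxxx continuation byte, or not a valid lead byte
}

// Sample characters written as code point escapes so the file encoding
// does not matter: A, e with acute accent, a Chinese character, an emoji.
foreach (['A', "\u{E9}", "\u{4E2D}", "\u{1F600}"] as $char) {
    $lead = ord($char[0]); // first byte as an integer in the range 0-255
    printf("%s  lead byte %08b  length %d\n", $char, $lead, utf8LeadLength($lead));
}
```

The number of leading 1 bits in the first byte equals the number of bytes in the character, which is exactly what the shift-and-compare tests in the parser check.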
The parser code is as follows:
<?php
/*
 * What the program does: $str is a UTF-8 encoded string that mixes English and Chinese.
 * The string is split into characters according to the UTF-8 encoding rules and displayed correctly.
 */
$str = 'Today very happy, all decided to go to KFC to eat cola chicken wings!!!';
/*
 * $str is the string to cut
 * $len is the number of characters to take
 */
function utf8sub($str, $len) {
    if ($len <= 0) {
        return '';
    }
    $offset = 0; // byte offset of the lead byte of the next character
    $chars = 0;  // number of characters taken so far
    $res = '';   // the resulting substring
    while ($chars < $len) {
        // Stop once every byte has been consumed. Testing the offset is safer than
        // testing $high against null, which would also stop at a NUL byte.
        if ($offset >= strlen($str)) {
            break;
        }
        // Take the lead byte of the next character and convert it
        // to its integer value (0-255) so its bits can be tested
        $high = ord(substr($str, $offset, 1));
        echo '$high = ' . $high . "\n";
        // The 5- and 6-byte forms come from the original UTF-8 definition;
        // RFC 3629 limits UTF-8 to at most 4 bytes per character.
        if (($high >> 2) === 0x3F) {        // shift right 2 bits; 111111 means take 6 bytes
            $count = 6;
        } else if (($high >> 3) === 0x1F) { // shift right 3 bits; 11111 means take 5 bytes
            $count = 5;
        } else if (($high >> 4) === 0xF) {  // shift right 4 bits; 1111 means take 4 bytes
            $count = 4;
        } else if (($high >> 5) === 0x7) {  // shift right 5 bits; 111 means take 3 bytes
            $count = 3;
        } else if (($high >> 6) === 0x3) {  // shift right 6 bits; 11 means take 2 bytes
            $count = 2;
        } else {                            // 0xxxxxxx (ASCII) or a stray 10xxxxxx byte: take 1 byte
            $count = 1;
        }
        echo '$count = ' . $count . "\n";
        $res .= substr($str, $offset, $count); // append one whole character to $res
        $chars += 1;       // one more character taken
        $offset += $count; // move to the lead byte of the next character
    }
    return $res;
}
echo utf8sub($str, 100);
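One way to gain confidence in the lead-byte technique is to compare it with an independent way of splitting a string into characters. The sketch below is an illustration and not the article's code: `utf8Take` and `utf8TakeReference` are hypothetical names, the compact variant is restricted to the 1 to 4 byte forms of RFC 3629, and the reference assumes PCRE was built with UTF-8 support (the `u` modifier), which is the usual case.

```php
<?php
// Compact restatement of the lead-byte technique (hypothetical name),
// limited to the 1 to 4 byte forms of RFC 3629.
function utf8Take(string $str, int $len): string
{
    $offset = 0;
    $total  = strlen($str); // length in bytes, not characters
    for ($chars = 0; $chars < $len && $offset < $total; $chars++) {
        $high = ord($str[$offset]);
        if (($high >> 3) === 0x1E) {        // 11110xxx
            $count = 4;
        } elseif (($high >> 4) === 0x0E) {  // 1110xxxx
            $count = 3;
        } elseif (($high >> 5) === 0x06) {  // 110xxxxx
            $count = 2;
        } else {                            // ASCII, or a stray byte taken alone
            $count = 1;
        }
        $offset += $count;
    }
    return substr($str, 0, $offset);
}

// Reference result (hypothetical name): split into characters with PCRE
// in UTF-8 mode, then join the first $len of them.
function utf8TakeReference(string $str, int $len): string
{
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    return implode('', array_slice($chars, 0, $len));
}

$sample = "PHP \u{4E2D}\u{6587} test"; // English mixed with two Chinese characters
for ($n = 0; $n <= 12; $n++) {
    if (utf8Take($sample, $n) !== utf8TakeReference($sample, $n)) {
        echo "mismatch at length $n\n";
    }
}
echo utf8Take($sample, 5), "\n"; // "PHP " plus the first Chinese character
```

If the loop prints no mismatch, both approaches agree for every length, including lengths beyond the end of the string.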