The earlier article "Learning PHP & MySQL: Character Encoding (1)" introduced the conversion relationship between Unicode and UTF-8 and summarized the UTF-8 encoding rules. Based on those rules, we can write a program that parses UTF-8 encoded text. The PHP implementation below is offered as a reference for anyone who needs it.
The code is as follows:
/*
Program function: $str is a UTF-8 encoded string that mixes Chinese and English.
The program decodes this string and displays it correctly according to the UTF-8 encoding rules.
*/
$str = 'Today is very Happy. all decided to go to KFC to eat cool chicken wings !!! ';
/*
$str is the string to be truncated.
$len is the number of characters to keep.
*/
function utf8sub($str, $len) {
    if ($len <= 0) {
        return '';
    }
    $offset = 0; // byte offset of the lead (high) byte of the current character
    $chars = 0;  // number of characters taken so far
    $res = '';   // stores the resulting string
    while ($chars < $len) {
        // Stop once the end of the string has been reached.
        // (Comparing the byte value with null would also stop at a NUL byte.)
        if ($offset >= strlen($str)) {
            break;
        }
        // Take the lead byte of the current character and convert it
        // to its integer value so that its high bits can be tested.
        $high = ord(substr($str, $offset, 1));
        // echo '$high = ' . $high . '<br />';
        // Count the leading 1 bits of the lead byte. The original UTF-8 design
        // allowed sequences of up to 6 bytes; RFC 3629 limits them to 4.
        if (($high >> 2) === 0x3f) { // shift right 2 bits; binary 111111 means a 6-byte character
            $count = 6;
        } else if (($high >> 3) === 0x1f) { // shift right 3 bits; binary 11111 means 5 bytes
            $count = 5;
        } else if (($high >> 4) === 0xf) { // shift right 4 bits; binary 1111 means 4 bytes
            $count = 4;
        } else if (($high >> 5) === 0x7) { // shift right 5 bits; binary 111 means 3 bytes
            $count = 3;
        } else if (($high >> 6) === 0x3) { // shift right 6 bits; binary 11 means 2 bytes
            $count = 2;
        } else if (($high >> 7) === 0x0) { // shift right 7 bits; binary 0 means 1 byte (ASCII)
            $count = 1;
        } else {
            // A continuation byte (10xxxxxx) where a lead byte was expected:
            // the input is malformed, so step over a single byte.
            $count = 1;
        }
        // echo '$count = ' . $count . '<br />';
        $res .= substr($str, $offset, $count); // take one whole character and append it to $res
        $chars += 1; // one more character taken
        $offset += $count; // move the offset past the character just taken
    }
    return $res;
}
echo utf8sub($str, 100);
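The lead-byte test above can also be written with bit masks instead of shifts. The following is only a minimal sketch under a few assumptions: the helper names `utf8SequenceLength` and `utf8Truncate` are hypothetical and not part of the original program, only the 1- to 4-byte sequences allowed by current UTF-8 are recognized, and the sample string is written with `\u{...}` escapes, which require PHP 7 or later.

```php
<?php
// Hypothetical helper (assumption, not from the original article):
// returns the byte length of a UTF-8 sequence given its lead byte,
// or 0 if the byte cannot start a sequence.
function utf8SequenceLength(int $lead): int
{
    if (($lead & 0x80) === 0x00) { return 1; } // 0xxxxxxx
    if (($lead & 0xE0) === 0xC0) { return 2; } // 110xxxxx
    if (($lead & 0xF0) === 0xE0) { return 3; } // 1110xxxx
    if (($lead & 0xF8) === 0xF0) { return 4; } // 11110xxx
    return 0; // continuation byte (10xxxxxx) or invalid lead byte
}

// Hypothetical helper: keeps at most the first $len characters of $str.
function utf8Truncate(string $str, int $len): string
{
    $offset = 0;
    $total = strlen($str);
    for ($chars = 0; $chars < $len && $offset < $total; $chars++) {
        // Treat an invalid byte as a one-byte unit so the loop always advances.
        $offset += max(1, utf8SequenceLength(ord($str[$offset])));
    }
    return substr($str, 0, $offset);
}

$mixed = "PHP\u{4E2D}\u{6587}!"; // the letters "PHP", two Chinese characters, "!"
echo utf8Truncate($mixed, 4), "\n"; // keeps "PHP" plus the first Chinese character
```

Where the mbstring extension is available, `mb_substr($str, 0, $len, 'UTF-8')` performs the same kind of character-based truncation, so a hand-written parser like this is mainly useful for understanding the encoding rules.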