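GBK, the simplified Chinese character set, is a variable-width encoding that mixes one-byte and two-byte characters. A byte in the range 0x00 to 0x7F is a complete one-byte character (ASCII), while a byte of 0x80 or above (in practice 0x81 to 0xFE) is the first byte of a two-byte character.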
Note: the numbers in parentheses are written in binary.
So when a byte's value is greater than 0x7F, it must belong to a double-byte character such as a Chinese character (combined with the byte after it). How do we check whether a byte is greater than 0x7F?
The number right after 0x7F (1111111) is 0x80 (10000000), so any byte above 0x7F must have its highest (eighth) bit set to 1. We only need to test whether that bit is 1.
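As a sketch of how this rule is used, the function below walks a GBK byte string and takes two bytes whenever the current byte has its highest bit set, one byte otherwise. It tests the bit with a bitwise AND against 0x80, the method described next. The name gbkSplit() is hypothetical, and the sketch assumes the input is valid GBK: it does not validate the second byte of each pair.

```php
<?php
// Hypothetical helper: split a GBK-encoded byte string into characters.
// Assumes valid GBK: a byte with the highest bit 0 (0x00-0x7F) is a one-byte
// ASCII character, a byte with the highest bit 1 starts a two-byte character.
function gbkSplit(string $bytes): array
{
    $chars = [];
    $len = strlen($bytes);
    $i = 0;
    while ($i < $len) {
        // Test the highest bit by ANDing with 0x80 (10000000).
        $width = (ord($bytes[$i]) & 0x80) ? 2 : 1;
        // substr() returns fewer bytes if a lead byte is cut off at the end.
        $chars[] = substr($bytes, $i, $width);
        $i += $width;
    }
    return $chars;
}

// D6 D0 CE C4 are the GBK bytes of "中文"; they are written as escapes so
// this source file can itself stay in UTF-8.
$parts = gbkSplit("A\xD6\xD0\xCE\xC4b");
echo count($parts), "\n"; // 4
```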
How to test it:
Bitwise AND (a result bit is 1 only when both corresponding bits are 1; otherwise it is 0):
For example, to check whether the third bit (counting from the right) of a number is 1, AND the number with 4 (100); to check the second bit, AND it with 2 (10). A non-zero result means the bit is 1.
Likewise, to check the eighth bit, AND with 10000000, that is, 0x80, and so on.
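The same pattern generalizes to any bit position: build a mask with only that bit set and AND it with the number. The function name bitIsSet() is hypothetical; bit positions are counted from 1 at the right, as in the text above.

```php
<?php
// Hypothetical helper: true if bit $k (1 = lowest bit, counted from the right) of $n is 1.
// The mask 1 << ($k - 1) has only bit $k set: 1 (1), 2 (10), 4 (100), ..., 0x80 (10000000).
function bitIsSet(int $n, int $k): bool
{
    $mask = 1 << ($k - 1);
    return ($n & $mask) !== 0;
}

var_dump(bitIsSet(5, 3));    // 5 = 101, third bit: bool(true)
var_dump(bitIsSet(5, 2));    // 5 = 101, second bit: bool(false)
var_dump(bitIsSet(0xB0, 8)); // 0xB0 = 10110000, eighth bit: bool(true)
var_dump(bitIsSet(0x7F, 8)); // 0x7F = 01111111, eighth bit: bool(false)
```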
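Why not simply compare the byte with > 0x7F? In PHP, where ord() returns a value from 0 to 255, that works. But in strongly typed languages where a byte is a signed type (Java's byte, for instance), the highest bit is the sign bit: a byte with that bit set is negative, and a negative value is never greater than 0x7F (the largest value a signed byte can hold). Testing the bit with & 0x80 works either way.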
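The difference can be reproduced in PHP by reading the same byte once as unsigned and once as signed with unpack() ('C' is unsigned char, 'c' is signed char). The signed reading only mimics a signed byte type; PHP itself has no byte type.

```php
<?php
// Read the byte 0xB0 (10110000) as an unsigned char and as a signed char.
$unsigned = unpack('C', "\xB0")[1]; // 176
$signed   = unpack('c', "\xB0")[1]; // -80, as a signed byte type would see it

var_dump($unsigned > 0x7F);       // bool(true)
var_dump($signed > 0x7F);         // bool(false): the comparison misses the lead byte
var_dump(($signed & 0x80) !== 0); // bool(true): the highest bit is still detected
```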
Another example:
The ASCII code of a is 97 (1100001)
The ASCII code of A is 65 (1000001)
The ASCII code of b is 98 (1100010)
The ASCII code of B is 66 (1000010)
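The binary values listed above can be checked by printing each code with a zero-padded %b format. The loop below also prints the bit with value 0x20 (the sixth bit from the right) for each letter.

```php
<?php
// Print each letter's ASCII code in decimal and 7-bit binary, plus its 0x20 bit.
$rows = [];
foreach (['a', 'A', 'b', 'B'] as $c) {
    $code = ord($c);
    $rows[$c] = sprintf('%07b', $code);
    printf("%s %3d %s bit6=%d\n", $c, $code, $rows[$c], ($code & 0x20) ? 1 : 0);
}
```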
Notice the pattern: in a lowercase letter the sixth bit (counting from the right) is always 1, and in an uppercase letter it is 0. We can use this to determine the case.
All we need to do is AND the letter's code with 0x20 (100000) and check the result:
if (ord($a) & 0x20) {
    // lowercase letter (sixth bit is 1)
} else {
    // uppercase letter (sixth bit is 0)
}
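The 0x20 test on its own only makes sense for letters: digits and most punctuation also have that bit set (for example '1' is 0x31, 0110001). A minimal sketch that checks for a letter first is below; isAsciiLetter() and isLowerAscii() are hypothetical names, and only single-byte ASCII input is assumed.

```php
<?php
// Hypothetical helper: true if the first byte of $c is an ASCII letter.
// OR-ing with 0x20 folds 'A'-'Z' onto 'a'-'z'; no other byte lands in that range.
function isAsciiLetter(string $c): bool
{
    $folded = ord($c) | 0x20;
    return $folded >= ord('a') && $folded <= ord('z');
}

// Hypothetical helper: lowercase means "is a letter" and "sixth bit is 1".
function isLowerAscii(string $c): bool
{
    return isAsciiLetter($c) && (ord($c) & 0x20) !== 0;
}

var_dump(isLowerAscii('a')); // bool(true)
var_dump(isLowerAscii('A')); // bool(false)
var_dump(isLowerAscii('1')); // bool(false), although ord('1') & 0x20 is non-zero
```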
How do we convert a letter to uppercase? Clear its sixth bit (change the 1 to 0) by ANDing with the complement of 0x20, that is, ~0x20:
$a = 'a';
$a = chr(ord($a) & ~0x20); // clear the sixth bit: 97 (1100001) becomes 65 (1000001)
echo $a; // prints A
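To convert every letter in a string, the same clear-the-bit step can be applied byte by byte, touching only 'a' to 'z'. This is a sketch with a hypothetical name, asciiUpper(); in practice PHP's built-in strtoupper() covers ASCII text. Setting the bit with | 0x20 instead would convert to lowercase.

```php
<?php
// Hypothetical helper: uppercase only the ASCII letters in a byte string by
// clearing bit 0x20; every other byte (digits, punctuation, bytes >= 0x80) is kept.
function asciiUpper(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $code = ord($s[$i]);
        if ($code >= ord('a') && $code <= ord('z')) {
            $code &= ~0x20; // change the sixth bit from 1 to 0
        }
        $out .= chr($code);
    }
    return $out;
}

echo asciiUpper('Hello, World 2!'), "\n"; // HELLO, WORLD 2!
```

One caution for GBK text: the second byte of a two-byte character can fall in the range 0x61 to 0x7A, so a GBK string should be walked character by character with the 0x80 test before any letter is converted.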