The GBK simplified Chinese character set encodes each character in either 1 byte or 2 bytes. When the first (high) byte is in the range 0x00~0x7f, the character is a single byte; when the first byte is 0x80 or above, the character is represented by 2 bytes.
Note: the numbers in parentheses are in binary.
When you find that a byte's value is greater than 0x7f, it must be part of a Chinese character (combined with another byte). How do you determine that it is definitely greater than 0x7f?
The number after 0x7f (1111111) is 0x80 (10000000), so any byte greater than 0x7f has its highest bit set to 1. We only need to test whether the highest bit is 1.
How to test:
Bitwise AND (a result bit is 1 only when both corresponding bits are 1, otherwise it is 0):
For example, to test whether the third bit of a number is 1, AND it with 4 (100); to test whether the second bit is 1, AND it with 2 (10).
Similarly, to test whether the eighth bit is 1, AND the number with (10000000), that is, 0x80.
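The bit tests described above can be sketched in PHP as follows. The helper name hasBit and its 1-based bit numbering (1 = lowest bit) are assumptions made for this illustration, not something from the original text.

```php
<?php
// Hypothetical helper: is bit number $position (1 = lowest bit) set in $number?
function hasBit(int $number, int $position): bool
{
    // position 2 -> 2 (10), position 3 -> 4 (100), position 8 -> 0x80 (10000000)
    $mask = 1 << ($position - 1);
    return ($number & $mask) !== 0;
}

var_dump(hasBit(5, 3));    // 5 is 101, third bit is 1     -> bool(true)
var_dump(hasBit(5, 2));    // 5 is 101, second bit is 0    -> bool(false)
var_dump(hasBit(0xD6, 8)); // 0xD6 is 11010110, top bit 1  -> bool(true)
var_dump(hasBit(0x61, 8)); // 0x61 is 01100001, top bit 0  -> bool(false)
```

The mask is the only thing that changes between the tests: 4 for the third bit, 2 for the second, 0x80 for the eighth.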
Why not simply compare with > 0x7f? That may work in PHP, but in other strongly typed languages the highest bit of a 1-byte value can be the sign bit that marks negative numbers, and a negative number can never be greater than 0x7f (the largest value a signed byte can hold).
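As a minimal sketch of how the highest-bit test might be applied, the function below walks a GBK byte string and cuts it into characters. The name gbkSplit is hypothetical, and the sample bytes D6 D0 and CE C4 are taken here as the GBK encodings of 中 and 文. The function only applies the rule described above and does not validate that the second byte is legal.

```php
<?php
// Hypothetical helper: split a GBK byte string into characters using the
// rule from the text (highest bit set => first byte of a 2-byte character).
function gbkSplit(string $bytes): array
{
    $chars = [];
    $length = strlen($bytes);
    for ($i = 0; $i < $length; ) {
        // Test the eighth bit of the current byte with 0x80.
        $width = (ord($bytes[$i]) & 0x80) ? 2 : 1;
        // substr() returns a shorter piece if the string ends mid-character.
        $chars[] = substr($bytes, $i, $width);
        $i += $width;
    }
    return $chars;
}

// "a", then the bytes D6 D0, then "b", then the bytes CE C4.
$parts = gbkSplit("a\xD6\xD0b\xCE\xC4");
echo count($parts), "\n";      // 4
echo bin2hex($parts[1]), "\n"; // d6d0
```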
Another example:
The ASCII code for a is 97 (1100001)
The ASCII code for A is 65 (1000001)
The ASCII code for b is 98 (1100010)
The ASCII code for B is 66 (1000010)
A pattern emerges: for the letters a-z, the sixth bit of a lowercase letter is always 1 and that of the matching uppercase letter is 0, so we can use this bit to determine the case.
To do so, AND the letter's code with 0x20 (100000) and test the result:
if (ord($a) & 0x20) {
    // the sixth bit is 1: $a is a lowercase letter
}
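The sixth-bit test only means something when the character is already known to be a letter: the digit 1, for example, is 49 (110001) and also has that bit set. A minimal sketch that adds a letter check first is shown below; the function name isLowerLetter is an assumption made for this illustration.

```php
<?php
// Hypothetical helper: true only for the ASCII letters a-z.
function isLowerLetter(string $char): bool
{
    $code = ord($char);
    // Force the sixth bit to 1 so that A-Z and a-z both map into 97..122.
    $lower = $code | 0x20;
    $isLetter = $lower >= 97 && $lower <= 122;
    // For a letter, the sixth bit decides the case.
    return $isLetter && ($code & 0x20) !== 0;
}

var_dump(isLowerLetter('a')); // bool(true)
var_dump(isLowerLetter('A')); // bool(false)
var_dump(isLowerLetter('1')); // bool(false), although 49 & 0x20 is non-zero
```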
How do you change a letter to uppercase? Change the sixth bit from 1 to 0:
$a = 'a';
// ~0x20 has every bit set except the sixth, so the AND clears only that bit
$a = chr(ord($a) & (~0x20));
echo $a; // prints A
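To apply the same trick to every letter in a string, one possible sketch is the loop below. The name asciiToUpper is hypothetical, and the function assumes plain ASCII input: the second byte of a GBK character can fall in the a-z range, so mixed text would need to be split into characters first.

```php
<?php
// Hypothetical helper: uppercase the ASCII letters a-z by clearing the sixth bit.
function asciiToUpper(string $text): string
{
    for ($i = 0, $n = strlen($text); $i < $n; $i++) {
        $code = ord($text[$i]);
        // Only touch a-z (97..122); other bytes are left unchanged.
        if ($code >= 97 && $code <= 122) {
            $text[$i] = chr($code & ~0x20);
        }
    }
    return $text;
}

echo asciiToUpper("hello, GBK 123"), "\n"; // HELLO, GBK 123
```

The reverse direction works the same way: OR a capital letter's code with 0x20 to set the sixth bit and get the lowercase letter.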