To distinguish double-byte characters from single-byte ones: Chinese, Japanese, Korean and similar characters occupy two bytes, and the lead byte always has its high bit set to 1 (in GB2312 both bytes do), while ordinary Western characters occupy a single byte with seven valid bits and the high bit set to 0.
0x80 in binary is 1000 0000: the highest bit is 1, which marks a Chinese character. This encoding format is commonly referred to as 10 format. A Chinese character occupies 2 bytes but counts as only one character.
GBK, the simplified Chinese character set, encodes each character in either 1 byte or 2 bytes. When the first byte is in the range 0x00~0x7F, the character is a single byte; when the first byte is 0x80 or above, the character is represented by 2 bytes.
Note: the numbers in parentheses are written in binary.
When a byte's value is greater than 0x7F, it must be part of a Chinese character (pieced together with another byte). How do we determine that a byte is greater than 0x7F?
The number after 0x7F (1111111) is 0x80 (10000000), so any byte larger than 0x7F must have its highest bit set to 1. We only need to check whether the top bit is 1.
How to check:
Use bitwise AND (a result bit is 1 only when both input bits are 1, otherwise it is 0).
For example, to determine whether the third bit of a number is 1, AND it with 4 (100); to determine whether the second bit is 1, AND it with 2 (10).
Similarly, to determine whether the eighth bit is 1, AND it with 0x80 (10000000).
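The bit tests described above can be sketched in PHP as follows. The helper name bit_is_set is hypothetical (it is not a PHP built-in), and bits are counted from 1 starting at the least significant end, as in the text.

```php
<?php
// Hypothetical helper: true when bit number $n of $value is 1.
// Bits are counted from 1, starting at the least significant bit.
function bit_is_set(int $value, int $n): bool
{
    $mask = 1 << ($n - 1);          // n = 2 gives 2 (10), n = 3 gives 4 (100), n = 8 gives 0x80
    return ($value & $mask) !== 0;  // bitwise AND keeps only the tested bit
}

var_dump(bit_is_set(5, 3));     // 5 is 101: third bit is set, prints bool(true)
var_dump(bit_is_set(5, 2));     // second bit is clear, prints bool(false)
var_dump(bit_is_set(0xD6, 8));  // 0xD6 is 11010110: eighth bit is set, prints bool(true)
```

Comparing the result with 0 rather than with 1 matters here: the AND leaves the mask value (4, 0x80, and so on) when the bit is set, not 1.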
Why not simply compare with > 0x7F? That may work in PHP, but in some strongly typed languages the highest bit of a one-byte value marks a negative number, and a negative number can never be greater than 0x7F (the largest value a signed byte can hold).
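As a minimal sketch of the rule in practice, the function below walks a GBK byte string and counts characters by testing the high bit of each lead byte. The name gbk_char_count is hypothetical, and the sketch assumes well-formed GBK input (it does not validate trail bytes or handle a truncated final character).

```php
<?php
// Hypothetical helper: counts characters in a GBK-encoded byte string.
// Assumption: a byte with the high bit set is the lead byte of a two-byte
// character; any other byte is a complete one-byte character.
function gbk_char_count(string $bytes): int
{
    $count = 0;
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        if (ord($bytes[$i]) & 0x80) {
            $i++;               // high bit set: skip the trail byte as well
        }
        $count++;
    }
    return $count;
}

// "ab" followed by the GBK bytes of two Chinese characters (0xD6D0 and 0xCEC4).
$text = "ab\xD6\xD0\xCE\xC4";
echo strlen($text), "\n";          // 6 bytes
echo gbk_char_count($text), "\n";  // 4 characters
```

The bytes are written as hex escapes so the example does not depend on the encoding of the source file.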
Another example:
The ASCII code for a is 97 (1100001)
The ASCII code for A is 65 (1000001)
The ASCII code for b is 98 (1100010)
The ASCII code for B is 66 (1000010)
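These values can be reproduced with a short script. This is only an illustrative sketch; ascii_bits is a hypothetical helper name, and it relies on the standard ord and sprintf functions.

```php
<?php
// Hypothetical helper: the 7-bit binary form of a character's ASCII code.
function ascii_bits(string $ch): string
{
    return sprintf('%07b', ord($ch));   // %b formats an integer in binary
}

// Print each letter with its decimal code and binary form.
foreach (['a', 'A', 'b', 'B'] as $ch) {
    echo $ch, ' ', ord($ch), ' ', ascii_bits($ch), "\n";
}
```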
A pattern emerges: among the letters a-z and A-Z, a lowercase letter always has its sixth bit set to 1, so we can use that bit to determine the case:
To do so, AND the letter's code with 0x20 (100000) and test the result:
if (ord($a) & 0x20) { /* the sixth bit is set, so $a is lowercase */ }
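The sixth-bit test is only meaningful when the character is already known to be a letter, since digits and punctuation such as 1 (0x31) also have that bit set. A slightly more defensive sketch follows; is_lower_ascii is a hypothetical name, not a built-in.

```php
<?php
// Hypothetical helper: true only for the ASCII letters a-z.
function is_lower_ascii(string $ch): bool
{
    $code = ord($ch);
    $upper = $code & ~0x20;                         // same code with the sixth bit cleared
    $isLetter = $upper >= 0x41 && $upper <= 0x5A;   // falls in 'A'..'Z' once the bit is cleared
    return $isLetter && ($code & 0x20) !== 0;       // a letter whose sixth bit is set
}

var_dump(is_lower_ascii('a'));  // bool(true)
var_dump(is_lower_ascii('A'));  // bool(false)
var_dump(is_lower_ascii('1'));  // bool(false): the bit is set, but 1 is not a letter
```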
How do we change a letter to uppercase? Change the sixth bit from 1 to 0:
$a = 'a';
$a = chr(ord($a) & ~0x20);   // clear the sixth bit
echo $a;                     // prints A
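To apply the same trick to every letter in a string, one possible sketch is shown below. The name ascii_upper is hypothetical. The function combines both rules from this article: it skips two-byte GBK characters by testing the high bit of the lead byte, and it clears the sixth bit only for bytes in the range a-z. It assumes well-formed GBK input. For plain ASCII text, PHP's built-in strtoupper does the same job.

```php
<?php
// Hypothetical helper: uppercases ASCII letters in a GBK byte string.
function ascii_upper(string $s): string
{
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $code = ord($s[$i]);
        if ($code & 0x80) {
            $i++;                               // lead byte: skip its trail byte untouched
            continue;
        }
        if ($code >= 0x61 && $code <= 0x7A) {   // 'a'..'z' only
            $s[$i] = chr($code & ~0x20);        // clear the sixth bit
        }
    }
    return $s;
}

echo ascii_upper("php 7 and gbk"), "\n";   // PHP 7 AND GBK
```

Skipping the trail byte matters because a GBK trail byte can fall in the same numeric range as the lowercase letters; clearing its sixth bit would corrupt the character.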