Of 2 bytes:
3 bytes: basically equivalent to GBK, containing more than 21,000 Chinese characters
4 bytes: Chinese characters in the Chinese-Japanese character set, more than 50,000 characters
In UTF-8, one digit occupies 1 byte
In UTF-8, one English letter occupies 1 byte
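The two statements above are easy to check. The following is a minimal sketch in Python (the language and the sample characters are my own choice for illustration): it encodes a few ASCII digits and letters and records how many bytes each one takes.

```python
# ASCII digits and English letters each encode to a single byte in UTF-8.
samples = ["0", "9", "a", "Z"]

# Map every sample character to the length of its UTF-8 encoding.
sizes = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, size in sizes.items():
    print(f"{ch!r}: {size} byte")
```

Every entry in sizes should be 1, because UTF-8 encodes the ASCII range (U+0000 to U+007F) unchanged in one byte.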
While searching for information on UTF-8 encoding, I found many posts saying that a Chinese character occupies 3 bytes in UTF-8. Some even offered a proof, roughly like this: create a text file encoded as UTF-8 without a BOM, save a few Chinese characters in it, and then check the file size. I do not find this proof convincing. UTF-8 is a variable-length encoding (1 to 6 bytes per character in the original design, limited to 4 bytes since RFC 3629), so testing a handful of Chinese characters does not show that every Chinese character has the same length.
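To see why a small sample proves little, it helps to look at how UTF-8 picks the length: it depends only on the code point value. The sketch below assumes the modern 4-byte form of UTF-8 (RFC 3629); the function name utf8_length and the three sample characters are hypothetical choices for illustration. It compares the length predicted from the code point with the length Python actually produces.

```python
def utf8_length(code_point: int) -> int:
    """Bytes UTF-8 needs for one code point (RFC 3629 form, at most 4)."""
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    return 4


# One ASCII letter, one common Chinese character, one character above U+FFFF.
samples = ["A", "\u4e2d", "\U00020000"]

lengths = {}
for ch in samples:
    encoded = ch.encode("utf-8")
    # The predicted length must match what the built-in encoder produces.
    assert len(encoded) == utf8_length(ord(ch))
    lengths[ord(ch)] = len(encoded)
    print(f"U+{ord(ch):04X}: {len(encoded)} bytes ({encoded.hex(' ')})")
```

U+4E2D comes out as 3 bytes, while U+20000 comes out as 4, so a test file that happens to contain only characters below U+10000 would never reveal the 4-byte case.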
Later, I checked the character encoding table for "Chinese" and found the correct answer. The smaller share of Chinese characters occupies three bytes each, and the larger share occupies four bytes each.
Ranges that occupy 3 bytes (the number at the end of each line is the end code point minus the start code point):
U+2E80 - U+2EF3: 0xe2 0xba 0x80 - 0xe2 0xbb 0xb3, 115
U+2F00 - U+2FD5: 0xe2 0xbc 0x80 - 0xe2 0xbf 0x95, 213
U+3005 - U+3029: 0xe3 0x80 0x85 - 0xe3 0x80 0xa9, 36
U+3038 - U+4DB5: 0xe3 0x80 0xb8 - 0xe4 0xb6 0xb5, 7549
U+4E00 - U+FA6A: 0xe4 0xb8 0x80 - 0xef 0xa9 0xaa, 44138
U+FA70 - U+FAD9: 0xef 0xa9 0xb0 - 0xef 0xab 0x99, 105
Total: 52156 items
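The byte sequences and counts for the 3-byte ranges can be reproduced with a short script. This is only a sketch: the helper name utf8_hex is hypothetical, and only the endpoints of each range are encoded. The counts in the list match end minus start, so the inclusive size of each range is one larger than the number shown.

```python
# Endpoints of the three-byte ranges, copied from the list above.
RANGES = [
    (0x2E80, 0x2EF3),
    (0x2F00, 0x2FD5),
    (0x3005, 0x3029),
    (0x3038, 0x4DB5),
    (0x4E00, 0xFA6A),
    (0xFA70, 0xFAD9),
]


def utf8_hex(code_point: int) -> str:
    """UTF-8 bytes of one code point as space-separated hex."""
    return chr(code_point).encode("utf-8").hex(" ")


counts = []
for start, end in RANGES:
    # Both endpoints must encode to exactly three bytes.
    assert len(chr(start).encode("utf-8")) == 3
    assert len(chr(end).encode("utf-8")) == 3
    counts.append(end - start)  # same convention as the list: end minus start
    print(f"U+{start:04X} - U+{end:04X}: {utf8_hex(start)} - {utf8_hex(end)}, {end - start}")

total = sum(counts)
print("Total:", total)
```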
Ranges that occupy 4 bytes:
U+20000 - U+2FA1D: 0xf0 0xa0 0x80 0x80 - 0xf0 0xaf 0xa8 0x9d, 64029
Total: 64029 items
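The 4-byte sequences follow directly from the UTF-8 bit layout 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx. As a rough illustration, the sketch below packs a code point into that layout by hand and compares the result with Python's built-in encoder; the function name encode_four_bytes is hypothetical.

```python
def encode_four_bytes(code_point: int) -> bytes:
    """Pack a code point in U+10000..U+10FFFF into the 4-byte UTF-8 layout."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the 4-byte UTF-8 range")
    return bytes([
        0xF0 | (code_point >> 18),           # lead byte: top 3 bits
        0x80 | ((code_point >> 12) & 0x3F),  # continuation: next 6 bits
        0x80 | ((code_point >> 6) & 0x3F),   # continuation: next 6 bits
        0x80 | (code_point & 0x3F),          # continuation: low 6 bits
    ])


start, end = 0x20000, 0x2FA1D
for cp in (start, end):
    # The hand-packed bytes must agree with the built-in encoder.
    assert encode_four_bytes(cp) == chr(cp).encode("utf-8")
    print(f"U+{cp:05X}: {encode_four_bytes(cp).hex(' ')}")

span = end - start  # same convention as the listed count: end minus start
print("Count:", span)
```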