Reprint: http://blog.csdn.net/chummyhe89/article/details/7777613
2 bytes: none
3 bytes: roughly the same repertoire as GBK, a little over 21,000 Chinese characters
4 bytes: the Chinese characters in the large CJK (Chinese, Japanese and Korean) character set, more than 50,000
A digit in UTF-8 takes 1 byte
An English letter in UTF-8 takes 1 byte
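These per-character sizes can be checked directly. The following is a minimal Python sketch; the sample characters and the helper name `utf8_size` are my own illustrative choices, not something from the original post.

```python
# Sample characters are illustrative picks, not taken from the post.
samples = {
    "digit": "7",
    "English letter": "A",
    "Chinese character U+4E2D": "\u4e2d",
    "Chinese character U+20000": "\U00020000",
}

def utf8_size(ch: str) -> int:
    """Return the number of bytes ch occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

for label, ch in samples.items():
    print(f"{label}: {utf8_size(ch)} byte(s)")
```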
While looking for material on UTF-8 encoding, I found many posts claiming that a Chinese character occupies 3 bytes in UTF-8. Some even offered a proof: create a UTF-8 text file without a BOM, save a few Chinese characters in it, and check the file size. I don't find this convincing, because UTF-8 is a variable-length encoding (1 to 6 bytes in the original design, limited to 1 to 4 bytes by RFC 3629), and testing a handful of Chinese characters says nothing about all of them.
Later I went through the Chinese section of the character map and found the correct answer: a minority of Chinese characters occupy 3 bytes each, and the majority occupy 4 bytes.
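The reason a small sample can mislead is that the byte count depends only on where the code point falls. Here is a rough sketch of that rule for current UTF-8 (code points up to U+10FFFF), cross-checked against Python's own encoder. The function name `utf8_length` and the sample code points are assumptions made for illustration.

```python
def utf8_length(code_point: int) -> int:
    """Bytes needed to encode one code point in UTF-8 (up to U+10FFFF)."""
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    if code_point <= 0x10FFFF:
        return 4
    raise ValueError("beyond the Unicode code space")

# Illustrative code points: a Latin letter, a Greek letter, and
# several endpoints of Chinese character ranges.
for cp in (0x41, 0x3A9, 0x4E00, 0xFAD9, 0x20000, 0x2FA1D):
    assert utf8_length(cp) == len(chr(cp).encode("utf-8"))
    print(f"U+{cp:04X}: {utf8_length(cp)} byte(s)")
```

Everything below U+10000 fits in at most 3 bytes, so whether a Chinese character takes 3 or 4 bytes comes down to whether its code point is above U+FFFF.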
Ranges that take 3 bytes
- U+2E80-U+2EF3: 0xE2 0xBA 0x80 - 0xE2 0xBB 0xB3, 116 code points
- U+2F00-U+2FD5: 0xE2 0xBC 0x80 - 0xE2 0xBF 0x95, 214 code points
- U+3005-U+3029: 0xE3 0x80 0x85 - 0xE3 0x80 0xA9, 37 code points
- U+3038-U+4DB5: 0xE3 0x80 0xB8 - 0xE4 0xB6 0xB5, 7,550 code points
- U+4E00-U+FA6A: 0xE4 0xB8 0x80 - 0xEF 0xA9 0xAA, 44,139 code points
- U+FA70-U+FAD9: 0xEF 0xA9 0xB0 - 0xEF 0xAB 0x99, 106 code points
Total: 52,162
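The byte sequences at the two ends of each 3-byte range can be reproduced with a short script. This is only a sketch: the table is copied from the list above, and the names `THREE_BYTE_RANGES`, `utf8_hex` and `check_endpoints` are hypothetical helpers introduced for the example.

```python
# Each entry: (first code point, last code point, first bytes, last bytes),
# copied from the 3-byte list above.
THREE_BYTE_RANGES = [
    (0x2E80, 0x2EF3, "E2 BA 80", "E2 BB B3"),
    (0x2F00, 0x2FD5, "E2 BC 80", "E2 BF 95"),
    (0x3005, 0x3029, "E3 80 85", "E3 80 A9"),
    (0x3038, 0x4DB5, "E3 80 B8", "E4 B6 B5"),
    (0x4E00, 0xFA6A, "E4 B8 80", "EF A9 AA"),
    (0xFA70, 0xFAD9, "EF A9 B0", "EF AB 99"),
]

def utf8_hex(code_point: int) -> str:
    """Encode one code point as UTF-8 and format the bytes as spaced hex."""
    return " ".join(f"{b:02X}" for b in chr(code_point).encode("utf-8"))

def check_endpoints(ranges) -> bool:
    """Return True if both endpoints of every range match the listed bytes."""
    return all(
        utf8_hex(first) == first_bytes and utf8_hex(last) == last_bytes
        for first, last, first_bytes, last_bytes in ranges
    )

for first, last, _, _ in THREE_BYTE_RANGES:
    print(f"U+{first:04X}: {utf8_hex(first)}   U+{last:04X}: {utf8_hex(last)}")
print(check_endpoints(THREE_BYTE_RANGES))
```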
Ranges that take 4 bytes
- U+20000-U+2FA1D: 0xF0 0xA0 0x80 0x80 - 0xF0 0xAF 0xA8 0x9D, 64,030 code points
Total: 64,030
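Rather than sampling a few characters, the whole 4-byte range can be walked and the encoded length of every code point recorded. A minimal sketch, assuming only the endpoints listed above; `encoded_sizes` is a hypothetical helper name.

```python
FIRST, LAST = 0x20000, 0x2FA1D  # endpoints of the 4-byte range listed above

def encoded_sizes(first: int, last: int) -> set:
    """Collect the distinct UTF-8 byte lengths seen across an inclusive range."""
    return {len(chr(cp).encode("utf-8")) for cp in range(first, last + 1)}

# A single-element set means every code point in the range has the same size.
print(encoded_sizes(FIRST, LAST))
print(chr(FIRST).encode("utf-8").hex(), chr(LAST).encode("utf-8").hex())
```

The walk covers every code point in the range, assigned or not, which is the exhaustive check that a test file holding a few characters cannot provide.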
How many bytes Chinese characters occupy in UTF-8