Let's look at an example (Python 2, where `str` holds raw bytes):

```python
# -*- coding: utf-8 -*-
s = '中文截取'                          # UTF-8 encoded byte string ("Chinese truncation")
s.decode('utf8')[0:3].encode('utf8')   # result: '中文截'
```
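In Python 3 the same idea is simpler, because `str` is already a sequence of code points rather than bytes. Here is a minimal sketch of the equivalent, plus a demonstration of why slicing the raw bytes directly is unsafe:

```python
# Python 3 equivalent of the snippet above: no decode/encode dance is
# needed, because a str indexes by character, not by byte.
s = "中文截取"

first_three = s[0:3]       # "中文截" -- the first three characters

# Slicing the underlying UTF-8 bytes instead can cut a character in half:
raw = s.encode("utf-8")    # 12 bytes: each of these characters is 3 bytes
# raw[0:4] ends in the middle of the second character, so decoding fails.
try:
    raw[0:4].decode("utf-8")
except UnicodeDecodeError:
    print("byte slicing split a character")
```

This is exactly why the Python 2 snippet decodes to `unicode` before slicing: slicing must happen on characters, not on UTF-8 bytes.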
Extended reading:
How many bytes does a Chinese character occupy in UTF-8?

2 bytes: none. The 2-byte range of UTF-8 ends at U+07FF, below any Chinese character.
3 bytes: roughly the coverage of GBK, containing more than 21,000 Chinese characters.
4 bytes: the rarer CJK (Chinese, Japanese and Korean) ideographs in the supplementary planes, more than 50,000 characters.
A digit in UTF-8 occupies 1 byte.
An English letter in UTF-8 occupies 1 byte.
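The byte counts listed above are easy to verify with Python 3's built-in UTF-8 codec. A small sketch (the sample characters are illustrative choices):

```python
# Verify the per-character UTF-8 byte counts claimed above.
samples = {
    "7": 1,           # ASCII digit: 1 byte
    "a": 1,           # ASCII letter: 1 byte
    "中": 3,          # common CJK character in the BMP: 3 bytes
    "\U00020000": 4,  # U+20000, CJK Extension B (plane 2): 4 bytes
}
for ch, expected in samples.items():
    n = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}: {n} byte(s)")
    assert n == expected
```

Note there is no 2-byte entry for Chinese: 2-byte UTF-8 sequences only reach U+07FF, which covers scripts like Latin supplements, Greek, Cyrillic, Hebrew and Arabic, never CJK.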
When I searched for information on UTF-8 encoding, I found many posts claiming that a Chinese character occupies 3 bytes in UTF-8, and some even offered a proof: create a UTF-8 text file without a BOM, save a few Chinese characters in it, and check the file size. I don't find this very convincing, because UTF-8 is a variable-length encoding (1 to 4 bytes per code point under the current spec; the original design allowed up to 6), and testing a handful of characters doesn't show that the rule holds for all Chinese characters.
Later I checked the Unicode character mapping tables for Chinese and found the answer: only a minority of Chinese characters occupy 3 bytes each; the majority occupy 4 bytes.
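A rough tally supports this conclusion. The sketch below counts code points in the standard CJK Unified Ideographs blocks by their UTF-8 length; treating every code point in each block as an assigned character is a simplification, but the imbalance is large enough that the conclusion holds:

```python
# Count code points in the main CJK ideograph blocks by UTF-8 byte length.
# Block ranges are the standard Unicode CJK Unified Ideographs blocks.
blocks = [
    (0x3400, 0x4DBF),    # Extension A (BMP -> 3 bytes each)
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs (BMP -> 3 bytes each)
    (0x20000, 0x2A6DF),  # Extension B (plane 2 -> 4 bytes each)
    (0x2A700, 0x2EBEF),  # Extensions C-F (plane 2 -> 4 bytes each)
]
tally = {3: 0, 4: 0}
for lo, hi in blocks:
    for cp in range(lo, hi + 1):
        tally[len(chr(cp).encode("utf-8"))] += 1
print(tally)  # far more 4-byte than 3-byte code points
```

The commonly used characters sit in the BMP blocks and take 3 bytes, which is why everyday text files seem to confirm the "3 bytes per character" rule; the 4-byte majority lives in the rarely used supplementary-plane extensions.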
That's all for this article; I hope you found it helpful.