Basic Knowledge
ASCII encodes each character in 1 byte, while Unicode usually uses 2 bytes per character.
1 byte = 8 bits
In computer memory, text is held uniformly as Unicode; it is converted to UTF-8 when it needs to be saved to disk or transmitted.
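As a rough illustration of that conversion, here is a minimal Python 3 sketch. It assumes Python 3, where a `str` holds Unicode text in memory; the variable names are only illustrative.

```python
# In Python 3, a str holds Unicode text in memory.
text = "A中"

# Convert to UTF-8 bytes before saving to disk or sending over a network.
data = text.encode("utf-8")
print(data)              # b'A\xe4\xb8\xad'

# Convert back to a str after reading or receiving the bytes.
restored = data.decode("utf-8")
print(restored == text)  # True
```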
The letter "A" is encoded in ASCII as decimal 65, binary 01000001.
The character "0" is encoded in ASCII as decimal 48, binary 00110000. Note that the character "0" and the integer 0 are different.
The Chinese character "中" is outside the ASCII range. Its Unicode encoding is decimal 20013, binary 01001110 00101101, so the 1 byte of ASCII is no longer enough.
As you might guess, to express the ASCII-encoded "A" in Unicode you only need to pad it with zeros in front, so the Unicode encoding of "A" is 00000000 01000001.
The difference: one is 1 byte and the other is 2 bytes.
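The values above can be checked with a short Python 3 sketch. The helper `to_bits` is a hypothetical name introduced here only for display; `ord` and `format` are built-ins.

```python
def to_bits(value, width):
    """Return value as a zero-padded binary string, grouped into bytes."""
    bits = format(value, "0{}b".format(width))
    return " ".join(bits[i:i + 8] for i in range(0, width, 8))

print(ord("A"), to_bits(ord("A"), 8))    # 65 01000001
print(ord("0"), to_bits(ord("0"), 8))    # 48 00110000
print(ord("中"), to_bits(ord("中"), 16))  # 20013 01001110 00101101

# Padding the 1-byte code of "A" with a leading zero byte gives its 2-byte form.
print(to_bits(ord("A"), 16))             # 00000000 01000001

# The character "0" is not the integer 0.
print("0" == 0)                          # False
```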
Imagine that your text is entirely in English: encoding it in Unicode takes twice the storage space of ASCII, which is wasteful for both storage and transmission, because Unicode uses 2 bytes per character while a single byte is enough to represent every English letter.
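To make the size difference concrete, the sketch below compares byte counts for a short English string. It uses Python's `"utf-16-be"` codec as a stand-in for a fixed 2-byte-per-character encoding; that choice is an assumption made for illustration, and the sample string is arbitrary.

```python
text = "Hello, world"

# One byte per character.
ascii_bytes = text.encode("ascii")

# Stand-in for a 2-byte encoding (assumption: utf-16-be, which adds no byte-order mark).
two_byte = text.encode("utf-16-be")

print(len(text), len(ascii_bytes), len(two_byte))  # 12 12 24
```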
So, to save space, the variable-length encoding UTF-8 was introduced as a way of converting Unicode. UTF-8 encodes a Unicode character into 1 to 4 bytes depending on the size of its code point (the original design allowed up to 6 bytes, but the current standard caps it at 4): common English letters take 1 byte, Chinese characters usually take 3 bytes, and only rare characters take 4 bytes. If the text you want to transfer contains a large number of English characters, UTF-8 saves space:
Character | ASCII | Unicode | UTF-8
A | 01000001 | 00000000 01000001 | 01000001
中 | (none) | 01001110 00101101 | 11100100 10111000 10101101
The table also shows an added benefit of UTF-8: ASCII can be viewed as a subset of UTF-8, so a large body of legacy software that only supports ASCII can continue to work with UTF-8.
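The UTF-8 bit patterns and the ASCII compatibility described here can be reproduced with a small Python 3 sketch. `utf8_bits` is a hypothetical helper name, and the sample characters beyond "A" and "中" are chosen here only for illustration.

```python
def utf8_bits(ch):
    """Return the UTF-8 bytes of ch as space-separated 8-bit binary strings."""
    return " ".join(format(b, "08b") for b in ch.encode("utf-8"))

print(utf8_bits("A"))   # 01000001
print(utf8_bits("中"))  # 11100100 10111000 10101101

# UTF-8 length grows with the code point:
# "A", U+00E9 (e with acute accent), "中", U+1F600 (an emoji).
samples = ["A", "\u00e9", "中", "\U0001F600"]
lengths = [len(ch.encode("utf-8")) for ch in samples]
print(lengths)          # [1, 2, 3, 4]

# For ASCII-only text, the ASCII bytes and the UTF-8 bytes are identical.
s = "legacy ASCII text"
print(s.encode("ascii") == s.encode("utf-8"))  # True
```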
Python Encoding Issues