This afternoon I read through Liao Xuefeng's Python 2.7 tutorial, in particular the section on strings and encodings, and a few things clicked. Combining that with Cui Qingcai's Python blog, I am writing down what I understood:
ASCII: a one-byte encoding that uses only 7 of the 8 bits, giving 128 code points (0-127) for uppercase and lowercase letters, digits, some symbols, and control characters. It is enough for modern English text, but not for most other languages.
This becomes a problem for Chinese, which needs at least two bytes per character, so China created the GB2312 encoding.
Other countries developed their own national standards as well: Japan created Shift_JIS, South Korea created EUC-KR, and so on. With so many conflicting standards, garbled text was the inevitable result.
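As a rough illustration of how garbled text comes about, the sketch below encodes a Chinese string with GB2312 and then decodes the same bytes as Shift_JIS. The sample string and the variable names are my own choices for illustration and do not come from the tutorial.

```python
# Hypothetical sample: the two Chinese characters for "Chinese" (zhong wen),
# written as escapes so the source file itself stays pure ASCII.
text = u'\u4e2d\u6587'

# GB2312 stores each of these characters in two bytes.
gb_bytes = text.encode('gb2312')

# Reading the same bytes with the wrong national standard gives garbage.
# errors='replace' keeps the sketch from raising on invalid byte sequences.
garbled = gb_bytes.decode('shift_jis', 'replace')

# Reading them with the right standard recovers the original text.
restored = gb_bytes.decode('gb2312')

print(len(gb_bytes))     # 4
print(garbled == text)   # False
print(restored == text)  # True
```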
To unify all of this, Unicode was born. Unicode brings all languages into a single character set, which solves the garbled-text problem, but it introduces a new one: inefficient storage and transmission.
ASCII uses 1 byte per character, whereas Unicode text is usually stored as 2 bytes per character (as in UCS-2 or UTF-16, where rarer characters need 4). One byte is enough for an English letter, but in the 2-byte form it takes two, with the extra byte set to 0.
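A minimal sketch of that overhead, assuming the "2-byte Unicode" here corresponds to UTF-16 (the codec name 'utf-16-be' is Python's big-endian variant without a byte-order mark; the variable names are illustrative):

```python
letter = u'A'

as_ascii = letter.encode('ascii')      # one byte: 0x41
as_utf16 = letter.encode('utf-16-be')  # two bytes: 0x00 0x41

print(len(as_ascii))         # 1
print(len(as_utf16))         # 2
print(as_utf16 == b'\x00A')  # True: the extra byte is 0
```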
To save space, a variable-length encoding of Unicode appeared: UTF-8. UTF-8 encodes each Unicode character into 1 to 4 bytes depending on the size of its code point (the original design allowed up to 6 bytes, but the current standard stops at 4). Common English letters take 1 byte, Chinese characters usually take 3 bytes, and only rare characters take 4. If the text you transmit consists mostly of English characters, UTF-8 saves space. ASCII can also be seen as a subset of UTF-8, so a lot of legacy software that only supports ASCII keeps working with UTF-8 data.
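The variable length is easy to check. The sketch below measures the UTF-8 size of three sample characters I picked for illustration (an English letter, a Chinese character, and one character from outside the Basic Multilingual Plane), then confirms that pure ASCII text produces identical bytes under both encodings.

```python
samples = [
    (u'A', 'English letter'),
    (u'\u4e2d', 'Chinese character'),
    (u'\U0001F600', 'character outside the Basic Multilingual Plane'),
]

# Number of bytes each character occupies once encoded as UTF-8.
byte_lengths = [len(ch.encode('utf-8')) for ch, _ in samples]
print(byte_lengths)  # [1, 3, 4]

# ASCII text is byte-for-byte identical in ASCII and UTF-8.
ascii_compatible = u'hello'.encode('utf-8') == u'hello'.encode('ascii')
print(ascii_compatible)  # True
```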
Now suppose I edit a Python script in Notepad. When the file is opened, its contents are read from disk into a region of memory that temporarily holds the code I am working on, and in memory text is uniformly held as Unicode.
So when I write a Chinese string in Python 2.7, I put a u in front of it to mark it as a Unicode string.
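A small sketch of what the u prefix buys you, using a sample string of my own: a Unicode string counts characters, while its encoded form counts bytes. In Python 3 every string literal is already Unicode and the u prefix is accepted but optional, so the snippet behaves the same on both versions.

```python
text = u'\u4e2d\u6587'       # Unicode string: 2 characters
data = text.encode('utf-8')  # byte string: the same text as UTF-8 bytes

print(len(text))  # 2
print(len(data))  # 6

# The two objects are different types (unicode vs. str in Python 2.7,
# str vs. bytes in Python 3).
print(type(text) is type(data))  # False
```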
Cui Qingcai's blog does the same.
But why do we sometimes need to call decode('utf-8')? Looking at Cui Qingcai's blog again:
The response that Qiushibaike's server sends to the client (that is, the browser) is encoded as 'UTF-8':
To edit or read the text, it has to be Unicode in memory, so decode('utf-8') converts the UTF-8 bytes to Unicode. In the same way, encode('utf-8') converts Unicode back to UTF-8.
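The round trip can be sketched without any network access. Here response_bytes is a hypothetical stand-in for the raw body of a UTF-8 response; it is not the actual output of any server.

```python
# Stand-in for raw UTF-8 bytes received from a server (no real request is made).
response_bytes = b'\xe4\xb8\xad\xe6\x96\x87'

# decode: UTF-8 bytes -> Unicode string held in memory
page_text = response_bytes.decode('utf-8')
print(page_text == u'\u4e2d\u6587')  # True

# encode: Unicode string -> UTF-8 bytes, ready to save or transmit
round_trip = page_text.encode('utf-8')
print(round_trip == response_bytes)  # True
```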
When text is saved to disk or transmitted, it is converted to UTF-8. The script file itself is saved as UTF-8 too, so we declare that at the top of the Python script with # -*- coding: utf-8 -*- so that the interpreter knows how to read it.
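Putting both halves together, here is a minimal sketch of a script that carries the coding declaration and saves Unicode text to disk as UTF-8. The file name demo.txt and the temporary directory are assumptions for illustration; io.open is used because it accepts an encoding argument on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Non-ASCII literal: the declaration on the first line tells Python 2.7
# that this source file is saved as UTF-8.
text = u'中文'

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Writing: the Unicode string is converted to UTF-8 on its way to disk.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# The raw bytes on disk are the UTF-8 form (3 bytes per character here).
with io.open(path, 'rb') as f:
    on_disk = f.read()
print(len(on_disk))  # 6

# Reading: the UTF-8 bytes are converted back to Unicode in memory.
with io.open(path, encoding='utf-8') as f:
    loaded = f.read()
print(loaded == text)  # True
```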
Image sources:
Liao Xuefeng's official website: https://www.liaoxuefeng.com/wiki/001374738125095c955c1e6d8bb493182103fac9270762a000/001386819196283586a37629844456ca7e5a7faa9b94ee8000
Cui Qingcai's personal blog: http://cuiqingcai.com/990.html
Unicode encoding and UTF-8 encoding in Python