Encoding problems come up often in computing. This section sorts out how ASCII, Unicode, UTF-8, and GBK relate to one another.
ASCII
In a computer, all data is represented by 0s and 1s. In the early days there was little content to represent, and people used ASCII to encode it.
ASCII (American Standard Code for Information Interchange) is a computer character encoding based on the Latin alphabet, mainly used to display modern English and other Western European languages. Each character is stored in one byte (8 bits), whose largest value is 2**8-1 = 255, so a single byte can distinguish at most 2**8 = 256 symbols (values 0 through 255). Standard ASCII itself uses only 7 of those bits and defines 128 characters; the values 128 to 255 are used by various extended ASCII sets.
1 1 1 1 1 1 1 1 =2**0+2**1+2**2+2**3+2**4+2**5+2**6+2**7 = 2**8-1=255
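The arithmetic above can be checked with a short Python 3 sketch. The variable names are illustrative only, and the non-ASCII character is written as an escape so the snippet itself stays pure ASCII.

```python
# Largest value that fits in 8 bits: all eight bits set to 1.
max_byte = int("11111111", 2)
bit_sum = sum(2**i for i in range(8))  # 2**0 + 2**1 + ... + 2**7

# Number of distinct values one byte can hold (0 through 255).
values_per_byte = 2**8

# ASCII maps each character to a small integer code.
code_a = ord("A")
ascii_bytes = "Hello".encode("ascii")

# A Chinese character has no ASCII code, so encoding it fails.
try:
    "\u4e2d".encode("ascii")  # U+4E2D, the character "zhong"
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(max_byte, bit_sum, values_per_byte, code_a, ascii_bytes, ascii_failed)
```

The failed encode at the end is the limitation that motivates the larger encodings discussed next.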
Unicode, UTF-8, GBK
As computers spread, it became obvious that ASCII could not represent all the world's scripts and symbols, so a new encoding that could represent every character and symbol was needed: Unicode.
Unicode (also called Uniform Code, Universal Code, or Single Code) is a character encoding standard used on computers. It was created to address the limitations of traditional character encoding schemes: it assigns a uniform, unique number (a code point) to each character in every language, to meet the needs of cross-language, cross-platform text conversion and processing. The early 2-byte form of Unicode (UCS-2) represents every character with 2 bytes (16 bits), giving 2**16 = 65536 possible values (0 through 65535). Unicode has since grown beyond that range, and characters above 65535 need more than 2 bytes.
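As a rough illustration of the idea that every character has one unique number, the Python 3 sketch below looks up a few code points with the built-in ord and chr functions. The sample characters are arbitrary choices, written as escapes.

```python
# Each character maps to exactly one Unicode code point.
samples = ["A", "\u00e9", "\u4e2d"]  # Latin A, e with acute accent, a Chinese character
code_points = [ord(ch) for ch in samples]

for ch, cp in zip(samples, code_points):
    print(repr(ch), cp, hex(cp))

# chr() is the inverse of ord().
round_trip = chr(0x4E2D)

# Number of values a 16-bit unit can hold.
two_byte_values = 2**16

# Code points above 0xFFFF exist, so 2 bytes are not always enough.
emoji_code_point = ord("\U0001F600")
```

Note that a code point is only a number; how that number is stored as bytes depends on the encoding chosen.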
UTF-8 is a compact, variable-length encoding of Unicode. Instead of using at least 2 bytes for everything, it groups characters by code point: ASCII characters are stored in 1 byte, most European characters in 2 bytes, most East Asian characters in 3 bytes, and the remaining characters (such as emoji) in 4 bytes.
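The byte counts described above can be observed directly in Python 3. This is a minimal sketch; the four sample characters are assumptions picked to hit each length class.

```python
# One sample character per UTF-8 length class.
samples = {
    "ascii": "A",               # U+0041
    "european": "\u00e9",       # U+00E9, e with acute accent
    "east_asian": "\u4e2d",     # U+4E2D, a Chinese character
    "emoji": "\U0001F600",      # U+1F600, outside the 16-bit range
}

# Encode each character and record how many bytes it takes.
utf8_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

for name, ch in samples.items():
    print(name, ch.encode("utf-8"), utf8_lengths[name])
```

Because ASCII characters keep their 1-byte form, any pure ASCII file is already valid UTF-8.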
GBK is a Chinese national character encoding that extends GB2312. It is not derived from Unicode, although its characters can be mapped to and from Unicode. GBK stores ASCII characters in 1 byte and Chinese characters in 2 bytes.
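To see how GBK stores characters in practice, the sketch below encodes one ASCII letter and one Chinese character with Python 3's built-in gbk codec and compares the sizes with UTF-8. The sample characters are illustrative.

```python
letter = "A"
hanzi = "\u4e2d"  # U+4E2D, a common Chinese character

# Byte sequences produced by the gbk codec.
gbk_letter = letter.encode("gbk")
gbk_hanzi = hanzi.encode("gbk")

# The same Chinese character takes one byte more in UTF-8.
utf8_hanzi = hanzi.encode("utf-8")

print(gbk_letter, len(gbk_letter))
print(gbk_hanzi, len(gbk_hanzi))
print(utf8_hanzi, len(utf8_hanzi))
```

In Python's gbk codec the ASCII letter occupies a single byte, while the Chinese character occupies two.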
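The relationship between Unicode, UTF-8, and GBK: Unicode assigns each character a number, while UTF-8 and GBK are two different ways of storing characters as bytes. To convert between UTF-8 and GBK, decode the bytes to Unicode first, then encode to the target encoding.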
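One way to picture this relationship is a conversion that passes through Unicode text. The Python 3 sketch below decodes GBK bytes to a str and re-encodes it as UTF-8; the helper name transcode is hypothetical, not a standard library function.

```python
def transcode(data, source, target):
    """Convert bytes from one encoding to another via Unicode text.

    Hypothetical helper for illustration only.
    """
    text = data.decode(source)   # bytes -> Unicode str
    return text.encode(target)   # Unicode str -> bytes

gbk_bytes = b"\xd6\xd0"                      # one Chinese character in GBK
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")
back_to_gbk = transcode(utf8_bytes, "utf-8", "gbk")

# Decoding with the wrong codec fails, since the byte layouts differ.
try:
    gbk_bytes.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

print(utf8_bytes, back_to_gbk, wrong_codec_failed)
```

The round trip returns the original bytes because both encodings map this character to the same Unicode code point.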
Python environment
In Python 2, when the Python interpreter loads the code in a .py file, it decodes the content using ASCII by default.
ASCII cannot represent Chinese, so a file that contains Chinese characters cannot be read with the default encoding. A .py file should therefore tell the Python interpreter explicitly which encoding to use when reading the source code, namely:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
print "Hello, World"
In Python 3, the interpreter reads source files as UTF-8 by default and strings are Unicode, so Chinese can be used without specifying an encoding.
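The Python 3 behavior can be checked with a small sketch like the one below. The greeting string is an arbitrary example, written with escapes so the snippet does not depend on the file's own encoding.

```python
import sys

# Python 3 reports UTF-8 as its default string encoding.
default_encoding = sys.getdefaultencoding()

# A str holds Unicode characters, not bytes.
greeting = "\u4f60\u597d"  # two Chinese characters meaning "hello"
char_count = len(greeting)

# Bytes appear only when the text is explicitly encoded.
encoded = greeting.encode("utf-8")
byte_count = len(encoded)

print(default_encoding, char_count, byte_count, type(greeting), type(encoded))
```

The length of the str counts characters, while the length of the encoded value counts bytes, which is why the two numbers differ.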
That covers the relationship between ASCII, Unicode, UTF-8, and GBK in Python.