This article takes an in-depth look at character encoding and file handling in Python, and may serve as a useful reference for interested readers.
The development of character encoding
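ASCII: 1 byte per character; covers only English letters, digits, and special symbols (128 code points, or 256 in the 8-bit extended tables).
Unicode (in its 2-byte form, UCS-2/UTF-16): 2 bytes for both Chinese and English characters.
UTF-8: variable length; a Chinese character takes 3 bytes, an English character takes 1 byte.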
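The byte counts listed above can be checked directly in Python 3. The following is a minimal sketch; the helper name byte_lengths is made up for illustration, and "utf-16-le" is used here as a stand-in for the 2-byte Unicode form (it omits the byte order mark).

```python
# Compare how many bytes one English and one Chinese character occupy
# in several encodings. byte_lengths is a hypothetical helper name.
ENCODINGS = ["ascii", "utf-16-le", "utf-8", "gbk"]

def byte_lengths(ch):
    """Return {encoding: byte count}; None where the encoding cannot represent ch."""
    result = {}
    for enc in ENCODINGS:
        try:
            result[enc] = len(ch.encode(enc))
        except UnicodeEncodeError:
            # e.g. ASCII has no code for Chinese characters
            result[enc] = None
    return result

english = byte_lengths("a")
chinese = byte_lengths("\u4e2d")  # the character "中"

print("english:", english)
print("chinese:", chinese)
```

ASCII fails for the Chinese character, which is why the sketch records None there instead of a size.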
Bytes type
In Python 3, text is always Unicode and is represented by the str type, while binary data is represented by the bytes type.
Binary data is used for video and audio files, and for data sent over network sockets.
String to binary: str.encode(encoding="utf-8")
Binary to string: b'\xe2\x82\xac'.decode(encoding="utf-8") (the bytes must form a complete UTF-8 sequence; these three decode to the euro sign)
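As a small illustration of the two conversions, the sketch below round-trips one character through bytes in Python 3. The euro sign is chosen only as an example; the variable names are illustrative.

```python
# str -> bytes -> str round trip with UTF-8.
text = "\u20ac"  # the euro sign, used here only as an example character

data = text.encode(encoding="utf-8")      # str to bytes
restored = data.decode(encoding="utf-8")  # bytes back to str

print(type(data).__name__, data)
print(type(restored).__name__, restored == text)

# An incomplete multi-byte sequence cannot be decoded.
try:
    b"\xe2\x82".decode("utf-8")
    truncated_decodes = True
except UnicodeDecodeError:
    truncated_decodes = False

print("truncated sequence decodes:", truncated_decodes)
```

The last part shows that decoding only works on complete byte sequences: dropping the final byte of a 3-byte character raises UnicodeDecodeError.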
The file handle is the file object returned by open(); the variable refers to that object in memory, and all reads and writes go through it.
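To make the link between file handles and encodings concrete, here is a minimal sketch that writes text through one handle and reads it back through a text-mode handle and a binary-mode handle. The file name demo.txt and the temporary directory are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Text mode: the handle encodes str to bytes using the given encoding.
with open(path, "w", encoding="utf-8") as f:
    f.write("\u4e2d\u6587 abc")  # "中文 abc"

# Text mode again: the handle decodes the bytes back to str.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Binary mode: no decoding happens, the handle returns raw bytes.
with open(path, "rb") as f:
    raw = f.read()

print(text)
print(len(raw), "bytes on disk")
```

Passing encoding explicitly avoids depending on the platform default, which differs between systems (for example GBK on Chinese-language Windows).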
Character encoding and transcoding
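The ASCII table has no room for Chinese characters. On Chinese-language Windows, the default system encoding is GBK.
Unicode can represent every character in the world, but in its 2-byte form every character occupies two bytes, including English letters that need only one byte in ASCII.
As a result, English text stored as 2-byte Unicode takes twice the space: a file that is 2 MB in ASCII occupies 4 MB.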
Converting a UTF-8 string to GBK
To convert between any two encodings, first decode the data to Unicode, then encode the Unicode text in the target encoding.
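The decode-then-encode rule can be sketched as follows in Python 3. The function name transcode is made up for this example and is not part of any library.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes from one encoding to another via Unicode (str)."""
    text = data.decode(source)   # source encoding -> Unicode
    return text.encode(target)   # Unicode -> target encoding

original = "\u4e2d\u6587"  # "中文"
utf8_bytes = original.encode("utf-8")
gbk_bytes = transcode(utf8_bytes, "utf-8", "gbk")

print(len(utf8_bytes), "bytes in UTF-8")
print(len(gbk_bytes), "bytes in GBK")
print(gbk_bytes.decode("gbk") == original)
```

The same text shrinks from 6 bytes to 4 because GBK stores each of these Chinese characters in 2 bytes, while UTF-8 uses 3.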
Unicode (also called unified code, universal code, or single code) is a character encoding standard used on computers. It was created to overcome the limitations of traditional character encoding schemes, and it assigns a uniform and unique binary code to every character in every language.
Garbled characters generally appear in one of two situations:
1. No character encoding is specified.
2. The character encodings conflict: the character set the author specified when writing the program does not match the character set we use to read it.
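The second situation can be reproduced in a few lines. This is a minimal sketch in Python 3: text is written as UTF-8 but read back as GBK. The errors="replace" argument is used so that byte pairs GBK cannot map are substituted instead of raising an exception.

```python
original = "\u4e2d\u6587"  # "中文"
utf8_bytes = original.encode("utf-8")

# Wrong: the bytes are UTF-8, but we interpret them as GBK.
garbled = utf8_bytes.decode("gbk", errors="replace")

# Right: decode with the encoding the bytes were written in.
correct = utf8_bytes.decode("utf-8")

print("garbled:", garbled)
print("correct:", correct)
print("garbled matches original:", garbled == original)
```

The bytes on disk are the same in both cases; only the encoding chosen by the reader differs.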
In Python 2.x, the interpreter assumes ASCII by default when it reads a .py file.
In Python 3, source files are read as UTF-8 by default, and strings are Unicode.
Because Python 2.x defaults to ASCII, you have to declare the file's encoding, for example UTF-8, at the top of the file. Even then, UTF-8 cannot be converted to GBK directly; Unicode has to act as the intermediate step: decode the UTF-8 data to Unicode, then encode the Unicode to GBK.
A property of strings: they are immutable. Any modification creates a new string rather than changing the original.
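A short sketch of what immutability means in practice in Python 3; the variable names are illustrative only.

```python
s = "hello"

# Methods such as upper() and encode() return new objects; s is untouched.
upper = s.upper()
encoded = s.encode("utf-8")

# In-place modification is not allowed on str.
try:
    s[0] = "H"
    item_assignment_failed = False
except TypeError:
    item_assignment_failed = True

print(s, upper, encoded)
print("item assignment failed:", item_assignment_failed)
```

This is also why encode() and decode() always hand back a new bytes or str object instead of converting the existing one.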