1. First, declare the source file's encoding at the very top of the file: UTF-8.
Example: # -*- coding: utf-8 -*-
2. In Python 2, whenever you encounter a string, convert it to Unicode immediately. Do not use str(); use unicode() directly.
unicode_str = unicode('中文', encoding='utf-8')
print unicode_str.encode('utf-8')
3. For file operations, it is best to open the file with codecs.open instead of open.
import codecs
codecs.open('filename', encoding='utf8')
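As a minimal sketch of tip 3, the following writes a UTF-8 file with codecs.open and reads it back. The file name demo_utf8.txt and the sample text are placeholders chosen for illustration, not taken from the original. It is written for Python 3, where the built-in open() also accepts an encoding argument.

```python
import codecs
import os
import tempfile

# Hypothetical sample text and file name, used only for illustration.
sample = u'中文 text'
path = os.path.join(tempfile.mkdtemp(), 'demo_utf8.txt')

# codecs.open encodes on write and decodes on read with the given encoding.
with codecs.open(path, 'w', encoding='utf-8') as f:
    f.write(sample)

with codecs.open(path, 'r', encoding='utf-8') as f:
    text = f.read()

# Reading in binary mode shows that the file on disk holds UTF-8 bytes.
with open(path, 'rb') as f:
    raw = f.read()
```

Here text is the decoded string, while raw is the encoded byte sequence that was actually stored.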
4. In Python 2, unicode is converted to str by encoding with encode, and str is converted to unicode by decoding with decode.
For example: a UTF-8 encoded string is first decoded to unicode. At that point it should not be output directly; encode it first, then output it.
When you call encode on a str, Python first decodes it to unicode using the default encoding, and then encodes that unicode to the encoding you specify.
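The implicit decode step is why calling encode on a non-ASCII str in Python 2 can fail with a UnicodeDecodeError. The sketch below spells out the two steps explicitly, using Python 3 syntax and assuming the default encoding is ASCII, as it is in a stock Python 2. The variable names and the GBK target encoding are illustrative only.

```python
# UTF-8 bytes for a non-ASCII string (what a Python 2 str would hold).
utf8_bytes = u'中文'.encode('utf-8')

# Step 1 as Python 2 would do it implicitly: decode with the ASCII default.
try:
    utf8_bytes.decode('ascii')
    implicit_decode_ok = True
except UnicodeDecodeError:
    implicit_decode_ok = False  # non-ASCII bytes cannot be decoded as ASCII

# The reliable route: decode explicitly with the real encoding, then encode.
text = utf8_bytes.decode('utf-8')
gbk_bytes = text.encode('gbk')
```

Decoding explicitly with the encoding the data really uses avoids depending on the default encoding at all.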
My own understanding: at first, computers supported only ASCII, because it was invented by Americans, so only 128 characters (codes 0 to 127) were available. Later, Unicode, also known as the universal code, was introduced to unify all character sets. But storing English text in a fixed-width Unicode encoding takes more space than ASCII, so UTF-8 appeared, which converts Unicode into a variable-length encoding. In computer memory, Unicode is used uniformly; it is converted to UTF-8 when it needs to be saved to disk or transferred.
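To make the storage argument concrete, here is a small sketch that counts the bytes each character needs in UTF-8, UTF-16 and UTF-32. The three sample characters are my own choice for illustration.

```python
# Compare how many bytes each character needs in different encodings.
chars = ['A', 'é', '中']
sizes = {
    ch: {
        'utf-8': len(ch.encode('utf-8')),
        'utf-16': len(ch.encode('utf-16-be')),  # -be avoids the byte-order mark
        'utf-32': len(ch.encode('utf-32-be')),
    }
    for ch in chars
}

# Print by code point so the output is plain ASCII on any terminal.
for ch, row in sizes.items():
    print('U+%04X' % ord(ch), row)
```

An ASCII letter takes one byte in UTF-8 but two or four in the fixed-width forms, which is the saving the paragraph above describes.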
In Python 3, the string type is str, which is held in memory as Unicode; one character may correspond to several bytes. To send a str over the network or save it to disk, you need to turn it into bytes.
In Python 3, data of type bytes is written with single or double quotation marks prefixed with b. A str, which is represented in Unicode, can be encoded into bytes in a specified encoding with the encode() method.
Conversely, when a byte stream is read from the network or from disk, the data read is bytes, and the decode() method turns the bytes back into str.
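The round trip between str and bytes might look like the following sketch. An io.BytesIO object stands in for a socket or a file opened in binary mode; that stand-in and the sample text are assumptions made for illustration.

```python
import io

text = 'Python 编码'            # str: a sequence of Unicode characters
data = text.encode('utf-8')     # bytes: what gets sent or stored

# A BytesIO object stands in for a socket or a file opened in binary mode.
stream = io.BytesIO(data)
received = stream.read()        # reading a byte stream yields bytes
decoded = received.decode('utf-8')

# Bytes literals use the b prefix and hold raw byte values.
ascii_bytes = b'ABC'
```

Note that len(text) counts characters while len(data) counts bytes, so the two differ as soon as the text contains non-ASCII characters.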
"Coding problems in Python"