Python encodings and encoding conversion
1. Default System Encoding
The default encoding for Python 2.7 is ASCII.
The default encoding for Python 3.1 is UTF-8, and its strings are Unicode.
You can use the built-in module sys to obtain the default system encoding.
import sys
print(sys.getdefaultencoding())
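There are two ways to make Python 2 use UTF-8 instead of ASCII.
1. Add # -*- coding: utf-8 -*- as the first line of the source file (or the second line, after a shebang line). This declares the encoding of the source file itself.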
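2. Use the sys module (Python 2 only; sys.setdefaultencoding does not exist in Python 3)
import sys
reload(sys)  # restores setdefaultencoding, which is removed from sys at startup
sys.setdefaultencoding("utf-8")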
2. decode and encode
Several common encodings
- GB2312 encoding: intended for information interchange between systems for Chinese character processing, Chinese character communication, and similar uses.
- GBK encoding: one of the Chinese character encoding standards. It is an internal-code extension specification based on the GB2312-80 standard and encodes Chinese characters in two bytes.
- ASCII encoding: a standard mapping between English characters and binary values.
- Unicode encoding: a character set that covers all characters in the world. It does not by itself prescribe how characters are stored.
- UTF-8 encoding: stands for 8-bit Unicode Transformation Format and is one implementation of Unicode. It is a variable-length encoding that represents a single character in 1 to 4 bytes, with the byte length depending on the character.
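The byte lengths described in the list above can be checked directly. The following is a minimal sketch, assuming Python 3 (where str.encode returns bytes); the sample characters and the variable names are illustrative choices, not part of the original text.

```python
# Assumes Python 3: str is Unicode text and str.encode() returns bytes.
# Sample characters are written as escapes so the file itself stays ASCII.
samples = [
    "A",           # Latin letter
    "\u00e9",      # e with acute accent
    "\u4e2d",      # a common Chinese character
    "\U0001F600",  # an emoji outside the Basic Multilingual Plane
]

# Number of bytes each character occupies in UTF-8.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
for ch, n in utf8_lengths.items():
    # ascii() keeps the output printable even on an ASCII-only console.
    print(ascii(ch), "->", n, "byte(s) in UTF-8")

# GBK keeps ASCII characters at one byte and uses two bytes for Chinese characters.
gbk_ascii = len("A".encode("gbk"))
gbk_hanzi = len("\u4e2d".encode("gbk"))
print("GBK:", gbk_ascii, "byte for 'A',", gbk_hanzi, "bytes for the Chinese character")
```

The same character therefore has a different byte length depending on the encoding, which is why the encoding has to be known before bytes can be interpreted as text.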
Encoding conversion
Python represents text internally as Unicode. In Python 2, a plain string literal in source code is a byte string in the same encoding as the source file. Therefore, encoding conversion usually uses Unicode as the intermediate form: first decode the string from its original encoding to Unicode, and then encode the Unicode string into the target encoding.
decode converts a string in some other encoding to Unicode. For example, name.decode("GB2312") converts the GB2312-encoded string name to Unicode.
encode converts a Unicode string to a string in another encoding. For example, name.encode("GB2312") converts the Unicode string name to GB2312.
Therefore, you must know the current encoding of name before converting it.
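The decode-then-encode round trip can be sketched as follows. This is a minimal illustration, assuming Python 3, where bytes.decode returns a Unicode str and str.encode returns bytes (in Python 2 the same methods are called on str and unicode objects instead). The helper name convert_encoding is hypothetical and not part of any library.

```python
# Assumes Python 3: bytes.decode() -> str (Unicode), str.encode() -> bytes.
def convert_encoding(data, source, target):
    """Re-encode `data` from `source` to `target`, using Unicode as the pivot.

    Hypothetical helper for illustration only.
    """
    text = data.decode(source)   # encoded bytes -> Unicode string
    return text.encode(target)   # Unicode string -> bytes in the target encoding

# Two Chinese characters, written as escapes, encoded as GB2312 bytes.
name = "\u4e2d\u6587".encode("gb2312")

# GB2312 -> Unicode -> UTF-8
utf8_name = convert_encoding(name, "gb2312", "utf-8")
print(name, utf8_name)

# Decoding with the wrong codec fails, which is why the source encoding must be known.
try:
    utf8_name.decode("ascii")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True
```

Passing the wrong source encoding either raises UnicodeDecodeError, as in the last step, or silently produces garbled text when the bytes happen to be valid in that encoding.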