Common string encodings include UTF-8, GB2312, CP936, GBK, and so on.
In Python, we use decode() and encode() to decode and encode strings.
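As a minimal sketch of what encode() and decode() do, the snippet below encodes one piece of text with each of the encodings named above and decodes it again. It assumes Python 3; the sample text and the variable names (text, encoded, round_trips) are chosen here only for illustration.

```python
text = "中文"

# Encode the same text with each of the codecs named above.
encoded = {name: text.encode(name) for name in ("utf-8", "gb2312", "gbk", "cp936")}

for name, data in encoded.items():
    # The byte length depends on the encoding: UTF-8 uses 3 bytes per
    # character for this text, the GB family uses 2.
    print(name, len(data), data)

# Decoding with the same codec that produced the bytes restores the text.
round_trips = all(data.decode(name) == text for name, data in encoded.items())
print(round_trips)
```

In CPython, cp936 is registered as an alias of the gbk codec, so those two entries hold identical bytes.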
In Python 2, the unicode type serves as the intermediate type for all encoding conversions. That is:
       decode             encode
str ----------> unicode ----------> str
u = u'中文'                # explicitly create a unicode object u
str = u.encode('gb2312')   # encode the unicode object with gb2312
str1 = u.encode('gbk')     # encode the unicode object with gbk
str2 = u.encode('utf-8')   # encode the unicode object with utf-8
u1 = str.decode('gb2312')  # decode str with gb2312 to get back a unicode object
u2 = str.decode('utf-8')   # decoding str with utf-8 cannot restore the original text (raises UnicodeDecodeError here)
In the code above, str, str1, and str2 are all of the string type (str), even though each holds bytes in a different encoding, which makes string manipulation more complicated.
The good news is Python 3. In Python 3 the separate unicode type is removed; the string type (str) now holds Unicode characters and becomes the underlying type, as shown below, and encoded data is represented by the bytes type. The use of the two functions does not change:
        decode                   encode
bytes ----------> str (Unicode) ----------> bytes
u = '中文'                 # create a string (str) object u
str = u.encode('gb2312')   # encode u with gb2312 to get a bytes object str
u1 = str.decode('gb2312')  # decode str with gb2312 to get a string (str) object u1
u2 = str.decode('utf-8')   # decoding str with utf-8 cannot restore the original string (raises UnicodeDecodeError here)
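The effect of decoding with the wrong encoding can be sketched as follows, assuming Python 3. The variable names (gb_bytes, restored, failed, lossy) are illustrative, and the errors="replace" option is shown only as one possible way to avoid the exception, not as a recommendation.

```python
text = "中文"
gb_bytes = text.encode("gb2312")  # a bytes object, not a str

# Decoding with the codec that produced the bytes restores the text.
restored = gb_bytes.decode("gb2312")

# Decoding with a different codec does not: these bytes are not valid UTF-8.
try:
    gb_bytes.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True

# errors="replace" suppresses the exception but substitutes U+FFFD
# for the undecodable bytes, so the original text is still lost.
lossy = gb_bytes.decode("utf-8", errors="replace")

print(type(gb_bytes).__name__, restored == text, failed, lossy != text)
```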
One problem that cannot be avoided is reading files.
When we read a file, the encoding used when the file was saved determines the encoding of the content we read from it. For example, create a new text file test.txt in Notepad and edit its content; when saving, note that the encoding is selectable, so we could choose gb2312, for instance. Then read the file contents in Python as follows:
f = open('test.txt', 'rb')
s = f.read()               # read the raw bytes; binary mode skips the system-dependent default decoding, which can fail on an unrecognized encoding
f.close()
# Suppose the file was saved with gb2312 encoding
u = s.decode('gb2312')
# Below we can convert the content into various encodings
str = u.encode('utf-8')    # convert to a utf-8 encoded string str
str1 = u.encode('gbk')     # convert to a gbk encoded string str1
str2 = u.encode('utf-16')  # convert to a utf-16 encoded string str2
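A self-contained sketch of the same steps in Python 3 might look like this. Since the Notepad file is not available here, the snippet first writes a GB2312-encoded stand-in into a temporary directory; that setup and all variable names are assumptions made for the example.

```python
import os
import tempfile

# Create a GB2312-encoded file to stand in for the Notepad-saved test.txt.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "wb") as f:
    f.write("中文".encode("gb2312"))

# Option 1: read the raw bytes and decode them explicitly.
with open(path, "rb") as f:
    raw = f.read()
text_from_bytes = raw.decode("gb2312")

# Option 2: pass the encoding to open() and let it decode while reading.
with open(path, "r", encoding="gb2312") as f:
    text_from_open = f.read()

# Once decoded, the text can be re-encoded into any other encoding.
as_utf8 = text_from_open.encode("utf-8")
as_utf16 = text_from_open.encode("utf-16")

print(text_from_bytes == text_from_open, len(raw), len(as_utf8))
```

Both options give the same str; the second is shorter when the encoding is known before the file is opened.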
Python also provides the codecs module for reading files; the open() function in this module lets you specify the encoding:
import codecs
f = codecs.open('text.text', 'r+', encoding='utf-8')  # the file's encoding must be known in advance; here the file is utf-8
content = f.read()  # if the encoding passed to open does not match the file's own encoding, an error occurs here
f.write('the message you want to write')
f.close()
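The behavior described in those comments can be tried out with a small round trip, sketched here for Python 3. The temporary file demo.txt is a hypothetical name used only for this example, and ascii is picked as the mismatched encoding because it is certain to reject non-ASCII bytes.

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file name

# Write text; codecs.open encodes it to UTF-8 on the way out.
with codecs.open(path, "w", encoding="utf-8") as f:
    f.write("中文 message")

# Read it back with the matching encoding.
with codecs.open(path, "r", encoding="utf-8") as f:
    content = f.read()

# Reading with an encoding that does not match the file raises an error.
try:
    with codecs.open(path, "r", encoding="ascii") as f:
        f.read()
    mismatch_error = None
except UnicodeDecodeError as exc:
    mismatch_error = exc

print(content, type(mismatch_error).__name__)
```

In Python 3 the built-in open() accepts the same encoding argument, so the codecs module is mainly needed for code that also has to run on Python 2.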