Description of Python encode and decode functions

Source: Internet
Author: User

Common string encodings include UTF-8, GB2312, CP936, GBK, and so on.

In Python, we use decode() and encode() to decode and encode strings.
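As a small illustration (a sketch for Python 3; the sample text and the variable names are chosen here and are not from the original), the same string yields different byte sequences under different encodings, and decode() reverses encode() when the same encoding is used:

```python
# Sample text chosen for illustration: the two characters '中文',
# written with escapes so the snippet does not depend on the file's own encoding.
text = '\u4e2d\u6587'

# encode() turns text into bytes; the result depends on the encoding name.
encoded = {name: text.encode(name) for name in ('utf-8', 'gb2312', 'gbk', 'cp936')}

for name, data in encoded.items():
    # decode() with the same encoding recovers the original text.
    print(name, len(data), data, data.decode(name) == text)
```

For this sample, UTF-8 uses three bytes per character while GB2312, GBK, and CP936 use two, which is why the encoding has to be known before the bytes can be read back correctly.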

In Python 2, the unicode type is used as the underlying (intermediate) type for encoding conversions. That is:

        decode              encode
str ----------> unicode ----------> str

u = u'中文'  # explicitly create a unicode object u
str = u.encode('gb2312')  # encode u with GB2312, giving the string str
str1 = u.encode('gbk')  # encode u with GBK, giving the string str1
str2 = u.encode('utf-8')  # encode u with UTF-8, giving the string str2
u1 = str.decode('gb2312')  # decode str with GB2312, giving the unicode object u1
u2 = str.decode('utf-8')  # decoding GB2312 data as UTF-8 cannot restore the original content

In the code above, str, str1, and str2 are all of the string type (str), even though each holds data in a different encoding, which makes string manipulation more complicated.

The good news is that Python 3 improves on this. In Python 3 the separate unicode type is removed; the string type (str) now holds Unicode characters and becomes the underlying type, as shown below, while encoded data becomes the byte type (bytes). The use of the two functions does not change:

          decode                  encode
bytes ----------> str (Unicode) ----------> bytes

u = '中文'  # create a string (str) object u
str = u.encode('gb2312')  # encode u with GB2312, giving the bytes object str
u1 = str.decode('gb2312')  # decode str with GB2312, giving the string object u1
u2 = str.decode('utf-8')  # decoding str as UTF-8 cannot restore the original string contents
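The mismatch case can be made concrete. The following is a minimal Python 3 sketch (the names data, failed, and lossy are chosen here for illustration): decoding GB2312 bytes as UTF-8 raises UnicodeDecodeError for this sample, and the errors='replace' option only hides the failure without recovering the text.

```python
u = '\u4e2d\u6587'          # the two-character string '中文'
data = u.encode('gb2312')   # a bytes object

# Decoding with the same encoding used by encode() restores the text.
round_trip = data.decode('gb2312')

# Decoding GB2312 bytes as UTF-8 fails: these bytes are not valid UTF-8.
try:
    data.decode('utf-8')
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print('decode failed:', exc)

# errors='replace' avoids the exception, but the original text is lost:
# invalid bytes are replaced with U+FFFD.
lossy = data.decode('utf-8', errors='replace')
print(repr(lossy))
```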

One problem that cannot be avoided is reading files:

The encoding used when a file is saved determines the encoding of the content we read back from it. For example, create a new text file test.txt in Notepad and edit its content; when saving, note that the encoding is selectable, so we can choose GB2312, for instance. Then use Python to read the file contents in the following way:

f = open('test.txt', 'rb')  # binary mode, so read() returns the undecoded content
s = f.read()  # in Python 3 text mode ('r') the system-dependent default encoding is used instead, and the read fails if the file's encoding is not recognized
f.close()
# Suppose the file was saved with GB2312 encoding
u = s.decode('gb2312')
# Below we can convert the content into various encodings
str = u.encode('utf-8')  # convert to UTF-8 encoded data str
str1 = u.encode('gbk')  # convert to GBK encoded data str1
str2 = u.encode('utf-16')  # convert to UTF-16 encoded data str2
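For Python 3, the same workflow can be sketched end to end as follows. This is an illustrative example, not the original author's code: it creates its own GB2312 file in a temporary directory as a stand-in for test.txt, and the variable names are assumptions. It shows both reading raw bytes and decoding them by hand, and passing encoding= to the built-in open().

```python
import os
import tempfile

text = '\u4e2d\u6587'  # '中文'

# Create a GB2312-encoded file in a temporary directory (stand-in for test.txt).
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'wb') as f:
    f.write(text.encode('gb2312'))

# Option 1: read the raw bytes, then decode them explicitly.
with open(path, 'rb') as f:
    raw = f.read()
u = raw.decode('gb2312')

# Option 2: let open() decode while reading by naming the encoding.
with open(path, 'r', encoding='gb2312') as f:
    u_text = f.read()

# Once decoded, the text can be re-encoded into any other encoding.
as_utf8 = u.encode('utf-8')
as_utf16 = u.encode('utf-16')
```

Naming the encoding explicitly avoids depending on the system default, which differs between platforms.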

Python also provides the codecs module for reading files; the open() function in this module lets you specify the encoding:

import codecs
f = codecs.open('text.text', 'r+', encoding='utf-8')  # the file's encoding must be known in advance; here the file is UTF-8
content = f.read()  # if the encoding passed to open() differs from the file's actual encoding, an error occurs
f.write('the message you want to write')
f.close()
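The mismatch behavior described in the comment can be sketched as follows in Python 3. The temporary file path and the variable names are assumptions made for this example; ASCII is used as the deliberately wrong encoding because it is guaranteed to reject these non-ASCII bytes.

```python
import codecs
import os
import tempfile

# Temporary stand-in for the file used above.
path = os.path.join(tempfile.mkdtemp(), 'text.text')

# Writing through codecs.open() encodes the text to UTF-8 bytes on disk.
with codecs.open(path, 'w', encoding='utf-8') as f:
    f.write('\u4e2d\u6587')  # '中文'

# Reading with the matching encoding returns the original text.
with codecs.open(path, 'r', encoding='utf-8') as f:
    content = f.read()

# Reading with a mismatched encoding raises an error for this content.
try:
    with codecs.open(path, 'r', encoding='ascii') as f:
        f.read()
    mismatch_error = None
except UnicodeDecodeError as exc:
    mismatch_error = exc
    print('read failed:', exc)
```

In Python 3 the built-in open() accepts the same encoding argument, so it can be used in the same way.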
