Encoding problems in the Python language (cont.)

Source: Internet
Author: User


Two settings that matter in Python development were mentioned above.

One is the interpreter's default codec, defaultencoding:

    >>> import sys
    >>> sys.getdefaultencoding()
    'ascii'

The other is the coding declaration at the top of the Python source file:

    # -*- coding: utf-8 -*-
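As a hedged illustration of how such a declaration is located, the standard library's tokenize.detect_encoding applies this lookup to the first two lines of a source file. The sketch below is Python 3 and uses made-up in-memory sources; declared_coding is a hypothetical helper name, not something from the original article.

```python
import io
import tokenize


def declared_coding(source_bytes):
    """Return the coding that the interpreter would use to read this source."""
    # detect_encoding expects a readline callable that yields bytes.
    readline = io.BytesIO(source_bytes).readline
    encoding, _first_lines = tokenize.detect_encoding(readline)
    return encoding


# Two made-up source files that differ only in their coding declaration.
with_utf8 = b"# -*- coding: utf-8 -*-\nprint('hi')\n"
with_gbk = b"# -*- coding: gbk -*-\nprint('hi')\n"

print(declared_coding(with_utf8))
print(declared_coding(with_gbk))
```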

These two settings directly affect the results of the encode and decode methods of Python's str and unicode objects. The following code is based on the current settings, i.e. defaultencoding is ASCII and coding is UTF-8.

str and unicode

str is an ordinary byte string, which is typically decoded into unicode. unicode is a Unicode string that holds code points rather than bytes, and is typically encoded into a specified format.

Use str() to create a byte string.

    s = str('你们')  # stored as UTF-8 bytes
    print repr(s)    # prints '\xe4\xbd\xa0\xe4\xbb\xac'

Use unicode() to create a Unicode string, or use the u prefix. When creating a unicode object from a str, specify the encoding of the bytes; otherwise they are decoded using defaultencoding.

    u1 = u'你们'
    print u1                      # output correct
    u2 = unicode('你们')          # stored as UTF-8, decoded as ASCII: error
    # UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
    u3 = unicode('你们', 'utf-8')  # stored as UTF-8, then decoded as UTF-8
    print u3                      # output correct
    print repr(u3)                # prints u'\u4f60\u4eec'
    u4 = unicode('你们', 'gbk')    # stored as UTF-8, then decoded as GBK
    print u4                      # prints 浣犱滑
    print repr(u4)                # prints u'\u6d63\u72b1\u6ed1', the code points of the three characters 浣犱滑

In summary, any string literal is stored according to the declared coding, and if no encoding is specified when decoding, the string is decoded according to defaultencoding.
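The behaviour summarised here can be reproduced in Python 3, where bytes plays the role of Python 2's str and str plays the role of unicode. This is a sketch under that assumption, not the Python 2 code itself; the ASCII codec is named explicitly because Python 3's own default is UTF-8.

```python
# Python 3 sketch: bytes ~ Python 2 str, str ~ Python 2 unicode.
raw = b'\xe4\xbd\xa0\xe4\xbb\xac'   # UTF-8 bytes of the two-character sample string

as_utf8 = str(raw, 'utf-8')          # same codec as the stored bytes
as_gbk = str(raw, 'gbk')             # wrong codec: decodes, but to other characters

print(ascii(as_utf8))                # '\u4f60\u4eec'
print(ascii(as_gbk))                 # '\u6d63\u72b1\u6ed1'

try:
    str(raw, 'ascii')                # mimics Python 2 when no encoding is given
except UnicodeDecodeError as exc:
    print(exc)
```

Note that the GBK case raises no error at all: the bytes happen to form valid GBK sequences, so the mistake only shows up as wrong characters.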

decode() and encode()

str.decode() decodes a byte string into unicode using the specified encoding; if no encoding is specified, the default ASCII codec is used.

    s = str('你们')
    print s.decode('utf-8')  # output correct
    print s.decode()         # error
    # UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

unicode.encode() encodes a Unicode string into the specified format; if no encoding is specified, defaultencoding is used.

    u = u'你们'
    print u.encode('utf-8')  # output correct
    print u.encode('gbk')    # garbled: the bytes are GBK, but they are displayed as UTF-8
    print u.encode()         # error: encoded with the default ASCII codec
    # UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

Summing up, str literals and unicode literals are both stored according to the coding declaration. str.decode() decodes the bytes from the specified encoding, and unicode.encode() encodes the text into the specified encoding; if no encoding is specified, defaultencoding is used. If the encoding and decoding methods are inconsistent, errors or garbled characters will occur.
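A round trip makes this concrete. The sketch below is Python 3 (its str corresponds to Python 2's unicode), and the ASCII codec is passed explicitly to mimic Python 2's defaultencoding, which is an assumption of this sketch.

```python
text = '\u4f60\u4eec'            # the two-character sample string

utf8_bytes = text.encode('utf-8')
gbk_bytes = text.encode('gbk')

# Same codec in both directions: lossless round trip.
assert utf8_bytes.decode('utf-8') == text
assert gbk_bytes.decode('gbk') == text

# Mismatched codecs: these GBK bytes are not valid UTF-8.
try:
    gbk_bytes.decode('utf-8')
    mismatch = 'decoded'
except UnicodeDecodeError:
    mismatch = 'UnicodeDecodeError'

# The ASCII codec cannot represent these characters at all.
try:
    text.encode('ascii')
    ascii_result = 'encoded'
except UnicodeEncodeError:
    ascii_result = 'UnicodeEncodeError'

print(mismatch, ascii_result)
```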

There are also str.encode() and unicode.decode(), and these two methods are of little use. A str is already an encoded string, so it does not need to be encoded again; a unicode has already been decoded, so it does not need to be decoded again. Python provides both methods anyway, apparently to keep the two types symmetric.

The official documentation describes it this way: str.encode(e) is the same as unicode(str).encode(e). This is useful since code that expects Unicode strings should also work when it is passed ASCII-encoded 8-bit strings (from Guido van Rossum).

This passage roughly means that encode was originally meant to be called on unicode objects, but if it is accidentally called on a str object whose content happens to be pure ASCII (where the ASCII bytes and the Unicode code points coincide), the call should still succeed. This is one of the uses of str.encode().

Similarly, decoding an ASCII-only unicode works the same way, since almost every encoding leaves ASCII unchanged, so the operation is effectively a no-op. These methods are of little use and often cause mistakes in practice, such as the following two.

str.encode() first decodes the str using the default encoding, and then encodes the result using the specified encoding:

    s.encode("utf-8")  # equivalent to s.decode(defaultencoding).encode("utf-8")

unicode.decode() first encodes the object using the default encoding, and then decodes the result using the specified encoding:

    u.decode("utf-8")  # equivalent to u.encode(defaultencoding).decode("utf-8")
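Python 3 removed both methods, so the hidden step can only be emulated there. In the sketch below, py2_str_encode and py2_unicode_decode are hypothetical helpers that spell out the two equivalences, with defaultencoding assumed to be ASCII.

```python
DEFAULT_ENCODING = 'ascii'  # assumed Python 2 defaultencoding


def py2_str_encode(byte_string, encoding):
    """Emulate Python 2 str.encode: implicit decode, then encode."""
    return byte_string.decode(DEFAULT_ENCODING).encode(encoding)


def py2_unicode_decode(text, encoding):
    """Emulate Python 2 unicode.decode: implicit encode, then decode."""
    return text.encode(DEFAULT_ENCODING).decode(encoding)


# Pure ASCII data passes through unchanged.
print(py2_str_encode(b'abc', 'utf-8'))
print(py2_unicode_decode('abc', 'utf-8'))

# Non-ASCII data fails in the hidden step, so an "encode" call
# surfaces a decode error, and a "decode" call an encode error.
try:
    py2_str_encode(b'\xe4\xbd\xa0', 'utf-8')
except UnicodeDecodeError as exc:
    str_encode_error = type(exc).__name__

try:
    py2_unicode_decode('\u4f60', 'utf-8')
except UnicodeEncodeError as exc:
    unicode_decode_error = type(exc).__name__

print(str_encode_error, unicode_decode_error)
```

The mismatch between the method called and the exception raised is what makes these two methods confusing in practice.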

About the Requests library

The Requests library is often used when writing web crawlers and is very convenient to use.

After a request is sent, Requests returns a response object, which stores the raw bytes of the HTTP response body in its content property. content is binary data that has not been decoded; if the body is text, it is simply handled as a str in whatever encoding the server used.

The response object also has a text property, which is a unicode string. Accessing it directly without any handling often yields garbled characters, because the response object decodes the content bytes into unicode using another attribute, encoding, and the value of that attribute is guessed by Requests.

Official documents:

text
Content of the response, in unicode.

If Response.encoding is None, encoding will be guessed using chardet.

The encoding of the response content is determined based solely on HTTP headers, following RFC 2616 to the letter. If you can take advantage of non-HTTP knowledge to make a better guess at the encoding, you should set r.encoding appropriately before accessing this property.

Requests derives the encoding from the HTTP headers. This is correct most of the time, but there are exceptions. So either use content and decode it yourself, or set encoding correctly.

    # -*- coding: utf-8 -*-
    import requests

    url = "http://xxx.xxx.xxx"  # a GBK-encoded web page
    response = requests.get(url)
    response.encoding = 'GBK'
    print response.content                # raw GBK bytes: garbled on a UTF-8 terminal
    print response.content.decode('GBK')  # displays correctly
    print response.text                   # decoded with response.encoding; garbled if that encoding is wrong
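The effect can be reproduced without a network call. The sketch below is plain Python 3 and does not use Requests itself: content stands in for response.content, and header_encoding stands in for an encoding derived from the HTTP headers (ISO-8859-1 is assumed here as the fallback for a text response that declares no charset). The page text is made up.

```python
page_text = '\u4f60\u4eec'                 # hypothetical page text
content = page_text.encode('gbk')          # raw bytes as the server would send them

header_encoding = 'ISO-8859-1'             # assumed guess derived from the headers
garbled = content.decode(header_encoding)  # analogous to text with a wrong encoding
correct = content.decode('gbk')            # analogous to setting the encoding to GBK first

print(ascii(garbled))
print(ascii(correct))
```

ISO-8859-1 maps every byte to a character, so the wrong guess never raises an error; it just yields four unrelated characters instead of the original two.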

Finish
