Garbled text like this is almost always an encoding problem, so the fix is to decode the page with the right encoding. First, a bit of background. In his course on Python crawlers and information extraction, teacher Song Tian explains: response.encoding is the encoding guessed from the HTTP header. If the header carries no charset, it defaults to ISO-8859-1, which is why pages from servers that do not send a proper header come out garbled. response.apparent_encoding is the encoding inferred from the response content itself. The requests.utils module also provides a function, get_encodings_from_content, which reads the page encoding from the response body. It returns a list of the encodings declared in the page, so when the server's header contains no charset, get_encodings_from_content can still tell you the correct encoding of the page. Here is the debugging process:
import requests
from requests.exceptions import RequestException


def get_one_page(url):
    try:
        response = requests.get(url)
        if response.status_code == 200:
            # print(response.text)
            # Encoding guessed from the HTTP header
            print(response.encoding)
            # Encoding inferred from the response content
            print(response.apparent_encoding)
            r = response.text
            # Encoding declared inside the page itself
            print(requests.utils.get_encodings_from_content(r)[0])
            # Undo the ISO-8859-1 decode, then decode with the declared encoding
            a = r.encode('iso-8859-1').decode(requests.utils.get_encodings_from_content(r)[0])
            print(a)
            print('------------------------------------')
            # Same repair, this time using apparent_encoding
            b = r.encode('iso-8859-1').decode(response.apparent_encoding)
            print(b)
        return None
    except RequestException:
        return None


def main():
    url = 'http://www.mh160.com/'
    get_one_page(url)


if __name__ == '__main__':
    main()
Look at the picture!
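To see why the encode('iso-8859-1') and decode(...) round trip works, here is a minimal sketch that needs no network access and uses only the standard library. The sample page, the regular expression, and the helper names declared_charset and repair_mojibake are made up for illustration. The regex is only a rough stand-in for what get_encodings_from_content looks for, not the library's own code.

```python
import re

# Assumption: a simplified pattern for a charset declared in a <meta> tag.
# This is not the expression requests uses internally.
_META_CHARSET = re.compile(r'<meta[^>]*charset=["\']?([\w-]+)', re.I)


def declared_charset(html_text):
    """Return the charset declared in the markup, or None if there is none."""
    match = _META_CHARSET.search(html_text)
    return match.group(1) if match else None


def repair_mojibake(text, wrong="iso-8859-1", fallback="utf-8"):
    """Undo a wrong decode: turn the text back into the original bytes,
    then decode those bytes with the charset the page declares."""
    raw = text.encode(wrong)  # ISO-8859-1 maps every byte, so nothing is lost
    real = declared_charset(text) or fallback
    return raw.decode(real)


# Hypothetical page: GBK bytes, as a server might send without a charset header.
page = '<html><head><meta charset="gbk"><title>漫画</title></head></html>'
raw_bytes = page.encode("gbk")

# What you get when those bytes are decoded with the ISO-8859-1 default.
garbled = raw_bytes.decode("iso-8859-1")

fixed = repair_mojibake(garbled)
print(declared_charset(garbled))
print(fixed == page)
```

The ASCII part of the page, including the meta tag, survives the wrong decode, which is why the declared charset can still be read from the garbled text. With requests itself, a shorter route is usually to set response.encoding = response.apparent_encoding before reading response.text, so the text is decoded correctly the first time.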
Fixing garbled Chinese pages when crawling with Requests in Python 3