When writing crawlers in Python, you often run into a variety of maddening encoding errors. Here are some simple ways to deal with them; I hope they help.
First, open the site you want to crawl, right-click to view the page source, and check which charset it declares, for example:
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
The charset declared here is gb2312, so I will use gb2312 as the example for encoding and decoding.
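If you would rather read the declared charset in code than by eye, something like the following sketch could pull it out of the raw page bytes. The helper name sniff_charset and the utf-8 fallback are assumptions made for illustration, and a regular expression is only a rough substitute for a real HTML parser.

```python
import re

# Matches charset=... inside a <meta> tag, in either the
# http-equiv form or the shorter <meta charset="..."> form.
_CHARSET_RE = re.compile(rb'''<meta[^>]*?charset\s*=\s*["']?\s*([\w-]+)''', re.IGNORECASE)


def sniff_charset(raw_html: bytes, default: str = "utf-8") -> str:
    """Return the charset declared in a <meta> tag, or `default` if none is found.

    Hypothetical helper: the name and the fallback value are illustrative.
    """
    # The declaration normally sits near the top of the page,
    # so only the first few kilobytes are searched.
    match = _CHARSET_RE.search(raw_html[:4096])
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()


sample = b'<head><meta http-equiv="Content-Type" content="text/html; charset=gb2312"></head>'
print(sniff_charset(sample))  # gb2312
```

The returned name can then be passed to encode() and decode() in place of a hard-coded 'gb2312'.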
Submit Input
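We often need to take some input and send it to the site as a request parameter. Calling requests.get(url + user_input) directly easily leads to encoding errors, typically because the text is not sent in the charset the site expects. Try encoding the input explicitly and posting it as form data instead:
    import requests

    # Encode the input to the page's charset; 'ignore' drops characters gb2312 cannot represent
    data = {
        "key": user_input.encode('gb2312', 'ignore')
    }
    requests.post(url, data=data)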
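To make the effect of the encoding step visible, here is a small sketch that needs no network access. The keyword value is a made-up example. It also shows one possible way to build a GET query string in the same charset with the standard library's urlencode; that is an alternative I am suggesting, not something the original method uses.

```python
from urllib.parse import urlencode

keyword = "中文"  # hypothetical search term

# The bytes that would be sent as the form field.
encoded = keyword.encode("gb2312", "ignore")
print(encoded)  # b'\xd6\xd0\xce\xc4'

# Characters that gb2312 cannot represent are silently dropped by 'ignore'.
lossy = "中文😀".encode("gb2312", "ignore")
print(lossy == encoded)  # True

# For a GET request, percent-encode the parameter with the same charset
# instead of concatenating the raw string onto the URL.
query = urlencode({"key": keyword}, encoding="gb2312", errors="ignore")
print(query)  # key=%D6%D0%CE%C4
```

Note that 'ignore' trades errors for silent data loss, so it is worth checking that the dropped characters do not matter for the request.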
Get Output
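When reading the response, I recommend decoding the raw bytes yourself with the page's charset:
    import requests

    res = requests.get(url)
    # res.content is raw bytes; 'ignore' skips any bytes that are not valid gb2312
    html = res.content.decode('gb2312', 'ignore')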
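The difference the 'ignore' argument makes can be seen without fetching anything. In this sketch, raw is a stand-in for res.content and is deliberately cut off in the middle of a character, as might happen with a damaged or truncated response.

```python
# Stand-in for res.content: "中文" in gb2312 with the last byte missing.
raw = "中文".encode("gb2312")[:-1]

# Strict decoding (the default) raises on the incomplete character.
try:
    raw.decode("gb2312")
    strict_ok = True
except UnicodeDecodeError as exc:
    strict_ok = False
    print("strict decode failed:", exc.reason)

# With 'ignore', the undecodable bytes are skipped and the rest survives.
html = raw.decode("gb2312", "ignore")
print(html)  # 中
```

If silently losing characters is not acceptable, passing 'replace' instead of 'ignore' marks the damaged spots with U+FFFD so they can be found later.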
Write to File
When you save the acquired data to a file, open the file with the same encoding:
    # html is the decoded text obtained in the previous step
    with open(path, 'w+', encoding='gb2312') as f:
        f.write(html)
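One thing to watch for when writing: if the text contains a character that gb2312 cannot represent, f.write() raises UnicodeEncodeError. The sketch below assumes you are happy to drop such characters and passes errors='ignore' to open(); the sample text and the temporary file path are made up for illustration.

```python
import os
import tempfile

html = "中文😀"  # hypothetical decoded text; the emoji does not exist in gb2312

# A throwaway location, used here only so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "page.txt")

# errors='ignore' makes the writer skip characters gb2312 cannot encode
# instead of raising UnicodeEncodeError.
with open(path, "w", encoding="gb2312", errors="ignore") as f:
    f.write(html)

# Read the file back as bytes to see what was actually stored.
with open(path, "rb") as f:
    saved = f.read()
print(saved)  # b'\xd6\xd0\xce\xc4'
```

Reading the file later requires the same encoding, for example open(path, encoding='gb2312').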