Python crawler: the get and post methods, and cookies
First, determine the form-submission method of the target website you want to crawl. You can see it through the browser's developer tools; Chrome is recommended here.
Here, I use 163 Mail (mail.163.com) as an example.
Open the developer tools and click the Network tab, then select the request for the page you are interested in. The Request Method shown under Headers on the right is the submission method, and a status of 200 means the request succeeded. The cookie in the request headers is the session information generated after you log in: on your first visit you must supply a user name and password, but afterwards you only need to send that cookie in the headers to stay logged in.
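Instead of copying the cookie by hand, you can also let requests manage it: a `requests.Session` stores any cookie set by the login response and sends it automatically on later requests. A minimal sketch (the login URL and form fields in the comment are hypothetical; here the cookie is set by hand just to show that the jar is shared across the session):

```python
import requests

# A Session persists cookies across requests: a cookie set by one response
# (e.g. the login POST) is automatically sent on every later request.
session = requests.Session()

# Normally the server would set the cookie via Set-Cookie on the login
# response, e.g. (hypothetical URL and field names):
#   session.post("https://example.com/login", data={"user": "...", "pw": "..."})
# Here we set a cookie by hand to show that the jar is shared.
session.cookies.set("sid", "abc123", domain="example.com")

# The cookie now lives in the session's cookie jar.
print(session.cookies.get("sid"))
```

Later `session.get(...)` calls to the same domain would carry this cookie without you building a `Cookie` header yourself.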
Importing the requests library gives us the get and post methods.
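The practical difference between the two: get sends parameters in the URL's query string, while post sends them in the request body. This can be seen without touching the network by inspecting a `PreparedRequest` (the URLs and parameter names below are made up for illustration):

```python
import requests

# GET puts parameters in the query string; POST puts them in the body.
# Preparing the request lets us inspect this without sending anything.
get_req = requests.Request(
    "GET", "http://example.com/search", params={"q": "python"}
).prepare()
post_req = requests.Request(
    "POST", "http://example.com/login", data={"user": "alice"}
).prepare()

print(get_req.url)    # the data ends up in the URL
print(post_req.body)  # the data ends up in the request body
```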
Sample code:
import requests
import ssl

user_agent = "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"
accept = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
accept_language = 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3'
upgrade = '1'
headers = {
    'User-Agent': user_agent,
    'Accept': accept,
    'Accept-Language': accept_language,
    'Upgrade-Insecure-Requests': upgrade,
    'Cookie': '....',  # enter the cookie generated after your login
}
r = requests.get(
    "http://mail.163.com/js6/main.jsp?sid=OAwUtGgglzEJoANLHPggrsKKAhsyheAT&df=mail163_letter#module=welcome.welcomeModule%7C%7B%7D",
    headers=headers,
    verify=False,
)
fp = open("/temp/csdn.txt", "w", encoding='utf-8')
fp.write(str(r.content, 'utf-8'))
fp.close()
I imported the ssl library here because the certificate of the site I was accessing had expired. If a crawler accesses such a website with certificate verification enabled, the following error occurs: SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)
The get and post methods of requests accept a verify parameter; setting it to False disables certificate verification.