Python web crawler Requests Library II

Source: Internet
Author: User
Tags: python, web crawler

In the previous article, when we introduced using requests to log in to the CSDN website, we used a fixed cookie: we obtained the cookie value by capturing packets, then sent that value to the server for authentication on every request.

That is, we captured data like the following and added it to the header information.

The constructed cookie value:

cookies = {
    'JSESSIONID': '5543aaaaaaaaaaaaaaaabbbbbb.tomcat2',
    'uuid_tt_dd': '-411111111111119_20170926',
    'JSESSIONID': '2222222222222220265c40d8a33cb.tomcat2',
    'UN': 'XXXXX',
    'UE': '[email protected]',
    'BT': '334343481',
    'LSSC': 'Lssc-145514-7aaaaaaaaaaazggmhfvhfo9taaaaaaar-passport.csdn.net',
    'Hm_lvt_6bcd52f51bbbbbb2bec4a3997715ac': '15044213,150656493,15064444445,1534488843',
    'Hm_lpvt_6bcd52f51bbbbbbbe32bec4a3997715ac': '1506388843',
    'dc_tos': 'Oabckz',
    'dc_session_id': '15063aaaa027_0.7098840409889817',
    '__message_sys_msg_id': '0',
    '__message_gu_msg_id': '0',
    '__message_cnel_msg_id': '0',
    '__message_district_code': '000000',
    '__message_in_school': '0',
}
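To see how such a fixed cookie dict actually travels with a request, here is a small offline sketch (the cookie values are made up for illustration). It uses requests' `PreparedRequest`, which builds the `Cookie` header without contacting any server:

```python
import requests

# Hypothetical captured cookie values, passed via the cookies= parameter.
cookies = {'JSESSIONID': '5543abc.tomcat2', 'UN': 'xxxxx'}

# prepare() merges the dict into a single Cookie request header,
# which is exactly what the server receives for authentication.
req = requests.Request('GET', 'http://my.csdn.net/my/mycsdn', cookies=cookies)
prepared = req.prepare()
print(prepared.headers['Cookie'])
```

This is the manual approach the article describes: every value in the dict must be captured by hand before it can be sent.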
However, there is a problem with this approach: you need to capture the server's cookie value again before every run, which greatly reduces the degree of automation. In fact, the requests library can save the cookie value itself and send it automatically in subsequent message exchanges. We only need to build the POST data on our own.
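As a minimal offline sketch of that behavior (the cookie is set by hand here rather than by a server response), a `requests.Session` keeps a cookie jar and resends its contents on every later request to the matching domain:

```python
import requests

# A Session object keeps a cookie jar; cookies received from a server
# (or set manually, as here) are sent automatically on later requests.
session = requests.Session()
session.cookies.set('JSESSIONID', 'abc123', domain='example.com')

# The value persists in the jar, so a later request to example.com would carry it.
print(session.cookies.get('JSESSIONID'))
```

In real use you never call `cookies.set()` yourself: the first `session.get()` stores whatever `Set-Cookie` headers the server returns.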
First, look at the values submitted at each login. There are username, password, and the lt, execution, and _eventId fields.

Where do these fields come from? By inspecting the CSDN login page, we find that they are attribute data inside the form's input elements.
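The extraction step can be sketched in isolation. The HTML fragment below is made up to mimic the hidden inputs on such a login form; the parsing pattern is the same one the full program uses:

```python
from bs4 import BeautifulSoup

# A made-up fragment mimicking the hidden inputs on a CAS-style login form.
html = '''
<form>
  <input type="hidden" name="lt" value="LT-12345-abcde">
  <input type="hidden" name="execution" value="e1s1">
  <input type="hidden" name="_eventId" value="submit">
</form>
'''
soup = BeautifulSoup(html, 'html.parser')
# Collect every named input's value, just as the login code does for lt/execution.
fields = {tag['name']: tag['value'] for tag in soup.find_all('input')}
print(fields['lt'], fields['execution'])
```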

Once we know where all the data comes from, we can construct the program code:

import requests
from bs4 import BeautifulSoup

header = {
    'Host': 'passport.csdn.net',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36',
    'Accept-Language': 'zh-CN,zh;q=0.8',
    'Accept-Encoding': 'gzip, deflate',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
}
header1 = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36',
    'Accept-Language': 'zh-CN,zh;q=0.8',
    'Accept-Encoding': 'gzip, deflate',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
}
url2 = 'http://passport.csdn.net/account/login'

"To set up a session, which will save the cookie in the interaction and send it in the interactive process."
R=requests. Session ()
S2=r.get (URL2)
Html=beautifulsoup (S2.text, "Html.parser")
"' to crawl the value of lt,execution by means of BeautifulSoup."
For input in Html.find_all (' input '):
If ' name ' in Input.attrs and input.attrs[' name '] = = ' LT ':
lt=input.attrs[' value ']
If ' name ' in Input.attrs and input.attrs[' name '] = = ' execution ':
e1=input.attrs[' value ']
pay_load={' username ': ' xxxxx ', ' Password ': ' xxxxxxx ', ' lt ': lt, ' execution ': E1, ' _eventid ': ' Submit '}
S=r.post (Url2,headers=header,data=pay_load)
"Get my Blog content"
S1=r.get (' http://my.csdn.net/my/mycsdn ', headers=header1)

In this way, we avoid having to capture the cookie value before each login, and the program can log in automatically at any time. This is much more convenient than the fixed-cookie login method.

