In the previous article, we logged in to the CSDN website with a fixed cookie: we captured the cookie value with a packet sniffer and then sent that value back to the server in our own requests for authentication.
For example, we captured data like the following and added it to the request headers.
The constructed cookie value:
# Note: 'JSESSIONID' appears twice below; in a Python dict literal the later value overrides the earlier one.
cookies = {'JSESSIONID': '5543aaaaaaaaaaaaaaaabbbbbb.tomcat2',
    'uuid_tt_dd': '-411111111111119_20170926', 'JSESSIONID': '2222222222222220265c40d8a33cb.tomcat2',
    'UN': 'XXXXX', 'UE': '[email protected]', 'BT': '334343481', 'LSSC': 'Lssc-145514-7aaaaaaaaaaazggmhfvhfo9taaaaaaar-passport.csdn.net',
    'Hm_lvt_6bcd52f51bbbbbb2bec4a3997715ac': '15044213,150656493,15064444445,1534488843', 'Hm_lpvt_6bcd52f51bbbbbbbe32bec4a3997715ac': '1506388843',
    'dc_tos': 'Oabckz', 'dc_session_id': '15063aaaa027_0.7098840409889817', '__message_sys_msg_id': '0', '__message_gu_msg_id': '0', '__message_cnel_msg_id': '0', '__message_district_code': '000000', '__message_in_school': '0'}
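As a minimal sketch of how such a fixed cookie dictionary ends up on the wire, the snippet below builds (but does not send) a request with requests and prints the resulting Cookie header. The two cookie values are placeholders rather than real session data, and the URL is simply the CSDN page fetched later in this article.

```python
import requests

# Placeholder values standing in for cookies captured with a packet sniffer.
cookies = {'JSESSIONID': 'abc123.tomcat2', 'UN': 'xxxxx'}

# Build the request without sending it, so no network access is needed.
req = requests.Request('GET', 'http://my.csdn.net/my/mycsdn', cookies=cookies)
prepared = req.prepare()

# requests serializes the dict into a single Cookie header.
cookie_header = prepared.headers['Cookie']
print(cookie_header)
```

Every value in that header has to be refreshed by hand once the server-side session expires, which is the weakness discussed next.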
However, this approach has a drawback: the cookie has to be captured by hand before every login, which greatly reduces the degree of automation. In fact, the requests library can save cookie values itself and send them automatically in subsequent requests. All we have to do is build the POST data ourselves.
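The cookie handling of a session can be sketched offline as follows. This is only an illustration under assumptions: the cookie is placed into the session's jar by hand to stand in for a Set-Cookie header that the server would normally send, its value is a placeholder, and the request is prepared rather than sent.

```python
import requests

session = requests.Session()

# Simulate a cookie the server would set via Set-Cookie during a real exchange.
session.cookies.set('JSESSIONID', 'abc123.tomcat2',
                    domain='passport.csdn.net', path='/')

# Prepare (but do not send) a later request through the same session.
req = requests.Request('GET', 'http://passport.csdn.net/account/login')
prepared = session.prepare_request(req)

# The session attached the stored cookie on its own.
session_cookie_header = prepared.headers.get('Cookie')
print(session_cookie_header)
```

In a real run nothing needs to be set by hand: any cookie returned by a response to session.get() or session.post() is stored in session.cookies and sent back on later requests to the matching domain.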
First, look at the values submitted on each login: the fields are username, password, lt, execution and _eventId.
Where do these fields come from? Inspecting the CSDN login page shows that they are attributes of the input elements in the login form, so lt and execution can be read from the page before the login request is sent.
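To make the lookup concrete, here is a dependency-free sketch that uses the standard library's html.parser instead of BeautifulSoup. The form below is a made-up stand-in for the login page, and the lt and execution values in it are invented; the real page has to be fetched to see the actual markup.

```python
from html.parser import HTMLParser


class InputFieldParser(HTMLParser):
    """Collect name/value pairs from every <input> tag."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attr = dict(attrs)
        if 'name' in attr:
            # Inputs without a value attribute are stored as an empty string.
            self.fields[attr['name']] = attr.get('value') or ''


# Hypothetical markup; the values are placeholders.
SAMPLE_FORM = """
<form action="/account/login" method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="hidden" name="lt" value="LT-12345-example">
  <input type="hidden" name="execution" value="e1s1">
</form>
"""

parser = InputFieldParser()
parser.feed(SAMPLE_FORM)
fields = parser.fields
print(fields['lt'], fields['execution'])
```

Because these values are embedded in the page, they have to be read from the same session that later submits the form.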
Once the source of every field is known, we can write the program:
import requests
from bs4 import BeautifulSoup

header = {'Host': 'passport.csdn.net', 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36',
    "Accept-Language": "zh-CN,zh;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
}
header1 = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36',
    "Accept-Language": "zh-CN,zh;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
}
url2 = 'http://passport.csdn.net/account/login'
# Set up a session; it saves the cookies received during the exchange and sends them back automatically.
r = requests.Session()
s2 = r.get(url2)
html = BeautifulSoup(s2.text, "html.parser")
# Extract the values of lt and execution with BeautifulSoup.
lt = e1 = None  # stay None if the page does not contain the fields
for tag in html.find_all('input'):
    if 'name' in tag.attrs and tag.attrs['name'] == 'lt':
        lt = tag.attrs['value']
    if 'name' in tag.attrs and tag.attrs['name'] == 'execution':
        e1 = tag.attrs['value']
pay_load = {'username': 'xxxxx', 'password': 'xxxxxxx', 'lt': lt, 'execution': e1, '_eventId': 'submit'}
s = r.post(url2, headers=header, data=pay_load)
# Get my blog content; the session sends the login cookies along.
s1 = r.get('http://my.csdn.net/my/mycsdn', headers=header1)
This way there is no need to capture the cookie value before each login, and the script can log in automatically at any time, which is much more convenient than logging in with a fixed cookie value.
Python web crawler Requests Library II