Some URLs only serve their data after you log in, so the crawler has to log in first. This article describes how to simulate a login with Python.
Preparation tools:
1. A Django project to act as the login target
2. The Fiddler packet-capture tool and the Chrome browser
3. The PyCharm editor
Steps:
1. Start the Django service. Setting it up is not covered here; a quick search turns up plenty of guides (remember to create a superuser so you can log in later).
Open http://127.0.0.1:8000/admin/ — this is Django's admin back end. Django ships with CSRF (cross-site request forgery) protection: the login form carries a hidden token, whose value you can find by opening the browser's developer tools and looking at the csrfmiddlewaretoken field.
Because this token changes, Django can reject forged cross-site requests.
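The token travels as a hidden input inside the login form. A minimal sketch of pulling it out with lxml — the HTML fragment here is a made-up stand-in for the real admin page, and the token value is invented:

```python
from lxml import etree

# Made-up fragment mimicking Django's admin login form; the real page
# carries a server-generated token in the csrfmiddlewaretoken input.
html = '''
<form id="login-form" method="post">
  <input type="hidden" name="csrfmiddlewaretoken" value="abc123TOKEN">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
'''

selector = etree.HTML(html)
# Pick out the hidden input's value attribute by XPath
token = selector.xpath(
    '//*[@id="login-form"]/input[@name="csrfmiddlewaretoken"]/@value')[0]
print(token)  # → abc123TOKEN
```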
2. Log in with the superuser account created earlier (username 123, password zxc123456) and capture the request with Fiddler.
These are the parameters you need to submit with the form.
# coding=utf-8
import requests
from lxml import etree

# The request headers can be copied straight out of Fiddler, formatted as a dict
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.8',
}

# A Session ties different requests from the same user together; cookies are
# handled automatically until the session ends
session = requests.Session()


def get_xsrf():
    """Fetch the csrfmiddlewaretoken from the login page."""
    response = session.get('http://127.0.0.1:8000/admin', headers=headers)
    html = response.text
    selector = etree.HTML(html)
    # Grab the value via XPath here; a regular expression would also work
    _xsrf = selector.xpath('//*[@id="login-form"]/input/@value')[0]
    print(_xsrf, html)
    return _xsrf


def login():
    # Login URL captured with Fiddler
    url = 'http://127.0.0.1:8000/admin/login/?next=/admin/'
    data = {
        'csrfmiddlewaretoken': get_xsrf(),
        'username': '123',
        'password': 'zxc123456',
    }
    # Submit the form with a POST request
    result = session.post(url, data=data, headers=headers)
    # After a successful login, you can test by requesting a page that
    # needs authentication:
    # result2 = session.get('url', headers=headers)
    # print(result2.text)
    print(result.text)


if __name__ == '__main__':
    login()
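Rather than eyeballing result.text, you can check for a marker that only appears on the authenticated page; Django's admin shows a "Log out" link once you are signed in. A sketch of that heuristic, using canned HTML strings in place of real responses (the helper name and the sample fragments are my own, not from the original script):

```python
def login_succeeded(html: str) -> bool:
    """Heuristic: the admin shows a logout link only when authenticated."""
    return 'Log out' in html or '/admin/logout/' in html

# Canned responses standing in for result.text
authenticated_page = '<a href="/admin/logout/">Log out</a>'
login_page = '<form id="login-form"><input name="username"></form>'

print(login_succeeded(authenticated_page))  # → True
print(login_succeeded(login_page))          # → False
```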
Once the login succeeds, you can crawl whatever data you need.