Python Simulated Login Practice

Source: Internet
Author: User
Tags: xpath

Some pages can only be crawled after you log in. This article describes how to use Python to simulate a login.

Tools needed:

1. A Django project to act as the site we log in to

2. The Fiddler packet capture tool and the Chrome browser

3. The PyCharm editor

Steps:

1. Start the Django service. The setup is not covered here; a quick search (for example on Baidu) turns up plenty of guides. Remember to create a superuser so that you have an account to log in with later.

Open http://127.0.0.1:8000/admin/, which is the Django admin back end. The login page is protected by Django's built-in CSRF (cross-site request forgery) defense. Open the browser's developer tools and find the value of the CSRF token field in the login form.

    

This value changes, so it cannot be hard-coded. Django rejects a login POST that does not carry a valid token, which is how it blocks cross-site attacks.
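To make the token lookup concrete, here is a minimal sketch that pulls the hidden field out of a page using only the standard library. SAMPLE_HTML and its token value are made up for illustration, and CsrfTokenParser and extract_csrf_token are hypothetical helper names; a real Django login page will contain more markup than this.

```python
from html.parser import HTMLParser

# Hypothetical snippet resembling a login form; the token value is invented.
SAMPLE_HTML = """
<form action="/admin/login/?next=/admin/" method="post" id="login-form">
  <input type="hidden" name="csrfmiddlewaretoken" value="abc123TOKEN">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""


class CsrfTokenParser(HTMLParser):
    """Remembers the value of the input named csrfmiddlewaretoken."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("name") == "csrfmiddlewaretoken":
            self.token = attrs.get("value")


def extract_csrf_token(html):
    """Return the token string, or None if the field is not present."""
    parser = CsrfTokenParser()
    parser.feed(html)
    return parser.token


print(extract_csrf_token(SAMPLE_HTML))
```

Matching on the field name rather than on its position in the form keeps the lookup working if other hidden inputs are added.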

  

2. Log in with the superuser account created earlier (username 123, password zxc123456) and use Fiddler to capture the request.

The captured request shows the parameters that the login form submits.
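As a rough illustration of what that captured form body looks like, the sketch below builds the three fields and URL-encodes them the way a browser would for a normal form POST. build_login_form is a hypothetical helper, and the token value is a placeholder; the real token has to come from the login page.

```python
from urllib.parse import urlencode


def build_login_form(token, username, password):
    """Assemble the form fields for the login POST."""
    return {
        "csrfmiddlewaretoken": token,
        "username": str(username),
        "password": password,
    }


# Placeholder token; the account is the one created for this walkthrough.
form = build_login_form("abc123TOKEN", 123, "zxc123456")
body = urlencode(form)
print(body)
# csrfmiddlewaretoken=abc123TOKEN&username=123&password=zxc123456
```

Comparing this string with the request body shown in Fiddler is a quick way to check that no field is missing or misspelled.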

# coding=utf-8
import requests
from lxml import etree

# The request headers can be copied directly from Fiddler and written as a dictionary
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.8',
}

# A session ties together the requests made by one user;
# cookies are handled automatically until the session ends
session = requests.Session()


def get_xsrf():
    """Get the CSRF token from the login page."""
    response = session.get('http://127.0.0.1:8000/admin', headers=headers)
    html = response.text
    selector = etree.HTML(html)
    # Here the value is read with XPath; a regular expression would also work.
    # xpath() returns a list of the matching values.
    _xsrf = selector.xpath('//*[@id="login-form"]/input/@value')
    print(_xsrf, html)
    return _xsrf


def login():
    # Login URL captured with Fiddler
    url = 'http://127.0.0.1:8000/admin/login/?next=/admin/'
    data = {
        'csrfmiddlewaretoken': get_xsrf(),
        'username': 123,
        'password': 'zxc123456',
    }
    # Submit the form with a POST request
    result = session.post(url, data=data, headers=headers)
    # After a successful login, you can test by requesting a page that requires login
    # result2 = session.get('url', headers=headers)
    # print(result2.text)
    print(result.text)


if __name__ == '__main__':
    login()

Once the login succeeds, you can crawl the data you want.
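One way to decide whether the login actually succeeded is to check that the response is no longer the login page. The sketch below is only a heuristic under two assumptions: a failed login leaves you on a URL containing /admin/login/, and the login page contains a form with id="login-form". looks_logged_in is a hypothetical helper name.

```python
def looks_logged_in(final_url, html):
    """Heuristic: True when the response does not look like the login page."""
    on_login_url = "/admin/login/" in final_url
    has_login_form = 'id="login-form"' in html
    return not (on_login_url or has_login_form)


# Made-up responses for illustration
print(looks_logged_in("http://127.0.0.1:8000/admin/", "<h1>Site administration</h1>"))
print(looks_logged_in("http://127.0.0.1:8000/admin/login/?next=/admin/",
                      '<form id="login-form"></form>'))
```

With the requests library, the final URL and body of a response are available as result.url and result.text, so the check could be called as looks_logged_in(result.url, result.text).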
