Python crawler: passing as a logged-in browser by replacing the HTTP request headers

Source: Internet
Author: User

Take Douban as an example: the page https://www.douban.com/contacts/list shows the people you follow, and you must be logged in to view it.

If you fetch this URL with requests.get() without being logged in, you only get the login page back, so the Python script has to be logged in to the site before it can crawl the desired page.

An easy way to do this is to log in with your browser, then use the developer tools (Chrome, for example) to find your own Cookie and User-Agent values. Have Python send the request with these copied headers in place of the defaults, and the server will treat the request as coming from the logged-in user.
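The cookie copied from the browser is a single "name=value; name2=value2" string. As a minimal sketch, the helper below splits such a string into a dict so individual cookies can be inspected or edited. The function name parse_cookie_string and the sample values are hypothetical and chosen only for illustration.

```python
def parse_cookie_string(raw_cookie):
    """Split a 'name=value; name2=value2' cookie string into a dict."""
    cookies = {}
    for pair in raw_cookie.split(';'):
        pair = pair.strip()
        if not pair or '=' not in pair:
            continue  # skip empty or malformed fragments
        # Split on the first '=' only, because values may contain '=' themselves
        name, value = pair.split('=', 1)
        cookies[name.strip()] = value.strip()
    return cookies


# Sample string in the same shape as a cookie copied from the browser
example = 'ap=1; ps=y; __utmz=30149280.1499577720.27.14.utmcsr=douban.com'
for name, value in parse_cookie_string(example).items():
    print(name, '->', value)
```

A dict like this can also be passed to requests.get() through its cookies parameter instead of placing the raw string in the Cookie header.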

The code is as follows:

    import requests

    # Both values are copied from a logged-in Chrome session; replace them with your own.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36',
        # The request header is named 'Cookie' (singular); pairs are separated by '; '.
        'Cookie': 'gr_user_id=1f9ea7ea-462a-4a6f-9d55-156631fc6d45; bid=vpypmmd30-k; ll="118282"; ue="Codin; __utmz=30149280.1499577720.27.14.utmcsr=douban.com|utmccn=(referral)|utmcmd=referral|utmcct=/doulist/240962/; __utmv=30149280.3049; _vwo_uuid_v2=F04099A9DD; viewed="27607246_26356432"; ap=1; ps=y; push_noty_num=0; push_doumail_num=0; dbcl2="30496987:gzxpftzw4y0"; ck=13ey; _pk_ref.100001.8cb4=%5b%22%22%2c%22%22%2c1515153574%2c%22https%3a%2f%2fbook.douban.com%2fmine%22%5d; __utma=30149280.833870293.1473539740.1514800523.1515153574.50; __utmc=30149280; _pk_id.100001.8cb4=255d8377ad92c57e.1473520329.20.1515153606.1514628010.',
    }

    # Send the request with the copied headers instead of the defaults
    r = requests.get('https://www.douban.com/contacts/list', headers=headers)
    print(r.text)
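Since a request without a valid login only returns the login page, it can help to check where the request actually ended up. The sketch below is one possible check, assuming the site redirects unauthenticated requests to a URL containing one of the marker substrings. The function name landed_on_login_page and the markers are illustrative guesses, not values confirmed for Douban.

```python
from urllib.parse import urlparse

# Assumed markers: substrings that might appear in the URL of a login page.
# These are illustrative guesses, not values confirmed for Douban.
LOGIN_URL_MARKERS = ('login', 'passport', 'accounts.')


def landed_on_login_page(final_url, requested_url):
    """Return True if final_url suggests a redirect to a login page."""
    final = urlparse(final_url)
    requested = urlparse(requested_url)
    # Same host and path: the server served the page that was asked for
    if (final.netloc, final.path) == (requested.netloc, requested.path):
        return False
    # Otherwise look for a login marker in the host and path of the final URL
    location = (final.netloc + final.path).lower()
    return any(marker in location for marker in LOGIN_URL_MARKERS)
```

With the response r from the code above, this could be called as landed_on_login_page(r.url, 'https://www.douban.com/contacts/list'), because r.url holds the final URL after any redirects. A True result would suggest that the copied cookie has expired and needs to be copied from the browser again.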
