Python: simulating login for a Lagou (拉勾网) crawler

Source: Internet
Author: User
Tags: md5 encryption

I have been learning to write crawlers for a while, and recently worked on a crawler and data analysis project for Lagou (拉勾网). The project is nearly finished, so I am taking some time to write up a few of the problems I ran into.

Lagou's current anti-crawler mechanism is fairly decent. When I first analyzed the site with the scrapy shell, I found that it checks the User-Agent, and that after a few requests it redirects to the login page. In other words, Lagou also verifies cookies.
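A crawler needs some way to notice that it has been bounced to the login page. The following is only a minimal sketch of such a check, not code from the project: the helper name redirected_to_login is hypothetical, the job-list URL is a made-up example, and the host is taken from the login URL used in this article.

```python
from urllib.parse import urlparse

# Assumption: the login page lives on this host, as in the login URL of this article.
LOGIN_HOST = "passport.lagou.com"


def redirected_to_login(final_url):
    """Return True if the URL we ended up on is the login page."""
    parsed = urlparse(final_url)
    return parsed.netloc.lower() == LOGIN_HOST and parsed.path.startswith("/login")


# The second URL is a hypothetical example of a normal page.
print(redirected_to_login("https://passport.lagou.com/login/login.html"))  # True
print(redirected_to_login("https://www.lagou.com/jobs/list_python"))       # False
```

With requests, the final URL after redirects is available as response.url, so the check can be applied to each response before parsing it.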

Here is the approach to simulating the login.
Lagou login page: https://passport.lagou.com/login/login.html

Capture the login request and analyze it.

The capture shows the parameters needed to simulate the login:
url = "https://passport.lagou.com/login/login.html"

postdata = {
    'isValidate': 'true',
    'username': username,
    'password': password,
    'request_form_verifyCode': '',
    'submit': ''
}

HEADERS = {
    'Referer': 'https://passport.lagou.com/login/login.html',
    'User-Agent': '',
    'X-Requested-With': 'XMLHttpRequest',
    'X-Anit-Forge-Token': '',
    'X-Anit-Forge-Code': '',
}
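Before sending anything, it can help to assemble the request and inspect it offline. The sketch below does that with the standard library instead of requests. The function name build_login_request and all argument values are placeholders, the endpoint is the login.json URL that the full listing in this article posts to, and the exact capitalization of the form field names is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint used by the login code in this article.
LOGIN_URL = "https://passport.lagou.com/login/login.json"


def build_login_request(username, hashed_password, token, code, user_agent):
    """Build (but do not send) the login POST so it can be inspected."""
    form = {
        "isValidate": "true",            # field-name casing is an assumption
        "username": username,
        "password": hashed_password,     # already hashed by the caller
        "request_form_verifyCode": "",
        "submit": "",
    }
    headers = {
        "Referer": "https://passport.lagou.com/login/login.html",
        "User-Agent": user_agent,
        "X-Requested-With": "XMLHttpRequest",
        "X-Anit-Forge-Token": token,
        "X-Anit-Forge-Code": code,
    }
    body = urlencode(form).encode("ascii")
    return Request(LOGIN_URL, data=body, headers=headers, method="POST")


# Placeholder values only; nothing is sent over the network here.
req = build_login_request("user", "0" * 32, "token-placeholder", "12345678", "Mozilla/5.0")
print(req.get_method())
print(req.data.decode("ascii"))
```

Printing the encoded body makes it easy to compare against the captured request field by field.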

So how do we get the two parameters X-Anit-Forge-Token and X-Anit-Forge-Code?
Open the developer tools (F12) and look carefully at the source of the login page.

Both values can be found inside the head tag, so a regular expression is enough to extract them.
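The extraction can be tried offline against a sample of the page. The snippet below is only a sketch: SAMPLE_HEAD is a hypothetical excerpt (the real markup may differ), the token and code values are made up, and because the exact casing of the script variable names is an assumption, the patterns ignore case.

```python
import re

# Hypothetical excerpt of the login page's <head>; the real markup may differ.
SAMPLE_HEAD = """
<head>
<script>
    window.X_Anti_Forge_Token = '0f3c9a1e-demo-token';
    window.X_Anti_Forge_Code = '12345678';
</script>
</head>
"""

# Case-insensitive, since the variable-name casing is an assumption.
TOKEN_RE = re.compile(r"X_Anti_Forge_Token\s*=\s*'([^']*)'", re.IGNORECASE)
CODE_RE = re.compile(r"X_Anti_Forge_Code\s*=\s*'(\d+)'", re.IGNORECASE)


def extract_forge_values(html):
    """Return (token, code); either is '' when it cannot be found."""
    token = TOKEN_RE.search(html)
    code = CODE_RE.search(html)
    return (token.group(1) if token else "", code.group(1) if code else "")


print(extract_forge_values(SAMPLE_HEAD))  # ('0f3c9a1e-demo-token', '12345678')
```

Searching for each value separately means one missing value does not hide the other, which makes a changed page easier to diagnose.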

How is the login password encrypted?
The page source shows that the login page does not load many JS files, so we can go through them one by one.
The encryption method turns out to be in main.html_aio_f95e644.js (https://img.lagou.com/passport/static/pkg/pc/page/login/main.html_aio_f95e644.js):

First, hash the password with MD5: password = md5(password)
Then wrap the result with the string veenike on both sides: password = "veenike" + password + "veenike"
Finally, hash it with MD5 again: password = md5(password)
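The three steps above can be sketched with hashlib as follows. The helper names md5_hex and lagou_password are illustrative, and the password is a placeholder.

```python
import hashlib

# Salt string found in the login page's JS, as described above.
SALT = "veenike"


def md5_hex(text):
    """Return the hex MD5 digest of a UTF-8 encoded string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def lagou_password(plain):
    """MD5 the password, wrap it in the salt, then MD5 the result."""
    inner = md5_hex(plain)
    return md5_hex(SALT + inner + SALT)


hashed = lagou_password("example-password")  # placeholder password
print(len(hashed), hashed)
```

The result is always a 32-character hex string, which is what goes into the password field of the login form.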

That completes the analysis. Here is the code that simulates the login.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import hashlib
import re

import requests

# Session object, so cookies persist between requests
session = requests.session()

# Request headers
HEADERS = {
    'Referer': 'https://passport.lagou.com/login/login.html',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:51.0) Gecko/20100101 Firefox/51.0',
}


def get_password(passwd):
    '''Double-MD5 the password; the salt veenike was found in main.html_aio_f95e644.js.'''
    passwd = hashlib.md5(passwd.encode('utf-8')).hexdigest()
    passwd = 'veenike' + passwd + 'veenike'
    passwd = hashlib.md5(passwd.encode('utf-8')).hexdigest()
    return passwd


def get_token():
    '''Fetch the login page and extract the anti-forgery token and code.'''
    forge_token = ""
    forge_code = ""
    login_page = 'https://passport.lagou.com/login/login.html'
    data = session.get(login_page, headers=HEADERS)
    match_obj = re.match(r'.*X_Anti_Forge_Token = \'(.*?)\';.*X_Anti_Forge_Code = \'(\d+?)\'', data.text, re.DOTALL)
    if match_obj:
        forge_token = match_obj.group(1)
        forge_code = match_obj.group(2)
    return forge_token, forge_code


def login(username, passwd):
    x_anti_forge_token, x_anti_forge_code = get_token()
    login_headers = HEADERS.copy()
    login_headers.update({
        'X-Requested-With': 'XMLHttpRequest',
        'X-Anit-Forge-Token': x_anti_forge_token,
        'X-Anit-Forge-Code': x_anti_forge_code,
    })
    postdata = {
        'isValidate': 'true',
        'username': username,
        'password': get_password(passwd),
        'request_form_verifyCode': '',
        'submit': '',
    }
    response = session.post('https://passport.lagou.com/login/login.json', data=postdata, headers=login_headers)
    print(response.text)


def get_cookies():
    '''Return the session cookies as a plain dict.'''
    return requests.utils.dict_from_cookiejar(session.cookies)


if __name__ == "__main__":
    username = '1371XXXXXXX'
    passwd = 'xxxxxxxxxx'
    login(username, passwd)
    print(get_cookies())

Console results

Once the simulated login succeeds, the cookies are available for the crawler to use. The code runs as copied, but Lagou may change its login checks at any time. If the login stops working, leave a comment below and I will find time to update the code.
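One possible way to hand the cookies over to a crawler is to store the cookie dict in a file and rebuild a Cookie header from it later. This is a sketch rather than part of the original project: the function names, the file name, and the cookie names and values are all hypothetical.

```python
import json
import os
import tempfile


def save_cookies(cookies, path):
    """Write a cookie dict (name -> value) to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(cookies, fh)


def load_cookies(path):
    """Read the cookie dict back from the JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def cookie_header(cookies):
    """Format a cookie dict as the value of a Cookie request header."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Hypothetical cookie names and values, for illustration only.
demo = {"JSESSIONID": "abc123", "login": "true"}
path = os.path.join(tempfile.gettempdir(), "lagou_cookies_demo.json")
save_cookies(demo, path)
print(cookie_header(load_cookies(path)))  # JSESSIONID=abc123; login=true
```

If the crawler is written with Scrapy, the loaded dict can also be passed directly through the cookies argument of scrapy.Request instead of building the header by hand.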

GitHub: https://github.com/laichilueng/lagou_login
If you find it useful, feel free to give it a star.
