Python Crawler Notes: Handling Login

Source: Internet
Author: User

There are two common ways to sign in:

    1. Visit the login page to collect the CSRF token and cookie, then submit the login form and keep the cookie issued after authorization
    2. Send a POST request directly to get a cookie

The above is only a brief summary. The following sections describe in detail how a crawler handles each of these two kinds of login.

First case

This case is the more common one, since many websites now use the first method. GitHub serves as the example here:

Analyzing the page

Getting the authenticity_token

The login page submits an HTML form, which can be inspected with the developer tools in Google Chrome.

The form contains this token, so before logging in, the code must first request the login page and extract the authenticity_token from it.
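Extracting the token amounts to finding the hidden input element whose name is authenticity_token and reading its value attribute. The article's own code does this with BeautifulSoup; as a minimal, dependency-free sketch, the standard library's html.parser can do the same lookup. The sample HTML is a hypothetical fragment, not GitHub's actual markup, and the TokenFinder and extract_token names are illustrative.

```python
from html.parser import HTMLParser


class TokenFinder(HTMLParser):
    """Record the value of the first <input name="authenticity_token">."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "input" and self.token is None:
            attr_map = dict(attrs)
            if attr_map.get("name") == "authenticity_token":
                self.token = attr_map.get("value")


def extract_token(html):
    finder = TokenFinder()
    finder.feed(html)
    return finder.token  # None if no such input exists


# Hypothetical login-page fragment; real GitHub markup will differ.
sample = """
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="abc123==">
  <input type="text" name="login">
</form>
"""
print(extract_token(sample))  # abc123==
```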

Getting the cookie from the login page

The Set-Cookie response header here carries the cookie issued by the login page.
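Each Set-Cookie header has the form name=value followed by attributes such as path or secure. The sketch below uses the standard library's http.cookies.SimpleCookie to reduce such headers to a plain name-to-value dict, roughly what response.cookies.get_dict() returns in requests. The header strings and cookie names are hypothetical examples, not captured from GitHub.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header values; a response may send several, one per header.
set_cookie_headers = [
    "_gh_sess=abc123; path=/; secure; HttpOnly",
    "logged_in=no; path=/; domain=.github.com; secure; HttpOnly",
]

jar = SimpleCookie()
for header in set_cookie_headers:
    jar.load(header)  # attributes (path, domain, flags) are parsed but kept separate

# Keep only name -> value, which is what gets sent back in later requests
first_cookie = {name: morsel.value for name, morsel in jar.items()}
print(first_cookie)  # {'_gh_sess': 'abc123', 'logged_in': 'no'}
```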

Analyzing the login request to find the submission URL

After entering a username and password and clicking Submit, the captured traffic shows the address the form is POSTed to:
Request URL: https://github.com/session
Request parameters:
"commit": "Sign in",
"utf8": "✓",
"authenticity_token": "km6q0mm9fti95wysi/wu3bnambyrmv60c0ytqlzjbuauya193lp2gd8btcmqbsfvpfzrlk3/1tfonoggudy7ig==",
"login": "[email protected]",
"password": "123"
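These parameters travel in the POST body as application/x-www-form-urlencoded data. When requests.post() receives a dict through data=, it encodes it in this format; urllib.parse.urlencode shows the resulting string. The token and login values below are placeholders.

```python
from urllib.parse import urlencode

# Placeholder values; the real token must come from the login page fetched just before.
form = {
    "commit": "Sign in",
    "utf8": "\u2713",  # the check mark the login form sent
    "authenticity_token": "abc123==",
    "login": "user@example.com",
    "password": "123",
}

body = urlencode(form)  # spaces become "+", other special characters are %-escaped
print(body)
# commit=Sign+in&utf8=%E2%9C%93&authenticity_token=abc123%3D%3D&login=user%40example.com&password=123
```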

The submitted parameters include "authenticity_token", which confirms that this value must be obtained from the login page first.

After a successful login, visiting GitHub again shows two additional cookies; these are set by the login itself. So a program that logs in must collect the cookies returned once the login succeeds, then send them when requesting other pages on GitHub, such as the personal profile settings page:
https://github.com/settings/profile
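Keeping the cookies from the first request and adding those from the login response is a plain dict merge; the merged dict is then sent as a Cookie header on later requests. A minimal sketch follows; merge_cookies and cookie_header are hypothetical helpers, and the cookie names and values are made up. In requests, passing the dict through cookies= builds this header automatically.

```python
def merge_cookies(first_cookie, login_cookie):
    """Combine cookie dicts; values from the login response win on name clashes."""
    merged = dict(first_cookie)
    merged.update(login_cookie)
    return merged


def cookie_header(cookies):
    """Render a name -> value dict as the value of an HTTP Cookie request header."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Hypothetical cookies: one from the login page, two more added by the login
first_cookie = {"_gh_sess": "before-login"}
login_cookie = {"_gh_sess": "after-login", "user_session": "xyz", "logged_in": "yes"}

cookies = merge_cookies(first_cookie, login_cookie)
print(cookie_header(cookies))
# _gh_sess=after-login; user_session=xyz; logged_in=yes
```

Letting the login response win matters when a site replaces its pre-login session cookie with a new one after authentication.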

Code implementation

The following code logs in and then accesses https://github.com/settings/repositories

```python
import requests
from bs4 import BeautifulSoup

base_url = "https://github.com/login"
login_url = "https://github.com/session"


def get_github_html(url):
    """
    Fetch the login page HTML together with its cookie.
    :param url: https://github.com/login
    :return: the login page HTML and the first cookie
    """
    response = requests.get(url)
    first_cookie = response.cookies.get_dict()
    return response.text, first_cookie


def get_token(html):
    """
    Parse the login page HTML.
    :param html: login page HTML
    :return: the CSRF token required by the login POST
    """
    soup = BeautifulSoup(html, "lxml")
    res = soup.find("input", attrs={"name": "authenticity_token"})
    token = res["value"]
    return token


def github_login(url, token, cookie):
    """
    Log in and return the cookies set by the login response.
    :param url: https://github.com/session
    :param token: CSRF token
    :param cookie: cookie from the first request to the login page
    :return: cookies after login
    """
    data = {
        "commit": "Sign in",
        "utf8": "✓",
        "authenticity_token": token,
        "login": "your GitHub account",
        "password": "your GitHub password",
    }
    response = requests.post(url, data=data, cookies=cookie)
    print(response.status_code)
    cookies = response.cookies.get_dict()
    # The merge below is commented out: GitHub used to require combining the
    # login-page cookie with the login cookie; now the cookies returned by the
    # login response can be used directly.
    # cookie.update(cookies)
    return cookies


if __name__ == "__main__":
    html, cookie = get_github_html(base_url)
    token = get_token(html)
    cookies = github_login(login_url, token, cookie)
    response = requests.get("https://github.com/settings/repositories", cookies=cookies)
    print(response.text)
```
The second case

Jobbole (Bole Online) serves as the example here. This case is simpler than the first: no page analysis is needed. The crawler sends a POST request directly, gets a cookie, and uses that cookie to access other pages. A code example follows.
http://www.jobbole.com/bookmark/ is a page that can only be accessed after logging in; otherwise the site redirects straight to the login page.

The key point: http://www.jobbole.com/wp-admin/admin-ajax.php is the login request URL, which can be found by capturing the network traffic.
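The second approach (POST the credentials, keep the returned cookie, send it on the next request) can be tried offline with the standard library alone. This is a sketch, not the article's code: it swaps requests for urllib with an http.cookiejar.CookieJar, and it logs in to a hypothetical local stub server rather than jobbole.com. The StubSite class, the /login and /bookmark paths, the session=ok cookie and the password "secret" are all invented for the demo; only the form field names mirror the article.

```python
import threading
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubSite(BaseHTTPRequestHandler):
    """Hypothetical local site: POST sets a cookie, GET serves bookmarks only with it."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode())
        self.send_response(200)
        if form.get("user_pass") == ["secret"]:
            # Successful login: hand out a session cookie
            self.send_header("Set-Cookie", "session=ok; Path=/")
        self.end_headers()
        self.wfile.write(b"login response")

    def do_GET(self):
        # Serve the protected page only if the session cookie came back
        logged_in = "session=ok" in self.headers.get("Cookie", "")
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"your bookmarks" if logged_in else b"please log in")

    def log_message(self, *args):
        pass  # silence the default per-request logging


def make_opener(keep_cookies):
    # ProxyHandler({}) ignores proxy environment variables for the local demo
    handlers = [urllib.request.ProxyHandler({})]
    if keep_cookies:
        # Stores cookies from each response and sends them on later requests
        handlers.append(urllib.request.HTTPCookieProcessor(CookieJar()))
    return urllib.request.build_opener(*handlers)


def fetch_bookmarks(base):
    opener = make_opener(keep_cookies=True)
    form = {"action": "user_login", "user_login": "demo", "user_pass": "secret"}
    with opener.open(base + "/login", data=urllib.parse.urlencode(form).encode()) as resp:
        resp.read()
    with opener.open(base + "/bookmark") as resp:
        return resp.read().decode()


server = HTTPServer(("127.0.0.1", 0), StubSite)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"

result = fetch_bookmarks(base)
with make_opener(keep_cookies=False).open(base + "/bookmark") as resp:  # no cookie sent
    anonymous = resp.read().decode()

server.shutdown()
server.server_close()
print(result)     # your bookmarks
print(anonymous)  # please log in
```

The cookie-keeping opener plays the role that passing cookies= by hand plays in the article's code; with requests, a requests.Session object keeps cookies across requests in the same way.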

```python
import requests


def login():
    url = "http://www.jobbole.com/wp-admin/admin-ajax.php"
    data = {
        "action": "user_login",
        "user_login": "zhaofan1015",
        "user_pass": "******",
    }
    # Log in with a direct POST and keep the returned cookie
    response = requests.post(url, data=data)
    cookies = response.cookies.get_dict()
    print(cookies)
    # Use the cookie to open a page that requires login
    url2 = "http://www.jobbole.com/bookmark/"
    response2 = requests.get(url2, cookies=cookies)
    print(response2.text)


login()
```
