Analyzing the login flow of the target website
Target URL: https://github.com/login
Analysis of the login method:
First, the credentials are submitted through an HTML form.
Second, the form carries a CSRF token (authenticity_token).
Third, the POST request that sends the user name and password must include the cookie returned by the first GET request.
Finally, after a successful login, requests for other pages only need to carry the cookie returned by that successful login.
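To make the form-plus-token structure concrete, here is a minimal offline sketch. It uses the standard-library html.parser rather than BeautifulSoup so it runs without extra packages, and SAMPLE_FORM is a hypothetical, simplified stand-in for the real login page, whose actual markup contains more fields.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden_fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        if attributes.get("type") == "hidden" and "name" in attributes:
            self.hidden_fields[attributes["name"]] = attributes.get("value", "")


# Hypothetical, simplified login form; not the real page source.
SAMPLE_FORM = """
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="example-token">
  <input type="text" name="login">
  <input type="password" name="password">
  <input type="submit" name="commit" value="Sign in">
</form>
"""

parser = HiddenInputParser()
parser.feed(SAMPLE_FORM)
print(parser.hidden_fields)
```

The hidden field is what the server checks on submission, which is why the token has to be read from a fresh GET of the login page before any POST is attempted.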
First: use a GET request to obtain the token and cookies we need
Code:
import requests
from bs4 import BeautifulSoup

r1 = requests.get('https://github.com/login')
soup = BeautifulSoup(r1.text, features='lxml')  # build the soup object
s1 = soup.find(name='input', attrs={'name': 'authenticity_token'}).get('value')  # the token we want
r1_cookies = r1.cookies.get_dict()  # cookies to send when the user name is submitted next
# print(r1_cookies)
# print(s1)
# Result:
# {'logged_in': 'no', '_gh_sess': 'Vdfwa2hjwjfmb1hpruflrdvhumc3mxg1tk02tdhsunhdmerungpyt2y4stlqz2xcv1lczefhk21wdfr1bkpgyuv0wejzcdeydwfzcm93
# Avc4nk91q2jicmtrv0niq0lrswm4afhrsvfybctcczbwdnhvn0yysvjjnufpqnhytznurkjwndjzuwxucek2m2jkm3vsmddxvhnoy1htqkthckjqzdjyuvr2rZbnuku3vnltrvf2u
# M1admu3c3yzsglyvnvzvm0ycna1euhet1jrvwnln0psbndkwjljmgttng5urwj1eu8rqjzxnemxvethcgvobdfby2gvc2zzwxcvwwzab29wqwjyu0l6cmzscwHbqulzyta3dtrtb
# 3l1s0hdyythy2v1suhewlzvvlzoswzptzbjnmlidff2dzi2bwgtltjon1lqbm5jwutsymtivem1cljpake9pq%3d%3d--897dbc36c123940c8eae5d86f276dead8318fd6c'}
# prz0wapebu5shksgcesn0fijwou9alw8epusxlqgcw1ezirl0vbskvktyqie8vhxhph2h/uzgav6xx+yjtgova==
With these two values in hand, we can send the login request.
Second: submit the user name and password with a POST request
Code:
This code continues from the GET request above; only the POST part is shown.
r2 = requests.post(
    'https://github.com/session',
    data={
        'commit': 'Sign in',
        'utf8': '✓',
        'authenticity_token': s1,
        'login': '[email protected]',
        'password': 'username password'  # fill in the correct user name and password
    },
    cookies=r1.cookies.get_dict(),  # the cookies from the first request are required here
)
print(r2.cookies.get_dict())  # these are the cookies returned after a successful login
After a successful login, the response returns the logged-in page.
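One way to check whether the POST actually logged us in is to inspect the returned cookies. The sketch below is only a heuristic: it assumes the logged_in cookie, which reads 'no' in the first response, changes to 'yes' once the login succeeds. The helper name looks_logged_in is hypothetical.

```python
def looks_logged_in(cookies):
    """Return True if the cookie dict suggests an authenticated session.

    Assumption: the 'logged_in' cookie is 'yes' after a successful login
    and 'no' (or absent) otherwise.
    """
    return cookies.get("logged_in", "").lower() == "yes"
```

It would be called as looks_logged_in(r2.cookies.get_dict()) right after the POST request; a False result usually means the credentials or the token were rejected.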
Third: view the personal details page, building on the successful POST login.
Only the cookie from the successful login is required here.
# Complete code
import requests
from bs4 import BeautifulSoup

r1 = requests.get('https://github.com/login')
soup = BeautifulSoup(r1.text, features='lxml')
s1 = soup.find(name='input', attrs={'name': 'authenticity_token'}).get('value')
r1_cookies = r1.cookies.get_dict()
print(r1_cookies)
print(s1)

r2 = requests.post(
    'https://github.com/session',
    data={
        'commit': 'Sign in',
        'utf8': '✓',
        'authenticity_token': s1,
        'login': '[email protected]',
        'password': 'password'  # fill in the correct password
    },
    cookies=r1.cookies.get_dict(),
)
print(r2.cookies.get_dict())

# View the personal details page
r3 = requests.get(
    'https://github.com/13131052183/product',  # personal details page
    cookies=r2.cookies.get_dict()
)
print(r3.text)
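The complete code passes cookies by hand from one request to the next. As an alternative sketch, a single session object can carry them automatically: requests.Session persists cookies across the requests made through it. The function names below are hypothetical, the regex lookup is a standard-library stand-in for the BeautifulSoup call used above, and the session is passed in as a parameter so the function can also be exercised with a stub.

```python
import re

LOGIN_URL = "https://github.com/login"
SESSION_URL = "https://github.com/session"


def find_token(html):
    """Return the authenticity_token value from the page, or None if absent."""
    for tag in re.findall(r"<input\b[^>]*>", html):
        if re.search(r'name="authenticity_token"', tag):
            match = re.search(r'value="([^"]*)"', tag)
            if match:
                return match.group(1)
    return None


def login_with_session(session, login, password):
    """Log in through one session object so cookies carry over between requests."""
    login_page = session.get(LOGIN_URL)
    token = find_token(login_page.text)
    if token is None:
        raise ValueError("authenticity_token not found in login page")
    return session.post(
        SESSION_URL,
        data={
            "commit": "Sign in",
            "utf8": "✓",
            "authenticity_token": token,
            "login": login,
            "password": password,
        },
    )


# Usage (requires the requests package and network access):
#   import requests
#   with requests.Session() as session:
#       response = login_with_session(session, "your-login", "your-password")
#       details = session.get("https://github.com/13131052183/product")
```

Because the session stores the cookies from both the GET and the POST, later page requests made through the same session need no explicit cookies argument.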
Python crawler script: log in to GitHub and view information