Related keywords: requests library, requests.post method, cookies, login
I. Purpose of the analysis
Use cookies to log in to Douban and post a diary entry:
https://www.douban.com/note/636142594/
II. Step analysis
1. Log in to Douban in a browser, then capture and analyze the cookies.
2. Use the cookies to simulate a login (logging in with account and password also works, but requires a captcha; the cookies are usually valid for a few days).
3. Analyze the browser's diary-posting behavior and simulate the POST request in Python.
4. Source code and testing.
III. Simulating login in the scrapy shell
1. Log in to Douban in a browser and capture the cookies in Fiddler
The cookie contains many items (not all are required); after testing, it turns out the login works as long as 'dbcl2' is included.
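The Cookie header captured in Fiddler is a single "name=value; name=value" string. A small helper (a sketch, using made-up sample values) can split it into the dict that requests and scrapy expect, so only the required dbcl2 item needs to be kept:

```python
def parse_cookie_header(header):
    """Split a raw 'Cookie:' header value into a dict usable by requests/scrapy."""
    cookies = {}
    for pair in header.split(';'):
        name, _, value = pair.strip().partition('=')
        if name:
            cookies[name] = value
    return cookies

# Sample header with placeholder values; only dbcl2 is actually needed for login
raw = 'bid=xxxx; dbcl2="164753551:kjyotngwwii"; ck=bsjh'
all_cookies = parse_cookie_header(raw)
cookies = {'dbcl2': all_cookies['dbcl2']}
print(cookies)
```

partition('=') splits only on the first '=', so quoted values that contain '=' characters survive intact.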
2. Open the scrapy shell and test the login
Simulate the browser's User-Agent and cookies:
$ scrapy shell
...
from scrapy import Request
cookies = {'dbcl2': '"164753551:kjyotngwwii"'}
headers = {'User-Agent': 'Mozilla/5.0'}
req = Request('https://www.douban.com/mine/', headers=headers, cookies=cookies)
fetch(req)
# Use the browser's "inspect element" to get the XPath (see parts (i) and (ii) of this series).
# The diary content is permission-restricted: if it is visible, the simulated login succeeded.
>>> response.xpath('//*[@id="note_636142594_short"]').extract()
['<div class="note" id="note_636142594_short">hello douban</div>']
>>> response.xpath('//*[@id="note_636142594_short"]/text()').extract()
['hello douban']
The diary content is returned, so the simulated login succeeded and the cookie works.
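The same check can be scripted with plain requests instead of the scrapy shell. A minimal sketch, where the substring test on the note's div id is my own shortcut (not the article's XPath), and the network call is commented out because it needs a valid dbcl2 cookie:

```python
NOTE_ID = '636142594'

def is_logged_in(page_html, note_id=NOTE_ID):
    # The diary content div is only present in the page when the cookie login worked
    return ('id="note_%s_short"' % note_id) in page_html

# Network call sketch (requires the requests library and a valid dbcl2 cookie):
# import requests
# headers = {'User-Agent': 'Mozilla/5.0'}
# cookies = {'dbcl2': '"164753551:kjyotngwwii"'}
# resp = requests.get('https://www.douban.com/note/%s/' % NOTE_ID,
#                     headers=headers, cookies=cookies)
# print(is_logged_in(resp.text))

print(is_logged_in('<div class="note" id="note_636142594_short">hello douban</div>'))  # True
print(is_logged_in('<html>please log in</html>'))  # False
```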
IV. Writing a Douban diary in Python
1. Write a diary entry in the browser and observe the behavior in Fiddler
The browser performs a POST https://www.douban.com/note/create HTTP/1.1 request.
The POST body is ck=bsjh&note_id=636142544&note_title=test_2&note_text=hello2&author_tags=&note_privacy=P
ck=bsjh is a value taken from the cookie.
note_id=636142544 (presumably the user id; copy it directly).
note_title=test_2&note_text=hello2 are the title and the content.
The other parameters are not important; the defaults are fine.
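To double-check how that captured body breaks down into fields, it can be decoded with the standard library (a sketch; note that parse_qs returns each field as a list):

```python
from urllib.parse import parse_qs

# The form body captured in Fiddler
body = 'ck=bsjh&note_id=636142544&note_title=test_2&note_text=hello2&author_tags=&note_privacy=P'

# keep_blank_values=True preserves the empty author_tags field
fields = parse_qs(body, keep_blank_values=True)
for name, values in fields.items():
    print(name, '=', values[0])
```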
2. Simulating the POST in Python
# parameters required by the post
requests.post(url=url, data=data, headers=headers, verify=False, cookies=cookies)
V. Source code and testing
Source:
import requests

### 1. First log in to any page to get the cookies

# Opening https with requests produces a warning; this line suppresses it
requests.packages.urllib3.disable_warnings()

headers = dict()
headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.3387.400 QQBrowser/9.6.11984.400'

cookies = {
    # 'll': '"118201"',
    # 'bid': 'Puwfxi53mha',
    # '_ga': 'GA1.2.1759080547.1501749204',
    # '__yadk_uid': 'Rjmlgzyjjuhi5lhnhjx3logbaltgb5xy',
    # 'gr_user_id': '16c2c492-9e32-4af2-9c35-230e8d43db06',
    # 'ps': 'y',
    # '_pk_ref.100001.8cb4': '%5b%22%22%2c%22%22%2c1504529257%2c%22https%3a%2f%2faccounts.douban.com%2flogin%3fredir%3dhttps%253a%252f%252fwww.baidu.com%252flink%253furl%253deh3ngsbwz6s0p2oqc7qhrezckdwjewbljfnbprtrwkv4qwolsccwkcsh9iqfedax%2526wd%253d%2526eqid%253d8191d1c1000627560000000359ad43f4%22%5d',
    # 'ap': '1',
    # '_vwo_uuid_v2': '57D26B154CE7E363177CFD5F35F06F34|E63FA1BFE4C07598B6454AE2A97166CB',
    'dbcl2': '"164753551:kjyotngwwii"',
    # 'ck': 'Osar',
    # '_pk_id.100001.8cb4': '70e88acbc88cb16d.1501749196.11.1504530290.1504527380.',
    # '_pk_ses.100001.8cb4': '*',
    # 'push_noty_num': '0',
    # 'push_doumail_num': '0',
    # '__utma': '30149280.1759080547.1501749204.1504529257.1504530054.20',
    # '__utmb': '30149280.5.10.1504530054',
    # '__utmc': '30149280',
    # '__utmz': '30149280.1504530054.20.16.utmcsr',
    # '__utmv': '30149280.16475'
}

data = {
    'ck': 'bsjh',
    'note_id': '636142544',
    'note_title': 'hellopython',
    'note_text': 'hellopython',
    # 'author_tags': '',
    # 'note_privacy': 'P'
}

url = 'https://www.douban.com/note/create'
# Note: add verify=False when accessing an https link, otherwise an error is returned
ret = requests.post(url=url,
                    data=data,
                    headers=headers,
                    verify=False,
                    cookies=cookies)
print(ret.text[:200])
print(ret.cookies.get_dict())
Test results
Done!
VI. Summary and analysis
1. Using cookies avoided the captcha hassle this time; next time I hope to study cracking the captcha.
2. The cookie's lifetime is limited; it gets replaced after a while.
3. requests is strict about HTTPS: verify=False must be added, and the warning message must be suppressed:
# Opening an https URL with requests produces a warning; add this line to suppress it
requests.packages.urllib3.disable_warnings()
Python crawler personal notes (iv): using Python to write a diary on Douban