Preface
Recently, while crawling data from Douban, I kept getting blocked. I had already tried User-Agent spoofing and modifying cookies, but neither had any effect: after a certain number of requests the server still returned 403. I thought about using proxy IPs, but free ones are too unstable, and paying for them seemed unnecessary. Today, while looking things up, I read an article about ADSL dial-up proxies. I happen to connect to the Internet by dial-up myself, and that gave me an idea for getting around Douban's anti-crawler measures: when the crawler detects that it has been blocked, disconnect the router's WAN connection so that the ISP assigns a new IP on redial, sleep for a while, and then continue crawling.
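The detect, disconnect, sleep, resume cycle can be sketched as a simple retry loop. This is a minimal sketch under stated assumptions: `fetch(url)` and `reconnect()` are hypothetical callables you supply (for example, `reconnect` could run the router script in this post), HTTP 403 is taken as the only sign of being blocked, and the router is assumed to redial automatically after the disconnect. The wait time and retry limit are arbitrary placeholders.

```python
import time


def crawl(urls, fetch, reconnect, wait_seconds=60, max_retries=3, sleep=time.sleep):
    """Fetch each URL; on HTTP 403, reconnect the router, wait, and retry.

    fetch(url) -> (status_code, body) and reconnect() are hypothetical
    callables supplied by the caller.
    """
    results = {}
    for url in urls:
        for attempt in range(max_retries + 1):
            status, body = fetch(url)
            if status != 403:
                # Not blocked: keep the body and move on to the next URL.
                results[url] = body
                break
            if attempt == max_retries:
                raise RuntimeError("still blocked after %d reconnects: %s" % (max_retries, url))
            # Blocked: drop the dial-up link (assumed to get a new IP on redial), then pause.
            reconnect()
            sleep(wait_seconds)
    return results
```

Passing `sleep` in as a parameter only keeps the loop testable; in real use leave it as `time.sleep`.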
PS: My router model is TL-WR842N.
First, the approach
1. Log in to the router's management system.
2. Use a packet-capture tool to find the function that performs the desired operation.
3. Call that function.
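Steps 1 and 3 are ordinary HTTP POSTs with JSON bodies; step 2 is done by hand in the browser. The sketch below assumes the endpoints and payload shape used by the TL-WR842N script in this post (login at `/`, operations at `/stok=<token>/ds`). The function names are hypothetical, it uses only the standard library instead of `requests`, and the `post` parameter exists purely so the functions can be exercised without a router.

```python
import json
import urllib.request

ROUTER = "http://192.168.0.1"  # assumption: the router's management address


def post_json(url, payload):
    """POST a JSON payload and decode the JSON response (stdlib only)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=UTF-8"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def login(encrypted_password, post=post_json, base=ROUTER):
    """Step 1: log in and return the session token ("stok")."""
    reply = post(base + "/", {"method": "do", "login": {"password": encrypted_password}})
    return reply["stok"]


def call(stok, payload, post=post_json, base=ROUTER):
    """Step 3: send a captured operation payload to the token-scoped /ds endpoint."""
    return post(base + "/stok=" + stok + "/ds", payload)
```

With `pw` being the encrypted password and `payload` an operation body captured in the browser, a whole operation is `call(login(pw), payload)`.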
Second, the code and specific steps
1. The code
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Disconnect the router's WAN (PPPoE) connection so the IP changes on redial.
import json

import requests

# Login payload: the encrypted password is what the browser sends when you
# log in (see step 2). Fill it in according to your own setup.
data = {"method": "do", "login": {"password": "your encrypted password"}}

# Request headers copied from the browser (see step 3). Content-Length is
# left out because requests computes it from the actual body.
headers = {
    'Host': '192.168.0.1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,ja;q=0.7,zh-TW;q=0.6',
    'Connection': 'keep-alive',
    'Content-Type': 'application/json; charset=UTF-8',
    'Origin': 'http://192.168.0.1',
    'Referer': 'http://192.168.0.1/',
    'X-Requested-With': 'XMLHttpRequest',
}

url = "http://192.168.0.1/"
# Log in; the JSON response contains the session token "stok".
html = requests.post(url, json=data, headers=headers)
print(html.headers)
stok = json.loads(html.text)["stok"]

# Later API calls go to /stok=<token>/ds.
full_url = "http://192.168.0.1/stok=" + stok + "/ds"
# Payload observed in the browser when clicking the Disconnect button.
disconnect = {"network": {"change_wan_status": {"proto": "pppoe", "operate": "disconnect"}}, "method": "do"}
disconn_route = requests.post(url=full_url, json=disconnect).json()
print(disconn_route)
2. Get the encrypted password used at login
Go to your router's management page (mine is http://192.168.0.1/) and open the browser's developer tools (I use Chrome, so just press F12), then enter the password to log in.
Open the Network tab and find the first request named 192.168.0.1. In the Request Payload section on the right you will find the encrypted password that was sent at login. Click "view source" and copy the content directly into the code.
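Rather than retyping the encrypted password, you can paste the whole Request Payload and pull the field out. A minimal sketch, assuming the payload has the `{"method": "do", "login": {"password": ...}}` shape used in the script; `extract_password` is a hypothetical helper and the payload string is a made-up placeholder, not a real encrypted password.

```python
import json


def extract_password(raw_payload):
    """Return login.password from a Request Payload copied via "view source"."""
    payload = json.loads(raw_payload)
    return payload["login"]["password"]


# Made-up placeholder; paste your own captured payload here.
raw = '{"method":"do","login":{"password":"PLACEHOLDER_ENCRYPTED"}}'
password = extract_password(raw)
print(password)
```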
3. Get the request headers
The request headers are in the same request; I copied them into the code as well.
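Headers copied from DevTools ("view source") arrive as `Name: value` lines. A minimal sketch of turning that text into a dict for `requests`; `parse_raw_headers` is a hypothetical helper. It skips the request line and drops `Content-Length`, because a copied value is only correct for the exact body it was captured with.

```python
def parse_raw_headers(raw, drop=("content-length",)):
    """Turn raw "Name: value" header text into a dict, skipping the request line."""
    headers = {}
    for line in raw.strip().splitlines():
        # Split on the first colon only, so values like "http://..." stay intact.
        name, sep, value = line.partition(":")
        name = name.strip()
        # Skip the request line ("POST / HTTP/1.1") and anything not shaped like a header.
        if not sep or not name or " " in name:
            continue
        if name.lower() in drop:
            continue
        headers[name] = value.strip()
    return headers
```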
4. Find the function that performs the operation
For example, to disconnect, go to Router Settings → Internet Settings, use the Elements panel to locate the HTML element of the Disconnect button, and then find the JavaScript file that contains its callback function.
Open that file and find the .action call: that is the function the Disconnect button invokes, and its argument is the payload the script needs to send.
5. Run the code and check the output
After filling in this information, run the code and check the result. If the call succeeds, it prints: {u'error_code': 0}
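When the disconnect runs inside an unattended crawler, it helps to fail loudly if the router refuses the call instead of silently continuing with the old IP. A minimal sketch that treats any `error_code` other than 0 (including a missing one) as failure; `check_reply` is a hypothetical name, and the meanings of specific non-zero codes are not covered here.

```python
def check_reply(reply):
    """Return the router's JSON reply, or raise if error_code is not 0."""
    code = reply.get("error_code")
    if code != 0:
        raise RuntimeError("router call failed, error_code=%r" % (code,))
    return reply
```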
This script controls a TL-WR842N router; it will not necessarily work with other models.
Third, references
1. Unable to restart the router with Python after using XX-Net #10283
2. Use Python to control your router