[Python crawler] Three methods for processing js files
Recently I was writing a small program to log in to my school's Wi-Fi and ran into a problem: the password submitted in the form is encrypted by a js file. After some Googling I learned the following three methods.
1. Translate the JavaScript into Python by hand.
2. Use selenium + phantomjs to simulate manual operation in a browser.
3. Use pyexecjs to execute the js file directly.
Each is explained below.
Form data of the Wi-Fi login page
DDDDD is the user name and upass is the password; the other parameters stay the same.
Analysis showed that the upass password is encrypted by a js file. Let's find it!
At first glance the js looked quite complicated, and I couldn't find a suitable approach. I tried the first method, translating the js into Python, but I did not understand the various MD5 encoding routines. This method is also one-to-one: a translation only solves the encryption used by this one page. So I gave up on it and looked for other methods. (PS: I still see many people using this method online.)
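To make the first method concrete, here is a minimal sketch of what a hand translation can look like. It assumes, purely for illustration, that the js function returns the hex MD5 digest of the password with an optional salt appended; the real js file may do something quite different. The names translated_encrypt and salt are hypothetical, and the sketch is written for Python 3.

```python
import hashlib


def translated_encrypt(passwd, salt=""):
    """Hypothetical Python port of a js routine returning md5(passwd + salt) as hex."""
    # hashlib works on bytes, so encode the text as UTF-8 first
    data = (passwd + salt).encode("utf-8")
    return hashlib.md5(data).hexdigest()
```

A port like this matches only one site's js, which is the one-to-one limitation described above.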
Try the second method, selenium + phantomjs
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

# On Windows, prefix the path with r (raw string)!
driver = webdriver.PhantomJS(executable_path=r'C:\Python27\phantomjs-2.1.1-windows\bin\phantomjs.exe')
# driver = webdriver.Chrome()  # Chrome, Firefox, and other browsers work too
driver.get("http://202.113.112.30/0.htm")
elem = driver.find_element_by_name("DDDDD")
elem.send_keys("xxxxxx")
elem = driver.find_element_by_name("upass")
elem.send_keys("xxxxxx")
driver.find_element_by_id("submit").click()
This method achieves the same effect as a manual login: it simulates typing the account and password and clicking the login button. It is slow, though, so I went on to explore the third method, pyexecjs.
jiaMiPasswd = execjs.compile(open(r"a41.js").read().decode("UTF-8")).call('bingo', passwd)
compile() receives the contents of the js file to execute, read here from a41.js. The first argument of call(), the name in single quotes, is the function in the js file to run, here function bingo(passwd){...}. The arguments after the first comma are passed to that function.
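As a small illustration of compile() and call(), the sketch below runs an inline js snippet instead of a41.js. The body of bingo here is only a stand-in that reverses the string, and call_js_function is a hypothetical helper name. It assumes PyExecJS and a JavaScript runtime such as Node.js are installed, and it is written for Python 3.

```python
# Stand-in js source; the real a41.js defines its own bingo(passwd)
JS_SOURCE = """
function bingo(passwd) {
    return passwd.split("").reverse().join("");
}
"""


def call_js_function(js_source, func_name, *args):
    """Compile js_source and call func_name with args, returning the js result."""
    # Imported here so this module still loads where PyExecJS is not installed
    import execjs

    context = execjs.compile(js_source)
    return context.call(func_name, *args)
```

With a runtime available, call_js_function(JS_SOURCE, "bingo", "abc") returns "cba". Swapping JS_SOURCE for the text of the real js file gives the same pattern as the one-liner above.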
Complete code
# coding: utf-8
import execjs
import urllib
import urllib2


class NetIn(object):
    def __init__(self):
        self.loginUrl = "http://202.113.112.30/0.htm"
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/100',
        }
        self.values = {
            'DDDDD': "",
            'upass': "",
            'r1': "0",
            'r2': "1",
            'para': "00",
            '0mkkey': "123456",
            'v6ip': "",
        }

    def jiaMiPasswd(self):
        print "enter your password"
        passwd = raw_input()
        # a41.js has been moved to the current directory
        jiaMiPasswd = execjs.compile(open(r"a41.js").read().decode("UTF-8")).call('bingo', passwd)
        return jiaMiPasswd


if __name__ == "__main__":
    netIn = NetIn()
    print "enter your account:"
    uname = raw_input()
    netIn.values['DDDDD'] = uname
    netIn.values['upass'] = netIn.jiaMiPasswd()
    postdata = urllib.urlencode(netIn.values)
    request = urllib2.Request(netIn.loginUrl, postdata, netIn.headers)
    response = urllib2.urlopen(request)
    # print the page information to check whether the login succeeded
    print response.read().decode('gbk')
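The script above targets Python 2 (urllib2, raw_input). As a rough sketch of the same request-building step on Python 3, the snippet below uses urllib.parse and urllib.request. build_login_request is a hypothetical helper name, the shortened User-Agent is a placeholder, and the constant form fields are left out for brevity. Nothing is sent until urlopen is called.

```python
import urllib.parse
import urllib.request

LOGIN_URL = "http://202.113.112.30/0.htm"  # login page URL from the article


def build_login_request(username, encrypted_passwd):
    """Build (but do not send) the POST request for the login form."""
    values = {
        "DDDDD": username,          # user name field
        "upass": encrypted_passwd,  # password already encrypted by the js function
    }
    # urlencode returns str; Request needs bytes for the POST body
    postdata = urllib.parse.urlencode(values).encode("ascii")
    headers = {"User-Agent": "Mozilla/5.0"}  # placeholder header value
    return urllib.request.Request(LOGIN_URL, data=postdata, headers=headers)
```

To actually log in, one would pass the result to urllib.request.urlopen and decode the response body as 'gbk', as the original script does.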
The third method is much faster than the second. Because it simply runs whatever js file a site uses, it also works for other websites that use different js files.
That's all.