Sesame HTTP: JavaScript encryption logic analysis and Python simulation for data crawling

Source: Internet
Author: User

This article describes how to analyze JavaScript encryption logic and simulate it from Python in order to crawl data. Using China's online air quality monitoring and analysis platform as an example, it analyzes the site's encryption logic and how to work around it, then uses PyExecJS to execute the site's JavaScript and crawl its data.

Deobfuscation

Obfuscated JavaScript can be deobfuscated. The simplest approach is to use an online deobfuscation site, for example http://www.bm8.com.cn/jsconfusion/. Copy the obfuscated code from the eval on the second line of jquery-1.8.0.min.js and paste it into the site to get readable JavaScript. Searching the result turns up the getServerData() method, which sends an Ajax request to the interface analyzed earlier.

Here we find another key method, getParam(), which takes two arguments, method and object. Its return value, param, is sent to the interface as the POST data, so param is the encrypted POST data. Part of the encryption logic lives in getParam(), which is implemented as follows:

var getParam = (function () {
    function ObjectSort(obj) {
        var newObject = {};
        Object.keys(obj).sort().map(function (key) {
            newObject[key] = obj[key]
        });
        return newObject
    }
    return function (method, obj) {
        var appId = '1a45f75b824b2dc628d5955356b5ef18';
        var clienttype = 'WEB';
        var timestamp = new Date().getTime();
        var param = {
            appId: appId,
            method: method,
            timestamp: timestamp,
            clienttype: clienttype,
            object: obj,
            secret: hex_md5(appId + method + timestamp + clienttype + JSON.stringify(ObjectSort(obj)))
        };
        param = BASE64.encrypt(JSON.stringify(param));
        return AES.encrypt(param, aes_client_key, aes_client_iv)
    }
})();

We can see that Base64 and AES encryption are used here. The encrypted string is sent to the server as the POST data. The server decrypts it, processes the request, and encrypts the result before returning it. The JavaScript on the page then decrypts the response and renders the plaintext result.
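The secret field in getParam() is an MD5 digest over the request fields, with the object's keys sorted first. Purely as an illustration of how that signature is assembled, here is a minimal Python sketch. It assumes that hex_md5 behaves like a standard MD5 hex digest over UTF-8 text and that JSON.stringify output matches compact json.dumps output for flat objects with string values; neither assumption is confirmed by the site's code. The name build_secret is hypothetical.

```python
import hashlib
import json

# Constants copied from getParam() in the deobfuscated JavaScript.
APP_ID = '1a45f75b824b2dc628d5955356b5ef18'
CLIENT_TYPE = 'WEB'


def build_secret(method, obj, timestamp):
    """Hypothetical helper mirroring the `secret` field of getParam().

    Assumption: hex_md5 is plain MD5 over the UTF-8 bytes of the string.
    """
    # Equivalent of ObjectSort(): rebuild the object with sorted keys.
    sorted_obj = {key: obj[key] for key in sorted(obj)}
    # JSON.stringify emits no spaces after ',' and ':' and keeps non-ASCII text.
    payload = json.dumps(sorted_obj, separators=(',', ':'), ensure_ascii=False)
    raw = APP_ID + method + str(timestamp) + CLIENT_TYPE + payload
    return hashlib.md5(raw.encode('utf-8')).hexdigest()
```

In this sketch, int(time.time() * 1000) would play the role of new Date().getTime(). Because the keys are sorted before hashing, the order in which the object's fields were set does not affect the digest.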

We therefore also need to analyze how the data returned by the server is decrypted. The decodeData() method is easy to find; it is defined as follows:

function decodeData(data) {
    data = AES.decrypt(data, aes_server_key, aes_server_iv);
    data = DES.decrypt(data, des_key, des_iv);
    data = BASE64.decrypt(data);
    return data
}

So the response passes through three layers of decryption before the plaintext data can be parsed.

Now everything is clear. To use this interface we need to implement two processes: encrypting the POST data and decrypting the response data. The POST data is Base64-encoded and then AES-encrypted, while the response data is AES-decrypted, then DES-decrypted, then Base64-decoded. The keys for encryption and decryption can also be found in the JavaScript files, so these processes could be reimplemented in Python.

So what next? Do we press on and port all of this to Python?

Not so fast. Why go to the trouble of rewriting the JavaScript in Python? If the data formats turn out to be inconsistent, or differences between the two languages cause the computed results to diverge, where would we start debugging?

Instead, we can use the PyExecJS library to run the original JavaScript directly.

PyExecJS

PyExecJS is a library for running JavaScript from Python. You may have heard of PyV8, which serves the same purpose, but that project is no longer maintained, does not support Python 3 well, and is troublesome to install, so we use PyExecJS instead.

First, install the library with pip:

pip install PyExecJS

Before using this library, make sure that one of the following JS runtime environments is installed on your machine:

  • JScript
  • JavaScriptCore
  • Nashorn
  • Node
  • PhantomJS
  • PyV8
  • SlimerJS
  • SpiderMonkey

PyExecJS picks one of these engines, in priority order, to execute JavaScript. We recommend installing Node.js or PhantomJS.

Then run the following code to check which runtime is used:

import execjs
print(execjs.get().name)

With Node.js installed, Node.js is used as the engine, and the output is:

Node.js (V8)

Next, save the deobfuscated JavaScript as a file named encryption.js, then use PyExecJS to run the relevant methods.

First, the encryption process. The getServerData() method already implements it and also sends the Ajax request, but it reads from browser Storage, which is not available in Node.js. So we write our own getEncryptedData() method that only performs the encryption, and add it to encryption.js:

function getEncryptedData(method, city, type, startTime, endTime) {
    var param = {};
    param.city = city;
    param.type = type;
    param.startTime = startTime;
    param.endTime = endTime;
    return getParam(method, param);
}

Then we can call this method from Python:

import execjs

# Init environment
node = execjs.get()

# Params
method = 'GETCITYWEATHER'
city = 'beijing'
type = 'hour'
start_time = '2017-01-25 00:00:00'
end_time = '2017-01-25 23:00:00'

# Compile javascript
file = 'encryption.js'
ctx = node.compile(open(file).read())

# Get params
js = 'getEncryptedData("{0}", "{1}", "{2}", "{3}", "{4}")'.format(method, city, type, start_time, end_time)
params = ctx.eval(js)

Here we first define some parameters, such as method, city, and start_time, whose values are easy to work out by analyzing the JavaScript.

Next, we obtain a runtime environment through the get() method of execjs (that is, PyExecJS), then call compile() to load the saved encryption library, encryption.js. The file contains the encryption routines and the custom methods, which can be called only after it has been loaded.

Finally, we build a JavaScript call string from these parameters and execute it with eval(). The result, assigned to params, is the encrypted POST data.
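A call string assembled with format() breaks if an argument contains a double quote, a backslash, or a newline. One possible way to harden this step is to encode each argument as a JSON literal, which is also a valid JavaScript literal. The helper below is a sketch of that idea; the name js_call is hypothetical and not part of PyExecJS.

```python
import json


def js_call(function_name, *args):
    """Build a JavaScript call expression with safely escaped arguments.

    Hypothetical helper: each argument is serialized with json.dumps, so
    quotes, backslashes and newlines inside strings are escaped.
    """
    encoded_args = ', '.join(json.dumps(arg) for arg in args)
    return '{0}({1})'.format(function_name, encoded_args)


if __name__ == '__main__':
    print(js_call('getEncryptedData', 'GETCITYWEATHER', 'beijing', 'hour',
                  '2017-01-25 00:00:00', '2017-01-25 23:00:00'))
```

The resulting string can be passed to ctx.eval() exactly like the hand-built one. PyExecJS contexts also offer a call() method that takes the function name and its arguments separately, which avoids building the string at all.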

With params in hand, we can send the POST request directly with the requests library instead of jQuery's Ajax. The latter would also work, but it would require loading the jQuery library as well.

import requests

# Get encrypted response text
api = 'https://www.aqistudy.cn/apinew/aqistudyapi.php'
response = requests.post(api, data={'d': params})

The response body is the encrypted content returned by the server.

Next, we call the decodeData() method in the JavaScript to decrypt it:

# Decode data
js = 'decodeData("{0}")'.format(response.text)
decrypted_data = ctx.eval(js)

decrypted_data now holds the decrypted string, which is JSON. Parsed and printed as a Python dict, it looks like this:

{'success': True, 'errorcode': 0, 'errormsg': 'success', 'result': {'success': True, 'data': {'total': 22, 'rows': [
    {'time': '2017-01-25 00:00:00', 'temp': '-7', 'humi': '35', 'wse': '1', 'wd': 'northeast wind', 'tq': 'clear'},
    {'time': '2017-01-25 01:00:00', 'temp': '-9', 'humi': '38', 'wse': '1', 'wd': 'west wind', 'tq': 'clear'},
    {'time': '2017-01-25 02:00:00', 'temp': '-10', 'humi': '40', 'wse': '1', 'wd': 'northeast wind', 'tq': 'clear'},
    {'time': '2017-01-25 03:00:00', 'temp': '-8', 'humi': '27', 'wse': '2', 'wd': 'northeast wind', 'tq': 'clear'},
    {'time': '2017-01-25 04:00:00', 'temp': '-8', 'humi': '26', 'wse': '2', 'wd': 'east wind', 'tq': 'clear'},
    {'time': '2017-01-25 05:00:00', 'temp': '-8', 'humi': '23', 'wse': '2', 'wd': 'northeast wind', 'tq': 'clear'},
    {'time': '2017-01-25 06:00:00', 'temp': '-9', 'humi': '27', 'wse': '2', 'wd': 'northeast wind', 'tq': 'cloudy'},
    {'time': '2017-01-25 07:00:00', 'temp': '-9', 'humi': '24', 'wse': '2', 'wd': 'northeast wind', 'tq': 'cloudy'},
    {'time': '2017-01-25 08:00:00', 'temp': '-9', 'humi': '25', 'wse': '2', 'wd': 'east wind', 'tq': 'Clear to cloudy to multi-cloud clear'},
    {'time': '2017-01-25 09:00:00', 'temp': '-8', 'humi': '21', 'wse': '3', 'wd': 'northeast wind', 'tq': 'Clear to cloudy with clear'},
    {'time': '2017-01-25 10:00:00', 'temp': '-7', 'humi': '19', 'wse': '3', 'wd': 'northeast wind', 'tq': 'Clear to cloudy with clear'},
    {'time': '2017-01-25 11:00:00', 'temp': '-6', 'humi': '18', 'wse': '3', 'wd': 'northeast wind', 'tq': 'cloudy'},
    {'time': '2017-01-25 12:00:00', 'temp': '-6', 'humi': '17', 'wse': '3', 'wd': 'northeast wind', 'tq': 'cloudy'},
    {'time': '2017-01-25 13:00:00', 'temp': '-5', 'humi': '17', 'wse': '2', 'wd': 'northeast wind', 'tq': 'cloudy'},
    {'time': '2017-01-25 14:00:00', 'temp': '-5', 'humi': '16', 'wse': '2', 'wd': 'east wind', 'tq': 'cloudy'},
    {'time': '2017-01-25 15:00:00', 'temp': '-5', 'humi': '15', 'wse': '2', 'wd': 'north wind', 'tq': 'cloudy'},
    {'time': '2017-01-25 16:00:00', 'temp': '-5', 'humi': '16', 'wse': '2', 'wd': 'northeast wind', 'tq': 'cloudy'},
    {'time': '2017-01-25 17:00:00', 'temp': '-5', 'humi': '16', 'wse': '2', 'wd': 'east wind', 'tq': 'cloudy'},
    {'time': '2017-01-25 18:00:00', 'temp': '-6', 'humi': '18', 'wse': '2', 'wd': 'east wind', 'tq': 'sunny multi-cloud'},
    {'time': '2017-01-25 19:00:00', 'temp': '-7', 'humi': '19', 'wse': '2', 'wd': 'east wind', 'tq': 'sunny and cloudy'},
    {'time': '2017-01-25 20:00:00', 'temp': '-7', 'humi': '19', 'wse': '1', 'wd': 'east wind', 'tq': 'sunny multi-cloud'},
    {'time': '2017-01-25 21:00:00', 'temp': '-7', 'humi': '19', 'wse': '0', 'wd': 'south wind', 'tq': 'sunny multi-cloud'}
]}}}

Success!

In this way, we can get the temperature, humidity, wind force, weather conditions, and other information.

This data is not complete, however. To obtain data such as PM2.5 and AQI, change the method parameter to GETDETAIL.

Parsing and storing the data afterwards is routine and is not covered here.
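Although parsing is left out of the article, a minimal sketch of that step might look like the following. It assumes the decrypted string follows the result / data / rows layout shown in the output above; SAMPLE is a trimmed, hand-written stand-in for the real response, and extract_rows is a hypothetical name.

```python
import json

# Trimmed sample modeled on the decrypted output shown earlier (assumption:
# the real response uses the same result -> data -> rows nesting).
SAMPLE = json.dumps({
    'result': {'data': {'rows': [
        {'time': '2017-01-25 00:00:00', 'temp': '-7', 'humi': '35',
         'wse': '1', 'wd': 'northeast wind', 'tq': 'clear'},
        {'time': '2017-01-25 01:00:00', 'temp': '-9', 'humi': '38',
         'wse': '1', 'wd': 'west wind', 'tq': 'clear'},
    ]}}
})


def extract_rows(decrypted_text):
    """Parse the decrypted JSON string and convert numeric fields to int."""
    payload = json.loads(decrypted_text)
    rows = payload['result']['data']['rows']
    return [
        {
            'time': row['time'],
            'temp': int(row['temp']),   # temperature
            'humi': int(row['humi']),   # humidity
            'wse': int(row['wse']),     # wind force
            'wd': row['wd'],            # wind direction
            'tq': row['tq'],            # weather description
        }
        for row in rows
    ]


if __name__ == '__main__':
    for record in extract_rows(SAMPLE):
        print(record['time'], record['temp'], record['humi'])
```

In the article's flow, decrypted_data would be passed in place of SAMPLE, and the resulting list of dicts could then be written to whatever storage is convenient.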
