Using PyV8 to execute JS code in a Python crawler

Source: Internet
Author: User
Tags: kali linux

V8 is the engine Chrome uses to execute JavaScript, and it is often described as the fastest JS engine. PyV8 wraps V8 so that it can be used from Python. This article describes how to use PyV8 to execute JS code in a Python crawler.

Preface

A lot of people may think this is a strange requirement. Isn't it enough for a crawler to crawl data? Why would it need to parse JavaScript? Too much free time?

There are quite a few questions about this on the Internet, but most of the people asking are stuck because their own fundamentals are weak: JS, HTML, ajax, or all of them. Why is it so hard to learn the crawler basics?

Then you are bound to ask, "My friend, how did you end up with this requirement? Are you a technical slacker too?"

Not at all. How could the blogger, an engineer with more than three years of front-end experience, be stumped by something like that? The problem I ran into today is clearly not that simple.

Problem

So what problem did the blogger run into?

The blogger wants to crawl an interface today, but calling that interface requires a token, which is stored in a cookie. The cookie value is generated by JS, that JS code is fetched from another interface, and the JS code is dynamic. WTF! Developer, are you kidding me?

Passerby A: Come on, can't the self-proclaimed experienced blogger just analyze the JS logic?

Right, I just won't. That JS code is obfuscated and encrypted, it hurts the eyes, and who knows what it actually says.

Forget it. I only need to execute it and get the result.

Ideas

Let's take a look. What we need to do is actually quite simple:

  1. Request interface A to obtain the dynamically generated obfuscated js code

  2. Run the js code to obtain the generated cookie value.

  3. Request interface B with the token generated by js

  4. Get the results and have fun...

The idea is quite clear, and it should take no time at all to implement.
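The four steps above can be sketched as a small function. This is only an outline under assumptions: `crawl_with_token`, `fetch_js`, `run_js`, and `fetch_data` are hypothetical names, and the real HTTP calls and JS engine are passed in as callables rather than shown.

```python
def crawl_with_token(fetch_js, run_js, fetch_data):
    """Chain the steps: fetch the script, run it, call interface B.

    All three arguments are hypothetical callables supplied by the caller:
      fetch_js()        -> str, the obfuscated JS returned by interface A
      run_js(source)    -> str, the cookie/token value the script produces
      fetch_data(token) -> the response of interface B
    """
    js_source = fetch_js()        # step 1: request interface A
    token = run_js(js_source)     # step 2: execute the JS to get the token
    return fetch_data(token)      # step 3: request interface B with the token


# Stand-in callables, used here only to show how the pieces plug together.
result = crawl_with_token(
    fetch_js=lambda: "1+1",
    run_js=lambda source: "tok:" + source,
    fetch_data=lambda token: {"token": token},
)
```

Keeping the JS engine behind a plain callable means step 2 can be PyV8, another engine, or a pure-Python reimplementation without touching the rest.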

Difficulties

Execute JS in Python? Why not just use Node.js?

Because Python is the best language in the world! No contest!

I found the magical module PyV8, and the machine already has pip. Surely running the install is all it takes?

pip install pyv8

Don't be surprised by the missing sudo: the VM runs Kali Linux as root, so sudo is not required.

Next, an error is reported.

pip install -U PyV8
Collecting PyV8
  Using cached PyV8-0.5.zip
Building wheels for collected packages: PyV8
  Running setup.py bdist_wheel for PyV8 ... error
  Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-QUm4bX/PyV8/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d /tmp/tmpb0udlepip-wheel- --python-tag cp27:
  running bdist_wheel
  running build
  running build_py
  creating build/lib.linux-x86_64-2.7
  copying PyV8.py -> build/lib.linux-x86_64-2.7
  running build_ext
  building '_PyV8' extension
  creating build/temp.linux-x86_64-2.7
  creating build/temp.linux-x86_64-2.7/src
  x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/build/python2.7-cFt4xx/python2.7-2.7.12=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -DBOOST_PYTHON_STATIC_LIB -Ilib/python/inc -Ilib/boost/inc -Ilib/v8/inc -I/usr/include/python2.7 -c src/Exception.cpp -o build/temp.linux-x86_64-2.7/src/Exception.o
  cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
  In file included from src/Exception.cpp:1:0:
  src/Exception.h:6:16: fatal error: v8.h: No such file or directory
   #include <v8.h>
                  ^
  compilation terminated.
  error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
  ----------------------------------------
  Failed building wheel for PyV8
  Running setup.py clean for PyV8
Failed to build PyV8
Installing collected packages: PyV8
  Running setup.py install for PyV8 ... error
    Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-QUm4bX/PyV8/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-7OAwUa-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build/lib.linux-x86_64-2.7
    copying PyV8.py -> build/lib.linux-x86_64-2.7
    running build_ext
    building '_PyV8' extension
    creating build/temp.linux-x86_64-2.7
    creating build/temp.linux-x86_64-2.7/src
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/build/python2.7-cFt4xx/python2.7-2.7.12=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -DBOOST_PYTHON_STATIC_LIB -Ilib/python/inc -Ilib/boost/inc -Ilib/v8/inc -I/usr/include/python2.7 -c src/Exception.cpp -o build/temp.linux-x86_64-2.7/src/Exception.o
    cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
    In file included from src/Exception.cpp:1:0:
    src/Exception.h:6:16: fatal error: v8.h: No such file or directory
     #include <v8.h>
                    ^
    compilation terminated.
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
    ----------------------------------------
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-QUm4bX/PyV8/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-7OAwUa-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-QUm4bX/PyV8/

It seems that the header file v8.h is missing, but it is not obvious what to do about that.

Solution

A search engine turned up a lead: it turns out that PyV8 depends on Boost. The official documentation does not mention this, so the Boost package has to be installed first.

apt-get update && apt-get install libboost-all-dev

After that installation completes, installing PyV8 again fails with the same error as above. It seems the only option is to install it manually.

Download #

Decompress the download, pick the file that matches your system environment, decompress that file as well, and copy the extracted files to

/usr/lib/python2.7/dist-packages/

Then test whether the install worked.

python
import PyV8
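Importing the module only shows that the files are in place. A slightly fuller smoke test, as a sketch: the helper name `run_js` is hypothetical, while `PyV8.JSContext` and its `eval` method are the usual PyV8 entry points. Keep in mind that a bare V8 context has no browser objects, so a script that calls something like `$.cookie` would need those stubbed before it could run.

```python
try:
    import PyV8
except ImportError:  # PyV8 not installed; keep the module importable
    PyV8 = None


def run_js(source):
    """Evaluate a JavaScript snippet and return the value of its last expression."""
    if PyV8 is None:
        raise RuntimeError("PyV8 is not installed")
    # JSContext works as a context manager: it is entered for the duration
    # of the block and left again afterwards.
    with PyV8.JSContext() as ctxt:
        return ctxt.eval(source)
```

Calling `run_js("1 + 2")` on a working install should evaluate the expression inside V8 and hand the result back to Python.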

If no error is reported, the install succeeded and the fun can begin. Below is the JS code that I need to parse.

var l = [119, 98, 115, 33, 111, 109, 120, 105, 118, 62, 92, 50, 50, 54, 45, 50, 50, 51, 45, 50, 50, 55, 45, 50, 49, 58, 45, 50, 50, 49, 45, 50, 51, 51, 45, 50, 50, 52, 45, 50, 50, 51, 45, 50, 50, 54, 45, 50, 49, 55, 45, 50, 49, 58, 45, 50, 49, 50, 45, 50, 50, 54, 45, 50, 50, 58, 45, 50, 50, 49, 45, 50, 50, 51, 45, 50, 50, 58, 45, 50, 51, 51, 45, 50, 50, 58, 45, 50, 50, 55, 45, 50, 50, 54, 45, 50, 50, 54, 94, 60, 119, 98, 115, 33, 121, 119, 99, 100, 108, 62, 92, 49, 45, 51, 50, 45, 53, 45, 55, 45, 50, 50, 45, 57, 45, 56, 45, 50, 51, 45, 51, 45, 51, 49, 45, 50, 52, 45, 50, 54, 45, 50, 49, 45, 50, 57, 45, 52, 45, 58, 45, 50, 53, 45, 50, 56, 45, 54, 45, 50, 55, 45, 50, 58, 45, 50, 94, 60, 119, 98, 115, 33, 118, 62, 35, 35, 60, 103, 112, 115, 33, 41, 119, 62, 49, 60, 119, 61, 121, 119, 99, 100, 108, 47, 109, 102, 111, 104, 117, 105, 60, 119, 44, 44, 42, 124, 118, 44, 62, 84, 117, 115, 106, 111, 104, 47, 103, 115, 112, 110, 68, 105, 98, 115, 68, 112, 101, 102, 41, 111, 109, 120, 105, 118, 92, 121, 119, 99, 100, 108, 92, 119, 94, 94, 42, 126, 60, 37, 47, 100, 112, 112, 108, 106, 102, 41, 40, 114, 117, 112, 108, 102, 111, 40, 45, 118, 45, 124, 113, 98, 117, 105, 59, 40, 48, 40, 126, 42, 60];eval(function(p, a, c, k, e, d) { e = function(c) { return (c < a ? "" : e(parseInt(c / a))) + ((c = c % a) > 35 ? String.fromCharCode(c + 29) : c.toString(36)) }; if (!''.replace(/^/, String)) { while (c--) d[e(c)] = k[c] || e(c); k = [function(e) { return d[e] }]; e = function() { return '\\w+' }; c = 1 }; while (c--) if (k[c]) p = p.replace(new RegExp('\\b' + e(c) + '\\b', 'g'), k[c]); return p}('6 3=\'\';7(2=0;2<4.5;2++){3+=8.a(4[2]-1)};9(3)', 11, 11, '||i|t|l|length|var|for|String|eval|fromCharCode'.split('|'), 0, {}))

I have already tidied it up. The original is actually just a single line, which is painful to read.
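Unpacking the `eval(function(p, a, c, k, e, d) {...})` wrapper above by hand gives a short loop: it subtracts 1 from every element of `l`, joins the resulting characters, and evals that string. A minimal Python sketch of just the decoding step follows. The name `decode_shifted` is my own, and only the first ten values of `l` are used to keep the example short.

```python
def decode_shifted(codes, shift=1):
    """Rebuild the hidden script: each character code was stored shifted up by `shift`."""
    return "".join(chr(code - shift) for code in codes)


# First ten elements of the array `l` from the script above.
l_prefix = [119, 98, 115, 33, 111, 109, 120, 105, 118, 62]
hidden_start = decode_shifted(l_prefix)
```

Running the same function over the full array would give the complete inner script as text, which can then be read or executed.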

Lessons learned

The whole process was full of twists and turns, but I picked up quite a few tricks along the way, for example how to restore obfuscated JavaScript code to the original code.

This is easy to do with the Firebug plug-in: open Firebug, go to the Script panel, and select the entries marked eval. The last entry is generally the original code. After I restored the JS above, it looked like this.

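var balwi = [115,116,115,122,112,115,110,106,122,110,122,112,101,119,115,106,113,101,116,116,119,106]; var ljpry = [15, 21, 4, 9, 12, 14, 11, 0, 18, 20, 8, 16, 7, 2, 1, 10, 17, 13, 19, 6, 5, 3]; var j = ""; for (k = 0; k < ljpry.length; k++) { j += String.fromCharCode(balwi[ljpry[k]]) }; $.cookie('qtoken', j, { path: '/' });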

Tidied into a clearer format:

var balwi = [115, 116, 115, 122, 112, 115, 110, 106, 122, 110, 122, 112, 101, 119, 115, 106, 113, 101, 116, 116, 119, 106];
var ljpry = [15, 21, 4, 9, 12, 14, 11, 0, 18, 20, 8, 16, 7, 2, 1, 10, 17, 13, 19, 6, 5, 3];
var j = "";
for (k = 0; k < ljpry.length; k++) {
    j += String.fromCharCode(balwi[ljpry[k]]);
}
$.cookie('qtoken', j, { path: '/' });

With the original code in hand, the token generation algorithm is easy to read and can be reimplemented directly in Python. This time there is no need to bother PyV8 after all.
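A minimal sketch of that Python reimplementation. It assumes the restored script always has the shape shown above, with the character-code array declared first and the index array second. The helper names `extract_arrays` and `build_token` are hypothetical, and since the script is dynamic the regular expression may need adjusting for other responses.

```python
import re

# The restored script from above, as it would arrive in a single line.
JS_SOURCE = (
    "var balwi = [115, 116, 115, 122, 112, 115, 110, 106, 122, 110, 122, "
    "112, 101, 119, 115, 106, 113, 101, 116, 116, 119, 106];"
    "var ljpry = [15, 21, 4, 9, 12, 14, 11, 0, 18, 20, 8, 16, 7, 2, 1, "
    "10, 17, 13, 19, 6, 5, 3];"
    'var j = "";for (k = 0; k < ljpry.length; k++) '
    "{ j += String.fromCharCode(balwi[ljpry[k]])};"
    "$.cookie('qtoken', j, { path: '/'});"
)


def extract_arrays(js_source):
    """Return every `var name = [n, n, ...]` array in the script as a list of ints."""
    bodies = re.findall(r"var\s+\w+\s*=\s*\[([\d,\s]+)\]", js_source)
    return [[int(n) for n in body.split(",") if n.strip()] for body in bodies]


def build_token(char_codes, order):
    """Mirror the JS loop: pick char_codes[order[k]] for each k and join the characters."""
    return "".join(chr(char_codes[index]) for index in order)


# Assumption: first array holds the character codes, second holds the order.
char_codes, order = extract_arrays(JS_SOURCE)[:2]
qtoken = build_token(char_codes, order)
```

The resulting `qtoken` value is what the script would have stored in the `qtoken` cookie, so it can be sent along with the request to the second interface.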
