Using PyV8 to Execute JS Code in a Python Crawler

Source: Internet
Author: User
Tags: Kali Linux
Preface

Many people will probably find this an odd requirement: a crawler should just crawl data, so why would it need to parse JS? Have you got nothing better to do?

Searching the Internet turns up plenty of questions on this topic, but most of the people asking have weak fundamentals: weak JS, weak HTML, or weak Ajax. With a foundation that poor, why start with crawlers at all?

Then you are bound to ask: "So, my friend, how did you end up with this requirement? Are you a weak developer too?"

Not at all. As an engineer with more than three years of front-end experience, I would hardly be stumped by that kind of question. The problem I ran into today is clearly not that simple.

Problem

So what problem did I run into?

Today I wanted to crawl an interface, but calling it requires a token that is stored in a cookie. The cookie's value is generated by JS, that JS is fetched from another interface, and the JS code that comes back is itself dynamic. WTF! Developers, are you doing this on purpose?

Bystander: Come on, doesn't the self-proclaimed experienced blogger know how to analyze the JS logic?

Right, I really can't. The damn JS is obfuscated and encrypted; I stared at it until my eyes hurt and still had no idea what it was doing.

Fine, I'll just execute it and take the result, whatever it is they wrote.

Approach

Thinking it through, what needs to be done is actually simple.

    1. Request interface A to get the dynamically generated, obfuscated JS code

    2. Execute the JS code to get the generated cookie value

    3. Request interface B with the JS-generated token

    4. Get the results and play happily ...

The idea is clear enough, and it feels like it could be done in seconds.
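The four steps can be sketched as a single function. This is only an outline under stated assumptions: `fetch` and `run_js` are hypothetical helpers passed in by the caller, the URLs and the cookie name `token` are placeholders, and the stand-ins exist only so the sketch runs without a network or a JS engine.

```python
def fetch_with_token(fetch, run_js, url_a, url_b, cookie_name):
    """Run the four steps with injected helpers (both are assumptions).

    fetch(url, cookies) -> response body as text
    run_js(source)      -> dict of the cookies the script set
    """
    js_source = fetch(url_a, {})                 # step 1: get the obfuscated JS
    cookies = run_js(js_source)                  # step 2: execute it
    token = cookies[cookie_name]
    return fetch(url_b, {cookie_name: token})    # steps 3-4: call B with the token


# Stand-ins that replace the real HTTP client and JS engine.
def fake_fetch(url, cookies):
    if url == "https://example.invalid/a":
        return "/* obfuscated js */"
    return "data for token %s" % cookies.get("token")


def fake_run_js(source):
    return {"token": "abc123"}


result = fetch_with_token(
    fake_fetch, fake_run_js,
    "https://example.invalid/a", "https://example.invalid/b", "token",
)
print(result)
```

Passing the helpers in keeps the flow testable on its own; a real crawler would supply an HTTP client for `fetch` and a JS engine wrapper for `run_js`.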

Problem

Run JS inside Python? Why not just use Node.js?

Because Python is the most awesome language in the world, bar none!

I found a magical module called PyV8. The machine already has pip, so installing it should be all it takes, right?

pip install pyv8

In case you are wondering, my machine runs Kali Linux and I am logged in as root, so sudo is not needed.

Then came the error:

pip install -U pyv8
Collecting PyV8
  Using cached pyv8-0.5.zip
Building wheels for collected packages: pyv8
  Running setup.py bdist_wheel for PyV8 ... error
  Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-qum4bx/pyv8/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d /tmp/tmpb0udlepip-wheel- --python-tag cp27:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-2.7
  copying pyv8.py -> build/lib.linux-x86_64-2.7
  running build_ext
  building '_pyv8' extension
  creating build/temp.linux-x86_64-2.7
  creating build/temp.linux-x86_64-2.7/src
  x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/build/python2.7-cft4xx/python2.7-2.7.12=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -DBOOST_PYTHON_STATIC_LIB -Ilib/python/inc -Ilib/boost/inc -Ilib/v8/inc -I/usr/include/python2.7 -c src/exception.cpp -o build/temp.linux-x86_64-2.7/src/exception.o
  cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
  In file included from src/exception.cpp:1:0:
  src/exception.h:6:16: fatal error: v8.h: No such file or directory
   #include <v8.h>
                  ^
  compilation terminated.
  error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

  ----------------------------------------
  Failed building wheel for PyV8
  Running setup.py clean for pyv8
Failed to build pyv8
Installing collected packages: pyv8
  Running setup.py install for PyV8 ... error
    Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-qum4bx/pyv8/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-7oawua-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-2.7
    copying pyv8.py -> build/lib.linux-x86_64-2.7
    running build_ext
    building '_pyv8' extension
    creating build/temp.linux-x86_64-2.7
    creating build/temp.linux-x86_64-2.7/src
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/build/python2.7-cft4xx/python2.7-2.7.12=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -DBOOST_PYTHON_STATIC_LIB -Ilib/python/inc -Ilib/boost/inc -Ilib/v8/inc -I/usr/include/python2.7 -c src/exception.cpp -o build/temp.linux-x86_64-2.7/src/exception.o
    cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
    In file included from src/exception.cpp:1:0:
    src/exception.h:6:16: fatal error: v8.h: No such file or directory
     #include <v8.h>
                    ^
    compilation terminated.
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

    ----------------------------------------
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-qum4bx/pyv8/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-7oawua-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-qum4bx/pyv8/

The build seems to fail because the header file v8.h is missing, but I could not work out why.

Solution

A web search turned up a fix: PyV8 depends on Boost, which the official documentation does not mention, so the Boost packages have to be installed first:

apt-get update && apt-get install libboost-all-dev

With Boost installed I tried installing PyV8 again and hit exactly the same error, so it looked like a manual install was the only way.

Download http://www.php.cn/

Unzip the download and pick the archive that matches your system, unzip that as well, and copy the extracted files to

/usr/lib/python2.7/dist-packages/

Then test whether it worked by running the following in a terminal:

python
import PyV8

If no error is reported, the install succeeded and the fun can begin. Here is the JS code I need to parse:

var l = [119, 98, 115, 33, 111, 109, 120, 105, 118, 62, 92, 50, 50, 54, 45, 50, 50, 51, 45, 50, 50, 55, 45, 50, 49, 58, 45, 50, 50, 49, 45, 50, 51, 51, 45, 50, 50, 52, 45, 50, 50, 51, 45, 50, 50, 54, 45, 50, 49, 55, 45, 50, 49, 58, 45, 50, 49, 50, 45, 50, 50, 54, 45, 50, 50, 58, 45, 50, 50, 49, 45, 50, 50, 51, 45, 50, 50, 58, 45, 50, 51, 51, 45, 50, 50, 50, 55, 45, 50, 50, 54, 45, 50, 50, 54, 94, 60, 119, 98, 115, 33, 121, 119, 99, 100, 108, 62, 92, 49, 45, 51, 50, 45, 53, 45, 55, 45, 50, 50, 45, 57, 45, 56, 45, 50, 51, 45, 51, 45, 51, 49, 45, 50, 52, 45, 50, 54, 45, 50, 49, 45, 50, 57, 45, 52, 45, 58, 45, 50, 53, 45, 50, 56, 45, 54, 45, 50, 55, 45, 50, 58, 45, 50, 94, 60, 119, 98, 115, 33, 118, 62, 35, 35, 60, 103, 112, 115, 33, 41, 119, 62, 49, 60, 119, 61, 121, 119, 99, 100, 108, 47, 109, 102, 111, 104, 117, 105, 60, 119, 44, 44, 42, 124, 118, 44, 62, 84, 117, 115, 106, 111, 104, 47, 103, 115, 112, 110, 68, 105, 98, 115, 68, 112, 101, 102, 41, 111, 109, 120, 105, 118, 92, 121, 119, 99, 100, 108, 92, 119, 94, 94, 42, 126, 60, 37, 47, 100, 112, 112, 108, 106, 102, 41, 40, 114, 117, 112, 108, 102, 111, K, D, 118, 124, 113, 98, 117,, K, K, D, J, 126, 60];
eval(function(p, a, c, k, e, d) {e = function(c) {return (c < a ? "" : e(parseInt(c / a))) + ((c = c % a) > 35 ? String.fromCharCode(c + 29) : c.toString(36))}; if (!''.replace(/^/, String)) {while (c--) d[e(c)] = k[c] || e(c); k = [function(e) {return d[e]}]; e = function() {return '\\w+'}; c = 1}; while (c--) if (k[c]) p = p.replace(new RegExp('\\b' + e(c) + '\\b', 'g'), k[c]); return p}('6 3=\'\';7(2=0;2<4.5;2++){3+=8.a(4[2]-1)};9(3)', 11, 11, '||i|t|l|length|var|for|String|eval|fromCharCode'.split('|'), 0, {}))

This has already been tidied up; the original was a single line, which was even more awkward to read.
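To run a script like this with PyV8 and read back the cookie it sets, one option is to wrap it in a small harness. The sketch below rests on assumptions: `COOKIE_STUB`, `build_harness`, and `run_js` are hypothetical names, and the stub replaces jQuery's `$.cookie` with a function that only records the values. The PyV8 import is deferred into `run_js` so that the harness builder can be tried without PyV8 installed.

```python
import json

# JS prelude: a stand-in for jQuery's $.cookie that records what the script sets.
COOKIE_STUB = (
    "var __cookies = {};"
    "var $ = {cookie: function (name, value) { __cookies[name] = value; }};"
)


def build_harness(js_source):
    """Wrap the fetched script so the cookies it sets come back as JSON."""
    # The trailing expression becomes the value returned by eval().
    return COOKIE_STUB + "\n" + js_source + "\n;JSON.stringify(__cookies)"


def run_js(js_source):
    """Evaluate the wrapped script in PyV8 and return the cookies as a dict."""
    import PyV8  # imported lazily; requires a working PyV8 install
    with PyV8.JSContext() as ctxt:
        return json.loads(ctxt.eval(build_harness(js_source)))


# Inspect the wrapped script for a tiny sample input.
harness = build_harness("$.cookie('k', 'v');")
print(harness)
```

Returning every recorded cookie as JSON avoids hard-coding the cookie name in the Python side.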

Tricks

The whole process was full of twists and turns, but I also picked up quite a few tricks along the way, for example how to restore obfuscated JS to its original code.

The Firebug plug-in makes this easy: open Firebug, go to the Script panel, and pick the entries marked eval. In general, the last layer it resolves is the original code. After restoring the JS above, it looked like this:

var Balwi=[115,116,115,122,112,115,110,106,122,110,122,112,101,119,115,106,113,101,116,116,119,106];var ljpry=[15,21,4,9,12,14,11,0,18,20,8,16,7,2,1,10,17,13,19,6,5,3];var j="";for(k=0;k<ljpry.length;k++){j+=String.fromCharCode(Balwi[ljpry[k]])};$.cookie('Qtoken',j,{path:'/'});

Tidy it up a little to get clear, well-formatted code:

var Balwi = [115, 116, 115, 122, 112, 115, 110, 106, 122, 110, 122, 112, 101, 119, 115, 106, 113, 101, 116, 116, 119, 106];
var ljpry = [15, 21, 4, 9, 12, 14, 11, 0, 18, 20, 8, 16, 7, 2, 1, 10, 17, 13, 19, 6, 5, 3];
var j = "";
for (k = 0; k < ljpry.length; k++) {
    j += String.fromCharCode(Balwi[ljpry[k]])
};
$.cookie('Qtoken', j, {path: '/'});
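The outer eval layer can also be peeled off without a browser. Reading the packed wrapper shown earlier, it appears to subtract 1 from each number in the array, turn the results into characters, and eval the resulting string. The sketch below assumes that reading is correct; `shift_decode` is a hypothetical helper, and the sample is just the first eleven numbers of the packed listing.

```python
def shift_decode(codes, shift=1):
    """Turn a list of character codes into text, subtracting `shift` from each."""
    return "".join(chr(code - shift) for code in codes)


# The first eleven numbers of the packed listing shown earlier.
sample = [119, 98, 115, 33, 111, 109, 120, 105, 118, 62, 92]
print(shift_decode(sample))  # var nlwhu=[
```

The decoded prefix is the start of a script with the same shape as the restored code above, only with different variable names, which fits the observation that the JS is generated dynamically.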

With the original code in hand, the token generation algorithm is easy to see, so it can be reimplemented directly in Python, and this time there is no need to trouble the mighty PyV8.
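A minimal sketch of that Python reimplementation, assuming the restored script always has the shape shown above: a first array of character codes followed by a second array giving the order in which to read them. `token_from_script` is a hypothetical helper, and the regular expression simply picks up the first two numeric array literals.

```python
import re

# Matches array literals that contain only digits, commas, and whitespace.
ARRAY_RE = re.compile(r"\[([\d,\s]+)\]")


def token_from_script(js_source):
    """Rebuild the cookie value from the first two numeric arrays in the script."""
    arrays = [
        [int(n) for n in body.split(",") if n.strip()]
        for body in ARRAY_RE.findall(js_source)
    ]
    char_codes, order = arrays[0], arrays[1]
    # Same as the JS loop: j += String.fromCharCode(char_codes[order[k]])
    return "".join(chr(char_codes[i]) for i in order)


script = (
    "var Balwi=[115,116,115,122,112,115,110,106,122,110,122,112,"
    "101,119,115,106,113,101,116,116,119,106];"
    "var ljpry=[15,21,4,9,12,14,11,0,18,20,8,16,7,2,1,10,17,13,19,6,5,3];"
)
print(token_from_script(script))  # jjpnespstwzqjstzewtnsz
```

Because the variable names change between responses, the sketch relies on the order of the arrays rather than on their names.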
