Preface
A lot of people will probably find this an odd requirement: a crawler just needs to fetch its data, so why parse JS? Got too much time on your hands?
Searching online turns up plenty of questions on this topic, but most of them come from people whose JS, HTML, or Ajax fundamentals are weak. With foundations that shaky, it makes more sense to learn the basics first and write crawlers later.
So you are bound to ask: "Friend, how did you end up with this requirement? Are you bad at this too?"
Not at all. As a front-end engineer with more than three years of experience, I would not be stumped by something like that. The problem I ran into today is clearly not so simple.
Problem
So what problem did I run into?
I wanted to crawl an interface today, but calling it requires a token that is stored in a cookie. The cookie value is generated by JS, that JS is fetched from another interface, and the JS code that comes back is dynamic. WTF! Developers, are you messing with me?
Passer-by: Come on, a blogger who claims to be experienced can't analyze the JS logic?
Right, I really can't. The JS code is obfuscated and encrypted; I stared at it until my eyes hurt and still could not tell what it does.
Fine, I'll just execute it and take the result, whatever it is that was written in there.
Ideas
Thinking it over, what needs to be done is actually simple:
Request interface A to get the dynamically generated, obfuscated JS code
Execute the JS code to get the generated cookie value
Request interface B, sending the JS-generated token
Get the results and have fun ...
The idea is clear enough; it feels like it could be done in seconds.
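The steps above can be sketched as a small Python function. This is only an outline under assumptions: the three callables (`get_script`, `run_script`, `get_data`) are hypothetical stand-ins for "request interface A", "execute the JS", and "request interface B", and the demo wires in fake versions so the flow can be followed without any network access.

```python
def fetch_with_js_token(get_script, run_script, get_data):
    """Chain the three steps: fetch JS, derive the token, call the real API.

    All three arguments are hypothetical callables supplied by the caller.
    """
    script = get_script()        # step 1: interface A returns obfuscated JS
    token = run_script(script)   # step 2: executing the JS yields the token
    return get_data(token)       # step 3: interface B is called with the token


# Fake stand-ins, for illustration only.
def fake_get_script():
    return "$.cookie('qtoken', 'abc', {path: '/'});"


def fake_run_script(script):
    # A real version would execute or decode the script instead.
    return "abc" if "qtoken" in script else None


def fake_get_data(token):
    return {"token_used": token, "status": "ok"}


result = fetch_with_js_token(fake_get_script, fake_run_script, fake_get_data)
print(result)
```

Keeping the three steps as separate callables means the JS-execution step can later be swapped (a JS engine, or a pure-Python reimplementation) without touching the requests.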
Problem
Execute JS from Python? Why not just use Node.js?
Because Python is the coolest language in the world, bar none!
I found PyV8, a magical module. The machine already has pip, so installing it should be all it takes, right?
pip install PyV8
No need to wonder: my machine runs Kali Linux with root privileges, so sudo is not required.
Then came the error:
pip install -U PyV8
Collecting PyV8
  Using cached PyV8-0.5.zip
Building wheels for collected packages: PyV8
  Running setup.py bdist_wheel for PyV8 ... error
  Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-qum4bx/pyv8/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d /tmp/tmpb0udlepip-wheel- --python-tag cp27:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-2.7
  copying pyv8.py -> build/lib.linux-x86_64-2.7
  running build_ext
  building '_pyv8' extension
  creating build/temp.linux-x86_64-2.7
  creating build/temp.linux-x86_64-2.7/src
  x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/build/python2.7-cft4xx/python2.7-2.7.12=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -DBOOST_PYTHON_STATIC_LIB -Ilib/python/inc -Ilib/boost/inc -Ilib/v8/inc -I/usr/include/python2.7 -c src/exception.cpp -o build/temp.linux-x86_64-2.7/src/exception.o
  cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
  In file included from src/exception.cpp:1:0:
  src/exception.h:6:16: fatal error: v8.h: No such file or directory
   #include <v8.h>
                  ^
  compilation terminated.
  error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
  ----------------------------------------
  Failed building wheel for PyV8
  Running setup.py clean for PyV8
Failed to build PyV8
Installing collected packages: PyV8
  Running setup.py install for PyV8 ... error
    Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-qum4bx/pyv8/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-7oawua-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-2.7
    copying pyv8.py -> build/lib.linux-x86_64-2.7
    running build_ext
    building '_pyv8' extension
    creating build/temp.linux-x86_64-2.7
    creating build/temp.linux-x86_64-2.7/src
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/build/python2.7-cft4xx/python2.7-2.7.12=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -DBOOST_PYTHON_STATIC_LIB -Ilib/python/inc -Ilib/boost/inc -Ilib/v8/inc -I/usr/include/python2.7 -c src/exception.cpp -o build/temp.linux-x86_64-2.7/src/exception.o
    cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
    In file included from src/exception.cpp:1:0:
    src/exception.h:6:16: fatal error: v8.h: No such file or directory
     #include <v8.h>
                    ^
    compilation terminated.
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
    ----------------------------------------
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-qum4bx/pyv8/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-7oawua-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-qum4bx/pyv8/
The build seems to fail because the v8.h header file is missing, but I could not tell what that really meant.
Solution
A web search turned up the fix: PyV8 depends on Boost, which the official documentation does not mention, so the Boost packages have to be installed first:
apt-get update && apt-get install libboost-all-dev
After that I tried installing PyV8 again and hit the same problem, so it seems a manual install is the only way.
Download http://www.php.cn/
Unzip it, pick the archive that matches your system environment, unzip that as well, and copy the extracted files to
/usr/lib/python2.7/dist-packages/
Then test whether it worked. In a terminal, run:
python
import PyV8
If there is no error, the install succeeded and the fun can begin. Here is the JS code I need to parse:
var l = [119, 98, 115, 33, 111, 109, 120, 105, 118, 62, 92, 50, 50, 54, 45, 50, 50, 51, 45, 50, 50, 55, 45, 50, 49, 58, 45, 50, 50, 49, 45, 50, 51, 51, 45, 50, 50, 52, 45, 50, 50, 51, 45, 50, 50, 54, 45, 50, 49, 55, 45, 50, 49, 58, 45, 50, 49, 50, 45, 50, 50, 54, 45, 50, 50, 58, 45, 50, 50, 49, 45, 50, 50, 51, 45, 50, 50, 58, 45, 50, 51, 51, 45, 50, 50, 50, 55, 45, 50, 50, 54, 45, 50, 50, 54, 94, 60, 119, 98, 115, 33, 121, 119, 99, 100, 108, 62, 92, 49, 45, 51, 50, 45, 53, 45, 55, 45, 50, 50, 45, 57, 45, 56, 45, 50, 51, 45, 51, 45, 51, 49, 45, 50, 52, 45, 50, 54, 45, 50, 49, 45, 50, 57, 45, 52, 45, 58, 45, 50, 53, 45, 50, 56, 45, 54, 45, 50, 55, 45, 50, 58, 45, 50, 94, 60, 119, 98, 115, 33, 118, 62, 35, 35, 60, 103, 112, 115, 33, 41, 119, 62, 49, 60, 119, 61, 121, 119, 99, 100, 108, 47, 109, 102, 111, 104, 117, 105, 60, 119, 44, 44, 42, 124, 118, 44, 62, 84, 117, 115, 106, 111, 104, 47, 103, 115, 112, 110, 68, 105, 98, 115, 68, 112, 101, 102, 41, 111, 109, 120, 105, 118, 92, 121, 119, 99, 100, 108, 92, 119, 94, 94, 42, 126, 60, 37, 47, 100, 112, 112, 108, 106, 102, 41, 40, 114, 117, 112, 108, 102, 111, K, D, 118, 124, 113, 98, 117,, K, K, D, J, 126, 60];
eval(function (p, a, c, k, e, d) {
    e = function (c) {
        return (c < a ? '' : e(parseInt(c / a))) + ((c = c % a) > 35 ? String.fromCharCode(c + 29) : c.toString(36))
    };
    if (!''.replace(/^/, String)) {
        while (c--) d[e(c)] = k[c] || e(c);
        k = [function (e) { return d[e] }];
        e = function () { return '\\w+' };
        c = 1
    };
    while (c--) if (k[c]) p = p.replace(new RegExp('\\b' + e(c) + '\\b', 'g'), k[c]);
    return p
}('6 3=\'\';7(2=0;2<4.5;2++){3+=8.a(4[2]-1)};9(3)', 11, 11, '||i|t|l|length|var|for|String|eval|fromCharCode'.split('|'), 0, {}))
This has already been tidied up; originally everything was on a single line, which was even harder to read.
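A script like this one ends by calling `$.cookie(...)`, which a bare V8 context does not provide. One way to run it under PyV8 anyway is to define a tiny stub for `$` that records the cookie value. The sketch below assumes PyV8 is importable and uses only `PyV8.JSContext` and its `eval` method; the helper name `run_and_capture_cookie` and the `captured` object are made up for illustration, and the function simply returns None when PyV8 is missing.

```python
import json


def run_and_capture_cookie(script, cookie_name="qtoken"):
    """Run `script` under PyV8 with a stub `$.cookie` and return the value set.

    Hypothetical helper: returns None when PyV8 cannot be imported.
    """
    try:
        import PyV8  # only available where the module was installed
    except ImportError:
        return None

    # Stub: remember every cookie the script tries to set.
    stub = "var captured = {}; var $ = {cookie: function (n, v) { captured[n] = v; }};"
    with PyV8.JSContext() as ctxt:
        ctxt.eval(stub)
        ctxt.eval(script)
        # json.dumps gives a safely quoted JS string literal for the key.
        return ctxt.eval("captured[%s]" % json.dumps(cookie_name))


token = run_and_capture_cookie("$.cookie('qtoken', 'abc', {path: '/'});")
print(token)
```

Because the stub lives in the global scope, it is also visible to code that the script runs through its own `eval`.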
Tricks
Getting here took plenty of twists and turns, but I also picked up quite a few tricks along the way, for example how to turn obfuscated JS back into the original code.
The Firebug plug-in makes this easy: open Firebug, go to the Script panel, and select the entry containing eval. In general, the last layer it resolves is the original code. After restoring the JS above, it turned into this:
var balwi=[115,116,115,122,112,115,110,106,122,110,122,112,101,119,115,106,113,101,116,116,119,106];var ljpry=[15,21,4,9,12,14,11,0,18,20,8,16,7,2,1,10,17,13,19,6,5,3];var j="";for(k=0;k<ljpry.length;k++){j+=String.fromCharCode(balwi[ljpry[k]])};$.cookie('qtoken',j,{path:'/'});
Tidy it up a little to get clear, well-formatted code:
var balwi = [115, 116, 115, 122, 112, 115, 110, 106, 122, 110, 122, 112, 101, 119, 115, 106, 113, 101, 116, 116, 119, 106];
var ljpry = [15, 21, 4, 9, 12, 14, 11, 0, 18, 20, 8, 16, 7, 2, 1, 10, 17, 13, 19, 6, 5, 3];
var j = "";
for (k = 0; k < ljpry.length; k++) {
    j += String.fromCharCode(balwi[ljpry[k]])
};
$.cookie('qtoken', j, {path: '/'});
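The restoration step can also be done without a browser. Reading the packed payload against its keyword list suggests that it loops over the array `l`, subtracts 1 from each entry, converts the result to a character, and passes the resulting string to eval. Assuming that reading is right, a few lines of Python are enough to decode the hidden source; `decode_shifted` is a made-up helper name, and the sample input is just the first ten entries of the array shown earlier.

```python
def decode_shifted(codes, shift=1):
    """Turn a list of character codes, each offset by `shift`, back into text."""
    return "".join(chr(c - shift) for c in codes)


# First ten entries of the obfuscated array from the packed script.
prefix = [119, 98, 115, 33, 111, 109, 120, 105, 118, 62]
print(decode_shifted(prefix))  # var nlwhu=
```

The decoded text starts with a variable declaration, which matches the shape of the restored code.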
With the original code in hand, the token generation algorithm is easy to read off and reimplement in Python, so this time there is no need to trouble the mighty PyV8.
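A minimal sketch of that Python reimplementation follows. It assumes the restored script always has the shape shown above: two numeric array literals, the first holding character codes and the second holding the order in which to read them. The helper name `token_from_script` and the regular expression are illustrative choices, not part of the original post.

```python
import re


def token_from_script(js):
    """Rebuild the token from restored JS containing two numeric arrays."""
    # Match only brackets that contain digits, commas and whitespace,
    # so index expressions such as balwi[ljpry[k]] are ignored.
    arrays = [
        [int(n) for n in body.split(",")]
        for body in re.findall(r"\[([\d,\s]+)\]", js)
    ]
    if len(arrays) < 2:
        raise ValueError("expected two numeric array literals")
    codes, order = arrays[0], arrays[1]
    # Same logic as the JS loop: j += String.fromCharCode(codes[order[k]])
    return "".join(chr(codes[i]) for i in order)


restored = (
    "var balwi=[115,116,115,122,112,115,110,106,122,110,122,112,"
    "101,119,115,106,113,101,116,116,119,106];"
    "var ljpry=[15,21,4,9,12,14,11,0,18,20,8,16,7,2,1,10,17,13,19,6,5,3];"
    'var j="";for(k=0;k<ljpry.length;k++)'
    "{j+=String.fromCharCode(balwi[ljpry[k]])};"
    "$.cookie('qtoken',j,{path:'/'});"
)
print(token_from_script(restored))  # jjpnespstwzqjstzewtnsz
```

Since the variable names change with every response, the sketch relies on the position of the two arrays rather than on their names.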