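Running JavaScript in a Python Crawler with PyV8
Preface
A lot of people will probably think this is a bizarre requirement. A crawler should just crawl its data, so why parse JS? Nothing better to do?
Search the web and you'll find plenty of questions about this, but most of the folks asking simply have weak JS basics, or weak HTML basics, or weak Ajax basics; weak all around, really. If your fundamentals are that shaky, why write a crawler instead of learning them first?
So you're bound to ask: "Excuse me, my friend, then why the hell do you have this need too? Are you a tech scrub yourself?"
Not at all. Your humble blogger is an engineer with more than three years of front-end experience, so this kind of thing wouldn't normally trip me up. The problem I hit today is clearly not that simple.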
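The problem
So what exactly did I run into?
Today I needed to crawl an API, but calling it requires a token: a token-like value stored in a cookie. The cookie value is generated by a piece of JS, that JS is fetched from yet another API, and the JS that comes back is dynamic. WTF!!! Dear developers, what on earth are you doing?
Passer-by: Hang on, the self-proclaimed veteran blogger can't work out the JS logic?
Right, I can't. The damn JS is obfuscated and packed; I stared at it until my eyes gave out and still couldn't tell what it does.
Forget it. I'll just execute it and take the result, whatever the code says.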
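The plan
Let's lay out the plan. What needs doing is actually simple:
- Request API A and get the dynamically generated, obfuscated JS
- Execute the JS and get the cookie value it produces
- Request API B, sending the token the JS generated
- Get the result and enjoy...
The plan is crystal clear; it felt like something I could knock out in seconds.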
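As a rough sketch of how the four steps above chain together, here is the flow in Python. Everything in it is an assumption made for illustration: `fetch_js`, `run_js` and `fetch_data` are hypothetical callables standing in for the real request to API A, the JS execution step, and the real request to API B. Only the cookie name `qtoken` comes from the script shown later in this post.

```python
def fetch_protected_data(fetch_js, run_js, fetch_data):
    """Chain the steps of the plan; all three callables are hypothetical.

    fetch_js()        -> obfuscated JavaScript source from API A
    run_js(source)    -> the cookie value the script would have set
    fetch_data(token) -> response from API B, requested with the token
    """
    js_source = fetch_js()        # step 1: download the dynamic script
    token = run_js(js_source)     # step 2: execute it and capture the cookie
    return fetch_data(token)      # steps 3-4: call API B with the token


if __name__ == "__main__":
    # Fake implementations so the wiring can be exercised offline.
    result = fetch_protected_data(
        fetch_js=lambda: "/* obfuscated js */",
        run_js=lambda source: "demo-token",
        fetch_data=lambda token: {"sent_cookie": {"qtoken": token}},
    )
    print(result)
```

Passing the three steps in as callables keeps the hard part, actually running the JS, swappable: it could be a JS engine embedded in Python or a pure Python reimplementation of the token algorithm.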
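The snag
Run JS inside Python? Interesting. Why not just use Node.js?
Because Python is the most badass language in the world, bar none!
I found a handy module called PyV8. The machine already had pip, so installing it should just work, right?
pip install pyv8
In case you're wondering: my machine runs Kali Linux as root, so no sudo is needed.
Then came the error: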
pip install -U PyV8
Collecting PyV8
  Using cached PyV8-0.5.zip
Building wheels for collected packages: PyV8
  Running setup.py bdist_wheel for PyV8 ... error
  Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-QUm4bX/PyV8/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d /tmp/tmpb0udlepip-wheel- --python-tag cp27:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-2.7
  copying PyV8.py -> build/lib.linux-x86_64-2.7
  running build_ext
  building '_PyV8' extension
  creating build/temp.linux-x86_64-2.7
  creating build/temp.linux-x86_64-2.7/src
  x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/build/python2.7-cFt4xx/python2.7-2.7.12=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -DBOOST_PYTHON_STATIC_LIB -Ilib/python/inc -Ilib/boost/inc -Ilib/v8/inc -I/usr/include/python2.7 -c src/Exception.cpp -o build/temp.linux-x86_64-2.7/src/Exception.o
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  In file included from src/Exception.cpp:1:0:
  src/Exception.h:6:16: fatal error: v8.h: No such file or directory
   #include <v8.h>
                  ^
  compilation terminated.
  error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
  ----------------------------------------
  Failed building wheel for PyV8
  Running setup.py clean for PyV8
Failed to build PyV8
Installing collected packages: PyV8
  Running setup.py install for PyV8 ... error
    Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-QUm4bX/PyV8/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-7OAwUa-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-2.7
    copying PyV8.py -> build/lib.linux-x86_64-2.7
    running build_ext
    building '_PyV8' extension
    creating build/temp.linux-x86_64-2.7
    creating build/temp.linux-x86_64-2.7/src
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/build/python2.7-cFt4xx/python2.7-2.7.12=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -DBOOST_PYTHON_STATIC_LIB -Ilib/python/inc -Ilib/boost/inc -Ilib/v8/inc -I/usr/include/python2.7 -c src/Exception.cpp -o build/temp.linux-x86_64-2.7/src/Exception.o
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    In file included from src/Exception.cpp:1:0:
    src/Exception.h:6:16: fatal error: v8.h: No such file or directory
     #include <v8.h>
                    ^
    compilation terminated.
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
    ----------------------------------------
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-QUm4bX/PyV8/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-7OAwUa-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-QUm4bX/PyV8/
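It looks like the build fails because the file v8.h is missing, though I couldn't tell what that really meant.
The fix
A web search turned up a solution: PyV8 depends on Boost, which the official docs don't mention, so that package has to be installed first.
apt-get update && apt-get install libboost-all-dev
After that I tried installing PyV8 again and hit exactly the same error, so it looked like I'd have to do it by hand.
Download the prebuilt binaries from https://github.com/emmetio/pyv8-binaries
Unpack the archive, pick the file that matches your system, unpack that too, and copy the resulting files into
/usr/lib/python2.7/dist-packages/
Then check whether it worked by running the following in a terminal:
python
import PyV8
If there is no error, it worked and the fun can begin. Here is the JS I needed to evaluate: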
var l = [119, 98, 115, 33, 111, 109, 120, 105, 118, 62, 92, 50, 50, 54, 45, 50, 50, 51, 45, 50, 50, 55, 45, 50, 49, 58, 45, 50, 50, 49, 45, 50, 51, 51, 45, 50, 50, 52, 45, 50, 50, 51, 45, 50, 50, 54, 45, 50, 49, 55, 45, 50, 49, 58, 45, 50, 49, 50, 45, 50, 50, 54, 45, 50, 50, 58, 45, 50, 50, 49, 45, 50, 50, 51, 45, 50, 50, 58, 45, 50, 51, 51, 45, 50, 50, 58, 45, 50, 50, 55, 45, 50, 50, 54, 45, 50, 50, 54, 94, 60, 119, 98, 115, 33, 121, 119, 99, 100, 108, 62, 92, 49, 45, 51, 50, 45, 53, 45, 55, 45, 50, 50, 45, 57, 45, 56, 45, 50, 51, 45, 51, 45, 51, 49, 45, 50, 52, 45, 50, 54, 45, 50, 49, 45, 50, 57, 45, 52, 45, 58, 45, 50, 53, 45, 50, 56, 45, 54, 45, 50, 55, 45, 50, 58, 45, 50, 94, 60, 119, 98, 115, 33, 118, 62, 35, 35, 60, 103, 112, 115, 33, 41, 119, 62, 49, 60, 119, 61, 121, 119, 99, 100, 108, 47, 109, 102, 111, 104, 117, 105, 60, 119, 44, 44, 42, 124, 118, 44, 62, 84, 117, 115, 106, 111, 104, 47, 103, 115, 112, 110, 68, 105, 98, 115, 68, 112, 101, 102, 41, 111, 109, 120, 105, 118, 92, 121, 119, 99, 100, 108, 92, 119, 94, 94, 42, 126, 60, 37, 47, 100, 112, 112, 108, 106, 102, 41, 40, 114, 117, 112, 108, 102, 111, 40, 45, 118, 45, 124, 113, 98, 117, 105, 59, 40, 48, 40, 126, 42, 60];
eval(function(p, a, c, k, e, d) {
    e = function(c) {
        return (c < a ? "" : e(parseInt(c / a))) + ((c = c % a) > 35 ? String.fromCharCode(c + 29) : c.toString(36))
    };
    if (!''.replace(/^/, String)) {
        while (c--) d[e(c)] = k[c] || e(c);
        k = [function(e) {
            return d[e]
        }];
        e = function() {
            return '\\w+'
        };
        c = 1
    };
    while (c--)
        if (k[c]) p = p.replace(new RegExp('\\b' + e(c) + '\\b', 'g'), k[c]);
    return p
}('6 3=\'\';7(2=0;2<4.5;2++){3+=8.a(4[2]-1)};9(3)', 11, 11, '||i|t|l|length|var|for|String|eval|fromCharCode'.split('|'), 0, {}))
This is the tidied version; what I originally got was a single line, which was a bit awkward to read.
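For the "just execute it" route, a minimal sketch of running such a script through PyV8 might look like the following. It rests on several assumptions: a bare V8 context has no jQuery, so a tiny `$` stub is injected to record whatever the script passes to `$.cookie`; the names `JQUERY_STUB`, `__cookies`, `build_harness` and `run_and_read_cookie` are invented for this example; and the only PyV8 calls used are `PyV8.JSContext()` as a context manager and its `eval()` method. It has not been run against the real endpoint.

```python
import json

# JavaScript stub: a fake `$` whose cookie() records what the script sets.
JQUERY_STUB = (
    "var __cookies = {};"
    "var $ = {cookie: function (name, value) { __cookies[name] = value; }};"
)


def build_harness(js_source):
    """Prepend the stub so the script's $.cookie(...) call has a target."""
    return JQUERY_STUB + "\n" + js_source


def run_and_read_cookie(js_source, cookie_name="qtoken"):
    """Evaluate the script in a fresh V8 context and return the captured cookie."""
    import PyV8  # imported lazily so the helper above works without PyV8

    with PyV8.JSContext() as ctxt:
        ctxt.eval(build_harness(js_source))
        # json.dumps yields a valid JavaScript string literal for the key.
        return ctxt.eval("__cookies[%s]" % json.dumps(cookie_name))
```

Calling `run_and_read_cookie(js_text)` with the downloaded script would then return the value the page would have stored in the `qtoken` cookie, assuming the script only relies on `$.cookie` and standard JavaScript built-ins.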
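Tricks
Getting here involved plenty of twists and turns, but I picked up a few tricks along the way, such as how to turn obfuscated JS back into its original source.
The Firebug extension makes this easy: open Firebug, go to the Script panel, and select the entries marked eval. The last one it resolves is usually the original source. After restoring, my JS above looked like this: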
var balwi=[115,116,115,122,112,115,110,106,122,110,122,112,101,119,115,106,113,101,116,116,119,106];var ljpry=[15,21,4,9,12,14,11,0,18,20,8,16,7,2,1,10,17,13,19,6,5,3];var j="";for (k=0;k<ljpry.length;k++){j+=String.fromCharCode(balwi[ljpry[k]])};$.cookie('qtoken',j,{path:'/'});
A little tidying gives a cleanly formatted version:
var balwi = [115, 116, 115, 122, 112, 115, 110, 106, 122, 110, 122, 112, 101, 119, 115, 106, 113, 101, 116, 116, 119, 106];
var ljpry = [15, 21, 4, 9, 12, 14, 11, 0, 18, 20, 8, 16, 7, 2, 1, 10, 17, 13, 19, 6, 5, 3];
var j = "";
for (k = 0; k < ljpry.length; k++) {
    j += String.fromCharCode(balwi[ljpry[k]])
};
$.cookie('qtoken', j, {
    path: '/'
});
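The outer layer can also be peeled off without a browser. Unpacking the `eval(function(p, a, c, k, e, d) {...})` wrapper by hand gives `var t='';for(i=0;i<l.length;i++){t+=String.fromCharCode(l[i]-1)};eval(t)`, so every number in the array `l` is simply a character code shifted up by one. The sketch below mirrors that loop in Python; the function name `decode_shifted` is made up for this example, and only the first eleven entries of `l` are used as a demonstration.

```python
def decode_shifted(codes, shift=1):
    """Mirror the unpacked loop: t += String.fromCharCode(l[i] - 1)."""
    return "".join(chr(code - shift) for code in codes)


# First eleven entries of the array `l` from the obfuscated script.
prefix = [119, 98, 115, 33, 111, 109, 120, 105, 118, 62, 92]
print(decode_shifted(prefix))  # prints: var nlwhu=[
```

Feeding the whole array through the same function should produce the inner script in full. The variable name in this sample differs from the restored listing, presumably because the script is regenerated on every request.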
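With the original source in hand, the token-generation algorithm is easy to read off and reimplement in Python, so this time the mighty PyV8 can stay on the bench.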
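A minimal sketch of that Python reimplementation, assuming the restored script always has the shape shown above: two `var name=[...]` numeric arrays, the first holding character codes and the second the order in which to read them. The helper names `extract_arrays` and `build_token` are invented for this example, and the regular expression is only as robust as that assumption.

```python
import re

# The restored script from this post, copied verbatim.
RESTORED_JS = (
    "var balwi=[115,116,115,122,112,115,110,106,122,110,122,112,"
    "101,119,115,106,113,101,116,116,119,106];"
    "var ljpry=[15,21,4,9,12,14,11,0,18,20,8,16,7,2,1,10,17,13,19,6,5,3];"
    'var j="";for (k=0;k<ljpry.length;k++)'
    "{j+=String.fromCharCode(balwi[ljpry[k]])};"
    "$.cookie('qtoken',j,{path:'/'});"
)


def extract_arrays(js_source):
    """Return every `var name=[...]` numeric array, in source order."""
    bodies = re.findall(r"var\s+\w+\s*=\s*\[([\d,\s]+)\]", js_source)
    return [[int(n) for n in body.split(",")] for body in bodies]


def build_token(js_source):
    """Assume array 1 holds char codes and array 2 the read order."""
    codes, order = extract_arrays(js_source)[:2]
    return "".join(chr(codes[i]) for i in order)


print(build_token(RESTORED_JS))
```

For this particular sample the result is `jjpnespstwzqjstzewtnsz`, which is the value the page would have stored in the `qtoken` cookie.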
Summary
That's all for this post. I hope it helps with your study or work; if you have questions, leave a comment and let's discuss.