Using PyV8 to Execute JS Code in a Python Crawler

Source: Internet
Uploader: User


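Preface

Many people will probably think this is a bizarre requirement. A crawler should just crawl data, so why bother parsing JS? Too much time on your hands?

Search the web and you will find plenty of questions about this, but most of the people asking have weak fundamentals: weak JS, weak HTML, weak Ajax, weak everything. With basics that shaky, why write crawlers instead of learning the basics properly?

Then you are bound to ask: "My friend, why the hell do you have this need too? Are you a noob as well?"

Not at all. As a code monkey with more than three years of front-end experience, how could I be stumped by something like that? The problem I ran into today is clearly not that simple.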

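Problem

So what problem did I actually run into?

Today I needed to crawl an API endpoint, but calling it requires a token stored in a cookie. The cookie's value is generated by a piece of JS, that JS is itself fetched from another endpoint, and the JS it returns is dynamic. WTF!!! Developers, what on earth are you doing?

Passerby A: Wait, the self-proclaimed veteran can't analyze the JS logic?

Right, I can't. The damn JS is obfuscated; I stared at it until my eyes gave out and still had no idea what it was doing.

Forget it. I'll just execute it and take the result, whatever the hell it does.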

Approach

Let's sort out the approach. What needs to be done is actually simple:

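  1. Request endpoint A and get the dynamically generated, obfuscated JS code
  2. Execute the JS code and get the generated cookie value
  3. Request endpoint B, sending the JS-generated token
  4. Get the result and have fun...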

The approach is crystal clear; it feels like it could be done in seconds.
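The four steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's code: the real endpoint URLs are unknown, so fetching endpoint A, executing the JS and sending the final request are passed in as hypothetical callables, and the cookie name `qtoken` is taken from the de-obfuscated script shown later in this post. It is written for Python 3, unlike the Python 2.7 setup used in this post.

```python
import urllib.request
from typing import Callable


def build_request_b(url_b: str, token: str) -> urllib.request.Request:
    """Step 3: attach the JS-generated token as the `qtoken` cookie."""
    return urllib.request.Request(url_b, headers={"Cookie": f"qtoken={token}"})


def crawl(
    fetch_js: Callable[[], str],
    run_js: Callable[[str], str],
    send: Callable[[urllib.request.Request], bytes],
    url_b: str,
) -> bytes:
    """Run steps 1-4; the three callables are hypothetical stand-ins."""
    js_code = fetch_js()                     # Step 1: endpoint A returns obfuscated JS
    token = run_js(js_code)                  # Step 2: execute it to get the cookie value
    request = build_request_b(url_b, token)  # Step 3: build the request for endpoint B
    return send(request)                     # Step 4: the response body is the result


# Offline demo: every URL and return value here is a placeholder.
demo = crawl(
    fetch_js=lambda: "/* obfuscated js from endpoint A */",
    run_js=lambda js: "demo-token",
    send=lambda req: req.get_header("Cookie").encode(),
    url_b="https://example.com/api/b",
)
print(demo)  # b'qtoken=demo-token'
```

For a real run, `send` could be `lambda r: urllib.request.urlopen(r).read()`; `run_js` is the only piece that needs a JS engine such as PyV8.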

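The Hard Part

Executing JS in Python? Interesting. Why not just use Node.js?

Because Python is the greatest language in the world! Bar none!

I found PyV8, a magical module. The machine already has pip, so just install it and we're done, right?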

pip install pyv8

Don't be surprised: my machine runs Kali Linux as root, so no sudo is needed.

Then came the error:

pip install -U PyV8
Collecting PyV8
  Using cached PyV8-0.5.zip
Building wheels for collected packages: PyV8
  Running setup.py bdist_wheel for PyV8 ... error
  Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-QUm4bX/PyV8/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d /tmp/tmpb0udlepip-wheel- --python-tag cp27:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-2.7
  copying PyV8.py -> build/lib.linux-x86_64-2.7
  running build_ext
  building '_PyV8' extension
  creating build/temp.linux-x86_64-2.7
  creating build/temp.linux-x86_64-2.7/src
  x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/build/python2.7-cFt4xx/python2.7-2.7.12=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -DBOOST_PYTHON_STATIC_LIB -Ilib/python/inc -Ilib/boost/inc -Ilib/v8/inc -I/usr/include/python2.7 -c src/Exception.cpp -o build/temp.linux-x86_64-2.7/src/Exception.o
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  In file included from src/Exception.cpp:1:0:
  src/Exception.h:6:16: fatal error: v8.h: No such file or directory
   #include <v8.h>
                  ^
  compilation terminated.
  error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

  ----------------------------------------
  Failed building wheel for PyV8
  Running setup.py clean for PyV8
Failed to build PyV8
Installing collected packages: PyV8
  Running setup.py install for PyV8 ... error
    Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-QUm4bX/PyV8/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-7OAwUa-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-2.7
    copying PyV8.py -> build/lib.linux-x86_64-2.7
    running build_ext
    building '_PyV8' extension
    creating build/temp.linux-x86_64-2.7
    creating build/temp.linux-x86_64-2.7/src
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/build/python2.7-cFt4xx/python2.7-2.7.12=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -DBOOST_PYTHON_STATIC_LIB -Ilib/python/inc -Ilib/boost/inc -Ilib/v8/inc -I/usr/include/python2.7 -c src/Exception.cpp -o build/temp.linux-x86_64-2.7/src/Exception.o
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    In file included from src/Exception.cpp:1:0:
    src/Exception.h:6:16: fatal error: v8.h: No such file or directory
     #include <v8.h>
                    ^
    compilation terminated.
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

    ----------------------------------------
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-QUm4bX/PyV8/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-7OAwUa-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-QUm4bX/PyV8/

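It looks like the missing v8.h file (the C++ header of the V8 engine) is to blame, but I couldn't work out what to do about it.

Solution

A search engine turned up the fix: PyV8 depends on Boost, which the official documentation doesn't mention, so that package has to be installed first: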

apt-get update && apt-get install libboost-all-dev

After that finished, installing PyV8 again failed with exactly the same error, so the only option left was a manual install.

Download https://github.com/emmetio/pyv8-binaries

Unzip it, pick the archive that matches your system, unzip that one too, and copy the extracted files into

/usr/lib/python2.7/dist-packages/

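Then check whether it worked by running the following in a terminal:

python
>>> import PyV8

If there is no error, it worked and the fun can begin. Below is the JS code I needed to parse: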

var l = [119, 98, 115, 33, 111, 109, 120, 105, 118, 62, 92, 50, 50, 54, 45, 50, 50, 51, 45, 50, 50, 55, 45, 50, 49, 58, 45, 50, 50, 49, 45, 50, 51, 51, 45, 50, 50, 52, 45, 50, 50, 51, 45, 50, 50, 54, 45, 50, 49, 55, 45, 50, 49, 58, 45, 50, 49, 50, 45, 50, 50, 54, 45, 50, 50, 58, 45, 50, 50, 49, 45, 50, 50, 51, 45, 50, 50, 58, 45, 50, 51, 51, 45, 50, 50, 58, 45, 50, 50, 55, 45, 50, 50, 54, 45, 50, 50, 54, 94, 60, 119, 98, 115, 33, 121, 119, 99, 100, 108, 62, 92, 49, 45, 51, 50, 45, 53, 45, 55, 45, 50, 50, 45, 57, 45, 56, 45, 50, 51, 45, 51, 45, 51, 49, 45, 50, 52, 45, 50, 54, 45, 50, 49, 45, 50, 57, 45, 52, 45, 58, 45, 50, 53, 45, 50, 56, 45, 54, 45, 50, 55, 45, 50, 58, 45, 50, 94, 60, 119, 98, 115, 33, 118, 62, 35, 35, 60, 103, 112, 115, 33, 41, 119, 62, 49, 60, 119, 61, 121, 119, 99, 100, 108, 47, 109, 102, 111, 104, 117, 105, 60, 119, 44, 44, 42, 124, 118, 44, 62, 84, 117, 115, 106, 111, 104, 47, 103, 115, 112, 110, 68, 105, 98, 115, 68, 112, 101, 102, 41, 111, 109, 120, 105, 118, 92, 121, 119, 99, 100, 108, 92, 119, 94, 94, 42, 126, 60, 37, 47, 100, 112, 112, 108, 106, 102, 41, 40, 114, 117, 112, 108, 102, 111, 40, 45, 118, 45, 124, 113, 98, 117, 105, 59, 40, 48, 40, 126, 42, 60];
eval(function(p, a, c, k, e, d) {
    e = function(c) {
        return (c < a ? "" : e(parseInt(c / a))) + ((c = c % a) > 35 ? String.fromCharCode(c + 29) : c.toString(36))
    };
    if (!''.replace(/^/, String)) {
        while (c--) d[e(c)] = k[c] || e(c);
        k = [function(e) {
            return d[e]
        }];
        e = function() {
            return '\\w+'
        };
        c = 1
    };
    while (c--) if (k[c]) p = p.replace(new RegExp('\\b' + e(c) + '\\b', 'g'), k[c]);
    return p
}('6 3=\'\';7(2=0;2<4.5;2++){3+=8.a(4[2]-1)};9(3)', 11, 11, '||i|t|l|length|var|for|String|eval|fromCharCode'.split('|'), 0, {}))

This has already been tidied up; at first it was all on a single line, which was a bit awkward.
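The `eval(function(p, a, c, k, e, d) {...})` wrapper in the sample follows the well-known "p,a,c,k,e,d" format produced by Dean Edwards' Packer: it rebuilds a short script by replacing each base-`a` token in `p` with the matching word from `k`. As a hedged alternative to executing it, the Python sketch below reverses both layers of this particular sample by hand. It assumes the site keeps this exact structure (a packed loader that subtracts 1 from every element of `l`), which may not hold for every response; the function names are illustrative.

```python
import re

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def encode(c: int, radix: int) -> str:
    """Port of the packer's e(c): the token that stands for keyword index c."""
    prefix = "" if c < radix else encode(c // radix, radix)
    c %= radix
    return prefix + (chr(c + 29) if c > 35 else DIGITS[c])


def unpack(p: str, a: int, c: int, k: list) -> str:
    """Swap every whole-word token in p for its keyword; an empty keyword keeps the token."""
    table = {encode(i, a): (k[i] or encode(i, a)) for i in range(c)}
    return re.sub(r"\b\w+\b", lambda m: table.get(m.group(0), m.group(0)), p)


def shift_decode(codes: list, offset: int = 1) -> str:
    """Inner layer: String.fromCharCode(l[i] - 1) for each element of l."""
    return "".join(chr(n - offset) for n in codes)


packed = "6 3='';7(2=0;2<4.5;2++){3+=8.a(4[2]-1)};9(3)"
keywords = "||i|t|l|length|var|for|String|eval|fromCharCode".split("|")
print(unpack(packed, 11, 11, keywords))
# var t='';for(i=0;i<l.length;i++){t+=String.fromCharCode(l[i]-1)};eval(t)

# First 11 elements of l from the sample; pass the whole array to get the full script
print(shift_decode([119, 98, 115, 33, 111, 109, 120, 105, 118, 62, 92]))
# var nlwhu=[
```

Decoding the complete `l` array this way yields a script of the same shape as the restored code below (two number arrays, a `String.fromCharCode` loop and a `$.cookie('qtoken', ...)` call), only with different variable names such as `nlwhu` and `xvbck`.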

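Tricks Learned

The whole process was full of twists and turns, but I picked up quite a few tricks along the way, for example how to restore obfuscated JS to its original code.

The Firebug add-on handles this easily: open Firebug, go to the Script panel and select the entries that contain eval; the last one it resolves to is usually the original code. Restoring JS like the sample above gives something like the following (the script is dynamic, so variable names and numbers differ from one response to the next):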

var balwi=[115,116,115,122,112,115,110,106,122,110,122,112,101,119,115,106,113,101,116,116,119,106];var ljpry=[15,21,4,9,12,14,11,0,18,20,8,16,7,2,1,10,17,13,19,6,5,3];var j="";for (k=0;k<ljpry.length;k++){j+=String.fromCharCode(balwi[ljpry[k]])};$.cookie('qtoken',j,{path:'/'});

Tidied up a bit, the code becomes clearly readable:

var balwi = [115, 116, 115, 122, 112, 115, 110, 106, 122, 110, 122, 112, 101, 119, 115, 106, 113, 101, 116, 116, 119, 106];
var ljpry = [15, 21, 4, 9, 12, 14, 11, 0, 18, 20, 8, 16, 7, 2, 1, 10, 17, 13, 19, 6, 5, 3];
var j = "";
// Read char codes from balwi in the order given by ljpry
for (k = 0; k < ljpry.length; k++) {
    j += String.fromCharCode(balwi[ljpry[k]])
};
// Store the result as the qtoken cookie (jQuery Cookie plugin)
$.cookie('qtoken', j, {
    path: '/'
});

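With the original code in hand, the token-generation algorithm is easy to see and can be reproduced in Python, so this time there's no need to call on the mighty PyV8 at all.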
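A minimal sketch of that Python port follows. Because the script changes on every request, it pulls the arrays out of the de-obfuscated JS with a regular expression instead of hard-coding them. It assumes every response keeps the shape of the sample: the first `var name=[...]` array holds character codes and the second holds the order in which to read them. The regex and function names are illustrative, not anything defined by the target site.

```python
import re


def make_token(codes: list, order: list) -> str:
    """Python version of the JS loop: j += String.fromCharCode(balwi[ljpry[k]])."""
    return "".join(chr(codes[i]) for i in order)


def token_from_js(js: str) -> str:
    """Extract the first two numeric arrays from a de-obfuscated script and build the token.

    Assumption: array 1 is the char-code table and array 2 the read order, as in the sample.
    """
    arrays = re.findall(r"var\s+\w+\s*=\s*\[([\d,\s]+)\]", js)
    if len(arrays) < 2:
        raise ValueError("expected two numeric arrays in the script")
    codes = [int(x) for x in arrays[0].split(",")]
    order = [int(x) for x in arrays[1].split(",")]
    return make_token(codes, order)


SAMPLE = """var balwi=[115,116,115,122,112,115,110,106,122,110,122,112,101,119,115,106,113,101,116,116,119,106];var ljpry=[15,21,4,9,12,14,11,0,18,20,8,16,7,2,1,10,17,13,19,6,5,3];var j="";for (k=0;k<ljpry.length;k++){j+=String.fromCharCode(balwi[ljpry[k]])};$.cookie('qtoken',j,{path:'/'});"""
print(token_from_js(SAMPLE))  # jjpnespstwzqjstzewtnsz
```

The returned string is what the original script stores as the `qtoken` cookie with `path: '/'`, so it can be sent along with the request to endpoint B.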

Summary

That's all for this article. I hope it helps with your study or work; if you have any questions, feel free to leave a comment.
