Simulating JavaScript encryption and verification of web pages in Python
Anyone who writes web crawlers runs into this sooner or later. Some well-built enterprise-level websites encrypt user input for login or other operations before posting it to the server. The encryption is almost always done in JavaScript, so the crawler has to simulate it.
I prefer Python for crawling, because the urllib/urllib2 libraries are really convenient. As a result, I often need to reproduce a website's JavaScript encryption process in Python. There are two approaches:
The first is to rewrite the JavaScript code in Python. There is not much to say about it, except that it only works for small, simple JavaScript snippets. For example, the email page of the Renren website contains a hidden post field:
- <input type="hidden" name="biz" value=0 id="xn_biz"/>
The biz value in the page source is 0, but by the time the form is posted it has become a string similar to 94347683291223928133. A closer look at the page source shows that this verification string is generated by a piece of JavaScript code, which amounts to a small encryption check:
- <script>var mREOQQ='A`ZDu^`';var VKMHX='^&+*L/~';var uCHKAU=0;var rTIU;var wCJS='';
var yAYH=Math.floor(VKMHX.length/2);
while(uCHKAU<mREOQQ.length){rTIU=mREOQQ.charCodeAt(uCHKAU++);
var aYDG=VKMHX.charCodeAt(rTIU%VKMHX.length);aYDG=String.fromCharCode(aYDG);
if(aYDG=='L')aYDG='<<';
if(aYDG=='~')wCJS+=~rTIU*(-1);else{wCJS+=Math.floor(eval(rTIU+aYDG+yAYH));}}
var ab=941;ab+="_";ab+=wCJS;
document.getElementById("xn_biz").value=ab;</script>
The script first generates a random text string and assigns it to a randomly named variable, then generates a random string of operators and assigns it to another randomly named variable, and finally runs a series of operations on the two variables to produce a string similar to 94347683291223928133. Every time the page is refreshed, the strings and variable names in this code are different. However, a careful reading shows that the algorithm is always the same: once you have the text string and the operator string, you can generate the verification key.
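Since the variable names change on every refresh, the two strings have to be located by position rather than by name. The following is a minimal sketch of one way to do that with regular expressions. The function name `extract_params` is hypothetical, and the sketch assumes that the text string and the operator string are always the first two single-quoted `var` assignments in the script, that neither contains a single quote, and that the numeric prefix is always followed by `+="_"` as in the sample above. None of this is confirmed beyond that one sample.

```python
import re

# Assumption: string values are single-quoted and contain no single quote.
STRING_VAR = re.compile(r"var\s+\w+\s*=\s*'([^']*)'\s*;")
# Assumption: the numeric prefix is declared and then immediately extended
# with "_", e.g. var ab=941;ab+="_";
PREFIX_VAR = re.compile(r'var\s+(\w+)\s*=\s*(\d+)\s*;\s*\1\s*\+=\s*"_"')


def extract_params(script):
    """Return (text, operators, prefix) pulled from the inline script."""
    strings = STRING_VAR.findall(script)
    if len(strings) < 2:
        raise ValueError("expected at least two quoted var assignments")
    # The first literal is the random text, the second the operator string.
    text, operators = strings[0], strings[1]

    match = PREFIX_VAR.search(script)
    if match is None:
        raise ValueError("numeric prefix not found")
    return text, operators, int(match.group(2))
```

If the site changes the layout of the script, these patterns will need to be adjusted, so it is worth failing loudly (as the `ValueError` cases do) rather than posting a wrong key.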
Therefore, simply rewriting the code in Python solves the problem. In the Python version, the text string passed in is the mREOQQ value extracted from the JavaScript code above, operator represents VKMHX, xn represents the initial value of ab, and the resulting xn_biz is the verification key we need.
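As an illustration, the JavaScript loop above might be ported roughly as follows. This is a sketch reconstructed from the sample script, not the author's original code. The function name `make_xn_biz` and the `BINARY_OPS` table are hypothetical, and the sketch assumes the operator string only ever contains the characters seen in the sample (`^ & + * L / ~`) and that all character codes are small non-negative integers, so JavaScript's 32-bit operator semantics do not matter.

```python
# JavaScript operator character -> equivalent Python operation on two ints.
BINARY_OPS = {
    "^": lambda a, b: a ^ b,
    "&": lambda a, b: a & b,
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "L": lambda a, b: a << b,  # the script rewrites 'L' to '<<' before eval
    "/": lambda a, b: a // b,  # Math.floor(a / b) for non-negative ints
}


def make_xn_biz(text, operator, xn):
    """Rebuild the value the page script writes into the xn_biz field."""
    half = len(operator) // 2  # yAYH in the sample script
    pieces = []
    for ch in text:
        code = ord(ch)  # charCodeAt
        op = operator[code % len(operator)]
        if op == "~":
            # ~code * (-1) in JavaScript equals code + 1
            pieces.append(str(code + 1))
        else:
            pieces.append(str(BINARY_OPS[op](code, half)))
    return "%s_%s" % (xn, "".join(pieces))


print(make_xn_biz("A`ZDu^`", "^&+*L/~", 941))  # 941_683291223928232
```

A lookup table is used instead of `eval` so that an unexpected operator character raises a `KeyError` instead of executing arbitrary text taken from the page.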