Python: the second parameter of re.sub()
Background:
This question came up while parsing the website kissanime.io.
Its video page waits 5 seconds and then submits a verification form, a measure against DDoS attacks. The verification form is the following HTML:
<form id="challenge-form" action="/cdn-cgi/l/chk_jschl" method="get"> <input type="hidden" name="jschl_vc" value="d5f32a77955a830758982219a37f1124"/> <input type="hidden" name="pass" value="1500949414.776-fpKIjtEKZR"/> <input type="hidden" id="jschl-answer" name="jschl_answer"/> </form>
The jschl-answer value is computed by a piece of JavaScript. The script is randomly generated and looks like this:
var s,t,o,p,b,r,e,a,k,i,n,g,f, nyqPwxi={"KxtkYgr":+((!+[]+!![]+!![]+[])+(+!![]))}; t = document.createElement('div'); t.innerHTML="<a href='/'>x</a>"; t = t.firstChild.href;r = t.match(/https?:\/\//)[0]; t = t.substr(r.length); t = t.substr(0,t.length-1); a = document.getElementById('jschl-answer'); f = document.getElementById('challenge-form'); ;nyqPwxi.KxtkYgr*=+((!+[]+!![]+[])+(!+[]+!![]+!![]));nyqPwxi.KxtkYgr+=+((!+[]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]));nyqPwxi.KxtkYgr+=+((!+[]+!![]+!![]+[])+(+!![]));nyqPwxi.KxtkYgr*=+((+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]));nyqPwxi.KxtkYgr+=!+[]+!![]+!![];nyqPwxi.KxtkYgr+=+((!+[]+!![]+!![]+[])+(+!![]));a.value = parseInt(nyqPwxi.KxtkYgr, 10) + t.length; '; 121' f.submit();
It is unformatted and hard to read.
Still, it is clear that the jschl-answer value comes from a series of arithmetic operations. Run the JavaScript to get the value, and you can simulate the form submission and pass verification.
1. Obtain the JavaScript code segment.
This is mostly done with re. Following an open-source project, the Python code below extracts the JavaScript:
import re

# body holds the HTML of the challenge page
try:
    js = re.search(r"setTimeout\(function\(\){\s+(var "
                   r"s,t,o,p,b,r,e,a,k,i,n,g,f.+?\r?\n[\s\S]+?a\.value =.+?)\r?\n", body).group(1)
except Exception:
    raise ValueError("Unable to identify Cloudflare IUAM Javascript on website.")

js = re.sub(r"a\.value = (parseInt\(.+?\)).+", r"\1", js)
js = re.sub(r"\s{3,}[a-z](?: = |\.).+", "", js)
js = re.sub(r"[\n\\']", "", js)

if "parseInt" not in js:
    raise ValueError("Error parsing Cloudflare IUAM Javascript challenge.")

js = js.replace('parseInt', ';return parseInt')
js = 'function answer(){%s}' % js
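To see what the two substitutions with an empty repl do, here is a minimal sketch run on a shortened, hypothetical stand-in for the extracted script. The sample string and the variable name x are assumptions made for illustration, not content from the real page.

```python
import re

# Hypothetical, shortened stand-in for the extracted script (not real page content).
js = (
    "var s,t,o,p,b,r,e,a,k,i,n,g,f, x={\"k\":+((!+[]+!![]+[])+(+!![]))};\n"
    "        t = document.createElement('div');\n"
    "        a = document.getElementById('jschl-answer');\n"
    "        ;x.k+=!+[]+!![];parseInt(x.k, 10)"
)

# An empty repl deletes each match: here, indented statements that start
# with a one-letter variable (t = ..., a = ..., f. ...).
js = re.sub(r"\s{3,}[a-z](?: = |\.).+", "", js)

# Delete newlines, backslashes and single quotes.
js = re.sub(r"[\n\\']", "", js)

# Wrap what is left in a function that returns the parseInt(...) result.
wrapped = 'function answer(){%s}' % js.replace('parseInt', ';return parseInt')
print(wrapped)
```

Only the arithmetic on x.k and the final parseInt call survive, which is the part that has to be evaluated.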
One line in the middle:
js = re.sub(r"a\.value = (parseInt\(.+?\)).+", r"\1", js)
It stumped me! I was not very familiar with re.sub, and this usage threw me at first. I looked into it and learned a few things, which I have collected here.
2. re.sub(pattern, repl, string, count=0, flags=0)
This is its signature. Countless articles online cover it, but most skip the usage shown above for the second parameter, repl.
Inside repl, \number is replaced by the text that group number of pattern matched, with groups numbered from 1. So in the line above, r"\1" replaces the whole match with whatever the first group captured. \g<number> does the same thing; \number is just the more concise form.
For example:
import re

s = '2017-01-22'
s = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\2-\3-\1', s)
print(s)  # 01-22-2017
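The same mechanism explains the line from section 1. The sketch below applies it to a single statement copied from the challenge script shown earlier, and also shows a case where the longer \g<number> form is needed.

```python
import re

# Last statement of the challenge script shown earlier, used here as sample input.
line = "a.value = parseInt(nyqPwxi.KxtkYgr, 10) + t.length; '; 121'"

# Group 1 captures the parseInt(...) call; the trailing .+ consumes the rest
# of the line, so the whole statement is replaced by group 1 alone.
short = re.sub(r"a\.value = (parseInt\(.+?\)).+", r"\1", line)
print(short)  # parseInt(nyqPwxi.KxtkYgr, 10)

# \g<1> is the explicit spelling of the same reference.
same = re.sub(r"a\.value = (parseInt\(.+?\)).+", r"\g<1>", line)
print(same == short)  # True

# \g<1> is required when a digit follows the reference:
# r"\10" would be read as a reference to group 10.
padded = re.sub(r"(\d)", r"\g<1>0", "a1")
print(padded)  # a10
```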
r'\g<0>' refers to the entire string matched by pattern, but r'\0' does not. This is a difference I found while testing.
If \number appears in the pattern itself, it is a backreference that must match the same text as the earlier group, as follows:
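A small check of that difference, as a sketch. As far as I can tell from CPython's behavior, \0 in repl is read as an octal escape and yields a NUL character instead of the matched text.

```python
import re

s = "2017-01-22"

# \g<0> stands for the whole match, so each run of digits gets bracketed.
whole = re.sub(r"\d+", r"[\g<0>]", s)
print(whole)  # [2017]-[01]-[22]

# \0 is not a group reference: each run of digits becomes a NUL character.
nul = re.sub(r"\d+", r"\0", s)
print(repr(nul))
```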
import re

inputStr = "hello crifan, nihao crifan"
replacedStr = re.sub(r"hello (\w+), nihao \1", "crifanli", inputStr)
print("replacedStr= " + replacedStr)  # replacedStr= crifanli
The pattern matches the whole of inputStr, so the entire string is replaced by crifanli.
3. Other usage
Besides a string, repl can also be a function, which lets you do further processing on each match.
An example borrowed from elsewhere:
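For contrast, here is a short sketch of what happens when the second name differs, plus a case that uses \1 in both pattern and repl. The extra input strings are made up for illustration.

```python
import re

pattern = r"hello (\w+), nihao \1"

# Same name twice: \1 matches, and the whole string is replaced.
matched = re.sub(pattern, "crifanli", "hello crifan, nihao crifan")
print(matched)  # crifanli

# Different names: \1 cannot match "li", so nothing is replaced.
unmatched = re.sub(pattern, "crifanli", "hello crifan, nihao li")
print(unmatched)  # hello crifan, nihao li

# \1 in the pattern finds a repeated word; \1 in repl keeps one copy of it.
deduped = re.sub(r"\b(\w+) \1\b", r"\1", "the the cat")
print(deduped)  # the cat
```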
import re

def replace_digit(m):
    # Lookup table: the character at index i stands for digit i
    # (Chinese numerals are assumed here)
    ss = u'零一二三四五六七八九'
    index = int(m.group())
    return ss[index]

s = u'1990年3月27日'
result = re.sub(r'\d', replace_digit, s, count=4)
print(result)  # 一九九零年3月27日
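A function used as repl receives the match object, so it can also read individual groups. The sketch below is one possible illustration; the function name to_us_date and the sample text are hypothetical.

```python
import re

def to_us_date(m):
    # m is the match object; groups 1 to 3 are year, month and day.
    year, month, day = m.group(1), m.group(2), m.group(3)
    # int() drops the leading zeros from month and day.
    return "%d/%d/%s" % (int(month), int(day), year)

text = "released 2017-01-22, patched 2017-09-05"
converted = re.sub(r"(\d{4})-(\d{2})-(\d{2})", to_us_date, text)
print(converted)  # released 1/22/2017, patched 9/5/2017
```

The function is called once per match, and whatever string it returns is used as the replacement for that match.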
References:
http://www.jianshu.com/p/731efbd6029b
https://docs.python.org/2/library/re.html#re.sub