Notes on the Heibanke crawler challenge (levels 4-5)
Level 4 adds the following two difficulties to level 3:
1. The webpage response time increases. (Multithreading is required to find the password quickly.)
2. The password is strong. It is 100 characters long, and the password list shows its characters at random positions. You need to capture the characters for different positions from the webpage and then combine them.
Problem solving process:
First attempt (failed): The password list had 13 pages, so I assumed I only needed to collect the value of every position I had not seen before into a list, then join the list into an int password, which would be the login password. After several attempts that all ended in a login failure, I found that the 13 pages never yield more than 70 characters of the password. At that point I had not realized the password was 100 characters long...
Second attempt (success): I saw on the discussion board that someone had succeeded, and learned from their experience that the password is 100 characters long, so I reconsidered how to pass the level. Only a small change to the first version is needed: check the length of the collected password, and try to log in only once it reaches 100 characters.
Success (630 seconds):
Level 4 code: http://www.cnblogs.com/hxs2660/p/5559611.html
Level 5: Verification Code Recognition
The login form has five fields: 1: username, 2: csrfmiddlewaretoken, 3: password, 4: captcha_0, 5: captcha_1
Items 1 and 2 are already familiar. password: the instructor did not provide a password hint, so I could only assume it is a number from 0 to 30, as in levels 2 and 3. captcha_0: a hidden value on the login page (it can be read from the page); the server uses it to look up the value of the verification code. captcha_1: the verification code itself.
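A minimal sketch of how the five fields might be assembled into the body of a login POST. The function names are hypothetical, the sample values are made up, and the 0 to 30 candidate range is an assumption based on the password note above; sending the request and reading the hidden values from the page are left out.

```python
def build_login_form(username, csrf_token, password, captcha_key, captcha_text):
    """Assemble the five form fields listed above into one dict (hypothetical helper)."""
    return {
        "username": username,
        "csrfmiddlewaretoken": csrf_token,  # hidden value read from the login page
        "password": str(password),
        "captcha_0": captcha_key,           # hidden value read from the login page
        "captcha_1": captcha_text,          # text recognized from the captcha image
    }


def candidate_passwords(low=0, high=30):
    """Passwords to try, assuming the 0 to 30 range (inclusive)."""
    return range(low, high + 1)


# Example with made-up values:
form = build_login_form("user", "token123", 7, "hashkey", "ABCD")
```

Because the csrfmiddlewaretoken and captcha_0 values both come from the login page, each login attempt would normally start by loading that page again.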
1.2 Verification code recognition refers to section 11.3 of the python network data collection book: Reading verification codes and training Tesseract. After testing, the Tesseract recognition rate is around 16% (because I don't have a training sample). We can verify the identified verification code to see if it can be used for logon, because the verification code is regular: the length is 4 and all are uppercase letters. Therefore, if you log on, you will have a 50% probability that the verification code is correct.
-- Tesseract: Tesseract is an OCR library, refer to: https://github.com/tesseract-ocr/tesseract/wiki
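The pre-check on the recognized text can be sketched as below. This is a minimal sketch: looks_like_captcha is a hypothetical name, the OCR step itself is not shown, and stripping surrounding whitespace is an assumption about what the OCR output looks like.

```python
import re

# From the notes: the verification code is 4 characters, all uppercase letters.
CAPTCHA_PATTERN = re.compile(r"[A-Z]{4}")


def looks_like_captcha(text):
    """Return True if the OCR output fits the known captcha pattern."""
    return CAPTCHA_PATTERN.fullmatch(text.strip()) is not None


# Example: only the first candidate would be worth a login attempt.
candidates = ["ABCD", "AB1D", "abcd", "ABCDE"]
usable = [c for c in candidates if looks_like_captcha(c)]
```

Skipping results that fail this check avoids spending a slow login request on text that cannot be the right code.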
1.3 The following is the program flowchart:
2 Success (Time: 640 seconds ...):
The level 5 code is a bit long, so I will not copy it here; if you are interested, see https://github.com/hxs2660/hbk_crawler/blob/master/ex05.py