Heibanke Crawler Challenge Notes (Levels 4-5)


Level 4 adds the following two challenges on top of Level 3:

1. The pages respond much more slowly, so multithreading is needed to collect the password quickly.

2. A strong password: the password is 100 digits long, and its digits are displayed at random, each with its position. You need to capture the digits shown at different positions on the pages and then combine them.
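The first point can be illustrated with a minimal sketch using the standard library's `concurrent.futures` thread pool, so the slow responses overlap instead of adding up. Everything about the site here is hypothetical: `fetch_page` simulates a slow page with `time.sleep`, and the 100-digit `SECRET`, the 8 pairs per page and the 1-based positions are made-up stand-ins for the real HTTP request and HTML parsing, which are not shown.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical 100-digit secret standing in for the real password.
SECRET = "".join(random.choice("0123456789") for _ in range(100))


def fetch_page(page: int) -> list[tuple[int, str]]:
    """Hypothetical stand-in for one password-list page: slow, and shows a few
    random (position, digit) pairs. Positions are 1-based by assumption."""
    time.sleep(0.2)  # simulate the slow server response
    positions = random.sample(range(1, 101), 8)
    return [(pos, SECRET[pos - 1]) for pos in positions]


def fetch_all_pages(pages: range, workers: int = 13) -> list[tuple[int, str]]:
    """Fetch every page concurrently and flatten the per-page pair lists."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [pair for page_pairs in pool.map(fetch_page, pages) for pair in page_pairs]
```

With 13 workers, `fetch_all_pages(range(1, 14))` takes roughly as long as one slow request instead of thirteen in a row.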

 

Problem solving process:

First attempt (failed): I found that the password list has 13 pages, so I thought I only needed to add the value at each position not seen before to a list, then convert the list into an integer, which would be the login password. After several failed logins, I noticed that the password collected from the 13 pages never exceeded 70 characters. At that point I had not realized that the password was 100 digits long...

Second attempt (success): I saw that someone on the discussion board had passed, so I learned from their experience and found out that the password was 100 characters long. From there the fix was simple: modify the first version to add a check on the password length, keep collecting, and only attempt to log in once the password reaches 100 digits.
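The length check can be sketched like this: merge (position, digit) pairs into a dict, and assemble the password only once all 100 positions are known. Because each page shows randomly chosen positions, a single pass over the 13 pages can leave gaps, which is consistent with the first attempt stalling below 100 characters. As in any sketch of this site, `fetch_round`, its page and per-page counts and the 1-based positions are hypothetical simulations, not the real pages.

```python
import random

TOTAL_DIGITS = 100

# Hypothetical secret; in the real challenge it is only visible piece by piece.
SECRET = "".join(random.choice("0123456789") for _ in range(TOTAL_DIGITS))


def fetch_round(pages: int = 13, per_page: int = 8) -> list[tuple[int, str]]:
    """Hypothetical simulation of one pass over all password-list pages."""
    pairs = []
    for _ in range(pages):
        for pos in random.sample(range(1, TOTAL_DIGITS + 1), per_page):
            pairs.append((pos, SECRET[pos - 1]))
    return pairs


def collect_password(max_rounds: int = 1000) -> str:
    """Keep fetching until every position is known, then join the digits in order."""
    known: dict[int, str] = {}
    for _ in range(max_rounds):
        for pos, digit in fetch_round():
            known.setdefault(pos, digit)  # ignore positions already seen
        if len(known) == TOTAL_DIGITS:  # the length check: only now is a login worthwhile
            return "".join(known[pos] for pos in range(1, TOTAL_DIGITS + 1))
    raise RuntimeError("password still incomplete after max_rounds")
```

Keeping the result as a string rather than converting it to an int also avoids silently dropping a leading zero.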

Success (630 seconds):

Code for Level 4: http://www.cnblogs.com/hxs2660/p/5559611.html

 

Level 5: CAPTCHA Recognition

1.1 The login form submits five fields: 1: username, 2: csrfmiddlewaretoken, 3: password, 4: captcha_0, 5: captcha_1

We already know items 1 and 2 from the earlier levels. Password: the instructor gave no hint this time, so, as in Level 2 or Level 3, it can only be found by trying each value from 0 to 30. captcha_0: a hidden value on the login page (which can be scraped); the server uses it to look up the correct captcha value. captcha_1: the captcha text itself.
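Since csrfmiddlewaretoken and captcha_0 are hidden inputs, each login attempt has to read them from the login page before posting. A minimal sketch using only the standard library's `html.parser`; `SAMPLE_HTML` and its values are made up, and the real page's markup may differ. The field names are the five listed in 1.1 (they match Django's CSRF field and the `captcha_0`/`captcha_1` pair used by the django-simple-captcha package, though the source does not say which package the site uses).

```python
from html.parser import HTMLParser

# Made-up login page markup for illustration only.
SAMPLE_HTML = """
<form method="post" action="">
  <input type="hidden" name="csrfmiddlewaretoken" value="abc123">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="hidden" name="captcha_0" value="f00dbeef">
  <input type="text" name="captcha_1">
</form>
"""


class HiddenInputParser(HTMLParser):
    """Collect name -> value for every <input type="hidden"> on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hidden: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if attr.get("type") == "hidden" and attr.get("name"):
            self.hidden[attr["name"]] = attr.get("value") or ""


def build_form(login_html: str, username: str, password: int, captcha_text: str) -> dict[str, str]:
    """Build the five-field POST body for one login attempt from the current login page."""
    parser = HiddenInputParser()
    parser.feed(login_html)
    return {
        "username": username,
        "csrfmiddlewaretoken": parser.hidden["csrfmiddlewaretoken"],
        "password": str(password),
        "captcha_0": parser.hidden["captcha_0"],  # key identifying the captcha on the server
        "captcha_1": captcha_text,  # the 4 letters read from the captcha image
    }
```

A caller would loop `for password in range(31)`, and, assuming each submission is answered with a new captcha, reload the login page and call `build_form` again for every attempt rather than reusing one page.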

1.2 For captcha recognition I followed section 11.3 of the book Web Scraping with Python, "Reading CAPTCHAs and Training Tesseract". In my tests, Tesseract's recognition rate was only about 16% (because I had no training samples). However, the captcha follows a fixed pattern: it is always 4 characters long and consists only of uppercase letters. So we can check each recognized result against that pattern and only use the ones that pass to log in; of the captchas submitted this way, about 50% are correct.
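The pattern check is cheap to apply before spending a login attempt. A minimal sketch, assuming `raw` is the string returned by an OCR call on the captcha image (for example pytesseract's `image_to_string`); only results of exactly four uppercase letters are kept.

```python
import re
from typing import Optional

# The captcha is always 4 uppercase letters, so any OCR result that does not
# fit this pattern is certainly wrong and not worth a login attempt.
CAPTCHA_PATTERN = re.compile(r"[A-Z]{4}")


def plausible_captcha(raw: str) -> Optional[str]:
    """Return the cleaned OCR text if it could be a valid captcha, else None."""
    text = "".join(raw.split())  # drop spaces and trailing newlines from the OCR output
    return text if CAPTCHA_PATTERN.fullmatch(text) else None
```

A rejected result costs nothing: the program can simply fetch a new captcha instead of submitting a guess that is certain to fail.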

-- Tesseract is an open-source OCR engine; see: https://github.com/tesseract-ocr/tesseract/wiki

1.3 The following is the program flowchart:

2 Success (640 seconds...):

 

The code for Level 5 is a bit long, so I won't paste it here; if you are interested, see https://github.com/hxs2660/hbk_crawler/blob/master/ex05.py
