Notes on the heibanke crawler exercises (Levels 1-3)

Source: Internet
Author: User


First of all, I would like to thank the instructor for creating this crawler series; there is a lot to learn from it.

 

Level 1: append the number shown on the web page to the end of the URL.

Solution:

1. Find the HTML tag that contains the number and match its content with a regular expression.

2. Extract the number and append it to the Level 1 URL to fetch the next page, which shows a new number.
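The two steps above can be sketched as a small loop. This is only a sketch under assumptions: the `<h3>` pattern is a guess at how the page marks up the number, and `fetch` is a hypothetical callable that returns the HTML for a URL, so the loop can be tried without network access.

```python
import re

# Level 1 base URL, taken from the script at the end of these notes.
BASE_URL = "http://www.heibanke.com/lesson/crawler_ex00/"

# Assumption: the number appears as a run of digits inside an <h3> tag.
NUMBER_IN_H3 = re.compile(r"<h3>[^<]*?(\d+)[^<]*</h3>")


def extract_number(html):
    """Step 1: return the digits found in the tag, or None if there are none."""
    match = NUMBER_IN_H3.search(html)
    return match.group(1) if match else None


def crawl(fetch, base_url=BASE_URL, max_hops=1000):
    """Step 2: keep appending the extracted number to the base URL.

    Returns the URL of the first page that shows no further number.
    """
    number = ""
    for _ in range(max_hops):
        html = fetch(base_url + number)
        found = extract_number(html)
        if found is None:
            return base_url + number
        number = found
    raise RuntimeError("no final page reached within max_hops requests")
```

With the `requests` library installed, `fetch` could be `lambda url: requests.get(url).text`.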

Problem solving process:

 

Level 2:

Solution: try every password from 0 to 30.
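A minimal sketch of that brute-force loop follows. The form field names (`username`, `password`) and the `is_success` check are assumptions, not taken from the site; `session` can be a `requests.Session()` or any object with a compatible `post` method.

```python
def find_password(try_password, candidates=range(0, 31)):
    """Return the first candidate accepted by try_password, or None."""
    for candidate in candidates:
        if try_password(candidate):
            return candidate
    return None


def make_form_attempt(session, url, username, is_success):
    """Build a try_password callable that submits the form once per call.

    Assumptions: the form fields are named 'username' and 'password', and
    is_success(text) is a hypothetical predicate that recognises the page
    returned for a correct password.
    """
    def attempt(password):
        response = session.post(url, data={"username": username,
                                           "password": password})
        return is_success(response.text)
    return attempt
```

Keeping the loop separate from the HTTP call means the same `find_password` can be reused when the request needs extra fields, as in Level 3.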

Level 3: two layers of protection are added on top of Level 2.

1. Accessing Level 3 without being logged in shows a login page. You must log in first (test account: username test, password test123).

2. The login form carries a CSRF token parameter.

 

Solution:

1. Send a GET request to fetch the login page and save the csrftoken returned by the server.

2. Log in with username test, password test123, and that csrftoken. After a successful login, save the new csrftoken returned by the server.

3. Submit username test with each password from 0 to 30, together with the csrftoken.
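The three steps can be sketched as follows. This is an illustration under assumptions: `session` is a `requests.Session()` (or any object with the same `get`, `post`, and `cookies` interface), the token is read from the session's cookie jar after each request, and `is_success` is a hypothetical predicate that recognises the page returned for the correct password. The field name `csrfmiddlewaretoken` matches the script at the end of these notes.

```python
USERNAME = "test"


def login(session, login_url, password="test123"):
    """Steps 1 and 2: fetch the login page, then log in with its CSRF token.

    Returns the csrftoken held by the session after the login request.
    """
    session.get(login_url)                  # step 1: server sets the csrftoken cookie
    token = session.cookies["csrftoken"]
    session.post(login_url, data={          # step 2: log in with the test account
        "username": USERNAME,
        "password": password,
        "csrfmiddlewaretoken": token,
    })
    return session.cookies["csrftoken"]     # token issued after login


def guess_password(session, level_url, token, is_success,
                   candidates=range(0, 31)):
    """Step 3: submit each candidate password together with the CSRF token."""
    for candidate in candidates:
        response = session.post(level_url, data={
            "username": USERNAME,
            "password": candidate,
            "csrfmiddlewaretoken": token,
        })
        if is_success(response.text):
            return candidate
    return None
```

Using a single session for all requests keeps the login cookie and the CSRF cookie together, so the guesses in step 3 are sent as the logged-in user.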

Problem solving process:

 

 

 

 

#!/usr/bin/python
# coding: utf-8
# by hxs
import re
import time
from threading import Thread

try:
    import requests
except ImportError:
    print("import requests error")
    exit(0)


def print_run_time(func):
    """Decorator that prints the run time of the wrapped method."""
    def wrapper(self, *args, **kw):
        local_time = time.time()
        # print(args, kw)
        func(self)
        print('run time is {:.2f}:'.format(time.time() - local_time))
    return wrapper


class hbk_crawler(object):
    """ """

    def __init__(self):
        pass
        # super(hbk_exxx, self).__init__()

    def login(self, level):
        """Login function. Input: level"""
        self.url = 'http://www.heibanke.com/lesson/crawler_ex' + level
        self.login_url = 'http://www.heibanke.com/accounts/login/?next=/lesson/crawler_ex' + level
        self.s = requests.session()
        print(u"Logging in for level {} ....".format(int(level) + 1))
        try:
            # GET the login page and keep the csrftoken cookie it sets
            self.csrftoken = self.s.get(self.login_url).cookies['csrftoken']
        except Exception:
            print(u"Network connection error. Please try again...")
            exit()
        self.payload = {'username': 'test', 'password': 'test123',
                        'csrfmiddlewaretoken': self.csrftoken}
        # Log in, then keep the new csrftoken returned by the server
        self.payload['csrfmiddlewaretoken'] = self.s.post(
            self.login_url, self.payload).cookies['csrftoken']
        print(u"Logged in successfully ....")
        return None

    @print_run_time
    def ex01(self, *args, **kw):
        """Level 1: find the password"""
        url = 'http://www.heibanke.com/lesson/crawler_ex00/'
        num = ''
        while True:
            content = requests.get(url + str(num)).text
            pattern = r'

 
