Python is a powerful scripting language and is often used to write crawler programs. This article describes a simple multi-threaded Python crawler that downloads images through proxy servers.

I. Function description
1. Crawl proxy servers with multiple threads, then verify the proxy servers with multiple threads.
PS: the proxy servers are crawled from http://www.cnproxy.com/ (this test fetches only 8 pages).
2. Crawl the image URLs of a website, then download the images with multiple threads, each download going through a randomly chosen proxy server.
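Both tasks above follow the same pattern: split a list of work items across several threads and collect the results in a shared list or dict guarded by a lock. The sketch below illustrates that pattern in Python 3 (the listing that follows is Python 2, using urllib2 and xrange). The names `decode_port`, `check_proxies`, `fetch_via_proxy` and `download_all` are hypothetical and not taken from the original script. The "letters joined by `+`" port encoding is an assumption inferred from the `portdicts` table in the listing. The liveness check and the fetch function are passed in as parameters so the threading logic can be tried without network access.

```python
# Python 3 sketch. All function names here are hypothetical, not from the original script.
import random
import threading
import urllib.request


def decode_port(encoded, table):
    """Turn an obfuscated port such as 'r+q+r' into '808'.

    Assumption: the site writes each port digit as a letter and joins the
    letters with '+'; `table` maps letter -> digit (like portdicts).
    """
    return "".join(table[ch] for ch in encoded.split("+"))


def check_proxies(proxies, is_alive, num_threads=4):
    """Test (ip, port) pairs in parallel and return those is_alive() accepts."""
    checked = []
    lock = threading.Lock()

    def worker(chunk):
        for proxy in chunk:
            if is_alive(proxy):
                with lock:  # several threads append to the same list
                    checked.append(proxy)

    # Deal the proxies out round-robin, one slice per thread
    chunks = [proxies[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return checked  # order depends on thread scheduling


def fetch_via_proxy(url, proxy, timeout=10):
    """Download url through an HTTP proxy given as an (ip, port) tuple."""
    handler = urllib.request.ProxyHandler({"http": "http://%s:%s" % proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()


def download_all(urls, proxies, fetch=fetch_via_proxy, num_threads=4):
    """Fetch every URL from worker threads, each request via a random proxy."""
    results = {}
    lock = threading.Lock()

    def worker(chunk):
        for url in chunk:
            proxy = random.choice(proxies)  # pick a random proxy per download
            try:
                data = fetch(url, proxy)
            except OSError:  # dead proxy, timeout or HTTP error
                data = None
            with lock:
                results[url] = (proxy, data)

    chunks = [urls[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In practice, `is_alive` would try a small request through the proxy with a short timeout and return False when it raises `OSError`; the lock is what lets many threads write to one result container safely.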
II. Implementation code
The code is as follows:
#!/usr/bin/env python
# coding: utf-8
import urllib2
import re
import threading
import time
import random

rawProxyList = []
checkedProxyList = []
imgurl_list = []

# Crawl the proxy website
# Maps the letters the site uses in place of port digits back to digits
portdicts = {'v': "3", 'm': "4", 'a': "2", 'l': "9", 'q': "0", 'b': "5", 'i': "7", 'w': "6", 'r': "8", 'c': "1"}
targets = []
# Pages proxy1.html ... proxy8.html (the test uses only 8 pages)
for i in xrange(1, 9):
    target = r"http://www.cnproxy.com/proxy%d.html" % i
    targets.append(target)
# print targets

# Regular expression for capturing proxy servers
p = re.compile(r'''(.+?)