The script uses the urllib2 module and the regular expression (re) module, with gevent running the downloads concurrently. Here is the code:
[Code]
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 2 script: download images through the urllib/urllib2 modules
# Patch the standard library first so urllib2's sockets cooperate with gevent
from gevent import monkey
monkey.patch_all()

import re    # regular expressions for pulling image links out of the HTML
import time  # timestamps for unique file names
import urllib
import urllib2
import gevent

def geturllist(url):
    """Return the .jpg links found on one list page."""
    url_list = []
    print url
    s = urllib2.urlopen(url)
    text = s.read()
    # Narrow the search to the <ol>...</ol> block that holds the posts
    html = re.search(r'<ol.*</ol>', text, re.S)
    if html is None:
        return url_list
    # Assumed page markup: <p><img src="...jpg" /></p>
    # Capture each link up to "jpg", then append the extension back
    urls = re.finditer(r'<p><img src="(.+?)jpg" /></p>', html.group(), re.I)
    for i in urls:
        img_url = i.group(1).strip() + "jpg"
        url_list.append(img_url)
    return url_list

def download(down_url):
    # File name: whole-second timestamp + "_" + last path segment of the URL
    name = str(int(time.time())) + "_" + re.sub(r'.+?/', '', down_url)
    print name
    urllib.urlretrieve(down_url, "D:\\TEMP\\" + name)

def getpageurl():
    page_list = []
    # Build the URLs of list pages 1..699
    for page in range(1, 700):
        url = "http://jandan.net/ooxx/page-" + str(page) + "#comments"
        # Add the generated url to page_list
        page_list.append(url)
    print page_list
    return page_list

if __name__ == '__main__':
    jobs = []
    # Every list page except the last one
    pageurl = getpageurl()[:-1]
    # Spawn one greenlet per image, then wait for all of them
    for i in pageurl:
        for downurl in geturllist(i):
            jobs.append(gevent.spawn(download, downurl))
    gevent.joinall(jobs)
[/Code]
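The two string-processing steps in the script, extracting image links in `geturllist` and building file names in `download`, can be tried offline. Below is a minimal Python 3 sketch run against a made-up HTML snippet. The sample markup, the helper names `extract_image_urls` and `make_filename`, and the `<p><img src="...jpg" /></p>` pattern are hypothetical; the real jandan.net markup may differ and has likely changed since the script was written.

```python
import re
import time

# Hypothetical markup mimicking a list page: images sit in <p><img ...></p> inside an <ol>
SAMPLE_HTML = """
<div>header</div>
<ol class="commentlist">
  <li><p><img src="http://example.com/a/1.jpg" /></p></li>
  <li><p><img src="http://example.com/b/2.jpg" /></p></li>
</ol>
"""


def extract_image_urls(text):
    """Return .jpg URLs inside the first <ol>...</ol> block (hypothetical helper)."""
    # re.S lets '.' cross newlines so the <ol> block can span many lines
    block = re.search(r'<ol.*</ol>', text, re.S)
    if block is None:
        return []
    # Capture everything up to 'jpg', then add the extension back
    matches = re.finditer(r'<p><img src="(.+?)jpg" /></p>', block.group(), re.I)
    return [m.group(1).strip() + "jpg" for m in matches]


def make_filename(url, now=None):
    """Build '<unix-seconds>_<basename>' as download() does (hypothetical helper)."""
    stamp = str(int(time.time() if now is None else now))
    # '.+?/' repeatedly strips the shortest prefix ending in '/', leaving the basename
    return stamp + "_" + re.sub(r'.+?/', '', url)


urls = extract_image_urls(SAMPLE_HTML)
print(urls)  # ['http://example.com/a/1.jpg', 'http://example.com/b/2.jpg']
print(make_filename(urls[0], now=1400000000.5))  # 1400000000_1.jpg
```

If the `<ol>` block is missing, `extract_image_urls` returns an empty list instead of raising `AttributeError` on `None.group()`.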
The program is only a few dozen lines long and not difficult, so it is worth studying on your own. It is meant only as a reference; you can build other scrapers on the same principle. I won't go into more detail here.
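Since urllib2 and the `print` statement exist only in Python 2, here is a rough Python 3 sketch of the same approach. It swaps gevent for the standard library's `concurrent.futures.ThreadPoolExecutor`, so it needs no third-party packages and caps concurrency at `workers` threads instead of spawning one greenlet per image. The function names, the 10-second timeout, the UTF-8 decoding, and the `<p><img src="...jpg" /></p>` pattern are all assumptions, not the original author's code.

```python
import re
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.request import urlopen, urlretrieve

# Assumed URL scheme and page markup (hypothetical; the real site may differ)
PAGE_URL = "http://jandan.net/ooxx/page-{}#comments"
OL_BLOCK = re.compile(r'<ol.*</ol>', re.S)
IMG_LINK = re.compile(r'<p><img src="(.+?)jpg" /></p>', re.I)


def page_urls(first, last):
    """URLs of list pages first..last (inclusive)."""
    return [PAGE_URL.format(n) for n in range(first, last + 1)]


def image_urls(html):
    """.jpg links inside the first <ol>...</ol> block, or [] if there is none."""
    block = OL_BLOCK.search(html)
    if block is None:
        return []
    return [m.group(1).strip() + "jpg" for m in IMG_LINK.finditer(block.group())]


def fetch_text(url):
    """Fetch one page; decoding as UTF-8 is an assumption about the site."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def save_image(url, dest_dir):
    """Save one image as '<unix-seconds>_<basename>' under dest_dir."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    path = Path(dest_dir) / ("%d_%s" % (int(time.time()), url.rsplit("/", 1)[-1]))
    urlretrieve(url, str(path))
    return path


def crawl(pages, dest_dir, fetch=fetch_text, save=save_image, workers=8):
    """Collect links page by page, then download them with at most `workers` threads."""
    links = [link for page in pages for link in image_urls(fetch(page))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as `links`
        return list(pool.map(lambda link: save(link, dest_dir), links))
```

For example, `crawl(page_urls(1, 5), "downloads")` would fetch list pages 1 to 5 and save their images into a local `downloads` folder. The `fetch` and `save` parameters exist so the parsing and pooling can be exercised with stub functions, without network access.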