A deep dive into Python -- a small crawler
Here is an example of crawling images from Baidu Post Bar.
Packages
urllib, urllib2, re
Functions
page = urllib.urlopen('http://...') opens a URL and returns a page object.
html = page.read() reads the HTML source of the page.
urllib.urlretrieve() downloads a resource to the local disk.
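A minimal sketch of how these three calls fit together under Python 2 (the example.com URLs are placeholders, not part of the original post):

# Python 2: fetch a page and save one image locally (placeholder URLs)
import urllib

page = urllib.urlopen('http://example.com')                  # open the URL; returns a file-like page object
html = page.read()                                           # read the raw HTML source as a string
urllib.urlretrieve('http://example.com/a.jpg', 'a.jpg')      # download a resource to a local file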
Code
# coding: utf8
import re
import urllib

def getHtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

def getImgUrl(html):
    reg = r'src="(.*?\.jpg)"'   # ? makes the match non-greedy; () returns only the grouped part; double quotes sit safely inside the single-quoted string
    imgre = re.compile(reg)     # compile the regular expression to speed up matching
    imglist = re.findall(imgre, html)
    return imglist

url = "http://tieba.baidu.com/p/3162606526"  # URL of the post
html = getHtml(url)
imgList = getImgUrl(html)
# print imgList
x = 0
for imgurl in imgList:
    urllib.urlretrieve(imgurl, '%s.jpg' % x)
    x += 1
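The code above is Python 2. For reference, a rough Python 3 sketch of the same crawler (my adaptation, not from the original post): urlopen and urlretrieve move to urllib.request, and the page bytes must be decoded to a string before regex matching.

# Python 3 sketch of the same crawler (assumes the page is UTF-8 encoded)
import re
import urllib.request

def get_html(url):
    page = urllib.request.urlopen(url)
    return page.read().decode('utf-8', errors='ignore')  # bytes -> str before regex matching

def get_img_urls(html):
    return re.findall(r'src="(.*?\.jpg)"', html)  # non-greedy match on each .jpg src attribute

if __name__ == '__main__':
    url = "http://tieba.baidu.com/p/3162606526"
    html = get_html(url)
    for x, imgurl in enumerate(get_img_urls(html)):
        urllib.request.urlretrieve(imgurl, '%s.jpg' % x)  # save as 0.jpg, 1.jpg, ...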