This exercise crawls the food pictures from a Baidu Tieba thread (Portal).
Using the img tag together with the class attribute, BeautifulSoup alone would solve this easily, but this time I use a regular expression instead. I also referred to another blogger's post: Portal
The src addresses of all the pictures share the same format, so we can use that to filter out the images we want. In other words, instead of matching on the value of the class attribute, a regular expression is used to match the src value.
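To make the idea concrete, here is a minimal sketch that uses only the standard `re` module to show which src values such a pattern would accept. The sample addresses and the helper name `filter_srcs` are made up for illustration; only the prefix `http://imgsrc.baidu.com/forum/w%3d580/sign=` comes from the code in this post.

```python
import re

# Same idea as in this post: a fixed prefix, then anything, then ".jpg".
IMG_SRC = re.compile(r"http://imgsrc\.baidu\.com/forum/w%3d580/sign=.+\.jpg")

# Hypothetical src values, invented for this example only.
sample_srcs = [
    "http://imgsrc.baidu.com/forum/w%3d580/sign=abc123/food01.jpg",
    "http://imgsrc.baidu.com/forum/w%3d580/sign=def456/food02.jpg",
    "http://example.com/static/emoticon.png",
    "http://imgsrc.baidu.com/forum/pic/item/avatar.jpg",
]


def filter_srcs(srcs, pattern=IMG_SRC):
    """Return only the src values that the pattern matches."""
    return [s for s in srcs if pattern.search(s)]


wanted = filter_srcs(sample_srcs)
for s in wanted:
    print(s)
```

Only the first two addresses pass the filter: the third has a different host, and the fourth lacks the `w%3d580/sign=` part of the path.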
    from urllib import request
    from bs4 import BeautifulSoup
    import re


    def get_page(url, tot_page):
        """Build the URLs for pages 1 .. tot_page - 1 by rewriting the pn value."""
        url_list = []
        for i in range(1, tot_page):
            # Replace everything after the first '=' with the page number
            new_url = re.sub('=(.*)', '%s%s' % ('=', i), url)
            url_list.append(new_url)
        return url_list


    if __name__ == '__main__':
        url = 'http://tieba.baidu.com/p/4792769205?pn=1'
        # Raw string so the backslashes are kept literally; the folder must already exist
        path = r'D:\python\project\Crawler Results'
        count = 0
        url_list = get_page(url, 4)  # pages 1 to 3
        for url in url_list:
            print(url)
            page = request.urlopen(url).read().decode()
            soup = BeautifulSoup(page, 'lxml')
            # Match on the src value instead of the class attribute
            regex = re.compile(r"http://imgsrc\.baidu\.com/forum/w%3d580/sign=.+\.jpg")
            pic_list = soup.find_all('img', {'src': regex})
            for pic in pic_list:
                pic = pic['src']
                # Save each picture as 0.jpg, 1.jpg, ... in the target folder
                request.urlretrieve(pic, '%s/%s.jpg' % (path, count))
                count += 1
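The `re.sub` call in `get_page` replaces everything after the first `=`, which works here because `pn` is the only query parameter in the URL. As an alternative sketch (not the approach used in this post), the page number could be set through `urllib.parse`, which should also cope with URLs that carry other parameters. The helper name `page_urls` is hypothetical, and unlike `get_page` its second argument is the last page, inclusive.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def page_urls(url, last_page):
    """Return the URLs for pages 1..last_page by setting the 'pn' parameter."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    urls = []
    for page in range(1, last_page + 1):
        query["pn"] = [str(page)]  # overwrite (or add) the page number
        new_query = urlencode(query, doseq=True)
        urls.append(urlunsplit(parts._replace(query=new_query)))
    return urls


urls = page_urls("http://tieba.baidu.com/p/4792769205?pn=1", 3)
for u in urls:
    print(u)
```

For the thread URL used in this post, this produces the same three page addresses that `get_page(url, 4)` builds.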
I crawled 3 pages of pictures.
Python crawler practice: crawling pictures with regular expressions + BeautifulSoup