import re
import urllib.request

#------ Get the source code of a web page ------
def gethtml(url):
    page = urllib.request.urlopen(url)
    html = page.read()
    return html

#------ Pass the URL of any post to gethtml() ------
html = gethtml("https://tieba.baidu.com/p/5352556650")

#------ Decode the raw bytes of the page as UTF-8 ------
html = html.decode('utf-8')

#------ Get the addresses of all pictures in a post ------
def getimg(html):
    #------ Use a regular expression to find picture addresses in the page ------
    reg = r'src="([.*\S]*\.jpg)"'
    imgre = re.compile(reg)
    imglist = imgre.findall(html)
    return imglist

imglist = getimg(html)

imgname = 0
for imgpath in imglist:
    #------ Ideally use exception handling and multithreading ------
    try:
        # Download first, so a failed request does not leave an empty file behind
        data = urllib.request.urlopen(imgpath).read()
        # "with" closes the file even if writing fails
        with open('d:\\temp\\' + str(imgname) + '.jpg', 'wb') as f:
            f.write(data)
        print(imgpath)
    except Exception as e:
        print(imgpath + " error: " + str(e))
    imgname += 1
print("All done!")
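The loop comment recommends exception handling and multithreading. Below is a minimal sketch of that idea, assuming Python 3.6+ and using the standard library's `concurrent.futures.ThreadPoolExecutor`, which the original script does not use. The helper names `get_img_urls`, `download_one` and `download_all`, the 10-second timeout and the default of 4 worker threads are illustrative choices, not part of the original post; the regular expression is the same kind of `src="...jpg"` pattern used in `getimg()`.

```python
import os
import re
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Capture the address inside src="....jpg" (same idea as getimg()).
IMG_RE = re.compile(r'src="([.*\S]*\.jpg)"')


def get_img_urls(html):
    """Return every .jpg address found in a src="..." attribute."""
    return IMG_RE.findall(html)


def download_one(index, url, dest_dir):
    """Save one image as dest_dir/<index>.jpg; return its path, or None on failure."""
    path = os.path.join(dest_dir, f"{index}.jpg")
    try:
        # Hypothetical 10 s timeout so one slow image cannot block a worker forever
        with urllib.request.urlopen(url, timeout=10) as resp:
            data = resp.read()
        with open(path, "wb") as f:
            f.write(data)
        return path
    except Exception as e:  # keep going if one image fails, as the original loop does
        print(f"{url} error: {e}")
        return None


def download_all(urls, dest_dir, max_workers=4):
    """Download all URLs on a thread pool; results come back in input order."""
    os.makedirs(dest_dir, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(download_one, i, u, dest_dir) for i, u in enumerate(urls)]
        return [fut.result() for fut in futures]
```

For example, `download_all(get_img_urls(html), 'd:\\temp')` would fetch the pictures of the decoded post with four threads at once. Threads suit this job because the work is dominated by waiting on the network rather than by CPU.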
[Python] Implementing a web crawler in Python 3 to download images