=============== Crawler Principle ==================
The script uses Python to request a web page and read its HTML source, then uses a regular expression to extract the image address from the src attribute of a specific img tag.
It then requests that image address and writes the picture to a local file through file I/O.
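The extraction step can be tried without any network access. The following is a minimal sketch, assuming markup shaped like what the script's pattern expects; the HTML fragment, the example.com address, and the helper name extract_image_urls are hypothetical and not part of the original script.

```python
import re

# Hypothetical HTML fragment: a tag carrying class="content-img",
# followed on the next line by the img tag whose src we want.
SAMPLE_HTML = '''<div class="content-img">
  <img alt="demo" src="http://example.com/images/0291.jpg">
</div>'''

# Same idea as the script's filter rule: match class="content-img",
# skip over the line break, then capture a src value ending in .jpg.
IMG_PATTERN = re.compile(r'class="content-img".+\s+.+src="(.+\.jpg)"')

def extract_image_urls(html):
    """Return every .jpg address captured by IMG_PATTERN in html."""
    return IMG_PATTERN.findall(html)

if __name__ == '__main__':
    print(extract_image_urls(SAMPLE_HTML))
```

Because the pattern depends on the exact line layout of the page, it returns an empty list whenever the markup differs, which is why the script checks the length of the result before saving anything.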
=============== Script Code ==================
import urllib.request  # network access module
import random          # random number generation module
import re              # regular expression module
import os              # directory structure handling module

# Initialize configuration parameters
number = 10    # number of pictures to collect
path = 'img/'  # picture storage directory

# File operations: create the storage directory if it does not exist
if not os.path.exists(path):
    os.makedirs(path)

# Save a picture and return its file name, or the error text on failure
def save_img(url, path):
    message = None
    file = None
    try:
        file = open(path + os.path.basename(url), 'wb')
        request = urllib.request.urlopen(url)
        file.write(request.read())
    except Exception as e:
        message = str(e)
    else:
        message = os.path.basename(url)
    finally:
        # file is still None if open() itself failed
        if file is not None and not file.closed:
            file.close()
    return message

# Network connection
http = 'http://zerospace.asika.tw/photo/'  # destination URL
position = 290 + int((1000 - number) * random.random())
ids = range(position, position + number)
for id in ids:
    try:
        url = "%s%d.html" % (http, id)  # build the page address from its suffix
        request = urllib.request.urlopen(url)
    except Exception as e:
        print(e)
        continue
    else:
        buffer = request.read()
        buffer = buffer.decode('utf8')
        pattern = r'class="content-img".+\s+.+src="(.+\.jpg)"'
        imgurl = re.findall(pattern, buffer)  # filter rule
        if len(imgurl) != 0:
            print(save_img(imgurl[0], path))
        else:
            continue
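The starting page is drawn at random, so each run visits a different window of consecutive pages. A minimal sketch of that calculation on its own, assuming the same constants as the script (290 and 1000); the helper name page_urls and its rng parameter are hypothetical additions that make the choice reproducible with a seeded generator.

```python
import random

BASE_URL = 'http://zerospace.asika.tw/photo/'  # destination URL from the script

def page_urls(number, rng=random, base=BASE_URL, first=290, span=1000):
    """Build `number` consecutive page addresses from a random start.

    Mirrors the script: position = 290 + int((1000 - number) * random()).
    """
    position = first + int((span - number) * rng.random())
    return ["%s%d.html" % (base, page_id)
            for page_id in range(position, position + number)]

if __name__ == '__main__':
    # A seeded generator gives the same window on every run.
    for url in page_urls(3, rng=random.Random(0)):
        print(url)
```

Subtracting number before scaling keeps the start between 290 and 1289 - number, so the window never extends past page 1289.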
=============== Running Results ==================
[Screenshot: Python web crawler (image capture script)]