First, here is the code:
# coding=utf-8
import os
import re
import urllib.request  # before Python 3, plain urllib was used directly
# urllib.request provides an interface for reading data from a web page,
# letting us read data over WWW or FTP as if it were a local file


def gethtml(url):
    page = urllib.request.urlopen(url)
    html = page.read()
    return html


def getimg(html):
    imglist = re.findall('img src="(http.*?)"', html)  # 1
    # http.*? is a non-greedy match: it stops as soon as the rest of the
    # pattern can match, using as few repetitions as possible
    return imglist


html = gethtml("https://www.zhihu.com/question/39731953").decode("utf-8")
imagesurl = getimg(html)

if not os.path.exists("D:/imags"):
    os.mkdir("D:/imags")

count = 0  # file names start at 0
for url in imagesurl:
    print(url)
    if url.find('.') != -1:  # 2
        # take the extension: look for '.' only in the last 5 characters
        name = url[url.find('.', len(url) - 5):]
        response = urllib.request.urlopen(url)
        f = open("D:/imags/" + str(count) + name, 'wb')  # open a file for writing in binary mode
        f.write(response.read())  # write does not go straight to disk; it writes to an in-memory buffer
        f.flush()  # write the buffered data to the file immediately and empty the buffer
        f.close()  # close the file
        count += 1
Code Analysis:
1. re.findall syntax: findall(pattern, string, flags=0)
Meaning: returns all substrings of string that match pattern, as a list. When the pattern contains one capturing group, as it does here, each item in the list is the text captured by that group (the URL) rather than the whole match.
2. find() syntax: find(str, pos_start, pos_end)
Meaning: finds the position of the substring str in url. pos_start is the position at which the search starts (default 0); pos_end is the position at which the search ends (default: the end of the string). If str is not found in url, find() returns -1.
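To make the two points above concrete, here is a minimal sketch that runs findall and find on a small string instead of a live page. The sample_html string and the example.com URLs are made up for illustration and are not from the crawled page. The greedy pattern is included only for comparison.

```python
import re

# Hypothetical HTML fragment, made up for illustration only.
sample_html = '<img src="http://example.com/a.jpg"><img src="http://example.com/b.png">'

# Non-greedy: each match stops at the first closing quote it can use.
lazy = re.findall('img src="(http.*?)"', sample_html)

# Greedy, for comparison: .* runs on to the last closing quote,
# so both tags are swallowed into a single match.
greedy = re.findall('img src="(http.*)"', sample_html)

url = lazy[0]
# Search for '.' only in the last 5 characters, as the crawler does,
# then slice from that position to get the file extension.
dot = url.find('.', len(url) - 5)
extension = url[dot:]

# find() returns -1 when the substring is not present.
missing = url.find('?')

print(lazy)
print(greedy)
print(extension, missing)
```

The non-greedy pattern yields two separate URLs, while the greedy one yields a single item spanning both tags. Note that if no '.' appears in the last 5 characters, find() returns -1 and the slice url[-1:] would give only the last character, so a real crawler would probably want to check for that case.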
Python: implementing a simple image crawler that saves the images