A simple Python image crawler that downloads and saves pictures

Source: Internet
Author: User

First, the code:

    # coding=utf-8
    # urllib.request lets us read data from a web page (WWW or FTP) as if it
    # were a local file. Before Python 3, plain urllib was used instead.
    import urllib.request
    import re
    import os


    def gethtml(url):
        page = urllib.request.urlopen(url)
        html = page.read()
        return html


    def getimg(html):
        # http.*? is a non-greedy match: it stops at the first closing quote,
        # using as few repetitions as the overall match allows.
        imglist = re.findall('img src="(http.*?)"', html)  #1
        return imglist


    html = gethtml("https://www.zhihu.com/question/39731953").decode("utf-8")
    imagesurl = getimg(html)
    if not os.path.exists("d:/imags"):
        os.mkdir("d:/imags")
    count = 0  # file names start at 0
    for url in imagesurl:
        print(url)
        dot = url.find('.', len(url) - 5)  #2 look for the extension near the end
        if dot != -1:
            name = url[dot:]
            response = urllib.request.urlopen(url)
            f = open("d:/imags/" + str(count) + name, 'wb')  # open the file for binary writing
            f.write(response.read())  # write() puts the data in an in-memory buffer first
            f.flush()  # push the buffered data to the file and empty the buffer
            f.close()  # close the file
            count += 1
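The write, flush and close steps in the listing can also be handled by a with block, which flushes and closes the file when the block ends. The following is only a sketch and not the original author's code: the helper name save_bytes, the temporary directory and the fake image bytes are assumptions made for illustration, so that the example runs without network access.

```python
import os
import tempfile


def save_bytes(directory, filename, data):
    """Write data to directory/filename in binary mode and return the path.

    Hypothetical helper for illustration only.
    """
    os.makedirs(directory, exist_ok=True)  # create the folder if it is missing
    path = os.path.join(directory, filename)
    with open(path, "wb") as f:  # the file is flushed and closed when the block exits
        f.write(data)
    return path


# Placeholder data standing in for the bytes returned by urlopen(url).read().
fake_image = b"\xff\xd8 fake image bytes"
target_dir = os.path.join(tempfile.mkdtemp(), "imags")
saved = save_bytes(target_dir, "0.jpg", fake_image)
print(saved)
```

In the crawler, the data argument would be the downloaded bytes and the directory would be d:/imags.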

Code Analysis:

1. re.findall syntax: re.findall(pattern, string, flags=0)

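Meaning: returns every non-overlapping match of pattern in string, as a list. Because the pattern used here contains one capture group, each list item is the text captured by that group (the image URL), not the whole match.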
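To see what the non-greedy pattern does, it can be run against a small piece of markup. This is a minimal sketch: the sample_html string and the example.com URLs are placeholders invented for illustration, and the greedy variant is included only for comparison.

```python
import re

# Hypothetical sample markup; the example.com URLs are placeholders.
sample_html = (
    '<p><img src="http://example.com/a.jpg" alt="a">'
    '<img src="https://example.com/b.png" alt="b"></p>'
)

# Non-greedy: each match stops at the first closing quote after the URL.
lazy = re.findall(r'img src="(http.*?)"', sample_html)

# Greedy, for comparison: a single match that runs on to the last quote.
greedy = re.findall(r'img src="(http.*)"', sample_html)

print(lazy)         # two separate URLs
print(len(greedy))  # one over-long match
```

With the non-greedy form each image URL comes back as its own list item, which is what the download loop needs.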

2. find() syntax: url.find(sub, pos_start, pos_end)

Meaning: searches url for the substring sub and returns the index of its first occurrence. pos_start is the position to start searching from (default 0) and pos_end is the position to stop at (default: the end of the string). If sub is not found in url, find() returns -1.
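The way the crawler uses find() to pick out the file extension can be tried on its own. This is a sketch under assumptions: the helper name extension_of and the example.com URLs are hypothetical and only serve to show the start-position argument and the -1 result.

```python
def extension_of(url):
    """Return the extension (including the dot) found in the last 5 characters.

    Hypothetical helper for illustration; returns '' when no dot is found there.
    """
    dot = url.find(".", len(url) - 5)  # start searching 5 characters before the end
    return url[dot:] if dot != -1 else ""


with_ext = "http://example.com/pics/photo.jpg"  # placeholder URL
without_ext = "http://example.com/image"        # placeholder URL, no extension

first_dot = with_ext.find(".")  # without a start position, the first dot is in the host name
print(first_dot)
print(extension_of(with_ext))
print(repr(extension_of(without_ext)))
```

Checking the result of the second find() call against -1 matters because a URL can contain a dot in its host name while having no extension at the end.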
