Implementing a simple crawler in Python

Source: Internet
Author: User

```python
# coding=utf-8
# Python 2 code. First, fetch the whole page that contains the pictures to download.
# The urllib module provides an interface for reading web page data: data on the
# WWW or FTP can be read as if it were a local file.
import urllib
import re

# First, define a gethtml() function:
def gethtml(url):
    # urllib.urlopen() opens a URL
    page = urllib.urlopen(url)
    # read() reads the data at the URL. Passing a URL to gethtml() downloads
    # the entire page; printing the result shows the whole page.
    html = page.read()
    return html

# getimg() filters the wanted picture links out of the page fetched above
def getimg(html):
    # a regular expression that extracts picture URLs from the page
    reg = r'src="(.+?\.jpg)" pic_ext'
    # re.compile() compiles the regular expression into a pattern object
    imgre = re.compile(reg)
    # re.findall() returns every match of imgre found in html
    imglist = re.findall(imgre, html)
    # Loop over the picture links and give each file a tidy name:
    # the counter x is incremented by 1 for each picture (0.jpg, 1.jpg, ...)
    x = 0
    for imgurl in imglist:
        # urllib.urlretrieve() downloads remote data directly to a local file
        urllib.urlretrieve(imgurl, '%s.jpg' % x)
        x += 1
    # return the links so that the print statement below shows them
    return imglist

# Each page we want to crawl may be structured differently,
# so a different regular expression may be needed.
html = gethtml("http://tieba.baidu.com/p/2460150866")
print getimg(html)
```

Original article: http://www.cnblogs.com/fnng/p/3576154.html
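In Python 3, `urlopen()` and `urlretrieve()` live in `urllib.request`, and `read()` returns bytes, so the page has to be decoded before a text regular expression can search it. The sketch below ports the fetch-and-extract steps under those assumptions; the names `get_html`, `extract_jpg_urls`, `LOOSE_PATTERN` and `TIGHT_PATTERN`, the UTF-8 decoding and the sample HTML are illustrative and not from the original post. It also shows a weakness of the listing's pattern: because `.` matches quote characters, `.+?` can run past the end of one `src` attribute into the next tag. The tightened `[^"]+` variant is an assumed fix, not the author's.

```python
import re
import urllib.request

# The pattern from the listing: ".+?" is non-greedy, but "." also matches
# quotes, so a match can run across several attributes and tags.
LOOSE_PATTERN = re.compile(r'src="(.+?\.jpg)" pic_ext')

# Tightened variant (an assumption, not the original author's): [^"] stops
# the capture at the closing quote of a single src attribute.
TIGHT_PATTERN = re.compile(r'src="([^"]+\.jpg)" pic_ext')


def get_html(url, encoding="utf-8"):
    """Python 3 counterpart of gethtml(): fetch a page and decode it to text.

    urlopen().read() returns bytes in Python 3; decoding as UTF-8 is an
    assumption and may not suit every site.
    """
    with urllib.request.urlopen(url) as page:
        return page.read().decode(encoding, errors="replace")


def extract_jpg_urls(html, pattern=TIGHT_PATTERN):
    """Return the captured .jpg URLs in the order they appear in html."""
    return pattern.findall(html)


# Hypothetical sample markup: two Tieba-style JPEG tags with a PNG between them.
SAMPLE = (
    '<img class="BDE_Image" src="http://example.com/a.jpg" pic_ext="jpeg">'
    '<img src="http://example.com/logo.png">'
    '<img class="BDE_Image" src="http://example.com/b.jpg" pic_ext="jpeg">'
)

print(extract_jpg_urls(SAMPLE, LOOSE_PATTERN))
print(extract_jpg_urls(SAMPLE))
```

On the sample, the loose pattern's second match swallows the PNG tag in between, while the tightened pattern returns just the two JPEG links.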

Update

If you want to save the images to a folder of your choice, you only need to modify this line:

    urllib.urlretrieve(imgurl, '%s.jpg' % x)

urlretrieve(url, path) saves the data at url to the local file path, so change path to wherever you want the images to go. For example, to save them in F:\pic:

    urllib.urlretrieve(imgurl, 'f:/pic/%s.jpg' % x)

That's all there is to it.
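`urlretrieve()` opens the target path for writing but does not create missing folders, so F:\pic has to exist beforehand. A minimal Python 3 sketch that creates the folder first and keeps the listing's counter-based names (0.jpg, 1.jpg, ...) is shown below; `download_images` and its `folder` parameter are hypothetical names, and the offline demo fetches a local file through a `file://` URL instead of a real image site.

```python
import os
import pathlib
import tempfile
import urllib.request


def download_images(img_urls, folder):
    """Save each URL in img_urls as folder/0.jpg, folder/1.jpg, ... (hypothetical helper).

    The folder is created first because urlretrieve() does not create
    missing directories. Returns the list of local paths written.
    """
    os.makedirs(folder, exist_ok=True)
    saved = []
    # enumerate() supplies the same 0, 1, 2, ... counter as x in getimg()
    for x, img_url in enumerate(img_urls):
        path = os.path.join(folder, "%s.jpg" % x)
        urllib.request.urlretrieve(img_url, path)
        saved.append(path)
    return saved


if __name__ == "__main__":
    # Offline demo: "download" a local file through a file:// URL.
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp) / "source.jpg"
        source.write_bytes(b"not really a JPEG")
        print(download_images([source.as_uri()], os.path.join(tmp, "pic")))
```

On Windows, `download_images(urls, "F:/pic")` would mirror the update above; Windows accepts forward slashes in paths, just as the `'f:/pic/%s.jpg'` string does.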
