"Python Mini Practice" 0013

Source: Internet
Author: User

Question No. 0013: Write a Python program that crawls pictures, and use it to download the pictures of Japanese girls at this link :-)

Ahem... never mind the girls. It's late at night, so let's crawl something to eat instead. Food gallery: sip, lick, twist ~

There are many ways to write a simple crawler.

This attempt uses urllib.request.

Read the page's source code, use re.compile to build a pattern that matches the required img tags and yields a list of image URLs, and finally use urllib.request.urlretrieve to download the pictures to the local disk.

Code:

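import os
import re
import urllib.request


def pic_collector(url):
    # Download the raw page source (bytes)
    content = urllib.request.urlopen(url).read()
    # The original pattern literal was garbled; this one is reconstructed from
    # the tag shown in note 1 and captures the value of the src attribute
    r = re.compile(r'width="\d+" height="\d+" src="(.*?)"')
    pic_list = r.findall(content.decode('utf-8'))
    # Create the target folder and switch into it
    os.mkdir('pic_collection')
    os.chdir(os.path.join(os.getcwd(), 'pic_collection'))
    for i in range(len(pic_list)):
        pic_num = str(i) + '.jpg'
        urllib.request.urlretrieve(pic_list[i], pic_num)
        print("success!" + pic_list[i])


pic_collector("http://tieba.baidu.com/p/4341640851")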

Note:

1. The pattern passed to re.compile() depends on the source code of the web page. For example, for the page I picked, I used Chrome to view the source and found the tag of an image I wanted to download. Its full content is as follows (one picture as an example):

<img class="BDE_Image" pic_type="1" width="450" height="450" src="http://imgsrc.baidu.com/forum/w%3d580/sign=a6080fca870a19d8cb03840d03fb82c9/2683ea039245d688be88e4dfa3c27d1ed31b2445.jpg" size="259380">

That is, the pattern has to match this tag and capture the value of its src attribute, which is the picture's URL.

2. The content passed to r.findall() must first be decoded with decode('utf-8'): urlopen().read() returns bytes, and the page source is UTF-8 encoded.

3. os.mkdir(path) creates a new folder; os.chdir(path) changes the current working directory to that folder; os.getcwd() returns the current working directory as a string.

4. urllib.request.urlretrieve(pic, pic_name) downloads the image into the current working directory set above and saves it under the given file name.
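Notes 1 and 2 can be tried without touching the network. The following is only a minimal sketch, assuming the tag layout shown in note 1. The helper name extract_image_urls, the pattern, and the second sample URL are hypothetical and are not taken from the original code.

```python
import re

# Assumed pattern: capture the src value that follows the width/height attributes.
# The non-greedy (.*?) stops at the first closing quote after src=".
IMG_PATTERN = re.compile(r'width="\d+" height="\d+" src="(.*?)"')


def extract_image_urls(raw_bytes):
    """Decode raw page bytes as UTF-8 and return every captured src value."""
    html = raw_bytes.decode('utf-8')  # a str pattern cannot search bytes
    return IMG_PATTERN.findall(html)


# Sample bytes, shaped like what urlopen().read() returns.
# The first tag is the one from note 1; the second URL is made up.
sample = (
    b'<img class="BDE_Image" pic_type="1" width="450" height="450" '
    b'src="http://imgsrc.baidu.com/forum/w%3d580/sign=a6080fca870a19d8cb03840d03fb82c9/'
    b'2683ea039245d688be88e4dfa3c27d1ed31b2445.jpg" size="259380">'
    b'<img class="BDE_Image" pic_type="1" width="560" height="315" '
    b'src="http://imgsrc.baidu.com/forum/example.jpg" size="1000">'
)

for picture_url in extract_image_urls(sample):
    print(picture_url)
```

Because both tags sit on one line, a greedy (.*) would swallow everything up to the last quote on that line, which is why the sketch uses (.*?).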

The files are saved in the pic_collection folder as 0.jpg, 1.jpg, and so on.

From now on, when I come across pictures of pretty boys, I no longer have to right-click endlessly to save them. I'm very pleased _(:3」∠)_

Oh, and what if you want to download the pictures from page xx to page xx of a thread on Tieba or Duitang?

For example, the URLs of the image thread above look like this:

http://tieba.baidu.com/p/4341640851?pn=1  # page 1
http://tieba.baidu.com/p/4341640851?pn=2  # page 2
http://tieba.baidu.com/p/4341640851?pn=3  # page 3
http://tieba.baidu.com/p/4341640851?pn=4  # page 4
...
http://tieba.baidu.com/p/4341640851?pn=n  # page n
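The URL scheme above can be turned into a small helper. This is only a sketch: the function name page_urls and its parameters are hypothetical, and it assumes the page number is always passed as the pn query parameter, as in the list above.

```python
def page_urls(base_url, first_page, last_page):
    """Return the URLs of pages first_page..last_page, both ends included."""
    return [f"{base_url}?pn={page}" for page in range(first_page, last_page + 1)]


for page_url in page_urls("http://tieba.baidu.com/p/4341640851", 1, 3):
    print(page_url)
```

Iterating over range(first_page, last_page + 1) keeps the page arithmetic in one place, which makes off-by-one mistakes easier to spot.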

Then change the code:

import os
import re
import urllib.request


def fetch_pictures(url, m, n):
    # Create the target folder if it does not exist yet, then switch into it
    os.makedirs('pic_collection', exist_ok=True)
    os.chdir(os.path.join(os.getcwd(), 'pic_collection'))
    temp = 1  # running count of pictures, used as the file name
    # The original pattern literal was garbled; this one is reconstructed from
    # the tag shown in note 1 and captures the value of the src attribute
    r = re.compile(r'width="\d+" height="\d+" src="(.*?)"')
    for x in range(n - m + 1):
        # Key step: request pages m, m+1, ..., n in turn
        html_content = urllib.request.urlopen(url + "?pn=" + str(m + x)).read()
        picture_url_list = r.findall(html_content.decode('utf-8'))
        print(picture_url_list)
        for i in range(len(picture_url_list)):
            picture_name = str(temp) + '.jpg'
            urllib.request.urlretrieve(picture_url_list[i], picture_name)
            print("success!" + picture_url_list[i])
            temp += 1


fetch_pictures("http://tieba.baidu.com/p/4341640851", 1, 3)

This downloads the pictures on pages 1 to 3. To download the pictures from the whole thread, check how many pages it has and change the arguments accordingly.
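The running counter that keeps file names unique across pages can be tried offline. Below is a minimal sketch, assuming the picture URLs of each page have already been extracted into one list per page. The function name name_pictures and the sample URLs are hypothetical.

```python
def name_pictures(url_lists, start=1):
    """Pair each picture URL with a sequential file name across all pages."""
    named = []
    counter = start
    for urls_on_page in url_lists:
        for picture_url in urls_on_page:
            named.append((f"{counter}.jpg", picture_url))
            counter += 1  # never reset between pages, so names do not collide
    return named


# Made-up URL lists standing in for two crawled pages.
pages = [
    ["http://example.com/a.jpg", "http://example.com/b.jpg"],
    ["http://example.com/c.jpg"],
]
for file_name, picture_url in name_pictures(pages):
    print(file_name, picture_url)
```

Each (file_name, picture_url) pair could then be passed to urllib.request.urlretrieve(picture_url, file_name).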
