"Python Mini Practice" 0013

Last Update:2016-04-16 Source: Internet

Author: User


Question No. 0013: use Python to write a program that crawls images, and use it to download the pictures of Japanese girls from this link :-)

Heh... never mind the girls. It's late at night, so let's crawl something to eat instead. A food gallery it is: slurp, lick, twist ~

There are many ways to write a simple crawler.

This attempt uses urllib.request.

Read the source of the page that holds the images, use re.compile to build a pattern that matches the required img tags and produces a list of image URLs, and finally use urllib.request.urlretrieve to download the pictures to the local disk.

Code:

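    import os
    import re
    import urllib.request


    def pic_collector(url):
        # Fetch the raw page source (bytes)
        content = urllib.request.urlopen(url).read()
        # Capture the src of every post image; the pattern follows the tag
        # described in note 1 and must be adapted to the page being crawled
        r = re.compile(r'<img class="BDE_Image"[^>]*?src="(.*?)"')
        pic_list = r.findall(content.decode('utf-8'))

        # Create the output folder (fails if it already exists) and enter it
        os.mkdir('pic_collection')
        os.chdir(os.path.join(os.getcwd(), 'pic_collection'))

        for i in range(len(pic_list)):
            pic_num = str(i) + '.jpg'
            urllib.request.urlretrieve(pic_list[i], pic_num)
            print("success!" + pic_list[i])


    pic_collector("http://tieba.baidu.com/p/4341640851")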

Note:

1. The pattern passed to re.compile() depends on the source code of the web page. For the page I picked, for example, I used Chrome to view the source and found the tag of an image I wanted to download. Its full content is as follows (one picture as an example):

    <img class="BDE_Image" pic_type="1" width="450" height="450" src="http://imgsrc.baidu.com/forum/w%3d580/sign=a6080fca870a19d8cb03840d03fb82c9/2683ea039245d688be88e4dfa3c27d1ed31b2445.jpg" size="259380">

That is, the pattern has to match this img tag and capture the value of its src attribute.

2. The content passed to r.findall() must first go through decode('utf-8'): urlopen().read() returns bytes, and a str pattern can only search a str, so the UTF-8 page source has to be decoded first.

3. os.mkdir(name) creates a new folder; os.chdir(path) changes the working directory to that folder; os.getcwd() returns the path of the current working directory (a string).

4. urllib.request.urlretrieve(pic, pic_name) saves the image into the working directory set above, under the given file name.
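Notes 1 and 2 can be tried out without touching the network. The following is a minimal sketch, assuming a tag shaped like the one in note 1. The sample bytes, the example.com URLs and the helper name extract_image_urls are made up for illustration, and the pattern is one plausible choice rather than the only one.

```python
import re

# Pattern assumed from the tag in note 1: match an <img> whose class is
# BDE_Image and capture the value of its src attribute (non-greedy).
IMG_PATTERN = re.compile(r'<img class="BDE_Image"[^>]*?src="(.*?)"')


def extract_image_urls(page_bytes):
    """Decode UTF-8 page bytes and return the captured src values."""
    # urlopen(...).read() returns bytes; a str pattern needs a str to search
    html = page_bytes.decode('utf-8')
    return IMG_PATTERN.findall(html)


# Hypothetical page fragment: two post images and one unrelated image
sample = (
    b'<img class="BDE_Image" pic_type="1" width="450" height="450" '
    b'src="http://example.com/a.jpg" size="259380">'
    b'<img class="icon" src="http://example.com/icon.png">'
    b'<img class="BDE_Image" pic_type="1" width="450" height="450" '
    b'src="http://example.com/b.jpg" size="1000">'
)

print(extract_image_urls(sample))
```

The capture group uses (.*?) rather than (.*): when several tags sit on the same line, a greedy group would run on to the last quote on that line instead of stopping at the end of the first src value.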

The saved files are as follows:

From now on, when I come across pictures of pretty boys, I no longer have to right-click and save endlessly. I am most comforted _(:3」∠)_

Oh, and what if you want to download the pictures from page xx to page xx of a thread on Tieba or Duitang???

For example, for the image thread above, the URLs look like this:

    http://tieba.baidu.com/p/4341640851?pn=1  # page 1
    http://tieba.baidu.com/p/4341640851?pn=2  # page 2
    http://tieba.baidu.com/p/4341640851?pn=3  # page 3
    http://tieba.baidu.com/p/4341640851?pn=4  # page 4
    ...
    http://tieba.baidu.com/p/4341640851?pn=n  # page n
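The URL pattern above can be generated with a few lines of Python. This is only a sketch: the helper name page_urls and its parameters are hypothetical, and it assumes every page of the thread is reachable through the pn query parameter exactly as listed.

```python
def page_urls(base_url, first_page, last_page):
    """Return the URLs for pages first_page..last_page (both inclusive)."""
    return [
        base_url + "?pn=" + str(page)
        for page in range(first_page, last_page + 1)
    ]


for page_url in page_urls("http://tieba.baidu.com/p/4341640851", 1, 3):
    print(page_url)
```

Iterating over range(first_page, last_page + 1) directly keeps the page number and the loop variable identical, which avoids off-by-one mistakes when the first page is not 1.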

So change the code as follows:

    import os
    import re
    import urllib.request


    def fetch_pictures(url, m, n):
        # Create the output folder if it does not exist yet, then enter it
        os.makedirs('pic_collection', exist_ok=True)
        os.chdir(os.path.join(os.getcwd(), 'pic_collection'))
        temp = 1  # running count of pictures, used as the file name
        for x in range(n - m + 1):
            # Key step: request pages m, m+1, ..., n through the pn parameter
            html_content = urllib.request.urlopen(url + "?pn=" + str(m + x)).read()
            r = re.compile(r'<img class="BDE_Image"[^>]*?src="(.*?)"')
            picture_url_list = r.findall(html_content.decode('utf-8'))
            print(picture_url_list)
            for i in range(len(picture_url_list)):
                picture_name = str(temp) + '.jpg'
                urllib.request.urlretrieve(picture_url_list[i], picture_name)
                print("success!" + picture_url_list[i])
                temp += 1


    fetch_pictures("http://tieba.baidu.com/p/4341640851", 1, 3)

This downloads the pictures on pages 1 to 3. To download the pictures of the whole thread, check how many pages it has and change the arguments accordingly.
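The download step relies on os.chdir, and a single failed urlretrieve call ends the whole run. One possible refinement, shown here as a sketch only: build each target path with os.path.join and skip URLs that cannot be fetched. The helper name save_images and its parameters are hypothetical, and the demonstration uses a local file:// URL in place of a real image URL so that it runs without network access.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def save_images(image_urls, folder, start=1):
    """Download each URL into folder as <number>.jpg; return the saved paths."""
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    saved = []
    for number, image_url in enumerate(image_urls, start=start):
        target = os.path.join(folder, str(number) + '.jpg')
        try:
            urllib.request.urlretrieve(image_url, target)
        except OSError as error:  # URLError and HTTPError are OSError subclasses
            print("failed: " + image_url + " (" + str(error) + ")")
            continue
        saved.append(target)
    return saved


# Demonstration without network access: a local file reached through a
# file:// URL stands in for a real image URL.
with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "source.jpg"
    source.write_bytes(b"not a real image, just sample bytes")
    paths = save_images([source.as_uri()], os.path.join(workdir, "pic_collection"))
    print([os.path.basename(p) for p in paths])
```

Because the working directory never changes, the function can be called repeatedly (for example once per page, passing a new start value) without the paths drifting. A URL that fails keeps its number reserved, so the file names still line up with positions in the list.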


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
