Python crawler: Grabbing street-snap pictures from Toutiao (Today's Headlines)

Source: Internet
Author: User

AJAX is a technique for creating fast, dynamic web pages. It lets a page update asynchronously by exchanging small amounts of data with the server in the background, so part of a page can be updated without reloading the whole page.

I have recently been learning to crawl pages that load their content dynamically with JavaScript, and decided to deepen my understanding through an example.

1. First, find the request URL (using the Inspect tool in Google Chrome).

http://www.toutiao.com/search_content/?offset=0&format=json&keyword=%E8%A1%97%E6%8B%8D&autoload=true&count=20&cur_tab=1

The data is fetched from this URL with a GET request. The parameters that change from request to request are offset=0 and keyword=%E8%A1%97%E6%8B%8D (the URL-encoded search term 街拍, "street snap"). To crawl in bulk, you have to set up a loop.

As you scroll down the page, offset changes to 20, 40, 60, and so on; each load fetches 20 more items.
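The paging scheme described above can be sketched as a small URL builder. This is only an illustration: the helper name build_search_url is made up for this example, the parameter values are copied from the URL above, and it assumes the endpoint still accepts them.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.toutiao.com/search_content/"


def build_search_url(offset, keyword="街拍", count=20):
    """Build the search URL for one page of results (hypothetical helper)."""
    params = [
        ("offset", offset),      # 0, 20, 40, ... one step per page
        ("format", "json"),
        ("keyword", keyword),    # urlencode percent-encodes the UTF-8 bytes
        ("autoload", "true"),
        ("count", count),        # items returned per request
        ("cur_tab", 1),
    ]
    return BASE_URL + "?" + urlencode(params)


# Print the URLs for the first three pages.
for offset in range(0, 60, 20):
    print(build_search_url(offset))
```

Only offset changes between pages, so a bulk crawl amounts to requesting each of these URLs in turn.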

2. Fetch the response with requests and parse the JSON.

On the same page, switch to the Preview tab of the developer tools to see the JSON data. The title is at ['data'][0]['title']; the other fields are accessed in a similar way.
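To make the structure concrete, here is a minimal sketch that walks a response of that shape. The sample dictionary is invented for illustration (only the key names data, title, image_detail and url come from the script in this article), and the assumption that some entries may lack image_detail is mine, not something the article confirms.

```python
# Hypothetical sample shaped like the JSON described above; the titles are made up.
sample = {
    "data": [
        {
            "title": "Street snap A",
            "image_detail": [
                {"url": "http://p3.pstatp.com/large/31e30003d4be75c719ae"},
            ],
        },
        {"title": "Entry without pictures"},  # assumed case: no image_detail key
    ]
}


def list_titles_and_images(json_data):
    """Return a list of (title, [image URLs]) pairs from a parsed response."""
    results = []
    for item in json_data.get("data") or []:
        urls = [pic["url"] for pic in item.get("image_detail") or []]
        results.append((item.get("title"), urls))
    return results


for title, urls in list_titles_and_images(sample):
    print(title, urls)
```

Using .get() instead of direct indexing keeps the loop from stopping with a KeyError if an entry has no pictures.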

    import json
    import os

    import requests


    def download_pic(file, name, html):
        # Fetch the picture at the URL `html` and save it as <file>/<name>.jpg
        r = requests.get(html)
        filename = os.path.join(file, name + '.jpg')
        with open(filename, 'wb') as f:
            f.write(r.content)


    url = 'http://www.toutiao.com/search_content/?offset=0&format=json&keyword=%e8%a1%97%e6%8b%8d&autoload=true&count=20&cur_tab=1'
    res = requests.get(url)
    json_data = json.loads(res.text)
    data = json_data['data']
    for i in data:
        print(i['title'])
        # The image folder must already exist in the current directory
        file_path = os.path.join(os.getcwd(), 'image')
        print(file_path)
        for p in i['image_detail']:
            print(p['url'])
            # Use the last segment of the URL as the file name
            name = p['url'].split('/')[-1]
            download_pic(file_path, name, p['url'])

Create a new image folder in the current directory; the crawler downloads the pictures into it.

The image name is taken from the last segment of the URL. For example, http://p3.pstatp.com/large/31e30003d4be75c719ae is saved as 31e30003d4be75c719ae.jpg.
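The naming rule can be checked in isolation with a short sketch. The function name image_name_from_url is hypothetical, and stripping a trailing slash is an extra precaution that the original script does not take.

```python
def image_name_from_url(url):
    """Derive a local file name from the last path segment of an image URL."""
    last_segment = url.rstrip("/").split("/")[-1]
    return last_segment + ".jpg"


print(image_name_from_url("http://p3.pstatp.com/large/31e30003d4be75c719ae"))
# prints 31e30003d4be75c719ae.jpg
```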

The results are as follows (for learning and exchange purposes only):

I did not write the loop, so the script only crawls the pictures linked from the first 20 results.
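One way the missing loop might look is sketched below. It is not the author's code: the function crawl and its parameters are made up for this example, and the fetching and downloading steps are passed in as callables so the paging logic can be tried without touching the network. A stand-in fetcher with invented data is used here.

```python
import os


def crawl(fetch_json, download, pages=3, page_size=20, folder="image"):
    """Walk `pages` result pages and hand every picture URL to `download`.

    fetch_json(offset) must return the parsed JSON for that offset;
    download(url, path) must save one picture. Both are supplied by the caller.
    """
    saved = []
    for page in range(pages):
        offset = page * page_size          # 0, 20, 40, ...
        json_data = fetch_json(offset)
        for item in json_data.get("data") or []:
            for pic in item.get("image_detail") or []:
                url = pic["url"]
                name = url.split("/")[-1] + ".jpg"
                path = os.path.join(folder, name)
                download(url, path)
                saved.append(path)
    return saved


# Stand-in fetcher returning invented data, so the sketch runs offline.
def fake_fetch(offset):
    return {"data": [{"title": "demo",
                      "image_detail": [{"url": "http://example.com/large/pic%d" % offset}]}]}


paths = crawl(fake_fetch, lambda url, path: None, pages=2)
print(paths)
```

In a real run, fetch_json would request the search URL for the given offset with requests.get and return the parsed JSON, and download would write the content of the picture response to path, as download_pic does in the script above.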
