Summary: Describes how to use Scrapy to process JSON APIs and AJAX pages
Sometimes you will find that the page you want to crawl does not contain its data in the HTML source code. For example, open http://localhost:9312/static/ in the browser, right-click on an empty area of the page, and select "View page source":
You'll find it is blank.
Notice that a file named api.json is referenced (the red line in the screenshot), so open the network panel in the browser's developer tools and find the request named api.json.
In the red box you can find the contents of the original page: it is a simple JSON API. Some complex APIs will require you to log in first, send POST requests, or return more interesting data structures. Python provides a library for parsing JSON that can transform JSON data into Python objects with json.loads(response.body).
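To make the json.loads step concrete, here is a minimal sketch using Python's standard json module; the sample payload below is hypothetical, only mimicking the shape of api.json:

```python
import json

# Hypothetical response body, shaped like the entries in api.json
body = b'[{"id": 0, "title": "set unique family well"}, {"id": 1, "title": "belfast"}]'

items = json.loads(body)     # transforms the JSON data into a Python list of dicts
first_id = items[0]["id"]    # fields are then accessed like any dict
print(first_id, items[0]["title"])
```

In a spider you would pass response.body instead of the hard-coded bytes above.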
Source code address of the api.py file:
https://github.com/Kylinlin/scrapybook/blob/master/ch05%2Fproperties%2Fproperties%2Fspiders%2Fapi.py
Copy the manual.py file, rename it to api.py, and make the following changes:
start_urls = ('http://web:9312/properties/api.json',)
If you need to log in before acquiring this JSON API, use the start_requests() function (refer to Learning Scrapy Notes (v) - Scrapy login website).
- Modify the parse function:
def parse(self, response):
    base_url = "http://web:9312/properties/"
    js = json.loads(response.body)
    for item in js:
        id = item["id"]
        # build a full URL for each entry
        url = base_url + "property_%06d.html" % id
        yield Request(url, callback=self.parse_item)
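The URL-building step above can be checked in plain Python. The entries below are hypothetical stand-ins for the parsed JSON list; the point is how "%06d" zero-pads each id to six digits:

```python
# Sketch of the URL construction from parse(), with made-up ids
base_url = "http://web:9312/properties/"
js = [{"id": 0}, {"id": 7}, {"id": 29}]  # stand-in for json.loads(response.body)

# "%06d" left-pads the id with zeros to six digits
urls = [base_url + "property_%06d.html" % item["id"] for item in js]
print(urls[0])  # http://web:9312/properties/property_000000.html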
The js variable above is a list in which each element represents one entry; this can be verified using the Scrapy shell tool:
scrapy shell http://web:9312/properties/api.json
Run the spider: scrapy crawl api
You can see that a total of 31 requests were sent and 30 items were obtained (one request for api.json itself, plus one per entry).
On closer inspection with the Scrapy shell, you can see that besides the id field, each entry in the js variable also contains a title field. So the parse function can extract the title as well and pass its value to the parse_item function, where it is used to populate the item (eliminating the step of extracting the title with XPath in parse_item). Modify the parse function as follows:
title = item["title"]
# the meta attribute is a dictionary used to pass data to the callback function
yield Request(url, meta={"title": title}, callback=self.parse_item)
In the Parse_item function, you can extract this field from the response
l.add_value('title', response.meta['title'], MapCompose(unicode.strip, unicode.title))
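To show what the processors in add_value() do to the title pulled from response.meta, here is a simplified stand-in for Scrapy's MapCompose (the real one also flattens results and drops None values). The book targets Python 2 (unicode.strip / unicode.title); on Python 3 the equivalent built-ins are str.strip and str.title:

```python
# Simplified sketch of MapCompose: apply each function to every value in turn
def map_compose(*funcs):
    def process(values):
        for f in funcs:
            values = [f(v) for v in values]
        return values
    return process

# strip whitespace, then title-case, as in the add_value() call
clean = map_compose(str.strip, str.title)
print(clean(["  set unique family well  "]))  # ['Set Unique Family Well']
```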