Crawling AJAX-Requested Web Pages with Scrapy


My previous post (http://zhouxi2010.iteye.com/blog/1450177) introduced crawling web pages with Scrapy, but it only followed ordinary HTML links. Content that the page loads through AJAX requests was not captured. AJAX requests are very common on real sites, so this post records a method for crawling AJAX-loaded content.
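The method depends on the fact that AJAX content comes from a separate URL, which the spider can request itself once that URL is known. The endpoint can be read off the Network tab of the browser's developer tools, or pulled out of the page's inline JavaScript. Below is a minimal sketch of the second option. It assumes, hypothetically, that the page embeds the endpoint as a quoted string containing ajax=true (echoing the made-up URL in the spider below). The names AJAX_URL_RE and find_ajax_url are illustrative and are not part of Scrapy.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern: a quoted URL in inline JavaScript that contains "ajax=true".
AJAX_URL_RE = re.compile(r"""["']([^"']*\bajax=true[^"']*)["']""")


def find_ajax_url(page_url, html):
    """Return the absolute AJAX endpoint URL found in html, or None."""
    match = AJAX_URL_RE.search(html)
    if match is None:
        return None
    # Resolve relative paths such as "/callback.php?..." against the page URL.
    return urljoin(page_url, match.group(1))


html = '<script>$.get("/callback.php?id=42&ajax=true", show);</script>'
print(find_ajax_url("http://example.com/book/42.html", html))
# http://example.com/callback.php?id=42&ajax=true
```

Inside the spider, the returned URL would take the place of the hard-coded url in parse_item.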

The spider is still spiders/book.py:

Python code:

    import json

    from scrapy import Request
    from scrapy.spiders import CrawlSpider

    # Hypothetical project name; BookItem (with an ajaxdata field) lives in items.py.
    from myproject.items import BookItem


    class BookSpider(CrawlSpider):
        ...  # name, allowed_domains, start_urls, rules (elided in the original)

        def parse_item(self, response):
            item = BookItem()
            ...  # fill item fields with response.xpath(...) (elided in the original)

            # The URL here is made up; replace it with the real AJAX endpoint.
            url = "http://test_url/callback.php?ajax=true"
            # Once the AJAX URL is known, build a Request for it. Scrapy schedules
            # the request and calls parse_ajax with the response when it arrives.
            request = Request(url, callback=self.parse_ajax)
            # Carry the partially filled item over to the callback.
            request.meta["item"] = item
            yield request

        def parse_ajax(self, response):
            data = response.text
            # Extract the wanted data with a regular expression or response.xpath().
            # get_data() is a placeholder that the original leaves unimplemented.
            ajaxdata = get_data(data)
            # The response may be JavaScript. Instead of emulating a JS interpreter,
            # wrap the string in JSON and let json.loads() decode its escapes.
            if ajaxdata:
                x = '{"data": "' + ajaxdata.replace("\n", "") + '"}'
                ajaxdata = json.loads(x)["data"]
            else:
                ajaxdata = ""

            item = response.meta["item"]
            item["ajaxdata"] = ajaxdata
            # The original Python 2 code also encoded every unicode field to UTF-8
            # here; Python 3 strings need no such step.
            # All fields of the item are now filled in, so return it for saving.
            return item
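The spider leaves get_data() unimplemented. Below is a minimal sketch of it, together with the JSON decoding step. It rests on a hypothetical assumption: that the AJAX response is JavaScript of the form var data = "..."; with non-ASCII text written as \uXXXX escapes. The regex DATA_RE and the helper decode_js_string are illustrative names. Wrapping the captured text in a JSON string and calling json.loads() turns those escapes into real characters. Newlines are stripped first because a raw newline is not allowed inside a JSON string.

```python
import json
import re

# Hypothetical response shape: var data = "...";  (escaped quotes allowed inside)
DATA_RE = re.compile(r'var\s+data\s*=\s*"((?:[^"\\]|\\.)*)"')


def get_data(body):
    """Return the still-escaped string literal assigned to data, or '' if absent."""
    match = DATA_RE.search(body)
    return match.group(1) if match else ""


def decode_js_string(raw):
    """Decode JS-style escapes by parsing the text as a JSON string value."""
    if not raw:
        return ""
    # Same trick as the spider: wrap the text in a JSON object and let
    # json.loads() decode it. Raw newlines are invalid inside a JSON string.
    wrapped = '{"data": "' + raw.replace("\n", "") + '"}'
    return json.loads(wrapped)["data"]


body = 'var data = "\\u4e66\\u540d: Scrapy";'
print(decode_js_string(get_data(body)))  # 书名: Scrapy
```

The trick only covers escapes that JSON also understands (\", \\, \n, \uXXXX and so on). JavaScript-only forms such as \x41 or \' make json.loads() raise json.JSONDecodeError (a subclass of ValueError), so messier responses would need a real JavaScript parser.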
