Python uses scrapy to crawl site sitemap information _python

Source: Internet
Author: User

This article illustrates how Python uses scrapy to crawl site sitemap information. Share to everyone for your reference. Specifically as follows:

Import re from
scrapy.spider import basespider from
scrapy import log to
Scrapy.utils.response import body _or_str from
scrapy.http import Request from
scrapy.selector import Htmlxpathselector
class Sitemapspider (basespider):
 name = "Sitemapspider"
 start_urls = ["Http://www.domain.com/sitemap.xml"]
 def parse (self, Response):
  nodename = ' loc '
  text = body_or_str (response)
  R = Re.compile (r) (<%S[\S>]) (. *?) (</%s>) "% (Nodename,nodename), re. Dotall)
  for match in R.finditer (text):
   URL = match.group (2)
   yield Request (URL, callback=self.parse_ Page
 def parse_page (self, Response):
    HxS = htmlxpathselector (response)
    #Mock item
  blah = Item () c20/> #Do all your page parsing and selecting to elemtents you want blash.divtext
    = Hxs.select ('//div/text () '). Extrac T () [0]
  yield blah

I hope this article will help you with your Python programming.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.