Scrapy knowledge supplement--scrapy shell and Spider


What is a scrapy shell?

The Scrapy shell is an interactive terminal that lets us try out and debug scraping code without starting a spider, and test XPath or CSS expressions to see how they work, making it easy to extract data from a page.

Selector selector (Scrapy built-in)

Selector has four basic methods, the most commonly used being xpath():

    • xpath(): Takes an XPath expression and returns a list of selectors for all matching nodes
    • extract(): Serializes the matched nodes to Unicode strings and returns them as a list
    • css(): Takes a CSS expression and returns a list of selectors for all matching nodes
    • re(): Extracts data matching the given regular expression and returns a list of Unicode strings
What does the spider do? What is a spider?

The Spider class defines how to crawl a website (or group of websites). It includes the crawl actions (for example, whether to follow links) and how to extract structured data (items) from the pages. In other words, a spider is where you define the crawling behavior and parse web pages.

Properties and methods of the spider

Main properties and methods:

name: A string that defines the name of the spider. For example, if the spider crawls website.com, the spider is typically named website.

allowed_domains: An optional list of domains the spider is allowed to crawl.

start_urls: A tuple or list of initial URLs. When no specific URL is given, the spider starts crawling from this list.

start_requests(self): Returns an iterable containing the first Requests the spider will crawl (the default implementation uses the URLs in start_urls).

parse(self, response): The default callback for Request objects that do not specify one. It handles the response returned for the page and yields items or further Request objects.

How do I write a spider to crawl data?

