How to use the Scrapy shell to verify XPath selection results: a detailed tutorial

1. Scrapy Shell

The Scrapy shell is a handy interactive tool that ships with the Scrapy package, and I currently use it mainly to validate the results of XPath selections. Once Scrapy is installed, the Scrapy shell can be run directly from the Windows command prompt (CMD).

Scrapy Shell

The Scrapy shell is an interactive terminal that lets us try out and debug scraping code without starting a spider. It can also be used to test XPath or CSS expressions, see how they behave, and check that they extract the intended data from the pages we crawl.

If IPython is installed, the Scrapy shell uses IPython instead of the standard Python console. IPython is more powerful than the standard console, offering smart auto-completion, colorized output, and other features, so installing it is recommended.

Start Scrapy Shell

Enter the project's root directory and execute the following command to start the shell:

scrapy shell "http://www.itcast.cn/channel/teacher.shtml"

Based on the downloaded page, the Scrapy shell automatically creates some convenient objects, such as a Response object and a Selector object (for HTML and XML content).

Once the shell has loaded, a local variable named response holds the response data: entering response.body prints the response body, and entering response.headers shows the response headers.

Entering response.selector returns a Selector object initialized from the response. The response can then be queried with response.selector.xpath() or response.selector.css().

Scrapy also provides shortcuts, response.xpath() and response.css(), which work the same way as the calls above.

Selectors

Scrapy selectors have built-in support for XPath and CSS selector expressions.

Selector has four basic methods, the most common of which is xpath():

xpath(): takes an XPath expression and returns the selector list of all nodes matching the expression

extract(): serializes the matched nodes to Unicode strings and returns them as a list

css(): takes a CSS expression and returns the selector list of all nodes matching the expression; the CSS selector syntax is the same as that used in BeautifulSoup4

re(): extracts data using the regular expression passed in and returns a list of Unicode strings


2. IPython

The official documentation recommends running the Scrapy shell with IPython, so I tried installing it. Because my Python environment was previously set up with conda (see the previous article), it is convenient to install IPython through conda:

conda install -c conda-forge ipython

The whole IPython package is then downloaded. Because it is prebuilt, there is no annoying compilation step that might fail.

3. Running IPython and running the Scrapy shell in IPython

Because the system environment is already configured, Python packages can be run directly from the current CMD window, so typing ipython at the CMD prompt starts IPython inside that window. It looks much like the standard CMD prompt but is richer, with more colors and a nicer layout.

However, when I typed the scrapy shell command directly at the IPython prompt, it kept reporting that there was no such command. I was stuck here.

Later, I carefully reread the scrapy shell documentation:

If you have IPython installed, the scrapy shell would use it (instead of the standard Python console).

This means that the scrapy shell looks for IPython on its own.

So the scrapy shell <url> command should be entered directly at the standard CMD prompt, not inside IPython; the shell that comes back is then opened in IPython automatically.
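If the shell still opens in the plain Python console, one quick check is whether IPython is importable from the same environment that runs Scrapy. The helper below is a hypothetical diagnostic written for this article; it only uses the standard library and is not how Scrapy itself performs the detection.

```python
import importlib.util


def ipython_available() -> bool:
    """Return True if the IPython package can be found in this environment."""
    return importlib.util.find_spec("IPython") is not None


if ipython_available():
    print("IPython found: scrapy shell should open an IPython prompt")
else:
    print("IPython not found: scrapy shell should use the standard Python console")
```

Run it with the same Python interpreter that has Scrapy installed, for example from the conda environment used above.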
