1. Scrapy Shell
Scrapy Shell is a handy interactive tool for the Scrapy package; I currently use it mainly to validate the results of XPath selections. Once Scrapy is installed, you can run the Scrapy shell directly from CMD.
Scrapy Shell
The Scrapy shell is an interactive console that lets us try out and debug code without starting a spider, and test XPath or CSS expressions to see how they work against the data extracted from the pages we crawl.
If IPython is installed, the Scrapy shell will use it instead of the standard Python console. The IPython console is more powerful, providing intelligent auto-completion, syntax-highlighted output, and other features. (Installing IPython is recommended.)
Start Scrapy Shell
Enter the project's root directory and execute the following command to start the shell:
Scrapy Shell "http://www.itcast.cn/channel/teacher.shtml"
Based on the downloaded page, the Scrapy shell automatically creates some easy-to-use objects, such as the response object and Selector objects (for HTML and XML content).
When the shell has loaded, you get a local response variable containing the response data; entering response.body prints the body of the response, and response.headers shows the response headers.
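As a minimal sketch of what this looks like inside the shell (the exact status, headers, and body depend on the page you fetched):

response.status          # e.g. 200 if the download succeeded
response.headers         # dict-like object with the HTTP response headers
response.body[:200]      # first 200 bytes of the raw response body (bytes)
response.text[:200]      # the same content decoded to a string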
Entering response.selector returns a Selector object initialized from the response, and you can query the response with response.selector.xpath() or response.selector.css().
Scrapy also provides shortcuts, so response.xpath() and response.css() work the same way.
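For example, the following queries are equivalent (a sketch only; the li_txt class and the h3 tag are assumptions about the structure of the itcast teacher page and may have changed):

response.selector.xpath('//div[@class="li_txt"]/h3/text()')   # full form
response.xpath('//div[@class="li_txt"]/h3/text()')            # xpath shortcut
response.css('div.li_txt h3::text')                           # css shortcut, same nodes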
Selectors
Scrapy Selectors have a built-in mechanism for XPath and CSS selector expressions.
A Selector has four basic methods, the most common of which is xpath() (see the sketch after this list):
xpath(): takes an XPath expression and returns a selector list of all nodes matching the expression
extract(): serializes the matched nodes to Unicode strings and returns them as a list
css(): takes a CSS expression and returns a selector list of all nodes matching the expression, using standard CSS selector syntax (similar to BeautifulSoup4's select())
re(): extracts data using the given regular expression and returns a list of Unicode strings
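A quick sketch of the four methods used together (the div class name is an assumption about the page structure and only illustrates the calls):

teachers = response.xpath('//div[@class="li_txt"]')     # xpath(): returns a SelectorList
names = teachers.css('h3::text')                         # css(): CSS expression applied to the matched nodes
names.extract()                                          # extract(): list of Unicode strings
response.xpath('//title/text()').re(r'\S+')              # re(): regex extraction, returns strings directly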
2. IPython
The official site recommends running the Scrapy shell with IPython, so I tried installing it. Because my Python environment was previously set up with conda (see the previous article), it is convenient to install IPython through conda:
conda install -c conda-forge ipython
Conda then downloads the whole IPython package as a prebuilt binary, so there is no annoying risk of the build failing during compilation.
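Once the install finishes, you can confirm it works from CMD:

ipython --version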
3. Run IPython and run the Scrapy shell in IPython
Because the system environment variables are already configured, Python packages can be run straight from the CMD window; typing ipython in CMD drops you into the IPython console, which is much like the standard CMD prompt but richer, with more color and a nicer layout.
But when I typed the scrapy shell command directly inside IPython, it kept complaining that there was no such command, and it failed. I got stuck here.
Later, after reading the Scrapy shell documentation carefully:
If you have IPython installed, the Scrapy shell will use it (instead of the standard Python console).
This means that the Scrapy shell looks for an installed IPython on its own and uses it as its console.
So you enter scrapy shell <url> directly in the standard CMD window, and the shell it opens runs directly inside an IPython console.
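Putting it together, the workflow looks roughly like this (a sketch; the In [n]: prompts are how you can tell IPython was picked up automatically). First, in the standard CMD window:

scrapy shell "http://www.itcast.cn/channel/teacher.shtml"

Then, at the IPython prompt that opens:

In [1]: response.status
In [2]: response.xpath('//title/text()').extract()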