Python crawler: using Scrapy (part 1)

Source: Internet
Author: User

Here we introduce the installation and use of Scrapy, a crawler framework for Python. Mediocrity is like a stain on a white shirt: once it sets in, it never washes out and cannot be undone.

Installation and use of Scrapy

My environment is 64-bit Windows 10 with Python 3.6.3. What follows is a first example of installing and learning Scrapy.

First, installing Scrapy

Run the following command directly:

pip install scrapy

Because Microsoft Visual C++ 14.0 is not installed on my computer, the following error appears:

    building 'twisted.test.raiser' extension
    error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": http://landinghub.visualstudio.com/visual-cpp-build-tools

There are two ways to solve this. One is to install Microsoft Visual C++ Build Tools, but that is a large download, so I do not use it here. The other is to install a precompiled Twisted wheel directly. Precompiled Python libraries can be found at https://www.lfd.uci.edu/~gohlke/pythonlibs. Find the Twisted library that Scrapy needs there. In the file name, cp36 means CPython 3.6 and amd64 means 64-bit.
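
To pick the matching file from that page, it helps to know which tags the local interpreter corresponds to. The following is a minimal sketch using only the standard library. The function name expected_wheel_tags is made up for illustration, and the mapping to win_amd64 or win32 assumes CPython on Windows, as in the setup described here.

```python
import struct
import sys


def expected_wheel_tags():
    """Return (python_tag, platform_tag) for the running interpreter.

    Assumes CPython on Windows; other platforms use different platform tags.
    """
    # For example "cp36" for CPython 3.6
    python_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    # Pointer size in bits: 64 on a 64-bit interpreter, 32 on a 32-bit one
    bits = struct.calcsize("P") * 8
    platform_tag = "win_amd64" if bits == 64 else "win32"
    return python_tag, platform_tag


if __name__ == "__main__":
    print(expected_wheel_tags())
```

Note that the bitness that matters is that of the Python interpreter, not of the operating system: a 32-bit Python on 64-bit Windows still needs the win32 file.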

After downloading the wheel, run the following command to install Twisted:

pip install D:\360DOWNLOAD\Twisted-17.9.0-cp36-cp36m-win_amd64.whl

Finally, run pip install scrapy again. This time it installs successfully.

The .whl (wheel) format is essentially a zip archive that contains the .py files and the compiled .pyd files. It lets you install a build that matches your Python environment without needing a compiler on your machine.
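
Because a wheel is a zip archive, its contents can be listed with the standard zipfile module. The sketch below is only an illustration: list_wheel_contents and build_demo_archive are made-up helper names, and the demo archive with its demo_pkg files is a stand-in built in memory, not the real Twisted wheel.

```python
import io
import zipfile


def list_wheel_contents(wheel):
    """Return the file names stored in a wheel, given a path or file object."""
    with zipfile.ZipFile(wheel) as archive:
        return archive.namelist()


def build_demo_archive():
    """Build a tiny in-memory zip that mimics a wheel's layout (made-up names)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("demo_pkg/__init__.py", "")
        # Stands in for a compiled extension module
        archive.writestr("demo_pkg/_speedups.pyd", b"")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name in list_wheel_contents(build_demo_archive()):
        print(name)
```

To inspect a downloaded wheel instead, pass its path to list_wheel_contents.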

Second, running a first Scrapy example

Create a Python file named quotes_spider.py with the following content:

    import scrapy


    class QuotesSpider(scrapy.Spider):
        name = "quotes"
        start_urls = [
            'http://quotes.toscrape.com/tag/humor/',
        ]

        def parse(self, response):
            # Each quote on the page sits in a <div class="quote"> element
            for quote in response.css('div.quote'):
                yield {
                    'text': quote.css('span.text::text').extract_first(),
                    'author': quote.xpath('span/small/text()').extract_first(),
                }

            # Follow the "next" link, if any, and parse that page the same way
            next_page = response.css('li.next a::attr("href")').extract_first()
            if next_page is not None:
                yield response.follow(next_page, self.parse)
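
The end of the spider follows the "next" link. response.follow accepts a relative URL and resolves it against the URL of the current response. The sketch below illustrates that resolution step with the standard library only. It is not Scrapy's own code, resolve_next_page is a made-up name, and the href value is an assumption about the page markup.

```python
from urllib.parse import urljoin


def resolve_next_page(page_url, href):
    """Turn the href of a "next" link into an absolute URL."""
    return urljoin(page_url, href)


if __name__ == "__main__":
    page_url = "http://quotes.toscrape.com/tag/humor/"
    next_href = "/tag/humor/page/2/"  # assumed value of the "next" link's href
    print(resolve_next_page(page_url, next_href))
```

When extract_first() finds no "next" link it returns None, so the spider stops after the last page.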

Run the following command in the directory that contains the file:

scrapy runspider quotes_spider.py -o quotes.json

The following error will occur:

    ModuleNotFoundError: No module named 'win32api'

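The win32api module is part of pywin32, which needs to be installed. Installers are available at https://sourceforge.net/projects/pywin32/files/pywin32/Build%20221/. Choose the installer that matches your Python version and architecture (Python 3.6, 64-bit in my case).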
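
Before rerunning the spider, a quick check that the module can be found may save a round trip. This is a minimal sketch using only the standard library. module_available is a made-up helper name, and it is meant for top-level module names such as win32api.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module with this name can be imported."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    for module_name in ("win32api", "twisted", "scrapy"):
        status = "found" if module_available(module_name) else "missing"
        print(module_name, status)
```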

After installing it, rerun scrapy runspider quotes_spider.py -o quotes.json. The quotes.json file is now generated, with the following contents:

[
{"text": "\u201cThe person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.\u201d", "author": "Jane Austen"},
{"text": "\u201cA day without sunshine is like, you know, night.\u201d", "author": "Steve Martin"},
{"text": "\u201cAnyone who thinks sitting in church can make you a Christian must also think that sitting in a garage can make you a car.\u201d", "author": "Garrison Keillor"},
{"text": "\u201cBeauty is in the eye of the beholder and it may be necessary from time to time to give a stupid or misinformed beholder a black eye.\u201d", "author": "Jim Henson"},
{"text": "\u201cAll you need is love. But a little chocolate now and then doesn't hurt.\u201d", "author": "Charles M. Schulz"},
{"text": "\u201cRemember, we're madly in love, so it's all right to kiss me anytime you feel like it.\u201d", "author": "Suzanne Collins"},
{"text": "\u201cSome people never go crazy. What truly horrible lives they must lead.\u201d", "author": "Charles Bukowski"},
{"text": "\u201cThe trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it.\u201d", "author": "Terry Pratchett"},
{"text": "\u201cThink left and think right and think low and think high. Oh, the thinks you can think up if only you try!\u201d", "author": "Dr. Seuss"},
{"text": "\u201cThe reason I talk to myself is because I\u2019m the only one whose answers I accept.\u201d", "author": "George Carlin"},
{"text": "\u201cI am free of all prejudice. I hate everyone equally. \u201d", "author": "W.C. Fields"},
{"text": "\u201cA lady's imagination is very rapid; it jumps from admiration to love, from love to matrimony in a moment.\u201d", "author": "Jane Austen"}
]
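
The \u201c and \u201d sequences in the file are JSON escapes for curly quotation marks, so the file reads back as normal text. The sketch below shows one way to load such a feed and strip the surrounding quotation marks. load_quotes is a made-up helper name, and SAMPLE is a two-entry excerpt kept inline so the code runs without the file.

```python
import json
from collections import Counter

# Two entries in the same shape as quotes.json. With the real file,
# read its text with open("quotes.json", encoding="utf-8") instead.
SAMPLE = r'''[
{"text": "\u201cA day without sunshine is like, you know, night.\u201d", "author": "Steve Martin"},
{"text": "\u201cSome people never go crazy. What truly horrible lives they must lead.\u201d", "author": "Charles Bukowski"}
]'''


def load_quotes(json_text):
    """Parse the feed and strip the curly quotation marks around each text."""
    quotes = json.loads(json_text)
    for quote in quotes:
        quote["text"] = quote["text"].strip("\u201c\u201d ")
    return quotes


if __name__ == "__main__":
    quotes = load_quotes(SAMPLE)
    # Count how many quotes each author contributed
    print(Counter(quote["author"] for quote in quotes))
    print(quotes[0]["text"])
```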

Related links
    • Scrapy official documentation: https://docs.scrapy.org/en/latest/
    • Precompiled Python packages: https://www.lfd.uci.edu/~gohlke/pythonlibs
