This post covers installing and using Scrapy, Python's crawler framework. Mediocrity is like a stain on a white shirt: once it sets in, it never washes out and cannot be undone.
Installation and use of Scrapy
My environment is 64-bit Windows 10 with Python 3.6.3. What follows is my first attempt at installing and learning Scrapy.
First, preparing to install Scrapy
Run the following command directly:
pip install scrapy
Because Microsoft Visual C++ 14.0 is not installed on my computer, the following error appears:
building 'twisted.test.raiser' extension
error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": http://landinghub.visualstudio.com/visual-cpp-build-tools
There are two ways to solve this. One is to install Microsoft Visual C++ Build Tools, but that is a large download, so I do not use it here. The other is to install a precompiled build of Twisted directly. Precompiled Python libraries can be found at https://www.lfd.uci.edu/~gohlke/pythonlibs. Look for the Twisted library that Scrapy needs and pick the file matching your environment: cp36 stands for Python 3.6, and amd64 stands for 64-bit.
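To check which build a given file targets, the tags can be read straight from the file name. The following is a minimal sketch rather than a full parser: the helper name parse_wheel_filename is my own, and it assumes the common five-part layout (name, version, Python tag, ABI tag, platform tag) without the optional build tag.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated tags (five-part layout only)."""
    if not filename.lower().endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    stem = filename[:-len(".whl")]
    parts = stem.split("-")
    # A wheel name may also carry an optional build tag (six parts);
    # this sketch deliberately handles only the five-part case.
    if len(parts) != 5:
        raise ValueError("unexpected wheel file name layout: " + filename)
    name, version, python_tag, abi_tag, platform_tag = parts
    return {
        "name": name,
        "version": version,
        "python_tag": python_tag,      # cp36 -> CPython 3.6
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,  # win_amd64 -> 64-bit Windows
    }


info = parse_wheel_filename("Twisted-17.9.0-cp36-cp36m-win_amd64.whl")
for key, value in info.items():
    print(key, "=", value)
```

If the Python tag or platform tag does not match your interpreter, pip will refuse to install the file, so it is worth checking before downloading.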
After downloading the file, run the following command to install Twisted:
pip install D:\360DOWNLOAD\Twisted-17.9.0-cp36-cp36m-win_amd64.whl
Finally, run pip install scrapy again; this time the installation succeeds.
A .whl file is essentially a compressed archive that contains .py files and compiled .pyd files. It lets you install a build that matches your Python environment without needing a compiler on your machine.
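Since a wheel is a zip archive, Python's standard zipfile module can open one. The sketch below does not touch a real wheel: it builds a tiny stand-in archive in memory, and the member names (demo_pkg and its files) are made up purely for illustration.

```python
import io
import zipfile

# Build a small stand-in archive in memory. The member names are
# hypothetical; a real wheel would hold the package's actual files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("demo_pkg/__init__.py", "print('pure Python module')\n")
    archive.writestr("demo_pkg/_speedups.pyd", b"placeholder bytes, not a real binary")

# The same module that wrote the archive can recognise and list it.
looks_like_zip = zipfile.is_zipfile(buffer)
with zipfile.ZipFile(buffer) as archive:
    members = archive.namelist()

print(looks_like_zip)
for member in members:
    print(member)
```

To inspect a downloaded wheel the same way, pass its path to zipfile.ZipFile instead of the in-memory buffer.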
Second, running a first Scrapy example
Create a Python file named quotes_spider.py with the following content:
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/tag/humor/',
    ]

    def parse(self, response):
        # Each quote on the page sits in a div with class "quote".
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').extract_first(),
                'author': quote.xpath('span/small/text()').extract_first(),
            }

        # Follow the "next" link, if there is one, and parse that page too.
        next_page = response.css('li.next a::attr("href")').extract_first()
        if next_page is not None:
            yield response.follow(next_page, self.parse)
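The last two lines of the spider handle pagination: the href of the "next" link is usually a relative path, and response.follow accepts it as is. To see roughly what that resolution amounts to, here is a small standard-library sketch. It is not Scrapy's own code, the helper name resolve_next_page is mine, and the href value is an assumed example of what the page might contain.

```python
from urllib.parse import urljoin


def resolve_next_page(page_url, href):
    """Turn a possibly relative href into an absolute URL, or None if there is no link."""
    if href is None:
        return None
    return urljoin(page_url, href)


page_url = "http://quotes.toscrape.com/tag/humor/"
next_href = "/tag/humor/page/2/"  # assumed href value, for illustration only

absolute_url = resolve_next_page(page_url, next_href)
print(absolute_url)

# On the last page there is no "next" link, so nothing more is requested.
print(resolve_next_page(page_url, None))
```

This is why the spider only needs the None check: when no "next" link is found, no further request is yielded and the crawl ends.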
Run the following command in the directory that contains the file:
scrapy runspider quotes_spider.py -o quotes.json
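The following error occurs, reporting that win32api cannot be imported:
Import 'win32api'
The win32api module is part of pywin32, which can be downloaded from https://sourceforge.net/projects/pywin32/files/pywin32/Build%20221/. Choose the installer that matches your Python version and architecture and install it.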
After installing it, rerun scrapy runspider quotes_spider.py -o quotes.json. This time the quotes.json file is generated successfully. Its contents are as follows:
[
{"text":"\u201cThe person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.\u201d","author":"Jane Austen"},
{"text":"\u201cA day without sunshine is like, you know, night.\u201d","author":"Steve Martin"},
{"text":"\u201cAnyone who thinks sitting in church can make you a Christian must also think that sitting in a garage can make you a car.\u201d","author":"Garrison Keillor"},
{"text":"\u201cBeauty is in the eye of the beholder and it may be necessary from time to time to give a stupid or misinformed beholder a black eye.\u201d","author":"Jim Henson"},
{"text":"\u201cAll you need is love. But a little chocolate now and then doesn't hurt.\u201d","author":"Charles M. Schulz"},
{"text":"\u201cRemember, we're madly in love, so it's all right to kiss me anytime you feel like it.\u201d","author":"Suzanne Collins"},
{"text":"\u201cSome people never go crazy. What truly horrible lives they must lead.\u201d","author":"Charles Bukowski"},
{"text":"\u201cThe trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it.\u201d","author":"Terry Pratchett"},
{"text":"\u201cThink left and think right and think low and think high. Oh, the thinks you can think up if only you try!\u201d","author":"Dr. Seuss"},
{"text":"\u201cThe reason I talk to myself is because I\u2019m the only one whose answers I accept.\u201d","author":"George Carlin"},
{"text":"\u201cI am free of all prejudice. I hate everyone equally. \u201d","author":"W.C. Fields"},
{"text":"\u201cA lady's imagination is very rapid; it jumps from admiration to love, from love to matrimony in a moment.\u201d","author":"Jane Austen"}
]
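The \u201c and \u201d sequences in the file are JSON escapes for curly quotation marks, so the output reads normally once it is loaded. As a quick check, the sketch below loads a shortened three-item sample of the output from a string and counts quotes per author; the variable names are my own, and only the standard json and collections modules are used.

```python
import json
from collections import Counter

# A shortened sample of the spider's output, kept as raw text so the
# \u201c / \u201d escapes reach the JSON parser unchanged.
sample = r"""
[
{"text":"\u201cThe person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.\u201d","author":"Jane Austen"},
{"text":"\u201cA day without sunshine is like, you know, night.\u201d","author":"Steve Martin"},
{"text":"\u201cA lady's imagination is very rapid; it jumps from admiration to love, from love to matrimony in a moment.\u201d","author":"Jane Austen"}
]
"""

items = json.loads(sample)

# Count how many quotes each author contributed.
counts = Counter(item["author"] for item in items)
for author, total in counts.most_common():
    print(author, total)

# The escapes are decoded into real curly quotes after loading.
print(items[1]["text"])
```

To run the same check on the real file, replace json.loads(sample) with json.load on quotes.json opened with encoding="utf-8".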
Related links
- Official Scrapy documentation: https://docs.scrapy.org/en/latest/
- Precompiled Python packages: https://www.lfd.uci.edu/~gohlke/pythonlibs
Python crawler: using Scrapy (part 1)