Scrapy: a fast, high-level screen-scraping and web-crawling framework
Official website: http://scrapy.org/
Documentation: https://docs.scrapy.org/en/latest/
Installation: installing Scrapy on Win7 (2017-10-19)
Current environment: Win7, Python 3.6.0, PyCharm 4.5. The Python directory is c:/python3/
Scrapy depends on quite a few libraries; at a minimum it needs Twisted 14.0, lxml 3.4, and pyOpenSSL 0.14.
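To see whether an environment already meets these minimums, a small check along the following lines may help. This is only a sketch: it assumes Python 3.8 or newer for importlib.metadata (the 3.6 interpreter in these notes would need the importlib_metadata backport), and the helper names as_tuple, meets_minimum and report are made up for illustration.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions quoted in the notes above.
MINIMUMS = {"Twisted": "14.0", "lxml": "3.4", "pyOpenSSL": "0.14"}


def as_tuple(text, width=3):
    """Turn '17.9.0' into (17, 9, 0), padding short versions with zeros."""
    numbers = []
    for piece in text.split(".")[:width]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers + [0] * (width - len(numbers)))


def meets_minimum(installed, minimum):
    """Compare two dotted version strings numerically."""
    return as_tuple(installed) >= as_tuple(minimum)


def report():
    """Return {package: status string} for every dependency in MINIMUMS."""
    result = {}
    for name, minimum in MINIMUMS.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            result[name] = "not installed"
            continue
        if meets_minimum(installed, minimum):
            result[name] = "%s (ok)" % installed
        else:
            result[name] = "%s (too old, need %s)" % (installed, minimum)
    return result


if __name__ == "__main__":
    for name, status in report().items():
        print(name, "->", status)
```

The comparison is deliberately simple (numeric parts only), which is enough for plain versions such as 17.9.0.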
Reference article: http://www.cnblogs.com/liuliliuli2017/p/6746440.html (installing the Scrapy crawler framework in a Python 3 environment: the process and common errors)
I ran into trouble installing Twisted. The steps I took to resolve it:
1. Download the .whl file from http://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted (important: this site hosts a lot of .whl files!).
My machine runs 64-bit Win7, so in theory Twisted-17.9.0-cp36-cp36m-win_amd64.whl should have been the right file, but the install was refused. On a lucky guess I downloaded Twisted-17.9.0-cp36-cp36m-win32.whl instead and saved it as C:\Python3\Scripts\Twisted-17.9.0-cp36-cp36m-win32.whl.
Run: python pip3.exe install Twisted-17.9.0-cp36-cp36m-win32.whl
Then run: python pip.exe install scrapy, and the installation succeeded.
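One possible explanation for the win_amd64 wheel being refused (an assumption, the notes do not confirm it) is that the installed Python is a 32-bit build: pip matches a wheel's platform tag against the interpreter, not against the operating system. The sketch below reports the interpreter's bitness and the tags a matching wheel would carry on an x86 Windows machine; the function names are made up for illustration.

```python
import struct
import sys


def interpreter_bits():
    """Return 32 or 64 depending on the pointer size of this Python build."""
    return struct.calcsize("P") * 8


def windows_platform_tag(bits):
    """Map the interpreter bitness to the platform tag used in Windows wheel names."""
    return "win_amd64" if bits == 64 else "win32"


def python_tag(version_info=sys.version_info):
    """Build the CPython tag, e.g. 'cp36' for Python 3.6."""
    return "cp%d%d" % (version_info[0], version_info[1])


if __name__ == "__main__":
    bits = interpreter_bits()
    print("Python build: %d-bit" % bits)
    print("Look for wheels tagged %s and %s" % (python_tag(), windows_platform_tag(bits)))
```

If this prints 32-bit on a 64-bit Windows, the win32 wheel is the one that fits that interpreter.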
Learning notes:
cd c:\Python3\zz\    # C:\Python3\zz\ is the folder where I keep my projects
python c:/python3/scripts/scrapy.exe startproject plant    # create a crawler project called plant
C:\Python3\zz\plant\
├scrapy.cfg: Configuration file for the project.
├plant/: The project's Python module. You will add your code here.
├plant/items.py: The items file in the project.
├plant/pipelines.py: The pipelines file in the project.
├plant/settings.py: The settings file for the project.
└plant/spiders/: The directory where the spider code is placed.
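A quick way to confirm that the project was generated with the layout listed above is to test for each entry. A minimal sketch, assuming the project root is the C:\Python3\zz\plant folder from these notes; missing_entries is a made-up helper name.

```python
from pathlib import Path

# Entries from the project tree above, relative to the project root.
EXPECTED = [
    "scrapy.cfg",
    "plant/items.py",
    "plant/pipelines.py",
    "plant/settings.py",
    "plant/spiders",
]


def missing_entries(project_root):
    """Return the expected files or folders that do not exist under project_root."""
    root = Path(project_root)
    return [entry for entry in EXPECTED if not (root / entry).exists()]


if __name__ == "__main__":
    missing = missing_entries(r"C:\Python3\zz\plant")
    print("Missing:", missing if missing else "nothing")
```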
Edit items.py
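import scrapy


class DmozItem(scrapy.Item):
    # The field names were lost from these notes; title, link and desc are
    # assumed here, following the DmozItem example in the older Scrapy tutorial.
    title = scrapy.Field()
    link = scrapy.Field()
    desc = scrapy.Field()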
Write the first crawler (spider): create the file C:\Python3\zz\plant\plant\spiders\quotes_spider.py
The next two steps follow the tutorial at https://doc.scrapy.org/en/latest/intro/tutorial.html#creating-a-project, but they failed with an error on this machine. I will try again tomorrow.
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)
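The file name in the parse callback comes from splitting the URL on "/" and taking the second-to-last piece, which works because the tutorial URLs end with a slash. The same logic can be tried without running a crawl; page_from_url and filename_for are made-up names for this sketch.

```python
def page_from_url(url):
    """Return the second-to-last '/'-separated piece of the URL."""
    return url.split("/")[-2]


def filename_for(url):
    """Build the output file name the same way the spider's parse method does."""
    return "quotes-%s.html" % page_from_url(url)


if __name__ == "__main__":
    for url in ("http://quotes.toscrape.com/page/1/",
                "http://quotes.toscrape.com/page/2/"):
        print(url, "->", filename_for(url))
```

For a URL without the trailing slash, the second-to-last piece would be "page" rather than the page number, so the approach depends on the URL shape used in the tutorial.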
Go to the project folder and run:
cd C:\Python3\zz\plant
scrapy crawl quotes
....
Python library: Scrapy (unfinished, to be continued)