Today I finally got Scrapy installed and working after a lot of fumbling around, so I'm sharing the steps here.
In fact, the best teacher is Scrapy's own help documentation; once you understand it and follow along, everything falls into place.
The help document can be downloaded from http://download.csdn.net/detail/flyinghorse_2012/9566467
0. Create a new folder named Test to store the related files.
1. Create a Scrapy project
Run the following command:
scrapy startproject tutorial
The result looks like this:
2. Create a spider
Run the following command:
scrapy genspider dmoz dmoz.org
Command format: scrapy genspider spidername spiderwebsite
spidername must be unique; spiderwebsite can be chosen freely and corresponds to allowed_domains in dmoz.py.
The result looks like this:
3. Modify items.py
Find ....\test\tutorial\tutorial\items.py and change the file contents to:

import scrapy

class TutorialItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()
    desc = scrapy.Field()
Save.
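To see why the item class is useful, note that a scrapy.Item behaves like a dict, except that only the declared fields (title, link, desc here) may be assigned. A rough stdlib-only sketch of that behavior (the Item/TutorialItem classes below are simplified stand-ins, not Scrapy's actual implementation):

```python
# Hypothetical sketch: a dict-like container that only accepts declared fields,
# mimicking how scrapy.Item rejects undeclared keys.
class Item:
    fields = ()

    def __init__(self):
        self._values = {}

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError(f"{key!r} is not a declared field")
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]

class TutorialItem(Item):
    fields = ("title", "link", "desc")

item = TutorialItem()
item["title"] = "Python Books"   # allowed: "title" is declared
print(item["title"])             # prints "Python Books"
```

Assigning to an undeclared key (e.g. item["author"]) raises a KeyError, which is the same kind of safety Scrapy's real Item gives you.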
4. Modify dmoz.py
Find ....\test\tutorial\tutorial\spiders\dmoz.py and change the file contents to:

# -*- coding: utf-8 -*-
import scrapy

class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = (
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/",
    )

    def parse(self, response):
        filename = response.url.split("/")[-2] + ".html"
        with open(filename, "wb") as f:
            f.write(response.body)
Save.
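The filename logic in parse() is worth a closer look: because the start URLs end with a slash, splitting on "/" leaves an empty string as the last element, so [-2] picks up the last path segment. A quick stdlib-only check:

```python
# Reproduce the filename derivation from parse() on one of the start URLs.
url = "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/"
parts = url.split("/")
# parts ends with ['...', 'Python', 'Books', ''] because of the trailing slash,
# so [-2] is the last real path segment.
filename = parts[-2] + ".html"
print(filename)  # prints "Books.html"
```

This is why the crawl produces files named Books.html and Resources.html.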
5. Run the crawler
scrapy crawl dmoz
Command format: scrapy crawl spidername
spidername is the spider name from step 2.
The result looks like this:
Two HTML files have been generated successfully, confirming that the page contents were crawled.
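Once the crawl finishes, you can sanity-check the saved pages without Scrapy, e.g. by pulling the <title> out of a page with the standard library. (The sample HTML string below is a stand-in; in practice you would read Books.html from disk.)

```python
from html.parser import HTMLParser

# Minimal stdlib sketch: extract the <title> text from a saved HTML page.
class TitleParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

# Stand-in for the contents of Books.html:
html = "<html><head><title>Python Books</title></head><body>...</body></html>"
parser = TitleParser()
parser.feed(html)
print(parser.title)  # prints "Python Books"
```

For real extraction work you would instead use Scrapy's selectors inside parse(), but this is a quick way to confirm the crawl actually saved meaningful pages.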