If installing Scrapy fails with an error, use Anaconda3 as the Python environment instead, and search online for Scrapy installation instructions to troubleshoot the error yourself.
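Once Scrapy is installed in the Anaconda3 environment, it helps to confirm that the interpreter you actually run can import it, since a broken dependency often shows up only at import time. The sketch below uses only the standard library. The function name `scrapy_version` is hypothetical, and it relies on the `__version__` attribute that Scrapy exposes.

```python
import importlib
from typing import Optional


def scrapy_version() -> Optional[str]:
    """Return Scrapy's version string, or None if it cannot be imported."""
    try:
        # Importing (not just locating) the package also surfaces missing dependencies
        scrapy = importlib.import_module("scrapy")
    except ImportError:
        return None
    return getattr(scrapy, "__version__", "unknown")


if __name__ == "__main__":
    version = scrapy_version()
    print("Scrapy is not importable" if version is None else f"Scrapy {version}")
```

Run it with the same interpreter your IDE or cmd session uses. If it reports that Scrapy is not importable, the package was most likely installed into a different environment.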
To create a Scrapy crawler project:
Open cmd, change to the directory where the project should live, and enter:
scrapy startproject stockstar
The generated project contains:
spiders/: the directory for spider code (this is where the crawlers are written)
items.py: the project's item definitions (containers for the scraped data, which is stored much like a Python dictionary)
middlewares.py: the project's middlewares (a simple mechanism for extending Scrapy by inserting custom code)
pipelines.py: the project's item pipelines (where scraped items are processed)
settings.py: the project's settings
scrapy.cfg: the project's configuration file
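To check that `scrapy startproject` produced the files listed above, you can compare the directory against the standard template layout. This is a minimal sketch: `EXPECTED` mirrors the default project template, and `missing_project_files` is a hypothetical helper, not part of Scrapy.

```python
from pathlib import Path
from typing import List

# Files created by `scrapy startproject <name>` in the default template
EXPECTED = [
    "scrapy.cfg",
    "{name}/__init__.py",
    "{name}/items.py",
    "{name}/middlewares.py",
    "{name}/pipelines.py",
    "{name}/settings.py",
    "{name}/spiders/__init__.py",
]


def missing_project_files(root: str, name: str) -> List[str]:
    """List the expected template files that are absent under the project root."""
    base = Path(root)
    return [
        rel.format(name=name)
        for rel in EXPECTED
        if not (base / rel.format(name=name)).is_file()
    ]


if __name__ == "__main__":
    # Run from the folder in which `scrapy startproject stockstar` was executed
    print(missing_project_files("stockstar", "stockstar"))
```

An empty list means every template file is in place.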
After the project is created, settings.py contains these lines:
```python
# Obey robots.txt rules
ROBOTSTXT_OBEY = True
```
Sometimes this needs to be turned off by setting it to False.
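With `ROBOTSTXT_OBEY = True`, Scrapy fetches each site's robots.txt and skips any URL the rules disallow for its user agent. Scrapy uses its own robots.txt parser internally. The sketch below uses the standard library's `urllib.robotparser` only to illustrate the same allow/deny check. The rules in `SAMPLE_ROBOTS` are invented for illustration and are not stockstar.com's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, url: str, user_agent: str = "Scrapy") -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed(SAMPLE_ROBOTS, "http://quote.stockstar.com/stock/ranklist_a_3_1_1.html"))
    print(allowed(SAMPLE_ROBOTS, "http://quote.stockstar.com/private/data.html"))
```

When a crawl mysteriously yields nothing and the log shows requests "Forbidden by robots.txt", this check is the reason. Setting the option to False disables it.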
In PyCharm, right-click the project folder and choose Mark Directory as > Sources Root from the context menu. This lets modules inside that folder be imported by their short names, which makes import statements more concise.
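Marking a folder as Sources Root makes PyCharm resolve imports relative to that folder. Outside the IDE, the same effect comes from putting the folder on `sys.path`. The sketch below demonstrates that mechanism. The helper name `import_from` is hypothetical, and this is not how PyCharm itself is implemented.

```python
import importlib
import sys
from types import ModuleType


def import_from(directory: str, module_name: str) -> ModuleType:
    """Make directory an import root (as Sources Root does), then import module_name."""
    if directory not in sys.path:
        sys.path.insert(0, directory)
    # Drop cached directory listings so newly added paths and files are seen
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

For example, `import_from("path/to/stockstar/stockstar", "items")` would make a bare `items` module importable. The path here is a placeholder.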
1. Define an item container:
Write the following in items.py:
```python
# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy
from scrapy.loader import ItemLoader

try:
    from itemloaders.processors import TakeFirst  # newer Scrapy (2.x)
except ImportError:
    from scrapy.loader.processors import TakeFirst  # older Scrapy


class StockstarItemLoader(ItemLoader):
    # Custom ItemLoader for the field content crawled by the spider;
    # TakeFirst turns each extracted list into a single value
    default_output_processor = TakeFirst()


class StockstarItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    code = scrapy.Field()            # stock code
    abbr = scrapy.Field()            # stock abbreviation
    last_trade = scrapy.Field()      # latest price
    chg_ratio = scrapy.Field()       # change (%)
    chg_amt = scrapy.Field()         # change amount
    chg_ratio_5min = scrapy.Field()  # 5-minute gain
    volumn = scrapy.Field()          # volume
    turn_over = scrapy.Field()       # turnover
```
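`default_output_processor = TakeFirst()` is what makes each field a single value rather than a list. CSS extraction always returns a list of matches, and `TakeFirst` keeps the first one that is neither `None` nor an empty string. The stdlib-only function below mimics that behaviour for illustration. `take_first` is a hypothetical name, not Scrapy's class, and the sample values are made up.

```python
from typing import Iterable, Optional


def take_first(values: Iterable[Optional[str]]) -> Optional[str]:
    """Mimic TakeFirst: return the first value that is not None and not ''."""
    for value in values:
        if value is not None and value != "":
            return value
    # Nothing usable was extracted
    return None


if __name__ == "__main__":
    # What add_css("code", ...) might collect for one table row (sample data)
    print(take_first(["", "600000", "extra"]))
```

Without an output processor, every field in items.json would be a one-element list such as `["600000"]`.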
Add the following to settings.py:
```python
from scrapy.exporters import JsonItemExporter


# By default, Chinese text is exported as hard-to-read Unicode escapes.
# This subclass passes ensure_ascii=False to the parent class so the
# original characters are written.
class CustomJsonLinesItemExporter(JsonItemExporter):
    def __init__(self, file, **kwargs):
        super().__init__(file, ensure_ascii=False, **kwargs)


# Enable the newly defined exporter class for the 'json' feed format
FEED_EXPORTERS = {
    'json': 'stockstar.settings.CustomJsonLinesItemExporter',
}

# Wait about 0.25 seconds between consecutive requests to the same site
DOWNLOAD_DELAY = 0.25
```
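The custom exporter is needed because JSON serialization escapes non-ASCII characters by default. Scrapy's JSON encoder builds on the standard library's `json` encoder, so `json.dumps` shows the same difference that `ensure_ascii=False` makes. The Chinese string below is just a sample value. In newer Scrapy versions, setting `FEED_EXPORT_ENCODING = 'utf-8'` in settings.py is another way to get readable JSON output.

```python
import json

name = "平安银行"  # sample Chinese stock abbreviation, for illustration

escaped = json.dumps({"abbr": name})                       # default: ensure_ascii=True
readable = json.dumps({"abbr": name}, ensure_ascii=False)  # keeps the original characters

print(escaped)   # non-ASCII characters appear as \uXXXX escapes
print(readable)
```

Both strings decode to the same data. Only the on-disk readability differs.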
In cmd, change into the project directory and enter:
scrapy genspider stock quote.stockstar.com
This generates the spider code in spiders/stock.py. Edit it as follows:
```python
# -*- coding: utf-8 -*-
import scrapy
# The short form "from items import ..." only works when the package folder
# is on sys.path (e.g. marked as Sources Root); the full path works with scrapy crawl
from stockstar.items import StockstarItem, StockstarItemLoader


class StockSpider(scrapy.Spider):
    name = 'stock'  # spider name
    allowed_domains = ['quote.stockstar.com']  # domain the spider may crawl
    start_urls = ['http://quote.stockstar.com/stock/ranklist_a_3_1_1.html']  # start URL

    def parse(self, response):  # crawl logic
        # Page number from the URL, e.g. ..._3_1_1.html -> 1
        page = int(response.url.split("_")[-1].split(".")[0])
        item_nodes = response.css('#datalist tr')
        for item_node in item_nodes:
            # Fill in each field defined in items.py from the table row
            item_loader = StockstarItemLoader(item=StockstarItem(), selector=item_node)
            item_loader.add_css("code", "td:nth-child(1) a::text")
            item_loader.add_css("abbr", "td:nth-child(2) a::text")
            item_loader.add_css("last_trade", "td:nth-child(3) span::text")
            item_loader.add_css("chg_ratio", "td:nth-child(4) span::text")
            item_loader.add_css("chg_amt", "td:nth-child(5) span::text")
            item_loader.add_css("chg_ratio_5min", "td:nth-child(6) span::text")
            item_loader.add_css("volumn", "td:nth-child(7)::text")
            item_loader.add_css("turn_over", "td:nth-child(8)::text")
            stock_item = item_loader.load_item()
            yield stock_item
        if item_nodes:
            # This page had rows, so request the next page
            next_page = page + 1
            next_url = response.url.replace("{0}.html".format(page), "{0}.html".format(next_page))
            yield scrapy.Request(url=next_url, callback=self.parse)
```
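The spider paginates by reading the page number at the end of the URL, requesting page + 1, and stopping once a page returns no `#datalist` rows. The sketch below isolates that logic without Scrapy so it can be checked offline. `page_number`, `next_page_url` and `crawl` are hypothetical helpers. The regular expression anchored at the end of the URL is a slightly stricter variant of the spider's `str.replace` call, and `fake_rows` stands in for downloaded pages.

```python
import re
from typing import Dict, List

# Trailing page number, e.g. the final "1" in ..._3_1_1.html
PAGE_RE = re.compile(r"_(\d+)\.html$")


def page_number(url: str) -> int:
    """Extract the trailing page number from a ranking-list URL."""
    match = PAGE_RE.search(url)
    if match is None:
        raise ValueError(f"no page number in {url!r}")
    return int(match.group(1))


def next_page_url(url: str) -> str:
    """Return the URL of the following page by incrementing the trailing number."""
    return PAGE_RE.sub(f"_{page_number(url) + 1}.html", url)


def crawl(start_url: str, fake_rows: Dict[str, List[str]]) -> List[str]:
    """Follow pages until one has no rows, mirroring the spider's `if item_nodes` check."""
    collected: List[str] = []
    url = start_url
    while True:
        rows = fake_rows.get(url, [])  # stands in for response.css('#datalist tr')
        collected.extend(rows)
        if not rows:
            return collected
        url = next_page_url(url)
```

The empty-page check is the only stop condition. If the site ever kept returning rows for out-of-range pages, the spider would need an explicit page limit.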
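Add a main.py in the stockstar directory so the crawl can be started from the IDE:
```python
from scrapy.cmdline import execute

# Equivalent to entering in cmd: scrapy crawl stock -o items.json
# execute() exits the interpreter when the crawl ends, so keep it last
execute(["scrapy", "crawl", "stock", "-o", "items.json"])
```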
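`execute()` runs Scrapy's command-line handler inside the current Python process. To launch the same crawl as a separate process instead (for example, from another script), a minimal sketch with `subprocess` could look like this. It assumes the `scrapy` command is on PATH and that `project_dir` is the folder containing scrapy.cfg. `build_crawl_command` and `run_crawl` are hypothetical helper names.

```python
import subprocess
from typing import List


def build_crawl_command(spider: str, output: str) -> List[str]:
    """Build the same argument list that main.py passes to execute()."""
    return ["scrapy", "crawl", spider, "-o", output]


def run_crawl(project_dir: str, spider: str = "stock", output: str = "items.json") -> int:
    """Run the crawl as a child process from the project directory; return its exit code."""
    result = subprocess.run(build_crawl_command(spider, output), cwd=project_dir)
    return result.returncode
```

In recent Scrapy versions `-o` appends to an existing items.json, which leaves an invalid JSON file after a second run. `-O` overwrites the file instead.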
Run main.py, or enter the equivalent command in cmd from the project directory:
scrapy crawl stock -o items.json
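After the crawl finishes, items.json holds a JSON array of the scraped rows, one object per stock, keyed by the field names from items.py. A quick standard-library check could look like the sketch below. The file path and the choice of fields to print are assumptions, and `load_items`/`summarize` are hypothetical helpers.

```python
import json
from typing import Dict, List


def load_items(path: str = "items.json") -> List[Dict[str, str]]:
    """Load the exported JSON array of stock items (written as UTF-8)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(items: List[Dict[str, str]], limit: int = 5) -> List[str]:
    """Format code, abbreviation and latest price for the first few items."""
    lines = []
    for item in items[:limit]:
        # A field can be missing if TakeFirst found nothing for it
        lines.append(f"{item.get('code', '?')} {item.get('abbr', '?')} {item.get('last_trade', '?')}")
    return lines


if __name__ == "__main__":
    for line in summarize(load_items()):
        print(line)
```

If `json.load` fails, delete items.json and run the crawl again, since repeated `-o` runs append to the same file.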