Scraping Stock Quotes with Scrapy


If installing Scrapy directly fails, use Anaconda3 as the build environment and install Scrapy through it (search the error message to troubleshoot installation problems).

To create a Scrapy crawler project, open cmd, change to the directory where the project should live, and enter:

scrapy startproject stockstar

spiders directory (where the crawler code is written)

Project item file items.py (the container that holds the scraped data; items are stored much like a Python dictionary)

Project middleware file middlewares.py (provides a simple mechanism for extending Scrapy's functionality by inserting custom code)

Project pipelines file pipelines.py (the core processor for scraped items)

Project settings file settings.py

Project configuration file scrapy.cfg
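Putting the files above together, the generated project follows the standard Scrapy 1.x layout (the stockstar names come from the startproject command above):

```
stockstar/
    scrapy.cfg
    stockstar/
        __init__.py
        items.py
        middlewares.py
        pipelines.py
        settings.py
        spiders/
            __init__.py
```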

After creating the project, settings.py contains the lines:

# Obey robots.txt rules
ROBOTSTXT_OBEY = True

Sometimes we need to turn this off, by setting it to False:

ROBOTSTXT_OBEY = False

In the IDE (e.g. PyCharm), right-click the project folder and choose Mark Directory as > Sources Root in the context menu; this makes the import statements more concise.

1. Define an item container:

Written in items.py:

# -*- coding: utf-8 -*-
# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html
import scrapy
from scrapy.loader import ItemLoader
from scrapy.loader.processors import TakeFirst


class StockstarItemLoader(ItemLoader):
    # Custom ItemLoader for storing the field content crawled by the spider
    default_output_processor = TakeFirst()


class StockstarItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    code = scrapy.Field()            # stock code
    abbr = scrapy.Field()            # stock abbreviation
    last_trade = scrapy.Field()      # latest price
    chg_ratio = scrapy.Field()       # change ratio
    chg_amt = scrapy.Field()         # change amount
    chg_ratio_5min = scrapy.Field()  # 5-minute change
    volumn = scrapy.Field()          # volume
    turn_over = scrapy.Field()       # turnover
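TakeFirst returns the first non-null, non-empty value from the list of values a selector extracts, which is why each field above ends up as a single string instead of a list. A minimal pure-Python sketch of that behavior (an illustration of the idea, not Scrapy's actual source):

```python
def take_first(values):
    # Return the first value that is neither None nor an empty string,
    # mirroring what scrapy.loader.processors.TakeFirst does.
    for value in values:
        if value is not None and value != '':
            return value

# A selector typically extracts a list; the processor collapses it.
print(take_first(['', None, '600000', '600001']))  # -> '600000'
```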

Add the following to settings.py:

from scrapy.exporters import JsonItemExporter

# By default, Chinese is exported as hard-to-read Unicode escapes.
# Subclass the exporter to output the original character set
# (the parent class's ensure_ascii attribute is set to False).
class CustomJsonLinesItemExporter(JsonItemExporter):
    def __init__(self, file, **kwargs):
        super(CustomJsonLinesItemExporter, self).__init__(file, ensure_ascii=False, **kwargs)

# Enable the newly defined exporter class
FEED_EXPORTERS = {
    'json': 'stockstar.settings.CustomJsonLinesItemExporter',
}

DOWNLOAD_DELAY = 0.25
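The effect of ensure_ascii=False can be seen with the standard json module alone: with the default setting Chinese is escaped into \uXXXX sequences, while disabling it keeps the original characters:

```python
import json

# A sample record like the ones this spider produces
quote = {"abbr": "浦发银行", "code": "600000"}

escaped = json.dumps(quote)                      # default: ensure_ascii=True
readable = json.dumps(quote, ensure_ascii=False)

print(escaped)   # {"abbr": "\u6d66\u53d1\u94f6\u884c", "code": "600000"}
print(readable)  # {"abbr": "浦发银行", "code": "600000"}
```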

In cmd, change into the project directory and enter the following to generate the spider code:

scrapy genspider stock quote.stockstar.com

stock.py

# -*- coding: utf-8 -*-
import scrapy
from items import StockstarItem, StockstarItemLoader


class StockSpider(scrapy.Spider):
    name = 'stock'  # spider name
    allowed_domains = ['quote.stockstar.com']  # crawl domain
    start_urls = ['http://quote.stockstar.com/stock/ranklist_a_3_1_1.html']  # start URL

    def parse(self, response):
        # crawl the page number from the URL
        page = int(response.url.split("_")[-1].split(".")[0])
        item_nodes = response.css('#datalist tr')
        for item_node in item_nodes:
            # populate each field defined in the item file
            item_loader = StockstarItemLoader(item=StockstarItem(), selector=item_node)
            item_loader.add_css("code", "td:nth-child(1) a::text")
            item_loader.add_css("abbr", "td:nth-child(2) a::text")
            item_loader.add_css("last_trade", "td:nth-child(3) span::text")
            item_loader.add_css("chg_ratio", "td:nth-child(4) span::text")
            item_loader.add_css("chg_amt", "td:nth-child(5) span::text")
            item_loader.add_css("chg_ratio_5min", "td:nth-child(6) span::text")
            item_loader.add_css("volumn", "td:nth-child(7)::text")
            item_loader.add_css("turn_over", "td:nth-child(8)::text")
            stock_item = item_loader.load_item()
            yield stock_item
        if item_nodes:
            next_page = page + 1
            next_url = response.url.replace("{0}.html".format(page), "{0}.html".format(next_page))
            yield scrapy.Request(url=next_url, callback=self.parse)
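The pagination logic is plain string manipulation on the URL, so it can be checked in isolation. Assuming ranklist URLs of the form shown in start_urls:

```python
url = "http://quote.stockstar.com/stock/ranklist_a_3_1_2.html"

# Grab the trailing page number: splitting on "_" leaves "2.html",
# then splitting on "." drops the extension.
page = int(url.split("_")[-1].split(".")[0])
print(page)  # -> 2

# Build the next page's URL by swapping the page number in place.
next_url = url.replace("{0}.html".format(page), "{0}.html".format(page + 1))
print(next_url)  # -> http://quote.stockstar.com/stock/ranklist_a_3_1_3.html
```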

Add a main.py under the stockstar directory:

from scrapy.cmdline import execute

# equivalent to typing in cmd: scrapy crawl stock -o items.json
execute(["scrapy", "crawl", "stock", "-o", "items.json"])

To run the crawl, execute main.py, or equivalently enter in cmd:

scrapy crawl stock -o items.json
