Crawling weather forecasts with Scrapy and storing them in MySQL


First, create a Scrapy project:

scrapy startproject weather2
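For reference, startproject generates a skeleton roughly like this (details vary a little between Scrapy versions):

weather2/
    scrapy.cfg            # deploy configuration
    weather2/
        __init__.py
        items.py          # item definitions (edited below)
        pipelines.py      # item pipelines (edited below)
        settings.py       # project settings
        spiders/
            __init__.py   # spiders live in this package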

Define items (items.py):

import scrapy

class Weather2Item(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    weatherDate = scrapy.Field()
    weatherDate2 = scrapy.Field()
    weatherWea = scrapy.Field()
    weatherTem1 = scrapy.Field()
    weatherTem2 = scrapy.Field()
    weatherWin = scrapy.Field()

Write the spider (spiders/weatherspider.py):

import scrapy
from weather2.items import Weather2Item

class CatchWeatherSpider(scrapy.Spider):
    name = 'CatchWeather2'
    allowed_domains = ['weather.com.cn']
    start_urls = [
        "http://www.weather.com.cn/weather/101280101.shtml"
    ]

    def parse(self, response):
        # Each <li> under the element with id="7d" is one day's forecast.
        for sel in response.xpath('//*[@id="7d"]/ul/li'):
            item = Weather2Item()
            item['weatherDate'] = sel.xpath('h1/text()').extract()
            item['weatherDate2'] = sel.xpath('h2/text()').extract()
            item['weatherWea'] = sel.xpath('p[@class="wea"]/text()').extract()
            # Temperature: the number sits in <span>, the unit in <i>.
            item['weatherTem1'] = sel.xpath('p[@class="tem tem1"]/span/text()').extract() \
                                + sel.xpath('p[@class="tem tem1"]/i/text()').extract()
            item['weatherTem2'] = sel.xpath('p[@class="tem tem2"]/span/text()').extract() \
                                + sel.xpath('p[@class="tem tem2"]/i/text()').extract()
            item['weatherWin'] = sel.xpath('p[@class="win"]/i/text()').extract()
            yield item
    • name: defines the name of the spider.

    • allowed_domains: the list of domains the spider is allowed to crawl.

    • start_urls: the list of URLs where the spider starts crawling. The spider downloads the pages in start_urls first, and all subsequent URLs are extracted from the downloaded data.

The data source is http://www.weather.com.cn/weather/101280101.shtml; 101280101 is the city code for Guangzhou.
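If you want other cities, the city code is the only part of the URL that changes. A minimal sketch (the Beijing code below is an assumption based on weather.com.cn's numbering scheme; verify codes on the site):

# Sketch: build start_urls for several cities by swapping the city code.
# '101010100' (Beijing) is an assumed code, not taken from the original post.
CITY_CODES = {
    'Guangzhou': '101280101',
    'Beijing': '101010100',
}
start_urls = ["http://www.weather.com.cn/weather/%s.shtml" % code
              for code in CITY_CODES.values()]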

XPath is used here to parse the HTML, which keeps things simple.
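If you want to experiment with the selectors before running the spider, scrapy shell is handy. A quick sketch (the extracted values depend on the live page):

scrapy shell "http://www.weather.com.cn/weather/101280101.shtml"
>>> response.xpath('//*[@id="7d"]/ul/li/h1/text()').extract()
>>> response.xpath('//*[@id="7d"]/ul/li/p[@class="wea"]/text()').extract()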


Test run:

scrapy crawl CatchWeather2

Result fragment:

(Screenshot: 1.png, a fragment of the crawl output showing the scraped items.)

We've got the data we want.

Create the database table:

CREATE TABLE `yunweiapp_weather` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `weatherDate` varchar(10) DEFAULT NULL,
  `weatherDate2` varchar(10) NOT NULL,
  `weatherWea` varchar(10) NOT NULL,
  `weatherTem1` varchar(10) NOT NULL,
  `weatherTem2` varchar(10) NOT NULL,
  `weatherWin` varchar(10) NOT NULL,
  `updateTime` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8;
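To confirm the table was created, a quick check from Python (a minimal sketch reusing the MySQLdb driver and the debug credentials from the pipeline below):

import MySQLdb

# Assumes the same connection details used in the pipeline below.
conn = MySQLdb.connect(user='lihuipeng', passwd='lihuipeng',
                       db='Game_main', host='192.168.1.100', charset='utf8')
cursor = conn.cursor()
cursor.execute("DESCRIBE yunweiapp_weather")
for row in cursor.fetchall():
    print row   # Python 2, matching the rest of this post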

Create the pipeline (pipelines.py):

# -*- coding: utf-8 -*-
import MySQLdb
import datetime

DEBUG = True

if DEBUG:
    dbuser = 'lihuipeng'
    dbpass = 'lihuipeng'
    dbname = 'Game_main'
    dbhost = '192.168.1.100'
    dbport = '3306'
else:
    dbuser = 'root'
    dbpass = 'lihuipeng'
    dbname = 'Game_main'
    dbhost = '127.0.0.1'
    dbport = '3306'

class MySQLStorePipeline(object):
    def __init__(self):
        self.conn = MySQLdb.connect(user=dbuser, passwd=dbpass, db=dbname,
                                    host=dbhost, charset="utf8", use_unicode=True)
        self.cursor = self.conn.cursor()
        # Empty the table so each crawl stores a fresh forecast:
        self.cursor.execute("truncate table yunweiapp_weather;")
        self.conn.commit()

    def process_item(self, item, spider):
        curTime = datetime.datetime.now()
        try:
            self.cursor.execute(
                """INSERT INTO yunweiapp_weather
                   (weatherDate, weatherDate2, weatherWea,
                    weatherTem1, weatherTem2, weatherWin, updateTime)
                   VALUES (%s, %s, %s, %s, %s, %s, %s)""",
                (item['weatherDate'][0].encode('utf-8'),
                 item['weatherDate2'][0].encode('utf-8'),
                 item['weatherWea'][0].encode('utf-8'),
                 item['weatherTem1'][0].encode('utf-8'),
                 item['weatherTem2'][0].encode('utf-8'),
                 item['weatherWin'][0].encode('utf-8'),
                 curTime,
                ))
            self.conn.commit()
        except MySQLdb.Error, e:
            print "Error %d: %s" % (e.args[0], e.args[1])
        return item
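Hard-coding credentials works, but Scrapy can also inject them from settings.py via the from_crawler hook. A sketch of that pattern (the MYSQL_* setting names are my own invention, not part of Scrapy):

import MySQLdb

class MySQLStorePipeline(object):
    def __init__(self, dbuser, dbpass, dbname, dbhost):
        self.conn = MySQLdb.connect(user=dbuser, passwd=dbpass, db=dbname,
                                    host=dbhost, charset="utf8", use_unicode=True)
        self.cursor = self.conn.cursor()

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook when building the pipeline; crawler.settings
        # exposes everything defined in settings.py.
        s = crawler.settings
        return cls(s.get('MYSQL_USER'), s.get('MYSQL_PASS'),
                   s.get('MYSQL_DB'), s.get('MYSQL_HOST'))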

Modify settings.py to enable the pipeline:

ITEM_PIPELINES = {
    # 'weather2.pipelines.Weather2Pipeline': 300,
    'weather2.pipelines.MySQLStorePipeline': 400,
}

The number is the pipeline's order value, an integer in the range 0-1000; when several pipelines are enabled, those with lower values run first.

Run the crawl again:

scrapy crawl CatchWeather2

Results:

(Screenshot: 2.png, the results now stored in MySQL.)

Shown in the operations admin backend:

(Screenshot: 3.png, the forecast displayed in the backend.)

All done ~~

This article is from the "Operations Notes" blog; please keep the source when reposting: http://lihuipeng.blog.51cto.com/3064864/1711852
