To create a scrapy project:
scrapy startproject weather2
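This generates a project skeleton roughly like the following (the exact files can vary slightly between Scrapy versions):

weather2/
    scrapy.cfg            # deploy configuration file
    weather2/             # the project's Python module
        __init__.py
        items.py          # item definitions
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/          # directory holding the spiders
            __init__.py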
Define items (items.py):
import scrapy

class Weather2Item(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    weatherDate = scrapy.Field()
    weatherDate2 = scrapy.Field()
    weatherWea = scrapy.Field()
    weatherTem1 = scrapy.Field()
    weatherTem2 = scrapy.Field()
    weatherWin = scrapy.Field()
Write the spider (spiders/weatherspider.py):
import scrapy
from weather2.items import Weather2Item

class CatchWeatherSpider(scrapy.Spider):
    name = 'CatchWeather2'
    allowed_domains = ['weather.com.cn']
    start_urls = [
        "http://www.weather.com.cn/weather/101280101.shtml"
    ]

    def parse(self, response):
        for sel in response.xpath('//*[@id="7d"]/ul/li'):
            item = Weather2Item()
            item['weatherDate'] = sel.xpath('h1/text()').extract()
            item['weatherDate2'] = sel.xpath('h2/text()').extract()
            item['weatherWea'] = sel.xpath('p[@class="wea"]/text()').extract()
            item['weatherTem1'] = sel.xpath('p[@class="tem tem1"]/span/text()').extract() \
                + sel.xpath('p[@class="tem tem1"]/i/text()').extract()
            item['weatherTem2'] = sel.xpath('p[@class="tem tem2"]/span/text()').extract() \
                + sel.xpath('p[@class="tem tem2"]/i/text()').extract()
            item['weatherWin'] = sel.xpath('p[@class="win"]/i/text()').extract()
            yield item
name: defines the name of the spider.
allowed_domains: contains the base domains the spider is allowed to crawl.
start_urls: a list of URLs where the spider starts crawling. The spider downloads the pages at these URLs first, and all subsequent URLs are extracted from the downloaded data.
The data source is http://www.weather.com.cn/weather/101280101.shtml; 101280101 is the city code for Guangzhou.
XPath is used here to parse the HTML, which keeps things quite simple.
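If you want to try the XPath expressions before wiring up the whole spider, here is a minimal sketch that runs them through Scrapy's Selector against a locally saved copy of the page (the file name 101280101.shtml is just for illustration); running scrapy shell against the live URL works just as well for interactive testing:

from scrapy.selector import Selector

# a locally saved copy of the Guangzhou forecast page (hypothetical file name)
html = open('101280101.shtml').read()
sel = Selector(text=html)
for li in sel.xpath('//*[@id="7d"]/ul/li'):
    print(li.xpath('h1/text()').extract())               # the date (today, tomorrow, ...)
    print(li.xpath('p[@class="wea"]/text()').extract())  # the weather description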
Test run:
scrapy crawl CatchWeather2
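To eyeball the scraped items without any database involved, Scrapy's built-in feed export can also dump them to a file (the output file name is arbitrary):

scrapy crawl CatchWeather2 -o weather.json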
Result fragment:
[Screenshot: 1.png]
We've got the data we want.
Create the database table:
CREATE TABLE `yunweiapp_weather` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `weatherDate` varchar(10) DEFAULT NULL,
  `weatherDate2` varchar(10) NOT NULL,
  `weatherWea` varchar(10) NOT NULL,
  `weatherTem1` varchar(10) NOT NULL,
  `weatherTem2` varchar(10) NOT NULL,
  `weatherWin` varchar(10) NOT NULL,
  `updateTime` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8;
Create the pipeline (pipelines.py):
import MySQLdb
import datetime

DEBUG = True

if DEBUG:
    dbuser = 'lihuipeng'
    dbpass = 'lihuipeng'
    dbname = 'game_main'
    dbhost = '192.168.1.100'
    dbport = '3306'
else:
    dbuser = 'root'
    dbpass = 'lihuipeng'
    dbname = 'game_main'
    dbhost = '127.0.0.1'
    dbport = '3306'

class MySQLStorePipeline(object):
    def __init__(self):
        self.conn = MySQLdb.connect(user=dbuser, passwd=dbpass, db=dbname, host=dbhost,
                                    charset="utf8", use_unicode=True)
        self.cursor = self.conn.cursor()
        # clear the table, so each crawl replaces the previous forecast
        self.cursor.execute("truncate table yunweiapp_weather;")
        self.conn.commit()

    def process_item(self, item, spider):
        curTime = datetime.datetime.now()
        try:
            self.cursor.execute("""INSERT INTO yunweiapp_weather
                (weatherDate, weatherDate2, weatherWea, weatherTem1, weatherTem2, weatherWin, updateTime)
                VALUES (%s, %s, %s, %s, %s, %s, %s)""",
                (
                    item['weatherDate'][0].encode('utf-8'),
                    item['weatherDate2'][0].encode('utf-8'),
                    item['weatherWea'][0].encode('utf-8'),
                    item['weatherTem1'][0].encode('utf-8'),
                    item['weatherTem2'][0].encode('utf-8'),
                    item['weatherWin'][0].encode('utf-8'),
                    curTime,
                ))
            self.conn.commit()
        except MySQLdb.Error, e:
            print "Error %d: %s" % (e.args[0], e.args[1])
        return item
Modify settings.py to enable the pipeline:
ITEM_PIPELINES = {
    #'weather2.pipelines.Weather2Pipeline': 300,
    'weather2.pipelines.MySQLStorePipeline': 400,
}
The number after each pipeline is just a priority value within the range 0-1000.
Run the crawl again:
scrapy crawl CatchWeather2
Results:
[Screenshot: 2.png]
Combined with the ops backend, here is a quick display:
[Screenshot: 3.png]
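To double-check the stored rows from Python rather than the backend, a minimal sketch reusing the same connection settings as the pipeline:

import MySQLdb

# same connection settings as MySQLStorePipeline above
conn = MySQLdb.connect(user='lihuipeng', passwd='lihuipeng', db='game_main',
                       host='192.168.1.100', charset='utf8', use_unicode=True)
cursor = conn.cursor()
cursor.execute("SELECT weatherDate, weatherWea, weatherTem1, weatherTem2, weatherWin FROM yunweiapp_weather")
for row in cursor.fetchall():
    print(row)
conn.close()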
All done ~~
This article is from the "Operations Notes" blog; please keep this source when reposting: http://lihuipeng.blog.51cto.com/3064864/1711852