To create a scrapy project:
scrapy startproject weather2
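This generates a project skeleton roughly like the following (the exact files can vary slightly between Scrapy versions):

weather2/
    scrapy.cfg            # deploy configuration file
    weather2/             # the project's Python module
        __init__.py
        items.py          # item definitions
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/          # directory holding the spiders
            __init__.py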
Define items (items.py):
import scrapy

class Weather2Item(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    weatherDate = scrapy.Field()
    weatherDate2 = scrapy.Field()
    weatherWea = scrapy.Field()
    weatherTem1 = scrapy.Field()
    weatherTem2 = scrapy.Field()
    weatherWin = scrapy.Field()
Write the spider (spiders/weatherspider.py):
import scrapy
from weather2.items import Weather2Item

class CatchWeatherSpider(scrapy.Spider):
    name = 'CatchWeather2'
    allowed_domains = ['weather.com.cn']
    start_urls = [
        "http://www.weather.com.cn/weather/101280101.shtml"
    ]

    def parse(self, response):
        for sel in response.xpath('//*[@id="7d"]/ul/li'):
            item = Weather2Item()
            item['weatherDate'] = sel.xpath('h1/text()').extract()
            item['weatherDate2'] = sel.xpath('h2/text()').extract()
            item['weatherWea'] = sel.xpath('p[@class="wea"]/text()').extract()
            item['weatherTem1'] = sel.xpath('p[@class="tem tem1"]/span/text()').extract() \
                + sel.xpath('p[@class="tem tem1"]/i/text()').extract()
            item['weatherTem2'] = sel.xpath('p[@class="tem tem2"]/span/text()').extract() \
                + sel.xpath('p[@class="tem tem2"]/i/text()').extract()
            item['weatherWin'] = sel.xpath('p[@class="win"]/i/text()').extract()
            yield item
name: defines the name of the spider.
allowed_domains: contains the base domains the spider is allowed to crawl.
start_urls: a list of URLs where the spider starts crawling. The spider downloads the pages at these URLs first, and all subsequent URLs are extracted from the downloaded data.
The data source is http://www.weather.com.cn/weather/101280101.shtml; 101280101 is the city code for Guangzhou.
XPath is used here to parse the HTML, which keeps things quite simple.
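If you want to try the XPath expressions before wiring up the whole spider, here is a minimal sketch that runs them through Scrapy's Selector against a locally saved copy of the page (the file name 101280101.shtml is just for illustration); running scrapy shell against the live URL works just as well for interactive testing:

from scrapy.selector import Selector

# a locally saved copy of the Guangzhou forecast page (hypothetical file name)
html = open('101280101.shtml').read()
sel = Selector(text=html)
for li in sel.xpath('//*[@id="7d"]/ul/li'):
    print(li.xpath('h1/text()').extract())               # the date (today, tomorrow, ...)
    print(li.xpath('p[@class="wea"]/text()').extract())  # the weather description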
Test run:
scrapy crawl CatchWeather2
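To eyeball the scraped items without any database involved, Scrapy's built-in feed export can also dump them to a file (the output file name is arbitrary):

scrapy crawl CatchWeather2 -o weather.json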
Result fragment:
[Screenshot: 1.png]
We've got the data we want.
Create the database table:
CREATE TABLE `yunweiapp_weather` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `weatherDate` varchar(10) DEFAULT NULL,
  `weatherDate2` varchar(10) NOT NULL,
  `weatherWea` varchar(10) NOT NULL,
  `weatherTem1` varchar(10) NOT NULL,
  `weatherTem2` varchar(10) NOT NULL,
  `weatherWin` varchar(10) NOT NULL,
  `updateTime` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8;
Create the pipeline (pipelines.py):
import MySQLdb
import datetime

DEBUG = True

if DEBUG:
    dbuser = 'lihuipeng'
    dbpass = 'lihuipeng'
    dbname = 'game_main'
    dbhost = '192.168.1.100'
    dbport = '3306'
else:
    dbuser = 'root'
    dbpass = 'lihuipeng'
    dbname = 'game_main'
    dbhost = '127.0.0.1'
    dbport = '3306'

class MySQLStorePipeline(object):
    def __init__(self):
        self.conn = MySQLdb.connect(user=dbuser, passwd=dbpass, db=dbname, host=dbhost,
                                    charset="utf8", use_unicode=True)
        self.cursor = self.conn.cursor()
        # clear the table, so each crawl replaces the previous forecast
        self.cursor.execute("truncate table yunweiapp_weather;")
        self.conn.commit()

    def process_item(self, item, spider):
        curTime = datetime.datetime.now()
        try:
            self.cursor.execute("""INSERT INTO yunweiapp_weather
                (weatherDate, weatherDate2, weatherWea, weatherTem1, weatherTem2, weatherWin, updateTime)
                VALUES (%s, %s, %s, %s, %s, %s, %s)""",
                (
                    item['weatherDate'][0].encode('utf-8'),
                    item['weatherDate2'][0].encode('utf-8'),
                    item['weatherWea'][0].encode('utf-8'),
                    item['weatherTem1'][0].encode('utf-8'),
                    item['weatherTem2'][0].encode('utf-8'),
                    item['weatherWin'][0].encode('utf-8'),
                    curTime,
                ))
            self.conn.commit()
        except MySQLdb.Error, e:
            print "Error %d: %s" % (e.args[0], e.args[1])
        return item
Modify settings.py to enable the pipeline:
ITEM_PIPELINES = {
    #'weather2.pipelines.Weather2Pipeline': 300,
    'weather2.pipelines.MySQLStorePipeline': 400,
}
The number after each pipeline is just a priority value within the range 0-1000.
Run the crawl again:
scrapy crawl CatchWeather2
Results:
[Screenshot: 2.png]
Combined with the ops backend, here is a quick display:
[Screenshot: 3.png]
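To double-check the stored rows from Python rather than the backend, a minimal sketch reusing the same connection settings as the pipeline:

import MySQLdb

# same connection settings as MySQLStorePipeline above
conn = MySQLdb.connect(user='lihuipeng', passwd='lihuipeng', db='game_main',
                       host='192.168.1.100', charset='utf8', use_unicode=True)
cursor = conn.cursor()
cursor.execute("SELECT weatherDate, weatherWea, weatherTem1, weatherTem2, weatherWin FROM yunweiapp_weather")
for row in cursor.fetchall():
    print(row)
conn.close()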
All done ~~
This article is from the "Operations Notes" blog; please keep this source when reposting: http://lihuipeng.blog.51cto.com/3064864/1711852