1. Preparation: target website: http://quanzhou.tianqi.com/
2. Create the Scrapy crawler project:

scrapy startproject weather
scrapy genspider hquspider quanzhou.tianqi.com

These commands generate the project file structure.
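For reference, scrapy startproject followed by scrapy genspider produces the standard Scrapy layout (file names below follow Scrapy's defaults plus the spider created above):

```
weather/
    scrapy.cfg
    weather/
        __init__.py
        items.py
        pipelines.py
        settings.py
        spiders/
            __init__.py
            hquspider.py
```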
3. Modify items.py:
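The tutorial does not show items.py itself; based on the fields the spider and pipeline use later, a minimal sketch would be (field names taken from hquspider.py below):

```python
# -*- coding: utf-8 -*-
import scrapy


class WeatherItem(scrapy.Item):
    # Fields referenced in hquspider.py and pipelines.py
    cityDate = scrapy.Field()     # city name and date
    week = scrapy.Field()         # day of the week
    img = scrapy.Field()          # URL of the weather icon
    temperature = scrapy.Field()  # temperature range
    weather = scrapy.Field()      # weather description
    wind = scrapy.Field()         # wind direction and strength
```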
4. Modify the spider file hquspider.py:
(1) First run scrapy shell http://quanzhou.tianqi.com/ to test the page and obtain a selector:
(2) Test the selector: open the page source in the Chrome browser to find the elements to extract:
(3) Execute commands in the shell to inspect the response:
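Assuming each day's forecast sits in a div with class tqshow1 (the class name used in parse() below), the shell session would look roughly like this (outputs elided):

```
$ scrapy shell http://quanzhou.tianqi.com/
...
>>> response.xpath('//div[@class="tqshow1"]')
...
>>> response.xpath('//div[@class="tqshow1"]/h3//text()').extract()
...
```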
(4) Write hquspider.py:
# -*- coding: utf-8 -*-
import scrapy
from weather.items import WeatherItem


class HquspiderSpider(scrapy.Spider):
    name = 'hquspider'
    allowed_domains = ['tianqi.com']
    citys = ['quanzhou', 'datong']
    start_urls = []
    for city in citys:
        start_urls.append('http://' + city + '.tianqi.com/')

    def parse(self, response):
        # Each day's forecast sits in a <div class="tqshow1"> block
        subSelector = response.xpath('//div[@class="tqshow1"]')
        items = []
        for sub in subSelector:
            item = WeatherItem()
            cityDates = ''
            for cityDate in sub.xpath('./h3//text()').extract():
                cityDates += cityDate
            item['cityDate'] = cityDates
            item['week'] = sub.xpath('./p//text()').extract()[0]
            item['img'] = sub.xpath('./ul/li[1]/img/@src').extract()[0]
            temps = ''
            for temp in sub.xpath('./ul/li[2]//text()').extract():
                temps += temp
            item['temperature'] = temps
            item['weather'] = sub.xpath('./ul/li[3]//text()').extract()[0]
            item['wind'] = sub.xpath('./ul/li[4]//text()').extract()[0]
            items.append(item)
        return items
(5) Modify pipelines.py to process the items returned by the spider:
# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
import time
import os.path
import urllib2
import sys
reload(sys)
sys.setdefaultencoding('utf8')


class WeatherPipeline(object):
    def process_item(self, item, spider):
        today = time.strftime('%Y%m%d', time.localtime())
        fileName = today + '.txt'
        with open(fileName, 'a') as fp:
            fp.write(item['cityDate'].encode('utf-8') + '\t')
            fp.write(item['week'].encode('utf-8') + '\t')
            imgName = os.path.basename(item['img'])
            fp.write(imgName + '\t')
            # Download the weather icon only if it is not already on disk;
            # use a separate file handle so the text file's fp is not shadowed
            if not os.path.exists(imgName):
                with open(imgName, 'wb') as imgFp:
                    response = urllib2.urlopen(item['img'])
                    imgFp.write(response.read())
            fp.write(item['temperature'].encode('utf-8') + '\t')
            fp.write(item['weather'].encode('utf-8') + '\t')
            fp.write(item['wind'].encode('utf-8') + '\n')
            time.sleep(1)
        return item
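The pipeline writes one tab-separated record per item into a file named after today's date. A stand-alone sketch of that record format, using a made-up item dict (all field values are illustrative, not scraped data):

```python
import time

# Hypothetical item values, for illustration only
item = {
    'cityDate': 'Quanzhou 08-04',
    'week': 'Friday',
    'img': 'http://example.com/static/b1.png',
    'temperature': '26 ~ 34',
    'weather': 'Cloudy',
    'wind': 'Breeze',
}

# Same naming scheme as the pipeline: YYYYMMDD.txt
fileName = time.strftime('%Y%m%d', time.localtime()) + '.txt'

# Six fields joined by tabs, one record per line
record = '\t'.join([item['cityDate'], item['week'], 'b1.png',
                    item['temperature'], item['weather'],
                    item['wind']]) + '\n'
print(record.count('\t'))  # five tabs separate the six fields
```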
(6) Modify settings.py to tell Scrapy which pipeline processes the scraped data:
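The relevant settings.py entry registers the pipeline class by its import path; the integer (0-1000) sets the order in which pipelines run:

```python
# settings.py: route scraped items through WeatherPipeline
ITEM_PIPELINES = {
    'weather.pipelines.WeatherPipeline': 300,
}
```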
(7) Run the crawler: scrapy crawl hquspider
With that, a complete Scrapy crawler is finished.
2017.08.04 — Python web crawler in action with Scrapy: weather forecast