- spider.py file configuration

      # -*- coding: utf-8 -*-
      import scrapy
      from Itteachers.items import ItteachersItem


      class ItcastSpider(scrapy.Spider):
          name = 'itcast'
          allowed_domains = ['itcast.cn']
          start_urls = ['http://www.itcast.cn/channel/teacher.shtml#']

          def parse(self, response):
              # with open("teacher.html", "wb") as f:
              #     f.write(response.body)

              items = []
              teacher_list = response.xpath('//div[@class="li_txt"]')

              for each in teacher_list:
                  # Wrap the extracted data in an ItteachersItem object
                  item = ItteachersItem()

                  name = each.xpath('h3/text()').extract()
                  title = each.xpath('h4/text()').extract()
                  info = each.xpath('p/text()').extract()

                  # xpath() returns a list; here each list holds one element
                  item['name'] = name[0]
                  item['title'] = title[0]
                  item['info'] = info[0]

                  items.append(item)

              # Return the collected items directly
              return items
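Indexing the extracted list with `[0]` raises `IndexError` whenever an XPath expression matches nothing, for example a teacher entry without an `h4` title. Below is a minimal sketch of a guard. The helper `first_or_default` is hypothetical and not part of the original spider, and the sample lists are made up to stand in for the results of `.extract()`. Scrapy selectors also provide `extract_first()`, which could serve the same purpose.

```python
def first_or_default(values, default=""):
    # `values` is a list such as the one returned by .extract(); it may be empty
    return values[0].strip() if values else default


# Made-up lists standing in for each.xpath(...).extract() results
name = first_or_default(["  Teacher A "])
title = first_or_default([])  # no match: falls back to the default

print(name, repr(title))
```

Inside `parse`, this would presumably replace lines such as `item['title'] = title[0]` with `item['title'] = first_or_default(title)`.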
- items.py file configuration

      # -*- coding: utf-8 -*-

      # Define here the models for your scraped items
      #
      # See documentation in:
      # https://doc.scrapy.org/en/latest/topics/items.html

      import scrapy


      class ItteachersItem(scrapy.Item):
          # define the fields for your item here like:
          # name = scrapy.Field()
          name = scrapy.Field()
          title = scrapy.Field()
          info = scrapy.Field()
- Running the crawler example

      scrapy crawl itcast -o teachers.json
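The `-o` option with a `.json` file name writes the scraped items as a JSON array of objects. Below is a minimal sketch of loading that file back, assuming the `name`, `title`, and `info` fields from items.py. The helper `load_teachers` and the sample record are hypothetical and made up for illustration, so the snippet runs without crawling the site.

```python
import json
import os
import tempfile


def load_teachers(path):
    """Load a JSON array of teacher records from `path`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Write a made-up sample in the same shape as the exported file
sample = [{"name": "Teacher A", "title": "Senior Lecturer", "info": "Placeholder bio"}]
path = os.path.join(tempfile.mkdtemp(), "teachers.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False)

teachers = load_teachers(path)
for teacher in teachers:
    print(teacher["name"], "-", teacher["title"])
```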