The previous article, http://blog.csdn.net/androidworkor/article/details/51176387, covered the basics of Scrapy; this article shows how to write the scraped data to a database.
1. Writing a crawler script
Continuing with the Qiushibaike (qiushibaike.com) example, write a spider and save it as spider_qiushibaike.py in the Hello/spiders directory:
```python
# -*- coding: utf-8 -*-
import scrapy
from Hello.items import HelloItem


class QiushibaikeSpider(scrapy.Spider):
    name = "qiubai"
    start_urls = ['http://www.qiushibaike.com']

    def parse(self, response):
        for item in response.xpath('//div[@id="content-left"]/div[@class="article block untagged mb15"]'):
            qiubai = HelloItem()
            icon = item.xpath('./div[@class="author clearfix"]/a[1]/img/@src').extract()
            if icon:
                qiubai['userIcon'] = icon[0]
            userName = item.xpath('./div[@class="author clearfix"]/a[2]/h2/text()').extract()
            if userName:
                qiubai['userName'] = userName[0]
            content = item.xpath('./div[@class="content"]/descendant::text()').extract()
            if content:
                con = ''
                for s in content:
                    con += s
                qiubai['content'] = con
            like = item.xpath('./div[@class="stats"]/span[@class="stats-vote"]/i/text()').extract()
            if like:
                qiubai['like'] = like[0]
            comment = item.xpath('./div[@class="stats"]/span[@class="stats-comments"]/a/i/text()').extract()
            if comment:
                qiubai['comment'] = comment[0]
            yield qiubai
```
2. Creating the Database

2.1 Creating the Database
I'm using SQLyog. Open SQLyog, create a new connection, and fill in the connection name, MySQL host address, user name, and password as follows:
When you are finished, click Connect to enter the database interface, as shown below:
Right-click the area circled in red and select Create Database, as shown below:
The database is now created.
2.2 Creating a Table
Select the database you just created, right-click, and choose Create Table, as shown below:
The following dialog appears:
Fill in the table name, fields, types, and so on, and the table is built. The table structure must match the fields defined earlier in items.py (same fields, compatible types). Note that the column is named likes rather than like, since LIKE is a reserved word in MySQL.
```python
# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class HelloItem(scrapy.Item):
    # define the fields for your item here
    userIcon = scrapy.Field()
    userName = scrapy.Field()
    content = scrapy.Field()
    like = scrapy.Field()
    comment = scrapy.Field()
```
3. Writing the pipelines.py file
The pipelines.py file sits at the top level of the project, like this:
The code is as follows:
```python
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html

import pymysql


def dbHandle():
    conn = pymysql.connect(
        host='localhost',
        user='root',
        passwd='root',
        charset='utf8',
        use_unicode=False
    )
    return conn


class HelloPipeline(object):
    def process_item(self, item, spider):
        dbObject = dbHandle()
        cursor = dbObject.cursor()
        sql = 'insert into joke.t_baike(userIcon,userName,content,likes,comment) values (%s,%s,%s,%s,%s)'
        try:
            cursor.execute(sql, (item['userIcon'], item['userName'],
                                 item['content'], item['like'], item['comment']))
            dbObject.commit()
        except Exception as e:
            print(e)
            dbObject.rollback()
        return item
```
4. Configure settings.py so the pipeline takes effect
Open the settings.py file and add the following lines. The number 300 is the pipeline's order: values range from 0 to 1000, and pipelines with lower numbers run first.
```python
ITEM_PIPELINES = {
    'Hello.pipelines.HelloPipeline': 300,
}
```
And that's all that's needed.
5. Running
All the preparations are done; now it's time to witness the miracle.
Open a command line and run:
scrapy crawl qiubai
At this point you should see a log similar to this:
Okay, now let's check whether the rows were inserted into the database.
The rows were inserted successfully. To confirm the data was scraped correctly, open qiushibaike.com and compare:
All right, it's done.
Allow me a quick plug:
If you don't know how to set up the environment, check out this article: http://blog.csdn.net/androidworkor/article/details/51171098
If you are not familiar with the basic usage of Scrapy, read this article: http://blog.csdn.net/androidworkor/article/details/51176387
OK, if you run into any problems, please leave a comment.
Scrapy Getting Started Tutorial: Writing to the Database