Python---scrapy mysql sync store

Source: Internet
Author: User

Assuming we have already crawled the data for the fields defined in the item, we now need to save that data to a MySQL database.
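For concreteness, these are the fields the pipeline below expects. In a real project they would be `scrapy.Field()` entries on a `scrapy.Item` subclass; a plain dict with hypothetical values shows the shape:

```python
# Hypothetical example item; the field names match the INSERT statement
# used by the pipeline, the values are made up.
item = {
    'title': 'Example post',
    'category_name': 'python',
    'date_time': '2017/05/01',
    'likes': 3,
    'content': 'article body ...',
    'comment': 0,
    'collect': 1,
    'detail_url': 'http://example.com/post/1/',
    'src': ['http://example.com/cover.jpg'],  # a list: the pipeline stores src[0]
}
```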

A pipeline is used to store the data carried by the item and to post-process the crawled data.

First, some preparation: install MySQLdb. I installed the MySQL-python 1.2.5 module.

Customizing a pipeline to store item data in MySQL

import MySQLdb  # provided by the MySQL-python package


class MysqlPipeline(object):
    """Custom pipeline that stores item data in MySQL."""

    def __init__(self):
        # 1) Connect; the target database must already exist
        self.db = MySQLdb.connect(host='localhost', user='root',
                                  passwd='123456', db='TestDB',
                                  charset='utf8', use_unicode=True)
        # Cursor / pointer
        self.cursor = self.db.cursor()
        # Drop the table first
        sql = "DROP TABLE IF EXISTS test"
        self.cursor.execute(sql)
        self.db.commit()
        # Column sizes were garbled in the source; 255 is an assumed default
        sql = ("CREATE TABLE IF NOT EXISTS test ("
               "id INT PRIMARY KEY AUTO_INCREMENT NOT NULL, "
               "title VARCHAR(255) NOT NULL, "
               "category_name VARCHAR(255), "
               "date_time VARCHAR(255) NOT NULL, "
               "likes INT DEFAULT 0, "
               "content LONGTEXT, "
               "comment INT DEFAULT 0, "
               "collect INT DEFAULT 0, "
               "detail_url VARCHAR(255) UNIQUE, "
               "src VARCHAR(255))")
        # execute() parameter 1: query, the SQL statement
        # execute() parameter 2: args, a tuple of values (empty by default)
        self.cursor.execute(sql)
        self.db.commit()

    def process_item(self, item, spider):
        # 2) Perform the insert
        # If values are supplied for every column, the column names could be omitted
        try:
            sql = ("INSERT INTO test (title, category_name, date_time, likes, "
                   "content, comment, collect, detail_url, src) "
                   "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)")
            self.cursor.execute(sql, (item['title'], item['category_name'],
                                      item['date_time'], item['likes'],
                                      item['content'], item['comment'],
                                      item['collect'], item['detail_url'],
                                      item['src'][0]))
            self.db.commit()
        except MySQLdb.IntegrityError:
            # detail_url is UNIQUE, so duplicates can simply be ignored
            print(u'duplicate data, ignored')
        return item

    def __del__(self):
        # 3) Close the cursor before closing the connection
        self.cursor.close()
        self.db.close()

process_item(self, item, spider): this method is called for every item by each pipeline component, and it must return a dict, an Item object, or raise a DropItem exception.
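As a sketch of that contract, here is a hypothetical validation pipeline that drops incomplete items. DropItem normally comes from scrapy.exceptions; a stand-in class is defined here so the snippet runs on its own:

```python
class DropItem(Exception):
    """Stand-in for scrapy.exceptions.DropItem so this sketch is self-contained."""


class RequiredFieldsPipeline(object):
    # Hypothetical pipeline: reject any item that lacks a title
    def process_item(self, item, spider):
        if not item.get('title'):
            raise DropItem('missing title')
        return item  # must return the item (or a dict) for the next pipeline
```

Items this pipeline raises DropItem for are discarded and never reach later pipelines such as the MySQL one.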

Register the pipeline in settings.py:

ITEM_PIPELINES = {
    # MySQL synchronous write
    "JobboleSpider.pipelines.MysqlPipeline": 2,
}
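The number assigned to each pipeline is its priority: components run in ascending order, each receiving the item returned by the previous one. A simplified model of that chain, using two hypothetical pipeline functions in place of pipeline classes:

```python
def strip_title(item):
    # hypothetical first stage: cleanup
    item['title'] = item['title'].strip()
    return item


def tag_source(item):
    # hypothetical second stage: runs after cleanup
    item['source'] = 'jobbole'
    return item


# lower number = earlier in the chain, as with ITEM_PIPELINES priorities
pipelines = {strip_title: 1, tag_source: 2}


def run_pipelines(item, pipelines):
    """Apply each stage in ascending priority order (a simplified model)."""
    for func in sorted(pipelines, key=pipelines.get):
        item = func(item)
    return item


result = run_pipelines({'title': '  hello  '}, pipelines)
# result == {'title': 'hello', 'source': 'jobbole'}
```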

There is also a way to manipulate the database directly through model objects, called ORM (object-relational mapping).

Feature: no need to write SQL statements; you operate on the database directly through objects.

Insert: item.save()

Delete: item.delete()
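The save()/delete() idea can be sketched with a tiny self-contained model class. This sketch uses the sqlite3 standard library instead of MySQL purely so it runs anywhere; a real project would use a full ORM such as the Django ORM or peewee:

```python
import sqlite3


class Article(object):
    # In-memory database shared by all Article objects (sketch only)
    db = sqlite3.connect(':memory:')
    db.execute('CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)')

    def __init__(self, title):
        self.id = None
        self.title = title

    def save(self):
        # Insert: the SQL is hidden behind the method call
        cur = Article.db.execute('INSERT INTO article (title) VALUES (?)',
                                 (self.title,))
        self.id = cur.lastrowid
        Article.db.commit()

    def delete(self):
        # Delete: removes the row backing this object
        Article.db.execute('DELETE FROM article WHERE id = ?', (self.id,))
        Article.db.commit()


a = Article('scrapy mysql sync store')
a.save()    # INSERT happens behind the scenes
a.delete()  # DELETE happens behind the scenes
```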
