Assuming that we have access to the data for the field defined in item, then we need to save the item's data to the MySQL database.
Pipeline is used to store the data in item and to process the crawled data two times
First of all, to do the preparation of the work, install MYSQLDB, I install the python-mysql1.2.5 module.
Customizing a pipeline to store the data in item with MySQL
classMysqlpipeline (object):#customizing a pipeline to store the data in item with MySQL def __init__(self):#Code Connection Database #1) Connection #The connected database must existdb = MySQLdb.connect (host='localhost', user='Root', passwd='123456', db='TestDB', charset='UTF8', use_unicode=True)#Cursors/Pointerscursor =db.cursor () self.db=DB self.cursor=cursor#Delete Table FirstSQL="drop table IF EXISTS test"self.cursor.execute (SQL) self.db.commit () SQL="CREATE table if not EXISTS test (ID INT PRIMARY KEY auto_increment not NULL, title VARCHAR (a) not null,category_name varchar (+), date_time varchar () not NULL, likes int default 0,content longtext, comment int default 0,collect int DE FAULT 0,detail_url varchar (255) UNIQUE,SRC varchar (255))" #parameter 1:query, fill in the SQL statement #parameter 2:args, parameter, default is empty, fill in tupleself.cursor.execute (SQL) Self.db.commit ()defProcess_item (self, item, spider):#2) Perform related actions ##3) Close the cursor, turn off the DB before closing the connection #cursor.close () #db.close () #If you want to add data to all columns, the column name may not be written Try: SQL="INSERT INTO Test (Title,category_name, Date_time,likes,content, Comment,collect, DETAIL_URL,SRC) VALUES (%s,%s,%s, %s,%s,%s,%s,%s,%s)"self.cursor.execute (SQL, (item['title'],item['Category_name'],item['Date_time'],item['likes'], item['content'],item['Comment'], item['Collect'],item['Detail_url'],item['src'][0]) self.db.commit ()except: PrintU'data duplication is negligible' returnItemdef __del__(self): Self.cursor.close () self.db.close ()
Process_item (Self,item,spider) This method is called by each item pipeline component, and the method must return a dictionary data, item, or throw a Dropitem exception.
Under the settings registration
Item_pipelines = { #MySQL synchronous write " JobboleSpider.pipelines.MySQLPipeline": 2,}
There are also ways to manipulate databases directly from model objects called ORM
Features: No need to write SQL statements, you can directly manipulate the database
Added: Item.Save (),
Delete: Item.delete ()
............................
Python---scrapy mysql sync store