First, let's review the basic operations of MongoDB:
MongoDB organizes data as databases, collections, and documents. The core shell commands:
show dbs
use <database name>
db.dropDatabase()
db.<collection name>.insert({})
db.<collection name>.update({condition}, {$set: {...}}, {multi: true})
db.<collection name>.remove({condition})
db.<collection name>.find({condition}, {projection}).limit().skip().sort().count()
db.<collection name>.distinct()
The add/modify/delete/query operations compared across the three stores: MySQL uses insert, update, delete, select; Redis uses set, del, get; MongoDB uses insert, update, remove, find (plus aggregate). Redis's five data types:
String
Hash
List
Set
Zset
Insert:
MySQL: INSERT INTO table_name (columns) VALUES (values)
Mongo: db.collection_name.insert({})
Modify:
MySQL: UPDATE table_name SET column = value WHERE condition
Mongo: db.collection_name.update({condition}, {$set: {values}}, {multi: true})
Delete:
MySQL: DELETE FROM table_name WHERE ...
Mongo: db.collection_name.remove({condition}, {justOne: true|false})
Query:
db.stu.find({condition}, {projection})
Query conditions support comparison operators, logical operators, and $where.
Cursor methods: limit(), skip(), sort(), count(), distinct()
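To make the find/sort/skip/limit chain concrete, here is a toy, in-memory sketch in Python. It is not MongoDB code; the `stu` data and the mini `matches()` helper are invented purely to mimic what the cursor chain returns.

```python
# Toy illustration of what a Mongo query chain like
#   db.stu.find({"age": {"$gte": 18}}).sort("age").skip(1).limit(2)
# does, mimicked on a plain Python list of dicts.

def matches(doc, cond):
    """Check one document against a filter like {"age": {"$gte": 18}}."""
    for field, spec in cond.items():
        value = doc.get(field)
        if isinstance(spec, dict):           # operator form: {"$gte": 18}
            for op, operand in spec.items():
                if op == "$gte" and not (value is not None and value >= operand):
                    return False
                if op == "$lt" and not (value is not None and value < operand):
                    return False
        elif value != spec:                  # plain equality form: {"name": "Tom"}
            return False
    return True

stu = [
    {"name": "Tom", "age": 17},
    {"name": "Ann", "age": 19},
    {"name": "Bob", "age": 21},
    {"name": "Eve", "age": 25},
]

# find({"age": {"$gte": 18}})  ->  sort("age")  ->  skip(1)  ->  limit(2)
result = sorted((d for d in stu if matches(d, {"age": {"$gte": 18}})),
                key=lambda d: d["age"])[1:1 + 2]
print([d["name"] for d in result])  # ['Bob', 'Eve']
```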
Now for the crawler. First, use XPath to locate the information to crawl. For this project that is: title, info, rating, and introduction (the one-line quote).
First page link: https://movie.douban.com/top250
Second page link: https://movie.douban.com/top250?start=25&filter=
Third page link: https://movie.douban.com/top250?start=50&filter=
Rule: https://movie.douban.com/top250?start=\d+&filter=
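The rule above can also generate every page URL directly: each page shows 25 films, so page n (0-based) starts at 25 * n (the site accepts start=0 for the first page as well). A quick sketch:

```python
# Generate the 10 Douban Top 250 page URLs from the pagination rule:
# each page lists 25 films, so page n (0-based) begins at start=25*n.
urls = ["https://movie.douban.com/top250?start={}&filter=".format(25 * n)
        for n in range(10)]
print(urls[2])  # https://movie.douban.com/top250?start=50&filter=
```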
Title: //a/span[@class="title"][1]
Info: //div[@class="bd"]/p[1]/text()
Rating: //div[@class="star"]/span[2]/text()
Introduction: //span[@class="inq"]/text()
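As a sanity check, the XPaths can be tried against a small, invented fragment of one list item (the markup below is simplified, not the real page). The standard library's ElementTree supports only an XPath subset, so instead of text() we read the matched element's .text:

```python
# Try the XPaths above on an invented, simplified fragment of one
# Douban list item. ElementTree has no text() step, so we use .text.
import xml.etree.ElementTree as ET

item = ET.fromstring("""
<div class="item">
  <div class="hd"><a href="#"><span class="title">The Shawshank Redemption</span></a></div>
  <div class="bd">
    <p>Director: Frank Darabont ...</p>
    <div class="star"><span class="rating5-t"/><span class="rating_num">9.7</span></div>
    <p><span class="inq">Hope is a good thing.</span></p>
  </div>
</div>
""")

title = item.find('.//a/span[@class="title"]').text
info = item.find('.//div[@class="bd"]/p[1]').text
rating = item.find('.//div[@class="star"]/span[2]').text
quote = item.find('.//span[@class="inq"]').text
print(title, rating, quote)
```

In the real spider, Scrapy's response.xpath() supports the full expressions, including text().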
Then create the project with scrapy startproject douban
scrapy genspider doubanmovie movie.douban.com
Then write the following files in turn:
items.py, doubanmovie.py (the spider), settings.py, pipelines.py
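As one example of those files, a minimal pipelines.py that writes each item to MongoDB might look like this. The class name, database/collection names, and connection URI are illustrative assumptions, not the project's actual code; pymongo is imported lazily inside open_spider so the sketch loads even where it is not installed.

```python
# Hypothetical pipelines.py sketch: store every scraped item in MongoDB.
# All names here (DoubanPipeline, "douban", "top250") are illustrative.
class DoubanPipeline:
    def open_spider(self, spider):
        # Lazy import so the module can be inspected without pymongo installed.
        from pymongo import MongoClient
        self.client = MongoClient("mongodb://localhost:27017/")
        self.collection = self.client["douban"]["top250"]

    def process_item(self, item, spider):
        # Scrapy calls this once per scraped item; insert it as a document.
        self.collection.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        self.client.close()
```

To activate it, the class must also be registered under ITEM_PIPELINES in settings.py.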
Douban Movie Top 250: crawl the list and save it to MongoDB.