Starting with MongoDB 3.2, the default storage engine is WiredTiger.
To change the storage engine at startup:
mongod --storageEngine mmapv1 --dbpath d:\data\db
This works around the problem that MongoVUE cannot view documents stored by WiredTiger.
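To confirm which engine a running mongod instance is using, you can query serverStatus through PyMongo. A minimal sketch, assuming a local server on the default port:

import pymongo

client = pymongo.MongoClient("localhost", 27017)
# serverStatus reports the active engine under the "storageEngine" key (MongoDB 3.0+)
print(client.admin.command("serverStatus")["storageEngine"]["name"])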
Project workflow (steps):
Preparation: install Scrapy, PyMongo, and MongoDB.
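The Python-side dependencies can be installed with pip (MongoDB itself ships as a separate server install):

pip install Scrapy pymongo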
1. Create the project skeleton: scrapy startproject stack
2. items.py
from scrapy import Item, Field


class StackItem(Item):
    # one field per attribute we scrape from each question
    title = Field()
    url = Field()
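A Scrapy Item behaves like a dictionary whose keys are restricted to the declared fields, which catches typos early. A quick interactive check (a hypothetical session from the project root):

>>> from stack.items import StackItem
>>> item = StackItem()
>>> item['title'] = 'Example question'
>>> item['title']
'Example question'
>>> item['views'] = 1  # not declared above, so Scrapy raises KeyError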
3. Create the spider
from scrapy import Spider
from scrapy.selector import Selector

from stack.items import StackItem


class StackSpider(Spider):
    name = "stack"
    allowed_domains = ["stackoverflow.com"]
    start_urls = [
        "http://stackoverflow.com/questions?pagesize=50&sort=newest",
    ]

    def parse(self, response):
        # Each question summary sits in a div.summary; its h3 holds the link
        questions = response.xpath('//div[@class="summary"]/h3')
        for question in questions:
            item = StackItem()
            item['title'] = question.xpath(
                'a[@class="question-hyperlink"]/text()').extract()[0]
            item['url'] = question.xpath(
                'a[@class="question-hyperlink"]/@href').extract()[0]
            yield item
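Before wiring up MongoDB, the spider can be smoke-tested on its own with Scrapy's built-in feed export, which writes the yielded items to a file:

scrapy crawl stack -o items.json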
4. Learn to extract data using XPath selectors
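The quickest way to experiment with such selectors is Scrapy's interactive shell, which downloads a page and exposes it as response. A sketch of a session (output elided):

scrapy shell "http://stackoverflow.com/questions?pagesize=50&sort=newest"
>>> response.xpath('//div[@class="summary"]/h3/a[@class="question-hyperlink"]/text()').extract()[:3]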
5. Storing data in MongoDB
5.1 settings.py
ITEM_PIPELINES = {
    'stack.pipelines.MongoDBPipeline': 300,
}

MONGODB_SERVER = "localhost"
MONGODB_PORT = 27017
MONGODB_DB = "stackoverflow"
MONGODB_COLLECTION = "questions"
5.2 pipelines.py
import pymongo

from scrapy.conf import settings   # old-style settings access used by this tutorial
from scrapy.exceptions import DropItem
from scrapy import log             # deprecated in newer Scrapy versions


class MongoDBPipeline(object):

    def __init__(self):
        connection = pymongo.MongoClient(
            settings['MONGODB_SERVER'],
            settings['MONGODB_PORT']
        )
        db = connection[settings['MONGODB_DB']]
        self.collection = db[settings['MONGODB_COLLECTION']]

    def process_item(self, item, spider):
        # Drop any item with an empty field; otherwise write it to MongoDB
        valid = True
        for data in item:
            if not item[data]:
                valid = False
                raise DropItem("Missing {0}!".format(data))
        if valid:
            self.collection.insert(dict(item))  # insert_one() in newer PyMongo
            log.msg("Question added to MongoDB database!",
                    level=log.DEBUG, spider=spider)
        return item
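After a crawl, the stored documents can be inspected directly with PyMongo. A minimal sketch, assuming the database and collection names configured above:

import pymongo

client = pymongo.MongoClient("localhost", 27017)
collection = client["stackoverflow"]["questions"]
print(collection.count())  # use count_documents({}) on newer PyMongo
for doc in collection.find().limit(3):
    print(doc["title"], doc["url"])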
6. Start the crawler from main.py
from scrapy import cmdline

cmdline.execute('scrapy crawl stack'.split())
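Running python main.py from the project root is equivalent to typing scrapy crawl stack on the command line; it is simply a convenience so the crawl can be launched from an IDE.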