1. Install and use MongoDB
A) Download MongoDB. Note that 32-bit builds are limited to around 2 GB of data.
B) Configure MongoDB: create a mongodb.config file, then run mongod.exe --config /path/to/your/mongodb.config from the command line.
C) Download pymongo. Python is used to write the test program later.
See The Little MongoDB Book (PDF).
2. MapReduce
Map/reduce in MongoDB is useful for batch processing of data and for aggregation operations. It is similar in spirit to using something like Hadoop, with all input coming from a collection and output going to a collection. Often, in a situation where you would have used GROUP BY in SQL, map/reduce is the right tool in MongoDB.
See the introduction to MapReduce on the MongoDB website. The map/reduce process is as follows:
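The three phases (map, group by key, reduce) can be sketched in plain Python without a database. This is only an in-memory illustration of the idea, not how MongoDB implements it; the orders list and the function names are hypothetical. It mirrors a SQL query such as SELECT customer, SUM(amount) ... GROUP BY customer.

```python
from collections import defaultdict

def map_reduce(documents, map_fn, reduce_fn):
    """Simulate the map -> group -> reduce phases in memory."""
    grouped = defaultdict(list)
    for doc in documents:
        # Map phase: each document may emit any number of (key, value) pairs.
        for key, value in map_fn(doc):
            # Group phase: collect the emitted values under their key.
            grouped[key].append(value)
    # Reduce phase: collapse each key's list of values into one result.
    return {key: reduce_fn(key, values) for key, values in grouped.items()}

# Hypothetical input, playing the role of a MongoDB collection.
orders = [
    {"customer": "ann", "amount": 10},
    {"customer": "bob", "amount": 5},
    {"customer": "ann", "amount": 7},
]

def map_order(doc):
    # Emit the grouping key and the value to aggregate.
    yield doc["customer"], doc["amount"]

def reduce_amounts(key, values):
    # Combine all values emitted for one key.
    return sum(values)

totals = map_reduce(orders, map_order, reduce_amounts)
print(totals)  # {'ann': 17, 'bob': 5}
```

In MongoDB the input documents come from a collection and the reduced results are written to an output collection, rather than returned as a dictionary.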
3. Example
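Take word counting as an example. The input text is a speech by Obama (2009-obama.txt), and the goal is to see how often each word is used. For example:
In MongoDB, the map and reduce functions are written in JavaScript and run on the server; the client only submits them.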
The map program is:
The reduce program is:
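As a rough Python illustration of the logic the two JavaScript functions need for word counting: the map step emits one {'count': 1} value per word, and the reduce step sums the counts. The value shape is inferred from the client code, which reads result['value']['count']; the tokenizing regex, the sample lines, and the function names are assumptions made for this sketch.

```python
import re
from collections import defaultdict

def word_map(doc):
    """Emit (word, {'count': 1}) for every word in the document's text."""
    # Assumption: words are runs of letters and apostrophes, lowercased.
    for word in re.findall(r"[a-z']+", doc["text"].lower()):
        yield word, {"count": 1}

def word_reduce(key, values):
    """Sum partial counts; the result has the same shape as an emitted value."""
    return {"count": sum(v["count"] for v in values)}

# Hypothetical sample input standing in for lines of the speech.
lines = ["Yes we can", "We can do it"]

grouped = defaultdict(list)
for line in lines:
    for word, value in word_map({"text": line}):
        grouped[word].append(value)

counts = {word: word_reduce(word, values) for word, values in grouped.items()}
for word in sorted(counts):
    print(word, counts[word]["count"])
```

The reduce result deliberately has the same shape as the values emitted by map. MongoDB may call reduce more than once for the same key, feeding earlier reduce results back in as values, so the function has to accept its own output.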
The client program is:
from pymongo import MongoClient
from bson.code import Code

# Note: Collection.map_reduce exists in PyMongo 3.x; it was removed in PyMongo 4.0.
# Open a connection to MongoDB (localhost)
client = MongoClient()
db = client.test
# Remove any existing data
db.texts.delete_many({})
# Insert the data, one document per line of the speech
with open('2009-obama.txt') as f:
    db.texts.insert_many([{'text': line} for line in f])
# Load the map and reduce functions
with open('wordMap.js') as f:
    map_func = Code(f.read())
with open('wordReduce.js') as f:
    reduce_func = Code(f.read())
# Run the map-reduce query; the output is written to "collection_name"
results = db.texts.map_reduce(map_func, reduce_func, "collection_name")
# Print the results
for result in results.find():
    print(result['_id'], result['value']['count'])
The running result is:
The article code can be downloaded here.
See also MapReduce with MongoDB and Python, and here.