MapReduce with MongoDB and Python

1. Install and use MongoDB

A) Download MongoDB. Note that 32-bit builds are limited to around 2 GB of data, so use a 64-bit build for anything larger.

B) Configure MongoDB: edit the configuration file (MongoDB.config) and start the server from the command line with mongod.exe --config /path/to/Your/MongoDB.config.

C) Download PyMongo, the Python driver for MongoDB, which the test program below uses.

See The Little MongoDB Book (PDF).

2. MapReduce

Map/reduce in MongoDB is useful for batch processing of data and for aggregation operations. It is similar in spirit to using something like Hadoop, with all input coming from a collection and output going to a collection. Often, in a situation where you would have used GROUP BY in SQL, map/reduce is the right tool in MongoDB.
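To make the GROUP BY comparison concrete, here is a small sketch using Python's built-in sqlite3 module. The same word count is computed once with SQL GROUP BY and once in map/reduce style (emit a (word, 1) pair for every word, then sum the values per key). The sample sentence and the table name words are illustrative assumptions, not part of the original program.

```python
import sqlite3
from collections import defaultdict

# Illustrative input (hypothetical sample text).
words = "yes we can yes we did".split()

# SQL approach: one row per word; GROUP BY collapses rows sharing a key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany("INSERT INTO words (word) VALUES (?)", [(w,) for w in words])
sql_counts = dict(
    conn.execute("SELECT word, COUNT(*) FROM words GROUP BY word").fetchall()
)
conn.close()

# Map/reduce approach: map emits (word, 1), the values are grouped by key,
# and reduce sums the values collected for each key.
grouped = defaultdict(list)
for w in words:
    grouped[w].append(1)  # map + group
mr_counts = {key: sum(values) for key, values in grouped.items()}  # reduce

print(sql_counts == mr_counts)  # True
print(sorted(mr_counts.items()))
```

Both approaches key the data on the word; the difference is that map/reduce lets you write the grouping key and the aggregation as arbitrary functions instead of a fixed SQL aggregate.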

See the introduction to MapReduce on the MongoDB website. The map/reduce process is as follows:

[Figure: the map/reduce process]

 
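The stages of the process can be sketched in plain Python: map turns each input document into zero or more (key, value) pairs, the pairs are grouped by key, and reduce folds each group's values into a single result per key. This is a single-machine illustration under simplifying assumptions (everything in memory, no sharding); the function name run_map_reduce is hypothetical and not part of PyMongo or MongoDB.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Pair = Tuple[Any, Any]


def run_map_reduce(
    documents: Iterable[dict],
    map_fn: Callable[[dict], Iterable[Pair]],
    reduce_fn: Callable[[Any, List[Any]], Any],
) -> Dict[Any, Any]:
    """Hypothetical in-memory map/reduce: map, group by key, reduce."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for doc in documents:
        # 1. Map: each document may emit any number of (key, value) pairs.
        for key, value in map_fn(doc):
            # 2. Group: collect all values emitted for the same key.
            groups[key].append(value)
    # 3. Reduce: fold each key's list of values into a single value.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Usage: total quantity per item, the map/reduce analogue of SUM ... GROUP BY.
orders = [
    {"item": "apple", "qty": 3},
    {"item": "pear", "qty": 1},
    {"item": "apple", "qty": 2},
]
totals = run_map_reduce(
    orders,
    map_fn=lambda doc: [(doc["item"], doc["qty"])],
    reduce_fn=lambda key, values: sum(values),
)
print(totals)  # {'apple': 5, 'pear': 1}
```

One difference from MongoDB: MongoDB does not call reduce for a key that has only one emitted value, and it may reduce a key's values in several batches. That is why a MongoDB reduce function must return a value with the same shape as the values map emits.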

3. Example

Take word counting as an example. The input text is a speech by Obama (the file 2009-obama.txt in the client program below), and the job counts how often each word is used. For example:

[Figure: sample word counts]

 

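In MongoDB, the map and reduce functions are written in JavaScript and executed by the database server; the Python client only sends them along with the job.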

The map program (wordMap.js) is:

[Code listing: wordMap.js]

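As a rough illustration of what a word-count map step does, here is a Python sketch of its assumed behaviour: for each document of the form {'text': line} (as inserted by the client program), it emits every word with the value {'count': 1}, which is the value shape the client program reads via result['value']['count']. Lowercasing and splitting on whitespace are assumptions; the original wordMap.js may tokenize differently. The function name word_map is hypothetical.

```python
from typing import Dict, Iterator, Tuple


def word_map(doc: Dict[str, str]) -> Iterator[Tuple[str, Dict[str, int]]]:
    """Hypothetical Python analogue of a word-count map function.

    In MongoDB the map function is JavaScript and calls emit(key, value);
    here each emitted pair is yielded instead.
    """
    # Assumption: words are whitespace-separated and case is ignored.
    for word in doc["text"].lower().split():
        yield word, {"count": 1}


pairs = list(word_map({"text": "We can and we will\n"}))
print(pairs)
```

Emitting a small document such as {'count': 1} instead of a bare number keeps the door open for reduce to return the same shape it receives.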
The reduce program (wordReduce.js) is:

[Code listing: wordReduce.js]

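Correspondingly, here is a Python sketch of what the reduce step has to do, under the same assumption that values look like {'count': n}: add up the counts and return a value of the same shape. The shape matters because MongoDB may call reduce several times for one key and feed earlier reduce results back in as inputs, so the function must give the same answer whether the values are reduced all at once or in batches. The name word_reduce is hypothetical.

```python
from typing import Dict, List


def word_reduce(key: str, values: List[Dict[str, int]]) -> Dict[str, int]:
    """Hypothetical Python analogue of a word-count reduce function."""
    # Sum the partial counts; the result has the same {'count': n} shape
    # as each input value, so it can itself be fed back into word_reduce.
    return {"count": sum(v["count"] for v in values)}


values = [{"count": 1}] * 5

# Reducing everything at once...
all_at_once = word_reduce("we", values)

# ...must match reducing in batches and then re-reducing the partial results.
partial = [word_reduce("we", values[:2]), word_reduce("we", values[2:])]
re_reduced = word_reduce("we", partial)

print(all_at_once, re_reduced)
```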
 

The client program is:

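```python
from pymongo import MongoClient
from bson.code import Code

# Open a connection to MongoDB (localhost:27017 by default)
client = MongoClient()
db = client.test

# Remove any existing data
db.texts.delete_many({})

# Insert the data: one document per line of the speech
with open('2009-obama.txt') as f:
    db.texts.insert_many([{'text': line} for line in f])

# Load the map and reduce functions (JavaScript, run by the server)
with open('wordMap.js') as f:
    map_js = Code(f.read())
with open('wordReduce.js') as f:
    reduce_js = Code(f.read())

# Run the map-reduce job and write its output to "collection_name".
# Collection.map_reduce() was removed in PyMongo 4.0, so the mapReduce
# command is issued directly (the command is deprecated since MongoDB 5.0).
db.command('mapReduce', 'texts', map=map_js, reduce=reduce_js,
           out='collection_name')

# Print the results: _id is the word, value is what reduce returned
for result in db.collection_name.find():
    print(result['_id'], result['value']['count'])
```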

The output of the program is:

[Figure: program output]

 

The article code can be downloaded here.

 

See also MapReduce with MongoDB and Python, and here.

