1 Installing and using MongoDB
a) Download MongoDB. Note that 32-bit builds are limited to around 2GB of data.
b) Set up mongodb.config, then run from the command line: mongod.exe --config /path/to/your/mongodb.config
c) Download pymongo; the test program below is written in Python.
See: The Little MongoDB Book (pdf).
2 MapReduce
Map/reduce in MongoDB is useful for batch processing of data and aggregation operations. It is similar in spirit to using something like Hadoop with all input coming from a collection and output going to a collection. Often, in a situation where you would have used GROUP BY in SQL, map/reduce is the right tool in MongoDB.
See the introduction to MapReduce on the MongoDB website. The map/reduce flow is as follows:
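The flow can be sketched in plain Python without a database: every input document is mapped to (key, value) pairs, the values are grouped by key, and each group is reduced to a single value. This is only an illustration of the idea, not how MongoDB implements it; `run_map_reduce`, `map_order`, `reduce_total` and the sample `orders` data are hypothetical names invented for this sketch. The example mirrors a SQL query of the form SELECT customer, SUM(amount) ... GROUP BY customer.

```python
from collections import defaultdict


def run_map_reduce(documents, map_fn, reduce_fn):
    """Map every document, group the emitted values by key, reduce each group."""
    groups = defaultdict(list)
    for doc in documents:
        # map_fn yields zero or more (key, value) pairs per document
        for key, value in map_fn(doc):
            groups[key].append(value)
    # reduce_fn collapses the list of values for one key into a single value
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Hypothetical input "collection"
orders = [
    {"customer": "ann", "amount": 10},
    {"customer": "bob", "amount": 5},
    {"customer": "ann", "amount": 7},
]


def map_order(doc):
    # Emit the grouping key and the value to aggregate
    yield doc["customer"], doc["amount"]


def reduce_total(key, values):
    # Sum all amounts emitted for one customer
    return sum(values)


totals = run_map_reduce(orders, map_order, reduce_total)
print(totals)  # {'ann': 17, 'bob': 5}
```

In MongoDB the input comes from a collection and the output is written to a collection, but the three stages are the same.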
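3 Example
Word counting serves as the example. The input text is a speech by Obama, and the goal is to see how often each word is used. For example:
The map and reduce functions that the client sends to MongoDB are written as JS scripts.
The map function is:
The reduce function is: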
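As a rough Python equivalent of what such a pair of functions does for word counting, the sketch below emits {'count': 1} for every word and sums the counts per word. The value shape is inferred from the client program, which reads result['value']['count']; the tokenizing rule, the names `word_map`, `word_reduce`, `word_count` and the sample `docs` are assumptions made for illustration, not the contents of the original JS files.

```python
import re
from collections import defaultdict


def word_map(doc):
    # Emit (word, {'count': 1}) for every word in the document's 'text' field.
    # The regular expression is an assumed tokenizing rule.
    for word in re.findall(r"[a-z']+", doc["text"].lower()):
        yield word, {"count": 1}


def word_reduce(key, values):
    # Return the same shape that word_map emits, so the output of one
    # reduce call can be fed into another reduce call for the same key.
    return {"count": sum(v["count"] for v in values)}


def word_count(docs):
    # Group the emitted values by word, then reduce each group
    groups = defaultdict(list)
    for doc in docs:
        for key, value in word_map(doc):
            groups[key].append(value)
    return {key: word_reduce(key, values) for key, values in groups.items()}


# Hypothetical sample documents, one per line of text
docs = [{"text": "Yes we can"}, {"text": "We can, yes."}]
counts = word_count(docs)
for word in sorted(counts):
    print(word, counts[word]["count"])
```

Keeping the reduce output in the same shape as the emitted values matters because MongoDB may call the reduce function more than once for the same key, passing earlier reduce results back in as values.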
The client program is:
# Assumes PyMongo 3.x: Collection.map_reduce was removed in PyMongo 4.0
from pymongo import MongoClient
from bson.code import Code

# Open a connection to MongoDB (localhost, default port)
client = MongoClient()
db = client.test

# Remove any existing data
db.texts.delete_many({})

# Insert the data, one document per line of the speech
with open('2009-obama.txt') as f:
    db.texts.insert_many([{'text': line} for line in f])

# Load the map and reduce functions
with open('wordMap.js') as f:
    map_fn = Code(f.read())
with open('wordReduce.js') as f:
    reduce_fn = Code(f.read())

# Run the map-reduce query; the output is written to "collection_name"
results = db.texts.map_reduce(map_fn, reduce_fn, "collection_name")

# Print each word and its count
for result in results.find():
    print(result['_id'], result['value']['count'])
The result is:
The code for this article can be downloaded here.
See: MapReduce with MongoDB and Python, and here.