MongoDB Study Notes 5: MapReduce

English original: http://www.mongodb.org/display/DOCS/MapReduce

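In MongoDB, MapReduce is used mainly for batch processing of data and for aggregation. It is similar in spirit to Hadoop: all input comes from one collection and all output goes to one collection. It often fills the role that GROUP BY aggregation plays in a traditional relational database, which makes it a useful tool in MongoDB.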

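Indexing and standard queries in MongoDB are separate from map/reduce. If you have used CouchDB in the past, note that this is a big difference between CouchDB and MongoDB: indexing and querying in MongoDB work much more like they do in MySQL.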

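map/reduce is invoked through a MongoDB database command. Typically the database writes the output of the operation to a collection. The map and reduce functions are written in JavaScript and run on the server. The command syntax is: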

db.runCommand(
 { mapreduce : <collection>,
   map : <mapfunction>,
   reduce : <reducefunction>
   [, query : <query filter object>]
   [, sort : <sorts the input objects using this key. Useful for optimization, like sorting by the emit key for fewer reduces>]
   [, limit : <number of objects to return from collection>]
   [, out : <see output options below>]
   [, keeptemp: <true|false>]
   [, finalize : <finalizefunction>]
   [, scope : <object where fields go into javascript global scope >]
   [, jsMode : true]
   [, verbose : true]
 }
);
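
As a rough illustration of how the optional fields fit together, the sketch below assembles a command document as a plain Python dict. The helper name build_mapreduce_command, the collection name "things", and the JavaScript strings are assumptions made for this example; nothing here talks to a server, and the helper always sets out explicitly even though the syntax above marks it as optional.

```python
def build_mapreduce_command(collection, map_js, reduce_js, out, query=None,
                            sort=None, limit=None, finalize_js=None, scope=None):
    """Assemble a mapreduce command document (hypothetical helper).

    Optional fields are added only when a value is supplied.
    """
    # The command name goes first; dicts keep insertion order in Python 3.7+.
    command = {"mapreduce": collection, "map": map_js, "reduce": reduce_js}
    optional = {"query": query, "sort": sort, "limit": limit,
                "finalize": finalize_js, "scope": scope}
    for name, value in optional.items():
        if value is not None:
            command[name] = value
    command["out"] = out
    return command


# Illustrative JavaScript bodies, passed as strings.
MAP_JS = "function () { this.tags.forEach(function (z) { emit(z, 1); }); }"
REDUCE_JS = ("function (key, values) {"
             "  var total = 0;"
             "  for (var i = 0; i < values.length; i++) { total += values[i]; }"
             "  return total;"
             "}")

command = build_mapreduce_command(
    "things", MAP_JS, REDUCE_JS,
    out={"inline": 1},
    query={"x": {"$lt": 3}},
)
print(command)
```

Against a server that still supports the mapreduce command, a document like this could in principle be handed to pymongo's Database.command(); that step is left out because it has not been tested here.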

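Incremental map-reduce

If the data set you want to aggregate keeps growing, incremental map/reduce is a clear win: you do not have to aggregate over the entire data set every time you want to see the results. An incremental map/reduce takes the following steps:

1. Run a first job over the existing collection and write the results to an output collection.

2. When more data has arrived, run a second job and use the query option to filter the input down to the new documents only.

3. Use the reduce output option, so that the reduce function merges the new results into the existing output collection.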
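
The three steps above can be mimicked in plain Python without a server. This is only a sketch of the idea: map_reduce and reduce_into are hypothetical helpers that imitate a job run and the reduce output option, the list comprehension stands in for the query filter, and the documents are made-up sample data.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    """Run one in-memory job: group emitted values by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def reduce_into(existing, fresh, reduce_fn):
    """Imitate the reduce output option: re-reduce keys present on both sides."""
    merged = dict(existing)
    for key, value in fresh.items():
        if key in merged:
            merged[key] = reduce_fn(key, [merged[key], value])
        else:
            merged[key] = value
    return merged


def emit_tags(doc):
    # Emit (tag, 1) once per tag in the document.
    return [(tag, 1) for tag in doc["tags"]]


def sum_values(key, values):
    return sum(values)


old_docs = [{"x": 1, "tags": ["dog", "cat"]}, {"x": 2, "tags": ["cat"]}]
new_docs = [{"x": 3, "tags": ["mouse", "cat", "dog"]}, {"x": 4, "tags": []}]

# Step 1: first job over the data available so far.
output = map_reduce(old_docs, emit_tags, sum_values)

# Step 2: second job, restricted to the new documents (stand-in for a query filter).
only_new = [doc for doc in old_docs + new_docs if doc["x"] > 2]
fresh = map_reduce(only_new, emit_tags, sum_values)

# Step 3: fold the new results into the existing output with the same reduce.
output = reduce_into(output, fresh, sum_values)

# The incremental result matches a full re-run over all documents.
full = map_reduce(old_docs + new_docs, emit_tags, sum_values)
print(output == full)
```

The approach only gives the same answer as a full re-run when reduce can safely be applied to its own earlier output, as summing can.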

Output options

    "collectionName" - By default the output will be of type "replace".
    { replace : "collectionName" } - The output will be inserted into a collection which will atomically replace any existing collection with the same name.
    { merge : "collectionName" } - This option will merge new data into the old output collection. In other words, if the same key exists in both the result set and the old collection, the new key will overwrite the old one.
    { reduce : "collectionName" } - If documents exist for a given key in the result set and in the old collection, then a reduce operation (using the specified reduce function) will be performed on the two values and the result will be written to the output collection. If a finalize function was provided, this will be run after the reduce as well.
    { inline : 1 } - With this option, no collection will be created, and the whole map-reduce operation will happen in RAM. Also, the results of the map-reduce will be returned within the result object. Note that this option is possible only when the result set fits within the 16MB limit of a single document.
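
To make the difference between replace, merge and reduce concrete, here is a small in-memory sketch that treats an output collection as a dict from key to value. The function apply_output and the sample values are assumptions for illustration only; finalize and inline are not modelled.

```python
def apply_output(old, new, mode, reduce_fn=None):
    """Imitate how each output mode combines an old collection with a new result set."""
    if mode == "replace":
        # The new result set takes the place of whatever was there before.
        return dict(new)
    if mode == "merge":
        # Old keys survive; on a clash the new value overwrites the old one.
        return {**old, **new}
    if mode == "reduce":
        # On a clash the two values are combined with the reduce function.
        result = dict(old)
        for key, value in new.items():
            if key in result:
                result[key] = reduce_fn(key, [result[key], value])
            else:
                result[key] = value
        return result
    raise ValueError("unknown output mode: %r" % mode)


def sum_values(key, values):
    return sum(values)


old = {"cat": 2, "dog": 1}      # existing output collection
new = {"cat": 1, "mouse": 1}    # result set of the latest job

replaced = apply_output(old, new, "replace")
merged = apply_output(old, new, "merge")
reduced = apply_output(old, new, "reduce", sum_values)

print(replaced)  # {'cat': 1, 'mouse': 1}
print(merged)    # {'cat': 1, 'dog': 1, 'mouse': 1}
print(reduced)   # {'cat': 3, 'dog': 1, 'mouse': 1}
```
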

Result object

{
  [results : <document_array>,]
  [result : <collection_name> | {db: <db>, collection: <collection_name>},]
  timeMillis : <job_time>,
  counts : {
       input :  <number of objects scanned>,
       emit  : <number of times emit was called>,
       output : <number of items in output collection>
  } ,
  ok : <1_if_ok>
  [, err : <errmsg_if_error>]
}
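
One way to consume a result object of this shape is sketched below. The helper read_result and every value in the sample dict (including the timing and the counts) are hypothetical; only the field names come from the layout above.

```python
def read_result(result):
    """Return ("inline", documents) or ("collection", name) from a result object."""
    if not result.get("ok"):
        # err is only present when the job failed.
        raise RuntimeError(result.get("err", "map-reduce failed"))
    if "results" in result:
        # Inline output: the documents are carried in the result object itself.
        return "inline", result["results"]
    # Otherwise the result field names the output collection.
    return "collection", result["result"]


# Made-up sample, shaped like the layout above.
sample = {
    "result": "myresults",
    "timeMillis": 12,
    "counts": {"input": 4, "emit": 6, "output": 3},
    "ok": 1,
}

kind, payload = read_result(sample)
print(kind, payload)
```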

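Map function

Inside the map function, the variable this refers to the document currently being processed. The map function calls emit(key, value) any number of times to pass data to the reduce function. In most cases emit is called once per document, but in some cases it is called several times for the same document.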

Reduce function

When a map/reduce operation runs, the reduce function collects the values that map emitted for a key and computes a single value from them.
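
The interplay of the two functions can be imitated in a few lines of plain Python. This is a sketch rather than how the server is implemented: map_tags and reduce_sum mirror a tag-counting map and reduce, run_map_reduce is a hypothetical driver, and the documents are illustrative sample data.

```python
from collections import defaultdict


def map_tags(doc):
    """Stand-in for map: emit (tag, 1) once per tag, so zero or more emits per document."""
    for tag in doc["tags"]:
        yield tag, 1


def reduce_sum(key, values):
    """Stand-in for reduce: collapse every value emitted for one key into a single value."""
    return sum(values)


def run_map_reduce(docs, map_fn, reduce_fn):
    # Collect the emitted values per key, then reduce each group.
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


things = [
    {"x": 1, "tags": ["dog", "cat"]},
    {"x": 2, "tags": ["cat"]},
    {"x": 3, "tags": ["mouse", "cat", "dog"]},
    {"x": 4, "tags": []},
]

counts = run_map_reduce(things, map_tags, reduce_sum)
print(sorted(counts.items()))  # [('cat', 3), ('dog', 2), ('mouse', 1)]
```

Because the server may call reduce more than once for the same key, the value reduce returns should have the same shape as the values map emits, as the sum of counts does here.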


Below is a map-reduce example using the Python MongoDB client (pymongo):

#!/usr/bin/env python
# coding=utf-8
# Written for Python 2 and an old pymongo release: Connection was removed in
# pymongo 3.0 (use MongoClient) and Collection.map_reduce in pymongo 4.0.
from bson.code import Code
from pymongo import Connection

connection = Connection('localhost', 27017)
db = connection.map_reduce_example

# Start from an empty collection and insert four sample documents.
db.things.remove({})
db.things.insert({"x": 1, "tags": ["dog", "cat"]})
db.things.insert({"x": 2, "tags": ["cat"]})
db.things.insert({"x": 3, "tags": ["mouse", "cat", "dog"]})
db.things.insert({"x": 4, "tags": []})

# map: emit (tag, 1) for every tag of the current document.
mapfun = Code("function () {this.tags.forEach(function(z) {emit(z, 1);});}")

# reduce: sum the counts emitted for one tag.
reducefun = Code("function (key, values) {"
                 "  var total = 0;"
                 "  for (var i = 0; i < values.length; i++) {"
                 "    total += values[i];"
                 "  }"
                 "  return total;"
                 "}")

# First run: count tags over the whole collection, output to "myresults".
result = db.things.map_reduce(mapfun, reducefun, "myresults")
for doc in result.find():
    print doc
print "#################################################################"

# Second run: only documents with x < 3; "myresults" is replaced.
result = db.things.map_reduce(mapfun, reducefun, "myresults", query={"x": {"$lt": 3}})
for doc in result.find():
    print doc
print "#################################################################"
The output is:

{u'_id': u'cat', u'value': 3.0}
{u'_id': u'dog', u'value': 2.0}
{u'_id': u'mouse', u'value': 1.0}
#################################################################
{u'_id': u'cat', u'value': 2.0}
{u'_id': u'dog', u'value': 1.0}
#################################################################



