MongoDB Study Notes (5): MapReduce

Last updated: 2018-12-04. Source: Internet

Uploaded by: User


Original English source: http://www.mongodb.org/display/DOCS/MapReduce

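In MongoDB, MapReduce is used mainly for batch processing of data and for aggregation. It resembles Hadoop in that all input comes from one collection and all output goes to one collection. Functionally it fills the role that GROUP BY aggregation fills in a traditional relational database, which makes it a useful tool in MongoDB.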

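Indexing and standard queries in MongoDB are separate from map/reduce. If you have used CouchDB, note that this is a major difference between the two systems: basic indexing and querying in MongoDB work much more like they do in MySQL.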

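map/reduce is exposed as a MongoDB database command, and its results are normally written to an output collection. The map and reduce functions are written in JavaScript and run on the server. The command syntax is: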

db.runCommand({
    mapreduce : <collection>,
    map : <mapfunction>,
    reduce : <reducefunction>
    [, query : <query filter object>]
    [, sort : <sorts the input objects using this key. Useful for optimization, like sorting by the emit key for fewer reduces>]
    [, limit : <number of objects to return from collection>]
    [, out : <see output options below>]
    [, keeptemp : <true|false>]
    [, finalize : <finalizefunction>]
    [, scope : <object where fields go into javascript global scope>]
    [, jsMode : true]
    [, verbose : true]
});
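As a rough illustration of how the optional fields fit together, the sketch below assembles the same command document in Python. The helper name `build_map_reduce_command` is hypothetical and not part of any driver, and passing the JavaScript functions as plain strings is an assumption made to keep the example free of dependencies.

```python
def build_map_reduce_command(collection, map_js, reduce_js, out, query=None,
                             sort=None, limit=None, finalize_js=None,
                             scope=None):
    """Assemble a mapreduce command document (hypothetical helper)."""
    # The command name has to be the first key; dicts keep insertion
    # order on Python 3.7+.
    command = {"mapreduce": collection, "map": map_js, "reduce": reduce_js}
    optional = {"query": query, "sort": sort, "limit": limit,
                "finalize": finalize_js, "scope": scope}
    # Only include the options the caller actually supplied.
    for name, value in optional.items():
        if value is not None:
            command[name] = value
    command["out"] = out
    return command


MAP_JS = "function () { this.tags.forEach(function (z) { emit(z, 1); }); }"
REDUCE_JS = ("function (key, values) {"
             "  var total = 0;"
             "  for (var i = 0; i < values.length; i++) { total += values[i]; }"
             "  return total;"
             "}")

command = build_map_reduce_command(
    "things", MAP_JS, REDUCE_JS, out={"replace": "myresults"},
    query={"x": {"$lt": 3}})
print(command)
```

With a PyMongo database handle `db`, a document like this could be passed to `db.command(command)`, provided the server version still supports the mapreduce command.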

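Incremental map-reduce

If the data set you aggregate over keeps growing, incremental map/reduce is a clear advantage: you do not have to re-aggregate the entire data set every time you want to see results. An incremental job takes the following steps:

1. Run a first job over the collection and write the result to an output collection.

2. When more data arrives, run a second job, using the query option to restrict the input to the new documents.

3. Use the reduce output option, which applies the reduce function to merge the new results into the existing output collection.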
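The three steps can be sketched in plain Python, with no server involved. This is only a toy model under stated assumptions: dicts stand in for collections, `map_reduce` and `reduce_into` are hypothetical names, and the filter on `x` stands in for the query option.

```python
from collections import defaultdict


def map_tags(doc):
    """Yield (key, value) pairs, mirroring emit(tag, 1) for each tag."""
    for tag in doc["tags"]:
        yield tag, 1


def reduce_sum(key, values):
    """Collapse all values seen for one key into a single total."""
    return sum(values)


def map_reduce(docs, query=lambda doc: True):
    """Run map and reduce over the documents accepted by `query`."""
    grouped = defaultdict(list)
    for doc in docs:
        if query(doc):
            for key, value in map_tags(doc):
                grouped[key].append(value)
    return {key: reduce_sum(key, values) for key, values in grouped.items()}


def reduce_into(output, new_result):
    """Model the reduce output option: re-reduce new values with stored ones."""
    for key, value in new_result.items():
        if key in output:
            output[key] = reduce_sum(key, [output[key], value])
        else:
            output[key] = value
    return output


docs = [
    {"x": 1, "tags": ["dog", "cat"]},
    {"x": 2, "tags": ["cat"]},
]
# Step 1: first job over the data available so far.
output = map_reduce(docs)

# Step 2: more data arrives; the second job only looks at the new documents.
docs += [{"x": 3, "tags": ["mouse", "cat", "dog"]}, {"x": 4, "tags": []}]
new_result = map_reduce(docs, query=lambda doc: doc["x"] > 2)

# Step 3: fold the new result into the existing output.
output = reduce_into(output, new_result)
print(output)
```

The merged output matches what a single job over all four documents would produce, which is the point of the incremental approach.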

Output options

    "collectionName" - By default the output will be of type "replace".
    { replace : "collectionName" } - The output will be inserted into a collection which will atomically replace any existing collection with the same name.
    { merge : "collectionName" } - This option will merge new data into the old output collection. In other words, if the same key exists in both the result set and the old collection, the new key will overwrite the old one.
    { reduce : "collectionName" } - If documents exist for a given key in the result set and in the old collection, then a reduce operation (using the specified reduce function) will be performed on the two values and the result will be written to the output collection. If a finalize function was provided, this will be run after the reduce as well.
    { inline : 1 } - With this option, no collection will be created, and the whole map-reduce operation will happen in RAM. Also, the results of the map-reduce will be returned within the result object. Note that this option is possible only when the result set fits within the 16MB limit of a single document.

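To make the difference between the replace, merge, and reduce output types concrete, here is a small in-memory model. It is a sketch only: dicts stand in for the output collection, `apply_output` is a hypothetical function, and finalize and inline are left out.

```python
def reduce_sum(key, values):
    """Example reduce function: add up the values for one key."""
    return sum(values)


def apply_output(mode, existing, result, reduce_fn=None):
    """Return the output collection contents after one job (toy model)."""
    if mode == "replace":
        # The old collection is discarded entirely.
        return dict(result)
    if mode == "merge":
        # Keys in the new result overwrite keys in the old collection.
        merged = dict(existing)
        merged.update(result)
        return merged
    if mode == "reduce":
        # Keys present on both sides are combined with the reduce function.
        reduced = dict(existing)
        for key, value in result.items():
            if key in reduced:
                reduced[key] = reduce_fn(key, [reduced[key], value])
            else:
                reduced[key] = value
        return reduced
    raise ValueError("unknown output mode: %s" % mode)


existing = {"cat": 2, "dog": 1}   # contents of the old output collection
result = {"cat": 1, "mouse": 1}   # result set of the new job

replaced = apply_output("replace", existing, result)
merged = apply_output("merge", existing, result)
reduced = apply_output("reduce", existing, result, reduce_sum)
print(replaced, merged, reduced)
```

Only the reduce type keeps the old count for "cat" and adds the new one to it; merge overwrites it and replace drops "dog" altogether.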
Result object

{
    [results : <document_array>,]
    [result : <collection_name> | {db: <db>, collection: <collection_name>},]
    timeMillis : <job_time>,
    counts : {
        input : <number of objects scanned>,
        emit : <number of times emit was called>,
        output : <number of items in output collection>
    },
    ok : <1_if_ok>
    [, err : <errmsg_if_error>]
}
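A client usually needs to tell the three shapes of this object apart: inline results, a plain collection name, or a db/collection pair. The sketch below shows one possible way to do that; `describe_result` is a hypothetical helper, and the sample values (including the 12 ms timing) are made up for illustration.

```python
def describe_result(result):
    """Summarise a map-reduce result object as one line (hypothetical helper)."""
    if not result.get("ok"):
        return "failed: %s" % result.get("err", "unknown error")
    if "results" in result:
        # Inline output: the documents are inside the result object itself.
        where = "inline (%d documents)" % len(result["results"])
    else:
        target = result["result"]
        if isinstance(target, dict):
            # Output went to a collection in another database.
            where = "%s.%s" % (target["db"], target["collection"])
        else:
            where = target
    counts = result["counts"]
    return "%d input, %d emit, %d output in %d ms -> %s" % (
        counts["input"], counts["emit"], counts["output"],
        result["timeMillis"], where)


# Made-up sample: four documents scanned, six emits, three output keys.
sample = {
    "result": "myresults",
    "timeMillis": 12,
    "counts": {"input": 4, "emit": 6, "output": 3},
    "ok": 1,
}
print(describe_result(sample))
```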

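Map function

Inside the map function, the variable this refers to the document currently being processed. The map function calls emit(key, value) any number of times to pass data to the reduce function. In most cases emit is called once per document, but in some cases a single document can trigger several calls, or none at all.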
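The varying number of emit calls per document can be seen in a plain-Python imitation of a map function. This is a sketch under assumptions: Python has no implicit `this`, so the document is passed in explicitly, and `run_map` is a hypothetical driver that simply records every emitted pair.

```python
def map_tags(doc, emit):
    """Imitate a map function: emit (tag, 1) for every tag in the document."""
    for tag in doc["tags"]:
        emit(tag, 1)


def run_map(docs):
    """Call the map function on each document and collect what it emits."""
    emitted = []
    for doc in docs:
        map_tags(doc, lambda key, value: emitted.append((key, value)))
    return emitted


docs = [
    {"x": 1, "tags": ["dog", "cat"]},
    {"x": 2, "tags": ["cat"]},
    {"x": 3, "tags": ["mouse", "cat", "dog"]},
    {"x": 4, "tags": []},
]

emitted = run_map(docs)
# Number of emit calls triggered by each document, in order.
counts_per_doc = [len(run_map([doc])) for doc in docs]
print(emitted)
print(counts_per_doc)
```

The document with an empty tags list emits nothing, so it contributes no key to the reduce stage.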

Reduce function

When a map/reduce job runs, the reduce function receives the values that map emitted for a given key and reduces them to a single value.
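One property worth checking in a reduce function is that it still gives the right answer when applied to its own earlier output, since the server may call it more than once for the same key. The Python sketch below mirrors a summing reduce and checks that property on a small input; it is an illustration, not the server's actual execution path.

```python
def reduce_sum(key, values):
    """Mirror of a summing reduce function: return the total of the values."""
    total = 0
    for value in values:
        total += value
    return total


values = [1, 1, 1]  # three emits for the key "cat"

# Reduce everything in one call.
whole = reduce_sum("cat", values)

# Reduce in two partial calls, then reduce the partial results again.
first_part = reduce_sum("cat", values[:2])
second_part = reduce_sum("cat", values[2:])
in_stages = reduce_sum("cat", [first_part, second_part])

print(whole, in_stages)
```

Both paths give the same total because the reduce output has the same shape as the emitted values; a reduce that returned, say, a count of its inputs instead of their sum would not pass this check.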

Below is a map-reduce example that uses the Python MongoDB client:

#!/usr/bin/env python
# coding=utf-8
# Written for Python 2 and an older PyMongo release. Later releases replaced
# Connection with MongoClient and removed Collection.map_reduce().
from bson.code import Code
from pymongo import Connection

connection = Connection('localhost', 27017)
db = connection.map_reduce_example

# Start from an empty collection, then insert four sample documents.
db.things.remove({})
db.things.insert({"x": 1, "tags": ["dog", "cat"]})
db.things.insert({"x": 2, "tags": ["cat"]})
db.things.insert({"x": 3, "tags": ["mouse", "cat", "dog"]})
db.things.insert({"x": 4, "tags": []})

# map: emit (tag, 1) once for every tag in the current document.
mapfun = Code("function () {this.tags.forEach(function(z) {emit(z, 1);});}")

# reduce: add up all the values emitted for one tag.
reducefun = Code("function (key, values) {"
                 "  var total = 0;"
                 "  for (var i = 0; i < values.length; i++) {"
                 "    total += values[i];"
                 "  }"
                 "  return total;"
                 "}")

# Count tags across all documents; the output goes to "myresults".
result = db.things.map_reduce(mapfun, reducefun, "myresults")
for doc in result.find():
    print doc
print "#################################################################"

# Run the same job again, restricted to documents with x < 3.
result = db.things.map_reduce(mapfun, reducefun, "myresults", query={"x": {"$lt": 3}})
for doc in result.find():
    print doc
print "#################################################################"

The output is:

{u'_id': u'cat', u'value': 3.0}
{u'_id': u'dog', u'value': 2.0}
{u'_id': u'mouse', u'value': 1.0}
#################################################################
{u'_id': u'cat', u'value': 2.0}
{u'_id': u'dog', u'value': 1.0}
#################################################################
