MongoDB Aggregation: aggregation operations in MongoDB

Last updated: 2018-07-26. Source: the Internet.

Uploaded by: User


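I had always assumed that aggregation in MongoDB simply meant the aggregation pipeline; only after reading the introduction on the official website did I learn there is more to it.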

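What aggregation does: it groups multiple records together and then applies a series of operations to produce a single result. Many descriptions are available online for reference.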

Aggregation methods: MongoDB offers three ways to aggregate: the aggregation pipeline, the map-reduce function, and single purpose aggregation methods.

The aggregation pipeline is well covered elsewhere, so I won't repeat it here.

map-reduce function:

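Anything that count, group and similar operations can do, MapReduce can also do. It parallelizes easily across multiple servers: it splits the problem, sends the pieces to different machines, and lets each machine complete one part. When all machines have finished, their results are gathered together into the final, complete result.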
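To make the split-and-combine idea concrete, here is a minimal single-process sketch in plain JavaScript (runnable with Node.js, no MongoDB involved). The "machines" are simulated as array chunks, and the helper names partition, partialCounts and mergeCounts are hypothetical illustrations, not MongoDB APIs.

```javascript
// Simulate splitting a job across several "machines" (here: array chunks),
// letting each chunk compute a partial result, then combining the partials.

// Split an array into n chunks, dealing items round-robin (one chunk per machine).
function partition(items, n) {
  const chunks = Array.from({ length: n }, () => []);
  items.forEach((item, i) => chunks[i % n].push(item));
  return chunks;
}

// Each "machine" counts occurrences of each key in its own chunk.
function partialCounts(chunk) {
  const counts = {};
  for (const key of chunk) {
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Gather the partial results from all machines into the final result.
function mergeCounts(partials) {
  const total = {};
  for (const part of partials) {
    for (const [key, n] of Object.entries(part)) {
      total[key] = (total[key] || 0) + n;
    }
  }
  return total;
}

const statuses = ["A", "B", "A", "A", "C", "B"];
const partials = partition(statuses, 3).map(partialCounts);
console.log(JSON.stringify(mergeCounts(partials))); // {"A":3,"B":2,"C":1}
```

The merge step works only because adding partial counts gives the same total as counting everything at once; that is the property that makes the work safe to split.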

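The steps of MapReduce:

1. Map: apply an operation to every document in the collection. The operation either does nothing or emits some keys, each with X values.

2. Shuffle: an intermediate stage that groups the emitted key-value pairs by key, collecting the values for each key into a list.

3. Reduce: reduce each key's list of values to a single value. This value is returned and shuffled again, repeating until each key's list holds only one value. That value is the final result.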
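The three phases can be modelled in a few lines of plain JavaScript (runnable with Node.js; no MongoDB server involved). This is a toy sketch, not MongoDB's implementation: runMapReduce is a hypothetical name, emit is passed to the map function as an argument rather than being a global as it is inside MongoDB, and reduce is skipped for keys that received only one value, matching the behavior described in Note 2 below.

```javascript
// Toy in-memory model of the map, shuffle and reduce phases.
// Illustration only; this is not how MongoDB executes mapReduce.
function runMapReduce(docs, mapFn, reduceFn) {
  // Map: call mapFn once per document with `this` bound to the document.
  // (Here emit is passed as an argument; inside MongoDB it is a global.)
  const emitted = [];
  const emit = (key, value) => emitted.push([key, value]);
  for (const doc of docs) {
    mapFn.call(doc, emit);
  }

  // Shuffle: group the emitted values into one list per key.
  const groups = new Map();
  for (const [key, value] of emitted) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }

  // Reduce: collapse each key's list to a single value.
  // Keys with exactly one value are not reduced, as in MongoDB.
  const results = [];
  for (const [key, values] of groups) {
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    results.push({ _id: key, value });
  }
  return results;
}

// Usage: total price per customer.
const orderDocs = [
  { cust_id: "abc123", status: "A", price: 25 },
  { cust_id: "abc123", status: "A", price: 10 },
  { cust_id: "xyz789", status: "B", price: 40 },
];
const totalsPerCustomer = runMapReduce(
  orderDocs,
  function (emit) { emit(this.cust_id, this.price); }, // regular function so `this` is bound
  (key, prices) => prices.reduce((sum, p) => sum + p, 0)
);
console.log(JSON.stringify(totalsPerCustomer));
// [{"_id":"abc123","value":35},{"_id":"xyz789","value":40}]
```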

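The price of MapReduce is speed: group is not very fast, and MapReduce is even slower, so never use it in a "real-time" setting. Run MapReduce as a background job instead: it creates a collection that stores the results, and you can query that collection in real time.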

MongoDB provides a number of commands for performing aggregation on a collection, and they can also be applied to a subset of a collection.

db.runCommand(
  {
    mapReduce: <collection>,   // name of the input collection; documents can be filtered with query before map processes them
    map: <function>,           // JavaScript function that associates ("maps") a value with a key and emits the key/value pair; see Note 1
    reduce: <function>,        // JavaScript function that reduces all values for one key to a single object; see Note 2
    finalize: <function>,      // optional; receives each key and its reduced value and returns the final value
    out: <output>,             // where the results go; see Note 3
    query: <document>,
    sort: <document>,
    limit: <number>,
    scope: <document>,
    jsMode: <boolean>,
    verbose: <boolean>,
    bypassDocumentValidation: <boolean>,
    collation: <document>
  }
)

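Note 1: The map function transforms each document into zero or more documents and can reference the variables defined in the scope parameter. MongoDB invokes the map function for every input document in the collection; the function calls emit(key, value) to pass each key and value on to the reduce function.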

Function form 1 (it can use variables from the scope parameter):

function() {
   ...
   emit(key, value);
}

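* Inside the function, "this" refers to the current document.

* The function must never attempt to access the database, for any reason.

* The function should be pure: it must not have side effects outside itself.

* A single emit can hold at most half of the maximum BSON document size. In version 3.4 that maximum is 16 MB, so a single emit cannot exceed 8 MB.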

Function form 2:

The following map function calls emit once or not at all, depending on whether the document's status field equals 'A'.

function() {
    if (this.status == 'A')
        emit(this.cust_id, 1);
}

The following map function calls emit multiple times, once for each element of the document's items array.

function() {
    this.items.forEach(function(item){ emit(item.sku, 1); });
}
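The two map functions above can be tried outside MongoDB by supplying a stand-in emit that records what would be passed on. This is a plain JavaScript sketch for Node.js: the emit collector and the runMap helper are hypothetical test scaffolding, and the sample order has the same shape as the document in the Map-Reduce example (minus _id and ord_date, which are not needed here).

```javascript
// Stand-in for MongoDB's emit(): records every (key, value) pair.
const emittedPairs = [];
function emit(key, value) {
  emittedPairs.push([key, value]);
}

// Run a map function the way MongoDB does: once, with `this` bound to the document.
function runMap(mapFn, doc) {
  emittedPairs.length = 0; // start each run with an empty record
  mapFn.call(doc);
  return emittedPairs.slice();
}

// The two map functions from the text, unchanged.
const mapByStatus = function () {
  if (this.status == 'A')
    emit(this.cust_id, 1);
};
const mapBySku = function () {
  this.items.forEach(function (item) { emit(item.sku, 1); });
};

const order = {
  cust_id: "abc123",
  status: "A",
  price: 25,
  items: [{ sku: "mmm", qty: 5, price: 2.5 }, { sku: "nnn", qty: 5, price: 2.5 }],
};

console.log(JSON.stringify(runMap(mapByStatus, order)));                    // [["abc123",1]]
console.log(JSON.stringify(runMap(mapByStatus, { ...order, status: "D" }))); // []
console.log(JSON.stringify(runMap(mapBySku, order)));                       // [["mmm",1],["nnn",1]]
```

The status map emits zero or one pair per document, while the items map emits one pair per array element, which is exactly the "zero or more" behavior described in Note 1.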

Note 2:

Function format:

function(key, values) {
   ...
   return result;
}

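* The reduce function must not access the database.

* It must not affect systems outside the function.

* MongoDB does not call the reduce function for a key that has only a single value. The values argument is always an array.

* MongoDB may call the reduce function more than once for the same key; in that case the previous output of reduce for that key becomes one of the input values of the next call.

* The function can access all variables defined in the scope parameter.

* The object returned by reduce must be smaller than half of the maximum BSON document size, i.e. under 8 MB (as of version 3.4).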
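Because MongoDB may call reduce more than once for the same key and feed an earlier result back in as one of the values, reduce must return the same answer however the values are batched. The sketch below, plain JavaScript for Node.js, compares a safe reduce (summing) with an unsafe one (counting array elements) for a map that emits 1 per document; both function names are hypothetical.

```javascript
// Safe: summing is associative, so re-reducing partial sums gives the same total.
function reduceSum(key, values) {
  return values.reduce((acc, v) => acc + v, 0);
}

// Unsafe: counts array elements, which breaks once a partial result
// (e.g. 2) is passed back in as a single element of `values`.
function reduceLength(key, values) {
  return values.length;
}

// Suppose map emitted the value 1 for each of four documents with key "A".
const ones = [1, 1, 1, 1];

for (const reduceFn of [reduceSum, reduceLength]) {
  const oneShot = reduceFn("A", ones);
  // Simulate reducing in two batches, then re-reducing the partial results.
  const reReduced = reduceFn("A", [reduceFn("A", ones.slice(0, 2)), reduceFn("A", ones.slice(2))]);
  console.log(reduceFn.name, oneShot, reReduced);
}
// reduceSum 4 4
// reduceLength 4 2
```

The length-based version also gets single-value keys "right" only by accident, since reduce is never called for them; summing the emitted values avoids both traps.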

Note 3:

* Write the results to a new collection:

out: <collectionName>

* Write the results to an existing collection, using an action (this option is not available on secondary members of replica sets):

out: { <action>: <collectionName>
        [, db: <dbName>]
        [, sharded: <boolean> ]
        [, nonAtomic: <boolean> ] }

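Here <action> is one of the following:

replace: replace the existing contents of the collection with the new results.

merge: merge the new results into the existing contents; if an existing document has the same key as a new result, the new result overwrites it.

reduce: merge the new results into the existing contents; if an existing document has the same key as a new result, apply the reduce function to both the new and the existing document and store the result in its place.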
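The difference between the three actions is easiest to see side by side. The following is a toy model in plain JavaScript for Node.js: the existing output collection is represented as a Map from _id to value, and applyOutAction is a hypothetical helper, not how the server implements these actions.

```javascript
// Toy model of the replace / merge / reduce actions. The existing output
// collection is represented as a Map from _id to value. Illustration only.
const OUT_ACTIONS = ["replace", "merge", "reduce"];

function applyOutAction(action, existing, newResults, reduceFn) {
  if (!OUT_ACTIONS.includes(action)) {
    throw new Error(`unknown action: ${action}`);
  }
  if (action === "replace") {
    // replace: the old contents are discarded; only the new results remain.
    return new Map(newResults.map((r) => [r._id, r.value]));
  }
  const out = new Map(existing); // copy, so the input Map is left untouched
  for (const { _id, value } of newResults) {
    if (action === "reduce" && out.has(_id)) {
      // reduce: combine the old and new values for the same key.
      out.set(_id, reduceFn(_id, [out.get(_id), value]));
    } else {
      // merge (or a brand-new key under reduce): the new value wins.
      out.set(_id, value);
    }
  }
  return out;
}

const sumValues = (key, values) => values.reduce((a, b) => a + b, 0);
const previousOut = new Map([["abc123", 25], ["old999", 7]]);
const freshResults = [{ _id: "abc123", value: 10 }, { _id: "xyz789", value: 40 }];

for (const action of OUT_ACTIONS) {
  const merged = applyOutAction(action, previousOut, freshResults, sumValues);
  console.log(action, JSON.stringify(Object.fromEntries(merged)));
}
// replace {"abc123":10,"xyz789":40}
// merge {"abc123":10,"old999":7,"xyz789":40}
// reduce {"abc123":35,"old999":7,"xyz789":40}
```

The reduce action is what makes incremental map-reduce possible: running a new batch with out: { reduce: ... } folds its totals into the ones already stored.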

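db:

Optional. The name of the output database; by default, the same database as the input collection is used.

sharded:

Optional. If true and sharding is enabled on the output database, the output collection is sharded using _id as the shard key.

nonAtomic:

Optional. Only valid with the merge and reduce actions. Defaults to false.

When false, the map-reduce operation locks the database during post-processing.

When true, the database is not locked during post-processing, but other clients can read intermediate states of the output collection.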

* Output inline: the map-reduce operation is performed in memory and the result is returned directly. This is the only out option available on secondary members of replica sets.

out: { inline: 1 }

The result returned inline must not exceed the maximum BSON document size of 16 MB.
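The rules for out in Note 3 can be summarized as a small validation function. This is a hypothetical JavaScript helper (describeOut is not a MongoDB or driver API) that only encodes the constraints stated above: a string names the output collection, { inline: 1 } returns results directly, an action document needs exactly one of replace, merge or reduce, nonAtomic: true is rejected together with replace, and only inline output is accepted on a replica-set secondary.

```javascript
// Hypothetical helper (not a MongoDB or driver API): classify an `out`
// specification and enforce the constraints listed in Note 3.
const OUT_ACTION_NAMES = ["replace", "merge", "reduce"];

function describeOut(out, { onSecondary = false } = {}) {
  if (typeof out === "string") {
    if (onSecondary) throw new Error("only inline output is allowed on a secondary");
    return { mode: "collection", collection: out };
  }
  if (out === null || typeof out !== "object") {
    throw new Error("out must be a string or a document");
  }
  if (out.inline === 1) {
    return { mode: "inline" };
  }
  const actions = OUT_ACTION_NAMES.filter((name) => name in out);
  if (actions.length !== 1) {
    throw new Error("out needs exactly one of replace, merge or reduce");
  }
  if (onSecondary) throw new Error("only inline output is allowed on a secondary");
  const action = actions[0];
  if (out.nonAtomic === true && action === "replace") {
    throw new Error("nonAtomic is only valid with merge or reduce");
  }
  return {
    mode: "action",
    action,
    collection: out[action],
    db: out.db, // undefined: use the input collection's database
    sharded: out.sharded === true,
    nonAtomic: out.nonAtomic === true, // defaults to false
  };
}

console.log(describeOut("totals").mode);                             // collection
console.log(describeOut({ inline: 1 }, { onSecondary: true }).mode); // inline
console.log(describeOut({ merge: "totals", db: "reports", nonAtomic: true }).action); // merge
```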

Map-Reduce example:

db.collection.mapReduce() is a mongo shell wrapper around the mapReduce command.

Suppose the collection contains documents like the following:

{
     _id: ObjectId("50a8240b927d5d8b5891743c"),
     cust_id: "abc123",
     ord_date: new Date("Oct 04, 2012"),
     status: 'A',
     price: 25,
     items: [ { sku: "mmm", qty: 5, price: 2.5 },
              { sku: "nnn", qty: 5, price: 2.5 } ]
}

For complete examples, see the official documentation: https://docs.mongodb.com/manual/reference/command/mapReduce/#mapreduce-reduce-cmd
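As a worked example with documents shaped like the one above, the sketch below computes, for each sku, the number of line items, the total qty, and (in finalize) the average qty. It runs the map, reduce and finalize functions through a small in-memory harness in plain JavaScript for Node.js. The harness simulateMapReduce and the second sample order are assumptions for illustration; against a real server the same three functions would be passed to db.collection.mapReduce() along with an out option, with finalize given in the options document.

```javascript
// Minimal in-memory harness (hypothetical, not MongoDB): map -> group -> reduce -> finalize.
function simulateMapReduce(docs, map, reduce, finalize) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit);
  return [...groups].map(([key, values]) => {
    const reduced = values.length === 1 ? values[0] : reduce(key, values);
    return { _id: key, value: finalize ? finalize(key, reduced) : reduced };
  });
}

// map: emit one { count, qty } value per line item, keyed by sku.
// (emit is passed in as an argument here; in MongoDB it is a global inside map.)
function mapItems(emit) {
  for (const item of this.items) {
    emit(item.sku, { count: 1, qty: item.qty });
  }
}

// reduce: returns the same shape it receives, so it can safely be re-reduced.
function reduceItems(key, values) {
  const reduced = { count: 0, qty: 0 };
  for (const v of values) {
    reduced.count += v.count;
    reduced.qty += v.qty;
  }
  return reduced;
}

// finalize: runs once per key on the fully reduced value.
function finalizeItems(key, reduced) {
  return { ...reduced, avg: reduced.qty / reduced.count };
}

const sampleOrders = [
  { cust_id: "abc123", status: "A", price: 25,
    items: [{ sku: "mmm", qty: 5, price: 2.5 }, { sku: "nnn", qty: 5, price: 2.5 }] },
  // Hypothetical second order, so that "mmm" has two values to reduce.
  { cust_id: "xyz789", status: "A", price: 10,
    items: [{ sku: "mmm", qty: 1, price: 2.5 }] },
];

console.log(JSON.stringify(simulateMapReduce(sampleOrders, mapItems, reduceItems, finalizeItems)));
// [{"_id":"mmm","value":{"count":2,"qty":6,"avg":3}},{"_id":"nnn","value":{"count":1,"qty":5,"avg":5}}]
```

The average is computed in finalize rather than in reduce because an average of averages is not the overall average; reduce keeps the re-reducible count and qty, and finalize derives avg once at the end.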

Output

The mapReduce command adds support for the bypassDocumentValidation option, which lets you bypass document validation when inserting or updating documents in a collection with validation rules.

If you set the out parameter to write the results to a collection, the mapReduce command returns a document in the following form:

{
    "result" : <string or document>,
    "timeMillis" : <int>,
    "counts" : {
        "input" : <int>,
        "emit" : <int>,
        "reduce" : <int>,
        "output" : <int>
    },
    "ok" : <int>
}

If you set the out parameter to output the results inline, the mapReduce command returns a document in the following form:

{
    "results" : [
       {
          "_id" : <key>,
          "value" : <reduced or finalizedValue for key>
       },
       ...
    ],
    "timeMillis" : <int>,
    "counts" : {
       "input" : <int>,
       "emit" : <int>,
       "reduce" : <int>,
       "output" : <int>
    },
    "ok" : <int>
}

mapReduce.result

For output sent to a collection, this value is either a string holding the collection name, if out did not specify a database name, or a document with both db and collection fields, if out specified both a database and a collection name.

mapReduce.results

For output written inline, an array of resulting documents. Each resulting document contains two fields: the _id field contains the key value, and the value field contains the reduced or finalized value for the associated key.

mapReduce.timeMillis

The command execution time in milliseconds.
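A client reading the reply therefore has to handle both shapes. The following hypothetical JavaScript helper (readMapReduceReply is an illustrative name, not a driver API) returns the inline documents when results is present, and otherwise turns result into a "db.collection" namespace, falling back to a caller-supplied default database when result is a plain string. The reply objects in the usage lines are hand-written samples, not real server output.

```javascript
// Hypothetical helper: interpret a mapReduce command reply in either form.
function readMapReduceReply(reply, defaultDb) {
  if (!reply.ok) {
    throw new Error("mapReduce failed");
  }
  const elapsedMs = reply.timeMillis;
  if (Array.isArray(reply.results)) {
    // Inline output: an array of { _id, value } documents.
    return { inline: true, docs: reply.results, elapsedMs };
  }
  // Collection output: either a collection name string or { db, collection }.
  const namespace = typeof reply.result === "string"
    ? `${defaultDb}.${reply.result}`
    : `${reply.result.db}.${reply.result.collection}`;
  return { inline: false, namespace, elapsedMs };
}

console.log(readMapReduceReply({ result: "totals", timeMillis: 12, ok: 1 }, "test").namespace);
// test.totals
console.log(readMapReduceReply({ result: { db: "reports", collection: "totals" }, timeMillis: 9, ok: 1 }, "test").namespace);
// reports.totals
```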

This article was originally written in Chinese and published on aliyun.com; an English version is also available. It is provided for informational purposes only.
