Notes on Using MongoDB's MapReduce Feature

Last updated: 2018-12-05 Source: Internet



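MapReduce is a programming model for parallel computation over large data sets (larger than 1 TB). The concepts "Map" and "Reduce", and the main ideas behind them, are borrowed from functional programming languages, along with some features borrowed from vector programming languages.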

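1. Let's start with a simple example: using MongoDB's MapReduce feature to group and count.

The collection is a user-action Record collection that stores one document per user action. MapReduce is used to count the total number of actions for each user.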

{"user_id" : NumberLong(10027857), "action_type" : 9, "create_time" : NumberLong("1308330304520") }
{"user_id" : NumberLong(10027858), "action_type" : 7, "create_time" : NumberLong("1308330556146") }
{"user_id" : NumberLong(10027859), "action_type" : 5, "create_time" : NumberLong("1308330834340") }
{"user_id" : NumberLong(10027859), "action_type" : 8, "create_time" : NumberLong("1308330896718") }
{"user_id" : NumberLong(22937), "action_type" : 9, "create_time" : NumberLong("1308332535982") }
{"user_id" : NumberLong(22937), "action_type" : 8, "create_time" : NumberLong("1308332563006") }

First define the map function:
m = function(){
emit( this.user_id, {count: 1} ); // user_id is the key; count is the amount added per record visited, here 1 for each record
};

Then define the reduce function:
r = function(key, values) {
var result = {count: 0};
values.forEach(function(value) {
result.count += value.count;
});
return result;
};
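
To see what the two functions do together, here is a minimal sketch in plain JavaScript that imitates the grouping on the six sample documents above, without a MongoDB server. `runMapReduce`, `mapRecord`, `reduceCounts` and the local `emit` callback are hypothetical stand-ins for what the server does, not MongoDB APIs, and `user_id` values are written as plain numbers instead of `NumberLong`. The sketch also reduces one key in two steps: MongoDB may call the reduce function more than once for the same key, so its return value needs the same shape as the emitted value.

```javascript
// Sample documents from the Record collection (NumberLong replaced by plain numbers).
const records = [
  { user_id: 10027857, action_type: 9, create_time: 1308330304520 },
  { user_id: 10027858, action_type: 7, create_time: 1308330556146 },
  { user_id: 10027859, action_type: 5, create_time: 1308330834340 },
  { user_id: 10027859, action_type: 8, create_time: 1308330896718 },
  { user_id: 22937, action_type: 9, create_time: 1308332535982 },
  { user_id: 22937, action_type: 8, create_time: 1308332563006 }
];

// Same logic as the map function m, but emit is passed in explicitly
// (in the mongo shell, emit is provided by the server).
function mapRecord(doc, emit) {
  emit(doc.user_id, { count: 1 });
}

// Same logic as the reduce function r.
function reduceCounts(key, values) {
  const result = { count: 0 };
  values.forEach(function (value) {
    result.count += value.count;
  });
  return result;
}

// Hypothetical helper: group emitted values by key, then reduce each group.
function runMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  docs.forEach(function (doc) {
    mapFn(doc, function emit(key, value) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  });
  const output = [];
  groups.forEach(function (values, key) {
    // MongoDB does not call reduce for a key with a single value; mirror that here.
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    output.push({ _id: key, value: value });
  });
  return output;
}

const result = runMapReduce(records, mapRecord, reduceCounts);
console.log(result);

// Re-reduce check: reducing a partial result together with a new value
// gives the same total as reducing all three values at once.
const partial = reduceCounts(22937, [{ count: 1 }, { count: 1 }]);
const total = reduceCounts(22937, [partial, { count: 1 }]);
console.log(total); // { count: 3 }
```

Returning a bare number from the reduce function while emitting `{count: 1}` would break the second step, which is why the result keeps the `{count: ...}` shape.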

Run res = db.RecordModel.mapReduce(m, r, {out : {replace : 'things_reduce'}});

The results are written to the things_reduce collection.

Finally, run db.things_reduce.find(); to view the results.

Output:

> m = function(){
... emit( this.user_id, {count: 1} );
... };
function () {
emit(this.user_id, {count:1});
}
> r = function(key, values) {
... var result = {count: 0};
... values.forEach(function(value) {
... result.count += value.count;
... });
... return result;
... };
function (key, values) {
var result = {count:0};
values.forEach(function (value) {result.count += value.count;});
return result;
}
> res = db.RecordModel.mapReduce(m, r, {out : {replace : 'things_reduce'}});
{
"result" : "things_reduce",
"timeMillis" : 58032,
"counts" : {
"input" : 575113,
"emit" : 575113,
"output" : 19647
},
"ok" : 1,
}
> db.things_reduce.find();
{ "_id" : NumberLong(-10050025), "value" : { "count" : 4 } }
{ "_id" : NumberLong(1), "value" : { "count" : 15556 } }
{ "_id" : NumberLong(3), "value" : { "count" : 178 } }
{ "_id" : NumberLong(4), "value" : { "count" : 1649 } }
{ "_id" : NumberLong(5), "value" : { "count" : 422 } }
{ "_id" : NumberLong(7), "value" : { "count" : 627 } }
{ "_id" : NumberLong(9), "value" : { "count" : 125 } }
{ "_id" : NumberLong(10), "value" : { "count" : 871 } }
{ "_id" : NumberLong(72), "value" : { "count" : 12 } }
{ "_id" : NumberLong(1031), "value" : { "count" : 1 } }
{ "_id" : NumberLong(1032), "value" : { "count" : 1 } }
{ "_id" : NumberLong(1033), "value" : { "count" : 1 } }
{ "_id" : NumberLong(1034), "value" : { "count" : 2 } }
{ "_id" : NumberLong(1035), "value" : { "count" : 1 } }
{ "_id" : NumberLong(1038), "value" : { "count" : 1 } }
{ "_id" : NumberLong(1039), "value" : { "count" : 2 } }
{ "_id" : NumberLong(1041), "value" : { "count" : 19 } }
{ "_id" : NumberLong(1043), "value" : { "count" : 3 } }
{ "_id" : NumberLong(1044), "value" : { "count" : 2 } }
has more

---------------------------------------------------------------------------------------------------------------

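The run above counts over all the data. You can also count over a subset by adding a query filter to the same options object that holds out. For example, to count only the last three days of data, run:

res = db.RecordModel.mapReduce(m, r, {out : {replace : 'things_reduce'}, query : {"create_time" : {$gt : 1308332565762}}});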
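
The cutoff 1308332565762 is a hard-coded millisecond timestamp. As a rough sketch, it could be derived from the current time instead. `buildOptions` and `THREE_DAYS_MS` are hypothetical names introduced for illustration, and the sketch assumes create_time stores milliseconds since the Unix epoch, as the sample documents suggest.

```javascript
// Three days expressed in milliseconds.
const THREE_DAYS_MS = 3 * 24 * 60 * 60 * 1000;

// Hypothetical helper: build the mapReduce options object with a
// "newer than three days before nowMs" filter on create_time.
function buildOptions(nowMs) {
  return {
    out: { replace: 'things_reduce' },
    query: { create_time: { $gt: nowMs - THREE_DAYS_MS } }
  };
}

const options = buildOptions(Date.now());
console.log(JSON.stringify(options));

// In the mongo shell the object would be passed as the third argument:
// res = db.RecordModel.mapReduce(m, r, buildOptions(Date.now()));
```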

To sort the results, add sort when you query the output collection at the end.

db.things_reduce.find().sort({"value.count":-1});
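
For readers without a server at hand, the effect of the descending sort can be imitated in plain JavaScript. This is only an illustration of the ordering, using a few result documents copied from the output above with `NumberLong` written as plain numbers; it is not how MongoDB sorts internally.

```javascript
// A few result documents from things_reduce (NumberLong shown as plain numbers).
const docs = [
  { _id: -10050025, value: { count: 4 } },
  { _id: 1, value: { count: 15556 } },
  { _id: 3, value: { count: 178 } },
  { _id: 4, value: { count: 1649 } },
  { _id: 5, value: { count: 422 } },
  { _id: 7, value: { count: 627 } },
  { _id: 10, value: { count: 871 } }
];

// Descending by value.count, like sort({"value.count": -1}).
// slice() copies the array first so docs itself stays unchanged.
const sorted = docs.slice().sort(function (a, b) {
  return b.value.count - a.value.count;
});

// First three entries, comparable to chaining .limit(3) on the cursor.
const top3 = sorted.slice(0, 3).map(function (d) {
  return d._id;
});
console.log(top3); // [ 1, 4, 10 ]
```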

