MongoDB Map-Reduce: Mongo Shell and C# Versions (Part 1)

Last updated: 2018-12-05. Source: Internet


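I have been learning MongoDB recently and came across Map-Reduce. I found it quite interesting, so I am recording it here as study notes.

For what Map-Reduce does, I will simply quote the official documentation and another article I came across; both are concise and to the point.

1. Official documentation: http://docs.mongodb.org/manual/tutorial/map-reduce-examples/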

The map-reduce operation is composed of many tasks, including:

reads from the input collection,
executions of the map function,
executions of the reduce function,
writes to the output collection.

2. Another article: http://openmymind.net/2011/1/20/Understanding-Map-Reduce/

So what advantage does map reduce hold? The oft-cited benefit is that both the map and reduce operations can be distributed. So the code I've written above could be executed by multiple threads, multiple cpus, or even thousands of servers as-is. This is key when dealing with millions and billions of records, or smaller sets with more complex logic. For the rest of us though, I think the real benefit is the power of being able to write these types of transforms using actual programming languages, with variables, conditional statements, methods and so on. It is a mind shift from the traditional approach, but I do think even slightly complex queries are cleaner and easier to write with map reduce. We didn't look at it here, but you'll commonly feed the output of a reduce function into another reduce function - each function further transforming it towards the end-result.

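So, the official documentation describes which operations and steps a map-reduce run consists of, while the other article explains what advantages map-reduce has over traditional operations such as group by, along with the author's own views. That article is worth reading.

Next, I will work through an example to deepen the understanding of map-reduce. Suppose we have a table (calling it a table makes it easier to relate to traditional databases; in NoSQL it is called a Collection) with the fields _id, cusid and price (in NoSQL, each stored record is called a Document), containing the following data:

Input: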

=========================

{      cusid:1,    price:15      };
{      cusid:2,    price:30      };
{      cusid:2,    price:45      };
{      cusid:3,    price:45      };
{      cusid:4,    price:5        };
{      cusid:5,    price:65      };
{      cusid:1,    price:10      };
{      cusid:1,    price:30      };
{      cusid:5,    price:30      };
{      cusid:4,    price:100    };

=========================

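What we want, however, is the sum of price for each cusid. This could be done with group by, but the two quotations above point out the advantages of map-reduce, which become especially clear with large data sets, so we will implement it with map-reduce. The output should be as follows:

Output: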

=========================

{      cusid:1,    price:55         };
{      cusid:2,    price:75         };
{      cusid:3,    price:45         };
{      cusid:4,    price:105      };
{      cusid:5,    price:95        };

=========================

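With the basic requirements introduced, we will implement them one by one, starting with

I. Mongo Shell version:

1. First, we write the map function to process each Document (this is essentially writing a JavaScript script, though with some differences).

The red box in the screenshot marks the map function, and above it are the records I inserted. Note the emit function inside it: its job is to produce a key-value pair, here associating the price of each Document with its cusid, which is easy to follow.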
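A map function of this kind can be sketched roughly as follows. In the mongo shell, emit is supplied by MongoDB and `this` refers to the current Document; the `emit` stub and the `emitted` array below are assumptions added only so that the sketch runs as plain JavaScript outside the shell, and only three of the sample Documents are used.

```javascript
// Stand-in for the emit() that MongoDB provides inside a map function.
// (Assumption: this stub exists only so the sketch runs outside the shell.)
const emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Map function: called once per Document, with `this` bound to that Document.
// It pairs each price with its cusid.
function mapFunction() {
  emit(this.cusid, this.price);
}

// Three of the sample Documents from the Input above.
const docs = [
  { cusid: 1, price: 15 },
  { cusid: 2, price: 30 },
  { cusid: 1, price: 10 },
];

// Imitate the map phase by binding `this` to each Document in turn.
docs.forEach(function (doc) {
  mapFunction.call(doc);
});

console.log(emitted);
```

Each Document yields exactly one (cusid, price) pair here; grouping the pairs by key happens later, before the reduce step.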

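2. Write the corresponding reduce function. This function takes two parameters, a key and values. Note that it is values, not value: values is an array. In effect a group operation has already been done, so all the prices for one cusid arrive together as a cusid-prices pair.

The main job of reduceFunction is to sum the prices for each distinct cusid and return the result.

3. Run the map-reduce operation and write the result to a temporary Collection.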
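As a minimal sketch, a reduce function with this behaviour might look like the following. The name reduceFunction comes from the text; the body is an assumption, since the original screenshot is not available. It is written in plain JavaScript so that it can be tried outside the shell.

```javascript
// Reduce function: receives one key and an array of all values emitted for it,
// and returns a single value, here the sum of the prices.
function reduceFunction(key, values) {
  // In the mongo shell, Array.sum(values) is a common shorthand for this;
  // Array.prototype.reduce is used here so the sketch runs anywhere.
  return values.reduce(function (total, price) {
    return total + price;
  }, 0);
}

// cusid 1 has the prices 15, 10 and 30 in the sample data.
console.log(reduceFunction(1, [15, 10, 30])); // 55
```

The key is not used inside the body, but MongoDB passes it in so that a reduce function can take the key into account when it needs to.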
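The whole run can be imitated in plain JavaScript to check the expected totals. This is only a sketch under stated assumptions: the `groups` map and the `emit` stub stand in for what MongoDB does internally, and the collection names in the comment ("orders", "order_totals") are hypothetical, not taken from the original screenshots.

```javascript
// In the mongo shell the call would look roughly like this
// (collection names "orders" and "order_totals" are hypothetical):
//   db.orders.mapReduce(mapFunction, reduceFunction, { out: "order_totals" });
//   db.order_totals.find();

// Stand-in for MongoDB's internal grouping: key -> array of emitted values.
const groups = new Map();
function emit(key, value) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}

function mapFunction() {
  emit(this.cusid, this.price);
}

function reduceFunction(key, values) {
  return values.reduce(function (total, price) {
    return total + price;
  }, 0);
}

// The sample Documents from the Input above.
const docs = [
  { cusid: 1, price: 15 }, { cusid: 2, price: 30 },
  { cusid: 2, price: 45 }, { cusid: 3, price: 45 },
  { cusid: 4, price: 5 },  { cusid: 5, price: 65 },
  { cusid: 1, price: 10 }, { cusid: 1, price: 30 },
  { cusid: 5, price: 30 }, { cusid: 4, price: 100 },
];

// Map phase: run the map function once per Document.
docs.forEach(function (doc) {
  mapFunction.call(doc);
});

// Reduce phase: one output Document per key, shaped like MongoDB's
// { _id: <key>, value: <result> } output. A key with a single value is
// passed through without calling reduce.
const results = Array.from(groups.keys())
  .sort(function (a, b) { return a - b; })
  .map(function (key) {
    const values = groups.get(key);
    return {
      _id: key,
      value: values.length > 1 ? reduceFunction(key, values) : values[0],
    };
  });

console.log(results);
// [ { _id: 1, value: 55 }, { _id: 2, value: 75 }, { _id: 3, value: 45 },
//   { _id: 4, value: 105 }, { _id: 5, value: 95 } ]
```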

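The part marked in red is the result we want, and it matches the Output above. One question remains: for cusid = 3, the format of the result differs from the others. My guess is that because there is only one record with cusid = 3, no group-like operation takes place; in short, reduceFunction is never executed for that key. To verify this guess, we can insert another record with cusid = 3 and see whether the result changes.

The guess turns out to be correct: the result for cusid = 3 is now consistent with the others. ~_~

To be continued, stay tuned!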
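This behaviour (reduce is not called for a key that has only one value) can be illustrated with a small plain-JavaScript imitation. The `simulate` helper is hypothetical and only mimics the grouping step, and the price 20 for the extra cusid = 3 record is a made-up value, since the original does not state which record was inserted.

```javascript
// Hypothetical helper that imitates the grouping step in plain JavaScript and
// records which keys the reduce function was actually called for.
function simulate(docs) {
  const groups = new Map();
  docs.forEach(function (doc) {
    if (!groups.has(doc.cusid)) groups.set(doc.cusid, []);
    groups.get(doc.cusid).push(doc.price);
  });

  const reducedKeys = [];
  function reduceFunction(key, values) {
    reducedKeys.push(key);
    return values.reduce(function (total, price) {
      return total + price;
    }, 0);
  }

  const totals = {};
  groups.forEach(function (values, key) {
    // A key with a single value is passed through without calling reduce.
    totals[key] = values.length > 1 ? reduceFunction(key, values) : values[0];
  });
  return { totals: totals, reducedKeys: reducedKeys };
}

// Before: cusid 3 has a single record.
const before = simulate([
  { cusid: 2, price: 30 },
  { cusid: 2, price: 45 },
  { cusid: 3, price: 45 },
]);

// After: a second cusid = 3 record is added (price 20 is a made-up value).
const after = simulate([
  { cusid: 2, price: 30 },
  { cusid: 2, price: 45 },
  { cusid: 3, price: 45 },
  { cusid: 3, price: 20 },
]);

console.log(before.reducedKeys); // [ 2 ]
console.log(after.reducedKeys);  // [ 2, 3 ]
console.log(after.totals[3]);    // 65
```

One practical consequence is that the value returned by reduce should have the same shape as the values emitted by map; otherwise single-record keys end up looking different from the rest in the output.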

This article was originally written in Chinese and published on aliyun.com; an English version is also provided, for informational purposes only.
