Notes on Using MongoDB's mapReduce Function

Source: Internet
Author: User

MapReduce is a programming model for parallel processing of large datasets (often larger than 1 TB). Its core concepts, map and reduce, are borrowed from functional programming languages, and it also borrows some features from vector programming languages.
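To make the functional-programming roots concrete, here is a small illustration using JavaScript's built-in Array.prototype.map and Array.prototype.reduce. It runs outside MongoDB and only shows the two ideas in isolation: map transforms each element independently, and reduce folds the mapped values into one result.

```javascript
// Illustration only: the functional-programming ideas behind MapReduce.
// action_type values taken from the sample records below.
const actions = [9, 7, 5, 8, 9, 8];

// map: transform each element independently (every action counts as 1)
const ones = actions.map(() => 1);

// reduce: fold the mapped values into a single total
const totalActions = ones.reduce((acc, n) => acc + n, 0);

console.log(totalActions); // 6
```

Because each map call is independent and the addition in reduce is associative, the work can be split across machines and the partial totals combined later, which is what makes the model suitable for parallel processing.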

1. A simple example: using MongoDB's mapReduce function to compute grouped statistics.

Collection structure: a user-behavior log in which each document records one user action. The goal is to use mapReduce to count the total number of actions per user.

{"user_id": NumberLong(10027857), "action_type": 9, "create_time": NumberLong("1308330304520")}
{"user_id": NumberLong(10027858), "action_type": 7, "create_time": NumberLong("1308330556146")}
{"user_id": NumberLong(10027859), "action_type": 5, "create_time": NumberLong("1308330834340")}
{"user_id": NumberLong(10027859), "action_type": 8, "create_time": NumberLong("1308330896718")}
{"user_id": NumberLong(22937), "action_type": 9, "create_time": NumberLong("1308332535982")}
{"user_id": NumberLong(22937), "action_type": 8, "create_time": NumberLong("1308332563006")}

First define the map function:
m = function () {
    // Key: user_id. Value: {count: 1}, so each document adds 1 to its user's total.
    emit(this.user_id, { count: 1 });
};
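To see what the map step produces, the sketch below runs m outside MongoDB against the sample documents. The emit function here is a hypothetical collector standing in for the emit that MongoDB provides during a mapReduce run, and the NumberLong values are replaced by plain JavaScript numbers.

```javascript
// Hypothetical stand-in for MongoDB's emit(): records each (key, value) pair.
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// The map function from above; MongoDB binds `this` to the current document.
const m = function () {
  emit(this.user_id, { count: 1 });
};

// Sample documents (plain numbers instead of NumberLong).
const docs = [
  { user_id: 10027857, action_type: 9, create_time: 1308330304520 },
  { user_id: 10027858, action_type: 7, create_time: 1308330556146 },
  { user_id: 10027859, action_type: 5, create_time: 1308330834340 },
  { user_id: 10027859, action_type: 8, create_time: 1308330896718 },
  { user_id: 22937, action_type: 9, create_time: 1308332535982 },
  { user_id: 22937, action_type: 8, create_time: 1308332563006 },
];

// Call map once per document, with `this` bound to that document.
docs.forEach((doc) => m.call(doc));

console.log(emitted.length); // 6
console.log(emitted[2]); // { key: 10027859, value: { count: 1 } }
```

Every document yields exactly one pair, and users with several actions (10027859 and 22937) simply produce several pairs with the same key; combining them is left to reduce.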

Then define the reduce function:
r = function (key, values) {
    // Sum the counts emitted (or previously reduced) for this key
    var result = { count: 0 };
    values.forEach(function (value) {
        result.count += value.count;
    });
    return result;
};
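MongoDB's documentation notes that reduce may be called more than once for the same key, with the output of an earlier call fed back in as one of the values. The function must therefore return the same shape that map emits ({count: n}). The plain-JavaScript check below, using hypothetical values for one key, shows that this r gives the same total whether it sees all values at once or re-reduces a partial result.

```javascript
// The reduce function from above.
const r = function (key, values) {
  var result = { count: 0 };
  values.forEach(function (value) {
    result.count += value.count;
  });
  return result;
};

// Hypothetical values emitted for one key.
const key = 10027859;
const values = [{ count: 1 }, { count: 1 }, { count: 1 }];

// One pass over all values.
const once = r(key, values);

// Two passes: reduce a partial batch, then re-reduce its output with the rest.
const partial = r(key, values.slice(0, 2));
const twice = r(key, [partial, values[2]]);

console.log(once.count, twice.count); // 3 3
```

Returning {count: n} rather than a bare number is what makes the second pass work: the partial result can be summed exactly like a freshly emitted value.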

Then execute:
res = db.recordmodel.mapReduce(m, r, { out: { replace: 'things_reduce' } });

The results are written to the things_reduce collection; the replace output mode overwrites any existing contents of that collection.

Finally, run db.things_reduce.find(); to view the results.
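The whole run can be imitated in memory to see how the counters in the shell's result relate: input is the number of documents passed to map, emit is the number of emit calls, and output is the number of distinct keys written to the output collection. Since m emits once per document, input and emit are equal in the execution result. The runMapReduce function and the emit hook below are hypothetical helpers for illustration, not MongoDB's implementation; the one behavior borrowed from MongoDB is that reduce is not called for a key with a single emitted value.

```javascript
// Hypothetical hook: emit() forwards to whichever collector runMapReduce installs.
let currentEmit = null;
function emit(key, value) {
  currentEmit(key, value);
}

// Hypothetical in-memory imitation of mapReduce(), for illustration only.
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> list of emitted values
  let emitCount = 0;
  currentEmit = (key, value) => {
    emitCount += 1;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };

  // Map phase: call map once per document with `this` bound to it.
  docs.forEach((doc) => map.call(doc));
  currentEmit = null;

  // Reduce phase: keys with a single emitted value skip reduce.
  const results = [];
  for (const [key, values] of groups) {
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value: value });
  }
  results.sort((a, b) => a._id - b._id); // sort by _id for readable output
  return {
    results,
    counts: { input: docs.length, emit: emitCount, output: results.length },
  };
}

const m = function () {
  emit(this.user_id, { count: 1 });
};
const r = function (key, values) {
  var result = { count: 0 };
  values.forEach(function (value) {
    result.count += value.count;
  });
  return result;
};

// Sample documents (plain numbers instead of NumberLong).
const docs = [
  { user_id: 10027857, action_type: 9, create_time: 1308330304520 },
  { user_id: 10027858, action_type: 7, create_time: 1308330556146 },
  { user_id: 10027859, action_type: 5, create_time: 1308330834340 },
  { user_id: 10027859, action_type: 8, create_time: 1308330896718 },
  { user_id: 22937, action_type: 9, create_time: 1308332535982 },
  { user_id: 22937, action_type: 8, create_time: 1308332563006 },
];

const res = runMapReduce(docs, m, r);
console.log(res.counts); // { input: 6, emit: 6, output: 4 }
res.results.forEach((d) => console.log(d._id, d.value.count));
// 22937 2
// 10027857 1
// 10027858 1
// 10027859 2
```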

Execution result:

> m = function () {
... emit(this.user_id, { count: 1 });
... };
function () {
emit(this.user_id, { count: 1 });
}
> r = function (key, values) {
... var result = { count: 0 };
... values.forEach(function (value) {
... result.count += value.count;
... });
... return result;
... };
function (key, values) {
var result = { count: 0 };
values.forEach(function (value) {result.count += value.count;});
return result;
}
> res = db.recordmodel.mapReduce(m, r, { out: { replace: 'things_reduce' } });
{
    "result" : "things_reduce",
    "timeMillis" : 58032,
    "counts" : {
        "input" : 575113,
        "emit" : 575113,
        "output" : 19647
    },
    "ok" : 1
}
> db.things_reduce.find();
{ "_id" : NumberLong(-10050025), "value" : { "count" : 4 } }
{ "_id" : NumberLong(1), "value" : { "count" : 15556 } }
{ "_id" : NumberLong(3), "value" : { "count" : 178 } }
{ "_id" : NumberLong(4), "value" : { "count" : 1649 } }
{ "_id" : NumberLong(5), "value" : { "count" : 422 } }
{ "_id" : NumberLong(7), "value" : { "count" : 627 } }
{ "_id" : NumberLong(9), "value" : { "count" : 125 } }
{ "_id" : NumberLong(10), "value" : { "count" : 871 } }
{ "_id" : NumberLong(72), "value" : { "count" : 12 } }
{ "_id" : NumberLong(1031), "value" : { "count" : 1 } }
{ "_id" : NumberLong(1032), "value" : { "count" : 1 } }
{ "_id" : NumberLong(1033), "value" : { "count" : 1 } }
{ "_id" : NumberLong(1034), "value" : { "count" : 2 } }
{ "_id" : NumberLong(1035), "value" : { "count" : 1 } }
{ "_id" : NumberLong(1038), "value" : { "count" : 1 } }
{ "_id" : NumberLong(1039), "value" : { "count" : 2 } }
{ "_id" : NumberLong(1041), "value" : { "count" : 19 } }
{ "_id" : NumberLong(1043), "value" : { "count" : 3 } }
{ "_id" : NumberLong(1044), "value" : { "count" : 2 } }
has more


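The example above computes statistics over all records. You can also restrict the computation to a subset of the data with the query option, which selects the documents passed to map. For example, to count only the records from the last three days (create_time later than a given millisecond timestamp):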

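res = db.recordmodel.mapReduce(m, r, { out: { replace: 'things_reduce' }, query: { "create_time": { $gt: 1308332565762 } } });

Note that query belongs in the same options object as out; passed as a separate argument, it would not be applied.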
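The effect of query can be sketched in plain JavaScript: the filter is applied before map runs, so excluded documents are never emitted. The cutoff below is a hypothetical value chosen so that the six sample records are split (the article's own threshold, 1308332565762, is later than every sample record and would select none of them), and the map and reduce steps are collapsed into a single per-user count for brevity.

```javascript
// Sample documents (plain numbers instead of NumberLong).
const docs = [
  { user_id: 10027857, action_type: 9, create_time: 1308330304520 },
  { user_id: 10027858, action_type: 7, create_time: 1308330556146 },
  { user_id: 10027859, action_type: 5, create_time: 1308330834340 },
  { user_id: 10027859, action_type: 8, create_time: 1308330896718 },
  { user_id: 22937, action_type: 9, create_time: 1308332535982 },
  { user_id: 22937, action_type: 8, create_time: 1308332563006 },
];

// Hypothetical cutoff for this sketch only.
const since = 1308330800000;

// Counterpart of query: { create_time: { $gt: since } }, applied BEFORE map.
const input = docs.filter((doc) => doc.create_time > since);

// Map and reduce collapsed into one pass: count documents per user_id.
const counts = input.reduce((acc, doc) => {
  acc.set(doc.user_id, (acc.get(doc.user_id) || 0) + 1);
  return acc;
}, new Map());

console.log(input.length); // 4
for (const [user, n] of counts) console.log(user, n);
// 10027859 2
// 22937 2
```

In MongoDB, an index on create_time can let the server select the matching documents without scanning the whole collection, so filtering with query is usually cheaper than filtering inside map.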

To sort the results, append sort() when querying the output collection, for example by count in descending order:

db.things_reduce.find().sort({ "value.count": -1 });
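The ordering that sort({ "value.count": -1 }) produces can be reproduced in plain JavaScript on a few of the output documents shown in the execution result above (NumberLong values written as plain numbers). This is only a client-side illustration of the ordering; in MongoDB the sort is performed by the server.

```javascript
// A few output documents from the execution result above.
const output = [
  { _id: 1, value: { count: 15556 } },
  { _id: 3, value: { count: 178 } },
  { _id: 4, value: { count: 1649 } },
  { _id: 5, value: { count: 422 } },
  { _id: 7, value: { count: 627 } },
];

// Descending by the nested value.count field; copy first so `output` is unchanged.
const sorted = [...output].sort((a, b) => b.value.count - a.value.count);

sorted.forEach((doc) => console.log(doc._id, doc.value.count));
// 1 15556
// 4 1649
// 7 627
// 5 422
// 3 178
```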

