MapReduce is a programming model for parallel operations on large-scale datasets (larger than 1 TB). The concepts of map and reduce are borrowed from functional programming languages, and some other features are borrowed from vector programming languages.
1. Let's look at a simple example that uses MongoDB's mapReduce function to compute grouped statistics.
The data is a user behavior record table in which each document records one user action. We use mapReduce to count the total number of actions performed by each user.
{"user_id": NumberLong(10027857), "action_type": 9, "create_time": NumberLong("1308330304520")}
{"user_id": NumberLong(10027858), "action_type": 7, "create_time": NumberLong("1308330556146")}
{"user_id": NumberLong(10027859), "action_type": 5, "create_time": NumberLong("1308330834340")}
{"user_id": NumberLong(10027859), "action_type": 8, "create_time": NumberLong("1308330896718")}
{"user_id": NumberLong(22937), "action_type": 9, "create_time": NumberLong("1308332535982")}
{"user_id": NumberLong(22937), "action_type": 8, "create_time": NumberLong("1308332563006")}
First define the map function:
m = function () {
    // user_id is the key; every record emits a count of 1,
    // so summing the counts gives the number of records per user.
    emit(this.user_id, {count: 1});
};
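To make the map step concrete, here is a minimal sketch in plain JavaScript that runs outside MongoDB. The emit function and the emitted Map below are stand-ins written for this illustration (they are not MongoDB's implementation), and plain numbers replace NumberLong. It shows how each record contributes one {count: 1} value under its user_id key.

```javascript
// The six sample records from the table above (plain numbers stand in for NumberLong).
const records = [
  { user_id: 10027857, action_type: 9, create_time: 1308330304520 },
  { user_id: 10027858, action_type: 7, create_time: 1308330556146 },
  { user_id: 10027859, action_type: 5, create_time: 1308330834340 },
  { user_id: 10027859, action_type: 8, create_time: 1308330896718 },
  { user_id: 22937, action_type: 9, create_time: 1308332535982 },
  { user_id: 22937, action_type: 8, create_time: 1308332563006 },
];

// Illustrative stand-in for the server: collect every emitted value under its key.
const emitted = new Map();
function emit(key, value) {
  if (!emitted.has(key)) {
    emitted.set(key, []);
  }
  emitted.get(key).push(value);
}

// The map function from the article; `this` is the record being processed.
const m = function () {
  emit(this.user_id, { count: 1 });
};

// Call the map function once per record, binding `this` to the record.
records.forEach((record) => m.call(record));

// Print each key with the list of values emitted for it.
for (const [key, values] of emitted) {
  console.log(key, JSON.stringify(values));
}
```

Users 10027859 and 22937 each appear in two records, so each of those keys ends up with two {count: 1} values for the reduce step to add together.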
Then define the reduce function:
r = function (key, values) {
    // Sum the counts emitted for this key.
    var result = {count: 0};
    values.forEach(function (value) {
        result.count += value.count;
    });
    return result;
};
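One detail worth checking: MongoDB may call the reduce function more than once for the same key and feed earlier results back in as values, which is why the reduce function returns an object of the same shape ({count: ...}) that the map function emits. The following plain JavaScript sketch, run outside MongoDB with a made-up key and made-up values, suggests that the reduce function above gives the same total whether it sees all values at once or in partial batches.

```javascript
// The reduce function from the article.
const r = function (key, values) {
  var result = { count: 0 };
  values.forEach(function (value) {
    result.count += value.count;
  });
  return result;
};

// Five emitted values for one hypothetical key (42 is an arbitrary example key).
const values = [
  { count: 1 },
  { count: 1 },
  { count: 1 },
  { count: 1 },
  { count: 1 },
];

// Case 1: reduce sees all values in a single call.
const allAtOnce = r(42, values);

// Case 2: reduce runs on two batches, then again on the partial results.
const partialA = r(42, values.slice(0, 2));
const partialB = r(42, values.slice(2));
const reReduced = r(42, [partialA, partialB]);

console.log(allAtOnce.count, reReduced.count); // prints: 5 5
```

If the reduce function returned a bare number instead of {count: ...}, the second case would break, because value.count would be undefined for the partial results.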
Execute res = db.recordmodel.mapReduce(m, r, {out: {replace: 'things_reduce'}});
The result is written to the things_reduce collection.
Finally, run db.things_reduce.find(); to view the result.
Execution result:
> m = function () {
... emit(this.user_id, {count: 1});
... };
function () {
    emit(this.user_id, {count: 1});
}
> r = function (key, values) {
... var result = {count: 0};
... values.forEach(function (value) {
... result.count += value.count;
... });
... return result;
... };
function (key, values) {
    var result = {count: 0};
    values.forEach(function (value) {result.count += value.count;});
    return result;
}
> res = db.recordmodel.mapReduce(m, r, {out: {replace: 'things_reduce'}});
{
    "result" : "things_reduce",
    "timeMillis" : 58032,
    "counts" : {
        "input" : 575113,
        "emit" : 575113,
        "output" : 19647
    },
    "ok" : 1
}
> db.things_reduce.find();
{ "_id" : NumberLong(-10050025), "value" : { "count" : 4 } }
{ "_id" : NumberLong(1), "value" : { "count" : 15556 } }
{ "_id" : NumberLong(3), "value" : { "count" : 178 } }
{ "_id" : NumberLong(4), "value" : { "count" : 1649 } }
{ "_id" : NumberLong(5), "value" : { "count" : 422 } }
{ "_id" : NumberLong(7), "value" : { "count" : 627 } }
{ "_id" : NumberLong(9), "value" : { "count" : 125 } }
{ "_id" : NumberLong(10), "value" : { "count" : 871 } }
{ "_id" : NumberLong(72), "value" : { "count" : 12 } }
{ "_id" : NumberLong(1031), "value" : { "count" : 1 } }
{ "_id" : NumberLong(1032), "value" : { "count" : 1 } }
{ "_id" : NumberLong(1033), "value" : { "count" : 1 } }
{ "_id" : NumberLong(1034), "value" : { "count" : 2 } }
{ "_id" : NumberLong(1035), "value" : { "count" : 1 } }
{ "_id" : NumberLong(1038), "value" : { "count" : 1 } }
{ "_id" : NumberLong(1039), "value" : { "count" : 2 } }
{ "_id" : NumberLong(1041), "value" : { "count" : 19 } }
{ "_id" : NumberLong(1043), "value" : { "count" : 3 } }
{ "_id" : NumberLong(1044), "value" : { "count" : 2 } }
has more
---------------------------------------------------------------------------------------------------------------
The preceding example computes statistics over all of the data. You can also compute statistics over a subset of the data. For example, the following call counts only the records from the last three days:
res = db.recordmodel.mapReduce(m, r, {out: {replace: 'things_reduce'}, query: {"create_time": {$gt: 1308332565762}}});
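The cutoff in the query is a hard-coded timestamp. As a sketch of how such a value could be derived, the plain JavaScript below assumes that create_time holds milliseconds since the Unix epoch (the sample values are consistent with that, but the article does not say so). The helper names threeDaysBefore and buildOptions are hypothetical and exist only for this illustration.

```javascript
// Three days expressed in milliseconds.
const THREE_DAYS_MS = 3 * 24 * 60 * 60 * 1000;

// Hypothetical helper: timestamp three days before `nowMillis`.
// Assumes timestamps are milliseconds since the Unix epoch.
function threeDaysBefore(nowMillis) {
  return nowMillis - THREE_DAYS_MS;
}

// Hypothetical helper: builds the options document passed as the third
// argument of mapReduce, with `out` and `query` as sibling fields.
function buildOptions(nowMillis) {
  return {
    out: { replace: "things_reduce" },
    query: { create_time: { $gt: threeDaysBefore(nowMillis) } },
  };
}

const options = buildOptions(Date.now());
console.log(JSON.stringify(options));

// In the mongo shell, the object would then be passed along as:
//   res = db.recordmodel.mapReduce(m, r, options);
```

Only documents that match query are handed to the map function, so the counts cover the filtered records alone.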
If you want the results sorted, add sort when you query the output collection:
db.things_reduce.find().sort({"value.count": -1});
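For readers unfamiliar with the -1 direction, the following plain JavaScript sketch mimics a descending sort on value.count using four of the result documents shown earlier. It runs outside MongoDB and uses Array.prototype.sort as a stand-in, with plain numbers in place of NumberLong.

```javascript
// Four documents taken from the result listing above.
const results = [
  { _id: 1, value: { count: 15556 } },
  { _id: 3, value: { count: 178 } },
  { _id: 4, value: { count: 1649 } },
  { _id: 5, value: { count: 422 } },
];

// Descending order by value.count, like sort({"value.count": -1}).
// slice() copies the array so the original order is left untouched.
const sorted = results.slice().sort((a, b) => b.value.count - a.value.count);

sorted.forEach((doc) => console.log(doc._id, doc.value.count));
// prints:
// 1 15556
// 4 1649
// 5 422
// 3 178
```

The users with the most recorded actions come first, which is usually what you want when looking for the most active users.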