MongoDB's MapReduce is roughly equivalent to GROUP BY in MySQL, which makes it easy to compute statistics in parallel on MongoDB.
A MapReduce job is built from two functions, Map and Reduce. MongoDB runs the Map function against every document in the collection; the Map function calls emit(key, value), and the emitted keys and values are passed to the Reduce function for processing. Both functions are written in JavaScript, and a MapReduce operation can be run with db.runCommand or with the mapReduce command.
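Shell syntax:
db.runCommand({
    mapreduce: <collection>,
    map: <map function>,
    reduce: <reduce function>
    [, query: <query filter>]
    [, sort: <sort specification>]
    [, limit: <number of documents>]
    [, out: <output collection>]
    [, keeptemp: <true|false>]
    [, finalize: <finalize function>]
    [, scope: <scope object>]
    [, verbose: true]
});
Parameter description: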
mapreduce: the target collection to operate on.
map: the mapping function (generates the sequence of key-value pairs that is passed to the reduce function).
reduce: the statistics (aggregation) function.
query: filter for the target documents.
sort: sort order of the target documents.
limit: maximum number of target documents.
out: the collection in which the result set is stored. If this parameter is not specified, a temporary collection is used, which is deleted automatically after the client disconnects.
keeptemp: whether to keep the temporary collection.
finalize: final processing function (post-processes the results returned by reduce before they are saved to the result collection).
scope: external variables made available to map, reduce, and finalize.
verbose: display detailed timing statistics.
Next, we prepare the data for the following example.
> db.students.insert({classid:1, age:14, name:'Tom'})
> db.students.insert({classid:1, age:12, name:'Jacky'})
> db.students.insert({classid:2, age:16, name:'Lily'})
> db.students.insert({classid:2, age:9, name:'Tony'})
> db.students.insert({classid:2, age:19, name:'Harry'})
> db.students.insert({classid:2, age:13, name:'Vincent'})
> db.students.insert({classid:1, age:14, name:'Bill'})
> db.students.insert({classid:2, age:17, name:'Bruce'})
Now we count the number of students in class 1 and in class 2.
The Map function must call emit(key, value) to return a key-value pair, and it uses this to access the document currently being processed.
The this keyword must not be left out here!
> m = function() { emit(this.classid, 1) }
function () {
    emit(this.classid, 1);
}
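To see why this matters, the map step can be imitated in plain JavaScript outside the database. The sketch below is only an illustration, not MongoDB's implementation: emit, runMap, and the emitted array are hypothetical stand-ins, and it assumes the map function is invoked once per document with this bound to that document.

```javascript
// Plain JavaScript imitation of the map step (illustration only).
// Hypothetical helpers: emitted, emit, runMap.
const emitted = [];

// Stand-in for the emit() that MongoDB provides inside a map function.
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// The map function from the article; `this` is the current document.
const m = function () { emit(this.classid, 1); };

// Call the map function once per document, binding `this` to the document.
function runMap(docs, mapFn) {
  docs.forEach(function (doc) { mapFn.call(doc); });
}

runMap([
  { classid: 1, age: 14, name: 'Tom' },
  { classid: 2, age: 16, name: 'Lily' },
  { classid: 1, age: 12, name: 'Jacky' }
], m);

// One { key, value } pair per document, in document order.
console.log(emitted);
```

Without this, the map function would have no way to reach the fields of the document it is being run against.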
The value can also be a JSON object, which allows several attributes to be emitted at once. For example:
emit(this.classid, {count: 1})
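When the emitted value is an object, the reduce function has to combine objects and should return a value of the same shape, because MongoDB may feed a reduce result back into a later reduce call for the same key. A minimal plain JavaScript sketch of such a reduce function follows; the name reduceCounts is hypothetical, and the function is called directly here rather than by the server.

```javascript
// Reduce function for values emitted as { count: 1 } objects.
// It returns an object of the same shape as its inputs.
function reduceCounts(key, values) {
  var total = { count: 0 };
  values.forEach(function (v) { total.count += v.count; });
  return total;
}

// First pass over two emitted values for class 1.
const partial = reduceCounts(1, [{ count: 1 }, { count: 1 }]);

// A later pass mixes the earlier result with a fresh emitted value.
const full = reduceCounts(1, [partial, { count: 1 }]);

console.log(partial.count, full.count);
```

Because the output has the same shape as the input, a partial result can be reduced again without any special handling.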
The Reduce function receives its input grouped by key, much like the effect of GROUP BY: the key-value pairs emitted by Map are combined into {key, [value1, value2, value3, ...]} and passed to the Reduce function.
> r = function(key, values) {
... var x = 0;
... values.forEach(function(v) { x += v });
... return x;
... }
function (key, values) {
    var x = 0;
    values.forEach(function (v) {x += v;});
    return x;
}
The Reduce function aggregates these values, and its return value can also be a JSON object.
Running the command gives the following result:
> res = db.runCommand({
... mapreduce:"students",
... map:m,
... reduce:r,
... out:"students_res"
... });
{
    "result" : "students_res",
    "timeMillis" : 1587,
    "counts" : {
        "input" : 8,
        "emit" : 8,
        "output" : 2
    },
    "ok" : 1
}
> db.students_res.find()
{ "_id" : 1, "value" : 3 }
{ "_id" : 2, "value" : 5 }
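The counts for the two classes can be reproduced without a server. The following is a rough plain JavaScript imitation of the map, group, and reduce steps, not MongoDB's actual implementation: simulateMapReduce, emit, and groups are hypothetical names, while m and r are the functions defined earlier.

```javascript
// Plain JavaScript imitation of map, group by key, and reduce (illustration only).
const students = [
  { classid: 1, age: 14, name: 'Tom' },
  { classid: 1, age: 12, name: 'Jacky' },
  { classid: 2, age: 16, name: 'Lily' },
  { classid: 2, age: 9, name: 'Tony' },
  { classid: 2, age: 19, name: 'Harry' },
  { classid: 2, age: 13, name: 'Vincent' },
  { classid: 1, age: 14, name: 'Bill' },
  { classid: 2, age: 17, name: 'Bruce' }
];

const m = function () { emit(this.classid, 1); };
const r = function (key, values) {
  var x = 0;
  values.forEach(function (v) { x += v; });
  return x;
};

let groups = new Map(); // key -> array of emitted values

// Stand-in for MongoDB's emit(): collect each value under its key.
function emit(key, value) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}

function simulateMapReduce(docs, mapFn, reduceFn) {
  groups = new Map();
  // Map: run the map function with `this` bound to each document.
  docs.forEach(function (doc) { mapFn.call(doc); });
  // Reduce: collapse each key's value list into a single value.
  const results = [];
  groups.forEach(function (values, key) {
    results.push({ _id: key, value: reduceFn(key, values) });
  });
  return results;
}

const results = simulateMapReduce(students, m, r);
console.log(JSON.stringify(results));
// prints [{"_id":1,"value":3},{"_id":2,"value":5}]
```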
The mapReduce command stores the results in the students_res collection.
With finalize, we can further process the result of reduce.
> f = function(key, value) { return {classid:key, count:value}; }
function (key, value) {
    return {classid:key, count:value};
}
Let's run the calculation again and look at the result:
> res = db.runCommand({
... mapreduce:"students",
... map:m,
... reduce:r,
... out:"students_res",
... finalize:f
... });
{
    "result" : "students_res",
    "timeMillis" : 804,
    "counts" : {
        "input" : 8,
        "emit" : 8,
        "output" : 2
    },
    "ok" : 1
}
> db.students_res.find()
{ "_id" : 1, "value" : { "classid" : 1, "count" : 3 } }
{ "_id" : 2, "value" : { "classid" : 2, "count" : 5 } }
The fields are now named "classid" and "count", which makes the result easier to read.
We can also add further options, such as a query filter.
> res = db.runCommand({
... mapreduce:"students",
... map:m,
... reduce:r,
... out:"students_res",
... finalize:f,
... query:{age:{$lt:10}}
... });
{
    "result" : "students_res",
    "timeMillis" : 358,
    "counts" : {
        "input" : 1,
        "emit" : 1,
        "output" : 1
    },
    "ok" : 1
}
> db.students_res.find();
{ "_id" : 2, "value" : { "classid" : 2, "count" : 1 } }
The data is filtered first, so only documents with age < 10 are counted. As a result, there are no statistics for class 1.
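The order of the steps (query, then map, then reduce, then finalize) can be sketched in plain JavaScript as well. This is an illustration under stated assumptions rather than MongoDB's implementation: simulate, emit, and groups are hypothetical names, and an ordinary predicate function stands in for the query document {age:{$lt:10}}.

```javascript
// Plain JavaScript imitation of the order: query, map, reduce, finalize.
const students = [
  { classid: 1, age: 14, name: 'Tom' },
  { classid: 1, age: 12, name: 'Jacky' },
  { classid: 2, age: 16, name: 'Lily' },
  { classid: 2, age: 9, name: 'Tony' },
  { classid: 2, age: 19, name: 'Harry' },
  { classid: 2, age: 13, name: 'Vincent' },
  { classid: 1, age: 14, name: 'Bill' },
  { classid: 2, age: 17, name: 'Bruce' }
];

const m = function () { emit(this.classid, 1); };
const r = function (key, values) {
  var x = 0;
  values.forEach(function (v) { x += v; });
  return x;
};
const f = function (key, value) { return { classid: key, count: value }; };

let groups = new Map(); // key -> array of emitted values

// Stand-in for MongoDB's emit(): collect each value under its key.
function emit(key, value) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}

function simulate(docs, options) {
  groups = new Map();
  // 1. query: keep only the matching documents.
  const selected = docs.filter(options.query);
  // 2. map: emit key-value pairs for the selected documents only.
  selected.forEach(function (doc) { options.map.call(doc); });
  const results = [];
  groups.forEach(function (values, key) {
    // 3. reduce: MongoDB does not call reduce for a key with a single value.
    const reduced = values.length === 1 ? values[0] : options.reduce(key, values);
    // 4. finalize: reshape the reduced value before it is stored.
    results.push({ _id: key, value: options.finalize(key, reduced) });
  });
  return results;
}

const res = simulate(students, {
  query: function (doc) { return doc.age < 10; }, // stands in for {age:{$lt:10}}
  map: m,
  reduce: r,
  finalize: f
});
console.log(JSON.stringify(res));
// prints [{"_id":2,"value":{"classid":2,"count":1}}]
```

Only Tony (age 9, class 2) passes the filter, so class 1 never reaches the map step and no entry is produced for it.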