MongoDB learning journey 12: MongoDB MapReduce

Source: Internet
Author: User

MongoDB MapReduce is comparable to GROUP BY in MySQL, which makes it easy to run statistics in parallel on MongoDB with Map/Reduce.

MapReduce requires two functions: a Map function and a Reduce function. The Map function is run for every record in the collection and calls emit(key, value); the emitted keys and values are then passed to the Reduce function for processing. Both functions can be written in JavaScript, and a MapReduce operation can be executed with db.runCommand or the mapReduce command.
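The flow described above can be sketched in plain JavaScript, outside MongoDB. This is only a simplified in-memory model: simulateMapReduce and the sample records are hypothetical names made up for illustration, and the global emit is a stand-in for the one MongoDB provides to map functions.

```javascript
// Stand-in for the emit() that MongoDB makes available to map functions.
let emit;

// Hypothetical helper (not a MongoDB API): runs map/reduce over an in-memory array.
function simulateMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> list of values emitted for that key
  emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };

  // Traverse all records; each record is bound to `this` inside map.
  docs.forEach((doc) => map.call(doc));

  // Pass each key together with its collected values to reduce.
  return Array.from(groups, ([key, values]) => ({ _id: key, value: reduce(key, values) }));
}

// Made-up sample records, only for this sketch.
const docs = [{ classid: 1 }, { classid: 2 }, { classid: 1 }];

const map = function () { emit(this.classid, 1); };
const reduce = function (key, values) {
  let sum = 0;
  values.forEach((v) => { sum += v; });
  return sum;
};

const result = simulateMapReduce(docs, map, reduce);
console.log(result);
```

With these three sample records the sketch produces one entry per key: a value of 2 for classid 1 and a value of 1 for classid 2.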

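Command syntax (shell):

db.runCommand({
    mapreduce: <collection>,
    map: <map function>,
    reduce: <reduce function>
    [, query: <filter for the target records>]
    [, sort: <sort order of the target records>]
    [, limit: <maximum number of target records>]
    [, out: <result collection>]
    [, keeptemp: <true|false>]
    [, finalize: <finalize function>]
    [, scope: <object of external variables>]
    [, verbose: true]
});

Parameter description: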
mapreduce: the target collection to operate on.
map: mapping function (generates the sequence of key-value pairs that is passed to the reduce function).
reduce: statistical function.
query: filter for the target records.
sort: sort order of the target records.
limit: maximum number of target records.
out: collection that stores the statistical result set. If this parameter is not specified, a temporary collection is used, which is deleted automatically after the client disconnects.
keeptemp: whether to keep the temporary collection.
finalize: final processing function (post-processes the results returned by reduce before they are saved to the result collection).
scope: external variables made available to map, reduce, and finalize.
verbose: whether to display detailed timing statistics.

Next we prepare the data for the following example.

> db.students.insert({classid:1, age:14, name:'Tom'})
> db.students.insert({classid:1, age:12, name:'Jacky'})
> db.students.insert({classid:2, age:16, name:'Lily'})
> db.students.insert({classid:2, age:9, name:'Tony'})
> db.students.insert({classid:2, age:19, name:'Harry'})
> db.students.insert({classid:2, age:13, name:'Vincent'})
> db.students.insert({classid:1, age:14, name:'Bill'})
> db.students.insert({classid:2, age:17, name:'Bruce'})

Now let's count the number of students in class 1 and in class 2.

The Map function must call emit(key, value) to return key-value pairs, and it uses this to access the document currently being processed.

Do not forget "this" here!

> m = function() { emit(this.classid, 1) }
function () {
    emit(this.classid, 1);
}

The value can also be passed as a JSON object, which allows several attribute values. For example:

emit(this.classid, {count: 1})
The arguments the Reduce function receives resemble the result of a group operation: the key-value pairs returned by Map are grouped by key into {key, [value1, value2, value3, ...]} and passed to the reduce function.

> r = function(key, values) {
... var x = 0;
... values.forEach(function(v) { x += v });
... return x;
... }
function (key, values) {
    var x = 0;
    values.forEach(function (v) {x += v;});
    return x;
}

The Reduce function performs the statistics operation on these values, and its return value can also be a JSON object.
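When the values are JSON objects, as in emit(this.classid, {count: 1}), it helps if reduce returns an object of the same shape. As far as the MongoDB documentation describes, reduce may be called more than once for the same key, with earlier results fed back in as values. The following plain JavaScript sketch illustrates that property; reduceCounts and the sample values are hypothetical names used only here.

```javascript
// Values as they would be emitted by emit(this.classid, {count: 1}).
const emitted = [{ count: 1 }, { count: 1 }, { count: 1 }];

// reduce returns an object with the same shape as the emitted values,
// so its own output can be passed back in as one of the values.
function reduceCounts(key, values) {
  let total = 0;
  values.forEach((v) => { total += v.count; });
  return { count: total };
}

// Reduce everything in one call ...
const allAtOnce = reduceCounts(1, emitted);

// ... or reduce a partial batch first and combine it with the rest.
const partial = reduceCounts(1, emitted.slice(0, 2));
const inTwoSteps = reduceCounts(1, [partial, emitted[2]]);

console.log(allAtOnce, inTwoSteps);
```

Both paths give a count of 3, which is what makes this reduce safe to apply in several passes.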

The result is as follows:

> res = db.runCommand({
... mapreduce:"students",
... map:m,
... reduce:r,
... out:"students_res"
... });
{
    "result" : "students_res",
    "timeMillis" : 1587,
    "counts" : {
        "input" : 8,
        "emit" : 8,
        "output" : 2
    },
    "ok" : 1
}
> db.students_res.find()
{ "_id" : 1, "value" : 3 }
{ "_id" : 2, "value" : 5 }

mapReduce stores the results in the students_res collection.

Using finalize(), we can further process the result of reduce.

> f = function(key, value) { return {classid:key, count:value}; }
function (key, value) {
    return {classid:key, count:value};
}

Let's run the calculation again and look at the returned results:

> res = db.runCommand({
... mapreduce:"students",
... map:m,
... reduce:r,
... out:"students_res",
... finalize:f
... });
{
    "result" : "students_res",
    "timeMillis" : 804,
    "counts" : {
        "input" : 8,
        "emit" : 8,
        "output" : 2
    },
    "ok" : 1
}
> db.students_res.find()
{ "_id" : 1, "value" : { "classid" : 1, "count" : 3 } }
{ "_id" : 2, "value" : { "classid" : 2, "count" : 5 } }

The value now carries the field names "classid" and "count", so the result is easier to read.
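What finalize does to each reduced result can be sketched in plain JavaScript. The reduced array below simply copies the two documents from the earlier students_res output, and applying f with map() is an assumption about the behavior made for illustration, not MongoDB's actual implementation.

```javascript
// Reduced results as stored before finalize was added (taken from the output above).
const reduced = [{ _id: 1, value: 3 }, { _id: 2, value: 5 }];

// The finalize function from the article.
const f = function (key, value) { return { classid: key, count: value }; };

// finalize is applied once per key; only the value is replaced, _id stays the key.
const finalized = reduced.map((doc) => ({ _id: doc._id, value: f(doc._id, doc.value) }));

console.log(JSON.stringify(finalized));
```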

We can also add more control details.

> res = db.runCommand({
... mapreduce:"students",
... map:m,
... reduce:r,
... out:"students_res",
... finalize:f,
... query:{age:{$lt:10}}
... });
{
    "result" : "students_res",
    "timeMillis" : 358,
    "counts" : {
        "input" : 1,
        "emit" : 1,
        "output" : 1
    },
    "ok" : 1
}
> db.students_res.find();
{ "_id" : 2, "value" : { "classid" : 2, "count" : 1 } }

We can see that the data is filtered first, so only the records with age < 10 are kept, and the statistics are computed afterwards. Because no student in class 1 is younger than 10, the result contains no entry for class 1.
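The order of operations (filter first, then map and reduce) can be reproduced with an in-memory sketch in plain JavaScript. simulateMapReduce is a hypothetical helper, the predicate stands in for the query document {age: {$lt: 10}}, the global emit stands in for MongoDB's own, and finalize is left out to keep the sketch short.

```javascript
let emit; // stand-in for the emit() MongoDB provides to map functions

// Hypothetical in-memory helper, not a MongoDB API.
function simulateMapReduce(docs, map, reduce, filter) {
  const counts = { input: 0, emit: 0, output: 0 };
  const groups = new Map(); // key -> values emitted for that key
  emit = (key, value) => {
    counts.emit += 1;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };

  // Filter first: only matching records are ever passed to map.
  docs.filter(filter).forEach((doc) => {
    counts.input += 1;
    map.call(doc);
  });

  const results = Array.from(groups, ([key, values]) => ({ _id: key, value: reduce(key, values) }));
  counts.output = results.length;
  return { results, counts };
}

// The eight student records inserted earlier in the article.
const students = [
  { classid: 1, age: 14, name: "Tom" },
  { classid: 1, age: 12, name: "Jacky" },
  { classid: 2, age: 16, name: "Lily" },
  { classid: 2, age: 9, name: "Tony" },
  { classid: 2, age: 19, name: "Harry" },
  { classid: 2, age: 13, name: "Vincent" },
  { classid: 1, age: 14, name: "Bill" },
  { classid: 2, age: 17, name: "Bruce" },
];

const m = function () { emit(this.classid, 1); };
const r = function (key, values) {
  var x = 0;
  values.forEach(function (v) { x += v; });
  return x;
};

const res = simulateMapReduce(students, m, r, (doc) => doc.age < 10);
console.log(res.counts, res.results);
```

Only Tony (age 9, class 2) passes the filter, so the sketch reports one input record, one emit, and a single output entry for class 2, matching the counts in the shell output above.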
