MongoDB database operations (5)-MapReduce (groupBy)

Source: Internet
Author: User

1. MapReduce in MongoDB plays a role similar to GROUP BY in MySQL, which makes it a convenient way to compute aggregate statistics in parallel on MongoDB. A MapReduce job requires two functions: a Map function and a Reduce function. The Map function is invoked for every document in the collection and calls emit(key, value); the emitted values are grouped by key and passed to the Reduce function for processing. Both functions are written in JavaScript, and a MapReduce operation can be run with db.runCommand or with the mapReduce shell helper.


2. Run the MapReduce Program (runCommand)

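db.runCommand({
    mapreduce: <collection>,
    map: <map function>,
    reduce: <reduce function>
    [, query: <query filter>]
    [, sort: <sort specification>]
    [, limit: <maximum number of documents>]
    [, out: <output collection name>]
    [, keeptemp: <true|false>]
    [, finalize: <finalize function>]
    [, scope: <object of external variables>]
    [, verbose: true]
});

Parameter description: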

mapreduce: the target collection to operate on.
map: mapping function (generates the sequence of key-value pairs that is passed to the reduce function).
reduce: statistical (aggregation) function.
query: filter for the target documents.
sort: sort order for the target documents.
limit: maximum number of target documents.
out: name of the collection that stores the result set. If this parameter is not specified, a temporary collection is used, which is automatically deleted after the client disconnects.
keeptemp: whether to keep the temporary collection.
finalize: final processing function (post-processes the results returned by reduce before they are saved to the result collection).
scope: external variables to make available in map, reduce, and finalize.
verbose: display detailed timing statistics.
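
To make the option list concrete, here is a sketch of a command document that combines several of the optional fields. The names mapFn, reduceFn, and increment, as well as the filter, sort, and limit values, are hypothetical and chosen only for illustration; the collection and field names follow the test data in section 3. The block only builds the document as a plain JavaScript object. In the mongo shell it would be passed to db.runCommand(command).

```javascript
// Map function. Inside MongoDB, `this` is the current document and `emit`
// is supplied by the server. `increment` is a hypothetical external
// variable that is imported through the `scope` option below.
var mapFn = function () {
  emit(this.classid, increment);
};

// Reduce function: sums the values emitted for one key.
var reduceFn = function (key, values) {
  var total = 0;
  values.forEach(function (v) { total += v; });
  return total;
};

// Command document. All values below the first three fields are
// illustrative assumptions, not settings taken from the article.
var command = {
  mapreduce: "students",          // target collection
  map: mapFn,
  reduce: reduceFn,
  query: { age: { $gte: 10 } },   // only documents with age >= 10 reach map
  sort: { classid: 1 },           // assumes an index on classid exists
  limit: 100,                     // at most 100 input documents
  out: "students_res",            // result collection
  scope: { increment: 1 },        // exposes `increment` to map/reduce/finalize
  verbose: true
};

// In the mongo shell: db.runCommand(command)
```

Sorting on the same field that is used as the emit key is the usual reason to set sort at all, since it lets documents with the same key arrive together.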

3. Map

Test data:

> db.students.insert({classid:1, age:14, name:'Tom'})
> db.students.insert({classid:1, age:12, name:'Jacky'})
> db.students.insert({classid:2, age:16, name:'Lily'})
> db.students.insert({classid:2, age:9, name:'Tony'})
> db.students.insert({classid:2, age:19, name:'Harry'})
> db.students.insert({classid:2, age:13, name:'Vincent'})
> db.students.insert({classid:1, age:14, name:'Bill'})
> db.students.insert({classid:2, age:17, name:'Bruce'})

Map function: it must call emit(key, value) to return key-value pairs, and it uses this to access the document currently being processed. The grouping is performed on the key you supply; the following example groups the data by classid. The value can also be a JSON object, which allows several attributes to be passed at once, for example: emit(this.classid, {count: 1})

m = function() { emit(this.classid, 1) }

4. Reduce

The Reduce function receives its arguments in grouped form: the key-value pairs emitted by Map are combined into {key, [value1, value2, value3, ...]} and passed to reduce. The Reduce function aggregates these values, and its return value may be a JSON object.

> r = function(key, values) {
...     var x = 0;
...     values.forEach(function(v) { x += v });
...     return x;
... }
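
The grouping step can be imitated outside MongoDB to see what reduce actually receives. The following plain JavaScript sketch is not how the server is implemented: the local emit function and the groups object are hypothetical stand-ins for what MongoDB does internally, while m, r, and the documents are the ones used in this article.

```javascript
// Same documents as the test data in section 3.
var students = [
  { classid: 1, age: 14, name: "Tom" },
  { classid: 1, age: 12, name: "Jacky" },
  { classid: 2, age: 16, name: "Lily" },
  { classid: 2, age: 9,  name: "Tony" },
  { classid: 2, age: 19, name: "Harry" },
  { classid: 2, age: 13, name: "Vincent" },
  { classid: 1, age: 14, name: "Bill" },
  { classid: 2, age: 17, name: "Bruce" }
];

// Stand-in for the server-provided emit(): collects values per key.
var groups = {};
function emit(key, value) {
  if (!groups[key]) { groups[key] = []; }
  groups[key].push(value);
}

// Map and Reduce functions as defined in the article.
var m = function () { emit(this.classid, 1); };
var r = function (key, values) {
  var x = 0;
  values.forEach(function (v) { x += v; });
  return x;
};

// Map phase: call m once per document, with the document bound to `this`.
students.forEach(function (doc) { m.call(doc); });
// groups now holds three 1s under key 1 and five 1s under key 2.

// Reduce phase: one call per key (keys are strings in this simulation).
var result = {};
Object.keys(groups).forEach(function (key) {
  result[key] = r(key, groups[key]);
});

console.log(result); // class 1 has 3 students, class 2 has 5
```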


5. Run

> res = db.runCommand({
... mapreduce:"students",
... map:m,
... reduce:r,
... out:"students_res"
... });
{
    "result" : "students_res",
    "timeMillis" : 1587,
    "counts" : {
        "input" : 8,
        "emit" : 8,
        "output" : 2
    },
    "ok" : 1
}
> db.students_res.find()
{ "_id" : 1, "value" : 3 }
{ "_id" : 2, "value" : 5 }

6. Further processing results

With finalize(), the result of reduce can be processed further. The function receives the grouping key and the aggregated value for that key.

> f = function(key, value) { return {classid:key, count:value}; }
> res = db.runCommand({
... mapreduce:"students",
... map:m,
... reduce:r,
... out:"students_res",
... finalize:f
... });
{
    "result" : "students_res",
    "timeMillis" : 804,
    "counts" : {
        "input" : 8,
        "emit" : 8,
        "output" : 2
    },
    "ok" : 1
}
> db.students_res.find()
{ "_id" : 1, "value" : { "classid" : 1, "count" : 3 } }
{ "_id" : 2, "value" : { "classid" : 2, "count" : 5 } }
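
Section 3 notes that emitted values can be JSON objects, and finalize is a natural place to derive a figure from such an object. The sketch below computes the average age per class. The names mapAvg, reduceAvg, finalizeAvg, ageSum, and avgAge are hypothetical, and the local emit function is only a stand-in so the example runs outside MongoDB. One point worth keeping in mind: MongoDB may call reduce more than once for the same key, so reduce should return an object with the same shape as the emitted values.

```javascript
// Stand-in for the server-provided emit(): collects values per key.
var emitted = {};
function emit(key, value) {
  (emitted[key] = emitted[key] || []).push(value);
}

// Map: emit an object carrying both a counter and the age.
var mapAvg = function () {
  emit(this.classid, { count: 1, ageSum: this.age });
};

// Reduce: returns the same shape that map emits, so its output can be
// fed back into reduce together with further emitted values.
var reduceAvg = function (key, values) {
  var out = { count: 0, ageSum: 0 };
  values.forEach(function (v) {
    out.count += v.count;
    out.ageSum += v.ageSum;
  });
  return out;
};

// Finalize: runs once per key on the reduced value.
var finalizeAvg = function (key, value) {
  return {
    classid: key,
    count: value.count,
    avgAge: value.ageSum / value.count
  };
};

// Same documents as the test data in section 3.
var students = [
  { classid: 1, age: 14, name: "Tom" },
  { classid: 1, age: 12, name: "Jacky" },
  { classid: 2, age: 16, name: "Lily" },
  { classid: 2, age: 9,  name: "Tony" },
  { classid: 2, age: 19, name: "Harry" },
  { classid: 2, age: 13, name: "Vincent" },
  { classid: 1, age: 14, name: "Bill" },
  { classid: 2, age: 17, name: "Bruce" }
];

// Simulated run: map every document, then reduce and finalize per key.
students.forEach(function (doc) { mapAvg.call(doc); });
var results = Object.keys(emitted).map(function (key) {
  return finalizeAvg(Number(key), reduceAvg(key, emitted[key]));
});

console.log(results);
```

In the mongo shell the three functions would be passed as map, reduce, and finalize in the same runCommand call shown above.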

7. Filtering and sorting options. The available options (query, sort, limit) are described in the parameter list above.

For example, filter by age:

> res = db.runCommand({
... mapreduce:"students",
... map:m,
... reduce:r,
... out:"students_res",
... finalize:f,
... query:{age:{$lt:10}}
... });
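
The article does not show the output of this filtered run. Because query selects the target documents before they reach the Map function, the effect can be sketched in plain JavaScript against the same test data. Array.prototype.filter is used here only as a stand-in for the {age:{$lt:10}} filter, and the variable names matched and counts are hypothetical.

```javascript
// Same documents as the test data in section 3.
var students = [
  { classid: 1, age: 14, name: "Tom" },
  { classid: 1, age: 12, name: "Jacky" },
  { classid: 2, age: 16, name: "Lily" },
  { classid: 2, age: 9,  name: "Tony" },
  { classid: 2, age: 19, name: "Harry" },
  { classid: 2, age: 13, name: "Vincent" },
  { classid: 1, age: 14, name: "Bill" },
  { classid: 2, age: 17, name: "Bruce" }
];

// Stand-in for query: { age: { $lt: 10 } }
var matched = students.filter(function (doc) { return doc.age < 10; });

// Count the matching documents per classid, as map and reduce would.
var counts = {};
matched.forEach(function (doc) {
  counts[doc.classid] = (counts[doc.classid] || 0) + 1;
});

console.log(matched.length, counts);
```

Only one student in the test data is younger than 10, so a single document in class 2 reaches the Map function and class 1 produces no result at all.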
