Documentation:
http://www.mongodb.org/display/DOCS/MapReduce
Map/reduce in MongoDB is mainly used for batch data processing and aggregation operations. It is similar in spirit to Hadoop: all input comes from a collection, and all output goes to a collection. Where you would use GROUP BY in a traditional relational database, map/reduce is often the right tool in MongoDB.
Indexes and standard queries in MongoDB are separate from map/reduce. If you have used CouchDB in the past, note that this is a major difference: indexing and basic querying in MongoDB work more like they do in MySQL.
Map/reduce is invoked through a MongoDB database command, and its output is typically written to a collection. The map and reduce functions are written in JavaScript and executed on the server. The command syntax is as follows:
db.runCommand(
  { mapreduce : <collection>,
    map : <mapfunction>,
    reduce : <reducefunction>
    [, query : <query filter object>]
    [, sort : <sorts the input objects using this key. Useful for optimization, like sorting by the emit key for fewer reduces>]
    [, limit : <number of objects to return from collection>]
    [, out : <see output options below>]
    [, keeptemp: <true|false>]
    [, finalize : <finalizefunction>]
    [, scope : <object where fields go into javascript global scope >]
    [, jsMode : true]
    [, verbose : true]
  }
);
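To make the syntax above concrete, here is a minimal sketch that assembles such a command document as a plain Python dict. The helper name build_mapreduce_command is hypothetical and not part of any driver; the collection name, functions and query are borrowed from the Python example at the end of this article. With pymongo, a document like this would typically be handed to db.command(...), but that step needs a running server and is left out here.

```python
def build_mapreduce_command(collection, map_js, reduce_js, **options):
    """Assemble a mapreduce command document (hypothetical helper, not a driver API)."""
    allowed = {"query", "sort", "limit", "out", "keeptemp",
               "finalize", "scope", "jsMode", "verbose"}
    unknown = set(options) - allowed
    if unknown:
        raise ValueError("unsupported options: %s" % sorted(unknown))
    # The command name has to be the first key; dicts keep insertion
    # order on Python 3.7 and later.
    command = {"mapreduce": collection, "map": map_js, "reduce": reduce_js}
    command.update(options)
    return command


# JavaScript source for the map and reduce functions, passed as strings.
map_js = "function () { this.tags.forEach(function (z) { emit(z, 1); }); }"
reduce_js = (
    "function (key, values) {"
    "  var total = 0;"
    "  for (var i = 0; i < values.length; i++) { total += values[i]; }"
    "  return total;"
    "}"
)

command = build_mapreduce_command(
    "things", map_js, reduce_js,
    out={"replace": "myresults"},
    query={"x": {"$lt": 3}},
)
print(list(command))  # ['mapreduce', 'map', 'reduce', 'out', 'query']
```

Rejecting unknown option names early is only a design choice of this sketch; the server performs its own validation.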
Incremental Map-Reduce
If the data set you want to process keeps growing, incremental map/reduce has a clear advantage: you do not have to aggregate over the entire data set every time you want to see the results. An incremental map/reduce operation takes the following steps:
1. First, run a job over the existing collection and output the result to a collection.
2. When more data arrives, run a second job and use the query option to select only the new documents.
3. Use the reduce output option, so that the reduce function merges the new data into the existing output collection.
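The three steps above can be sketched in plain Python, without a server. This is only an in-memory illustration: run_job, map_fn, reduce_fn and the last_x marker are hypothetical names, lists and dicts stand in for collections, and a Python predicate stands in for the query option.

```python
from collections import defaultdict


def map_fn(doc):
    # Emit one (tag, 1) pair per tag in the document.
    return [(tag, 1) for tag in doc["tags"]]


def reduce_fn(key, values):
    # Sum the counts for one key.
    return sum(values)


def run_job(docs, query, output):
    """Map the docs matching `query`, then fold the results into `output`."""
    grouped = defaultdict(list)
    for doc in docs:
        if query(doc):
            for key, value in map_fn(doc):
                grouped[key].append(value)
    for key, values in grouped.items():
        new_value = reduce_fn(key, values)
        if key in output:
            # Key already present: reduce the old and new values together,
            # which is what the "reduce" output option does.
            new_value = reduce_fn(key, [output[key], new_value])
        output[key] = new_value
    return output


docs = [
    {"x": 1, "tags": ["dog", "cat"]},
    {"x": 2, "tags": ["cat"]},
]
output = {}

# Step 1: first job over everything seen so far.
run_job(docs, lambda d: True, output)
last_x = 2  # remember how far the first job got

# Steps 2 and 3: more data arrives; process only the new documents
# and merge them into the existing output through the reduce function.
docs.append({"x": 3, "tags": ["mouse", "cat", "dog"]})
run_job(docs, lambda d: d["x"] > last_x, output)

print(sorted(output.items()))  # [('cat', 3), ('dog', 2), ('mouse', 1)]
```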
Output options
"collectionName" - By default the output will be of type "replace".
{ replace : "collectionName" } - The output will be inserted into a collection which will atomically replace any existing collection with the same name.
{ merge : "collectionName" } - This option will merge new data into the old output collection. In other words, if the same key exists in both the result set and the old collection, the new value will overwrite the old one.
{ reduce : "collectionName" } - If documents exist for a given key in the result set and in the old collection, then a reduce operation (using the specified reduce function) will be performed on the two values and the result will be written to the output collection. If a finalize function was provided, this will be run after the reduce as well.
{ inline : 1 } - With this option, no collection will be created, and the whole map-reduce operation will happen in RAM. Also, the results of the map-reduce will be returned within the result object. Note that this option is possible only when the result set fits within the 16MB limit of a single document.
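The difference between the replace, merge and reduce options is easiest to see side by side. The sketch below models an output collection as a Python dict keyed by _id; write_output is a hypothetical function, not a MongoDB or pymongo API, and the inline option is not modelled because it does not write to a collection at all.

```python
def write_output(existing, results, mode, reduce_fn=None):
    """Return the output collection contents after applying one output mode."""
    if mode == "replace":
        # The old collection is discarded entirely.
        return dict(results)
    if mode == "merge":
        # Old keys survive; keys present in both take the new value.
        merged = dict(existing)
        merged.update(results)
        return merged
    if mode == "reduce":
        # Keys present in both are combined with the reduce function.
        merged = dict(existing)
        for key, value in results.items():
            if key in merged:
                merged[key] = reduce_fn(key, [merged[key], value])
            else:
                merged[key] = value
        return merged
    raise ValueError("unknown mode: %r" % mode)


def total(key, values):
    return sum(values)


old = {"cat": 2, "dog": 1}    # output of an earlier job
new = {"cat": 1, "mouse": 1}  # result set of the current job

print(write_output(old, new, "replace"))        # {'cat': 1, 'mouse': 1}
print(write_output(old, new, "merge"))          # {'cat': 1, 'dog': 1, 'mouse': 1}
print(write_output(old, new, "reduce", total))  # {'cat': 3, 'dog': 1, 'mouse': 1}
```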
Result object
{ [results : <document_array>,]
  [result : <collection_name> | {db: <db>, collection: <collection_name>},]
  timeMillis : <job_time>,
  counts : {
    input : <number of objects scanned>,
    emit : <number of times emit was called>,
    output : <number of items in output collection>
  },
  ok : <1_if_ok>
  [, err : <errmsg_if_error>]
}
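As a rough illustration of how a client might read this result object, the sketch below checks ok, then reports where the output went and the counts. The function summarize_result is hypothetical, and the sample document is made up for the example (in particular the timeMillis value); it is not output captured from a real server.

```python
def summarize_result(result):
    """Turn a map/reduce result document into a one-line summary."""
    if not result.get("ok"):
        raise RuntimeError(result.get("err", "map/reduce failed"))
    if "results" in result:
        # Inline output: the documents are embedded in the result object.
        where = "inline (%d documents)" % len(result["results"])
    else:
        target = result["result"]
        if isinstance(target, dict):
            # Output went to a collection in another database.
            where = "%s.%s" % (target["db"], target["collection"])
        else:
            where = target
    counts = result["counts"]
    return "%s: %d in, %d emitted, %d out, %d ms" % (
        where, counts["input"], counts["emit"], counts["output"],
        result["timeMillis"],
    )


# Made-up sample result for a job that wrote to the "myresults" collection.
sample = {
    "result": "myresults",
    "timeMillis": 12,
    "counts": {"input": 4, "emit": 6, "output": 3},
    "ok": 1,
}
print(summarize_result(sample))  # myresults: 4 in, 6 emitted, 3 out, 12 ms
```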
Map Function
Inside the map function, the variable this refers to the current document. The map function calls emit(key, value) any number of times to send data to the reduce function. In most cases emit is called once per document, but in some cases, such as counting tags, a document may cause emit to be called zero, one or many times.
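The "zero, one or many emits per document" behaviour can be mimicked in Python. This is only a stand-in for the JavaScript map function: map_tags is a hypothetical name, its doc argument plays the role of this, and the local emit simply collects pairs in a list.

```python
def map_tags(doc):
    """Python stand-in for a JavaScript map function."""
    emitted = []

    def emit(key, value):
        # Collect the pair instead of sending it to a server-side reducer.
        emitted.append((key, value))

    for tag in doc["tags"]:
        emit(tag, 1)
    return emitted


print(map_tags({"x": 4, "tags": []}))       # []
print(map_tags({"x": 2, "tags": ["cat"]}))  # [('cat', 1)]
print(map_tags({"x": 3, "tags": ["mouse", "cat", "dog"]}))
# [('mouse', 1), ('cat', 1), ('dog', 1)]
```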
Reduce Function
When a map/reduce operation runs, the reduce function receives a key together with an array of the values emitted for that key by the map function, and reduces them to a single value.
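A small Python sketch of the reduce step, assuming the emitted pairs have already been produced by a map function. The names reduce_count and emitted are hypothetical, and the grouping loop stands in for work the server does internally. The last lines check a property worth keeping in mind: the server may call reduce more than once for the same key, feeding earlier results back in, so reducing partial results has to give the same answer as reducing everything at once.

```python
from collections import defaultdict


def reduce_count(key, values):
    # Collapse all values emitted for one key into a single value.
    return sum(values)


# Pairs as a map function might emit them for three tagged documents.
emitted = [("dog", 1), ("cat", 1), ("cat", 1),
           ("mouse", 1), ("cat", 1), ("dog", 1)]

# Group the values by key, then reduce each group.
grouped = defaultdict(list)
for key, value in emitted:
    grouped[key].append(value)

results = {key: reduce_count(key, values) for key, values in grouped.items()}
print(sorted(results.items()))  # [('cat', 3), ('dog', 2), ('mouse', 1)]

# Re-reducing a partial result must match reducing all values in one call.
partial = reduce_count("cat", [1, 1])
assert reduce_count("cat", [partial, 1]) == reduce_count("cat", [1, 1, 1])
```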
The following is an example of map/reduce using the Python MongoDB client (pymongo):
#!/usr/bin/env python
# coding=utf-8
# Written for Python 2 and the legacy pymongo Connection API.
from pymongo import Connection
from bson.code import Code

connection = Connection('localhost', 27017)
db = connection.map_reduce_example

# Start from an empty collection, then insert four sample documents.
db.things.remove({})
db.things.insert({"x": 1, "tags": ["dog", "cat"]})
db.things.insert({"x": 2, "tags": ["cat"]})
db.things.insert({"x": 3, "tags": ["mouse", "cat", "dog"]})
db.things.insert({"x": 4, "tags": []})

# Map: emit (tag, 1) once for every tag in the document.
mapfun = Code("function () {"
              "  this.tags.forEach(function(z) {"
              "    emit(z, 1);"
              "  });"
              "}")

# Reduce: sum the emitted counts for each tag.
reducefun = Code("function (key, values) {"
                 "  var total = 0;"
                 "  for (var i = 0; i < values.length; i++) {"
                 "    total += values[i];"
                 "  }"
                 "  return total;"
                 "}")

# Count tags over the whole collection; output goes to "myresults".
result = db.things.map_reduce(mapfun, reducefun, "myresults")
for doc in result.find():
    print doc
print "#################################################################"

# Same job, restricted by a query to documents with x < 3.
result = db.things.map_reduce(mapfun, reducefun, "myresults", query={"x": {"$lt": 3}})
for doc in result.find():
    print doc
print "#################################################################"
The execution result is as follows:
{u'_id': u'cat', u'value': 3.0}
{u'_id': u'dog', u'value': 2.0}
{u'_id': u'mouse', u'value': 1.0}
#################################################################
{u'_id': u'cat', u'value': 2.0}
{u'_id': u'dog', u'value': 1.0}
#################################################################