Original English version: http://www.mongodb.org/display/DOCS/MapReduce
Map/reduce in MongoDB is mainly used for batch processing of data and for aggregation operations. It is similar in spirit to Hadoop: all input comes from a collection and all output goes to a collection. Where you would use GROUP BY in a traditional relational database, map/reduce is often the right tool in MongoDB.
Indexing and standard queries in MongoDB are separate from map/reduce. If you have used CouchDB in the past, note that this is a big difference between the two: for basic indexing and querying, MongoDB behaves much more like MySQL.
Map/reduce is invoked through a database command, and the database typically creates a collection to hold the output. The map and reduce functions are written in JavaScript and executed on the server. The command syntax is as follows:
db.runCommand(
  { mapreduce : <collection>,
    map : <mapfunction>,
    reduce : <reducefunction>
    [, query : <query filter object>]
    [, sort : <sorts the input objects using this key. Useful for optimization, like sorting by the emit key for fewer reduces>]
    [, limit : <number of objects to return from collection>]
    [, out : <see output options below>]
    [, keeptemp : <true|false>]
    [, finalize : <finalizefunction>]
    [, scope : <object where fields go into javascript global scope>]
    [, jsMode : true]
    [, verbose : true]
  }
);
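To make the template concrete, here is a minimal Python sketch that assembles the same command as a dictionary without contacting a server. It assumes Python 3.7+, where dicts keep insertion order, so the command name stays the first key as MongoDB requires. The helper name build_mapreduce_command is hypothetical, not a driver API.

```python
# Sketch only: build a mapreduce command document as a Python dict.
# build_mapreduce_command is a hypothetical helper, not a driver API.
OPTIONAL_FIELDS = {"query", "sort", "limit", "out", "keeptemp",
                   "finalize", "scope", "jsMode", "verbose"}


def build_mapreduce_command(collection, map_js, reduce_js, **options):
    unknown = set(options) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError("unknown mapreduce options: %s" % sorted(unknown))
    # The command name must be the first key; dicts keep insertion order (3.7+).
    command = {"mapreduce": collection, "map": map_js, "reduce": reduce_js}
    command.update(options)  # optional fields follow in the order passed
    return command


cmd = build_mapreduce_command(
    "things",
    "function () { this.tags.forEach(function (z) { emit(z, 1); }); }",
    "function (key, values) {"
    " var total = 0;"
    " values.forEach(function (v) { total += v; });"
    " return total; }",
    query={"x": {"$lt": 3}},
    out="myresults",
)
print(list(cmd))  # ['mapreduce', 'map', 'reduce', 'query', 'out']
```

With PyMongo, a document shaped like cmd could then be handed to Database.command(); the map and reduce strings can also be wrapped in bson.code.Code.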
Incremental Map-Reduce
If the data set you aggregate with map/reduce keeps growing, incremental map/reduce saves you from re-aggregating the entire data set every time you want to see the results. To perform an incremental map/reduce, take the following steps:
1. First, run a map/reduce job over the existing collection and write the results to an output collection.
2. When you have more data to process, run a second job, using the query option to select only the new documents.
3. Use the reduce output option, which uses your reduce function to merge the new results into the existing output collection.
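The three steps can be illustrated with a small in-memory simulation. This is a plain-Python sketch, not MongoDB's implementation: the output "collection" is a dict keyed by the emitted key, the hypothetical run_job function stands in for one mapreduce command, and the ts field that marks documents as new is an assumption (the query that selects new data depends on your own schema).

```python
from collections import defaultdict

# In-memory simulation of incremental map/reduce; not MongoDB's implementation.


def map_tags(doc):
    for tag in doc["tags"]:
        yield tag, 1                       # like emit(tag, 1)


def reduce_counts(key, values):
    return sum(values)


def run_job(docs, query, out, mode):
    """Hypothetical stand-in for one mapreduce run; mode is 'replace' or 'reduce'."""
    grouped = defaultdict(list)
    for doc in docs:
        if query(doc):
            for key, value in map_tags(doc):
                grouped[key].append(value)
    result = {key: reduce_counts(key, vals) for key, vals in grouped.items()}
    if mode == "replace":
        out.clear()
        out.update(result)
    else:  # "reduce": keys already in the output are reduced with the new value
        for key, value in result.items():
            out[key] = reduce_counts(key, [out[key], value]) if key in out else value


things = [{"ts": 1, "tags": ["dog", "cat"]}, {"ts": 2, "tags": ["cat"]}]
totals = {}

# Step 1: a first job over the existing documents.
run_job(things, lambda doc: True, totals, mode="replace")

# Steps 2 and 3: new data arrives; a second job selects only the new
# documents (ts > 2) and uses reduce mode to fold them into the old totals.
things.append({"ts": 3, "tags": ["mouse", "cat", "dog"]})
run_job(things, lambda doc: doc["ts"] > 2, totals, mode="reduce")
print(totals)  # {'dog': 2, 'cat': 3, 'mouse': 1}
```

The final totals equal what a full recomputation over all three documents would give, but the second job only had to map the one new document.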
Output Options
- "collectionName": by default, the output is of type "replace".
- {replace: "collectionName"}: the output is inserted into a collection that atomically replaces any existing collection with the same name.
- {merge: "collectionName"}: merges the new data into the old output collection. In other words, if the same key exists in both the result set and the old collection, the new key overwrites the old one.
- {reduce: "collectionName"}: if documents exist for a given key in both the result set and the old collection, a reduce operation (using the specified reduce function) is performed on the two values and the result is written to the output collection. If a finalize function is provided, it is run after the reduce as well.
- {inline: 1}: no collection is created and the whole map-reduce operation happens in RAM; the results are returned within the result object. Note that this option is possible only when the result set fits within the 16MB limit for a single document.
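The replace, merge and reduce modes differ only in how the new result set is combined with an existing output collection. This plain-Python sketch models both as dicts mapping _id to value; apply_output is a hypothetical name, and the sketch deliberately ignores finalize, atomicity and the inline mode.

```python
def apply_output(mode, old, new, reduce_fn):
    """Combine a new result set with an existing output collection (both dicts)."""
    if mode == "replace":
        return dict(new)                  # old contents are discarded
    if mode == "merge":
        combined = dict(old)
        combined.update(new)              # the new value wins on key collisions
        return combined
    if mode == "reduce":
        combined = dict(old)
        for key, value in new.items():
            # Keys present on both sides are combined with the reduce function.
            combined[key] = reduce_fn(key, [old[key], value]) if key in old else value
        return combined
    raise ValueError("unsupported output mode: %r" % mode)


def sum_reduce(key, values):
    return sum(values)


old = {"cat": 3, "dog": 2}
new = {"cat": 1, "mouse": 1}
for mode in ("replace", "merge", "reduce"):
    print(mode, apply_output(mode, old, new, sum_reduce))
# replace {'cat': 1, 'mouse': 1}
# merge {'cat': 1, 'dog': 2, 'mouse': 1}
# reduce {'cat': 4, 'dog': 2, 'mouse': 1}
```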
Result Object
{
  [results : <document_array>,]
  [result : <collection_name> | {db: <db>, collection: <collection_name>},]
  timeMillis : <job_time>,
  counts : {
    input : <number of objects scanned>,
    emit : <number of times emit was called>,
    output : <number of items in output collection>
  },
  ok : <1_if_ok>
  [, err : <errmsg_if_error>]
}
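A client typically checks ok first, then reads the output either from results (inline mode) or from the collection named in result. The sketch below does that for a document shaped like the template above; summarize_result is a hypothetical helper, and the sample values are illustrative (the counts correspond to the four-document tag example later in this article, the timeMillis value is made up).

```python
def summarize_result(res):
    """Return (where the output went, (input, emit, output)) for a result document."""
    if res.get("ok") != 1:                     # the server may send 1.0; 1.0 == 1
        raise RuntimeError(res.get("err", "map-reduce failed"))
    if "results" in res:                       # {inline: 1} output
        target = "inline (%d documents)" % len(res["results"])
    elif isinstance(res["result"], dict):      # output written to another database
        target = "%s.%s" % (res["result"]["db"], res["result"]["collection"])
    else:                                      # output collection in the same database
        target = res["result"]
    counts = res["counts"]
    return target, (counts["input"], counts["emit"], counts["output"])


# Illustrative result document; timeMillis is a made-up value.
sample = {
    "result": "myresults",
    "timeMillis": 12,
    "counts": {"input": 4, "emit": 6, "output": 3},
    "ok": 1,
}
print(summarize_result(sample))  # ('myresults', (4, 6, 3))
```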
Map Function
Inside the map function, the variable this refers to the document currently being processed. The map function calls emit(key, value) any number of times to pass data to the reduce function. In most cases it emits once per input document, but a document may also emit several times or not at all; when counting tags, for example, a document can have one, many, or zero tags.
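In Python terms, a map function behaves like a generator that yields any number of (key, value) pairs per document. This sketch mirrors the tag-counting map function used in the example later in the article; map_tags is a hypothetical name, and the JavaScript this is replaced by an explicit doc argument.

```python
def map_tags(doc):
    """Emit (tag, 1) once per tag; a document may emit zero, one, or many pairs."""
    for tag in doc.get("tags", []):
        yield tag, 1


docs = [
    {"x": 1, "tags": ["dog", "cat"]},
    {"x": 3, "tags": ["mouse", "cat", "dog"]},
    {"x": 4, "tags": []},
]
for doc in docs:
    print(doc["x"], list(map_tags(doc)))
# 1 [('dog', 1), ('cat', 1)]
# 3 [('mouse', 1), ('cat', 1), ('dog', 1)]
# 4 []
```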
Reduce Function
When the map/reduce operation runs, the reduce function receives the values emitted by map for a given key and combines them into a single value.
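MongoDB may call reduce several times for the same key, passing the output of an earlier reduce call back in as one of the values. A reduce function therefore has to return the same kind of value that map emits, and its result must not depend on how the values were grouped. The Python sketch below (hypothetical function names, simulating a re-reduce rather than reproducing MongoDB's scheduling) contrasts a summing reducer, which passes this check, with a naive averaging reducer, which does not.

```python
def sum_counts(key, values):
    """Re-reduce safe: returns a number, the same type as each emitted value."""
    total = 0
    for v in values:
        total += v
    return total


def average(key, values):
    """NOT re-reduce safe: averaging partial averages weights them wrongly."""
    return sum(values) / len(values)


def reduce_once_and_twice(reduce_fn, key, values, split):
    once = reduce_fn(key, values)
    # Simulate a re-reduce: reduce a prefix, then feed that result back in.
    partial = reduce_fn(key, values[:split])
    twice = reduce_fn(key, [partial] + values[split:])
    return once, twice


print(reduce_once_and_twice(sum_counts, "cat", [1, 1, 1, 1], 2))  # (4, 4)
print(reduce_once_and_twice(average, "x", [2, 4, 6], 2))          # (4.0, 4.5)
```

A common fix for averages is to emit {count, sum} pairs, reduce those, and divide only in a finalize function.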
Here is a map-reduce example using the Python MongoDB driver (PyMongo):
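#!/usr/bin/env python
# coding=utf-8
# Needs PyMongo and a MongoDB server on localhost:27017. The original script
# used pymongo.Connection and Collection.map_reduce(), which current PyMongo
# no longer provides, so this version uses MongoClient and the mapReduce command.
from bson.code import Code
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client.map_reduce_example

# Reset the sample collection and insert four documents.
db.things.delete_many({})
db.things.insert_many([
    {"x": 1, "tags": ["dog", "cat"]},
    {"x": 2, "tags": ["cat"]},
    {"x": 3, "tags": ["mouse", "cat", "dog"]},
    {"x": 4, "tags": []},
])

# map: emit (tag, 1) for every tag of the current document (this).
mapfun = Code("function () {"
              "  this.tags.forEach(function (z) { emit(z, 1); });"
              "}")

# reduce: add up all the counts emitted for one tag.
reducefun = Code("function (key, values) {"
                 "  var total = 0;"
                 "  for (var i = 0; i < values.length; i++) {"
                 "    total += values[i];"
                 "  }"
                 "  return total;"
                 "}")

# Job 1: count tags across the whole collection (out replaces "myresults").
db.command("mapReduce", "things", map=mapfun, reduce=reducefun, out="myresults")
for doc in db.myresults.find().sort("_id", 1):
    print(doc)
print("#################################################################")

# Job 2: the same job, restricted by the query option to documents with x < 3.
db.command("mapReduce", "things", map=mapfun, reduce=reducefun,
           out="myresults", query={"x": {"$lt": 3}})
for doc in db.myresults.find().sort("_id", 1):
    print(doc)
print("#################################################################")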
Running the script produces the following output (the u'' string prefixes come from Python 2):
{u'_id': u'cat', u'value': 3.0}
{u'_id': u'dog', u'value': 2.0}
{u'_id': u'mouse', u'value': 1.0}
#################################################################
{u'_id': u'cat', u'value': 2.0}
{u'_id': u'dog', u'value': 1.0}
#################################################################
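To check these numbers without a server, the same job can be emulated in plain Python: filter with the query, run map on each document, group the emitted values by key, and reduce each group. This is an illustrative sketch of the semantics, not how MongoDB executes the job; the function names are hypothetical, and the query is written as a Python predicate rather than a MongoDB query document.

```python
from collections import defaultdict

things = [
    {"x": 1, "tags": ["dog", "cat"]},
    {"x": 2, "tags": ["cat"]},
    {"x": 3, "tags": ["mouse", "cat", "dog"]},
    {"x": 4, "tags": []},
]


def map_reduce(docs, map_fn, reduce_fn, query=lambda doc: True):
    groups = defaultdict(list)
    for doc in docs:
        if query(doc):                       # the optional query filter
            for key, value in map_fn(doc):   # map: emit (key, value) pairs
                groups[key].append(value)
    # Reduce each key's values; like MongoDB, skip reduce for single-value keys.
    return [{"_id": k, "value": vs[0] if len(vs) == 1 else reduce_fn(k, vs)}
            for k, vs in sorted(groups.items())]


def map_tags(doc):
    for tag in doc["tags"]:
        yield tag, 1.0                       # JavaScript numbers are doubles


def sum_values(key, values):
    return sum(values)


all_tags = map_reduce(things, map_tags, sum_values)
early_tags = map_reduce(things, map_tags, sum_values, query=lambda d: d["x"] < 3)
for doc in all_tags + early_tags:
    print(doc)
```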