MongoDB Learning Notes (5): MapReduce

Source: Internet
Author: User
Tags emit mongodb client couchdb

Original English: http://www.mongodb.org/display/DOCS/MapReduce

In MongoDB, MapReduce is used mainly for batch data processing and aggregation. It is similar in spirit to Hadoop: all input comes from one collection and all output goes to a collection. It plays much the same role as a GROUP BY aggregation in a traditional relational database, which makes it a very useful tool in MongoDB.

Indexing and standard queries in MongoDB are separate from map/reduce. If you have used CouchDB in the past, note that this is a major difference between the two: indexes and queries in MongoDB work more like indexes and queries in MySQL.

Map/reduce is invoked through a database command. The database typically creates a collection to hold the output of the operation. The map and reduce functions are written in JavaScript and executed on the server. The command syntax is as follows:

db.runCommand(
 { mapreduce : <collection>,
   map : <mapfunction>,
   reduce : <reducefunction>
   [, query : <query filter object>]
   [, sort : <sorts the input objects using this key. Useful for optimization, like sorting by the emit key for fewer reduces>]
   [, limit : <number of objects to return from collection>]
   [, out : <see output options below>]
   [, keeptemp : <true|false>]
   [, finalize : <finalizefunction>]
   [, scope : <object where fields go into javascript global scope>]
   [, jsMode : true]
   [, verbose : true]
 }
);
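
From a Python client, the same command can be assembled as an ordinary dictionary before it is sent to the server. The following is only a sketch: `build_map_reduce_command` is a hypothetical helper, not part of PyMongo, and the JavaScript bodies are passed as plain strings purely for illustration (with PyMongo they would normally be wrapped in `bson.code.Code`). Nothing here contacts a server.

```python
def build_map_reduce_command(collection, map_js, reduce_js, **options):
    """Assemble a mapreduce command document (hypothetical helper)."""
    # Optional keys, taken from the command syntax above.
    allowed = ("query", "sort", "limit", "out", "keeptemp",
               "finalize", "scope", "jsMode", "verbose")
    unknown = set(options) - set(allowed)
    if unknown:
        raise ValueError("unsupported option(s): %s" % ", ".join(sorted(unknown)))

    # The command name has to be the first key; plain dicts keep
    # insertion order on Python 3.7 and later.
    command = {"mapreduce": collection, "map": map_js, "reduce": reduce_js}
    for name in allowed:
        if name in options:
            command[name] = options[name]
    return command


command = build_map_reduce_command(
    "things",
    "function () { this.tags.forEach(function (z) { emit(z, 1); }); }",
    "function (key, values) { var t = 0; values.forEach(function (v) { t += v; }); return t; }",
    query={"x": {"$lt": 3}},
    out={"inline": 1},
)
print(command)
```

Rejecting unknown option names early is a design choice of this sketch; it catches typos such as `Query` or `jsmode` before the command reaches the server.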

Incremental map-reduce

If the data set you are aggregating keeps growing, incremental map/reduce has a clear advantage: you do not have to aggregate over the entire data set every time you want to see the results. An incremental map/reduce takes the following steps:

1. First, run a job over the existing collection and write the result to an output collection.

2. When there is more data to process, run a second job, using the query option to select only the new documents.

3. Use the reduce output option, which uses your reduce function to merge the new data into the existing output collection.
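
The three steps above can be sketched in plain Python without a server. This is a simulation under stated assumptions: `map_tags`, `reduce_sum` and `run_job` are hypothetical stand-ins for the JavaScript map and reduce functions and for the server-side job, and the split at `x <= 2` is an arbitrary way of marking which documents count as "new".

```python
from collections import defaultdict


def map_tags(doc):
    """Map step: yield one (tag, 1) pair per tag, like emit(z, 1)."""
    for tag in doc["tags"]:
        yield tag, 1


def reduce_sum(key, values):
    """Reduce step: collapse all values for one key into a single total."""
    return sum(values)


def run_job(docs, existing=None):
    """Map and reduce docs, folding into `existing` like out: {reduce: ...}."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_tags(doc):
            grouped[key].append(value)
    output = dict(existing or {})
    for key, values in grouped.items():
        new_value = reduce_sum(key, values)
        if key in output:
            # Key already present in the output: reduce old and new together.
            new_value = reduce_sum(key, [output[key], new_value])
        output[key] = new_value
    return output


things = [
    {"x": 1, "tags": ["dog", "cat"]},
    {"x": 2, "tags": ["cat"]},
    {"x": 3, "tags": ["mouse", "cat", "dog"]},
    {"x": 4, "tags": []},
]

# Step 1: first job over the data available so far.
first = run_job([d for d in things if d["x"] <= 2])
# Steps 2 and 3: filter to the new documents only, then reduce the
# result into the existing output.
incremental = run_job([d for d in things if d["x"] > 2], existing=first)
# Reference: one job over the whole data set.
full = run_job(things)
print(first, incremental, full)
```

The incremental result matches the full run only because summing is safe to apply to its own output; a reduce function without that property would not be suitable for this pattern.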

Output options

    "collectionName" - by default the output will be of type "replace".
    {replace: "collectionName"} - the output is inserted into a collection which will atomically replace any existing collection with the same name.
    {merge: "collectionName"} - this option merges new data into the old output collection. In other words, if the same key exists in both the result set and the old collection, the new key overwrites the old one.
    {reduce: "collectionName"} - if documents exist for a given key in both the result set and the old collection, a reduce operation (using the specified reduce function) is performed on the two values and the result is written to the output collection. If a finalize function was provided, it is run after the reduce as well.
    {inline: 1} - with this option no collection is created, and the whole map-reduce operation happens in RAM. The results of the map-reduce are returned within the result object. Note that this option is possible only when the result set fits within the 16MB limit of a single document.
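
The difference between replace, merge and reduce is easiest to see on a small case. The sketch below models the output collection as a Python dict keyed by `_id`; `apply_output` is a hypothetical helper that only imitates the semantics described above and does not call MongoDB. The inline mode is left out because it stores nothing.

```python
def apply_output(mode, old, new, reduce_fn=None):
    """Return the output collection contents after a job, for one `out` mode."""
    if mode == "replace":
        # The old collection is discarded entirely.
        return dict(new)
    if mode == "merge":
        merged = dict(old)
        merged.update(new)  # on a shared key the new value overwrites the old
        return merged
    if mode == "reduce":
        result = dict(old)
        for key, value in new.items():
            if key in result:
                # Shared key: reduce the old and the new value together.
                result[key] = reduce_fn(key, [result[key], value])
            else:
                result[key] = value
        return result
    raise ValueError("unknown mode: %r" % mode)


def reduce_sum(key, values):
    return sum(values)


old = {"cat": 2, "dog": 1}    # output of an earlier job
new = {"cat": 1, "mouse": 1}  # result set of the current job

replaced = apply_output("replace", old, new)
merged = apply_output("merge", old, new)
reduced = apply_output("reduce", old, new, reduce_sum)
print(replaced, merged, reduced)
```

Only the reduce mode keeps the earlier count for "cat" and adds to it, which is why it is the one used for incremental jobs.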

Result object

{
  [results : <document_array>,]
  [result : <collection_name> | {db: <db>, collection: <collection_name>},]
  timeMillis : <job_time>,
  counts : {
       input : <number of objects scanned>,
       emit : <number of times emit was called>,
       output : <number of items in output collection>
  },
  ok : <1_if_ok>
  [, err : <errmsg_if_error>]
}
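
A client usually checks `ok` first and then reads the counts. The following sketch shows one way to do that on a result document that has already been received as a Python dict. `summarize_result` is a hypothetical helper, and the sample document is made up for illustration: its counts correspond to a four-document collection emitting six tag pairs, and the `timeMillis` value is arbitrary.

```python
def summarize_result(result):
    """Turn a map-reduce result document into a one-line summary."""
    if not result.get("ok"):
        raise RuntimeError(result.get("err", "map-reduce failed"))
    if "results" in result:
        # Inline output: the documents are in the result itself.
        target = "inline"
    elif isinstance(result["result"], dict):
        # Output written to a collection in another database.
        target = "%s.%s" % (result["result"]["db"], result["result"]["collection"])
    else:
        target = result["result"]
    counts = result["counts"]
    return "%s: %d in, %d emitted, %d out, %d ms" % (
        target, counts["input"], counts["emit"], counts["output"],
        result["timeMillis"])


# Illustrative document only, not captured from a real server.
sample = {
    "result": "myresults",
    "timeMillis": 12,
    "counts": {"input": 4, "emit": 6, "output": 3},
    "ok": 1,
}
print(summarize_result(sample))
```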

Map function

Inside the map function, the variable this refers to the current document. The map function calls emit(key, value) any number of times to feed data to the reduce function. In most cases it emits once per document, but in some cases, such as counting tags, a single document may emit several times.
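
To make the emit behaviour concrete, here is a small Python imitation of the map phase. It is a sketch only: `run_map` and `map_tags` are hypothetical names, the document is passed as an explicit `this` argument because Python has no implicit one, and `emit` is a local callback rather than the server's built-in.

```python
def run_map(docs, map_fn):
    """Call map_fn once per document and collect everything it emits."""
    emitted = []

    def emit(key, value):
        emitted.append((key, value))

    for doc in docs:
        map_fn(doc, emit)
    return emitted


def map_tags(this, emit):
    """Emit (tag, 1) once for every tag on the current document."""
    for tag in this["tags"]:
        emit(tag, 1)


things = [
    {"x": 1, "tags": ["dog", "cat"]},
    {"x": 2, "tags": ["cat"]},
    {"x": 3, "tags": ["mouse", "cat", "dog"]},
    {"x": 4, "tags": []},
]

pairs = run_map(things, map_tags)
print(pairs)
```

The third document emits three times and the fourth does not emit at all, so the number of emits is independent of the number of input documents.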

Reduce function

When a map/reduce operation runs, the reduce function receives a key together with the values emitted for that key by the map function, and reduces them to a single value.
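
One property worth checking in any reduce function is that it can be applied to its own output, because MongoDB may call reduce more than once for the same key and pass earlier results back in as values. The Python sketch below illustrates this with a hypothetical `reduce_sum` that mirrors a summing reduce; it runs locally and does not involve the server.

```python
def reduce_sum(key, values):
    """Add up all values emitted for one key."""
    total = 0
    for value in values:
        total += value
    return total


values = [1, 1, 1]  # e.g. three emits for the key "cat"

# Reducing everything in one call ...
all_at_once = reduce_sum("cat", values)

# ... gives the same answer as reducing a part first and then
# reducing that partial result together with the remaining value.
partial = reduce_sum("cat", values[:2])
in_two_steps = reduce_sum("cat", [partial, values[2]])

print(all_at_once, in_two_steps)
```

A reduce that returned, say, the number of values it received would fail this check, since the second call would count the partial result as a single value.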


Here is a map-reduce example using the Python MongoDB client (PyMongo):

#!/usr/bin/env python
# coding=utf-8
# Written for Python 2 and the legacy PyMongo Connection class
# (later replaced by MongoClient).
from pymongo import Connection
from bson.code import Code

connection = Connection('localhost', 27017)
db = connection.map_reduce_example

# Start from an empty collection, then insert four tagged documents.
db.things.remove({})
db.things.insert({"x": 1, "tags": ["dog", "cat"]})
db.things.insert({"x": 2, "tags": ["cat"]})
db.things.insert({"x": 3, "tags": ["mouse", "cat", "dog"]})
db.things.insert({"x": 4, "tags": []})

# Map: emit (tag, 1) for every tag on the current document.
mapfun = Code("function () {"
              "  this.tags.forEach(function (z) {"
              "    emit(z, 1);"
              "  });"
              "}")

# Reduce: sum the values emitted for one tag.
reducefun = Code("function (key, values) {"
                 "  var total = 0;"
                 "  for (var i = 0; i < values.length; i++) {"
                 "    total += values[i];"
                 "  }"
                 "  return total;"
                 "}")

# Count tags over the whole collection; output goes to "myresults".
result = db.things.map_reduce(mapfun, reducefun, "myresults")
for doc in result.find():
    print doc

print "#################################################################"

# Same job, restricted by a query to documents with x < 3.
result = db.things.map_reduce(mapfun, reducefun, "myresults", query={"x": {"$lt": 3}})
for doc in result.find():
    print doc

print "#################################################################"

 
Running it produces the following output:

{u'_id': u'cat', u'value': 3.0}
{u'_id': u'dog', u'value': 2.0}
{u'_id': u'mouse', u'value': 1.0}
#################################################################
{u'_id': u'cat', u'value': 2.0}
{u'_id': u'dog', u'value': 1.0}
#################################################################
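
The numbers above can be cross-checked without a MongoDB server. The following plain-Python sketch emulates the whole job on the same four documents. Everything in it is an assumption made for illustration: `map_reduce`, `map_tags` and `reduce_sum` are hypothetical local functions, the query is a Python predicate standing in for `{"x": {"$lt": 3}}`, and the totals are converted to float only to mirror the JavaScript numbers shown in the output.

```python
from collections import defaultdict


def map_tags(doc):
    """Yield (tag, 1) for every tag, like emit(z, 1) in the map function."""
    for tag in doc["tags"]:
        yield tag, 1


def reduce_sum(key, values):
    """Sum the values emitted for one tag."""
    return sum(values)


def map_reduce(docs, map_fn, reduce_fn, query=None):
    """Filter, map, group by key and reduce; return documents sorted by _id."""
    grouped = defaultdict(list)
    for doc in docs:
        if query is not None and not query(doc):
            continue
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return [{"_id": key, "value": float(reduce_fn(key, values))}
            for key, values in sorted(grouped.items())]


things = [
    {"x": 1, "tags": ["dog", "cat"]},
    {"x": 2, "tags": ["cat"]},
    {"x": 3, "tags": ["mouse", "cat", "dog"]},
    {"x": 4, "tags": []},
]

all_tags = map_reduce(things, map_tags, reduce_sum)
small_x = map_reduce(things, map_tags, reduce_sum, query=lambda d: d["x"] < 3)

for doc in all_tags:
    print(doc)
print("#" * 65)
for doc in small_x:
    print(doc)
```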



