MongoDB Study Notes 5: MapReduce

Source: Internet
Author: User

Reference: http://www.mongodb.org/display/DOCS/MapReduce

In MongoDB, MapReduce is mainly used for batch data processing and aggregation. It is similar in spirit to Hadoop: all input comes from a collection and all output goes to a collection. In many situations where you would use GROUP BY in a traditional relational database, MapReduce is the right tool in MongoDB.
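To make the map, group-by-key and reduce flow concrete, here is a small in-memory sketch in plain Python. It does not talk to MongoDB at all; the names map_reduce, mapper, reducer and the orders data are hypothetical, chosen only to show the GROUP BY analogy.

```python
from collections import defaultdict


def map_reduce(docs, mapper, reducer):
    """Toy in-memory map/reduce (not MongoDB's implementation).

    mapper(doc) returns (key, value) pairs, the values are grouped by key,
    and reducer(key, values) folds each group into a single value.
    """
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical "orders" collection. The SQL equivalent would be:
#   SELECT customer, SUM(amount) FROM orders GROUP BY customer
orders = [
    {"customer": "alice", "amount": 10},
    {"customer": "bob", "amount": 5},
    {"customer": "alice", "amount": 7},
]

totals = map_reduce(
    orders,
    mapper=lambda doc: [(doc["customer"], doc["amount"])],
    reducer=lambda key, values: sum(values),
)
print(totals)  # {'alice': 17, 'bob': 5}
```

One simplification: this toy calls the reducer for every key, while MongoDB does not need to call reduce for a key that has only a single emitted value.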

Indexes and standard queries in MongoDB are separate from map/reduce. If you have used CouchDB in the past, note that this is a big difference: for basic querying and indexing, MongoDB is more like MySQL.

Map/reduce is invoked as a MongoDB database command, and its output is usually written to a collection. The map and reduce functions are written in JavaScript and executed on the server. The command syntax is as follows:

    db.runCommand(
      { mapreduce : <collection>,
        map : <mapfunction>,
        reduce : <reducefunction>
        [, query : <query filter object>]
        [, sort : <sorts the input objects using this key. Useful for optimization, like sorting by the emit key for fewer reduces>]
        [, limit : <number of objects to return from collection>]
        [, out : <see output options below>]
        [, keeptemp: <true|false>]
        [, finalize : <finalizefunction>]
        [, scope : <object where fields go into javascript global scope >]
        [, jsMode : true]
        [, verbose : true]
      }
    );
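From Python, the same command is just a document whose first key is the command name. The sketch below only builds that document; build_mapreduce_command is a hypothetical helper, and the assumption is that you would then pass the result to PyMongo's Database.command on a live connection (not exercised here).

```python
def build_mapreduce_command(collection, map_js, reduce_js, **options):
    """Build a mapreduce command document in the shape of the syntax above.

    The command name must be the first key; plain dicts keep insertion
    order (guaranteed since Python 3.7), and keyword arguments keep the
    order in which they were passed.
    """
    cmd = {"mapreduce": collection, "map": map_js, "reduce": reduce_js}
    # Optional fields from the syntax above: query, sort, limit, out, finalize, ...
    cmd.update(options)
    return cmd


cmd = build_mapreduce_command(
    "things",
    "function () { this.tags.forEach(function (z) { emit(z, 1); }); }",
    "function (key, values) { var t = 0; values.forEach(function (v) { t += v; }); return t; }",
    out={"replace": "myresults"},
    query={"x": {"$lt": 3}},
)
print(list(cmd))  # ['mapreduce', 'map', 'reduce', 'out', 'query']
# With a running server, something like db.command(cmd) would send it (assumption).
```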

Incremental Map-Reduce

If the data set you want to aggregate keeps growing, incremental map/reduce has an obvious advantage: you do not have to re-aggregate the entire data set every time you want to see up-to-date results. Incremental map/reduce takes the following steps:

1. Run a first map/reduce job over the existing collection and write the result to its own output collection.

2. When you have more data to process, run a second job and use the query option to filter the input so that it includes only the new documents.

3. Use the reduce output option, which uses your reduce function to merge the new results into the existing output collection.
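The three steps can be simulated in memory to see why re-reducing gives the same answer as a full recount. This is a sketch, not MongoDB code: run_job, reduce_into, tag_mapper and sum_reducer are hypothetical names, and the "ts" field is an assumed insertion stamp standing in for whatever field your query option would filter on.

```python
from collections import defaultdict


def run_job(docs, mapper, reducer, query=lambda doc: True):
    """In-memory stand-in for one map/reduce job; query plays the role of the query option."""
    groups = defaultdict(list)
    for doc in docs:
        if query(doc):
            for key, value in mapper(doc):
                groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def reduce_into(output, new_results, reducer):
    """Mimic the { reduce : ... } output option: re-reduce old and new values per key."""
    for key, value in new_results.items():
        output[key] = reducer(key, [output[key], value]) if key in output else value
    return output


def tag_mapper(doc):
    return [(tag, 1) for tag in doc["tags"]]


def sum_reducer(key, values):
    return sum(values)


# Hypothetical collection; "ts" is an assumed, ever-increasing insertion stamp.
things = [
    {"ts": 1, "tags": ["dog", "cat"]},
    {"ts": 2, "tags": ["cat"]},
]

# Step 1: first job over the existing data, written to its own output.
output = run_job(things, tag_mapper, sum_reducer)
last_ts = max(doc["ts"] for doc in things)

# Step 2: new data arrives; the second job only sees documents newer than last_ts.
things.append({"ts": 3, "tags": ["mouse", "cat", "dog"]})
new = run_job(things, tag_mapper, sum_reducer, query=lambda doc: doc["ts"] > last_ts)

# Step 3: fold the new partial counts into the existing output with reduce.
reduce_into(output, new, sum_reducer)
print(output)  # {'dog': 2, 'cat': 3, 'mouse': 1}
```

The final output equals a full recount over all three documents, which is only true because the sum reducer can safely be applied to its own earlier results.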

Output options

    "collectionName" - By default, the output is of type "replace".
    { replace : "collectionName" } - The output is inserted into a collection that atomically replaces any existing collection with the same name.
    { merge : "collectionName" } - Merges new data into the old output collection: if the same key exists in both the result set and the old collection, the new entry overwrites the old one.
    { reduce : "collectionName" } - If documents exist for a given key in both the result set and the old collection, a reduce operation (using the specified reduce function) is performed on the two values and the result is written to the output collection. If a finalize function was provided, it is run after the reduce as well.
    { inline : 1 } - No collection is created and the whole map-reduce operation happens in RAM; the results are returned within the result object. This option is possible only when the result set fits within the 16MB limit of a single document.
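The difference between replace, merge and reduce is easiest to see on two small key/value maps. The sketch below models each output collection as a Python dict; write_output is a hypothetical helper, it skips finalize, and it leaves out inline since that mode writes nothing.

```python
def write_output(old, new, mode, reducer=None):
    """Toy model of the replace / merge / reduce output options.

    old is the existing output collection and new the fresh job results,
    both as {key: value} dicts. Returns the resulting output collection.
    """
    if mode == "replace":
        return dict(new)  # the old collection is discarded entirely
    if mode == "merge":
        return {**old, **new}  # on a key clash, the new value wins
    if mode == "reduce":
        combined = dict(old)
        for key, value in new.items():
            # On a key clash, reduce the old and new values together.
            combined[key] = reducer(key, [combined[key], value]) if key in combined else value
        return combined
    raise ValueError(f"unknown output mode: {mode}")


old = {"cat": 3, "dog": 2}
new = {"cat": 2, "mouse": 1}


def add(key, values):
    return sum(values)


print(write_output(old, new, "replace"))      # {'cat': 2, 'mouse': 1}
print(write_output(old, new, "merge"))        # {'cat': 2, 'dog': 2, 'mouse': 1}
print(write_output(old, new, "reduce", add))  # {'cat': 5, 'dog': 2, 'mouse': 1}
```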
Result object
    {
      [results : <document_array>,]
      [result : <collection_name> | {db: <db>, collection: <collection_name>},]
      timeMillis : <job_time>,
      counts : {
        input : <number of objects scanned>,
        emit : <number of times emit was called>,
        output : <number of items in output collection>
      },
      ok : <1_if_ok>
      [, err : <errmsg_if_error>]
    }
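A client typically checks ok first, then looks at results (inline output) or result (the output collection). The sketch below reads a response dict of the shape above; summarize is a hypothetical helper, and the sample response is hand-written: its counts match the tag-counting example later in these notes, while the timeMillis value is invented.

```python
def summarize(result):
    """Read a map/reduce response shaped like the result object above."""
    if not result.get("ok"):
        raise RuntimeError(result.get("err", "map/reduce failed"))
    counts = result["counts"]
    if "results" in result:
        target = "inline"  # documents were returned inside the response
    else:
        target = result["result"]  # a collection name, or {db, collection}
        if isinstance(target, dict):
            target = f"{target['db']}.{target['collection']}"
    return (f"{counts['input']} scanned, {counts['emit']} emits, "
            f"{counts['output']} outputs -> {target} in {result['timeMillis']} ms")


# Hand-written sample response; timeMillis is made up for illustration.
sample = {
    "result": "myresults",
    "timeMillis": 12,
    "counts": {"input": 4, "emit": 6, "output": 3},
    "ok": 1.0,
}
print(summarize(sample))  # 4 scanned, 6 emits, 3 outputs -> myresults in 12 ms
```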

Map Function

Inside the map function, the variable this refers to the current document. The map function calls emit(key, value) any number of times to pass data to the reduce function. In most cases emit is called once per document, but a document may also emit several times or not at all; for example, when counting tags, a document may have one, many, or zero tags.
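Here is a Python transcription of the tag-counting map function used in the PyMongo example below, showing how one document can emit several times and another not at all. In MongoDB, the current document is this and emit is provided by the server; in this sketch both are passed in explicitly, and map_tags and the emitted list are hypothetical names.

```python
def map_tags(doc, emit):
    """Python version of: this.tags.forEach(function (z) { emit(z, 1); })"""
    for tag in doc["tags"]:
        emit(tag, 1)


emitted = []


def emit(key, value):
    # Collects (key, value) pairs the way the server would before reducing.
    emitted.append((key, value))


for doc in [{"x": 1, "tags": ["dog", "cat"]}, {"x": 4, "tags": []}]:
    map_tags(doc, emit)

print(emitted)  # [('dog', 1), ('cat', 1)]; the second document emitted nothing
```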

Reduce Function

When the map/reduce operation runs, the reduce function receives a key together with the array of values emitted for that key by the map function, and combines them into a single value.
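One property of reduce is worth checking early: MongoDB's documentation states that reduce may be called more than once for the same key, with the output of an earlier reduce fed back in as one of the values. The returned value must therefore have the same shape as the emitted values, and the function must give the same answer however the values are grouped. The sketch compares a summing reducer with a deliberately wrong, hypothetical reducer that counts array elements.

```python
def reduce_sum(key, values):
    """Sums the counts; returns the same kind of value it receives (a number)."""
    return sum(values)


def reduce_len(key, values):
    """Deliberately wrong: counts how many values arrived instead of summing them."""
    return len(values)


values = [1, 1, 1]
for reducer in (reduce_sum, reduce_len):
    one_pass = reducer("cat", values)
    # Simulate a staged reduce: an earlier partial result is fed back in.
    two_pass = reducer("cat", [reducer("cat", values[:2]), values[2]])
    print(reducer.__name__, one_pass, two_pass)
# reduce_sum 3 3
# reduce_len 3 2
```

Only reduce_sum survives re-reduction, which is why the example below adds up values[i] rather than using values.length.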

The following example runs map/reduce from Python using the PyMongo client:

    #!/usr/bin/env python
    # coding=utf-8
    # Legacy Python 2 / PyMongo 2.x code: Connection was removed in PyMongo 3.0
    # (replaced by MongoClient), and Collection.map_reduce was removed in PyMongo 4.0.
    from bson.code import Code
    from pymongo import Connection

    connection = Connection('localhost', 27017)
    db = connection.map_reduce_example

    # Reset the sample collection and insert four documents
    db.things.remove({})
    db.things.insert({"x": 1, "tags": ["dog", "cat"]})
    db.things.insert({"x": 2, "tags": ["cat"]})
    db.things.insert({"x": 3, "tags": ["mouse", "cat", "dog"]})
    db.things.insert({"x": 4, "tags": []})

    # map: emit (tag, 1) once for every tag of the current document ("this")
    mapfun = Code("function () {this.tags.forEach(function(z) {emit(z, 1);});}")

    # reduce: add up all the counts emitted for the same tag
    reducefun = Code("function (key, values) {"
                     "  var total = 0;"
                     "  for (var i = 0; i < values.length; i++) {"
                     "    total += values[i];"
                     "  }"
                     "  return total;"
                     "}")

    # Count every tag and write the result to the "myresults" collection
    result = db.things.map_reduce(mapfun, reducefun, "myresults")
    for doc in result.find():
        print doc
    print "#################################################################"

    # Same job, restricted to documents with x < 3
    result = db.things.map_reduce(mapfun, reducefun, "myresults", query={"x": {"$lt": 3}})
    for doc in result.find():
        print doc
    print "#################################################################"

The execution result is as follows:

    {u'_id': u'cat', u'value': 3.0}
    {u'_id': u'dog', u'value': 2.0}
    {u'_id': u'mouse', u'value': 1.0}
    #################################################################
    {u'_id': u'cat', u'value': 2.0}
    {u'_id': u'dog', u'value': 1.0}
    #################################################################
