MongoDB Aggregation Operations

Source: Internet
Author: User

I had always thought that aggregation in MongoDB simply meant the aggregation pipeline. After reading the introduction on the official website today, I understand it better.


The role of aggregation: combine multiple records and compute a single result from them through various operations. Many descriptions are available online for reference.

Aggregation methods: MongoDB provides three kinds of aggregation: the aggregation pipeline, the map-reduce function, and single purpose aggregation methods.

The aggregation pipeline is covered extensively elsewhere, so it is not repeated here.


map-reduce function:


MapReduce can do everything that count and group can do. It is easy to parallelize across multiple servers: it splits the problem, sends the parts to different machines, and lets each machine complete its own part. When all the machines have finished, their results are assembled to produce the final, complete result.
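The split, process and assemble idea can be sketched in plain JavaScript. This is a single-process illustration under simplifying assumptions: each "machine" is just a function call on one chunk of the input, the job is counting values, and the helper names (splitIntoChunks, countChunk, assemble) are hypothetical. Because partial counts for the same key can simply be added, the assembled result does not depend on how the input was split.

```javascript
// Single-process sketch of "split the problem, process parts separately, assemble".
// No real distribution or MongoDB API is involved; every "machine" is a function call.

// Hypothetical helper: split an array into `n` chunks (round-robin).
function splitIntoChunks(items, n) {
  const chunks = Array.from({ length: n }, () => []);
  items.forEach((item, i) => chunks[i % n].push(item));
  return chunks;
}

// Work done by one "machine": count occurrences of each key in its chunk.
function countChunk(chunk) {
  const partial = new Map();
  for (const key of chunk) {
    partial.set(key, (partial.get(key) || 0) + 1);
  }
  return partial;
}

// Assemble the partial results: counts for the same key are added together.
function assemble(partials) {
  const total = new Map();
  for (const partial of partials) {
    for (const [key, count] of partial) {
      total.set(key, (total.get(key) || 0) + count);
    }
  }
  return total;
}

// Illustrative input: a list of status values.
const statuses = ["A", "B", "A", "A", "C", "B", "A"];
const partials = splitIntoChunks(statuses, 3).map(countChunk); // one result per "machine"
const result = assemble(partials);
console.log(Object.fromEntries(result)); // { A: 4, B: 2, C: 1 }
```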

MapReduce works in the following steps:
1. Map: the map operation is applied to every document in the collection. For each document it either "does nothing" or "emits some keys with values".
2. Shuffle: an intermediate step that groups the emitted pairs by key, collecting the values for each key into a list.
3. Reduce: the list of values for each key is reduced to a single value. That value is returned and shuffled again, and this repeats until the list for every key contains only one value. That value is the final result.
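A minimal in-memory sketch of the three steps, assuming a hypothetical runMapReduce helper (it is not a MongoDB API). It mimics MongoDB's conventions that map refers to the current document as this and calls a global emit(key, value); the sample documents are illustrative. For brevity it calls reduce once per key, which is enough here because each reduce call already returns a single value.

```javascript
// Stand-in for MongoDB's global emit(); runMapReduce points it at the current collector.
let emit = null;

// Hypothetical helper, not a MongoDB API: runs map, shuffle and reduce in memory.
function runMapReduce(docs, map, reduce) {
  // Map + shuffle: call map once per document and group emitted values by key.
  const groups = new Map(); // key -> list of emitted values
  emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) {
    map.call(doc); // inside map, `this` is the current document
  }

  // Reduce: collapse each key's list to one value (skipped when there is only one value).
  const results = [];
  for (const [key, values] of groups) {
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value });
  }
  return results;
}

// Illustrative input documents.
const docs = [
  { cust_id: "abc123", status: "A" },
  { cust_id: "abc123", status: "A" },
  { cust_id: "xyz789", status: "B" },
];

// Count documents per status.
const results = runMapReduce(
  docs,
  function () { emit(this.status, 1); },
  function (key, values) { return values.reduce((a, b) => a + b, 0); }
);
console.log(results.map((r) => `${r._id}: ${r.value}`).join(", ")); // A: 2, B: 1
```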

The price of MapReduce is speed: group is not fast, and MapReduce is even slower, so it should never be used in a "live" environment. Instead, run MapReduce as a background task that saves its results to a collection, and query that collection in real time.


MongoDB provides the mapReduce command for running map-reduce aggregations over a collection, and it can also be used on sharded collections. The command has the following form:

db.runCommand({
    mapReduce: <collection>,  // collection to process; query can filter documents before they reach map
    map: <function>,          // JavaScript function that emits key/value pairs (see Note one)
    reduce: <function>,       // JavaScript function that reduces all values for a key to one object (see Note two)
    finalize: <function>,     // optional; receives (key, reducedValue) and returns the final value
    out: <output>,            // where to write the results (see Note three)
    query: <document>, sort: <document>, limit: <number>,
    scope: <document>, jsMode: <boolean>, verbose: <boolean>,
    bypassDocumentValidation: <boolean>, collation: <document>
})
Note One: The map function maps each document to zero or more key/value pairs, and it can access the variables defined in the scope parameter. The map function is applied to every document in the collection and calls emit(key, value) to pass key/value pairs to the reduce function for processing.

Function format (variables defined in the scope parameter are available inside the function):

function () {
   ...
   emit(key, value);
}
* Inside the function, this refers to the current document.

* The function must not access the database for any reason.

* The function should not call other external functions.

* A single emit can hold at most half of the maximum BSON document size. In version 3.4 the maximum is 16 MB, so a single emit cannot exceed 8 MB.
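The size rule can be turned into a rough check. This sketch assumes the 16 MB limit quoted above for version 3.4 and approximates BSON size with the UTF-8 length of a JSON encoding, which is not exact. fitsInSingleEmit is a hypothetical helper, not part of MongoDB, and the code targets Node.js (it uses Buffer).

```javascript
// Rough sketch of the "half the maximum BSON size per emit" rule (values as stated for 3.4).
// JSON length only approximates BSON size (BSON adds type bytes and length prefixes
// and encodes numbers differently), so treat this as a sanity check, not a guarantee.

const MAX_BSON_BYTES = 16 * 1024 * 1024;   // 16 MB maximum BSON document size
const MAX_EMIT_BYTES = MAX_BSON_BYTES / 2; // half of it: 8 MB per emit

// Hypothetical helper: estimate whether one emitted key/value pair stays under the limit.
function fitsInSingleEmit(key, value) {
  const approxBytes = Buffer.byteLength(JSON.stringify({ key, value }), "utf8");
  return approxBytes < MAX_EMIT_BYTES;
}

console.log(MAX_EMIT_BYTES);                                        // 8388608
console.log(fitsInSingleEmit("abc123", 1));                         // true
console.log(fitsInSingleEmit("big", "x".repeat(MAX_EMIT_BYTES)));   // false
```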

Examples:

The following map function calls emit either 0 or 1 times, depending on whether the document's status field has the value 'A':


function () {
    if (this.status == 'A')
        emit(this.cust_id, 1);
}
The following map function calls emit once for each element of the document's items array:


function () {
    this.items.forEach(function (item) { emit(item.sku, 1); });
}
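To see how many times each example map function calls emit, the functions can be exercised outside MongoDB with a small harness. collectEmits is a hypothetical helper that supplies a stand-in emit, binds this to a sample document and returns the emitted pairs; the sample document is illustrative.

```javascript
// Stand-in for MongoDB's global emit(), redirected by collectEmits for each call.
let emit = null;

// Hypothetical test harness: run one map function on one document, return emitted pairs.
function collectEmits(map, doc) {
  const pairs = [];
  emit = (key, value) => pairs.push([key, value]);
  map.call(doc); // `this` inside map is the document, as in MongoDB
  return pairs;
}

// The two example map functions: 0 or 1 emits, and one emit per item.
const mapByStatus = function () {
  if (this.status == 'A')
    emit(this.cust_id, 1);
};
const mapBySku = function () {
  this.items.forEach(function (item) { emit(item.sku, 1); });
};

// Illustrative document.
const doc = {
  cust_id: "abc123",
  status: "A",
  items: [{ sku: "mmm", qty: 5 }, { sku: "nnn", qty: 5 }],
};

console.log(collectEmits(mapByStatus, doc));                     // [ [ 'abc123', 1 ] ]
console.log(collectEmits(mapByStatus, { ...doc, status: "B" })); // []
console.log(collectEmits(mapBySku, doc));                        // [ [ 'mmm', 1 ], [ 'nnn', 1 ] ]
```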
Note two (the reduce function):

Function format:

function (key, values) {
   ...
   return result;
}

* The reduce function must not access the database.

* It must not affect systems outside the function.

* MongoDB does not call the reduce function for a key that has only a single value. The values argument is always an array.

* The function can be called more than once for the same key. In that case, the output of a previous reduce call for that key becomes one of the input values of the next call.

* The function can access the variables defined in the scope parameter.

* The data passed to reduce must not exceed half of the maximum BSON document size, that is, 8 MB (version 3.4).
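Because reduce can be called more than once for the same key, the value it returns may later come back as one of its inputs. This is why the MongoDB documentation asks that reduce return the same type as the emitted values and be associative and commutative. The two reduce functions below are illustrative assumptions: one survives re-reduction, the other does not.

```javascript
// Safe: returns the same shape as the emitted values (a number), and summing
// gives the same answer no matter how the values are grouped.
function reduceSum(key, values) {
  return values.reduce((a, b) => a + b, 0);
}

// Unsafe: returns how many values it saw, so a previous result fed back in
// as a single value is counted as 1 instead of its real count.
function reduceCountBad(key, values) {
  return values.length;
}

const values = [1, 1, 1, 1];

// Reduce in one call vs. in two rounds (the second round re-reduces partial results).
const once = (r) => r("k", values);
const twice = (r) => r("k", [r("k", values.slice(0, 2)), r("k", values.slice(2))]);

console.log(once(reduceSum), twice(reduceSum));           // 4 4
console.log(once(reduceCountBad), twice(reduceCountBad)); // 4 2
```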


Note three (the out parameter):

* Output to a new collection:

out: <collectionName>

* Output to an existing collection with an action. This form is only available when out names a collection that already exists, and it is not available on secondary members of replica sets.

out: { <action>: <collectionName>
        [, db: <dbName>]
        [, sharded: <boolean>]
        [, nonAtomic: <boolean>] }

where <action> is one of the following:

replace: Replace the contents of the existing collection with the output.

merge: Merge the new output into the existing collection. If an existing document has the same key as a new result, it is overwritten.

reduce: Merge the new output into the existing collection. If an existing document has the same key as a new result, apply the reduce function to both the new and the existing document and save the result.
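The difference between the three actions can be sketched with in-memory Maps standing in for collections, mapping each key (the _id) to its value. applyOutAction is a hypothetical helper that only mimics the semantics described above; it does not talk to MongoDB, and the sample data is made up.

```javascript
// Hypothetical helper: apply an out action to an existing output "collection".
function applyOutAction(action, existing, newResults, reduce) {
  if (action === "replace") return new Map(newResults); // old contents are discarded
  if (action !== "merge" && action !== "reduce") {
    throw new Error(`unknown action: ${action}`);
  }
  const output = new Map(existing); // copy, so the input Map is left untouched
  for (const [key, value] of newResults) {
    if (action === "reduce" && output.has(key)) {
      // Same key in old and new results: combine them with the reduce function.
      output.set(key, reduce(key, [output.get(key), value]));
    } else {
      // merge (or a brand-new key): the new result wins.
      output.set(key, value);
    }
  }
  return output;
}

const sum = (key, values) => values.reduce((a, b) => a + b, 0);
const existing = new Map([["abc123", 25], ["old999", 5]]); // illustrative old output
const fresh = new Map([["abc123", 10], ["xyz789", 40]]);   // illustrative new output

console.log(Object.fromEntries(applyOutAction("replace", existing, fresh, sum))); // { abc123: 10, xyz789: 40 }
console.log(Object.fromEntries(applyOutAction("merge", existing, fresh, sum)));   // { abc123: 10, old999: 5, xyz789: 40 }
console.log(Object.fromEntries(applyOutAction("reduce", existing, fresh, sum)));  // { abc123: 35, old999: 5, xyz789: 40 }
```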

db

Optional. The name of the database to write the output to. By default, it is the same database as the input collection.

sharded

Optional. If true and sharding is enabled on the output database, the output collection is sharded using _id as the shard key.

nonAtomic

Optional. Only valid with the merge and reduce actions; the default value is false.

When false, the output database is locked while the map-reduce results are written to it.

When true, the output database is not locked, so other clients can read the output collection (including intermediate states) during the map-reduce operation.

* Output inline: the map-reduce operation is performed in memory and the result is returned directly. This is the only option available for out on secondary members of replica sets.

out: { inline: 1 }

The result must fit within the maximum BSON document size (16 MB).


Map-reduce Example:

The db.collection.mapReduce() shell method is a wrapper around the mapReduce command.

Suppose the collection contains documents like the following:

{
     _id: ObjectId("50a8240b927d5d8b5891743c"),
     cust_id: "abc123",
     ord_date: new Date("Oct,"),
     status: 'A',
     price: 25,
     items: [{sku: "mmm", qty: 5, price: 2.5},
              {sku: "nnn", qty: 5, price: 2.5}]
}

More examples can be found on the official website: https://docs.mongodb.com/manual/reference/command/mapReduce/#mapreduce-reduce-cmd
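The following sketch runs map, reduce and finalize over documents shaped like the one above to compute, per SKU, the number of orders, the total quantity and the average quantity. It is an in-memory illustration, not a call to db.collection.mapReduce(); runWithFinalize and the second order document are hypothetical. As in MongoDB, finalize runs once per key after reduction, even for a key whose single value was never passed to reduce.

```javascript
// Stand-in for MongoDB's global emit().
let emit = null;

// Hypothetical in-memory driver: map, shuffle, reduce, then finalize each key.
function runWithFinalize(docs, map, reduce, finalize) {
  const groups = new Map(); // key -> emitted values
  emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map.call(doc));
  const results = [];
  for (const [key, values] of groups) {
    const reduced = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value: finalize(key, reduced) });
  }
  return results;
}

// The sample document above (without _id and ord_date) plus one made-up order.
const orders = [
  { cust_id: "abc123", status: "A", price: 25,
    items: [{ sku: "mmm", qty: 5, price: 2.5 }, { sku: "nnn", qty: 5, price: 2.5 }] },
  { cust_id: "xyz789", status: "A", price: 12,
    items: [{ sku: "mmm", qty: 2, price: 2.5 }] },
];

// Emit one { count, qty } object per item, keyed by SKU.
const mapSku = function () {
  for (const item of this.items) emit(item.sku, { count: 1, qty: item.qty });
};

// Returns the same shape it receives, so it is safe to re-reduce.
const reduceSku = function (key, values) {
  const out = { count: 0, qty: 0 };
  for (const v of values) { out.count += v.count; out.qty += v.qty; }
  return out;
};

// Adds the average quantity per order once the totals are known.
const finalizeSku = function (key, value) {
  return { count: value.count, qty: value.qty, avg: value.qty / value.count };
};

const perSku = runWithFinalize(orders, mapSku, reduceSku, finalizeSku);
console.log(perSku);
// values: mmm -> { count: 2, qty: 7, avg: 3.5 }, nnn -> { count: 1, qty: 5, avg: 5 }
```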


Output

The mapReduce command supports the bypassDocumentValidation option, which lets you bypass document validation when inserting or updating documents in a collection that has validation rules.

If you set the out parameter to write the results to a collection, the mapReduce command returns a document in the following form:

{
    "result": <string or document>,
    "timeMillis": <int>,
    "counts": {
        "input": <int>,
        "emit": <int>,
        "reduce": <int>,
        "output": <int>
    },
    "ok": <int>
}

If you set the out parameter to output the results inline, the mapReduce command returns a document in the following form:

{
    "results": [
       {
          "_id": <key>,
          "value": <reduced or finalized value for key>
       },
       ...
    ],
    "timeMillis": <int>,
    "counts": {
       "input": <int>,
       "emit": <int>,
       "reduce": <int>,
       "output": <int>
    },
    "ok": <int>
}
mapReduce.result

For output sent to a collection, this value is either a string with the collection name, if out did not specify a database name, or a document with both db and collection fields, if out specified both a database and a collection name.

mapReduce.results

For output written inline, an array of the resulting documents. Each resulting document contains two fields: _id, which contains the key value, and value, which contains the reduced or finalized value for the associated key.

mapReduce.timeMillis

The command execution time in milliseconds.
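As a rough illustration of what the results and counts fields describe, the sketch below assembles an inline-style result document from an in-memory run. mapReduceInline is hypothetical: here counts.input is the number of documents passed to map, counts.emit the number of emit calls, counts.reduce the number of reduce calls and counts.output the number of result documents. MongoDB's own counters may differ, for example because a real server can call reduce more times.

```javascript
// Stand-in for MongoDB's global emit().
let emit = null;

// Hypothetical helper: in-memory map-reduce that returns an inline-style result document.
function mapReduceInline(docs, map, reduce) {
  const start = Date.now();
  const counts = { input: 0, emit: 0, reduce: 0, output: 0 };
  const groups = new Map(); // key -> emitted values
  emit = (key, value) => {
    counts.emit += 1;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) {
    counts.input += 1;
    map.call(doc);
  }
  const results = [];
  for (const [key, values] of groups) {
    let value = values[0];
    if (values.length > 1) { // reduce is skipped for single-value keys
      counts.reduce += 1;
      value = reduce(key, values);
    }
    results.push({ _id: key, value });
  }
  counts.output = results.length;
  return { results, timeMillis: Date.now() - start, counts, ok: 1 };
}

// Illustrative input: count documents per SKU.
const res = mapReduceInline(
  [{ sku: "mmm" }, { sku: "nnn" }, { sku: "mmm" }],
  function () { emit(this.sku, 1); },
  function (key, values) { return values.reduce((a, b) => a + b, 0); }
);
console.log(res.counts); // { input: 3, emit: 3, reduce: 1, output: 2 }
```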
