MongoDB Aggregation Framework: An Introductory Tutorial

Source: Internet
Author: User

1. Aggregation Framework
Using the aggregation framework, you can transform and combine the documents in a collection. You build a pipeline out of multiple stages that process a stream of documents. These stages include filtering, projecting, grouping, sorting, limiting, and skipping.
For example, suppose a collection stores animals, one MongoDB document per animal, and you want to find the most common animal types. You can build the pipeline in the following steps.
1) Project the animal name out of each document.
2) Group by name and count the number of occurrences of each name.
3) Sort the documents by the number of times the name appears, in descending order.
4) Limit the result to the first five.
The specific operators:
1) {"$project": {"name": 1}}
This is similar to the field selector in a query: specify "fieldname": 1 to select a required field and "fieldname": 0 to exclude an unwanted one. The "_id" field is included by default. The results are kept in memory and are not written to disk.

db.test_collection.aggregate([{"$project": {"name": 1}}]);    =>
{"_id": ObjectId("535a2d3c169097010b92fdf6"), "name": "Snake"}

2) {"$group": {"_id": "$name", "Count": {"$sum": 1}}}
First, the field to group on, "name", is specified. After the operation each name corresponds to exactly one result document, so the name can serve as the unique identifier "_id".
The second field means that "Count" is increased by 1 for every document in the group. The incoming documents have no "Count" field; it is a new field created by "$group".

db.test_collection.aggregate([{"$project": {"name": 1}}, {"$group": {"_id": "$name", "Count": {"$sum": 1}}}]);    =>
{"_id": "Bird", "Count": 8344}
{"_id": "Snake", "Count": 8443}
{"_id": "Cat", "Count": 8183}
{"_id": "Rabbit", "Count": 8206}
{"_id": "Tiger", "Count": 8329}
{"_id": "Cow", "Count": 8309}
{"_id": "Horse", "Count": 8379}
{"_id": "Dog", "Count": 8406}
{"_id": "Dragon", "Count": 8372}
{"_id": "Elephant", "Count": 8264}
{"_id": "Pig", "Count": 8403}
{"_id": "Lion", "Count": 8362}

3) {"$sort": {"Count":-1}}
The documents in the result set are sorted in descending order according to the Count field.
4) {"$limit": 5}
Limit the return result to 5 documents.
Combine the above stages:

db.test_collection.aggregate([
  {"$project": {"name": 1}},
  {"$group": {"_id": "$name", "Count": {"$sum": 1}}},
  {"$sort": {"Count": -1}},
  {"$limit": 5}
]);

aggregate returns the documents for the 5 animals that appear most frequently:

{"_id": "Snake", "Count": 8443}
{"_id": "Dog", "Count": 8406}
{"_id": "Pig", "Count": 8403}
{"_id": "Horse", "Count": 8379}
{"_id": "Dragon", "Count": 8372}
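The four stages can also be imitated in plain JavaScript, which may help when reasoning about what each stage contributes. The following is only a sketch that runs without a MongoDB connection; the sample documents and the helper name topNames are made up for illustration and are not part of MongoDB.

```javascript
// In-memory imitation of $project, $group, $sort and $limit.
// The sample documents are hypothetical.
const animals = [
  { _id: 1, name: "Snake" },
  { _id: 2, name: "Dog" },
  { _id: 3, name: "Snake" },
  { _id: 4, name: "Pig" },
  { _id: 5, name: "Dog" },
  { _id: 6, name: "Snake" },
];

function topNames(docs, limit) {
  // $project: keep only _id and name.
  const projected = docs.map(({ _id, name }) => ({ _id, name }));

  // $group with {"$sum": 1}: one counter per distinct name.
  const counts = new Map();
  for (const doc of projected) {
    counts.set(doc.name, (counts.get(doc.name) || 0) + 1);
  }
  const grouped = Array.from(counts, ([name, count]) => ({ _id: name, Count: count }));

  // $sort by Count in descending order, then $limit.
  grouped.sort((a, b) => b.Count - a.Count);
  return grouped.slice(0, limit);
}

const top2 = topNames(animals, 2);
console.log(top2); // Snake with Count 3, then Dog with Count 2
```

Each step consumes the output of the previous one, which is the same hand-off that happens between pipeline stages.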

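When debugging, you can run the pipeline one operator at a time to find the stage that misbehaves.
In the MongoDB version described here, the aggregation framework cannot write to a collection: all results are returned to the client, and the aggregation result is limited to 16 MB. (Later versions added the "$out" stage, which writes the results to a collection.)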

2. Pipeline operators
Each operator accepts a stream of documents, applies some transformation to them, and passes the resulting documents on to the next operator.
Different pipeline operators can be combined in any order and repeated as many times as needed.

2.1 $match
$match filters the documents so that only the matching subset is aggregated.
"$match" supports all the usual query operators ("$gt", "$lt", "$ne", etc.), but it cannot use geospatial operators.
In practice, put "$match" as early in the pipeline as possible. It quickly discards unwanted documents, and when it runs before any projection or grouping, the query can use an index.

2.2 $project
You can use $project to extract fields from documents and to rename fields.

db.foo.aggregate([{"$project": {"city": 1, "_id": 0}}])    =>
{"city": "NEW YORK"}

You can rename a projected field:

db.foo.aggregate([{"$project": {"newcity": "$city", "_id": 0}}])    =>
{"newcity": "NEW YORK"}

The "$fieldname" syntax refers to the value of the fieldname field in the aggregation framework; above, "$city" is replaced with "NEW YORK".
After a field is renamed, MongoDB does not keep track of its former name, so an index on the original field cannot be used for the renamed field. Make use of indexes before changing field names.
2.2.1 Pipeline expressions
You can use expressions to combine multiple literals and variables into a single value.
Expressions can be combined and nested to any depth to build more complex expressions.
2.2.2 Mathematical expressions
Mathematical expressions operate on numeric values.

db.foo.aggregate([
  {"$project": {
      "Total": {"$add": ["$age", "$year"]},
      "_id": 0
    }
  }
])
{"Total": 15}

You can combine multiple expressions into more complex expressions:

db.foo.aggregate([
  {"$project": {
      "sub": {"$subtract": [{"$add": ["$age", "$year"]}, 7]},
      "_id": 0
    }
  }
])
{"sub": 8}

Operator syntax:
1) "$add": [expr1[, expr2, ..., exprN]]
Adds the expressions together.
2) "$subtract": [expr1, expr2]
Subtracts expr2 from expr1.
3) "$multiply": [expr1[, expr2, ..., exprN]]
Multiplies the expressions together.
4) "$divide": [expr1, expr2]
Divides expr1 by expr2 and returns the quotient.
5) "$mod": [expr1, expr2]
Divides expr1 by expr2 and returns the remainder.
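To make the nesting rules concrete, here is a small evaluator for these five operators written in plain JavaScript. It is a sketch for illustration only, not how MongoDB evaluates expressions internally; the function name evaluate and the sample document (chosen so that age plus year is 15) are assumptions.

```javascript
// Minimal evaluator for the five arithmetic operators.
// Strings starting with "$" are field references; other values are literals.
function evaluate(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) {
    return doc[expr.slice(1)]; // "$age" -> doc.age
  }
  if (expr === null || typeof expr !== "object") {
    return expr; // a number or another literal
  }
  const [op] = Object.keys(expr);
  // Evaluate the arguments first, so expressions can nest to any depth.
  const args = expr[op].map((arg) => evaluate(arg, doc));
  switch (op) {
    case "$add":
      return args.reduce((sum, x) => sum + x, 0);
    case "$multiply":
      return args.reduce((product, x) => product * x, 1);
    case "$subtract":
      return args[0] - args[1];
    case "$divide":
      return args[0] / args[1];
    case "$mod":
      return args[0] % args[1];
    default:
      throw new Error("unsupported operator: " + op);
  }
}

// Hypothetical document whose age and year add up to 15.
const person = { age: 10, year: 5 };
console.log(evaluate({ $add: ["$age", "$year"] }, person)); // 15
console.log(evaluate({ $subtract: [{ $add: ["$age", "$year"] }, 7] }, person)); // 8
```

The recursion mirrors the way an inner expression such as "$add" is resolved before the outer "$subtract" uses its value.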

2.2.3 Date expressions
These expressions extract date information: "$year", "$month", "$week", "$dayOfMonth", "$dayOfWeek", "$hour", "$minute", "$second". Date operations can only be applied to fields of date type, not to numeric fields.

db.bar.insert({"name": "Pipi", "date": new Date()})
db.bar.aggregate([
  {"$project": {
      "Birth-month": {"$month": "$date"},
      "_id": 0
    }
  }
])
{"Birth-month": 4}
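For comparison, the same pieces of date information can be read from a JavaScript Date object. The sketch below assumes UTC, which is what these operators use unless told otherwise; the helper name dateParts and the sample date are hypothetical.

```javascript
// Plain-JavaScript counterparts of several date operators, evaluated in UTC.
function dateParts(date) {
  return {
    year: date.getUTCFullYear(),
    month: date.getUTCMonth() + 1,   // "$month" is 1-12, getUTCMonth() is 0-11
    dayOfMonth: date.getUTCDate(),
    dayOfWeek: date.getUTCDay() + 1, // "$dayOfWeek" is 1 (Sunday) to 7 (Saturday)
    hour: date.getUTCHours(),
    minute: date.getUTCMinutes(),
    second: date.getUTCSeconds(),
  };
}

// Hypothetical date: 25 April 2014, 08:30:15 UTC (a Friday).
const birth = new Date(Date.UTC(2014, 3, 25, 8, 30, 15));
console.log(dateParts(birth).month); // 4
```

Note the off-by-one differences: JavaScript counts months from 0 and weekdays from 0, while the aggregation operators count both from 1.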

You can also use a literal date.

db.bar.aggregate([
  {"$project": {
      "Up-to-now": {"$subtract": [{"$minute": new Date()}, {"$minute": "$date"}]},
      "_id": 0
    }
  }
])
{"Up-to-now": 18}

2.2.4 String expressions
Operator syntax:
1) "$substr": [expr, startOffset, numToReturn]
Takes a string and returns the substring that starts startOffset bytes in and is numToReturn bytes long.
2) "$concat": [expr1[, expr2, ..., exprN]]
Concatenates the given expressions and returns the result.
3) "$toLower": expr
Returns the lowercase form of the argument.
4) "$toUpper": expr
Returns the uppercase form of the argument.
For example:

db.foo.insert({"firstName": "Caoqing", "lastName": "Lucifer"})
db.foo.aggregate([
  {"$project": {
      "email": {
        "$concat": [
          {"$substr": ["$firstName", 0, 1]},
          ".",
          "$lastName",
          "@gmail.com"
        ]
      },
      "_id": 0
    }
  }
])
{"email": "C.Lucifer@gmail.com"}
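The same string can be assembled in plain JavaScript, which shows what each operator contributes. This is only an approximate sketch: "$substr" counts bytes while JavaScript's substring counts UTF-16 code units, so the two agree for ASCII text but can differ otherwise. The helper name buildEmail is made up for illustration.

```javascript
// Plain-JavaScript counterpart of the "$substr" / "$concat" projection.
function buildEmail(doc) {
  // Like {"$substr": ["$firstName", 0, 1]}: start at offset 0, take 1 character.
  const initial = doc.firstName.substring(0, 0 + 1);
  // Like "$concat": join the pieces in order.
  return [initial, ".", doc.lastName, "@gmail.com"].join("");
}

const email = buildEmail({ firstName: "Caoqing", lastName: "Lucifer" });
console.log(email);               // C.Lucifer@gmail.com
console.log(email.toLowerCase()); // c.lucifer@gmail.com, as "$toLower" would give
```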

2.2.5 Logical expressions
Operator syntax:
1) "$cmp": [expr1, expr2]
Compares the two arguments: returns 0 if they are equal, a positive number if expr1 is greater than expr2, and a negative number if expr1 is less than expr2.
2) "$strcasecmp": [string1, string2]
Compares two strings, ignoring case.
3) "$eq"/"$ne"/"$gt"/"$gte"/"$lt"/"$lte": [expr1, expr2]
Compares expr1 and expr2 and returns the result (true or false).
4) "$and": [expr1[, expr2, ..., exprN]]
Returns true if all expressions are true, otherwise false.
5) "$or": [expr1[, expr2, ..., exprN]]
Returns true if any expression is true, otherwise false.
6) "$not": expr
Returns the logical inverse of the expression.
There are also two control statements.

"$cond": [booleanExpr, trueExpr, falseExpr]

If booleanExpr is true, returns trueExpr; otherwise returns falseExpr.

"$ifNull": [expr, replacementExpr]

If expr is null, returns replacementExpr; otherwise returns expr.
Arithmetic operators must be given numeric values, date operators must be given dates, and string operators must be given strings.
For example, suppose a final grade is computed from attendance (10%), quizzes (30%) and tests (60%), except that a teacher's pet is simply given 100 points:
Insert data:

db.bar.insert(
  {
    "name": "Xiaobao",
    "teachersPet": 1,
    "attendance": ...,
    "quizz": ...,
    "test": ...
  }
)
db.bar.insert(
  {
    "name": "Caoqing",
    "teachersPet": 0,
    "attendance": 20,
    "quizz": ...,
    "test": ...
  }
)
db.bar.insert(
  {
    "name": "Pipi",
    "teachersPet": 0,
    "attendance": ...,
    "quizz": ...,
    "test": 10
  }
)

Aggregation:

db.bar.aggregate([
  {"$project": {
      "grade": {
        "$cond": [
          "$teachersPet",
          100,
          {
            "$add": [
              {"$multiply": [0.1, "$attendance"]},
              {"$multiply": [0.3, "$quizz"]},
              {"$multiply": [0.6, "$test"]}
            ]
          }
        ]
      },
      "_id": 0
    }
  }
])

Returned result:

{"grade": 100}
{"grade": ...}
{"grade": 31}
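The grading rule can be written out in plain JavaScript to check the arithmetic by hand. This is a sketch with made-up students and scores (they are not the documents inserted above); the helper names grade and ifNull are hypothetical.

```javascript
// Plain-JavaScript imitation of the "$cond"-based grading rule.
function grade(student) {
  // "$cond": [booleanExpr, trueExpr, falseExpr]; 0 counts as false, 1 as true.
  if (student.teachersPet) {
    return 100;
  }
  // Weighted sum, as built from "$multiply" and "$add".
  return 0.1 * student.attendance + 0.3 * student.quizz + 0.6 * student.test;
}

// "$ifNull": [expr, replacementExpr] falls back when the value is null or missing.
function ifNull(value, replacement) {
  return value === null || value === undefined ? replacement : value;
}

// Hypothetical students; the scores are invented for illustration.
const students = [
  { name: "A", teachersPet: 1, attendance: 90, quizz: 80, test: 70 },
  { name: "B", teachersPet: 0, attendance: 70, quizz: 60, test: 10 },
];
for (const s of students) {
  // toFixed hides floating-point rounding noise in the weighted sum.
  console.log(s.name, grade(s).toFixed(1)); // A 100.0, then B 31.0
}
console.log(ifNull(null, 0)); // 0
```

For student B the weighted sum is 7 + 18 + 6, which is 31 up to floating-point rounding.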

3. MapReduce
MapReduce is very powerful and flexible. It uses JavaScript as its "query language", so it can express arbitrarily complex logic.
MapReduce tends to be quite slow, however, and should not be used for real-time data analysis.
MapReduce can be executed in parallel across multiple servers. It splits a problem into small pieces and sends the pieces to different machines, each of which completes only its part of the work. When all machines have finished, the partial solutions are merged into a complete solution.
The first step is the map, which applies an operation to each document in the collection. Next comes an intermediate step called the shuffle: the emitted values are grouped by key, so that each key ends up with a list of its values. Finally, the reduce step reduces the values in each list to a single value.

3.1 Find all the keys in the collection
MongoDB assumes that your schema is dynamic, so it does not keep track of the keys in each document. In general, the best way to find all the keys across all the documents in a collection is MapReduce.
In the map step, the map function uses a special function, emit, to return the values to be processed. emit gives MapReduce a key and a value.
The map function below uses emit to return a count of 1 for a key taken from the document. Here "this" is a reference to the document currently being mapped:

map = function() {
  emit(this.country, {count: 1});
}

reduce accepts two parameters: key, which is the first value passed to emit, and an array of one or more {count: 1} documents that were emitted for that key.

reduce = function(key, values) {
  var result = {count: 0};
  for (var i = 0; i < values.length; i++) {
    result.count += values[i].count;
  }
  return result;
}
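The map, shuffle and reduce phases can be imitated in memory with a few lines of JavaScript. The following is a sketch, not MongoDB's implementation: emit is passed to map as a parameter here (in MongoDB it is a built-in), the helper name runMapReduce is made up, and the sample documents are hypothetical.

```javascript
// In-memory imitation of the map, shuffle and reduce phases.
function runMapReduce(docs, map, reduce) {
  // Map phase: call map with "this" bound to each document and collect the emits.
  const emitted = [];
  const emit = (key, value) => emitted.push({ key, value });
  for (const doc of docs) {
    map.call(doc, emit);
  }

  // Shuffle phase: group the emitted values by key.
  const groups = new Map();
  for (const { key, value } of emitted) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }

  // Reduce phase: collapse each list into a single value.
  // A key with a single value is passed through without calling reduce.
  return Array.from(groups, ([key, values]) => ({
    _id: key,
    value: values.length > 1 ? reduce(key, values) : values[0],
  }));
}

const countryMap = function (emit) {
  emit(this.country, { count: 1 });
};
const countryReduce = function (key, values) {
  const result = { count: 0 };
  for (const v of values) result.count += v.count;
  return result;
};

// Hypothetical sample documents.
const sample = [
  { country: "Japan" },
  { country: "India" },
  { country: "Japan" },
  { country: "Canada" },
];
console.log(runMapReduce(sample, countryMap, countryReduce));
// Japan has count 2; India and Canada have count 1
```

Because reduce may be skipped for single-value keys, the value passed to emit and the value returned by reduce should have the same shape, as they do here.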

Sample collection data:

{"_id": ..., "country": "Japan", "money": 724}
{"_id": ..., "country": "Germany", "money": 520}
{"_id": ..., "country": "India", "money": 934}
{"_id": ..., "country": "China", "money": 721}
{"_id": ..., "country": "Germany", "money": 156}
{"_id": ..., "country": "Canada", "money": 950}
{"_id": ..., "country": "India", "money": 406}
{"_id": ..., "country": "Japan", "money": 776}
{"_id": ..., "country": "Canada", "money": 468}
{"_id": ..., "country": "Germany", "money": 262}
{"_id": ..., "country": "Germany", "money": 126}
{"_id": ..., "country": "Japan", "money": 86}
{"_id": ..., "country": "Canada", "money": 870}
{"_id": ..., "country": "India", "money": 98}
{"_id": ..., "country": "India", "money": 673}
{"_id": ..., "country": "Japan", "money": 487}
{"_id": ..., "country": "India", "money": 681}
{"_id": ..., "country": "Canada", "money": 491}
{"_id": ..., "country": "Japan", "money": 98}
{"_id": ..., "country": "China", "money": 172}

Run result:

db.foo.mapReduce(map, reduce, {out: "collection"})
{
    "result": "collection",
    "timeMillis": ...,
    "counts": {
        "input": ...,
        "emit": ...,
        "reduce": 5,
        "output": 5
    },
    "ok": 1,
    "$gleStats": {
        "lastOpTime": Timestamp(1399168165, ...),
        "electionId": ObjectId("535a2ce15918f42de9ab1427")
    }
}

(1) result: the name of the collection in which the results are stored
(2) timeMillis: the time the operation took, in milliseconds
(3) input: the number of documents sent to the map function
(4) emit: the number of times emit was called
(5) reduce: the number of times reduce was called
(6) output: the number of documents in the final result
View the content of the result collection:

db.collection.find();
{"_id": "Canada", "value": {"count": ...}}
{"_id": "China", "value": {"count": ...}}
{"_id": "Germany", "value": {"count": ...}}
{"_id": "India", "value": {"count": ...}}
{"_id": "Japan", "value": {"count": 20}}

3.2 Other MapReduce keys
(1) "finalize": function
A function to which the output of reduce is sent; it is the last step of the whole process.
(2) "keeptemp": boolean
If true, the temporary result collection is kept when the connection is closed; otherwise it is dropped.
(3) "out": string
The name of the output collection. If it is set, keeptemp is automatically true.
(4) "query": document
Filters the documents with the given criteria before they are sent to map.
(5) "sort": document
Sorts the documents before they are sent to map.
(6) "limit": integer
The maximum number of documents sent to the map function.
(7) "scope": document
Variables that can be used in the JavaScript code.
(8) "verbose": boolean
Whether to write more detailed output to the server log.
3.2.1 The finalize function
You can pass a finalize function as an option. It runs on the output of the last reduce, and its result is then saved in the temporary collection.
3.2.2 Saving the result collection
By default, a temporary collection is created when MapReduce runs. Its name is mr.stuff.ts.id, that is, mr, the collection name, a timestamp and the database job ID joined by dots. MongoDB automatically destroys this collection when the calling connection is closed.
3.2.3 Running MapReduce on a subset of documents
Each document passed to map must first be deserialized from a BSON object into a JavaScript object, which is time-consuming. You can speed up the map step by filtering the documents first with "query", "limit" and "sort".
The value of "query" is a query document.
"limit" and "sort" are especially useful together.
"query", "limit" and "sort" can be used in any combination.
3.2.4 Scope
With the "scope" key you can pass variables to the JavaScript code. Its value is an ordinary document of the form variable name: value.
3.2.5 Getting more output
Set verbose to true to write more information about the MapReduce process to the server log.

4. Aggregation commands
The count and distinct operations are simple enough to be ordinary commands and do not require the aggregation framework.
4.1 count
count returns the number of documents in the collection:

db.foo.count() =>
99

You can pass in a query document:

db.foo.count({country: "a"}) =>
15

Adding query conditions slows count down.
4.2 distinct
distinct finds all the different values for a given key. You must specify the collection and the key when you use it.

db.runCommand({"distinct": "foo", "key": "country"}) =>
{
    "values": [
        "Japan",
        "Germany",
        "India",
        "Canada",
        "China"
    ],
    "stats": {
        "n": ...,
        "nscanned": ...,
        "nscannedObjects": ...,
        "timems": ...,
        "cursor": "BasicCursor"
    },
    "ok": 1,
    "$gleStats": {
        "lastOpTime": Timestamp(1399171995, ...),
        "electionId": ObjectId("535a2ce15918f42de9ab1427")
    }
}
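Both commands are simple enough to imitate over an in-memory array, which makes their behavior easy to check. The sketch below only supports equality conditions in the query document; the helper names countDocs and distinctValues and the sample documents are assumptions made for illustration.

```javascript
// In-memory imitations of count and distinct over an array of documents.
function countDocs(docs, query = {}) {
  // A document matches when every queried field equals the requested value.
  return docs.filter((doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value)
  ).length;
}

function distinctValues(docs, key) {
  // A Set keeps each value once, in first-seen order.
  const values = new Set();
  for (const doc of docs) {
    if (key in doc) values.add(doc[key]);
  }
  return Array.from(values);
}

// Hypothetical documents.
const countries = [
  { country: "Japan" },
  { country: "Germany" },
  { country: "Japan" },
  { country: "India" },
];
console.log(countDocs(countries));                       // 4
console.log(countDocs(countries, { country: "Japan" })); // 2
console.log(distinctValues(countries, "country"));       // [ 'Japan', 'Germany', 'India' ]
```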

4.3 Group
Using group allows for more complex aggregations. You choose the key to group by; the documents are divided into groups according to the different values of that key, and each group is then aggregated into a result document.
Insert sample data:

var name = ["Caoqing", "Spider-man", "Garfield"];
for (var i = 0; i < 10000; i++) {
  var iname = name[Math.floor(Math.random() * name.length)];
  var date = new Date().getTime();
  // 100 is an assumed upper bound for the random age.
  var number = Math.floor(100 * Math.random());
  db.coll.insert({_id: i, name: iname, time: date, age: number});
}

The result should list, for each name, the latest time and the age that goes with it.
You can group by name, then take the most recent document from each group and add it to the result set.

db.runCommand({"group": {
  "ns": "coll",
  "key": {"name": true},
  "initial": {"time": 0},
  "$reduce": function(doc, prev) {
    if (doc.time > prev.time) {
      prev.age = doc.age;
      prev.time = doc.time;
    }
  }
}})

(1) "ns": "coll"
Specifies the collection to group.
(2) "key": {"name": true}
Specifies the key on which to group.
(3) "initial": {"time": 0}
Initializes the time value, which is passed to the reduce function as the initial value of the accumulator. Every member of a group uses this accumulator.
(4) "$reduce": function(doc, prev) {...}
Called once for each document in the collection, with the current document and the accumulator document for its group.
Results:

{
    "retval": [
        {
            "name": "Spider-man",
            "time": 1399179398567,
            "age": ...
        },
        {
            "name": "Garfield",
            "time": 1399179398565,
            "age": ...
        },
        {
            "name": "Caoqing",
            "time": 1399179398566,
            "age": ...
        }
    ],
    "count": 10000,
    "keys": 3,
    "ok": 1,
    "$gleStats": {
        "lastOpTime": Timestamp(1399179362, 1),
        "electionId": ObjectId("535a2ce15918f42de9ab1427")
    }
}
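The interplay of key, initial and the reduce function can be imitated in memory. This is a sketch under assumptions, not the server's implementation: the helper name groupBy and the sample documents are hypothetical, and a document without the key is grouped under null.

```javascript
// In-memory imitation of the group command: key, initial and reduce.
function groupBy(docs, keyField, initial, reduce) {
  const accumulators = new Map();
  for (const doc of docs) {
    // A document without the key is grouped under null.
    const key = keyField in doc ? doc[keyField] : null;
    if (!accumulators.has(key)) {
      // Each group starts from its own copy of the initial document.
      accumulators.set(key, { [keyField]: key, ...initial });
    }
    // The reduce function updates the group's accumulator in place.
    reduce(doc, accumulators.get(key));
  }
  return Array.from(accumulators.values());
}

// Keep the age that belongs to the latest time seen so far.
const keepLatest = (doc, prev) => {
  if (doc.time > prev.time) {
    prev.age = doc.age;
    prev.time = doc.time;
  }
};

// Hypothetical documents.
const people = [
  { name: "Garfield", time: 100, age: 3 },
  { name: "Caoqing", time: 200, age: 30 },
  { name: "Garfield", time: 300, age: 4 },
];
console.log(groupBy(people, "name", { time: 0 }, keepLatest));
// Garfield ends with time 300 and age 4; Caoqing with time 200 and age 30
```

Starting every group from {"time": 0} guarantees that the first document of a group always wins the comparison, provided all times are positive.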

If some documents lack the grouping key, they are placed in a separate group whose key is null, shown as name: null. For example:

db.coll.insert({age: 5, time: new Date().getTime()})

Returned result:

    ...
    {
      "name": null,
      "time": 1399180685288,
      "age": 5
    }
    ...
    "count": 10001,
    "keys": 4,
    ...

To exclude documents that do not contain the grouping key, add "name": {"$exists": true} to "condition".

db.runCommand({"group": {
  "ns": "coll",
  "key": {"name": true},
  "initial": {"time": 0},
  "$reduce": function(doc, prev) {
    if (doc.time > prev.time) {
      prev.age = doc.age;
      prev.time = doc.time;
    }
  },
  "condition": {"name": {"$exists": true}}
}})

4.3.1 Using a finalizer
A finalizer can be used to minimize the amount of data transferred from the database to the user. This matters because the output of the group command must fit in a single database response.
4.3.2 Using a function as the key
Grouping conditions can be more complex than a single key. For example, when grouping by category, "dog" and "DOG" are two completely different groups. To eliminate case differences like this, you can define a function that determines the key each document is grouped by.
Use the "$keyf" key to define the grouping function.

db.foo.group({
  "ns": "foo",
  "$keyf": function(x) { return x.category.toLowerCase(); },
  "initial": ...,
  ...
})
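The effect of a key function can be seen with a small in-memory count. This is only a sketch in the spirit of "$keyf"; the helper name countByKey and the sample documents are made up for illustration.

```javascript
// Grouping by a computed key instead of a raw field value.
function countByKey(docs, keyf) {
  const counts = {};
  for (const doc of docs) {
    const key = keyf(doc); // the function decides which group the document joins
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Hypothetical documents whose categories differ only in case.
const pets = [{ category: "dog" }, { category: "DOG" }, { category: "cat" }];
console.log(countByKey(pets, (x) => x.category.toLowerCase())); // { dog: 2, cat: 1 }
console.log(countByKey(pets, (x) => x.category));               // { dog: 1, DOG: 1, cat: 1 }
```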
