MongoDB (4): Aggregation framework

Source: Internet
Author: User
Tags emit

First, Introduction

MongoDB's aggregation framework, which is used primarily to transform and combine the documents in a collection to make use of the data for analysis.


The basic idea of the aggregation framework is:

Multiple artifacts are used to create a pipeline for processing a series of documents.

These artifacts include:

Filter (filtering), drop shadow (projecting), group (grouping), sort (sorting), limit (limiting), and skip (skipping).


How to use the aggregation framework:

Db. Set. Aggregate (component 1, component 2 ... )

Note: Because the results of the aggregation are to be returned to the client, the aggregation result must be limited to 16M, which is the size of the maximum response message that MongoDB supports.


Ii. Examples of Use

2.1. Prepare sample Data

for (Var i=0;i<100;i++) {for (Var j=0;j<4;j++) {Db.scores.insert ({"StudentID": "S" +i, "course": "Course" +j, "score":    Math.random () *100}); }}

2.2. Find out the 3 students with the highest number of courses with more than 80 points

Steps:

1: Find all students who have tested more than 80 points, do not differentiate the course

> db.scores.aggregate ({"$match": {"score": {$gte: 80}});

2: Projecting each student's name

> db.scores.aggregate ({"$match": {"score": {$gte: 80}}},{$project: {"StudentID": 1}});

3: Sort the names of students, one student's name appears once, add 1 to him

> db.scores.aggregate ({"$match": {"score": {$gte: 80}}},{$project: {"StudentID": 1}},{$group: {"_id": "$studentId", "Count": {$sum: 1}});

4: Sort the result set in descending order of Count

> db.scores.aggregate ({"$match": {"score": {$gte: 80}}},{$project: {"StudentID": 1}},{$group: {"_id": "$studentId", "Count": {$sum: 1}}},{"$sort": {"Count":-1}});

5: Return the previous 3 data

Db.scores.aggregate ({"$match": {"score": {$gte: 80}}},{$project: {"StudentID": 1}},{$group: {"_id": "$studentId", " Count ": {$sum: 1}}},{" $sort ": {" Count ": -1}},{" $limit ": 3});

650) this.width=650; "src=" Https://s5.51cto.com/wyfs02/M00/97/A1/wKioL1kwuxegZ-SvAAAeJMx4Vz8704.jpg "title=" 1496365899 (1). jpg "alt=" wkiol1kwuxegz-svaaaejmx4vz8704.jpg "/> three, pipe operators

Each operator accepts a series of documents, handles them accordingly, and passes the converted document as a result to the next operator. The last operator returns the result.

Different pipe operators can be combined in any order, with any number of combinations.

3.1. Filter command $match

Used to filter the collection of documents, where all regular query operators can be used. It is usually placed at the front of the pipeline for the following reasons:

1: Quickly filter unwanted documents to reduce the amount of data to be subsequently manipulated

2: Filter before projection and grouping, query can use index

3.2. projection command $project

Used to extract fields from a document, you can specify the include and exclude fields, or you can rename the fields. For example, to change the StudentID to Sid, as follows:

Db.scores.aggregate ({"$project": {"Sid": "$studentId"}})

Pipeline operators can also use expressions to meet more complex requirements.

Mathematical expression for pipe operator $project:

For example, add 20 points to the result, as follows:

> db.scores.aggregate ({"$project": {"StudentID": 1, "NewsCore": {$add: ["$score", 20]}});

Supported operators and the appropriate syntax:

1: $add: [EXPR1[,EXPR2,... EXPRN]]

2: $subtract: [EXPR1,EXPR2]

3: $multiply: [EXPR1[,EXPR2,... EXPRN]]

4: $divice: [EXPR1,EXPR2]

5: $mod: [EXPR1,EXPR2]

Date expression for pipe operator $project:

The aggregation framework contains some expressions for extracting date information, as follows:

$year, $month, $week, $dayOfMonth, $dayOfWeek, $dayOfYear, $hour, $minute, $second.

Note: These can only manipulate date-based fields and cannot manipulate data, using the example:

{"$project": {"Opeday": {"$dayOfMonth": "$recoredTime"}}}

string expression for pipe operator $project:

1: $substr: [Expr, start position, number of bytes to fetch]

2: $concat: [EXPR1[,EXPR2,... EXPRN]]

3: $toLower: Expr

4: $toUpper: Expr

For example: {"$project": {"Sid": {$concat: ["$studentId", "CC"]}}}

logical expression for pipe operator $project:

1: $cmp: [EXPR1,EXPR2]: compare two expressions, 0 equals, positive front big, negative

2: $strcasecmp: [string1,string2]: Compare two strings, case-sensitive, only valid for strings made of Roman characters

3: $eq, $ne, $gt, $gte, $lt, $lte: [EXPR1,EXPR2]

4: $and, $or, $not

5: $cond: [booleanexpr,trueexpr,falseexpr]: Returns a true expression if the Boolean expression is true, otherwise returns false expression

6: $ifNull: [expr,otherexpr]: If expr is null, return otherexpr, otherwise return expr

For example: Db.scores.aggregate ({"$project": {"NewsCore": {$cmp: ["$studentId", "SSS"]}})

3.3. Group command $group

Used to group documents according to different values for a specific field. After you have selected a group field, you can pass these fields to the "_id" field of the $group function. For example:

Db.scores.aggregate ({"$group": {"_id": "$studentId"}});

or a

Db.scores.aggregate ({"$group": {"_id": {"Sid": "$studentId", "Score": "$score"}}});

$group Supported operators:

1: $sum: Value: Adds value to the calculated result for each document

2: $avg: Value: Returns the average of each group

3: $max: Expr: Returns the maximum value within a group

4: $min: Expr: Returns the minimum value within the group

5: $first: Expr: Returns the first value of the grouping, ignoring other values, which are generally only meaningful when sorted and explicitly aware of the order of the data.

6: $last: Expr: In contrast to the previous one, returns the last value of the grouping

7: $addToSet: Expr: If the current array does not contain expr, add it to the array

8: $push: expr: Adding expr to the array

3.4. Split command $unwind

Used to split each value in the array into a separate document.

650) this.width=650; "src=" Https://s4.51cto.com/wyfs02/M02/97/A3/wKioL1kwx9LBIKpVAAAlUzIGSsA235.png "title=" bw47~ O8ny]]k_29doe5_t$9.png "alt=" Wkiol1kwx9lbikpvaaaluzigssa235.png "/>

3.5. Sort command $sort

You can sort by any field, the same as the syntax in a normal query. If you want to sort a large number of documents, it is highly recommended that you sort the first stage of the pipeline, and you can use the index.

3.6. Common aggregation functions

1:count: The number of documents used to return a collection

2:distinct: Finds all the different values for a given key, and must specify the collection and key when used, for example:

Db.runcommand ({"distinct": "Users", "key": "UserId"});

Iv. MapReduce

In MongoDB's aggregation framework, it is also possible to use MapReduce, which is very powerful and flexible, but with some complexity, specifically for implementing some complex aggregation functions.

The MapReduce in MongoDB uses JavaScript as the query language, so it can express arbitrary logic, but it runs very slowly and should not be used in real-time data analysis.

4.1, the HelloWorld of the MapReduce

Implements the function of finding all the keys in the collection and counting the occurrences of each key.

The 1:map function uses the EMIT function to return the value to be processed, as shown in the following example:

var map = function () {for (var key in this) {Emit (key,{count:1}); }}

This represents a reference to the current document.

The 2:reduce function needs to process the map phase or the data from the previous reduce, so the document returned by reduce must be able to act as an element of the second parameter of reduce, as shown in the following example:

var reduce = function (key,emits) {var total = 0;    for (var i in emits) {total + = Emits[i].count; } return {"Count": total};};

3: Run MapReduce with the following example:

> var Mr =db.runcommand ({"MapReduce": "Scores", "map": Map, "reduce": Reduce, "out": "Mrout"});

Description: Scores is the collection name, map is the map function, and reduce is the reduce function, Mrout is the variable name of the output

4: Query The final results, examples are as follows:

Db.mrout.find ();

650) this.width=650; "src=" Https://s2.51cto.com/wyfs02/M00/97/A4/wKioL1kwzW2SYV4VAAAan6Ls3PY486.png "title=" kd{}q $K ' @KM (@ (3M) uqayah.png "alt=" Wkiol1kwzw2syv4vaaaan6ls3py486.png "/>

You can also change it, such as statistics StudentID, and the number of occurrences of each value, you can do the following:

1: Modify the Map function in the following example:

var map = function () {Emit (This.studentid,{count:1});};

2:reduce function does not need to change

3: Re-execution

Db.runcommand ({"MapReduce": "Scores", "map": Map, "reduce": Reduce, "out": "Mrout"});

4: View Final results

Db.mrout.find ();

650) this.width=650; "src=" Https://s3.51cto.com/wyfs02/M02/97/A4/wKioL1kwzrqgOY29AAA2gVwzOA0595.png "title=" 7% H846n{4cqx3ilb19y (ik7.png "alt=" Wkiol1kwzrqgoy29aaa2gvwzoa0595.png "/>

4.2. More MapReduce Optional keys

1:finalize:function: You can send the results of reduce to finalize, which is the last step in the entire process

2:keeptemp:boolean: Whether to save the temporary result collection when the connection is closed

3:query:document: Filter the document before sending it to map

4:sort:document: Sort documents before sending to map

5:limit:integer: Maximum number of documents destined for the map function

6:scope:document: Variables that can be used in JavaScript

7:verbose:boolean: Whether to log verbose server logs


Example:

var query = {"StudentID": {"$lt": "s2"}}var sort = {"StudentID": 1};var finalize = function (Key,value) {return {"MyKey": Ke Y, "MyV": Value};}; var Mr =db.runcommand ({"MapReduce": "Scores", "map": Map, "reduce": Reduce, "out": "Mrout", "Query": Query, "sort": Sort, " Limit ": 2," Finalize ": Finalize});

650) this.width=650; "src=" Https://s1.51cto.com/wyfs02/M00/97/A5/wKioL1kw0R2wOrL8AAAKXqTSdok429.png "title=" _~{ Mwg{t[8}ey7 (}plgk8@j.png "alt=" Wkiol1kw0r2worl8aaakxqtsdok429.png "/>

V. Aggregation Command Group

Used to group collections, group them, and then aggregate the documents within each group.

For example, to group StudentID, to find the highest score for each student, the following steps can be performed:

Db.runcommand ({"group": {"ns": "Scores", "key": {"StudentID": 1}, "initial": {"score": 0}, "$reduce": function (doc,p        Rev) {if (Doc.score > Prev.score) {prev.score = Doc.score; }    }}});

NS: Specifies the collection to be grouped

Key: Specify the grouped keys

Initial: When the reduce function of each group is called, it is called once at the beginning to initialize

$reduce: Executed on each document in each group, the system will automatically pass in two parameters, Doc is the document currently being processed, Prev is the result document of the previous execution of this group


You can also add a condition to the group by adding condition, such as:

Db.runcommand ({"group": {"ns": "Scores", "key": {"StudentID": 1}, "initial": {"score": 0}, "$reduce": function (doc,p        Rev) {if (Doc.score > Prev.score) {prev.score = Doc.score; }}, "condition": {"StudentID": {$lt: "S2"}}});


You can also use finalizer to make final processing of the results of reduce, such as requiring each student's average score, you can first group by StudentID, to find a total score, and then in the finalizer, the average score:

Db.runcommand ({"group": {"ns": "Scores", "key": {"StudentID": 1}, "initial": {"Total": 0}, "$reduce": function (doc,p    Rev) {prev.total + = Doc.score;    }, "condition": {"StudentID": {"$lt": "S2"}}, "Finalize": function (prev) {prev.avg = PREV.TOTAL/3; }}});

Note: Finalize is called only once for each set of results to be returned to the user, that is, each set of results is called only once


For groups of keys more complex, you can also use a function as a key, such as the key is not differentiated by size, it can be defined as follows:

Db.runcommand ({"group": {"ns": "Scores", $keyf: function (DOC) {return {studentId:doc.studentId.toLowerCase ()};    }, "initial": {"Total": 0}, "$reduce": function (Doc,prev) {prev.total + = Doc.score;  }, "condition": {"$or": [{"StudentID": {"$lt": "S2"}},{"StudentID": "S0"}]}, "Finalize": function (prev) {prev.avg =    PREV.TOTAL/3; }}});

Note: To use $KEYF to define functions as keys, it is also important to return the format of the object


This article is from "I Love Big gold" blog, please be sure to keep this source http://1754966750.blog.51cto.com/7455444/1931674

MongoDB (4): Aggregation framework

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.