Data aggregation in MongoDB: the aggregation pipeline (aggregate)

Source: Internet
Author: User
Tags: mongodb, aggregate

In the two previous articles, "Basic aggregation functions for data aggregation in MongoDB: count, distinct, group" and "MapReduce for data aggregation in MongoDB", we covered two ways to aggregate data. This article covers a third way to aggregate data in MongoDB: the aggregation pipeline, aggregate.

To meet users' needs for data statistics, MongoDB introduced the aggregation framework in version 2.2. It processes data much like a pipeline: each document passes through a sequence of nodes (stages), each with its own role such as grouping or filtering, and whatever leaves the last node is the output. A pipeline has two basic functions: (1) filtering documents, so that only those matching the criteria pass through, and (2) transforming documents, which changes the structure in which they are output.
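To make the pipeline idea concrete, here is a minimal sketch in plain JavaScript. It is an in-memory illustration only, not how MongoDB implements aggregation; the helper names runPipeline, matchStage and projectStage, and the sample documents, are made up for this example.

```javascript
// Illustrative only: a pipeline modeled as an ordered list of functions,
// each taking an array of documents and returning a new array.
// runPipeline, matchStage and projectStage are hypothetical helper names.
function runPipeline(docs, stages) {
  // Feed the output of each stage into the next one, in order.
  return stages.reduce((current, stage) => stage(current), docs);
}

// Filtering: keep only the documents that satisfy a predicate.
const matchStage = (predicate) => (docs) => docs.filter(predicate);

// Transforming: reshape every document, keeping only the listed fields.
const projectStage = (fields) => (docs) =>
  docs.map((doc) => Object.fromEntries(fields.map((f) => [f, doc[f]])));

// Made-up sample data.
const sampleDocs = [
  { title: "A", author: "ann", score: 95 },
  { title: "B", author: "bob", score: 80 },
];

const pipelineResult = runPipeline(sampleDocs, [
  matchStage((doc) => doc.score > 70 && doc.score <= 90),
  projectStage(["title", "author"]),
]);
console.log(pipelineResult); // [ { title: 'B', author: 'bob' } ]
```

The two stages correspond to the two basic functions described above: the first filters, the second changes the output structure.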

The aggregation pipeline is invoked with db.collection.aggregate().

The nodes of a pipeline are built from the following pipeline operators:

$project: Reshapes each document (renames, adds, or removes fields); it can also create computed values and nested documents.
$match: Filters the documents, passing on only those that match the given criteria.
$limit: Limits the number of documents passed on by the aggregation pipeline.
$skip: Skips a specified number of documents in the aggregation pipeline and passes on the rest.
$unwind: Splits an array field into multiple documents, one per array element, each containing a single value from the array.
$group: Groups the documents in the collection by a key, typically to compute statistics for each group.
$sort: Sorts the documents before they are output.
$geoNear: Outputs documents ordered by their proximity to a geographic location.
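The effect of $unwind followed by $group is often the least intuitive part of the list above, so here is a simplified in-memory imitation in plain JavaScript. The functions unwind and groupCount are hypothetical helpers written for this sketch, not MongoDB APIs, and the sample posts are made up.

```javascript
// Simplified imitation of $unwind: one output document per array element.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] || []).map((value) => ({ ...doc, [field]: value }))
  );
}

// Simplified imitation of { $group: { _id: "$<field>", count: { $sum: 1 } } }.
function groupCount(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  return [...counts].map(([key, count]) => ({ _id: key, count }));
}

// Made-up sample data.
const posts = [
  { title: "A", tags: ["mongodb", "aggregate"] },
  { title: "B", tags: ["mongodb"] },
];

const unwound = unwind(posts, "tags");
// unwound holds three documents: (A, mongodb), (A, aggregate), (B, mongodb)
const tagCounts = groupCount(unwound, "tags");
console.log(tagCounts);
// [ { _id: 'mongodb', count: 2 }, { _id: 'aggregate', count: 1 } ]
```

In this sketch the groups come out in first-seen order; MongoDB itself does not guarantee any particular order for $group output unless a $sort follows it.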

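Here are two simple examples:

// Collection name assumed to match the second example.
db.articles.aggregate( [
    // Keep only the title and author fields (_id is included by default).
    { $project : {
        title : 1 ,
        author : 1
    } }
] );

db.articles.aggregate( [
    // Keep articles whose score is greater than 70 and at most 90.
    { $match : { score : { $gt : 70, $lte : 90 } } },
    // Count the matching articles per user; "$user" refers to the user field.
    { $group : { _id : "$user", count : { $sum : 1 } } }
] );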
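As a slightly longer sketch, several operators can be chained in one pipeline. The version below assumes the same articles collection with user and score fields as in the example above; the variable name topUsersPipeline and the limit of 5 are arbitrary choices for illustration.

```javascript
// Assumes an "articles" collection whose documents have "user" and "score"
// fields, as in the example above; the limit of 5 is an arbitrary choice.
const topUsersPipeline = [
  { $match: { score: { $gt: 70, $lte: 90 } } },     // 1. filter first
  { $group: { _id: "$user", count: { $sum: 1 } } }, // 2. count per user
  { $sort: { count: -1 } },                         // 3. largest counts first
  { $limit: 5 },                                    // 4. keep the top 5
];

// In the mongo shell, this would be run as:
// db.articles.aggregate(topUsersPipeline);
```

Putting $match first means the later stages only see the documents that passed the filter, so they have less work to do.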

Points to note when using the aggregation pipeline (these describe the MongoDB 2.2 and 2.4 releases covered here):

(1) The nodes of a pipeline are executed in sequence, so their order matters.
(2) $group is currently processed in memory, so it is not suitable for grouping a very large number of documents.
(3) When using $unwind to split an array field, be careful not to forget the $ symbol: in {$unwind: "$tags"}, for example, the tags field is preceded by a $ sign.
(4) MongoDB 2.4 optimizes memory use: when $sort comes immediately before $limit, $sort keeps only the first $limit documents in memory rather than the whole sorted set, which saves memory.
(5) $sort is performed in memory; if it uses more than 10% of physical memory, the operation produces an error.
(6) The output of the pipeline cannot be larger than 16 MB; a larger result produces an error.
(7) If a pipeline operator uses more than 10% of the system's memory during execution, it produces an error.
(8) The aggregation pipeline offers good performance and a consistent interface, and it is relatively simple to use. It suits simple, fixed aggregation operations; for complex aggregation tasks over large data sets, use MapReduce instead.
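Note (4) can be pictured with a small plain-JavaScript sketch of a bounded sort: only the best `limit` documents seen so far are kept, so memory grows with the limit rather than with the input. This illustrates the idea only and is not necessarily how MongoDB implements it; topN is a hypothetical helper and the scores are made up.

```javascript
// Illustrative only: keep the `limit` best documents seen so far, so memory
// use is bounded by `limit` rather than by the size of the input.
// topN is a hypothetical helper, not a MongoDB API.
function topN(docs, compare, limit) {
  const kept = [];
  for (const doc of docs) {
    // Find the insertion point that keeps `kept` sorted.
    let i = kept.length;
    while (i > 0 && compare(doc, kept[i - 1]) < 0) i--;
    if (i < limit) {
      kept.splice(i, 0, doc);
      if (kept.length > limit) kept.pop(); // discard the current worst
    }
  }
  return kept;
}

// Made-up sample data.
const scores = [{ s: 40 }, { s: 95 }, { s: 70 }, { s: 88 }, { s: 12 }];

// Similar in spirit to [ { $sort: { s: -1 } }, { $limit: 2 } ].
const best = topN(scores, (a, b) => b.s - a.s, 2);
console.log(best); // [ { s: 95 }, { s: 88 } ]
```

At no point does `kept` hold more than limit + 1 documents, which is the saving the note describes.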

This concludes our brief introduction to data aggregation operations in MongoDB. If you want to learn more, I think the official website is the best textbook.
