New in MongoDB 2.2.
The aggregation pipeline is a data aggregation framework modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms them into aggregated results.
The aggregation pipeline provides an alternative to the map-reduce method and is the preferred solution for many aggregation tasks, since the complexity of map-reduce is often something you want to avoid.
The following is an aggregation pipeline operation with two stages: $match and $group.
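A minimal sketch of such a two-stage operation in the mongo shell, assuming a hypothetical orders collection with status, cust_id, and amount fields:

```javascript
// Hypothetical example: $match filters the documents, then $group
// accumulates a total per customer.
db.orders.aggregate([
  // First stage: keep only orders with status "A".
  { $match: { status: "A" } },
  // Second stage: group the matching orders by cust_id and sum the amounts.
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
])
```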
The aggregation pipeline imposes limits on value types and on result size. A brief overview follows.
Aggregation operations run with the aggregate command have the following restrictions:
The aggregation pipeline cannot operate on values of the following types: Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope.
(The restriction on binary data was lifted in MongoDB 2.4; in MongoDB 2.2 the pipeline could not operate on binary data.)
If the aggregate command returns a single document that contains the complete result set, the command produces an error when the result set exceeds the BSON document size limit, which is currently 16 MB. To manage result sets that exceed this limit, the aggregate command can return results of any size when it returns a cursor or stores the results in a collection.
(Changed in MongoDB 2.6: when the aggregate command returns a cursor or stores the results in a collection, the result is not subject to this size limit; db.collection.aggregate() returns a cursor and can return result sets of any size.)
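A sketch of both ways around the 16 MB single-document limit in the mongo shell (MongoDB 2.6+), again assuming a hypothetical orders collection:

```javascript
// db.collection.aggregate() returns a cursor, so the result set can be
// of any size:
var cursor = db.orders.aggregate([
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
]);

// Alternatively, a $out stage stores the results in a collection
// instead of returning them:
db.orders.aggregate([
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
  { $out: "order_totals" }
])
```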
Changed in MongoDB 2.6:
Each pipeline stage has a 100 MB limit on RAM usage. If a stage exceeds this limit, MongoDB produces an error. To allow handling of large datasets, use the allowDiskUse option to let aggregation pipeline stages write data to temporary files.
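The option is passed alongside the pipeline; a sketch in the mongo shell on the same assumed orders collection:

```javascript
db.orders.aggregate(
  [
    { $sort: { amount: -1 } },
    { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
  ],
  // Let stages that exceed the 100 MB RAM limit spill to temporary files.
  { allowDiskUse: true }
)
```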
Pipeline
A pipeline, as the name implies, carries documents from a collection through an aggregation pipeline that can transform them as they pass through. The concept is similar to the pipes (pipe) familiar from Unix shell commands (such as bash).
The MongoDB aggregation pipeline starts with the documents of a collection and streams the documents from one pipeline operator to the next. Each operator transforms the documents as they pass through the pipeline. Pipeline operators do not need to produce one output document for every input document: operators can generate new documents or filter out documents. An operator can appear more than once in a pipeline.
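The streaming behavior described above can be illustrated with a small in-memory simulation in plain Node.js (this is only a conceptual sketch, not MongoDB's implementation): each stage takes an array of documents and returns a new array, and stages are applied in order, like a Unix pipe.

```javascript
const docs = [
  { item: "pen",    qty: 5,  status: "A" },
  { item: "pencil", qty: 10, status: "D" },
  { item: "paper",  qty: 20, status: "A" },
];

// Stage 1: behaves like $match — filters documents, may drop some.
const match = (input) => input.filter((d) => d.status === "A");

// Stage 2: behaves like $group with a $sum accumulator — it generates
// new documents and may emit fewer documents than it receives.
const group = (input) => {
  const totals = {};
  for (const d of input) {
    totals[d.status] = (totals[d.status] || 0) + d.qty;
  }
  return Object.entries(totals).map(([k, v]) => ({ _id: k, total: v }));
};

// Documents flow through the stages in order.
const pipeline = [match, group];
const result = pipeline.reduce((stream, stage) => stage(stream), docs);
console.log(result); // [ { _id: 'A', total: 25 } ]
```

Note that the second stage emits one document built from three inputs, which is exactly why operators need not produce an output document per input document.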
Pipeline expression
Each pipeline operator accepts a pipeline expression as its operand. A pipeline expression specifies the transformation to apply to the input documents. Expressions have a document structure and contain fields, values, and operators.
Pipeline expressions can only operate on the current document in the pipeline and cannot reference data from other documents: expressions provide in-memory document transformations.
In general, expressions are stateless, with one exception during the aggregation process: accumulator expressions. Accumulators, used with the $group pipeline operator, maintain their state (for example, totals, maximums, minimums, and related data) as documents progress through the pipeline.
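A sketch of accumulators in a $group stage in the mongo shell, on the same assumed orders collection:

```javascript
// $sum, $max, and $min maintain state across documents as they pass
// through the pipeline:
db.orders.aggregate([
  { $group: {
      _id:   "$cust_id",
      total: { $sum: "$amount" },  // running total
      max:   { $max: "$amount" },  // maximum seen so far
      min:   { $min: "$amount" }   // minimum seen so far
  } }
])
```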
Aggregation Pipeline Behavior
In MongoDB, the aggregate command operates on a single collection and logically passes the entire collection into the aggregation pipeline. To optimize the operation, use the following strategies to avoid scanning the entire collection where possible.
- Pipeline operators and indexes
The $match and $sort pipeline operators can take advantage of an index when they appear at the beginning of the pipeline.
(New in MongoDB 2.4: the $geoNear pipeline operator can take advantage of a geospatial index. When using $geoNear, it must appear as the first stage of the aggregation pipeline.)
Even when the pipeline uses an index, the aggregation operation still needs to access the actual documents; that is, an index cannot fully cover an aggregation pipeline.
(In versions prior to MongoDB 2.6, an index could cover the pipeline in a few very specific cases.)
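A sketch of a pipeline positioned to use an index, in the mongo shell on the assumed orders collection:

```javascript
// If "status" is indexed, a $match at the start of the pipeline can use
// the index to select documents:
db.orders.createIndex({ status: 1 });
db.orders.aggregate([
  { $match: { status: "A" } },  // first stage: can use the index
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
])
```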
Early filtering
If your aggregation operation requires only a subset of the data in a collection, use the $match, $limit, and $skip stages to restrict the documents entering the pipeline. When placed at the beginning of a pipeline, the $match operator uses a suitable index to scan only the matching documents in the collection.
Placing a $match stage followed by a $sort stage at the beginning of the pipeline is logically equivalent to a single query with a sort and can use an index. Where possible, place $match at the start of the pipeline.
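A sketch of early filtering in the mongo shell, again on the assumed orders collection:

```javascript
// $match, $sort, and $limit at the start of the pipeline restrict the
// documents that reach later stages:
db.orders.aggregate([
  { $match: { status: "A" } },  // filter first, like a query predicate
  { $sort:  { amount: -1 } },   // $match followed by $sort can use an index
  { $limit: 5 },                // only 5 documents reach later stages
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
])
```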