MongoDB: Aggregation Pipeline


New in MongoDB 2.2.

The aggregation pipeline is a data aggregation framework built on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms them into aggregated results.

The aggregation pipeline provides an alternative to map-reduce and is the preferred solution for many aggregation tasks, since the complexity of map-reduce is often unwanted.



The following is an aggregation pipeline operation with two stages: $match and $group.
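
A minimal sketch in the mongo shell, assuming a hypothetical orders collection with status, cust_id, and amount fields:

    db.orders.aggregate([
        // First stage: $match filters the documents by status
        { $match: { status: "A" } },
        // Second stage: $group groups the remaining documents by cust_id
        // and sums the amount field for each group
        { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
    ])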

The aggregation pipeline has a number of restrictions on value types and result size. The following is a brief introduction.

Aggregation operations are subject to the following restrictions when using the aggregate command:

    • Type restrictions

The aggregation pipeline cannot operate on values of the following types: Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope.

(The restriction on binary data was lifted in MongoDB 2.4; in MongoDB 2.2 the pipeline could not operate on binary data.)

    • Result size limit

If the aggregate command returns the complete result set in a single document, the command produces an error when the result set exceeds the BSON document size limit, which is currently 16 MB. To manage a result set that exceeds this limit, have the aggregate command return a cursor or store the results in a collection; in either case the result set can be of any size.

(Changed in MongoDB 2.6: the aggregate command can return a cursor or store the results in a collection, and db.collection.aggregate() returns a cursor, so result sets of any size can be returned.)
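
A minimal sketch of both options, again assuming the hypothetical orders collection ($out is available since MongoDB 2.6):

    // db.collection.aggregate() returns a cursor, so the full result set
    // is never held in a single 16 MB document
    var cursor = db.orders.aggregate([
        { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
    ])
    cursor.forEach(function (doc) { printjson(doc) })

    // Alternatively, store the results in a collection with $out
    db.orders.aggregate([
        { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
        { $out: "order_totals" }
    ])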

    • Memory limit

Changed in MongoDB 2.6.

Each pipeline stage has a limit of 100 MB of RAM. If a stage exceeds this limit, MongoDB produces an error. To allow the handling of large data sets, use the allowDiskUse option to let aggregation pipeline stages write data to temporary files.
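
A minimal sketch, assuming the hypothetical orders collection is large enough that the sort would exceed the per-stage limit:

    db.orders.aggregate(
        [
            { $sort: { amount: -1 } },  // a large sort may exceed the 100 MB limit
            { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
        ],
        // allow stages to spill to temporary files on disk
        { allowDiskUse: true }
    )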


Pipeline


A pipeline, as the name implies, takes the documents of a collection on a journey through a sequence of stages that can transform them as they pass through. The concept is similar to pipes in Unix shells (such as bash).

The MongoDB aggregation pipeline starts with the documents of a collection and streams them from one pipeline operator to the next. Each operator transforms the documents as they pass through the pipeline. A pipeline operator need not produce one output document for every input document: operators can generate new documents or filter documents out. The same operator can appear more than once in a pipeline.
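
A minimal sketch, assuming a hypothetical books collection with a tags array field; $unwind generates new documents while $match filters documents out:

    db.books.aggregate([
        // $unwind emits one output document per element of the tags array,
        // so one input document can become many output documents
        { $unwind: "$tags" },
        // $match filters documents, so it can emit fewer documents than it receives
        { $match: { tags: { $ne: "draft" } } },
        // $group counts the surviving documents per tag
        { $group: { _id: "$tags", count: { $sum: 1 } } }
    ])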


Pipeline expression

Each pipeline operator accepts a pipeline expression as its operand. A pipeline expression specifies the transformation to apply to the input documents. Expressions have a document structure and can contain fields, values, and operators.

A pipeline expression can only operate on the current document in the pipeline and cannot reference data in other documents: expressions provide in-memory document transformations.
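
A minimal sketch, assuming hypothetical price and quantity fields on the orders documents; the expression combines field paths with the $multiply operator:

    db.orders.aggregate([
        {
            $project: {
                item: 1,
                // an expression: field paths plus the $multiply operator,
                // evaluated against the current document only
                lineTotal: { $multiply: ["$price", "$quantity"] }
            }
        }
    ])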

In general, expressions are stateless, with one exception in the aggregation process: accumulator expressions. Accumulators, used with the $group pipeline operator, maintain their state (for example totals, maximums, minimums, and related data) as documents progress through the pipeline.
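
A minimal sketch of accumulators in $group, using the same hypothetical fields:

    db.orders.aggregate([
        {
            $group: {
                _id: "$cust_id",
                // accumulators maintain running state as documents pass through
                total: { $sum: "$amount" },     // running total
                largest: { $max: "$amount" },   // running maximum
                smallest: { $min: "$amount" }   // running minimum
            }
        }
    ])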


Aggregation Pipeline Behavior

In MongoDB, the aggregate command operates on a single collection, logically passing the entire collection into the aggregation pipeline. To optimize the operation, use the following strategies to avoid scanning the entire collection where possible.

    1. Pipeline operators and indexes

The $match and $sort pipeline operators can take advantage of an index if they appear at the beginning of the pipeline.

(New in MongoDB 2.4: the $geoNear pipeline operator can take advantage of geospatial indexes. When using $geoNear, it must appear as the first stage of the aggregation pipeline.)
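
A minimal sketch, assuming a hypothetical places collection with a 2dsphere index on its location field:

    // assumes an index such as: db.places.ensureIndex({ location: "2dsphere" })
    db.places.aggregate([
        {
            // $geoNear must be the first stage so it can use the geospatial index
            $geoNear: {
                near: { type: "Point", coordinates: [-73.99, 40.73] },
                distanceField: "dist",  // field to hold the computed distance
                spherical: true
            }
        },
        { $limit: 5 }
    ])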

Even when the pipeline uses an index, the aggregation operation still accesses the actual documents; in other words, an index can never fully cover an aggregation pipeline.

(In versions prior to MongoDB 2.6, an index could cover the pipeline in a few very select cases.)


    2. Early filtering

If your aggregation operation requires only a subset of the data in a collection, use the $match, $limit, and $skip stages to restrict the documents that enter the pipeline. When placed at the beginning of a pipeline, the $match operator uses a suitable index to scan only the matching documents in the collection.

Placing a $match stage followed by a $sort stage at the beginning of the pipeline is logically equivalent to a single query with a sort, and it can use an index. When possible, place $match operators at the beginning of the pipeline, as in the sketch below.
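
A minimal sketch, assuming the hypothetical orders collection has an index covering status and amount:

    // assumes an index such as: db.orders.ensureIndex({ status: 1, amount: -1 })
    db.orders.aggregate([
        { $match: { status: "A" } },   // filter first, so the index can be used
        { $sort: { amount: -1 } },     // $match + $sort at the start acts like an indexed query with a sort
        { $limit: 100 },               // cap the number of documents entering later stages
        { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
    ])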

