New in MongoDB 2.2.
The aggregation pipeline is a data aggregation framework modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms them into aggregated results.
The aggregation pipeline provides an alternative to the map-reduce method and is the preferred solution for many aggregation tasks, since the complexity of map-reduce is often something you want to avoid.
The following is an aggregation pipeline operation with two stages: $match and $group.
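A minimal sketch of such a two-stage operation in the mongo shell, assuming a hypothetical orders collection with status, cust_id, and amount fields:

```javascript
// Hypothetical example: $match filters the documents, then $group
// accumulates a total per customer.
db.orders.aggregate([
  // First stage: keep only orders with status "A".
  { $match: { status: "A" } },
  // Second stage: group the matching orders by cust_id and sum the amounts.
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
])
```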
The aggregation pipeline imposes limits on value types and on result size. A brief overview follows.
Aggregation operations run with the aggregate command have the following restrictions:
The aggregation pipeline cannot operate on values of the following types: Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope.
(The restriction on binary data was lifted in MongoDB 2.4; in MongoDB 2.2 the pipeline could not operate on binary data.)
If the aggregate command returns a single document that contains the complete result set, the command produces an error when the result set exceeds the BSON document size limit, which is currently 16 MB. To manage result sets that exceed this limit, the aggregate command can return results of any size when it returns a cursor or stores the results in a collection.
(Changed in MongoDB 2.6: when the aggregate command returns a cursor or stores the results in a collection, the result is not subject to this size limit; db.collection.aggregate() returns a cursor and can return result sets of any size.)
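A sketch of both ways around the 16 MB single-document limit in the mongo shell (MongoDB 2.6+), again assuming a hypothetical orders collection:

```javascript
// db.collection.aggregate() returns a cursor, so the result set can be
// of any size:
var cursor = db.orders.aggregate([
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
]);

// Alternatively, a $out stage stores the results in a collection
// instead of returning them:
db.orders.aggregate([
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
  { $out: "order_totals" }
])
```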
Changed in MongoDB 2.6:
Each pipeline stage has a 100 MB limit on RAM usage. If a stage exceeds this limit, MongoDB produces an error. To allow handling of large datasets, use the allowDiskUse option to let aggregation pipeline stages write data to temporary files.
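The option is passed alongside the pipeline; a sketch in the mongo shell on the same assumed orders collection:

```javascript
db.orders.aggregate(
  [
    { $sort: { amount: -1 } },
    { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
  ],
  // Let stages that exceed the 100 MB RAM limit spill to temporary files.
  { allowDiskUse: true }
)
```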
Pipeline
A pipeline, as the name implies, carries documents from a collection through an aggregation pipeline that can transform them as they pass through. The concept is similar to the pipes (pipe) familiar from Unix shell commands (such as bash).
The MongoDB aggregation pipeline starts with the documents of a collection and streams the documents from one pipeline operator to the next. Each operator transforms the documents as they pass through the pipeline. Pipeline operators do not need to produce one output document for every input document: operators can generate new documents or filter out documents. An operator can appear more than once in a pipeline.
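The streaming behavior described above can be illustrated with a small in-memory simulation in plain Node.js (this is only a conceptual sketch, not MongoDB's implementation): each stage takes an array of documents and returns a new array, and stages are applied in order, like a Unix pipe.

```javascript
const docs = [
  { item: "pen",    qty: 5,  status: "A" },
  { item: "pencil", qty: 10, status: "D" },
  { item: "paper",  qty: 20, status: "A" },
];

// Stage 1: behaves like $match — filters documents, may drop some.
const match = (input) => input.filter((d) => d.status === "A");

// Stage 2: behaves like $group with a $sum accumulator — it generates
// new documents and may emit fewer documents than it receives.
const group = (input) => {
  const totals = {};
  for (const d of input) {
    totals[d.status] = (totals[d.status] || 0) + d.qty;
  }
  return Object.entries(totals).map(([k, v]) => ({ _id: k, total: v }));
};

// Documents flow through the stages in order.
const pipeline = [match, group];
const result = pipeline.reduce((stream, stage) => stage(stream), docs);
console.log(result); // [ { _id: 'A', total: 25 } ]
```

Note that the second stage emits one document built from three inputs, which is exactly why operators need not produce an output document per input document.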
Pipeline expression
Each pipeline operator accepts a pipeline expression as its operand. A pipeline expression specifies the transformation to apply to the input documents. Expressions have a document structure and contain fields, values, and operators.
Pipeline expressions can only operate on the current document in the pipeline and cannot reference data from other documents: expressions provide in-memory document transformations.
In general, expressions are stateless, with one exception during the aggregation process: accumulator expressions. Accumulators, used with the $group pipeline operator, maintain their state (for example, totals, maximums, minimums, and related data) as documents progress through the pipeline.
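A sketch of accumulators in a $group stage in the mongo shell, on the same assumed orders collection:

```javascript
// $sum, $max, and $min maintain state across documents as they pass
// through the pipeline:
db.orders.aggregate([
  { $group: {
      _id:   "$cust_id",
      total: { $sum: "$amount" },  // running total
      max:   { $max: "$amount" },  // maximum seen so far
      min:   { $min: "$amount" }   // minimum seen so far
  } }
])
```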
Aggregation Pipeline Behavior
In MongoDB, the aggregate command operates on a single collection and logically passes the entire collection into the aggregation pipeline. To optimize the operation, use the following strategies to avoid scanning the entire collection where possible.
- Pipeline operators and indexes
The $match and $sort pipeline operators can take advantage of an index when they appear at the beginning of the pipeline.
(New in MongoDB 2.4: the $geoNear pipeline operator can take advantage of a geospatial index. When using $geoNear, it must appear as the first stage of the aggregation pipeline.)
Even when the pipeline uses an index, the aggregation operation still needs to access the actual documents; that is, an index cannot fully cover an aggregation pipeline.
(In versions prior to MongoDB 2.6, an index could cover the pipeline in a few very specific cases.)
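A sketch of a pipeline positioned to use an index, in the mongo shell on the assumed orders collection:

```javascript
// If "status" is indexed, a $match at the start of the pipeline can use
// the index to select documents:
db.orders.createIndex({ status: 1 });
db.orders.aggregate([
  { $match: { status: "A" } },  // first stage: can use the index
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
])
```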
Early filtering
If your aggregation operation requires only a subset of the data in a collection, use the $match, $limit, and $skip stages to restrict the documents entering the pipeline. When placed at the beginning of a pipeline, the $match operator uses a suitable index to scan only the matching documents in the collection.
Placing a $match stage followed by a $sort stage at the beginning of the pipeline is logically equivalent to a single query with a sort and can use an index. Where possible, place $match at the start of the pipeline.
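A sketch of early filtering in the mongo shell, again on the assumed orders collection:

```javascript
// $match, $sort, and $limit at the start of the pipeline restrict the
// documents that reach later stages:
db.orders.aggregate([
  { $match: { status: "A" } },  // filter first, like a query predicate
  { $sort:  { amount: -1 } },   // $match followed by $sort can use an index
  { $limit: 5 },                // only 5 documents reach later stages
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
])
```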