MongoDB: Aggregation Pipeline

New in MongoDB 2.2.

The aggregation pipeline is a framework for data aggregation modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms them into aggregated results.

The aggregation pipeline provides an alternative to map-reduce and is the preferred solution for many aggregation tasks, where the complexity of map-reduce may be unwarranted.



A typical annotated aggregation pipeline operation has two stages: $match and $group.
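In the mongo shell, such a two-stage pipeline might look as follows. This is a sketch only: the orders collection and its cust_id, status, and amount fields are assumptions for illustration, not part of the original text.

```javascript
// Hypothetical orders collection with documents like
// { cust_id: "A123", status: "A", amount: 500 }
db.orders.aggregate([
  // Stage 1: $match filters the documents entering the pipeline.
  { $match: { status: "A" } },
  // Stage 2: $group accumulates a total amount per customer.
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
])
```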

The aggregation pipeline has several restrictions on value types and result size. A brief overview:

Restrictions that apply to aggregation operations when using the aggregate command:

  • Type restrictions

The aggregation pipeline cannot operate on values of the following types: Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope.

(The restriction on the Binary type was removed in MongoDB 2.4. In MongoDB 2.2, the pipeline could not operate on Binary data.)

  • Result size limit

If a single document returned by the aggregate command contains the complete result set, the command produces an error when the result set exceeds the BSON document size limit, which is currently 16 MB. To manage result sets that exceed this limit, the aggregate command can return result sets of any size when it returns a cursor or stores the results in a collection.

(Changed in MongoDB 2.6: when the aggregate command returns a cursor or stores the results in a collection, it is not subject to this size limit. db.collection.aggregate() returns a cursor and can return result sets of any size.)
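As a sketch in the mongo shell (MongoDB 2.6+; the logs collection, its fields, and the output collection name are assumptions), the two ways of working with large result sets look like this:

```javascript
// db.collection.aggregate() returns a cursor; iterating the cursor
// streams a result set of any size to the client.
var cursor = db.logs.aggregate([ { $match: { level: "error" } } ]);
cursor.forEach(function (doc) { printjson(doc); });

// Alternatively, $out (new in 2.6) writes the results to a collection
// instead of returning them, also avoiding the 16 MB limit.
db.logs.aggregate([
  { $match: { level: "error" } },
  { $out: "error_logs" }
])
```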

  • Memory limit

Changed in MongoDB 2.6.

Each pipeline stage has a limit of 100 MB of RAM. If a stage exceeds this limit, MongoDB produces an error. To allow the processing of large data sets, use the allowDiskUse option to let aggregation pipeline stages write data to temporary files.
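In the mongo shell, the option is passed in the options document of db.collection.aggregate() (a sketch; the collection and field names are assumptions):

```javascript
db.logs.aggregate(
  [
    { $sort:  { timestamp: 1 } },                     // large sorts commonly hit the RAM limit
    { $group: { _id: "$host", hits: { $sum: 1 } } }
  ],
  // Let stages spill to temporary files instead of erroring
  // when a stage exceeds the in-memory limit.
  { allowDiskUse: true }
)
```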


Pipeline


A pipeline, as its name implies, is a series of stages through which the documents of a collection travel, and each stage can transform those documents. If you are familiar with pipes in Unix shells (such as bash), the concept is similar.

The MongoDB aggregation pipeline starts with the documents of a collection and streams the documents from one pipeline operator to the next for processing. Each operator transforms the documents as they pass through the pipeline. Pipeline operators do not need to produce one output document for every input document: operators can generate new documents or filter out documents. Pipeline operators can appear multiple times in one pipeline.
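As a rough analogy only (plain JavaScript, not the mongo shell; the stage names mirror $match and $group, but the helper functions are assumptions made for illustration), the streaming behavior can be sketched as function composition:

```javascript
// A toy model of the aggregation pipeline in plain JavaScript:
// each stage maps an array of documents to a new array, and running
// the pipeline applies the stages in order, much like piping
// commands together in a Unix shell.

// $match analogue: keep only documents satisfying a predicate.
const match = (pred) => (docs) => docs.filter(pred);

// $group analogue: accumulate a running total per group key.
const groupSum = (keyOf, valueOf) => (docs) => {
  const totals = new Map();
  for (const d of docs) {
    const k = keyOf(d);
    totals.set(k, (totals.get(k) || 0) + valueOf(d));
  }
  return [...totals].map(([_id, total]) => ({ _id, total }));
};

// Stream the documents through each stage in turn.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const orders = [
  { cust_id: "A", status: "A", amount: 50 },
  { cust_id: "B", status: "D", amount: 20 },
  { cust_id: "A", status: "A", amount: 25 },
];

const result = runPipeline(orders, [
  match((d) => d.status === "A"),               // 2 documents remain
  groupSum((d) => d.cust_id, (d) => d.amount),  // one group: A -> 75
]);
// result: [{ _id: "A", total: 75 }]
```

Note that the $match stage dropped a document and the $group stage emitted fewer documents than it received, illustrating that operators need not produce one output document per input document.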


Pipeline Expressions

Each pipeline operator accepts a pipeline expression as its operand. A pipeline expression specifies the transformation to apply to the input documents. Expressions have a document structure and can contain fields, values, and operators.

Pipeline expressions can only operate on the current document in the pipeline and cannot reference data in other documents: expression operations provide in-memory document transformation.

In general, expressions are stateless and are evaluated only during the aggregation process, with one exception: accumulator expressions. Accumulators, used with the $group pipeline operator, maintain their state (such as totals, maximums, minimums, and related data) as documents progress through the pipeline.
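For example, a single $group stage can maintain several accumulators at once (a mongo shell sketch; the sales collection and its fields are assumptions):

```javascript
db.sales.aggregate([
  { $group: {
      _id:   "$item",
      total: { $sum: "$amount" },   // running total per item
      max:   { $max: "$amount" },   // running maximum
      min:   { $min: "$amount" },   // running minimum
      count: { $sum: 1 }            // number of documents per group
  } }
])
```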


Aggregation Pipeline Behavior

In MongoDB, the aggregate command operates on a single collection and logically passes the entire collection into the aggregation pipeline. To optimize the operation, use the following strategies to avoid scanning the entire collection whenever possible.

The $match and $sort pipeline operators can take advantage of an index when they occur at the beginning of the pipeline.

(New in version 2.4: the $geoNear pipeline operator can take advantage of a geospatial index. When using $geoNear, it must appear as the first stage of the aggregation pipeline.)

Even when the pipeline uses an index, the aggregation operation still needs access to the actual documents; that is, indexes cannot fully cover an aggregation pipeline.

(In versions earlier than 2.6, indexes could cover pipelines only in very limited circumstances.)


Early Filtering

If your aggregation operation requires only a subset of the data in a collection, use the $match, $limit, and $skip stages to restrict the documents that enter the pipeline. When placed at the beginning of a pipeline, a $match operation uses a suitable index to scan only the matching documents in the collection.

Placing a $match stage followed by a $sort stage at the start of the pipeline is logically equivalent to a single query with a sort, and it can use an index. Where possible, place $match operators at the beginning of the pipeline.
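Putting this advice together (a mongo shell sketch; the orders collection and an index on its status field are assumptions):

```javascript
// With an index on { status: 1 }, the leading $match scans only
// the matching documents instead of the whole collection.
db.orders.aggregate([
  { $match: { status: "A" } },    // filter early; can use the index
  { $sort:  { amount: -1 } },     // combined with $limit, only the top five are kept
  { $limit: 5 },                  // pass at most five documents on
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
])
```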




