MongoDB: aggregation pipeline-mysql tutorial

Last Update:2018-03-05 Source: Internet

Author: User


The aggregation pipeline was introduced in MongoDB 2.2. It is a data aggregation framework modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms them into aggregated results. The aggregation pipeline provides an alternative to map-reduce and is the preferred solution for many aggregation tasks where the complexity of map-reduce may be unwarranted.


(Figure: an annotated aggregation pipeline operation with two stages, $match and $group.)
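As a rough illustration of a pipeline with a $match stage followed by a $group stage, the Python sketch below writes the pipeline in the list-of-stages form that MongoDB drivers accept and evaluates it with a small in-memory stand-in. The `orders` data, the field names, and the `run_pipeline` helper are hypothetical and are not part of MongoDB; the helper only handles equality `$match` and a `$sum` accumulator in `$group`.

```python
from collections import defaultdict

# Hypothetical sample documents (not from the original article).
orders = [
    {"cust_id": "A123", "amount": 500, "status": "A"},
    {"cust_id": "A123", "amount": 250, "status": "A"},
    {"cust_id": "B212", "amount": 200, "status": "A"},
    {"cust_id": "A123", "amount": 300, "status": "D"},
]

# The pipeline, written as a list of stage documents.
pipeline = [
    {"$match": {"status": "A"}},
    {"$group": {"_id": "$cust_id", "total": {"$sum": "$amount"}}},
]

def run_pipeline(docs, stages):
    """Toy evaluator: equality $match and $group with $sum only."""
    for stage in stages:
        (name, spec), = stage.items()
        if name == "$match":
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif name == "$group":
            key_field = spec["_id"].lstrip("$")
            totals = defaultdict(lambda: defaultdict(int))
            for d in docs:
                for out_field, acc in spec.items():
                    if out_field == "_id":
                        continue
                    totals[d[key_field]][out_field] += d[acc["$sum"].lstrip("$")]
            docs = [{"_id": k, **v} for k, v in totals.items()]
        else:
            raise ValueError(f"unsupported stage: {name}")
    return docs

results = run_pipeline(orders, pipeline)
```

Here `$match` keeps the three documents with status "A", and `$group` then sums `amount` per `cust_id`. Against a real server, the same `pipeline` list would be passed to the driver's aggregate method instead of the toy helper.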

Aggregation pipelines have some restrictions on value types and result size, which are briefly described below.

Restrictions on aggregate operations when using the aggregate command:

Type restrictions

The aggregation pipeline cannot operate on values of the following types: Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope.

(The restriction on the Binary type was removed in MongoDB 2.4. In MongoDB 2.2, pipelines cannot operate on Binary data.)

Result size limit

If the aggregate command returns a single document that contains the complete result set, the command produces an error when the result set exceeds the BSON document size limit, which is currently 16 MB. To handle result sets that exceed this limit, have the aggregate command return a cursor or store the results in a collection; in either case the result set can be of any size.

(Changed in MongoDB 2.6: when the aggregate command returns a cursor or stores the results in a collection, the result set is not subject to this size limit. db.collection.aggregate() returns a cursor and can return result sets of any size.)

Memory limit

Changed in MongoDB 2.6.

Each pipeline stage has a limit of 100 MB of RAM. If a stage exceeds this limit, MongoDB produces an error. To allow operations on large data sets, use the allowDiskUse option so that aggregation pipeline stages can write data to temporary files.
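The cursor and allowDiskUse options discussed above can be sketched as fields of the `aggregate` database command document. The `build_aggregate_command` helper, the `orders` collection, and the `amount` field below are hypothetical names used only for illustration; the sketch builds the command document and does not contact a server.

```python
def build_aggregate_command(collection, pipeline, allow_disk_use=False, batch_size=None):
    """Build an `aggregate` command document using the MongoDB 2.6+ options."""
    command = {
        "aggregate": collection,
        "pipeline": list(pipeline),
        # Asking for a cursor avoids packing the whole result set
        # into a single 16 MB BSON document.
        "cursor": {} if batch_size is None else {"batchSize": batch_size},
    }
    if allow_disk_use:
        # Allows stages to write temporary files instead of failing
        # when they exceed the per-stage RAM limit.
        command["allowDiskUse"] = True
    return command

# Hypothetical collection and field names.
cmd = build_aggregate_command(
    "orders",
    [{"$sort": {"amount": -1}}],
    allow_disk_use=True,
    batch_size=100,
)
```

With PyMongo, passing `allowDiskUse=True` to `Collection.aggregate` should have the same effect; that usage is an assumption about the reader's driver and is not something the original article shows.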

Pipeline

As the name implies, documents from a collection travel through an aggregation pipeline, which transforms them as they pass through. For readers familiar with Unix shells (such as bash), the concept is similar to a pipe.

The MongoDB aggregation pipeline starts with the documents of a collection and streams them from one pipeline operator to the next for processing. Each operator in the pipeline transforms the documents as they pass through. Pipeline operators do not need to produce one output document for every input document: operators can generate new documents or filter documents out. Pipeline operators can be repeated within a pipeline.
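The streaming behavior described above can be imitated with Python generators. This is only an analogy under stated assumptions, not how MongoDB is implemented: `match` and `unwind` are hypothetical helpers loosely modeled on the $match and $unwind stages, and the `articles` data is invented for the example.

```python
def match(docs, predicate):
    """Filtering stage: may emit fewer documents than it receives."""
    return (d for d in docs if predicate(d))

def unwind(docs, field):
    """Expanding stage: emits one document per element of an array field."""
    for d in docs:
        for item in d.get(field, []):
            yield {**d, field: item}

# Hypothetical input collection.
articles = [
    {"title": "a", "tags": ["db", "nosql"], "views": 10},
    {"title": "b", "tags": ["web"], "views": 3},
    {"title": "c", "tags": [], "views": 50},
]

# Stages are chained like shell commands joined with "|"; the same
# kind of stage (match) appears twice in one pipeline.
stage1 = match(articles, lambda d: d["views"] >= 5)
stage2 = unwind(stage1, "tags")
stage3 = match(stage2, lambda d: d["tags"] != "nosql")
output = list(stage3)
```

The first stage drops document "b", the second turns document "a" into two documents and produces nothing for "c" (its array is empty), and the third filters again, so the number of documents changes at every step.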

Pipeline expressions

Each pipeline operator takes a pipeline expression as its operand. A pipeline expression specifies the transformation to apply to the input documents. Expressions have a document structure and can contain fields, values, and operators.

Pipeline expressions can only operate on the current document in the pipeline and cannot refer to data in other documents; expression operations provide in-memory transformation of documents.

In general, expressions are stateless, with one exception: accumulator expressions. Accumulators, which are used with the $group pipeline operator, maintain their state (such as totals, maximums, minimums, and related data) as documents progress through the pipeline.
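The difference between a stateless expression and an accumulator can be sketched in plain Python. The names `total_price` and `GroupAccumulator` and the `sales` data are hypothetical; the class only imitates the kind of state that accumulators such as $sum, $max, and $min keep per group.

```python
def total_price(doc):
    """Stateless expression: depends only on the current document."""
    return doc["price"] * doc["qty"]

class GroupAccumulator:
    """State kept for one group key while documents stream past."""

    def __init__(self):
        self.total = 0
        self.maximum = None
        self.minimum = None

    def update(self, value):
        self.total += value
        self.maximum = value if self.maximum is None else max(self.maximum, value)
        self.minimum = value if self.minimum is None else min(self.minimum, value)

# Hypothetical input documents.
sales = [
    {"item": "pen", "price": 2, "qty": 10},
    {"item": "ink", "price": 5, "qty": 3},
    {"item": "pen", "price": 2, "qty": 4},
]

groups = {}
for doc in sales:
    # One accumulator per group key; its state persists across documents.
    groups.setdefault(doc["item"], GroupAccumulator()).update(total_price(doc))

summary = {key: (g.total, g.maximum, g.minimum) for key, g in groups.items()}
```

`total_price` can be evaluated on any document in isolation, whereas each `GroupAccumulator` gives a different answer depending on which documents it has already seen.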

Aggregation pipeline behavior

In MongoDB, the aggregate command operates on a single collection and logically passes the entire collection into the aggregation pipeline. To optimize the operation, use the following strategies to avoid scanning the entire collection wherever possible.

Pipeline operators and indexes

The $match and $sort pipeline operators can take advantage of an index when they occur at the beginning of the pipeline.

(New in version 2.4: the $geoNear pipeline operator can take advantage of a geospatial index. When used, $geoNear must be the first stage of the aggregation pipeline.)

Even when the pipeline uses an index, the aggregation still needs to access the actual documents; that is, an index cannot fully cover an aggregation pipeline.

(In versions earlier than 2.6, an index could cover a pipeline in a very limited set of cases.)

Early filtering

If your aggregation operation needs only a subset of the data in a collection, use the $match, $limit, and $skip stages to restrict the documents that enter the pipeline. When placed at the beginning of the pipeline, the $match operator uses a suitable index to scan only the matching documents in the collection.

Placing a $match stage followed by a $sort stage at the beginning of the pipeline is logically equivalent to a single query with a sort and can use an index. Whenever possible, place $match at the beginning of the pipeline.
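The effect of filtering early can be sketched with a pipeline that places $match before $sort. The `apply` helper is a hypothetical in-memory evaluator, not MongoDB code; it supports only equality `$match`, single-key `$sort`, and `$limit`, and it records how many documents each stage receives. Index use cannot be shown this way; the sketch only shows that later stages handle fewer documents.

```python
# Hypothetical field names; the pipeline is a list of stage documents.
pipeline = [
    {"$match": {"status": "A"}},   # filter first
    {"$sort": {"amount": -1}},     # then sort only the matching documents
    {"$limit": 2},
]

def apply(docs, stages):
    """Toy evaluator that also records the input size of each stage."""
    seen = []
    for stage in stages:
        (name, spec), = stage.items()
        seen.append((name, len(docs)))
        if name == "$match":
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif name == "$sort":
            (field, direction), = spec.items()
            docs = sorted(docs, key=lambda d: d[field], reverse=direction < 0)
        elif name == "$limit":
            docs = docs[:spec]
        else:
            raise ValueError(f"unsupported stage: {name}")
    return docs, seen

# Hypothetical input documents.
orders = [
    {"amount": 50, "status": "A"},
    {"amount": 90, "status": "D"},
    {"amount": 70, "status": "A"},
    {"amount": 20, "status": "A"},
    {"amount": 80, "status": "D"},
]

top, seen = apply(orders, pipeline)
```

In this run the $sort stage receives three documents rather than five, because $match has already discarded the documents with status "D".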

This article is an English version of an article originally published in Chinese on aliyun.com.
