MongoDB: the differences between aggregate, MapReduce, and the aggregation commands

Tags: mongodb, mongodb aggregate

MongoDB offers three ways to do aggregation, and many people cannot tell them apart. This article sums up the differences between the three ~

1. Aggregation framework: the aggregate pipeline

2. MapReduce

3. Individual aggregation commands: group, distinct, count


Aggregation framework: aggregate pipeline

The aggregation framework is based on a data-processing pipeline model: documents pass through a multi-stage pipeline and come out as aggregated results. The aggregate pipeline uses MongoDB's built-in native operations, so it is relatively efficient, and it is the recommended first choice for aggregation in MongoDB;

aggregate can use indexes to improve performance, and the pipeline also applies a series of performance optimizations of its own (a summary of these optimizations is covered in another article);

In general, the aggregate pipeline works a bit like a pipe in a UNIX shell: the current documents are processed by the first pipeline stage, and the output is then handed on to the second stage, and so on.
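To make the pipe analogy concrete, here is a minimal sketch in plain Python. The `pipeline` list uses MongoDB's real stage syntax (`$match`, `$group`, `$sum`), but everything else is an assumption made for illustration: the `orders` data is invented, and `run_match`, `run_group` and `run_pipeline` are a toy evaluator, not how MongoDB itself executes a pipeline.

```python
from collections import defaultdict

# Sample documents (hypothetical data, for illustration only).
orders = [
    {"cust_id": "A", "status": "done", "amount": 50},
    {"cust_id": "A", "status": "done", "amount": 25},
    {"cust_id": "B", "status": "done", "amount": 40},
    {"cust_id": "B", "status": "open", "amount": 10},
]

# The pipeline in MongoDB's stage syntax: filter first, then group and sum.
pipeline = [
    {"$match": {"status": "done"}},
    {"$group": {"_id": "$cust_id", "total": {"$sum": "$amount"}}},
]


def run_match(docs, spec):
    """Toy $match: keep documents whose fields equal the given values."""
    return [d for d in docs if all(d.get(k) == v for k, v in spec.items())]


def run_group(docs, spec):
    """Toy $group: supports only {"$sum": "$field"} accumulators."""
    key_field = spec["_id"].lstrip("$")
    sums = defaultdict(lambda: defaultdict(int))
    for d in docs:
        totals = sums[d[key_field]]
        for out_field, acc in spec.items():
            if out_field != "_id":
                totals[out_field] += d[acc["$sum"].lstrip("$")]
    return [{"_id": key, **totals} for key, totals in sums.items()]


# Toy dispatch table: stage name -> function implementing it.
STAGES = {"$match": run_match, "$group": run_group}


def run_pipeline(docs, stages):
    """Feed the output of each stage into the next, like a UNIX pipe."""
    for stage in stages:
        (name, spec), = stage.items()  # each stage is a one-key dict
        docs = STAGES[name](docs, spec)
    return docs


result = run_pipeline(orders, pipeline)
```

Putting `$match` first means the later stages only see the documents that survived the filter, which is the same reason you put `grep` early in a shell pipe.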

Restrictions on aggregate

1. Whether aggregate returns a cursor or stores its results in a collection, every single document in the result set is subject to the 16 MB BSON document size limit, and the command fails if a document exceeds it. The limit applies only to the documents that are returned; during pipeline processing a document may well exceed 16 MB;

2. If you run aggregate without specifying the cursor option and without storing the results in a collection, the aggregate command returns a single BSON document with the result set in one of its fields. If the total size of the result set exceeds the BSON document size limit (16 MB), the command produces an error;

3. Each pipeline stage has a maximum memory limit of 100 MB, and exceeding it raises an error. To handle larger datasets you can turn on the allowDiskUse option, which lets pipeline operations write to temporary files;
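Restrictions 2 and 3 both come down to options on the aggregate command, so a small sketch of the command document may help. `build_aggregate_command` is a hypothetical helper of my own, not part of any driver; the field names `aggregate`, `pipeline`, `cursor`, `batchSize` and `allowDiskUse` are the ones the aggregate database command uses. The collection name `orders` is made up.

```python
def build_aggregate_command(collection, pipeline, allow_disk_use=False, batch_size=None):
    """Build an `aggregate` database command document as a plain dict.

    Hypothetical helper for illustration; it only assembles the document
    and does not talk to a server.
    """
    command = {
        "aggregate": collection,  # the command name has to be the first key
        "pipeline": pipeline,
        # Asking for a cursor avoids packing the whole result set into one
        # BSON document (restriction 2).
        "cursor": {} if batch_size is None else {"batchSize": batch_size},
    }
    if allow_disk_use:
        # Lets stages that hit the 100 MB memory limit write to temporary
        # files instead of failing (restriction 3).
        command["allowDiskUse"] = True
    return command


cmd = build_aggregate_command(
    "orders",
    [{"$group": {"_id": "$cust_id", "total": {"$sum": "$amount"}}}],
    allow_disk_use=True,
    batch_size=100,
)
```

Assuming you use a driver such as PyMongo, a document like `cmd` could be passed to the driver's generic command call, but in practice the driver's own aggregate method with an allowDiskUse argument is the more usual route.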

Usage scenarios:

1. For common aggregation operations

2. When the response time of the aggregation matters (indexes and pipeline optimizations are available)

3. aggregate pipeline operations run in memory and are subject to the memory limit, so the size of the dataset they can process is limited;

MapReduce

Map-reduce is another way to do aggregation. It typically has two phases: a map phase that processes each single document and emits one or more objects for it, and a reduce phase that combines the output of the map phase;

Um ..... well, in short, the set of documents emitted by map is summarized by the reduce function;

Reduce needs no explanation from me.

Map-reduce uses custom JavaScript functions for the map and reduce operations, so it can be more flexible and handle more complex logic than the aggregate pipeline, but it also costs more in performance than the aggregate pipeline;
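The two phases can be sketched in a few lines. This is plain Python standing in for the JavaScript functions you would actually hand to MongoDB, and `map_reduce`, `map_order`, `reduce_amounts` and the sample data are all hypothetical names of mine. It also simplifies things: this toy calls reduce exactly once per key, whereas MongoDB may call reduce several times for the same key and skips it for keys with a single value.

```python
from collections import defaultdict


def map_order(doc, emit):
    """Map phase: emit one (key, value) pair for each input document."""
    emit(doc["cust_id"], doc["amount"])


def reduce_amounts(key, values):
    """Reduce phase: combine all values emitted for one key."""
    return sum(values)


def map_reduce(docs, map_fn, reduce_fn):
    """Toy driver: run map over every document, then reduce per key."""
    emitted = defaultdict(list)

    def emit(key, value):
        emitted[key].append(value)

    for doc in docs:
        map_fn(doc, emit)
    return {key: reduce_fn(key, values) for key, values in emitted.items()}


# Sample documents (hypothetical data, for illustration only).
orders = [
    {"cust_id": "A", "amount": 50},
    {"cust_id": "A", "amount": 25},
    {"cust_id": "B", "amount": 40},
]

totals = map_reduce(orders, map_order, reduce_amounts)
```

Because map and reduce are ordinary functions, you can put any logic you like inside them, which is where the extra flexibility (and the extra cost) comes from.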

Usage scenarios:

1. Because JavaScript is so flexible, it can handle complex aggregation requirements.

2. Suitable for processing large result sets

For detailed instructions, see: MongoDB MapReduce detailed operations summary

Individual aggregation commands (group, distinct, count)

In a word: slower than aggregate and less flexible than map-reduce, but they can save you a few lines of JavaScript code. That last part I added myself, hahaha ~

group: as of MongoDB 2.2 the returned array can contain at most 20,000 elements, that is, at most 20,000 unique groupings are supported; for more than 20,000 unique groupings, use mapReduce instead;

count: db.collection.count() is equivalent to db.collection.find().count(). On a sharded collection the count can be inaccurate (for example when orphaned documents exist or a chunk migration is in progress), so using aggregate is recommended there;

distinct: can use an index;
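Since the advice for count on a sharded collection is to use aggregate, here is a sketch of what the replacement pipelines might look like. `count_pipeline` and `distinct_pipeline` are hypothetical helper names of mine that only build the pipeline lists; the stages themselves (`$match`, `$group`, `$sum`) are standard.

```python
def count_pipeline(query=None):
    """Pipeline that counts matching documents with $group and $sum."""
    return [
        {"$match": query or {}},
        # Grouping on a constant _id puts every document in one group.
        {"$group": {"_id": None, "count": {"$sum": 1}}},
    ]


def distinct_pipeline(field, query=None):
    """Pipeline that yields one document per distinct value of `field`."""
    return [
        {"$match": query or {}},
        # Grouping on the field itself leaves one document per value.
        {"$group": {"_id": "$" + field}},
    ]


count_done = count_pipeline({"status": "done"})
distinct_customers = distinct_pipeline("cust_id")
```

Two caveats on this sketch: the count pipeline returns no document at all (rather than a count of 0) when nothing matches, and for array fields distinct treats each element as a separate value, which a plain `$group` does not reproduce.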

Author: Xiao Zhi
Link: http://www.jianshu.com/p/e1043d9070ea
Source: Jianshu
