MongoDB Documentation Reading Notes: Aggregation

Source: Internet
Author: User

MongoDB provides three types of aggregation:

1. The aggregation pipeline

2. Single-purpose aggregation operations (count, group, distinct)

3. Map-reduce


Pipeline Expressions


A pipeline expression can only operate on the current document in the pipeline; it cannot reference other documents.

Typically, expressions are stateless and are evaluated when the aggregation runs, with one exception: accumulator expressions.

Accumulators, which can only be used in the $group stage, maintain state as documents pass through the pipeline: totals, maximums, minimums, and related data.



Optimization

The $match and $sort pipeline operators can take advantage of an index when they appear at the beginning of the pipeline, avoiding a scan of every document in the collection.

The $geoNear pipeline operator, introduced in 2.4, can take advantage of a geospatial index; when used, it must be placed at the beginning of the pipeline.

Even when the pipeline uses an index, aggregation still needs to access the actual documents; an index cannot completely cover an aggregation pipeline.


Early Filtering

If an aggregation pipeline operation needs only a subset of the data in a collection, use the $match, $limit, and $skip stages to restrict the number of documents entering the pipeline. Placed at the entry point of the pipeline, a $match operation uses a suitable index to scan and pass only the matching documents into the pipeline.

Using $match at the beginning of the pipeline, followed by $sort, is logically equivalent to a simple query with a sort and can use an index. Where possible, place the $match operator at the beginning of the pipeline.
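The benefit of early filtering can be illustrated in plain JavaScript (a sketch, not mongo shell; the documents and the "status"/"age" field names are made up): filtering before sorting returns the same result, but the sort then handles fewer documents.

```javascript
const docs = [
  { status: "A", age: 31 },
  { status: "B", age: 27 },
  { status: "A", age: 45 },
  { status: "B", age: 19 },
];

// $sort then $match: sorts all 4 documents, then filters.
const sortFirst = [...docs]
  .sort((a, b) => b.age - a.age)
  .filter((d) => d.status === "A");

// $match then $sort: filters down to 2 documents, then sorts only those.
const matchFirst = docs
  .filter((d) => d.status === "A")
  .sort((a, b) => b.age - a.age);

console.log(JSON.stringify(sortFirst) === JSON.stringify(matchFirst)); // true
```

Both orderings produce the same output; the second simply gives the sort less work, which is the effect the optimizer aims for.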


Additional features:

The aggregation pipeline has an internal optimization phase to improve aggregation performance.

Aggregation pipelines are supported on sharded collections.



MongoDB's map-reduce operation

The map function processes each input document and emits key-value pairs.

Output limit: each output document must fit within the BSON document size limit, currently 16 MB.

Map-reduce supports sharded collections for both input and output.




Built-in optimization mechanism:

1. Project optimization: The aggregation pipeline can determine how many fields are required in a pipeline, so

Pipelines only use fields that need to be used


Pipeline sequence optimization:

A $sort followed by a $match is reordered into $match followed by $sort, so documents are filtered before they are sorted:

{$sort: {age: -1}},
{$match: {status: "A"}}

becomes:

{$match: {status: "A"}},
{$sort: {age: -1}}



Similarly, a $skip followed by a $limit is reordered into $limit followed by $skip:

{$skip: 10},
{$limit: 5}

becomes:

{$limit: 15},
{$skip: 10}

Moving $limit forward reduces the number of documents that reach $skip; note that the new $limit value is the sum of the original $skip and $limit values.
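The equivalence of the two orderings can be sketched in plain JavaScript (the function names are made up for illustration; `Array.prototype.slice` stands in for $skip and $limit):

```javascript
// skip n then limit m  ===  limit (n + m) then skip n
function skipThenLimit(arr, n, m) {
  return arr.slice(n, n + m);            // $skip: n, $limit: m
}
function limitThenSkip(arr, n, m) {
  return arr.slice(0, n + m).slice(n);   // $limit: n + m, $skip: n
}

const data = Array.from({ length: 30 }, (_, i) => i);
console.log(skipThenLimit(data, 10, 5)); // [ 10, 11, 12, 13, 14 ]
console.log(limitThenSkip(data, 10, 5)); // [ 10, 11, 12, 13, 14 ]
```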



$match promotion past $redact: when a $match follows a $redact, the part of the $match condition that does not depend on the $redact output can be duplicated in front of it:

{$redact: {$cond: {if: {$eq: ["$level", 5]}, then: "$$PRUNE", else: "$$DESCEND"}}},
{$match: {year: 2014, category: {$ne: "Z"}}}

becomes:

{$match: {year: 2014}},
{$redact: {$cond: {if: {$eq: ["$level", 5]}, then: "$$PRUNE", else: "$$DESCEND"}}},
{$match: {year: 2014, category: {$ne: "Z"}}}


Coalescence optimizations

$sort + $limit coalescence

When a $sort is immediately followed by a $limit, the optimizer merges the limit into the sort, so the sort only needs to keep track of the top n results as it goes. This saves memory.

The optimization still applies when allowDiskUse is true and the n items exceed the aggregation memory limit.



$limit + $limit coalescence

Two consecutive $limit stages merge into one, keeping the smaller of the two limit values.

$skip + $skip coalescence

Two consecutive $skip stages merge into one, with the two skip values added together.

$match + $match coalescence

Two consecutive $match stages merge into one, with the two conditions combined.
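The three coalescence rules can be sketched as a single pass in plain JavaScript (a hypothetical illustration of the rules above, not MongoDB's actual optimizer code):

```javascript
function coalesce(pipeline) {
  const out = [];
  for (const stage of pipeline) {
    const prev = out[out.length - 1];
    if (prev && "$limit" in prev && "$limit" in stage) {
      prev.$limit = Math.min(prev.$limit, stage.$limit);   // keep the smaller limit
    } else if (prev && "$skip" in prev && "$skip" in stage) {
      prev.$skip += stage.$skip;                           // add the skips
    } else if (prev && "$match" in prev && "$match" in stage) {
      prev.$match = { $and: [prev.$match, stage.$match] }; // combine the conditions
    } else {
      out.push({ ...stage });
    }
  }
  return out;
}

console.log(coalesce([{ $limit: 100 }, { $limit: 15 }])); // [ { '$limit': 15 } ]
console.log(coalesce([{ $skip: 5 }, { $skip: 2 }]));      // [ { '$skip': 7 } ]
```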






Example:

{$sort: {age: -1}},
{$skip: 10},
{$limit: 5}

====>>>>

{$sort: {age: -1}},
{$limit: 15},
{$skip: 10}

====>>>>

the $sort + $limit pair is then coalesced as described above.




{$limit: 100},
{$skip: 5},
{$limit: 10},
{$skip: 2}

==========>>>>>>> (reorder the inner $skip + $limit pair)

{$limit: 100},
{$limit: 15},
{$skip: 5},
{$skip: 2}

=========>>>>>>>>> (coalesce the $limit pair and the $skip pair)

{$limit: 15},
{$skip: 7}
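This rewrite can be sanity-checked in plain JavaScript by applying both pipelines to an array of numbers (`applyStages` is a made-up helper that interprets only $limit and $skip):

```javascript
function applyStages(arr, stages) {
  return stages.reduce(
    (acc, s) => ("$limit" in s ? acc.slice(0, s.$limit) : acc.slice(s.$skip)),
    arr
  );
}

const data = Array.from({ length: 200 }, (_, i) => i);
const original  = applyStages(data, [{ $limit: 100 }, { $skip: 5 }, { $limit: 10 }, { $skip: 2 }]);
const optimized = applyStages(data, [{ $limit: 15 }, { $skip: 7 }]);
console.log(original);  // [ 7, 8, 9, 10, 11, 12, 13, 14 ]
console.log(optimized); // [ 7, 8, 9, 10, 11, 12, 13, 14 ]
```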



Result size restrictions

Changed in version 2.6: the aggregate command can return its results as a cursor or store them in a collection; in either case, the result set is not subject to the BSON document size limit. db.collection.aggregate() returns a cursor and can return result sets of any size.



Memory restrictions

Changed in version 2.6.

Each pipeline stage has a limit of 100 megabytes of RAM; if a stage exceeds this limit, MongoDB produces an error. To handle large datasets, use the allowDiskUse option to let aggregation pipeline stages write data to temporary files.



When an aggregation operation runs on a sharded collection, the pipeline is split into two parts. The first part executes on each shard; alternatively, when an initial $match can exclude some shards through a predicate on the shard key, the pipeline runs only on the relevant shards.

The second part runs on the primary shard: it merges the results of the first part and runs the remaining stages on the merged result. The primary shard then forwards the final result to mongos. Before 2.6, the second part ran on mongos.



Map-reduce input and output on sharded collections

If the input collection is sharded, mongos automatically dispatches the map-reduce job to each shard in parallel; no extra handling is required.

If the output collection is sharded (the out field of mapReduce uses the sharded option), MongoDB shards the output collection using the _id field as the shard key.

• If the output collection does not exist, MongoDB creates it and shards it on the _id field.

• For a new or empty sharded collection, MongoDB uses the results of the first stage of the map-reduce operation to create the initial chunks distributed among the shards.

• mongos dispatches, in parallel, a map-reduce post-processing job to every shard that owns a chunk. During post-processing, each shard pulls the results for its own chunks from the other shards, runs the final reduce/finalize, and writes the output locally to the output collection.



Map Reduce Concurrency





Return states with populations above 10 million

The equivalent SQL:

SELECT state, SUM(pop) AS totalPop
FROM zipcodes
GROUP BY state
HAVING totalPop >= (10*1000*1000)

db.zipcodes.aggregate([
  { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
  { $match: { totalPop: { $gte: 10*1000*1000 } } }
])
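In plain JavaScript, the pipeline's two stages amount to the following (the sample records are made up; the real zipcodes dataset is much larger):

```javascript
const THRESHOLD = 10 * 1000 * 1000;
const zipcodes = [
  { state: "NY", pop: 9 * 1000 * 1000 },
  { state: "NY", pop: 3 * 1000 * 1000 },
  { state: "WY", pop: 500 * 1000 },
];

// $group stage: sum pop per state.
const byState = {};
for (const z of zipcodes) {
  byState[z.state] = (byState[z.state] || 0) + z.pop;
}

// $match stage: keep only states at or above the threshold.
const result = Object.entries(byState)
  .filter(([, totalPop]) => totalPop >= THRESHOLD)
  .map(([state, totalPop]) => ({ _id: state, totalPop }));

console.log(result); // [ { _id: 'NY', totalPop: 12000000 } ]
```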



Return average city population by state

Averages are computed per state over the (state, city) totals.

db.zipcodes.aggregate([
  { $group: { _id: { state: "$state", city: "$city" }, pop: { $sum: "$pop" } } },
  { $group: { _id: "$_id.state", avgCityPop: { $avg: "$pop" } } }
])

The corresponding SQL statement:

SELECT state, AVG(sum_pop)
FROM (SELECT state, city, SUM(pop) AS sum_pop
      FROM zipcodes
      GROUP BY city, state) AS temp
GROUP BY temp.state
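The two-stage group can be sketched in plain JavaScript (sample data made up for illustration): first sum pop per (state, city), then average those city totals within each state.

```javascript
const zipcodes = [
  { state: "TX", city: "Austin", pop: 400 },
  { state: "TX", city: "Austin", pop: 600 },
  { state: "TX", city: "Dallas", pop: 2000 },
];

// First $group: total population per (state, city) pair.
const perCity = {};
for (const z of zipcodes) {
  const key = `${z.state}|${z.city}`;
  perCity[key] = (perCity[key] || 0) + z.pop;
}

// Second $group: average the city totals within each state.
const sums = {};
for (const [key, pop] of Object.entries(perCity)) {
  const state = key.split("|")[0];
  (sums[state] = sums[state] || []).push(pop);
}
const result = Object.entries(sums).map(([state, pops]) => ({
  _id: state,
  avgCityPop: pops.reduce((a, b) => a + b, 0) / pops.length,
}));

console.log(result); // [ { _id: 'TX', avgCityPop: 1500 } ]
```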


Return largest and smallest cities by state

Returns the largest and smallest city (by population) in each state:

db.zipcodes.aggregate([
  { $group: { _id: { state: "$state", city: "$city" }, pop: { $sum: "$pop" } } },
  { $sort: { pop: 1 } },
  { $group: {
      _id: "$_id.state",
      biggestCity: { $last: "$_id.city" },
      biggestPop: { $last: "$pop" },
      smallestCity: { $first: "$_id.city" },
      smallestPop: { $first: "$pop" }
  } },
  // The following $project is optional and only modifies the output format.
  { $project: {
      _id: 0,
      state: "$_id",
      biggestCity: { name: "$biggestCity", pop: "$biggestPop" },
      smallestCity: { name: "$smallestCity", pop: "$smallestPop" }
  } }
])




Return the five most common "likes"

db.users.aggregate([
  // $unwind separates each value in the likes array and creates a new
  // version of the source document for every element in the array.
  { $unwind: "$likes" },
  { $group: { _id: "$likes", number: { $sum: 1 } } },
  { $sort: { number: -1 } },
  { $limit: 5 }
])
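What the four stages compute can be sketched in plain JavaScript (the users and their likes are made up):

```javascript
const users = [
  { likes: ["tennis", "golf", "chess"] },
  { likes: ["tennis", "chess"] },
  { likes: ["tennis"] },
];

// $unwind: one element per (document, array value) pair.
const unwound = users.flatMap((u) => u.likes);

// $group with { $sum: 1 }: count each value.
const counts = {};
for (const like of unwound) counts[like] = (counts[like] || 0) + 1;

// $sort by count descending, then $limit: 5.
const top5 = Object.entries(counts)
  .map(([_id, number]) => ({ _id, number }))
  .sort((a, b) => b.number - a.number)
  .slice(0, 5);

console.log(top5[0]); // { _id: 'tennis', number: 3 }
```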





Examples of map-reduce:

Data:

{
  _id: ObjectId("50a8240b927d5d8b5891743c"),
  cust_id: "abc123",
  ord_date: new Date("Oct 04, 2012"),
  status: "A",
  price: 25,
  items: [ { sku: "mmm", qty: 5, price: 2.5 },
           { sku: "nnn", qty: 5, price: 2.5 } ]
}



Return the total price per customer

Map function:

var mapFunction1 = function () {
  emit(this.cust_id, this.price);
};

Reduce function:

var reduceFunction1 = function (keyCustId, valuesPrices) {
  return Array.sum(valuesPrices);
};

Putting them together:

db.orders.mapReduce(
  mapFunction1,
  reduceFunction1,
  { out: "map_reduce_example" }  // write the output to the "map_reduce_example" collection
)
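The map and reduce phases can be simulated in plain JavaScript (a sketch with made-up orders; `emit` here is a local helper standing in for the mongo shell's emit, and the reduce step plays the role of Array.sum):

```javascript
const orders = [
  { cust_id: "abc123", price: 25 },
  { cust_id: "abc123", price: 30 },
  { cust_id: "xyz789", price: 10 },
];

// Map phase: collect emitted values per key.
const emitted = {};
function emit(key, value) {
  (emitted[key] = emitted[key] || []).push(value);
}
for (const order of orders) {
  emit(order.cust_id, order.price); // body of mapFunction1
}

// Reduce phase: sum each key's values, as reduceFunction1 does.
const result = Object.fromEntries(
  Object.entries(emitted).map(([k, values]) => [k, values.reduce((a, b) => a + b, 0)])
);

console.log(result); // { abc123: 55, xyz789: 10 }
```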


========

To be Continued ...

Documentation:

http://pan.baidu.com/s/1jiFOM


