MongoDB aggregation comes in three forms:
1. The aggregation pipeline
2. Single-purpose aggregation methods (count, group, distinct); see the sketch below
3. Map-reduce
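A minimal sketch of the single-purpose methods, assuming an "orders" collection (the pipeline and map-reduce forms are shown later in these notes):
db.orders.count( { status: "A" } )   // number of documents matching the query
db.orders.distinct( "cust_id" )      // array of distinct values of a field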
Pipeline expressions
Pipeline expressions can only operate on the current document in the pipeline; they cannot refer to data from other documents.
In general, expressions are stateless and are evaluated only during aggregation, with one exception: accumulator expressions.
Accumulators, which can only be used in the $group stage, maintain state (for example totals, maximums, minimums, and related data) as documents pass through the pipeline.
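For example, a minimal $group sketch (assuming an "orders" collection) in which the $sum, $max, and $min accumulators keep running state per group:
db.orders.aggregate( [
   { $group: {
       _id: "$cust_id",
       total: { $sum: "$price" },      // running total per customer
       maxPrice: { $max: "$price" },   // running maximum
       minPrice: { $min: "$price" }    // running minimum
   } }
] )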
How to optimize:
The $match and $sort pipeline operators can take advantage of an index when they occur at the beginning of the pipeline, avoiding a scan of every document in the collection.
The $geoNear pipeline operator, introduced in 2.4, can take advantage of a geospatial index; when used, it must be the first stage of the pipeline (a sketch follows below).
Even when the pipeline uses an index, aggregation still needs to access the actual documents; an index cannot completely cover an aggregation pipeline.
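A sketch of $geoNear as the first stage, assuming a "places" collection with a 2dsphere index on a "location" field:
db.places.aggregate( [
   { $geoNear: {
       near: { type: "Point", coordinates: [ -73.99, 40.73 ] },
       distanceField: "dist",   // output field that holds the computed distance
       spherical: true,         // required because the index is 2dsphere
       maxDistance: 2000        // meters
   } },
   { $limit: 5 }
] )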
Early Filtering
If an aggregation operation needs only a subset of the data in a collection, use $match, $limit, and $skip stages to restrict the number of documents that enter the pipeline.
When placed at the start of the pipeline, a $match operation can use a suitable index to scan only the matching documents and pass only those into the pipeline.
A $match at the beginning of the pipeline followed by a $sort is logically equivalent to a single query with a sort, and it can use an index. Whenever possible, place $match operators at the beginning of the pipeline.
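A minimal early-filtering sketch, assuming a "users" collection with an index on status:
db.users.aggregate( [
   { $match: { status: "A" } },   // filter first, so only matching documents enter the pipeline
   { $sort: { age: -1 } },        // together with the leading $match this can use an index
   { $limit: 10 }
] )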
Additional features:
The aggregation pipeline has an internal optimization phase that improves aggregation performance.
Aggregation pipelines can run on sharded collections.
MongoDB's map-reduce operation
The map function processes each input document and emits key-value pairs.
Output limit: each result must fit within the BSON document size limit, currently 16 MB.
Map-reduce supports sharded collections as both input and output.
Built-in optimizations:
1. Projection optimization: the aggregation pipeline can determine which fields a pipeline requires, so the pipeline uses only the fields it needs.
2. Pipeline sequence optimization:
$sort + $match: when a $match follows a $sort, the $match is moved before the $sort. For example:
{ $sort: { age: -1 } },
{ $match: { status: "A" } }
is optimized to:
{ $match: { status: "A" } },
{ $sort: { age: -1 } }
$skip + $limit: when a $limit follows a $skip, the $limit is moved before the $skip and its value is increased by the skip amount, which reduces the work done after the $skip. For example:
{ $skip: 10 },
{ $limit: 5 }
is optimized to:
{ $limit: 15 },
{ $skip: 10 }
$redact + $match: when a $match follows a $redact, the optimizer can sometimes add a copy of part of the $match condition in front of the $redact. For example:
{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }
is optimized to:
{ $match: { year: 2014 } },
{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }
3. Coalescence optimization:
$sort + $limit: when a $sort is immediately followed by a $limit, the optimizer coalesces the $limit into the $sort, so the sort only needs to keep the top n results in memory, which saves memory.
This optimization still applies when allowDiskUse is true and the n results exceed the aggregation memory limit.
$limit + $limit: two consecutive $limit stages are merged into one, keeping the smaller of the two limit amounts.
$skip + $skip: two consecutive $skip stages are merged into one whose skip amount is the sum of the two.
$match + $match: two consecutive $match stages are merged into one whose conditions are combined; a sketch follows.
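An illustrative sketch of the $match + $match case (the merged condition is expressed with $and):
{ $match: { year: 2014 } },
{ $match: { status: "A" } }
becomes:
{ $match: { $and: [ { year: 2014 }, { status: "A" } ] } }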
Example
{ $sort: { age: -1 } },
{ $skip: 10 },
{ $limit: 5 }
is first rewritten by the $skip + $limit sequence optimization to:
{ $sort: { age: -1 } },
{ $limit: 15 },
{ $skip: 10 }
and then the $sort + $limit coalescence merges the $limit into the $sort.
Another example:
{ $limit: 100 },
{ $skip: 5 },
{ $limit: 10 },
{ $skip: 2 }
is rewritten by the $skip + $limit sequence optimization to:
{ $limit: 100 },
{ $limit: 15 },
{ $skip: 5 },
{ $skip: 2 }
and then coalesced by $limit + $limit and $skip + $skip into:
{ $limit: 15 },
{ $skip: 7 }
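To check how a given pipeline is rewritten by these optimizations, the explain option of aggregate (available since 2.6) returns the optimized stages instead of running the pipeline; a sketch, assuming an "orders" collection:
db.orders.aggregate(
   [ { $sort: { age: -1 } }, { $match: { status: "A" } } ],
   { explain: true }   // the output shows the stages after optimization
)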
Result size restrictions
If the aggregate command returns its result set in a single document, that document is subject to the BSON document size limit (16 MB). To manage result sets that exceed this limit, the aggregate command can return result sets of any size if it returns a cursor or stores the results in a collection.
Changed in version 2.6: the aggregate command can return results as a cursor or store the results in a collection, neither of which is subject to the size limit. db.collection.aggregate() returns a cursor and can return result sets of any size.
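Two minimal sketches, assuming an "orders" collection: the command form returning a cursor with an explicit batch size, and a pipeline that stores its results in a collection with $out:
db.runCommand( {
   aggregate: "orders",
   pipeline: [ { $group: { _id: "$cust_id", total: { $sum: "$price" } } } ],
   cursor: { batchSize: 100 }   // return a cursor instead of one large document
} )

db.orders.aggregate( [
   { $group: { _id: "$cust_id", total: { $sum: "$price" } } },
   { $out: "order_totals" }     // write the result set to the "order_totals" collection
] )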
Memory restrictions
Changed in version 2.6.
Pipeline stages have a limit of 100 megabytes of RAM. If a stage exceeds this limit, MongoDB will produce an error.
To allow the handling of large datasets, use the allowDiskUse option to enable aggregation pipeline stages to write data to temporary files.
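A minimal sketch of enabling the option in the shell, assuming an "orders" collection:
db.orders.aggregate(
   [ { $sort: { price: -1 } },
     { $group: { _id: "$cust_id", total: { $sum: "$price" } } } ],
   { allowDiskUse: true }   // lets memory-hungry stages spill to temporary files
)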
When an aggregation operation runs on a sharded collection, the pipeline is split into two parts. The first part runs on each shard; or, if an initial $match can exclude certain shards through a predicate on the shard key, the first part runs only on the relevant shards.
The second part runs on the primary shard: it merges the results of the first part and runs the remaining stages on the merged results. The primary shard forwards the final result to mongos. Before 2.6, the second part ran on mongos.
Map-reduce with sharded collections as input and output
Sharded collection as input: if the input collection is sharded, mongos automatically dispatches the map-reduce job to each shard in parallel; no special handling is required.
Sharded collection as output: if the out field for map-reduce has the sharded value, MongoDB shards the output collection using the _id field as the shard key.
- If the output collection does not exist, MongoDB creates and shards the collection on the _id field.
- For a new or an empty sharded collection, MongoDB uses the results of the first stage of the map-reduce operation to create the initial chunks distributed among the shards.
- mongos dispatches, in parallel, a map-reduce post-processing job to every shard that owns a chunk. During the post-processing, each shard will pull the results for its own chunks from the other shards, run the final reduce/finalize, and write locally to the output collection.
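A sketch of requesting a sharded output collection, reusing the mapFunction1 and reduceFunction1 defined in the example at the end of these notes (the "order_totals" name is only illustrative):
db.orders.mapReduce(
   mapFunction1,
   reduceFunction1,
   { out: { replace: "order_totals", sharded: true } }   // shard the output collection on _id
)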
Map Reduce Concurrency
Return states with populations above 10 million
The equivalent SQL:
SELECT state, SUM(pop) AS totalPop
FROM zipcodes
GROUP BY state
HAVING totalPop >= (10*1000*1000)
The aggregation pipeline:
db.zipcodes.aggregate( [
   { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
   { $match: { totalPop: { $gte: 10*1000*1000 } } }
] )
Return average city population by state
The averages are based on state and city: the population is first summed per city within each state.
db.zipcodes.aggregate( [
   { $group: { _id: { state: "$state", city: "$city" }, pop: { $sum: "$pop" } } },
   { $group: { _id: "$_id.state", avgCityPop: { $avg: "$pop" } } }
] )
The corresponding SQL statement:
SELECT state, AVG(sum_pop)
FROM (SELECT state, city, SUM(pop) AS sum_pop
      FROM zipcodes
      GROUP BY city, state) AS temp
GROUP BY temp.state
Return largest and smallest cities by state
Returns, for each state, the city with the largest population and the city with the smallest population.
db.zipcodes.aggregate( [
   { $group:
      { _id: { state: "$state", city: "$city" },
        pop: { $sum: "$pop" } } },
   { $sort: { pop: 1 } },
   { $group:
      { _id: "$_id.state",
        biggestCity: { $last: "$_id.city" },
        biggestPop: { $last: "$pop" },
        smallestCity: { $first: "$_id.city" },
        smallestPop: { $first: "$pop" } } },
   // the following $project is optional and
   // modifies the output format
   { $project:
      { _id: 0,
        state: "$_id",
        biggestCity: { name: "$biggestCity", pop: "$biggestPop" },
        smallestCity: { name: "$smallestCity", pop: "$smallestPop" } } }
] )
Return the five most common "likes"
db.users.aggregate( [
   // $unwind separates each value in the likes array and creates a new
   // version of the source document for every element in the array
   { $unwind: "$likes" },
   { $group: { _id: "$likes", number: { $sum: 1 } } },
   { $sort: { number: -1 } },
   { $limit: 5 }
] )
Examples of map-reduce:
Sample data:
{
   _id: ObjectId("50a8240b927d5d8b5891743c"),
   cust_id: "abc123",
   ord_date: new Date("Oct 04, 2012"),
   status: 'A',
   price: 25,
   items: [ { sku: "mmm", qty: 5, price: 2.5 },
            { sku: "nnn", qty: 5, price: 2.5 } ]
}
Return the total price per customer
Map function:
var mapFunction1 = function () {
   emit(this.cust_id, this.price);   // emit one key-value pair per order
};
Reduce function:
var reduceFunction1 = function (keyCustId, valuesPrices) {
   return Array.sum(valuesPrices);   // sum all prices emitted for this customer
};
Run the map-reduce:
db.orders.mapReduce(
   mapFunction1,
   reduceFunction1,
   { out: "map_reduce_example" }   // write the output to the "map_reduce_example" collection
)
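The results can then be read back from the output collection, for example:
db.map_reduce_example.find().sort( { _id: 1 } )   // each document looks like { _id: <cust_id>, value: <total price> }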
========
To be Continued ...
Document:
Http://pan.baidu.com/s/1jiFOM
Reading notes on the Aggregation chapter of the MongoDB documentation.