Learn MongoDB 11: MongoDB Aggregation (Aggregation Pipeline Basics) (iii)
June 09, 2016 10:47:10
Hits: 15320
I. INTRODUCTION of Aggregate
db.collection.aggregate()是基于数据处理的聚合管道,每个文档通过一个由多个阶段(stage)组成的管道,可以对每个阶段的管道进行分组、过滤等功能,然后经过一系列的处理,输出相应的结果。 图来自https://docs.mongodb.com/manual/aggregation/ 官方网 我们通过这张图,可以清晰的了解Aggregate处理的过程 1、db.collection.aggregate()可以多个管道,能方便的进行数据的处理。 2、db.collection.aggregate()使用了MongoDB内置的原生操作,聚合效率非常高,支持类似于SQL Group By操作的功能,而不再需要用户编写自定义的JavaScript例程。 3、 每个阶段管道限制为100MB的内存。如果一个节点管道超过这个极限,MongoDB将产生一个错误。为了能够在处理大型数据集,可以设置allowDiskUse为true来在聚合管道节点把数据写入临时文件。这样就可以解决100MB的内存的限制。 4、db.collection.aggregate()可以作用在分片集合,但结果不能输在分片集合,MapReduce可以 作用在分片集合,结果也可以输在分片集合。5、db.collection.aggregate()方法可以返回一个指针(cursor),数据放在内存中,直接操作。跟Mongo shell 一样指针操作。 6、db.collection.aggregate()输出的结果只能保存在一个文档中,BSON Document大小限制为16M。可以通过返回指针解决,版本2.6中后面:DB.collect.aggregate()方法返回一个指针,可以返回任何结果集的大小。
Second, aggregate syntax:
Db.collection.aggregate (Pipeline, Options) "Pipeline $group parameter" the pipeline type is the array syntax: Db.collection.aggregate ([{ <stage>}, ...] $group: Groups The documents in the collection to be used for statistical results, $group first group the data according to key. $group syntax: {$group: {_id: <expression>, <field1>: {<accumulator1>: <expression1>}, ...}} _ID is the key $group to be grouped: the data that can be grouped performs the following expression calculation: $sum: Calculates the sum. $avg: Calculates the average. $min: Gets the minimum value for all documents in the collection according to the grouping. $max: Gets the maximum value for all documents in the collection according to the grouping. $push: Adds the value of the specified expression to an array. $addToSet: Adds the value of an expression to a collection (no duplicate values). $first: Returns the first document for each group, if one is sorted, by sort, if not by default in the stored order. $last: Returns the last document for each group, if one is sorted, by sort, if not by default in the stored order. We can use aggregation pipeline some of the same as SQL usage, we can clearly how to use pipeline SQL $av G avg $min min $max Max $group Group BY $sort ORDER by $limit Limit $sum sum () $sum Count ()
Three, pipeline $group a simple example
[SQL] View plain copy
Db.items.insert ([
{
"Quantity": 2,
"Price": 5.0,
"Pnumber": "P003",
},{
"Quantity": 2,
"Price": 8.0,
"Pnumber": "P002"
},{
"Quantity": 1,
"Price": 4.0,
"Pnumber": "P002"
},{
"Quantity": 2,
"Price": 4.0,
"Pnumber": "P001"
},{
"Quantity": 4,
"Price": 10.0,
"Pnumber": "P003"
},{
"Quantity": 10,
"Price": 20.0,
"Pnumber": "P001"
},{
"Quantity": 10,
"Price": 20.0,
"Pnumber": "P003"
},{
"Quantity": 5,
"Price": 10.0,
"Pnumber": "P002"
}
])
【$group】 1、将集合中的文档分组,可用于统计结果,$group首先将数据根据key进行分组。 _id 是要进行分组的key,如果_id为null 相当于select count(*) from table
"$sum"
1、 我们统计items有几条,相当于SQL: select count(1) as count from items
[SQL] View plain copy
Db.items.count ()
8
Db.items.aggregate ([{$group: {_id:null,count:{$sum: 1}}])
{"_id": null, "Count": 8}
2、我们统计一下数量,相当于SQL: select sum(quantity) as total from items
[SQL] View plain copy
Db.items.aggregate ([{$group: {_id:null,total:{$sum: "$quantity"}}])
{"_id": null, "Total": 36}
3, we group by the product type, then in the statistics of the number of sell, equivalent to Sql:select sum (quantity) as total from the items group by Pnumber
[SQL] View plain copy
Db.items.aggregate ([{$group: {_id: ' $pnumber ', total:{$sum: ' $quantity '}}])
{"_id": "P001", "Total": 12}
{"_id": "P002", "Total": 8}
{"_id": "P003", "Total": 16}
"$min, $max"
1, we group by the same product type, and then query the same product type to sell the most order details, equivalent to Sql:select max (quantity) as quantity from the items group by Pnumber
[SQL] View plain copy
Db.items.aggregate ([{$group: {_id: ' $pnumber ', max:{$max: ' $quantity '}}])
{"_id": "P001", "Max": 10}
{"_id": "P002", "Max": 5}
{"_id": "P003", "Max": 10}
2、我们通过相同的产品类型来进行分组,然后查询相同产品类型卖出最多的订单详情 ,相当于SQL:select min(quantity) as quantity from items group by pnumber
[SQL] View plain copy
Db.items.aggregate ([{$group: {_id: ' $pnumber ', min:{$min: ' $quantity '}}])
{"_id": "P001", "Min": 2}
{"_id": "P002", "Min": 1}
{"_id": "P003", "Min": 2}
3. We group by the same product type, count the quantity of each product, and then get the maximum quantity, equivalent to Sql:select max (t.total) from (select SUM (quantity) as full from the items group by Pnumber) T
[SQL] View plain copy
Db.items.aggregate ([{$group: {_id: ' $pnumber ', total:{$sum: ' $quantity '}}])
{"_id": "P001", "Total": 12}
{"_id": "P002", "Total": 8}
{"_id": "P003", "Total": 16}
Db.items.aggregate ([{$group: {_id: $pnumber, total:{$sum: ' $quantity '}}},{$group: {_id:null,max:{$max: ' $total '}}} ])
{"_id": null, "Max": 16}
"$avg"
先根据$group,在计算平均值,只会针对数字的进行计算,会对字符串忽略1、我们通过相同的产品类型来进行分组,然后查询每个订单详情相同产品类型卖出的平均价格,相当于SQL:select avg(price) as price from items group by pnumber
[SQL] View plain copy
Db.items.aggregate ([{$group: {_id: ' $pnumber ', price:{$avg: ' $price '}}])
{"_id": "P001", "Price": 12}
{"_id": "P002", "Price": 7.333333333333333}
{"_id": "P003", "Price": 11.666666666666666}
"$push"
将指定的表达式的值添加到一个数组中,这个值不要超过16M,不然会出现错误1、我们通过相同的产品类型来进行分组,然后查询每个相同产品卖出的数量放在数组里面
[SQL] View plain copy
Db.items.aggregate ([{$group: {_id: ' $pnumber ', quantitys:{$push: ' $quantity '}}])
{"_id": "P001", "Quantitys": [2, 10]}
{"_id": "P002", "Quantitys": [2, 1, 5]}
{"_id": "P003", "Quantitys": [2, 4, 10]}
[SQL] View plain copy
Db.items.aggregate ([{$group: {_id: "$pnumber", quantitys:{$push: {quantity: "$quantity", Price: "$price"}}}])
{"_id": "P001", "Quantitys": [{"Quantity": 2, "Price": 4}, {"Quantity": Ten, "Price": 20}]}
{"_id": "P002", "Quantitys": [{"Quantity": 2, "Price": 8}, {"Quantity": 1, "Price": 4}, {"Quantity": 5, "pric E ": 10}]}
{"_id": "P003", "Quantitys": [{"Quantity": 2, "Price": 5}, {"Quantity": 4, "Price": ten}, {"Quantity": Ten, "PR Ice ": 20}]}
"$addToSet"
将表达式的值添加到一个数组中(无重复值),这个值不要超过16M,不然会出现错误
[SQL] View plain copy
Db.items.aggregate ([{$group: {_id: ' $pnumber ', quantitys:{$addToSet: ' $quantity '}}])
{"_id": "P001", "Quantitys": [10, 2]}
{"_id": "P002", "Quantitys": [5, 1, 2]}
{"_id": "P003", "Quantitys": [10, 4, 2]}
"$first, $last"
$first: Returns the first document for each group, if one is sorted, by sort, if not by default in the stored order.
$last:返回每组最后一个文档,如果有排序,按照排序,如果没有按照默认的存储的顺序的最后个文档。
[SQL] View plain copy
Db.items.aggregate ([{$group: {_id: ' $pnumber ', quantityfrist:{$first: ' $quantity '}}])
{"_id": "P001", "Quantityfrist": 2}
{"_id": "P002", "Quantityfrist": 2}
{"_id": "P003", "Quantityfrist": 2}
我们这篇主要介绍了aggregate pipeline的$group 基础操作,后续介绍了 pipeline其他参数和options使用
MongoDB Aggregation (Aggregation pipeline on the basic article