I heard that aggregation is used a lot in real projects, so I spent some time practicing it specifically.
The basic operations include:
• $project: extracts fields (including fields from subdocuments) and can rename fields
• $match: filters documents, like a query
• $limit: accepts a number n and returns the first n documents in the result set
• $skip: accepts a number n and discards the first n documents in the result set. It is inefficient for large n, because the skipped documents still have to be traversed.
• $unwind: splits a document that contains an array into multiple documents. For example, if a document has an array field A with 10 elements, $unwind produces 10 documents that differ only in field A.
• $group: grouping and statistics; it provides a set of sub-operators
– $avg, $sum ...
• $sort: sorts the documents
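To make the $unwind behaviour described in the list more concrete, here is a minimal pure-Python sketch that emulates it on in-memory dictionaries. The function name `unwind` and the sample student are made up for illustration; this is a simplified model of the stage, not how MongoDB implements it.

```python
from copy import deepcopy


def unwind(documents, field):
    """Simplified emulation of $unwind: one output document per array element."""
    result = []
    for doc in documents:
        # Documents without the field (or with an empty array) produce no output.
        for element in doc.get(field, []):
            copy = deepcopy(doc)
            copy[field] = element  # the array is replaced by a single element
            result.append(copy)
    return result


# Hypothetical sample document with a two-element array.
student = {
    "name": "YangChao",
    "subject": [
        {"name": "Chinese", "score": 80},
        {"name": "Math", "score": 65},
    ],
}

unwound = unwind([student], "subject")
for doc in unwound:
    print(doc)
```

The two resulting documents share every field except `subject`, which is exactly the property the later $group stages rely on.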
Python part
Experiment one: student data statistics
1. Generate the student data:
#!/usr/bin/env python
# coding=utf-8
from random import choice, randint

from pymongo import MongoClient

name1 = ["Yang", "Li", "Zhou"]
name2 = ["Chao", "Hao", "Gao", "Qi Gao", "Hao Hao", "Gao Gao", "Chao Hao",
         "Ji Gao", "Ji Hao", "Li Gao", "Li Hao"]
provinces = ["Guang Dong", "Guang Xi", "Shan Dong", "Shan Xi", "He Nan"]
subjects = ["Chinese", "Math", "English", "Chemic"]

client = MongoClient('localhost', 27017)
db = client.student
sm = db.smessage
sm.delete_many({})  # start from an empty collection

for i in range(100):
    new_student = {
        "name": choice(name1) + choice(name2),
        # The age range was lost in the source; 1 to 30 is an assumption.
        "age": randint(1, 30),
        "province": choice(provinces),
        # One score between 0 and 100 per subject.
        "subject": [{"name": s, "score": randint(0, 100)} for s in subjects],
    }
    print(new_student)
    sm.insert_one(new_student)

print(sm.count_documents({}))
OK, the database now holds 100 student records.
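Now I want the average age of the Guangdong students. In the mongo console, enter:
db.smessage.aggregate([
{$match: {province: "Guang Dong"}},
{$group: {_id: "$province", age: {$avg: "$age"}}}
])
Getting the average age of every province is even simpler; just leave out the $match stage:
db.smessage.aggregate([
{$group: {_id: "$province", age: {$avg: "$age"}}}
])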
{"_id": "Guang Xi", "age": 15.19047619047619}
{"_id": "Guang Dong", "age": 16.05263157894737}
{"_id": "Shan Dong", "age": 17.44}
{"_id": "He Nan", "age": ...}
{"_id": "Shan Xi", "age": 16.41176470588235}
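As a cross-check of what $match followed by $group with $avg computes, the same result can be sketched in plain Python on a small in-memory list. The function name `average_age_by_province` and the three sample records are assumptions for illustration only.

```python
from collections import defaultdict


def average_age_by_province(students, province=None):
    """Emulate an optional $match on province followed by $group with $avg."""
    totals = defaultdict(lambda: [0, 0])  # province -> [sum of ages, count]
    for s in students:
        if province is not None and s["province"] != province:
            continue  # $match: skip documents from other provinces
        totals[s["province"]][0] += s["age"]
        totals[s["province"]][1] += 1
    # $group: one output document per province, keyed by _id
    return [{"_id": p, "age": total / count} for p, (total, count) in totals.items()]


# Hypothetical sample data, much smaller than the 100 generated students.
sample = [
    {"province": "Guang Dong", "age": 10},
    {"province": "Guang Dong", "age": 20},
    {"province": "Guang Xi", "age": 30},
]

print(average_age_by_province(sample, "Guang Dong"))
print(average_age_by_province(sample))
```

Passing a province corresponds to the pipeline with a $match stage; omitting it corresponds to grouping all provinces.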
If you want the average score of each subject in Guangdong province:
db.smessage.aggregate([
{$match: {province: "Guang Dong"}},
{$unwind: "$subject"},
{$group: {_id: {province: "$province", sujname: "$subject.name"}, per: {$avg: "$subject.score"}}}
])
Add sorting:
db.smessage.aggregate([
{$match: {province: "Guang Dong"}},
{$unwind: "$subject"},
{$group: {_id: {province: "$province", sujname: "$subject.name"}, per: {$avg: "$subject.score"}}},
{$sort: {per: 1}}
])
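The same pipeline can also be written from Python, since PyMongo's `Collection.aggregate` accepts the stages as a list of dictionaries. The sketch below only builds the list, so it runs without a server; the helper name `subject_average_pipeline` and its `ascending` parameter are hypothetical, and the commented-out part assumes a MongoDB instance on localhost:27017 with the student data from above.

```python
def subject_average_pipeline(province, ascending=True):
    """Build the $match / $unwind / $group / $sort stages as a list."""
    return [
        {"$match": {"province": province}},
        {"$unwind": "$subject"},
        {"$group": {
            "_id": {"province": "$province", "sujname": "$subject.name"},
            "per": {"$avg": "$subject.score"},
        }},
        {"$sort": {"per": 1 if ascending else -1}},
    ]


pipeline = subject_average_pipeline("Guang Dong")
for stage in pipeline:
    print(stage)

# With a running server (assumption: localhost:27017), it could be executed as:
# from pymongo import MongoClient
# sm = MongoClient("localhost", 27017).student.smessage
# for row in sm.aggregate(pipeline):
#     print(row)
```

Keeping the pipeline in a function makes it easy to reuse the same stages for another province or to flip the sort order.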
Experiment two: finding the most prolific poster
With a collection of magazine articles, you might want to find the author who publishes the most articles. Suppose that each article is saved as a document in MongoDB.
1. Inserting data
#!/usr/bin/env python
# coding=utf-8
from random import choice

from pymongo import MongoClient

name = [
    'yangx',
    'yxxx',
    'laok',
    'kkk',
    'ji',
    'gaoxiao',
    'laoj',
    'meimei',
    'jj',
    'manwang',
]
# Sample titles; any list of strings works here.
title = [
    '123',
    '321',
    'aaa',
    'bbb',
    'ccc',
    'sss',
    'aaaa',
    'cccc',
]

client = MongoClient('localhost', 30999)
db = client.test
bbs = db.bbs
bbs.delete_many({})  # start from an empty collection

for i in range(10000):
    new_card = {
        'author': choice(name),
        'title': choice(title),
    }
    bbs.insert_one(new_card)

print(bbs.count_documents({}))
Now we have 10,000 article documents.
2. Project out the author field with $project
{"$project": {"author": 1}}
This syntax is similar to the field selector in a query: you select the fields to project by specifying "fieldname": 1, or exclude unwanted fields by specifying "fieldname": 0.
After this "$project" operation, each document in the result set has the form {"_id": id, "author": "authorName"}. These results exist only in memory and are not written to disk.
3. Group by author name with $group
{"$group": {"_id": "$author", "count": {"$sum": 1}}}
This groups the documents by author name, and adds 1 to an author's count each time that author's name appears.
First you specify the field to group by, "author". This is done with "_id": "$author". Think of it this way: after this operation, each author corresponds to exactly one result document, so "author" becomes the unique identifier ("_id") of the document.
The second field means "add 1 to the count field for each document in the group". Note that the incoming documents have no "count" field; it is a new field created by "$group".
After this step, each document in the result set looks like this: {"_id": "authorName", "count": articleCount}.
4. Sort with $sort
{"$sort": {"count": -1}}
This operation sorts the documents in the result set in descending order of the "count" field.
5. Limit the results to the first 5 documents with $limit
{"$limit": 5}
This operation restricts the final result to the first 5 documents of the current result set.
To actually run this in MongoDB, pass the operations to aggregate() in order:
> db.bbs.aggregate([{"$project": {"author": 1}},
... {"$group": {"_id": "$author", "count": {"$sum": 1}}},
... {"$sort": {"count": -1}},
... {"$limit": 5}
... ])
aggregate() returns an array of documents containing the 5 authors who published the most articles.
{"_id": "yangx", "count": 1028}
{"_id": "laok", "count": 1027}
{"_id": "kkk", "count": 1012}
{"_id": "yxxx", "count": 1010}
{"_id": "ji", "count": 1007}
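For comparison, the $project, $group, $sort and $limit steps can be mirrored in plain Python with `collections.Counter`. This is only a sketch of what the pipeline computes, not a replacement for running it on the server; the function name `top_authors` and the six sample articles are made up for illustration.

```python
from collections import Counter


def top_authors(articles, n=5):
    """Emulate $project(author), $group($sum: 1), $sort(count: -1), $limit(n)."""
    # Keeping only the author field corresponds to $project;
    # counting occurrences corresponds to $group with {"$sum": 1}.
    counts = Counter(article["author"] for article in articles)
    # most_common(n) sorts by count in descending order and keeps the first n.
    return [{"_id": author, "count": count} for author, count in counts.most_common(n)]


# Hypothetical sample data.
articles = [
    {"author": "yangx", "title": "aaa"},
    {"author": "laok", "title": "bbb"},
    {"author": "yangx", "title": "ccc"},
    {"author": "kkk", "title": "sss"},
    {"author": "yangx", "title": "123"},
    {"author": "laok", "title": "321"},
]

print(top_authors(articles, n=2))
```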
I built some data in the database (randomly generated, with no indexes). The document structure is as follows:
{
    "_id": ObjectId("509944545"),
    "province": "Hainan",
    "age": ...,
    "subjects": [
        {
            "name": "Chinese",
            "score": ...
        },
        {
            "name": "Math",
            "score": ...
        },
        {
            "name": "English",
            "score": 35
        }
    ],
    "name": "Liu Yu"
}
Next, two features are implemented:
- The average age of Shanghai students
- The average score of each subject in each province
Let's go through them one by one.
Average age of Shanghai students
This requirement breaks down into two steps: 1. Find the students in Shanghai. 2. Compute their average age. (You could also compute the average for every province and then pick out Shanghai.) With that, the approach is clear.
First use $match to select the students in Shanghai
{$match: {'province': 'Shanghai'}}
Then use $group to compute the average age
{$group: {_id: '$province', avgAge: {$avg: '$age'}}}
$avg is a $group sub-operator that computes an average; others include $sum, $max, and so on.
The two commands above are equivalent to
select province, avg(age)
from student
where province = 'Shanghai'
group by province
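To see the SQL side of this equivalence in action, the query can be tried against an in-memory SQLite database using Python's standard `sqlite3` module. The table layout and the three sample rows are assumptions for illustration; only the SELECT statement comes from the text above.

```python
import sqlite3

# In-memory database with a hypothetical student table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, province TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("a", "Shanghai", 20), ("b", "Shanghai", 30), ("c", "Hainan", 40)],
)

# WHERE plays the role of $match; GROUP BY with AVG plays the role of $group with $avg.
rows = conn.execute(
    "SELECT province, AVG(age) FROM student "
    "WHERE province = 'Shanghai' GROUP BY province"
).fetchall()
print(rows)
conn.close()
```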
Here is the Java code:
Mongo m = new Mongo("localhost", 27017);
DB db = m.getDB("test");
DBCollection coll = db.getCollection("student");
/* Create the $match stage, which acts as the query */
DBObject match = new BasicDBObject("$match", new BasicDBObject("province", "Shanghai"));
/* Group operation */
DBObject groupFields = new BasicDBObject("_id", "$province");
groupFields.put("avgAge", new BasicDBObject("$avg", "$age"));
DBObject group = new BasicDBObject("$group", groupFields);
/* View the group results */
AggregationOutput output = coll.aggregate(match, group); // execute the aggregation command
System.out.println(output.getCommandResult());
Output results:
{"serverUsed": "localhost/127.0.0.1:27017", "result": [
{"_id": "Shanghai", "avgAge": 32.09375}
],
"ok": 1.0
}
That completes this requirement; now for the next one.
Average score of each subject in each province
First, look at the document structure again: subjects is an array, so it has to be "split" before any statistics can be computed.
The main processing steps are as follows:
1. Use $unwind to split the array. 2. Group by province and subject, and compute the average score of each subject.
$unwind: split the array
{$unwind: '$subjects'}
Group by province and subject, and compute the average score
{$group: {
    _id: {
        subjname: '$subjects.name',   // group field subjects.name, renamed to subjname
        province: '$province'         // group field province, name unchanged
    },
    AvgScore: {
        $avg: '$subjects.score'       // average of subjects.score
    }
}}
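The effect of $unwind followed by a $group on a compound key can be sketched in plain Python as follows. This is an in-memory emulation under simplifying assumptions (every student has a `subjects` list); the function name and the two sample students are hypothetical.

```python
from collections import defaultdict


def average_score_by_province_and_subject(students):
    """Emulate $unwind on subjects, then $group on (subjname, province) with $avg."""
    scores = defaultdict(list)  # (subject name, province) -> list of scores
    for student in students:
        # $unwind: treat every array element as its own row
        for subject in student["subjects"]:
            key = (subject["name"], student["province"])
            scores[key].append(subject["score"])
    # $group: one output document per compound key, with the average score
    return [
        {"_id": {"subjname": name, "province": province},
         "AvgScore": sum(values) / len(values)}
        for (name, province), values in scores.items()
    ]


# Hypothetical sample data following the document structure shown earlier.
students = [
    {"province": "Hainan",
     "subjects": [{"name": "Math", "score": 60}, {"name": "English", "score": 35}]},
    {"province": "Hainan",
     "subjects": [{"name": "Math", "score": 80}, {"name": "English", "score": 45}]},
]

flat = average_score_by_province_and_subject(students)
for row in flat:
    print(row)
```

Using a tuple as the dictionary key plays the same role as the embedded document in `_id`: each distinct combination of subject and province gets its own group.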
The Java code is as follows:
Mongo m = new Mongo("localhost", 27017);
DB db = m.getDB("test");
DBCollection coll = db.getCollection("student");
/* Create the $unwind operation, used to split the subjects array */
DBObject unwind = new BasicDBObject("$unwind", "$subjects");
/* Group operation */
DBObject groupFields = new BasicDBObject("_id", new BasicDBObject("subjname", "$subjects.name").append("province", "$province"));
groupFields.put("AvgScore", new BasicDBObject("$avg", "$subjects.score"));
DBObject group = new BasicDBObject("$group", groupFields);
/* View the group results */
AggregationOutput output = coll.aggregate(unwind, group); // execute the aggregation command
System.out.println(output.getCommandResult());
Output results:
{"serverUsed": "localhost/127.0.0.1:27017", "result": [
{"_id": {"subjname": "English", "province": "Hainan"}, "AvgScore": 58.1},
{"_id": {"subjname": "Math", "province": "Hainan"}, "AvgScore": 60.485},
{"_id": {"subjname": "Chinese", "province": "Jiangxi"}, "AvgScore": 55.538},
{"_id": {"subjname": "English", "province": "Shanghai"}, "AvgScore": 57.65625},
{"_id": {"subjname": "Math", "province": "Guangdong"}, "AvgScore": 56.690},
{"_id": {"subjname": "Math", "province": "Shanghai"}, "AvgScore": 55.671875},
{"_id": {"subjname": "Chinese", "province": "Shanghai"}, "AvgScore": 56.734375},
{"_id": {"subjname": "English", "province": "Yunnan"}, "AvgScore": 55.7301},
...
],
"ok": 1.0
}
That completes the statistics... wait, the result looks a bit rough. The numbers are there, but they are hard to read, because the subjects of the same province are not kept together.
So the next step is to improve it.
Follow-up task: put the subject scores of the same province together (that is, the expected form is 'province': 'xxxxx', avgscores: [{'xxx': xxx}, ...]).
To do this, build on the previous result: first use $project to combine each subject name with its average score, like this:
{"subjinfo": {"subjname": "English", "avgscore": 58.1}, "province": "Hainan"}
Then group by province and push the averages of all subjects into one array, as follows:
$project: reshape the group results
{$project: {province: "$_id.province", subjinfo: {"subjname": "$_id.subjname", "avgscore": "$AvgScore"}}}
$group: group again, this time by province
{$group: {_id: '$province', avginfo: {$push: '$subjinfo'}}}
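What the $project reshaping and the second $group with $push do to the flat results can be sketched in plain Python. The function name `group_scores_by_province` is hypothetical, and the three input rows are taken from the flat output shown earlier; this emulates the two stages in memory rather than calling MongoDB.

```python
from collections import defaultdict


def group_scores_by_province(flat_results):
    """Emulate the $project reshaping followed by $group with $push."""
    by_province = defaultdict(list)
    for row in flat_results:
        # $project: bundle the subject name with its average score
        subjinfo = {"subjname": row["_id"]["subjname"], "avgscore": row["AvgScore"]}
        # $group + $push: append each subjinfo to the list of its province
        by_province[row["_id"]["province"]].append(subjinfo)
    return [{"_id": province, "avginfo": infos} for province, infos in by_province.items()]


flat_results = [
    {"_id": {"subjname": "English", "province": "Hainan"}, "AvgScore": 58.1},
    {"_id": {"subjname": "Math", "province": "Hainan"}, "AvgScore": 60.485},
    {"_id": {"subjname": "English", "province": "Shanghai"}, "AvgScore": 57.65625},
]

grouped = group_scores_by_province(flat_results)
for row in grouped:
    print(row)
```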
The Java code is as follows:
Mongo m = new Mongo("localhost", 27017);
DB db = m.getDB("test");
DBCollection coll = db.getCollection("student");
/* Create the $unwind operation, used to split the subjects array */
DBObject unwind = new BasicDBObject("$unwind", "$subjects");
/* Group operation */
DBObject groupFields = new BasicDBObject("_id", new BasicDBObject("subjname", "$subjects.name").append("province", "$province"));
groupFields.put("AvgScore", new BasicDBObject("$avg", "$subjects.score"));
DBObject group = new BasicDBObject("$group", groupFields);
/* Reshape the group result */
DBObject projectFields = new BasicDBObject();
projectFields.put("province", "$_id.province");
projectFields.put("subjinfo", new BasicDBObject("subjname", "$_id.subjname").append("avgscore", "$AvgScore"));
DBObject project = new BasicDBObject("$project", projectFields);
/* Push the results together */
DBObject groupAgainFields = new BasicDBObject("_id", "$province");
groupAgainFields.put("avginfo", new BasicDBObject("$push", "$subjinfo"));
DBObject reshapeGroup = new BasicDBObject("$group", groupAgainFields);
/* View the group results */
AggregationOutput output = coll.aggregate(unwind, group, project, reshapeGroup);
System.out.println(output.getCommandResult());
The results are as follows:
{"serverUsed": "localhost/127.0.0.1:27017", "result": [
{"_id": "Liaoning", "avginfo": [{"subjname": "Math", "avgscore": 56.46666666666667}, {"subjname": "English", "avgscore": 52.093333333333334}, {"subjname": "Chinese", "avgscore": 50.53333333333333}]},
{"_id": "Sichuan", "avginfo": [{"subjname": "Math", "avgscore": 52.72727272727273}, {"subjname": "English", "avgscore": 55.90909090909091}, {"subjname": "Chinese", "avgscore": 57.59090909090909}]},
{"_id": "Chongqing", "avginfo": [{"subjname": "Chinese", "avgscore": 56.077922077922075}, {"subjname": "English", "avgscore": 54.84415584415584}, {"subjname": "Math", "avgscore": 55.33766233766234}]},
{"_id": "Anhui", "avginfo": [{"subjname": "English", "avgscore": 55.458333333333336}, {"subjname": "Math", "avgscore": 54.47222222222222}, {"subjname": "Chinese", "avgscore": 52.80555555555556}]}
], "ok": 1.0}