MongoDB aggregation, a powerful statistical framework: usage examples

Last Update:2017-01-18 Source: Internet

Author: User

Tags mongoclient mongodb

I heard that aggregation is used heavily in our project, so I did a lot of practice specifically on it.

The basic operations include:

• $project: extracts fields from subdocuments and can rename fields.

• $match: filters documents, providing the same capability as find.

• $limit: accepts a number n and returns the first n documents in the result set.

• $skip: accepts a number n and discards the first n documents in the result set. It is inefficient for large n, because the first n documents are still traversed.

• $unwind: splits a document that contains an array into multiple documents. For example, if a document has an array field a with 10 elements, $unwind produces 10 documents that differ only in field a.

• $group: the statistics operation; it also provides a series of subcommands

– $avg, $sum ...

• $sort: sorts the documents.
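
To make the $unwind description concrete, here is a minimal pure-Python sketch of the same idea. It does not talk to MongoDB; the function name unwind and the sample document are illustrative assumptions, and it ignores corner cases that the real stage handles (such as non-array fields).

```python
from copy import deepcopy


def unwind(doc, field):
    """Mimic $unwind: yield one copy of doc per element of the array doc[field]."""
    for element in doc.get(field, []):
        out = deepcopy(doc)
        out[field] = element  # the array is replaced by a single element
        yield out


# Hypothetical sample document with a three-element array field "a"
doc = {"_id": 1, "name": "Li Hao", "a": [10, 20, 30]}
for d in unwind(doc, "a"):
    print(d)
```

Each printed document keeps the same _id and name, and only field a differs.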

Python chapter
Experiment 1: student data statistics
1. Generate student data:

#!/usr/bin/env python
# coding=utf-8
from random import choice, randint

from pymongo import MongoClient

name1 = ["Yang", "Li", "Zhou"]
name2 = [
    "Chao", "Hao", "Gao", "Qi Gao", "Hao Hao", "Gao Gao",
    "Chao Hao", "Ji Gao", "Ji Hao", "Li Gao", "Li Hao",
]
provinces = ["Guang Dong", "Guang Xi", "Shan Dong", "Shan Xi", "He Nan"]

client = MongoClient('localhost', 27017)
db = client.student
sm = db.smessage
sm.delete_many({})  # start from an empty collection

for i in range(100):
    new_student = {
        "name": choice(name1) + choice(name2),
        # The upper age bound is garbled in the source; 30 is an assumption.
        "age": randint(1, 30),
        "province": choice(provinces),
        "subject": [
            {"name": "Chinese", "score": randint(0, 100)},
            {"name": "Math", "score": randint(0, 100)},
            {"name": "English", "score": randint(0, 100)},
            {"name": "Chemic", "score": randint(0, 100)},
        ],
    }
    print(new_student)
    sm.insert_one(new_student)

print(sm.count_documents({}))  # number of students inserted

OK, now there are 100 student records in the database.

Now I want the average age of the students in Guang Dong. In the mongo console, enter:

db.smessage.aggregate([
  {$match: {province: "Guang Dong"}},
  {$group: {_id: "$province", age: {$avg: "$age"}}}
])

Getting the average age of every province is even simpler, because the $match stage is dropped:

db.smessage.aggregate([
  {$group: {_id: "$province", age: {$avg: "$age"}}}
])

{"_id": "Guang Xi", "age": 15.19047619047619}
{"_id": "Guang Dong", "age": 16.05263157894737}
{"_id": "Shan Dong", "age": 17.44}
{"_id": "He Nan", "age": ...}
{"_id": "Shan Xi", "age": 16.41176470588235}
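
As a rough illustration of what $match followed by $group with $avg computes, the sketch below does the same calculation in plain Python on a small hand-made list. The function name average_age_by_province and the three sample students are assumptions for illustration only; real data would come from the smessage collection.

```python
from collections import defaultdict


def average_age_by_province(students, province=None):
    """Average age per province; an optional province filter plays the role of $match."""
    ages = defaultdict(list)
    for s in students:
        if province is not None and s["province"] != province:
            continue  # filtered out, like $match
        ages[s["province"]].append(s["age"])  # bucket by key, like $group on "$province"
    return {p: sum(v) / len(v) for p, v in ages.items()}  # like $avg


# Hypothetical sample data
students = [
    {"province": "Guang Dong", "age": 10},
    {"province": "Guang Dong", "age": 20},
    {"province": "Guang Xi", "age": 30},
]
print(average_age_by_province(students, "Guang Dong"))  # {'Guang Dong': 15.0}
print(average_age_by_province(students))  # {'Guang Dong': 15.0, 'Guang Xi': 30.0}
```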

To get the average score of each subject in Guang Dong:

db.smessage.aggregate([
  {$match: {province: "Guang Dong"}},
  {$unwind: "$subject"},
  {$group: {_id: {province: "$province", sujname: "$subject.name"}, per: {$avg: "$subject.score"}}}
])

With sorting added:

db.smessage.aggregate([
  {$match: {province: "Guang Dong"}},
  {$unwind: "$subject"},
  {$group: {_id: {province: "$province", sujname: "$subject.name"}, per: {$avg: "$subject.score"}}},
  {$sort: {per: 1}}
])
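
The four stages above can be mirrored step by step in plain Python, which may help when checking what the pipeline should return. This is only a sketch under assumptions: the function name subject_averages and the sample students are made up, and the field names (subject, name, score, sujname, per) follow the query above.

```python
from collections import defaultdict


def subject_averages(students, province):
    """Per-subject average score for one province, sorted ascending by average."""
    scores = defaultdict(list)
    for s in students:
        if s["province"] != province:  # $match
            continue
        for subj in s["subject"]:  # $unwind: one step per array element
            key = (s["province"], subj["name"])  # compound $group key
            scores[key].append(subj["score"])
    rows = [
        {"_id": {"province": p, "sujname": n}, "per": sum(v) / len(v)}  # $avg
        for (p, n), v in scores.items()
    ]
    return sorted(rows, key=lambda r: r["per"])  # $sort: {per: 1}


# Hypothetical sample data
students = [
    {"province": "Guang Dong", "subject": [{"name": "Math", "score": 80}, {"name": "Chinese", "score": 60}]},
    {"province": "Guang Dong", "subject": [{"name": "Math", "score": 100}, {"name": "Chinese", "score": 70}]},
    {"province": "Guang Xi", "subject": [{"name": "Math", "score": 10}]},
]
for row in subject_averages(students, "Guang Dong"):
    print(row)
```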

Experiment 2: finding the most prolific poster
With a collection of magazine articles, you might want to find the author who publishes the most articles. Suppose that each article is saved as a document in MongoDB.

1. Inserting data

#!/usr/bin/env python
# coding=utf-8
from random import choice

from pymongo import MongoClient

name = [
    'yangx',
    'yxxx',
    'laok',
    'kkk',
    'ji',
    'gaoxiao',
    'laoj',
    'meimei',
    'jj',
    'manwang',
]

title = [
    '123',
    '321',
    # two more titles are garbled in the source and are left out here
    'aaa',
    'bbb',
    'ccc',
    'sss',
    'aaaa',
    'cccc',
]

client = MongoClient('localhost', 30999)
db = client.test
bbs = db.bbs
bbs.delete_many({})  # start from an empty collection
for i in range(10000):
    new_card = {
        'author': choice(name),
        'title': choice(title),
    }
    bbs.insert_one(new_card)

print(bbs.count_documents({}))  # number of articles inserted

Now we have 10,000 article documents.

2. Project out the author field with $project

{"$project": {"author": 1}}

This syntax is similar to the field selector in a query: select the fields you want to project by specifying "fieldname": 1, or exclude unwanted fields by specifying "fieldname": 0.

After performing this "$project" operation, each document in the result set is represented in the form of {"_id": ID, "author": "AuthorName"}. These results will only exist in memory and will not be written to disk.

3. Group by author name with $group

{"$group": {"_id": "$author", "count": {"$sum": 1}}}

This groups the documents by author name; each time an author's name appears, that author's count is increased by 1.

First, you specify the field to group by, "author". This is done with "_id": "$author". Think of it this way: after this operation, each author corresponds to exactly one result document, so "author" becomes the unique identifier ("_id") of the document.

The second field means: add 1 to the "count" field for each document in the group. Note that the incoming documents have no "count" field; it is a new field created by "$group".

After this step, each document in the result set has this structure: {"_id": "authorName", "count": articleCount}.
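
The counting behaviour of "$sum": 1 is essentially what collections.Counter does in Python. The short sketch below shows the analogy on three made-up articles; it is an illustration of the idea, not how MongoDB implements the stage.

```python
from collections import Counter

# Hypothetical sample articles
articles = [
    {"author": "yangx", "title": "123"},
    {"author": "laok", "title": "aaa"},
    {"author": "yangx", "title": "bbb"},
]

# Each occurrence of an author adds 1 to that author's count, like {"$sum": 1}
counts = Counter(a["author"] for a in articles)

# Reshape into the {"_id": ..., "count": ...} form that $group produces
grouped = [{"_id": author, "count": n} for author, n in counts.items()]
print(grouped)  # [{'_id': 'yangx', 'count': 2}, {'_id': 'laok', 'count': 1}]
```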

4. Sort with $sort

{"$sort": {"count": -1}}

This stage sorts the documents in the result set by the "count" field in descending order.

5. Limit results to top 5 documents

{"$limit": 5}

This stage restricts the final result to the first 5 documents of the current result.
To actually run this in MongoDB, pass each of these stages, in order, to the aggregate() function:

> db.bbs.aggregate([
... {"$project": {"author": 1}},
... {"$group": {"_id": "$author", "count": {"$sum": 1}}},
... {"$sort": {"count": -1}},
... {"$limit": 5}
... ])

aggregate() returns the result documents: the 5 authors who published the most articles.

{"_id": "yangx", "count": 1028}
{"_id": "laok", "count": 1027}
{"_id": "kkk", "count": 1012}
{"_id": "yxxx", "count": 1010}
{"_id": "ji", "count": 1007}
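
From Python, the same pipeline is just a list of stage dictionaries. The sketch below builds it with a small helper; the name top_authors_pipeline and its parameter n are illustrative assumptions. Running it against a server is shown only in comments, since that needs pymongo and a running MongoDB instance.

```python
def top_authors_pipeline(n=5):
    """Build the four-stage pipeline: project, group, sort descending, limit."""
    return [
        {"$project": {"author": 1}},
        {"$group": {"_id": "$author", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
        {"$limit": n},
    ]


pipeline = top_authors_pipeline()
for stage in pipeline:
    print(stage)

# With pymongo installed and the server from this experiment running,
# the pipeline could be executed like this:
# from pymongo import MongoClient
# bbs = MongoClient("localhost", 30999).test.bbs
# for doc in bbs.aggregate(pipeline):
#     print(doc)
```

Keeping the pipeline as plain data makes it easy to print, reorder, or reuse with a different limit.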

Java Chapter

I loaded some randomly generated data into the database, without any indexes. The document structure is as follows:

 {
  "_id": ObjectId("509944545"),
  "province": "Hainan",
  "age": ...,
  "subjects": [
   {
    "name": "Chinese",
    "score": ...
   },
   {
    "name": "Math",
    "score": ...
   },
   {
    "name": "English",
    "score": 35
   }
  ],
  "name": "Liu Yu"
 }

Next, two features are implemented:

The average age of students in Shanghai
The average score of each subject in each province

Let's go through them one by one.

The average age of students in Shanghai

This requirement can be implemented in two steps: 1. Find the students in Shanghai. 2. Compute their average age. (Of course, you could also compute the average for every province and then pick out Shanghai.) So the approach is clear.

First use $match to select the students in Shanghai

{$match: {'province': 'Shanghai'}}

Then use $group to compute the average age

{$group: {_id: '$province', avgAge: {$avg: '$age'}}}

$avg is a $group subcommand that computes the average; others include $sum, $max, and so on.
The two stages above are equivalent to this SQL:

select province, avg(age)
 from student
 where province = 'Shanghai'
 group by province

Here is the Java code:

import com.mongodb.*; // legacy 2.x driver classes (Mongo, DB, DBCollection, ...)

Mongo m = new Mongo("localhost", 27017);
DB db = m.getDB("test");
DBCollection coll = db.getCollection("student");

/* Create the $match stage, which acts as the query */
DBObject match = new BasicDBObject("$match", new BasicDBObject("province", "Shanghai"));

/* Group stage */
DBObject groupFields = new BasicDBObject("_id", "$province");
groupFields.put("avgAge", new BasicDBObject("$avg", "$age"));
DBObject group = new BasicDBObject("$group", groupFields);

/* View the group results */
AggregationOutput output = coll.aggregate(match, group); // execute the aggregation command
System.out.println(output.getCommandResult());

Output:

{ "serverUsed" : "localhost/127.0.0.1:27017",
  "result" : [
    { "_id" : "Shanghai", "avgAge" : 32.09375 }
  ],
  "ok" : 1.0
}

That completes the first requirement; now for the other one.

The average score of each subject in each province

According to the document structure, subjects is an array, so it has to be 'split' before computing statistics.

The main processing steps are as follows:

1. Use $unwind to split the array.
2. Group by province and subject, and compute the average score of each subject.

Split the array with $unwind

{$unwind: '$subjects'}

Group by province and subject, and compute the average score

{$group: {
   _id: {
     subjname: '$subjects.name',  // group field 1: subjects.name, renamed to subjname
     province: '$province'        // group field 2: province (name unchanged)
   },
   avgScore: {
     $avg: '$subjects.score'      // average of subjects.score
   }
 }
}

The Java code is as follows:

import com.mongodb.*; // legacy 2.x driver classes (Mongo, DB, DBCollection, ...)

Mongo m = new Mongo("localhost", 27017);
DB db = m.getDB("test");
DBCollection coll = db.getCollection("student");

/* Create the $unwind stage, which splits the subjects array */
DBObject unwind = new BasicDBObject("$unwind", "$subjects");

/* Group stage: compound key of subject name and province */
DBObject groupFields = new BasicDBObject("_id", new BasicDBObject("subjname", "$subjects.name").append("province", "$province"));
groupFields.put("avgScore", new BasicDBObject("$avg", "$subjects.score"));
DBObject group = new BasicDBObject("$group", groupFields);

/* View the group results */
AggregationOutput output = coll.aggregate(unwind, group); // execute the aggregation command
System.out.println(output.getCommandResult());

Output:

{ "serverUsed" : "localhost/127.0.0.1:27017",
  "result" : [
    { "_id" : { "subjname" : "English", "province" : "Hainan" }, "avgScore" : 58.1 },
    { "_id" : { "subjname" : "Math", "province" : "Hainan" }, "avgScore" : 60.485 },
    { "_id" : { "subjname" : "Chinese", "province" : "Jiangxi" }, "avgScore" : 55.538 },
    { "_id" : { "subjname" : "English", "province" : "Shanghai" }, "avgScore" : 57.65625 },
    { "_id" : { "subjname" : "Math", "province" : "Guangdong" }, "avgScore" : 56.690 },
    { "_id" : { "subjname" : "Math", "province" : "Shanghai" }, "avgScore" : 55.671875 },
    { "_id" : { "subjname" : "Chinese", "province" : "Shanghai" }, "avgScore" : 56.734375 },
    { "_id" : { "subjname" : "English", "province" : "Yunnan" }, "avgScore" : 55.7301 },
    ...
  ],
  "ok" : 1.0
}

That completes the statistics... but wait, the result is a bit rough. The averages are there, but they are hard to read because the subjects of the same province are not grouped together.

The next step is an enhancement.

Side task: put the subject scores of the same province together (that is, the expected form is 'province': 'xxxxx', avgscores: [{'xxx': xxx}, ...]).

One thing needs to be done. On top of the previous result, first use $project to combine each subject name and its average score into one subdocument, like this:

{"subjinfo": {"subjname": "English", "avgScore": 58.1}, "province": "Hainan"}

Then group by province and push the average of each subject into one array, as follows:

Reshape the group results with $project

{$project: {province: "$_id.province", subjinfo: {"subjname": "$_id.subjname", "avgScore": "$avgScore"}}}

Group again with $group

{$group: {_id: '$province', avginfo: {$push: '$subjinfo'}}}
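
To see what the $project and second $group stages do to the earlier result, the sketch below applies the same reshaping in plain Python to three rows taken from the output above. The function name regroup_by_province is an illustrative assumption, and the average field is assumed to be named avgScore.

```python
def regroup_by_province(rows):
    """Collect per-subject averages into one document per province."""
    grouped = {}
    for row in rows:
        # $project: lift province out of _id and wrap name + average in subjinfo
        province = row["_id"]["province"]
        subjinfo = {"subjname": row["_id"]["subjname"], "avgScore": row["avgScore"]}
        # $group on province with $push: append subjinfo to that province's array
        grouped.setdefault(province, []).append(subjinfo)
    return [{"_id": p, "avginfo": infos} for p, infos in grouped.items()]


# Three rows from the first grouping result shown earlier
rows = [
    {"_id": {"subjname": "English", "province": "Hainan"}, "avgScore": 58.1},
    {"_id": {"subjname": "Math", "province": "Hainan"}, "avgScore": 60.485},
    {"_id": {"subjname": "English", "province": "Shanghai"}, "avgScore": 57.65625},
]
for doc in regroup_by_province(rows):
    print(doc)
```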

The Java code is as follows:

import com.mongodb.*; // legacy 2.x driver classes (Mongo, DB, DBCollection, ...)

Mongo m = new Mongo("localhost", 27017);
DB db = m.getDB("test");
DBCollection coll = db.getCollection("student");

/* Create the $unwind stage, which splits the subjects array */
DBObject unwind = new BasicDBObject("$unwind", "$subjects");

/* Group stage: compound key of subject name and province */
DBObject groupFields = new BasicDBObject("_id", new BasicDBObject("subjname", "$subjects.name").append("province", "$province"));
groupFields.put("avgScore", new BasicDBObject("$avg", "$subjects.score"));
DBObject group = new BasicDBObject("$group", groupFields);

/* Reshape the group result */
DBObject projectFields = new BasicDBObject();
projectFields.put("province", "$_id.province");
projectFields.put("subjinfo", new BasicDBObject("subjname", "$_id.subjname").append("avgScore", "$avgScore"));
DBObject project = new BasicDBObject("$project", projectFields);

/* Push the results of each province together */
DBObject groupAgainFields = new BasicDBObject("_id", "$province");
groupAgainFields.put("avginfo", new BasicDBObject("$push", "$subjinfo"));
DBObject reshapeGroup = new BasicDBObject("$group", groupAgainFields);

/* View the group results */
AggregationOutput output = coll.aggregate(unwind, group, project, reshapeGroup);
System.out.println(output.getCommandResult());

The results are as follows:

{ "serverUsed" : "localhost/127.0.0.1:27017",
  "result" : [
    { "_id" : "Liaoning", "avginfo" : [
        { "subjname" : "Math", "avgScore" : 56.46666666666667 },
        { "subjname" : "English", "avgScore" : 52.093333333333334 },
        { "subjname" : "Chinese", "avgScore" : 50.53333333333333 } ] },
    { "_id" : "Sichuan", "avginfo" : [
        { "subjname" : "Math", "avgScore" : 52.72727272727273 },
        { "subjname" : "English", "avgScore" : 55.90909090909091 },
        { "subjname" : "Chinese", "avgScore" : 57.59090909090909 } ] },
    { "_id" : "Chongqing", "avginfo" : [
        { "subjname" : "Chinese", "avgScore" : 56.077922077922075 },
        { "subjname" : "English", "avgScore" : 54.84415584415584 },
        { "subjname" : "Math", "avgScore" : 55.33766233766234 } ] },
    { "_id" : "Anhui", "avginfo" : [
        { "subjname" : "English", "avgScore" : 55.458333333333336 },
        { "subjname" : "Math", "avgScore" : 54.47222222222222 },
        { "subjname" : "Chinese", "avgScore" : 52.80555555555556 } ] }
  ],
  "ok" : 1.0
}
