Introduction to MongoDB's new data statistics framework

Last Update:2014-06-15 Source: Internet

Author: User


Currently, complex data statistics in MongoDB require MapReduce, including the group by queries that are common in SQL. Writing a reduce function is cumbersome. MongoDB 2.1 will introduce a new data statistics framework (the aggregation framework) that makes statistical operations much easier for users.

Next we will look at several new operators:

$match

$match filters data according to a condition. For example:

db.runCommand({ aggregate : "article", pipeline : [
    { $match : { author : "dave" } }
]});

This filters the documents in the article collection, keeping only those whose author field is dave. It is equivalent to the familiar find command:

> db.article.find({ author : "dave" });

So what is the point of this command? The difference is that find returns its result directly as the final data, whereas $match is only one stage of a pipeline: it filters the data and passes the result on to the next stage for further statistical processing.
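The idea of a filter that feeds a later step, rather than returning to the caller, can be pictured with a minimal in-memory sketch in plain JavaScript. This is not MongoDB code: `matchStage` and `countStage` are hypothetical helpers, and the filter handles simple equality conditions only.

```javascript
// Hypothetical in-memory stand-in for $match: keeps documents whose fields
// equal every value in `condition` (simple equality only).
function matchStage(docs, condition) {
  return docs.filter((doc) =>
    Object.keys(condition).every((key) => doc[key] === condition[key])
  );
}

// Hypothetical follow-up stage: counts whatever the previous stage passed on.
function countStage(docs) {
  return [{ count: docs.length }];
}

// Sample data invented for this sketch.
const articles = [
  { title: "first", author: "dave" },
  { title: "second", author: "jane" },
  { title: "third", author: "dave" },
];

// The output of the match step becomes the input of the next step.
const matched = matchStage(articles, { author: "dave" });
const summary = countStage(matched);
console.log(summary); // [ { count: 2 } ]
```

A find call would stop at `matched`; in a pipeline that intermediate result is only the input to whatever stage comes next.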

$project

The $project command selects which fields to output, much like the column list of a SELECT in SQL. For example:

db.runCommand({ aggregate : "article", pipeline : [
    { $match : { author : "dave" } },
    { $project : {
        _id : 0,
        author : 1,
        tags : 1
    }}
]});

The preceding example returns only the author and tags fields of every document whose author is dave. (_id : 0 removes the _id field, which is returned by default.)

The same result can also be obtained with an ordinary find command:

> db.article.find({ author : "dave" }, { _id : 0, author : 1, tags : 1 });
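The field selection described above can be sketched in plain JavaScript as follows. This is an in-memory illustration only, not MongoDB code: `projectStage` is a hypothetical helper that supports just the inclusion form (fields marked 1, plus the `_id : 0` exclusion) and none of the other features of the real operator.

```javascript
// Hypothetical in-memory stand-in for an inclusion-style $project:
// copies only the fields marked 1; _id is kept unless it is set to 0.
function projectStage(docs, spec) {
  return docs.map((doc) => {
    const out = {};
    if (spec._id !== 0 && "_id" in doc) {
      out._id = doc._id;
    }
    for (const field of Object.keys(spec)) {
      if (field !== "_id" && spec[field] === 1 && field in doc) {
        out[field] = doc[field];
      }
    }
    return out;
  });
}

// Sample document invented for this sketch.
const docs = [
  { _id: 1, title: "this is your title", author: "dave", tags: ["fun", "nasty"], pageViews: 7 },
];

const projected = projectStage(docs, { _id: 0, author: 1, tags: 1 });
console.log(JSON.stringify(projected));
// [{"author":"dave","tags":["fun","nasty"]}]
```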

$unwind

The $unwind command is remarkable: it splits a document on an array field into multiple documents, each containing one element of the array.
For example, first add a document with the following command:

db.article.save( {
    title : "this is your title" ,
    author : "dave" ,
    posted : new Date(4121381470000) ,
    pageViews : 7 ,
    tags : [ "fun" , "nasty" ] ,
    comments : [
        { author :"barbara" , text : "this is interesting" } ,
        { author :"jenny" , text : "i like to play pinball", votes: 10 }
    ],
    other : { bar : 14 }
});

The tags field is an array. Next, we apply $unwind to this field:

db.runCommand({ aggregate : "article", pipeline : [
    { $unwind : "$tags" }
]});

This command splits documents on the tags field. Its result is as follows:

{
    "result" : [
        {
            "_id" : ObjectId("4eeeb5fef09a7c9170df094b"),
            "title" : "this is your title",
            "author" : "dave",
            "posted" : ISODate("2100-08-08T04:11:10Z"),
            "pageViews" : 7,
            "tags" : "fun",
            "comments" : [
                {
                    "author" : "barbara",
                    "text" : "this is interesting"
                },
                {
                    "author" : "jenny",
                    "text" : "i like to play pinball",
                    "votes" : 10
                }
            ],
            "other" : {
                "bar" : 14
            }
        },
        {
            "_id" : ObjectId("4eeeb5fef09a7c9170df094b"),
            "title" : "this is your title",
            "author" : "dave",
            "posted" : ISODate("2100-08-08T04:11:10Z"),
            "pageViews" : 7,
            "tags" : "nasty",
            "comments" : [
                {
                    "author" : "barbara",
                    "text" : "this is interesting"
                },
                {
                    "author" : "jenny",
                    "text" : "i like to play pinball",
                    "votes" : 10
                }
            ],
            "other" : {
                "bar" : 14
            }
        }
    ],
    "ok" : 1
}

The original tags field was an array of two elements. After $unwind runs, the document is split into two documents, and the tags field of each one holds a single element of the original array.
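The splitting behavior can be sketched in plain JavaScript. This is an in-memory illustration, not MongoDB code: `unwindStage` is a hypothetical helper, and it simply skips documents whose field is not an array, which is an assumption of this sketch rather than a statement about how the server treats that case.

```javascript
// Hypothetical in-memory stand-in for $unwind: emits one copy of the
// document per array element, with the array replaced by that element.
function unwindStage(docs, field) {
  const out = [];
  for (const doc of docs) {
    const values = doc[field];
    if (!Array.isArray(values)) {
      continue; // assumption of this sketch: non-array fields yield nothing
    }
    for (const value of values) {
      out.push({ ...doc, [field]: value });
    }
  }
  return out;
}

// Trimmed-down version of the article saved above.
const article = {
  title: "this is your title",
  author: "dave",
  pageViews: 7,
  tags: ["fun", "nasty"],
};

const unwound = unwindStage([article], "tags");
console.log(unwound.map((doc) => doc.tags)); // [ 'fun', 'nasty' ]
```

Every other field is copied unchanged into each output document, which is why both records in the result above share the same title, author, and _id.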

$group

The $group command is easy to understand: it combines multiple documents that share the same key value into a single document.
For example, write another document to the article collection with the following command, so that the collection holds two documents:

db.article.save( {
    title : "this is some other title" ,
    author : "jane" ,
    posted : new Date(978239834000) ,
    pageViews : 6 ,
    tags : [ "nasty" , "filthy" ] ,
    comments : [
        { author :"will" , text : "i don't like the color" } ,
        { author :"jenny" , text : "can i get that in green?" }
    ],
    other : { bar : 14 }
});

We can first use $unwind to split the documents by tag, as above, and then regroup them on the tags field, counting the documents for each tag and collecting all the authors that share a tag into an array. The command is written as follows:

db.runCommand({ aggregate : "article", pipeline : [
    { $unwind : "$tags" },
    { $group : {
        _id : "$tags",
        count : { $sum : 1 },
        authors : { $addToSet : "$author" }
    }}
]});

Now you can get the following results:

{
    "result" : [
        {
            "_id" : "filthy",
            "count" : 1,
            "authors" : [
                "jane"
            ]
        },
        {
            "_id" : "fun",
            "count" : 1,
            "authors" : [
                "dave"
            ]
        },
        {
            "_id" : "nasty",
            "count" : 2,
            "authors" : [
                "jane",
                "dave"
            ]
        }
    ],
    "ok" : 1
}
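The unwind-then-group computation behind this result can be sketched in plain JavaScript. This is an in-memory illustration, not MongoDB code: `groupByTag` is a hypothetical helper that hard-codes the tags key, a running count in place of `$sum : 1`, and a Set in place of `$addToSet`.

```javascript
// Hypothetical in-memory equivalent of the $unwind + $group pipeline above.
function groupByTag(docs) {
  const groups = new Map();
  for (const doc of docs) {
    for (const tag of doc.tags) {            // the $unwind step
      if (!groups.has(tag)) {
        groups.set(tag, { count: 0, authors: new Set() });
      }
      const group = groups.get(tag);
      group.count += 1;                      // count : { $sum : 1 }
      group.authors.add(doc.author);         // authors : { $addToSet : "$author" }
    }
  }
  // Convert each group to the same shape as the documents in "result".
  return [...groups.entries()].map(([tag, group]) => ({
    _id: tag,
    count: group.count,
    authors: [...group.authors],
  }));
}

// Trimmed-down versions of the two saved articles.
const savedArticles = [
  { author: "dave", tags: ["fun", "nasty"] },
  { author: "jane", tags: ["nasty", "filthy"] },
];

const grouped = groupByTag(savedArticles);
console.log(JSON.stringify(grouped, null, 2));
```

The sketch emits groups in first-seen order, so the order of the groups and of the authors within a group differs from the server output shown above; the counts and author sets are the same.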

The above introduces some of the new statistical commands planned for version 2.1. They are much more convenient to use. However, the biggest weakness of MongoDB's MapReduce is that it cannot execute in parallel within a single mongod, and that problem appears to persist: although the commands are organized as a pipeline, the stages still seem to run serially, one after another.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
