Introduction to MongoDB's new data statistics framework

Source: Internet
Author: User

CurrentMongoDBComplex dataStatisticsMapReduce is required for computing, including group by queries that are commonly used in SQL. It is troublesome to write a reduce statement. In MongoDB2.1, a new data statistics computing framework will be introduced, making it easier for users to perform statistics operations.

Next we will look at several new operators:

$ Match

$ Match is used to filter data by setting a condition. For example:

db.runCommand({ aggregate : "article", pipeline : [    { $match : { author : "dave" } }]});

This is equivalent to filtering records in the collection of articles. The filtering condition is that the author attribute value is dave, which is equivalent to the common find command, for example:

> db.article.find({ author : "dave" });

So, what is the purpose of this command? Unlike find, the result of find is directly returned as the final data, while $ match is only a part of pipeline. It filters the result data and can perform the next-level statistical operation.

$ Project

The $ project command is used to set data filtering fields, just like the fields required by select in SQL. Example:

db.runCommand({ aggregate : "article", pipeline : [    { $match : { author : "dave" } },    { $project : {        _id : 0,author : 1,        tags : 1    }}]});

In the preceding example, the author and tags fields of all the records whose author is dave are obtained. (_ Id: 0 indicates removing the _ id field returned by default)

In fact, the above function can also be implemented using the find command at ordinary times, such:

> db.article.find({ author : "dave" }, { _id : 0, author : 1, tags : 1);
$ Unwind

The $ unwind command is amazing. It can split data of an array type field into multiple records, each of which contains an attribute in the array.
For example, you can use the following command to add a record:

db.article.save( {    title : "this is your title" ,    author : "dave" ,    posted : new Date(4121381470000) ,    pageViews : 7 ,    tags : [ "fun" , "nasty" ] ,    comments : [        { author :"barbara" , text : "this is interesting" } ,        { author :"jenny" , text : "i like to play pinball", votes: 10 }    ],    other : { bar : 14 }});

The tags field is an array. Next we apply the $ unwind operation on this field.

db.runCommand({ aggregate : "article", pipeline : [    { $unwind : "$tags" }]});

The command above means splitting by tags field. The execution result of this command is as follows:

{        "result" : [                {                        "_id" : ObjectId("4eeeb5fef09a7c9170df094b"),                        "title" : "this is your title",                        "author" : "dave",                        "posted" : ISODate("2100-08-08T04:11:10Z"),                        "pageViews" : 7,                        "tags" : "fun",                        "comments" : [                                {                                        "author" : "barbara",                                        "text" : "this is interesting"                                },                                {                                        "author" : "jenny",                                        "text" : "i like to play pinball",                                        "votes" : 10                                }                        ],                        "other" : {                                "bar" : 14                        }                },                {                        "_id" : ObjectId("4eeeb5fef09a7c9170df094b"),                        "title" : "this is your title",                        "author" : "dave",                        "posted" : ISODate("2100-08-08T04:11:10Z"),                        "pageViews" : 7,                        "tags" : "nasty",                        "comments" : [                                {                                        "author" : "barbara",                                        "text" : "this is interesting"                                },                                {                                        "author" : "jenny",                                        "text" : "i like to play pinball",                                        "votes" : 10                                }                        ],                        "other" : {                                "bar" : 14                        }                }        ],        "ok" : 1}

We can see that the original tags field is an array containing two elements. After the $ unwind command is run, it is split into two records, the tags field of each record is an element in the original array.

$ Group

The $ group command is easy to understand. The function is to organize multiple data records with the same key value into one record based on a key.
For example, if we use the following command to write a record to the article collection, we will have two records:

db.article.save( {    title : "this is some other title" ,    author : "jane" ,    posted : new Date(978239834000) ,    pageViews : 6 ,    tags : [ "nasty" , "filthy" ] ,    comments : [        { author :"will" , text : "i don't like the color" } ,        { author :"jenny" , text : "can i get that in green?" }    ],    other : { bar : 14 }});

We can split the above $ unwind into multiple records by tags, then reorganize the records by tags field, and put all author corresponding to the same tag in an array. Write as follows:

db.runCommand({ aggregate : "article", pipeline : [    { $unwind : "$tags" },    { $group : {_id : "$tags",        count : { $sum : 1 },authors : { $addToSet : "$author" }    }}]});

Now you can get the following results:

{        "result" : [                {                        "_id" : "filthy",                        "count" : 1,                        "authors" : [                                "jane"                        ]                },                {                        "_id" : "fun",                        "count" : 1,                        "authors" : [                                "dave"                        ]                },                {                        "_id" : "nasty",                        "count" : 2,                        "authors" : [                                "jane",                                "dave"                        ]                }        ],        "ok" : 1}

The above is an introduction to some new statistical commands that will be launched in version 2.1. They provide us a lot of convenience in terms of ease of use, but the biggest weakness of MongoDB MapReduce is that, parallel Execution fails in a single mongod, and it seems that the problem persists. Although the pipeline organization mode is used in the command, it seems that it is complete in serial and segment-down.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.