The sample data is simple:
> db.t2.find()
{"country": "China", "province": "Sh", "userid": "A"}
{"country": "China", "province": "Sh", "userid": "B"}
{"country": "China", "province": "Sh", "userid": "A"}
{"country": "China", "province": "Sh", "userid": "C"}
{"country": "China", "province": "BJ", "userid": "Da"}
{"country": "China", "province": "BJ", "userid": "FA"}
The requirement is to count the number of distinct userids in each country/province (the same userid is counted only once).
The process is as follows.
First, try counting directly:
> db.t2.aggregate([{$group: {"_id": {"Country": "$country", "Prov": "$province"}, "number": {$sum: 1}}}])
But the result is wrong:
{"_id": {"Country": "China", "Prov": "BJ"}, "number": 2}
{"_id": {"Country": "China", "Prov": "Sh"}, "number": 4}
The reason is that {$sum: 1} adds one per document and does not distinguish repeated userids: Sh has two documents with userid A in the data above, so A is counted twice.
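To see exactly where the 4 comes from, the single-stage $group can be mimicked in plain JavaScript. This is a minimal sketch, not MongoDB itself: it assumes Node.js and the six sample documents with lowercase field names, and `keyOf` and `naiveCount` are hypothetical helpers introduced only for illustration.

```javascript
// Sample documents from db.t2 (field names assumed lowercase, as in the query).
const docs = [
  { country: "China", province: "Sh", userid: "A" },
  { country: "China", province: "Sh", userid: "B" },
  { country: "China", province: "Sh", userid: "A" },
  { country: "China", province: "Sh", userid: "C" },
  { country: "China", province: "BJ", userid: "Da" },
  { country: "China", province: "BJ", userid: "FA" },
];

// Hypothetical helper: builds a composite group key, playing the role of the _id document.
const keyOf = (...parts) => JSON.stringify(parts);

// Emulates {$group: {_id: {Country, Prov}, number: {$sum: 1}}}:
// every document adds 1, so a repeated userid is counted again.
function naiveCount(rows) {
  const counts = new Map();
  for (const { country, province } of rows) {
    const key = keyOf(country, province);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

const naive = naiveCount(docs);
console.log(naive.get(keyOf("China", "Sh"))); // 4: userid A is counted twice
console.log(naive.get(keyOf("China", "BJ"))); // 2
```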
To solve this, first run a $group whose _id is made of the three fields country, province, and userid:
> db.t2.aggregate([{$group: {"_id": {"Country": "$country", "Province": "$province", "UID": "$userid"}}}])
The result is:
{"_id": {"Country": "China", "Province": "BJ", "UID": "FA"}}
{"_id": {"Country": "China", "Province": "BJ", "UID": "Da"}}
{"_id": {"Country": "China", "Province": "Sh", "UID": "C"}}
{"_id": {"Country": "China", "Province": "Sh", "UID": "B"}}
{"_id": {"Country": "China", "Province": "Sh", "UID": "A"}}
As you can see, this step leaves only one entry for each userid within a given country/province.
The second step then groups the output of the first step by country and province alone and counts the entries:
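The deduplicating effect of the first $group can be sketched in plain JavaScript as well. This is an in-memory illustration under the same assumptions (Node.js, the six sample documents, lowercase field names); `keyOf` and `distinctTriples` are hypothetical helpers, and unlike this sketch, MongoDB does not guarantee the order of $group output.

```javascript
const docs = [
  { country: "China", province: "Sh", userid: "A" },
  { country: "China", province: "Sh", userid: "B" },
  { country: "China", province: "Sh", userid: "A" },
  { country: "China", province: "Sh", userid: "C" },
  { country: "China", province: "BJ", userid: "Da" },
  { country: "China", province: "BJ", userid: "FA" },
];

// Hypothetical helper: composite key built from the grouped field values.
const keyOf = (...parts) => JSON.stringify(parts);

// Emulates stage 1, {$group: {_id: {Country, Province, UID}}}:
// grouping on all three fields keeps one entry per distinct combination.
function distinctTriples(rows) {
  const seen = new Map();
  for (const { country, province, userid } of rows) {
    const key = keyOf(country, province, userid);
    if (!seen.has(key)) {
      seen.set(key, { _id: { Country: country, Province: province, UID: userid } });
    }
  }
  return [...seen.values()];
}

const stage1 = distinctTriples(docs);
console.log(stage1.length); // 5: the duplicate (Sh, A) document collapsed into one entry
```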
> db.t2.aggregate([
  {$group: {"_id": {"Country": "$country", "Province": "$province", "UID": "$userid"}}},
  {$group: {"_id": {"Country": "$_id.Country", "Province": "$_id.Province"}, "Count": {$sum: 1}}}
])
This gives the correct result:
{"_id": {"Country": "China", "Province": "Sh"}, "Count": 3}
{"_id": {"Country": "China", "Province": "BJ"}, "Count": 2}
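Put together, the two $group stages amount to "deduplicate on (country, province, userid), then count per (country, province)". The sketch below reproduces that logic in memory; it assumes Node.js and the sample documents, and `keyOf` and `countDistinctUsers` are hypothetical names, not MongoDB APIs.

```javascript
const docs = [
  { country: "China", province: "Sh", userid: "A" },
  { country: "China", province: "Sh", userid: "B" },
  { country: "China", province: "Sh", userid: "A" },
  { country: "China", province: "Sh", userid: "C" },
  { country: "China", province: "BJ", userid: "Da" },
  { country: "China", province: "BJ", userid: "FA" },
];

// Hypothetical helper: composite key built from the grouped field values.
const keyOf = (...parts) => JSON.stringify(parts);

function countDistinctUsers(rows) {
  // Stage 1: one entry per distinct (country, province, userid).
  const triples = new Map();
  for (const { country, province, userid } of rows) {
    triples.set(keyOf(country, province, userid), { Country: country, Province: province });
  }
  // Stage 2: regroup the deduplicated entries by (country, province); each entry adds 1.
  const groups = new Map();
  for (const { Country, Province } of triples.values()) {
    const key = keyOf(Country, Province);
    if (!groups.has(key)) {
      groups.set(key, { _id: { Country, Province }, Count: 0 });
    }
    groups.get(key).Count += 1;
  }
  return [...groups.values()];
}

const result = countDistinctUsers(docs);
for (const row of result) {
  console.log(JSON.stringify(row)); // Sh -> Count 3, BJ -> Count 2
}
```

The second stage only ever sees deduplicated entries, which is why a plain {$sum: 1} becomes a distinct count there.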
To make the output easier to read, add a $project stage that moves the fields of _id to the top level:
> db.t2.aggregate([
  {$group: {"_id": {"Country": "$country", "Province": "$province", "UID": "$userid"}}},
  {$group: {"_id": {"Country": "$_id.Country", "Province": "$_id.Province"}, "Count": {$sum: 1}}},
  {$project: {"_id": 0, "Country": "$_id.Country", "Province": "$_id.Province", "Count": 1}}
])
{"Count": 3, "Country": "China", "Province": "Sh"}
{"Count": 2, "Country": "China", "Province": "BJ"}
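The $project stage does nothing more than drop _id and promote its subfields. A minimal in-memory sketch of that reshaping, assuming Node.js and taking the second-stage output shown earlier as input (`flatten` is a hypothetical helper, not a MongoDB API):

```javascript
// Second-stage output, copied from the correct result shown earlier.
const grouped = [
  { _id: { Country: "China", Province: "Sh" }, Count: 3 },
  { _id: { Country: "China", Province: "BJ" }, Count: 2 },
];

// Emulates {$project: {_id: 0, Country: "$_id.Country", Province: "$_id.Province", Count: 1}}:
// _id is excluded, its subfields become top-level fields, and Count is kept.
function flatten(rows) {
  return rows.map(({ _id, Count }) => ({
    Count,
    Country: _id.Country,
    Province: _id.Province,
  }));
}

const flattened = flatten(grouped);
console.log(JSON.stringify(flattened[0])); // {"Count":3,"Country":"China","Province":"Sh"}
```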