Suppose we have a MongoDB collection.
Taking this simple collection as an example, we need to find out how many different mobile phone numbers it contains. The first thought is to use distinct:
db.tokencaller.distinct('Caller').length
If you want to see the distinct phone numbers themselves, omit the length property, since db.tokencaller.distinct('Caller') returns an array of all the distinct mobile phone numbers.
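To make the return value concrete, here is a minimal sketch in plain JavaScript that runs without a MongoDB server. The sample documents and the helper distinctValues are made up for illustration; the helper only imitates the basic behaviour of distinct on a simple string field and is not the MongoDB API.

```javascript
// Hypothetical sample documents standing in for the tokencaller collection.
const sampleDocs = [
  { Caller: "13800000001" },
  { Caller: "13800000002" },
  { Caller: "13800000001" }, // repeated number
  { Caller: "13900000003" }
];

// Illustrative stand-in for db.tokencaller.distinct('Caller'):
// returns each value of the field once, as an array.
function distinctValues(documents, field) {
  return [...new Set(documents.map((doc) => doc[field]))];
}

const numbers = distinctValues(sampleDocs, "Caller");
console.log(numbers);        // the deduplicated phone numbers
console.log(numbers.length); // how many different numbers there are
```

Because the result is an ordinary array, length gives the count and the array itself gives the numbers.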
But does this work in every case? No. When the collection holds a very large number of records, this kind of count often fails with error 10044 and the message "exception: distinct too big, 16mb cap". Below we try to work around it in other ways.
Another way is to use runCommand combined with distinct:
db.runCommand({"distinct": "tokencaller", "key": "Caller"})
The values field of the result holds the deduplicated mobile phone numbers, and the result is in JSON format. Next I checked whether the size of values could be read directly, because if the collection holds a large amount of data, displaying every deduplicated number is obviously inappropriate. So I tried reading the length of values from the command result.
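As a rough sketch of that step: the distinct command returns a document with a values array and an ok field, so the count is the length of values. The object below is a hand-written stand-in for such a result (the phone numbers are made up), so the snippet runs without a server.

```javascript
// Hand-written stand-in for the document returned by
// db.runCommand({"distinct": "tokencaller", "key": "Caller"}).
const commandResult = {
  values: ["13800000001", "13800000002", "13900000003"],
  ok: 1
};

// Read only the number of distinct values instead of printing them all.
const distinctCount = commandResult.values.length;
console.log(distinctCount);

// In the mongo shell the same idea would be written as:
//   db.runCommand({"distinct": "tokencaller", "key": "Caller"}).values.length
```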
This turned out to work. So I tried the same approach on the large collection to see whether the result could be retrieved, and found that there was no length property. I suspect this is related to the MongoDB client version, but that still needs to be verified.
Neither of the two approaches worked, so I then tried MapReduce.
It writes the results of the query to a collection named "Callerstatis".
Then db.Callerstatis.count() tells us how many different mobile numbers there are.
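The MapReduce idea above can be sketched as follows. This is only an illustration, not the exact code I ran: the function names mapCaller and reduceCaller and the sample documents are made up, and the small loop at the end is a local stand-in for the server so the snippet runs in plain JavaScript.

```javascript
// Hypothetical sample documents standing in for the tokencaller collection.
const callDocs = [
  { Caller: "13800000001" },
  { Caller: "13800000002" },
  { Caller: "13800000001" }, // repeated number
  { Caller: "13900000003" }
];

// Local stand-in for emit(); the mongo shell provides its own emit().
const emitted = new Map();
function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

// Map: emit each document's Caller with a count of 1.
function mapCaller() {
  emit(this.Caller, 1);
}

// Reduce: add up the counts emitted for one Caller.
function reduceCaller(key, values) {
  return values.reduce((sum, n) => sum + n, 0);
}

// Simplified local run: one output document per distinct Caller.
for (const doc of callDocs) mapCaller.call(doc);
const callerStats = [...emitted].map(([key, values]) => ({
  _id: key,
  value: reduceCaller(key, values)
}));

console.log(callerStats.length); // number of distinct callers

// In the mongo shell the corresponding call would look like:
//   db.tokencaller.mapReduce(mapCaller, reduceCaller, { out: "Callerstatis" })
//   db.Callerstatis.count()
```

Each document in the output collection corresponds to one distinct number, which is why counting that collection gives the deduplicated total.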
I also tried this approach on a large collection, but it failed as well (T_T). If anyone knows a good way, please tell me; I would be grateful ^_^
How to count the data after deduplication in the MongoDB collection