When using group in MongoDB, the grouping field cannot produce more than 20,000 unique keys

Source: Internet
Author: User
Tags: emit, mongodb

For one statistics job, I used group in MongoDB to summarize a collection of 1,000,000 (100W) records.

The result was an unexpected error message.

Error in executing GroupBy
Command 'group' failed: exception: group() can't handle more than 20000 unique keys (response: { "errmsg" : "exception: group() can't handle more than 20000 unique keys", "code" : 10043, "ok" : 0.0 })
Type: MongoDB.Driver.MongoCommandException


The exception shows that group in MongoDB is limited: the grouping field cannot produce more than 20,000 unique keys.

I have not investigated whether a MongoDB parameter can be set to remove this restriction.

However, MapReduce in MongoDB can still meet the statistical requirement.

For basic MapReduce usage, see: http://www.kafka0102.com/2010/09/329.html


Here is a brief explanation of how I understand the MongoDB MapReduce principle.

Map means mapping; reduce means simplification.

That is, when computing statistics, the information is first collected according to rules that you define (the map operation).

The data you want is then extracted from the collected information (the reduce operation).
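
The two phases can be imitated in plain JavaScript without a MongoDB server. The sketch below is only an illustration under assumptions: the sample documents, the city field, and the helper runMapReduce are hypothetical and are not part of MongoDB. In MongoDB emit is a global function, whereas here it is passed to the map function as an argument.

```javascript
// Hypothetical sample documents; not taken from the original data set.
var docs = [
    { city: "Beijing", age: 20 },
    { city: "Beijing", age: 30 },
    { city: "Shanghai", age: 40 }
];

// Hypothetical helper that imitates the two phases in memory.
function runMapReduce(documents, mapFn, reduceFn) {
    var groups = {};
    // Map phase: collect every emitted value under its key.
    documents.forEach(function (doc) {
        var emit = function (key, value) {
            (groups[key] = groups[key] || []).push(value);
        };
        mapFn.call(doc, emit);
    });
    // Reduce phase: collapse the values of each key into one result.
    var result = {};
    Object.keys(groups).forEach(function (key) {
        result[key] = reduceFn(key, groups[key]);
    });
    return result;
}

// Count the documents per city.
var countsByCity = runMapReduce(
    docs,
    function (emit) { emit(this.city, 1); },
    function (key, values) {
        return values.reduce(function (a, b) { return a + b; }, 0);
    }
);
console.log(countsByCity);
```

With the sample data, countsByCity holds 2 for Beijing and 1 for Shanghai.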

First, look at the syntax:

Introduction to the syntax

MapReduce is a command in MongoDB, which has the following syntax format:

db.runCommand(
 { mapreduce : <collection>,
   map : <mapfunction>,
   reduce : <reducefunction>
   [, query : <query filter object>]
   [, sort : <sort the query.  useful for optimization>]
   [, limit : <number of objects to return from collection>]
   [, out : <output-collection name>]
   [, keeptemp : <true|false>]
   [, finalize : <finalizefunction>]
   [, scope : <object where fields go into javascript global scope>]
   [, verbose : true]
 }
);



The first three parameters of this command are required, and I will not explain them. The optional parameters are described briefly here:
(1) query is very commonly used. It filters the documents in the map phase, which limits the set of records that the MapReduce operation works on.
(2) sort and limit are related to query. I initially thought they applied in the reduce phase, but they actually apply together with query in the map phase.
(3) By default MongoDB creates a temporary collection to store the MapReduce result. When the client connection is closed, or when the collection is explicitly dropped with collection.drop(), this temporary collection is deleted. This also means that keeptemp defaults to false; if keeptemp is true, the result collection is permanent. The generated collection name is not friendly, so you can use out to name the collection for persistent storage (keeptemp does not need to be specified in that case). When out is specified, the result is not written directly to out. It is written to a temporary collection first; then the out collection is dropped if it exists, and finally the temporary collection is renamed to out.
(4) finalize: applied to all results when MapReduce completes; usually not used.
(5) verbose: provides statistics on execution time.
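
To make the optional parameters concrete, the sketch below assembles a command document that combines query, sort, limit, out and verbose. The collection name users, the output name city_stats, and the age filter are hypothetical. The db.runCommand call only works inside a mongo shell of a MongoDB version that still supports this command, so it is left as a comment here.

```javascript
// Map and reduce functions in the style used in this article.
// emit is provided by MongoDB; mapFn is never called in this sketch.
function mapFn() {
    emit(this.city, { count: 1, age: this.age });
}
function reduceFn(key, values) {
    var reduced = { count: 0, age: 0 };
    values.forEach(function (val) {
        reduced.count += val.count;
        reduced.age += val.age;
    });
    return reduced;
}

// Command document; "users" and "city_stats" are hypothetical names.
var command = {
    mapreduce: "users",
    map: mapFn,
    reduce: reduceFn,
    query: { age: { $gte: 18 } }, // limits the documents fed to map
    sort: { city: 1 },            // sorts the documents fed to map
    limit: 100000,                // caps the documents fed to map
    out: "city_stats",            // named result collection
    verbose: true                 // include timing statistics
};

// In a mongo shell: var res = db.runCommand(command);
console.log(Object.keys(command).join(", "));
```

The command name mapreduce is kept as the first key of the object, since MongoDB reads the first field of a command document as the command to run.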

Step one: in the map function we usually call the emit function.

emit(
    this.city, // how to group
    {count: 1, age: this.age} // associated data point (document)
);

The emit function takes two parameters.

Parameter 1 is the field (or fields) you want to group by.

Parameter 2 holds the fields needed from each record in the group.

When map finishes, we can imagine the collected data stored in a map structure in which the grouping field is the key and the value is the set of data items emitted for that group.

As an example:

There is a table:

class, student
1, a
1, b
2, c
2, d

Then map stores the students grouped by class:

MAP1={1:a,1:b}, MAP2={2:c,2:d} (these are the values)

MAP={1:MAP1,2:MAP2};
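
The same intermediate structure can be built in plain JavaScript. This is a sketch under assumptions: the emit function below is only a stand-in for MongoDB's emit, and the rows array is a hand-written copy of the table from the example.

```javascript
// The table from the example, one object per row.
var rows = [
    { class: 1, student: "a" },
    { class: 1, student: "b" },
    { class: 2, student: "c" },
    { class: 2, student: "d" }
];

// Intermediate structure: grouping key -> list of emitted values.
var grouped = {};

// Stand-in for MongoDB's emit (an assumption for this sketch).
function emit(key, value) {
    if (!Object.prototype.hasOwnProperty.call(grouped, key)) {
        grouped[key] = [];
    }
    grouped[key].push(value);
}

// Map function: group by class, keep the student as the value.
function map() {
    emit(this.class, this.student);
}

// Run map once per row, with the row bound to "this".
rows.forEach(function (row) { map.call(row); });

console.log(JSON.stringify(grouped));
// → {"1":["a","b"],"2":["c","d"]}
```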

Step two: perform the reduce.

The reduce function is called for each key in the map, and it receives the key together with the array of values collected for that key.

The function itself:

function reduce(key, values) {
    // Initialize a doc (same format as the emitted value)
    var reduced = {count: 0, age: 0};
    values.forEach(function (val) {
        // Reduce logic: add up the ages and the counts
        reduced.age += val.age;
        reduced.count += val.count;
    });
    return reduced;
    // Alternative when no aggregation is needed: return values[0];
}
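
MongoDB may call reduce more than once for the same key and pass the result of an earlier call back in as one of the values, so the returned document must have the same format as the emitted values. The sketch below checks that property in plain JavaScript; the name reduceAges and the sample values are made up for the illustration.

```javascript
// Summing reduce in the style of this article.
function reduceAges(key, values) {
    var reduced = { count: 0, age: 0 };
    values.forEach(function (val) {
        reduced.age += val.age;
        reduced.count += val.count;
    });
    return reduced;
}

// Made-up emitted values for one key.
var emitted = [
    { count: 1, age: 20 },
    { count: 1, age: 30 },
    { count: 1, age: 40 }
];

// Reduce everything in a single call.
var oneCall = reduceAges("Beijing", emitted);

// Reduce a part first, then reduce the partial result with the rest.
var partial = reduceAges("Beijing", emitted.slice(0, 2));
var twoCalls = reduceAges("Beijing", [partial, emitted[2]]);

console.log(oneCall, twoCalls);
```

Both paths give a count of 3 and an age total of 90. A reduce that returned a differently shaped value, such as a bare number, would break on the second call.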


Step three: the optional parameters. Here we mainly introduce finalize.

   [, query : <query filter object>]
   [, sort : <sort the query.  useful for optimization>]
   [, limit : <number of objects to return from collection>]
   [, out : <output-collection name>]
   [, keeptemp : <true|false>]
   [, finalize : <finalizefunction>]
   [, scope : <object where fields go into javascript global scope>]
   [, verbose : true]



Finalize means "final": it processes the MapReduce result data once more, which is comparable to the HAVING clause after GROUP BY in a relational database.

For example, we may need to filter out the groups whose count is greater than 10, or compute an average, and so on.

function finalize(key, reduced) {
    /*
    Make final updates or calculations, for example:
    reduced.avgage = reduced.age / reduced.count;
    */
    if (reduced.count > 10) { return; } // filter out groups with more than 10 records
    return reduced;
}
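
The finalize step can also be tried out in plain JavaScript. In the sketch below the reduced results are made-up numbers, finalizeAvg is a hypothetical name, and the filtering is done in the surrounding loop rather than inside finalize; that split is a choice made for the illustration only.

```javascript
// Made-up reduced results, keyed by city.
var reducedByCity = {
    Beijing: { count: 12, age: 360 },
    Shanghai: { count: 4, age: 100 }
};

// Finalize in the style of this article, with the average enabled.
function finalizeAvg(key, reduced) {
    reduced.avgage = reduced.age / reduced.count;
    return reduced;
}

// Apply finalize to every key, then keep only the groups whose
// count is at most 10 (the HAVING-like step).
var finalResults = {};
Object.keys(reducedByCity).forEach(function (key) {
    var value = finalizeAvg(key, reducedByCity[key]);
    if (value.count <= 10) {
        finalResults[key] = value;
    }
});

console.log(finalResults);
```

With these numbers only Shanghai remains, with an average age of 25.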


