While computing a statistic, I used MongoDB's group command to summarize a collection of about 1,000,000 (100W) records.
The result was an unexpected error message.
Error in executing GroupBy
Command 'group' failed: exception: group() can't handle more than 20000 unique keys (response: { "errmsg" : "exception: group() can't handle more than 20000 unique keys", "code" : 10043, "ok" : 0.0 })
Type: MongoDB.Driver.MongoCommandException
The exception shows that MongoDB's group command is limited: the number of distinct group keys cannot exceed 20000.
I did not look into whether a MongoDB parameter can be set to remove this restriction.
However, MapReduce in MongoDB can still meet the statistical requirement.
For an introduction to MapReduce, see: http://www.kafka0102.com/2010/09/329.html
Here is a simple description of how I understand the principle of MongoDB MapReduce.
Map means mapping; reduce means simplification.
That is, at statistics time you first collect information according to rules that you define (the map operation).
Then you extract the data you want from the collected information (the reduce operation).
First, look at the syntax:
Introduction to the syntax
MapReduce is a command in MongoDB with the following syntax:
db.runCommand(
{ mapreduce : <collection>,
  map : <mapfunction>,
  reduce : <reducefunction>
  [, query : <query filter object>]
  [, sort : <sort the query. useful for optimization>]
  [, limit : <number of objects to return from collection>]
  [, out : <output-collection name>]
  [, keeptemp : <true|false>]
  [, finalize : <finalizefunction>]
  [, scope : <object where fields go into javascript global scope>]
  [, verbose : true]
}
);
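As a rough illustration of the syntax above, the command document might be filled in as follows. The collection name users, the output name city_counts and the filter on age are hypothetical, chosen only for this sketch; the city field follows the emit example used later in this post. The db.runCommand call is left as a comment because it needs a running mongo shell.

```javascript
// Map function: MongoDB calls it once per document, with `this` bound to the document.
// `emit` is supplied by MongoDB when the command runs on the server.
function mapFunction() {
    emit(this.city, 1);
}

// Reduce function: adds up the values emitted for one key.
function reduceFunction(key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i];
    }
    return total;
}

// Command document; every concrete name here is an assumption for illustration.
var command = {
    mapreduce: "users",            // hypothetical input collection
    map: mapFunction,
    reduce: reduceFunction,
    query: { age: { $gt: 18 } },   // hypothetical filter applied before map
    out: "city_counts",            // hypothetical output collection
    verbose: true
};

// In the mongo shell: db.runCommand(command);
```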
For this command, the three required parameters (mapreduce, map, reduce) need no explanation. The optional parameters are described briefly below:
(1) query is the most commonly used option; it filters documents in the map phase to limit the range of records the MapReduce operation covers.
(2) sort and limit go together with query. I initially thought they applied in the reduce phase, but they actually apply together with query in the map phase.
(3) By default, MongoDB creates a temporary collection to store the MapReduce results; this temporary collection is deleted when the client connection is closed or when collection.drop() is called explicitly. In other words, keeptemp defaults to false; if keeptemp is true, the result collection is permanent. The generated collection name is not friendly, so you can use out to name the collection for persistent storage (keeptemp does not need to be specified in that case). When out is specified, the results are not written directly to out: they are first written to a temporary collection, then any existing out collection is dropped, and finally the temporary collection is renamed to out.
(4) finalize: applied to all results when MapReduce completes; usually not needed.
(5) verbose: provides statistics about execution time.
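To make points (1) and (2) concrete, here is a plain JavaScript sketch (it does not talk to MongoDB) of the order in which these options take effect: query, sort and limit shape the input, and only the surviving documents reach map. The sample documents and the helper name selectInput are made up for illustration.

```javascript
// Hypothetical sample documents standing in for a collection.
var docs = [
    { city: "Beijing", age: 30 },
    { city: "Shanghai", age: 17 },
    { city: "Beijing", age: 25 },
    { city: "Shanghai", age: 40 }
];

// Simulates query -> sort -> limit, all applied before the map phase.
function selectInput(documents, query, sortKey, limit) {
    return documents
        .filter(query)                    // query: keep only matching documents
        .sort(function (a, b) {           // sort: ascending by sortKey
            return a[sortKey] - b[sortKey];
        })
        .slice(0, limit);                 // limit: cap the number of input documents
}

var input = selectInput(docs, function (d) { return d.age >= 18; }, "age", 2);

// Only `input` is handed to map; each document emits one key/value pair.
var emitted = input.map(function (d) {
    return { key: d.city, value: { count: 1, age: d.age } };
});

console.log(emitted.length); // 2
```

Because limit cuts the input rather than the output, the number of result groups can be smaller than the limit value.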
Step one: in the map function we usually call the emit function.
emit(
    this.city, // how to group
    { count: 1, age: this.age } // associated data point (document)
);
The emit function takes two parameters.
Parameter 1 is the field you want to group by.
Parameter 2 holds the fields needed from each record in the group.
When map finishes, we can imagine that the collected data is stored in a map collection, where the group field is the key and the value is the data emitted for that group.
As an example, take this table:
class, student
1, a
1, b
2, c
2, d
After the map phase, the map collection stores the contents of each class:
map1 = {1:a, 1:b}, map2 = {2:c, 2:d} (these are the values)
map = {1:map1, 2:map2};
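The grouping described in this example can be imitated in plain JavaScript. This is only a sketch of the idea, not how MongoDB implements it: the emit function here is a stand-in written for the demonstration, and each group's values are kept in an array.

```javascript
// The class/student table from the example.
var rows = [
    { class: 1, student: "a" },
    { class: 1, student: "b" },
    { class: 2, student: "c" },
    { class: 2, student: "d" }
];

var groups = {}; // key -> array of emitted values

// Stand-in for MongoDB's emit: append the value to the group of its key.
function emit(key, value) {
    if (!Object.prototype.hasOwnProperty.call(groups, key)) {
        groups[key] = [];
    }
    groups[key].push(value);
}

// Map function written the way MongoDB expects: `this` is the current row.
function map() {
    emit(this.class, this.student);
}

// Run map once per row, binding `this` to the row.
rows.forEach(function (row) {
    map.call(row);
});

console.log(JSON.stringify(groups)); // {"1":["a","b"],"2":["c","d"]}
```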
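Step two: run reduce.
The reduce function is called for each key in the map collection. MongoDB may call it more than once for the same key, so its return value must have the same format as the emitted values.
The function looks like this: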
function reduce(key, values) {
    /*
    var reduced = { count: 0, age: 0 }; // initialize a doc (same format as emitted value)
    values.forEach(function (val) {
        reduced.age += val.age; // reduce logic
        reduced.count += val.count;
    });
    return reduced;
    */
    return values[0];
}
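The commented-out logic in the function above sums the ages and the counts. Here is a minimal sketch of how that logic behaves on hand-written input, run in plain JavaScript rather than inside MongoDB. The key "Beijing" and the ages are invented sample values.

```javascript
// Same logic as the commented-out body: sum count and age over all values.
function reduce(key, values) {
    var reduced = { count: 0, age: 0 }; // same format as the emitted value
    values.forEach(function (val) {
        reduced.age += val.age;
        reduced.count += val.count;
    });
    return reduced;
}

// Values that map would have emitted for one key (sample data).
var emittedValues = [
    { count: 1, age: 20 },
    { count: 1, age: 30 },
    { count: 1, age: 40 }
];

var result = reduce("Beijing", emittedValues);
console.log(result.count, result.age); // 3 90

// Reducing a partial result together with the remaining value gives the same totals,
// which is why the output format must match the emitted format.
var partial = reduce("Beijing", emittedValues.slice(0, 2));
var again = reduce("Beijing", [partial, emittedValues[2]]);
console.log(again.count, again.age); // 3 90
```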
Step three: the optional parameters. Here we mainly introduce finalize.
[, query : <query filter object>]
[, sort : <sort the query. useful for optimization>]
[, limit : <number of objects to return from collection>]
[, out : <output-collection name>]
[, keeptemp : <true|false>]
[, finalize : <finalizefunction>]
[, scope : <object where fields go into javascript global scope>]
[, verbose : true]
Finalize, as the name suggests, processes the MapReduce results one more time at the end; it is comparable to the HAVING clause after GROUP BY in a relational database.
For example, we may need to filter out groups whose count is greater than 10, or compute an average, and so on.
function finalize(key, reduced) {
    /*
    // make final updates or calculations
    reduced.avgAge = reduced.age / reduced.count;
    */
    if (reduced.count > 10) { return; } // filter out groups with more than 10 records
    return reduced;
}
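The averaging variant mentioned in the comment can be tried out in plain JavaScript. The sketch below applies a finalize function to hand-written reduce results; the city names and totals are invented, and the loop over keys only imitates what MongoDB does after the reduce phase.

```javascript
// Finalize variant that computes the average age for each group.
function finalize(key, reduced) {
    reduced.avgAge = reduced.age / reduced.count;
    return reduced;
}

// Sample reduce output, keyed by city (invented numbers).
var reducedByCity = {
    Beijing: { count: 4, age: 120 },
    Shanghai: { count: 2, age: 50 }
};

// Apply finalize once per key, producing { _id, value } result documents.
var output = Object.keys(reducedByCity).map(function (key) {
    return { _id: key, value: finalize(key, reducedByCity[key]) };
});

console.log(output[0]._id, output[0].value.avgAge); // Beijing 30
console.log(output[1]._id, output[1].value.avgAge); // Shanghai 25
```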