Simulating big data search scenarios - Xudong He
Use a script to insert 10 million documents into the student collection used in this course's Map&Reduce example, keeping the fields unchanged.
Use Map&Reduce to count the number of students in each class with 10 < age < 20.
Submit the Map&Reduce program together with the corresponding results.
> db.users.count();
10000000
> db.users.find();
{"_id": ObjectId("55ca9ae785b177a46da9494f"), "classid": 1, "age": 37, "name": "name0"}
{"_id": ObjectId("55ca9ae785b177a46da94950"), "classid": 1, "age": 12, "name": "name1"}
{"_id": ObjectId("55ca9ae785b177a46da94951"), "classid": 1, "age": 31, "name": "name2"}
{"_id": ObjectId("55ca9ae785b177a46da94952"), "classid": 2, "age": 27, "name": "name3"}
Script to create the simulated data:
for (var i = 0; i < 10000000; i++) {
    db.users.save({classid: Math.ceil(Math.random() * 2), age: Math.ceil(Math.random() * (38 - 8) + 8), name: "name" + i});
}
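The generator's formulas can be checked without a MongoDB server. The sketch below re-implements them in plain JavaScript for Node.js on a much smaller sample (30,000 documents, an arbitrary choice) and tallies the values. It assumes the same expressions as the shell script: Math.ceil(Math.random() * 2) gives class 1 or 2, and Math.ceil(Math.random() * (38 - 8) + 8) gives ages 9 to 38 (8 only if Math.random() returns exactly 0). makeUser is a hypothetical helper, not part of the original script.

```javascript
// Plain-JavaScript sketch of the shell generator (no MongoDB needed).
// makeUser is a hypothetical helper mirroring the fields saved above.
function makeUser(i) {
  return {
    classid: Math.ceil(Math.random() * 2),        // 1 or 2
    age: Math.ceil(Math.random() * (38 - 8) + 8), // 9..38 in practice
    name: "name" + i,
  };
}

// Generate a small sample instead of 10 million documents.
const SAMPLE_SIZE = 30000;
const users = [];
for (let i = 0; i < SAMPLE_SIZE; i++) {
  users.push(makeUser(i));
}

// Tally how many documents fall into each class and each age.
const perClass = {};
const perAge = {};
for (const u of users) {
  perClass[u.classid] = (perClass[u.classid] || 0) + 1;
  perAge[u.age] = (perAge[u.age] || 0) + 1;
}

console.log(perClass);            // roughly half of the sample in each class
console.log(Object.keys(perAge)); // typically the ages "9" through "38"
```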
Map function
The map function must call emit(key, value) to return a key-value pair. Inside it, this refers to the document currently being processed.
> mapf = function () { emit(this.classid, 1) }
function () { emit(this.classid, 1) }
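To see what this and emit mean in the map function, the sketch below imitates the map phase in plain JavaScript for Node.js. The emit function and the emitted array are stand-ins written for illustration; in the mongo shell, MongoDB supplies emit itself and binds this to each document.

```javascript
// Stand-in for MongoDB's emit(): collect each key-value pair in an array.
const emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Same body as the shell's mapf: one pair per document, keyed by class.
const mapf = function () {
  emit(this.classid, 1);
};

// The four sample documents shown by db.users.find().
const docs = [
  { classid: 1, age: 37, name: "name0" },
  { classid: 1, age: 12, name: "name1" },
  { classid: 1, age: 31, name: "name2" },
  { classid: 2, age: 27, name: "name3" },
];

// Function.prototype.call binds `this` to the current document.
for (const doc of docs) {
  mapf.call(doc);
}

// Three pairs with key 1 and one pair with key 2, each with value 1.
console.log(emitted);
```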
Reduce function
The reduce function receives input that has already been grouped by key, similar to a group operation: the key-value pairs returned by map are combined into {key, [value1, value2, value3, ..., valueN]} and passed to reduce, which then aggregates the values.
> reducef = function (key, values) {
... var count = 0;
... values.forEach(function (v) { count += v; }); return count;
... }
function (key, values) {
var count = 0;
values.forEach(function (v) { count += v; }); return count;
}
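The grouping step and the reduce call can also be simulated in plain JavaScript for Node.js. The sketch below groups some made-up emitted pairs by key and then shows why reducef returns a number of the same kind it receives: MongoDB may call reduce several times for the same key and pass earlier results back in as values (re-reduce), so summing partial sums must give the same total as summing all the 1s at once. The groupByKey helper is hypothetical.

```javascript
// Same logic as the shell's reducef.
const reducef = function (key, values) {
  let count = 0;
  values.forEach(function (v) { count += v; });
  return count;
};

// Hypothetical helper: turn emitted pairs into key -> [value1, value2, ...].
function groupByKey(pairs) {
  const groups = new Map();
  for (const { key, value } of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Pretend map emitted five pairs for class 1 and two for class 2.
const pairs = [1, 1, 2, 1, 1, 2, 1].map((classid) => ({ key: classid, value: 1 }));
const groups = groupByKey(pairs);

// Reduce each key in a single call.
const oneShot = {};
for (const [key, values] of groups) {
  oneShot[key] = reducef(key, values);
}
console.log(oneShot); // { '1': 5, '2': 2 }

// Re-reduce: reduce two batches separately, then reduce the partial results.
const ones = groups.get(1);
const partA = reducef(1, ones.slice(0, 2));
const partB = reducef(1, ones.slice(2));
console.log(reducef(1, [partA, partB])); // 5, same as the one-shot result
```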
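Options give finer control over the job: out names the output collection, finalize post-processes each reduced value, and query filters the input documents before map runs.
> res = db.runCommand({mapreduce: "users", map: mapf, reduce: reducef,
... out: "users_res",
... finalize: ff,
... query: {age: {$lt: 10}}
... });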
{
    "result": "users_res",
    "timeMillis": 6251,
    "counts": {
        "input": 333716,
        "emit": 333716,
        "reduce": 6676,
        "output": 2
    },
    "ok": 1
}
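The whole job (query filter, map, reduce, finalize) can be sketched in plain JavaScript for Node.js on a small in-memory sample. In the stats, input and emit are equal because the map function emits exactly one pair per matching document, and output is the number of distinct keys (two classes). The finalize function ff is not shown in the session, so the version below is a hypothetical one that wraps the count as {classid, count}; runMapReduce and emitTarget are likewise hypothetical names. The sketch calls reduce once per key, whereas MongoDB may call it many times per key, so its reduce counter is not modeled.

```javascript
// In-memory sketch of a mapreduce job; runMapReduce is hypothetical.
let emitTarget = null; // groups of the job currently running

// Stand-in for MongoDB's emit(): append the value under its key.
function emit(key, value) {
  if (!emitTarget.has(key)) emitTarget.set(key, []);
  emitTarget.get(key).push(value);
}

function runMapReduce(docs, { map, reduce, finalize, query }) {
  const groups = new Map();
  emitTarget = groups;
  let input = 0;
  for (const doc of docs) {
    if (!query(doc)) continue; // the query option filters before map
    input++;
    map.call(doc); // `this` is the current document
  }
  emitTarget = null;

  let emitted = 0;
  const results = [];
  for (const [key, values] of groups) {
    emitted += values.length;
    // Simplification: one reduce call per key. MongoDB may call reduce
    // several times per key, or skip it when a key has a single value.
    const reduced = reduce(key, values);
    results.push({ _id: key, value: finalize ? finalize(key, reduced) : reduced });
  }
  results.sort((a, b) => a._id - b._id);
  return { results, counts: { input, emit: emitted, output: results.length } };
}

const mapf = function () { emit(this.classid, 1); };
const reducef = function (key, values) {
  let count = 0;
  values.forEach(function (v) { count += v; });
  return count;
};
// Hypothetical finalize; the original ff is not shown.
const ff = function (key, reducedValue) {
  return { classid: key, count: reducedValue };
};

const users = [
  { classid: 1, age: 9, name: "name0" },
  { classid: 1, age: 12, name: "name1" },
  { classid: 2, age: 9, name: "name2" },
  { classid: 2, age: 31, name: "name3" },
  { classid: 1, age: 9, name: "name4" },
];

const res = runMapReduce(users, {
  map: mapf,
  reduce: reducef,
  finalize: ff,
  query: (doc) => doc.age < 10, // same condition as {age: {$lt: 10}}
});
console.log(res.counts); // { input: 3, emit: 3, output: 2 }
console.log(JSON.stringify(res.results));
// [{"_id":1,"value":{"classid":1,"count":2}},{"_id":2,"value":{"classid":2,"count":1}}]
```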
Results:
> db.users_res.find();
{"_id": 1, "value": {"classid": 1, "count": 167142}}
{"_id": 2, "value": {"classid": 2, "count": 166574}}
Class 1 has 167142 students younger than 10, and class 2 has 166574.
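As a rough plausibility check, assume the generator's formulas make classes 1 and 2, and each age from 9 to 38, about equally likely. Then age < 10 matches only age 9, so each class should hold roughly 10,000,000 / 2 / 30, or about 166,667, such students. The sketch below (plain JavaScript for Node.js; expectedPerClass is a hypothetical helper) computes that figure and how far the observed counts deviate from it, which is well under 1% for both classes.

```javascript
// Rough expectation, assuming uniform classid (1..2) and age (9..38).
const TOTAL_DOCS = 10000000; // from db.users.count()
const CLASSES = 2;           // classid is 1 or 2
const AGE_VALUES = 30;       // ages 9..38

// Hypothetical helper: expected documents per class when
// `matchingAges` of the 30 possible ages satisfy the query.
function expectedPerClass(matchingAges) {
  return (TOTAL_DOCS / CLASSES) * (matchingAges / AGE_VALUES);
}

const expected = expectedPerClass(1); // age < 10 matches only age 9
console.log(Math.round(expected));    // 166667

// Relative deviation of the observed counts from the expectation.
const observed = { 1: 167142, 2: 166574 };
for (const [classid, count] of Object.entries(observed)) {
  console.log(classid, ((count / expected - 1) * 100).toFixed(2) + "%");
}
```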
Next, count the students younger than 20 in each class:
> res = db.runCommand({mapreduce: "users", map: mapf, reduce: reducef,
... out: "users_2res", finalize: ff, query: {age: {$lt: 20}}
... });
{
    "result": "users_2res",
    "timeMillis": 23247,
    "counts": {
        "input": 3666243,
        "emit": 3666243,
        "reduce": 73326,
        "output": 2
    },
    "ok": 1
}
> db.users_2res.find();
{"_id": 1, "value": {"classid": 1, "count": 1832306}}
{"_id": 2, "value": {"classid": 2, "count": 1833937}}
Class 1 has 1832306 students younger than 20, and class 2 has 1833937.
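The two runs can be combined to approach the assignment's range. Subtracting the age < 10 counts from the age < 20 counts gives the students aged 10 to 19 in each class, that is 10 <= age < 20, which still includes age 10. A strict 10 < age < 20 count would instead need a separate job with query: {age: {$gt: 10, $lt: 20}}. The plain JavaScript sketch below (Node.js) does the subtraction with the figures above and shows the strict predicate; strictlyBetween10And20 is a hypothetical name.

```javascript
// Per-class counts copied from the two mapreduce runs above.
const under20 = { 1: 1832306, 2: 1833937 }; // query {age: {$lt: 20}}
const under10 = { 1: 167142, 2: 166574 };   // query {age: {$lt: 10}}

// Students with 10 <= age < 20 in each class.
const aged10to19 = {};
for (const classid of Object.keys(under20)) {
  aged10to19[classid] = under20[classid] - under10[classid];
}
console.log(aged10to19); // { '1': 1665164, '2': 1667363 }

// Predicate for the strict range 10 < age < 20, the in-memory
// equivalent of the MongoDB filter {age: {$gt: 10, $lt: 20}}.
const strictlyBetween10And20 = (doc) => doc.age > 10 && doc.age < 20;
console.log([10, 11, 19, 20].map((age) => strictlyBetween10And20({ age })));
// [ false, true, true, false ]
```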