MapReduce is a computational model. In simple terms, a large body of work (data) is broken into pieces that are processed separately (Map), and the partial results are then combined into the final result (Reduce).
MongoDB's mapReduce is very flexible and is quite useful for large-scale data analysis.
The basic syntax of mapReduce is:
>db.collection.mapReduce(
   function() {emit(key, value);},  //map function
   function(key, values) {return reduceFunction},  //reduce function
   {
      out: collection,
      query: document,
      sort: document,
      limit: number
   }
)
Using MapReduce requires implementing two functions: a map function and a reduce function. The map function is called for every document in the collection and calls emit(key, value); the emitted keys and values are then passed to the reduce function for processing.
The map function must call emit(key, value) to produce key-value pairs.
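The emit/group/reduce flow just described can be sketched in plain JavaScript (runnable with Node.js, no MongoDB needed). This is a hypothetical in-memory model, not MongoDB's engine: `runMapReduce` and the module-level `emit` are stand-ins invented for illustration. It calls map once per document with `this` bound to that document, groups the emitted values by key, and calls reduce only for keys that received more than one value, mirroring MongoDB's documented behaviour of not calling reduce for a key with a single value.

```javascript
// Hypothetical in-memory model of the mapReduce flow; not MongoDB's real engine.
// Emitted values, grouped by serialized key: Map<string, { key, values }>.
let groups = new Map();

// Stand-in for the emit() that MongoDB makes available inside map functions.
function emit(key, value) {
  const id = JSON.stringify(key); // lets compound (object) keys group by content
  if (!groups.has(id)) groups.set(id, { key, values: [] });
  groups.get(id).values.push(value);
}

function runMapReduce(docs, map, reduce) {
  groups = new Map();
  for (const doc of docs) {
    map.call(doc); // inside map, `this` is the current document
  }
  const results = [];
  for (const { key, values } of groups.values()) {
    // A key with only one emitted value is stored as-is, without calling reduce.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value });
  }
  return results;
}

// Usage: count documents per user.
const docs = [{ user: "Joe" }, { user: "Ken" }, { user: "Joe" }];
const out = runMapReduce(
  docs,
  function () { emit(this.user, { count: 1 }); },
  function (key, values) {
    let total = 0;
    for (const v of values) total += v.count;
    return { count: total };
  }
);
console.log(JSON.stringify(out));
// [{"_id":"Joe","value":{"count":2}},{"_id":"Ken","value":{"count":1}}]
```

Each result has the `{_id: key, value: ...}` shape that mapReduce writes to its output collection.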
Parameters:
- map: the mapping function (generates a sequence of key-value pairs that are passed to the reduce function).
- reduce: the aggregation function. Its task is to turn key-values into key-value, that is, to collapse the array of values for a key into a single value.
- out: the collection in which the results are stored (if it is not specified, a temporary collection is used, which is deleted automatically after the client disconnects).
- query: a filter condition; only documents that match it are passed to the map function. (query, limit, and sort can be combined freely.)
- sort: a sort specification, used together with limit, that sorts documents before they are sent to the map function; this can optimize the grouping mechanism.
- limit: the maximum number of documents sent to the map function. (Without limit, sort on its own is of little use.)
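How query, sort and limit together decide which documents reach map can be sketched as below. This is a simplified, hypothetical model: `selectMapInput` is not a MongoDB function, and its `query` supports only exact field equality, whereas real MongoDB queries accept the full query language (operators such as `$gt`). The sketch filters first, then sorts, then keeps at most `limit` documents.

```javascript
// Hypothetical model of how mapReduce selects the documents passed to map:
// filter with query, then sort, then keep at most `limit` documents.
function selectMapInput(docs, { query = {}, sort = null, limit = 0 } = {}) {
  // query: exact equality on each listed field (a simplification)
  let selected = docs.filter((doc) =>
    Object.entries(query).every(([field, expected]) => doc[field] === expected)
  );
  if (sort) {
    const fields = Object.entries(sort); // e.g. [["price", -1]] for descending price
    selected = [...selected].sort((a, b) => {
      for (const [field, dir] of fields) {
        if (a[field] < b[field]) return -dir;
        if (a[field] > b[field]) return dir;
      }
      return 0;
    });
  }
  // In this sketch a limit of 0 (or less) means "no limit".
  return limit > 0 ? selected.slice(0, limit) : selected;
}

// A few of the sample documents from the test data in this article.
const sample = [
  { _id: 0, price: 5.9, product: 9, user: "Ken" },
  { _id: 1, price: 7.59, product: 7, user: "Ken" },
  { _id: 2, price: 4.72, product: 0, user: "Joe" },
  { _id: 17, price: 4.06, product: 7, user: "Ken" },
];

// Only Ken's two most expensive purchases would be passed to map.
const input = selectMapInput(sample, { query: { user: "Ken" }, sort: { price: -1 }, limit: 2 });
console.log(input.map((d) => d._id)); // [ 1, 0 ]
```

This also shows why sort alone does little: without a limit, every matching document still reaches map, only in a different order.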
Insert test data:
import math
import random
from pymongo import MongoClient

db = MongoClient().test  # assumes a local server on the default port and the "test" database

for i in range(1000):
    rid = math.floor(random.random() * 10)  # product id from 0 to 9
    price = round(random.random() * 10, 2)  # price with two decimal places
    if rid < 4:
        user = "Joe"
    elif rid < 7:
        user = "Josh"
    else:
        user = "Ken"
    db.test.insert_one({"_id": i, "user": user, "product": rid, "price": price})
The resulting documents look like this (the first 20):
{"_id": 0, "price": 5.9, "product": 9, "user": "Ken"}
{"_id": 1, "price": 7.59, "product": 7, "user": "Ken"}
{"_id": 2, "price": 4.72, "product": 0, "user": "Joe"}
{"_id": 3, "price": 1.35, "product": 1, "user": "Joe"}
{"_id": 4, "price": 2.31, "product": 0, "user": "Joe"}
{"_id": 5, "price": 5.29, "product": 5, "user": "Josh"}
{"_id": 6, "price": 3.34, "product": 1, "user": "Joe"}
{"_id": 7, "price": 7.2, "product": 4, "user": "Josh"}
{"_id": 8, "price": 8.1, "product": 6, "user": "Josh"}
{"_id": 9, "price": 2.57, "product": 3, "user": "Joe"}
{"_id": 10, "price": 0.54, "product": 2, "user": "Joe"}
{"_id": 11, "price": 0.66, "product": 1, "user": "Joe"}
{"_id": 12, "price": 5.51, "product": 1, "user": "Joe"}
{"_id": 13, "price": 3.74, "product": 6, "user": "Josh"}
{"_id": 14, "price": 4.82, "product": 0, "user": "Joe"}
{"_id": 15, "price": 9.79, "product": 3, "user": "Joe"}
{"_id": 16, "price": 9.6, "product": 5, "user": "Josh"}
{"_id": 17, "price": 4.06, "product": 7, "user": "Ken"}
{"_id": 18, "price": 1.37, "product": 5, "user": "Josh"}
{"_id": 19, "price": 6.77, "product": 9, "user": "Ken"}
1. How many products has each user purchased?
In SQL: SELECT user, COUNT(product) FROM test GROUP BY user
MapReduce implementation:
map = function () {
    // one {count: 1} per document, keyed by user
    emit(this.user, {count: 1});
}
reduce = function (key, values) {
    // values may include earlier reduce results, so sum their count fields
    var total = 0;
    for (var i = 0; i < values.length; i++)
    {
        total += values[i].count;
    }
    return {count: total};
}
result = db.test.mapReduce(map, reduce, {out: 're'})
Execution result:
Query the results with db.re.find():
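MongoDB may call reduce more than once for the same key, feeding earlier reduce results back in as input values, so reduce must return a value with the same shape as the value map emits. That is why the reduce above returns `{count: total}` rather than a bare number. The short plain-JavaScript sketch below checks this property using Joe's ten purchases from the sample data; the split into two partial batches is arbitrary and only for illustration.

```javascript
// The reduce function from test 1 (same logic as in the shell).
function reduce(key, values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) {
    total += values[i].count;
  }
  return { count: total };
}

// Joe appears 10 times in the 20 sample documents, so map emitted {count: 1} ten times.
const emittedForJoe = Array.from({ length: 10 }, () => ({ count: 1 }));

// A single pass over all emitted values.
const once = reduce("Joe", emittedForJoe);

// Two partial passes, then a re-reduce over the partial results,
// which MongoDB is allowed to do for a key.
const partA = reduce("Joe", emittedForJoe.slice(0, 4));
const partB = reduce("Joe", emittedForJoe.slice(4));
const twice = reduce("Joe", [partA, partB]);

console.log(once.count, twice.count); // 10 10
```

Had reduce returned a plain number, the re-reduce step would read `.count` from numbers and produce `NaN`.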
2. How many of each product has each user purchased? (using a compound key)
In SQL: SELECT user, product, COUNT(*) FROM test GROUP BY user, product
MapReduce implementation:
map = function () {
    // the key is a document made of both grouping fields
    emit({user: this.user, product: this.product}, {count: 1});
}
reduce = function (key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++)
    {
        total += values[i].count;
    }
    return {count: total};
}
Execution: result = db.test.mapReduce(map, reduce, {out: 're2'})
Query the results with db.re2.find():
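With a compound key, values are grouped by the whole key document, so `{user: "Joe", product: 1}` and `{user: "Joe", product: 0}` form separate groups, and each result's `_id` is that key document. The plain-JavaScript sketch below imitates this grouping over the 20 sample documents listed earlier. Serializing the key with `JSON.stringify` is an assumption of the sketch (it relies on the fields always being emitted in the same order), not how MongoDB compares keys internally.

```javascript
// The 20 sample documents from the article, reduced to the fields map uses: [product, user].
const docs = [
  [9, "Ken"], [7, "Ken"], [0, "Joe"], [1, "Joe"], [0, "Joe"],
  [5, "Josh"], [1, "Joe"], [4, "Josh"], [6, "Josh"], [3, "Joe"],
  [2, "Joe"], [1, "Joe"], [1, "Joe"], [6, "Josh"], [0, "Joe"],
  [3, "Joe"], [5, "Josh"], [7, "Ken"], [5, "Josh"], [9, "Ken"],
].map(([product, user]) => ({ user, product }));

// Group the emitted {count: 1} values by the compound key {user, product}.
const groups = new Map();
for (const doc of docs) {
  const key = { user: doc.user, product: doc.product };
  const id = JSON.stringify(key); // same field order every time, so equal keys match
  if (!groups.has(id)) groups.set(id, { key, values: [] });
  groups.get(id).values.push({ count: 1 });
}

// Sum the counts in each group (the same arithmetic as the test-2 reduce).
const results = [...groups.values()].map(({ key, values }) => ({
  _id: key,
  value: { count: values.reduce((sum, v) => sum + v.count, 0) },
}));

for (const r of results) {
  console.log(r._id.user, r._id.product, r.value.count);
}
```

On this sample there are nine (user, product) groups; for example, Joe bought product 1 four times and product 0 three times.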
3. How many products has each user purchased, and what is the total amount spent? (reduce returning a compound value)
In SQL: SELECT user, COUNT(product), SUM(price) FROM test GROUP BY user
MapReduce implementation:
map = function () {
    // emit the price and a count of 1 for each purchase, keyed by user
    emit(this.user, {amount: this.price, count: 1});
}
reduce = function (key, values) {
    var res = {amount: 0, count: 0};
    for (var i = 0; i < values.length; i++)
    {
        res.count += values[i].count;
        res.amount += values[i].amount;
    }
    // No rounding here: Math.round takes a single argument, so Math.round(x, 2)
    // would round to a whole number, and reduce may be re-run on its own output.
    return res;
}
Execution: db.test.mapReduce(map, reduce, {out: 're3'})
Query the results with db.re3.find():
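Two points about rounding in test 3 are easy to check in plain JavaScript. First, `Math.round` takes a single argument, so `Math.round(x, 2)` ignores the `2` and rounds to a whole number; a common way to round to two decimals is `Math.round(x * 100) / 100`. Second, summing floating-point prices can leave tiny binary rounding errors, which is why some final rounding step is wanted at all. The sketch uses Joe's ten prices from the sample data; `round2` is a hypothetical helper, not part of MongoDB.

```javascript
// Emitted values for Joe in the sample data: {amount: price, count: 1}.
const joePrices = [4.72, 1.35, 2.31, 3.34, 2.57, 0.54, 0.66, 5.51, 4.82, 9.79];
const values = joePrices.map((price) => ({ amount: price, count: 1 }));

// Test-3 reduce without any rounding: sums both fields and
// returns the same {amount, count} shape that map emits.
function reduce(key, values) {
  const res = { amount: 0, count: 0 };
  for (const v of values) {
    res.count += v.count;
    res.amount += v.amount;
  }
  return res;
}

// Hypothetical helper: round to two decimal places.
const round2 = (x) => Math.round(x * 100) / 100;

const res = reduce("Joe", values);
console.log(res.count);            // 10
console.log(round2(res.amount));   // 35.61
console.log(Math.round(35.61, 2)); // 36 (the second argument is ignored)
```

Because reduce can be re-run on its own output, rounding inside it would round partial sums repeatedly; rounding once at the end, which is the job of finalize in the next step, avoids that.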
4. The amount returned in 3 should be rounded to two decimal places, and the average product price is also needed. (use finalize to post-process the reduce results)
In SQL: SELECT user, COUNT(product), SUM(price), ROUND(SUM(price)/COUNT(product), 2) AS avgPrice FROM test GROUP BY user
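finalize receives each key together with its fully reduced value and returns the value that is stored in the output collection. Unlike reduce, its output is not fed back into further calls, so it is a natural place to round once and to add derived fields such as an average. The plain-JavaScript sketch below is one possible finalize for this task, applied to per-user totals summed from the 20 sample documents; the function body and the `avgPrice` field name are illustrative assumptions, not necessarily what this article's implementation uses. In the shell, such a function would be supplied through the `finalize` option of `mapReduce`.

```javascript
// Hypothetical finalize for test 4: round the total and add an average price.
// It receives the key and the fully reduced {amount, count} value.
function finalize(key, reducedValue) {
  const amount = Math.round(reducedValue.amount * 100) / 100;
  // Average from the unrounded total, then rounded to two decimals.
  const avgPrice = Math.round((reducedValue.amount / reducedValue.count) * 100) / 100;
  return { count: reducedValue.count, amount: amount, avgPrice: avgPrice };
}

// Reduced values per user, summed from the 20 sample documents.
const reduced = {
  Joe: { amount: 4.72 + 1.35 + 2.31 + 3.34 + 2.57 + 0.54 + 0.66 + 5.51 + 4.82 + 9.79, count: 10 },
  Josh: { amount: 5.29 + 7.2 + 8.1 + 3.74 + 9.6 + 1.37, count: 6 },
  Ken: { amount: 5.9 + 7.59 + 4.06 + 6.77, count: 4 },
};

for (const [user, value] of Object.entries(reduced)) {
  console.log(user, finalize(user, value));
}
```

On the sample, this gives Joe an amount of 35.61 and an average of 3.56, Josh 35.3 and 5.88, and Ken 24.32 and 6.08.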
MapReduce implementation: