MongoDB Map Reduce

Source: Internet
Author: User
Tags: emit

Map-reduce is a computational model: a large job (data set) is broken down into pieces that are processed independently (map), and the partial results are then combined into the final result (reduce).

MongoDB provides a flexible map-reduce implementation that is useful for large-scale data analysis.

The following is the basic syntax for MapReduce:

> db.collection.mapReduce(
    function() { emit(key, value); },                  // map function
    function(key, values) { return reduceFunction; },  // reduce function
    {
      out: collection,
      query: document,
      sort: document,
      limit: number
    }
  )

To use MapReduce you implement two functions: a map function and a reduce function. The map function is applied to every document in the collection and calls emit(key, value); the emitted values are grouped by key, and each key is passed with its values to the reduce function for processing.

The map function returns its key-value pairs by calling emit(key, value). It is invoked once per document, with the current document available as this.
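
The flow described above (map emits pairs, values are grouped by key, reduce collapses each group) can be imitated in plain JavaScript without a server. This is only a sketch: runMapReduce is a hypothetical helper, not a MongoDB API, and emit is passed to map as an argument because there is no global emit outside MongoDB.

```javascript
// In-memory imitation of map -> group by key -> reduce.
// runMapReduce is a hypothetical helper for illustration only.
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // serialized key -> { key, values }

  // emit() files each key-value pair produced by map under its key.
  function emit(key, value) {
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, { key: key, values: [] });
    groups.get(id).values.push(value);
  }

  // As in MongoDB, the current document is bound to `this` inside map.
  docs.forEach(function (doc) { map.call(doc, emit); });

  // reduce turns each key's array of values into a single value.
  // (The real server skips reduce for keys that have only one value;
  // this sketch always calls it.)
  return Array.from(groups.values()).map(function (g) {
    return { _id: g.key, value: reduce(g.key, g.values) };
  });
}

// A few sample documents (made up for this sketch).
const docs = [
  { _id: 0, user: "Ken", product: 9, price: 5.9 },
  { _id: 2, user: "Joe", product: 0, price: 4.72 },
  { _id: 3, user: "Joe", product: 1, price: 1.35 }
];

const counts = runMapReduce(
  docs,
  function (emit) { emit(this.user, { count: 1 }); },
  function (key, values) {
    return { count: values.reduce(function (sum, v) { return sum + v.count; }, 0) };
  }
);

console.log(counts);
```

Each element of counts has the same _id/value shape that MongoDB writes to the output collection, which is why the later examples read their results from a collection keyed by _id.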

Parameter description:

    • map: the map function; it generates a sequence of key-value pairs that become the input of the reduce function.
    • reduce: the aggregation function; its task is to turn each key and its array of values into a single key-value pair, that is, to reduce the values array to a single value.
    • out: the collection that stores the result (if not specified, a temporary collection is used, which is automatically deleted after the client disconnects).
    • query: a filter condition; only documents that match it are passed to the map function. (query, limit, and sort can be combined freely.)
    • sort: a sort specification used together with limit; documents are sorted before they are sent to the map function, which can optimize the grouping.
    • limit: the maximum number of documents sent to the map function (without limit, sort alone is of little use).
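
To make the optional parameters concrete, here is a sketch of an options document that combines query, sort, and limit. The output collection name re_filtered and the price filter are assumptions chosen for illustration; the field names follow the test data used in the rest of this article.

```javascript
// Options document passed as the third argument of mapReduce.
// All concrete values here are illustrative assumptions.
const options = {
  out: "re_filtered",            // hypothetical name of the result collection
  query: { price: { $gt: 5 } },  // only documents with price > 5 reach map
  sort: { user: 1 },             // sort the input by user (ascending) before map
  limit: 100                     // at most 100 documents are sent to map
};

// In the mongo shell this would be used as:
//   db.test.mapReduce(map, reduce, options)
console.log(JSON.stringify(options));
```

Sorting on the same field that map emits as its key is the case in which sort can reduce the number of reduce calls. The MongoDB documentation also states that the sort key must be in an existing index, so a real run would likely need an index on user first.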

Insert test data:

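import math
import random

# `db` is assumed to be a pymongo database handle (for example MongoClient().test).
# insert_one is the current pymongo call for inserting a single document.
for i in range(1000):
    rid = int(math.floor(random.random() * 10))   # product id, 0..9
    price = round(random.random() * 10, 2)        # price with two decimals
    if rid < 4:
        db.test.insert_one({"_id": i, "user": "Joe", "product": rid, "price": price})
    elif 4 <= rid < 7:
        db.test.insert_one({"_id": i, "user": "Josh", "product": rid, "price": price})
    else:
        db.test.insert_one({"_id": i, "user": "Ken", "product": rid, "price": price})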

The result data is:

{"_id": 0, "price": 5.9, "product": 9, "user": "Ken"}
{"_id": 1, "price": 7.59, "product": 7, "user": "Ken"}
{"_id": 2, "price": 4.72, "product": 0, "user": "Joe"}
{"_id": 3, "price": 1.35, "product": 1, "user": "Joe"}
{"_id": 4, "price": 2.31, "product": 0, "user": "Joe"}
{"_id": 5, "price": 5.29, "product": 5, "user": "Josh"}
{"_id": 6, "price": 3.34, "product": 1, "user": "Joe"}
{"_id": 7, "price": 7.2, "product": 4, "user": "Josh"}
{"_id": 8, "price": 8.1, "product": 6, "user": "Josh"}
{"_id": 9, "price": 2.57, "product": 3, "user": "Joe"}
{"_id": 10, "price": 0.54, "product": 2, "user": "Joe"}
{"_id": 11, "price": 0.66, "product": 1, "user": "Joe"}
{"_id": 12, "price": 5.51, "product": 1, "user": "Joe"}
{"_id": 13, "price": 3.74, "product": 6, "user": "Josh"}
{"_id": 14, "price": 4.82, "product": 0, "user": "Joe"}
{"_id": 15, "price": 9.79, "product": 3, "user": "Joe"}
{"_id": 16, "price": 9.6, "product": 5, "user": "Josh"}
{"_id": 17, "price": 4.06, "product": 7, "user": "Ken"}
{"_id": 18, "price": 1.37, "product": 5, "user": "Josh"}
{"_id": 19, "price": 6.77, "product": 9, "user": "Ken"}

1. How many products has each user purchased?

SQL implementation: SELECT user, count(product) FROM test GROUP BY user

MapReduce implementation:

map = function () {
    // One {count: 1} per document, keyed by user
    emit(this.user, {count: 1});
}

reduce = function (key, values) {
    // Sum the counts emitted for this user
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i].count;
    }
    return {count: total};
}

result = db.test.mapReduce(map, reduce, {out: 're'})
Execution result:

Query the results:

2. How many of each product has each user purchased? (using a compound key)

SQL implementation: SELECT user, product, count(*) FROM test GROUP BY user, product

MapReduce implementation:

  

map = function () {
    // Compound key: group by both user and product
    emit({user: this.user, product: this.product}, {count: 1});
}

reduce = function (key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i].count;
    }
    return {count: total};
}

Execution: result = db.test.mapReduce(map, reduce, {out: 're2'})

Query the result collection re2:

  

3. What is the total quantity and total amount of products purchased by each user? (handling a compound reduce result)

SQL implementation: SELECT user, count(product), sum(price) FROM test GROUP BY user

MapReduce implementation:

  

map = function () {
    // Compound value: the price plus a count of 1 per purchase
    emit(this.user, {amount: this.price, count: 1});
}

reduce = function (key, values) {
    var res = {amount: 0, count: 0};
    for (var i = 0; i < values.length; i++) {
        res.count += values[i].count;
        res.amount += values[i].amount;
    }
    // No rounding here: Math.round takes a single argument and rounds to an
    // integer, and reduce may run again on its own output. Rounding is handled in 4.
    return res;
}

Execution: db.test.mapReduce(map, reduce, {out: "re3"})
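
MongoDB may call reduce more than once for the same key and feed an earlier reduce result back in as one of the values, so the object that reduce returns has to have the same shape as the values that map emits. The plain-JavaScript sketch below checks that property for a reduce like the one in example 3 (with the rounding step left out); the sample values are made up for illustration.

```javascript
// A reduce like the one in example 3, without rounding.
function reduce(key, values) {
  var res = { amount: 0, count: 0 };
  for (var i = 0; i < values.length; i++) {
    res.count += values[i].count;
    res.amount += values[i].amount;
  }
  return res;
}

// Values that map could emit for one user (prices chosen to be exact in binary).
var emitted = [
  { amount: 1.5, count: 1 },
  { amount: 2.25, count: 1 },
  { amount: 4, count: 1 }
];

// One pass over all values...
var onePass = reduce("Joe", emitted);

// ...versus reducing a partial batch first and feeding that result back in.
var partial = reduce("Joe", emitted.slice(0, 2));
var twoPass = reduce("Joe", [partial, emitted[2]]);

console.log(onePass, twoPass);
```

Both paths give the same totals because the returned object carries amount and count just like an emitted value. Rounding inside reduce would break this, since each extra pass would round an already rounded partial sum.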

  

Query re3:

  

4. The amount returned in 3 should be rounded to two decimal places, and the average price per product is also required. (use finalize to process the reduce result set)

SQL implementation: SELECT user, count(sku), sum(price), round(sum(price)/count(sku), 2) AS avgprice FROM test GROUP BY user

MapReduce implementation:
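
A minimal sketch of how finalize could cover this case, assuming the map and reduce functions from example 3 are reused unchanged. The output collection name re4 and the field name avgPrice are assumptions for illustration, not taken from the original. finalize receives the key and the fully reduced value and runs once per key, so rounding and derived fields are safe to compute there.

```javascript
// finalize is called once per key, after the last reduce for that key.
var finalize = function (key, reduced) {
  return {
    count: reduced.count,
    // Round to two decimals: scale up, round to an integer, scale back down.
    amount: Math.round(reduced.amount * 100) / 100,
    avgPrice: Math.round((reduced.amount / reduced.count) * 100) / 100
  };
};

// In the mongo shell, with map and reduce from example 3:
//   db.test.mapReduce(map, reduce, { out: "re4", finalize: finalize })

// Plain-JavaScript check on a made-up reduced value:
var sample = finalize("Joe", { amount: 10.129999, count: 4 });
console.log(sample);
```

The average is computed from the unrounded amount so that the two rounding steps do not compound.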

  
