Background
MapReduce is a flexible and powerful data aggregation tool. Its advantage is that a single aggregation task can be decomposed into several small tasks, which are then distributed across multiple servers and processed in parallel.
MongoDB also provides MapReduce; the map and reduce functions must be written in JavaScript. MapReduce in MongoDB has the following stages:
1. Map: Applies an operation to each document in the collection.
2. Shuffle: Groups the emitted values by key and generates one or more value lists for each distinct key.
3. Reduce: Processes the elements of each value list until only one element remains. The list is then handed back to the shuffle stage, and the cycle repeats until each key has exactly one value list containing exactly one element; that element is the MapReduce result for the key.
4. Finalize: This step is optional. After the final MapReduce result is obtained, it applies some extra "trimming" to the data.
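The four stages can be sketched in plain JavaScript, with no MongoDB server involved. This is only an illustration under stated assumptions: runMapReduce is a hypothetical helper, emit is passed to map as an argument here (inside MongoDB it is simply available in the map function), keys are assumed to be plain strings, and reduce runs in a single pass rather than in repeated rounds.

```javascript
// Plain-JavaScript sketch of the stages; runMapReduce is a hypothetical name.
function runMapReduce(docs, map, reduce, finalize) {
  // 1. Map: call map with `this` bound to each document and collect the pairs.
  const pairs = [];
  const emit = (key, value) => pairs.push({ key, value });
  docs.forEach((doc) => map.call(doc, emit));

  // 2. Shuffle: build one value list per distinct key.
  const groups = new Map();
  for (const { key, value } of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }

  // 3. Reduce: collapse each value list to a single value.
  //    A key with only one value skips reduce and keeps that value.
  // 4. Finalize (optional): post-process the reduced value.
  const results = {};
  for (const [key, values] of groups) {
    const reduced = values.length > 1 ? reduce(key, values) : values[0];
    results[key] = finalize ? finalize(key, reduced) : reduced;
  }
  return results;
}

// Made-up sample documents for illustration.
const docs = [
  { user: "Joe", sku: 1, price: 2.5 },
  { user: "Joe", sku: 2, price: 4.0 },
  { user: "Ken", sku: 8, price: 1.5 },
];

const counts = runMapReduce(
  docs,
  function (emit) { emit(this.user, { count: 1 }); },
  function (key, values) {
    let cnt = 0;
    values.forEach((v) => { cnt += v.count; });
    return { count: cnt };
  }
);
console.log(counts); // { Joe: { count: 2 }, Ken: { count: 1 } }
```

Because a key with a single value never passes through reduce, the emitted value and the reduced value need the same shape.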
The map function calls emit to hand key/value pairs to MapReduce.
The reduce function accepts two parameters: key and emits. key is the key passed to emit; emits is an array whose elements are the values passed to emit for that key.
The value returned by reduce may be fed back into reduce again, so it must have the same structure as the elements of emits.
Inside the map function, the this keyword refers to the document currently being mapped.
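The requirement on the return value of reduce can be checked outside MongoDB. The sketch below is plain JavaScript with made-up values; reduceCount and badReduce are hypothetical names. It compares reducing all values in one pass with reducing two partial batches and then reducing those partial results.

```javascript
// A counting reduce: every value looks like { count: n }.
function reduceCount(key, values) {
  let cnt = 0;
  values.forEach((val) => { cnt += val.count; });
  return { count: cnt }; // same shape as the emitted values
}

// Four values emitted for the key "Joe".
const emitted = [{ count: 1 }, { count: 1 }, { count: 1 }, { count: 1 }];

// Reducing everything in one pass...
const onePass = reduceCount("Joe", emitted);

// ...must match reducing two batches and then reducing the partial results.
const partialA = reduceCount("Joe", emitted.slice(0, 3));
const partialB = reduceCount("Joe", emitted.slice(3));
const twoPass = reduceCount("Joe", [partialA, partialB]);

console.log(onePass.count, twoPass.count); // 4 4

// A reduce that returns a bare number breaks the second pass:
// the next call reads `.count` from a number and gets undefined.
function badReduce(key, values) {
  let cnt = 0;
  values.forEach((val) => { cnt += val.count; });
  return cnt;
}
const broken = badReduce("Joe", [badReduce("Joe", emitted.slice(0, 3)), { count: 1 }]);
console.log(Number.isNaN(broken)); // true
```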
Example
Test data: this collection records the products purchased by three users and the price of each product.
Code:
for (var i = 0; i < 1000; i++) {
    var rid = Math.floor(Math.random() * 10);
    var price = parseFloat((Math.random() * 10).toFixed(2));
    if (rid < 4) {
        db.test.insert({"user": "Joe", "sku": rid, "price": price});
    } else if (rid >= 4 && rid < 7) {
        db.test.insert({"user": "Josh", "sku": rid, "price": price});
    } else {
        db.test.insert({"user": "Ken", "sku": rid, "price": price});
    }
}
1. How many products did each user purchase? (MapReduce on a single key)
Code:
// SQL implementation
// SELECT user, COUNT(sku) FROM test GROUP BY user

// MapReduce implementation
map = function () {
    emit(this.user, {count: 1});
};
reduce = function (key, values) {
    var cnt = 0;
    values.forEach(function (val) { cnt += val.count; });
    return {"count": cnt};
};
// Save the MapReduce result to the collection mr1
db.test.mapReduce(map, reduce, {out: "mr1"});

// View the MapReduce result
> db.mr1.find()
{ "_id" : "Joe", "value" : { "count" : 416 } }
{ "_id" : "Josh", "value" : { "count" : 287 } }
{ "_id" : "Ken", "value" : { "count" : 297 } }
2. How many of each product did each user purchase? (MapReduce on a composite key)
Code:
// SQL implementation
// SELECT user, sku, COUNT(*) FROM test GROUP BY user, sku

// MapReduce implementation
map = function () {
    emit({user: this.user, sku: this.sku}, {count: 1});
};
reduce = function (key, values) {
    var cnt = 0;
    values.forEach(function (val) { cnt += val.count; });
    return {"count": cnt};
};
db.test.mapReduce(map, reduce, {out: "mr2"});

> db.mr2.find()
{ "_id" : { "user" : "Joe", "sku" : 0 }, "value" : { "count" : 103 } }
{ "_id" : { "user" : "Joe", "sku" : 1 }, "value" : { "count" : 106 } }
{ "_id" : { "user" : "Joe", "sku" : 2 }, "value" : { "count" : 102 } }
{ "_id" : { "user" : "Joe", "sku" : 3 }, "value" : { "count" : 105 } }
{ "_id" : { "user" : "Josh", "sku" : 4 }, "value" : { "count" : 87 } }
{ "_id" : { "user" : "Josh", "sku" : 5 }, "value" : { "count" : 107 } }
{ "_id" : { "user" : "Josh", "sku" : 6 }, "value" : { "count" : 93 } }
{ "_id" : { "user" : "Ken", "sku" : 7 }, "value" : { "count" : 98 } }
{ "_id" : { "user" : "Ken", "sku" : 8 }, "value" : { "count" : 83 } }
{ "_id" : { "user" : "Ken", "sku" : 9 }, "value" : { "count" : 116 } }
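To see what grouping on a composite key amounts to, here is a small plain-JavaScript sketch that runs without MongoDB. groupByCompositeKey is a hypothetical helper, and JSON.stringify is only an assumed stand-in for the way MongoDB compares key documents; with this stand-in, two keys fall into the same group only when they have the same fields, in the same order, with the same values.

```javascript
// Hypothetical helper: groups emitted values by a composite (object) key.
function groupByCompositeKey(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    const id = JSON.stringify(key); // stand-in for key comparison
    if (!groups.has(id)) groups.set(id, { key, values: [] });
    groups.get(id).values.push(value);
  }
  return [...groups.values()];
}

// Made-up pairs, shaped like the ones the map function above emits.
const emittedPairs = [
  [{ user: "Joe", sku: 0 }, { count: 1 }],
  [{ user: "Joe", sku: 1 }, { count: 1 }],
  [{ user: "Joe", sku: 0 }, { count: 1 }],
];

const grouped = groupByCompositeKey(emittedPairs);
grouped.forEach((g) => console.log(JSON.stringify(g.key), g.values.length));
// {"user":"Joe","sku":0} 2
// {"user":"Joe","sku":1} 1
```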
3. What is the total quantity and total amount of the products purchased by each user? (reduce result with multiple fields)
Code:
// SQL implementation
// SELECT user, COUNT(sku), SUM(price) FROM test GROUP BY user

// MapReduce implementation
map = function () {
    emit(this.user, {amount: this.price, count: 1});
};
reduce = function (key, values) {
    var res = {amount: 0, count: 0};
    values.forEach(function (val) {
        res.amount += val.amount;
        res.count += val.count;
    });
    return res;
};
db.test.mapReduce(map, reduce, {out: "mr3"});

> db.mr3.find()
{ "_id" : "Joe", "value" : { "amount" : 2053.8899999999994, "count" : 395 } }
{ "_id" : "Josh", "value" : { "amount" : 1409.2600000000002, "count" : 292 } }
{ "_id" : "Ken", "value" : { "amount" : 1547.7700000000002, "count" : 313 } }
4. The amounts returned in example 3 should be rounded to two decimal places, and the average price per product is also required. (use finalize to process the reduce result set)
Code:
// SQL implementation
// SELECT user,
//        CAST(SUM(price) AS decimal(10,2)) AS amount,
//        COUNT(sku) AS [count],
//        CAST((SUM(price) / COUNT(sku)) AS decimal(10,2)) AS avgprice
// FROM test GROUP BY user

// MapReduce implementation
map = function () {
    emit(this.user, {amount: this.price, count: 1, avgprice: 0});
};
reduce = function (key, values) {
    var res = {amount: 0, count: 0, avgprice: 0};
    values.forEach(function (val) {
        res.amount += val.amount;
        res.count += val.count;
    });
    return res;
};
finalizeFun = function (key, reduceResult) {
    reduceResult.amount = (reduceResult.amount).toFixed(2);
    reduceResult.avgprice = (reduceResult.amount / reduceResult.count).toFixed(2);
    return reduceResult;
};
db.test.mapReduce(map, reduce, {out: "mr4", finalize: finalizeFun});

> db.mr4.find()
{ "_id" : "Joe", "value" : { "amount" : "2053.89", "count" : 395, "avgprice" : "5.20" } }
{ "_id" : "Josh", "value" : { "amount" : "1409.26", "count" : 292, "avgprice" : "4.83" } }
{ "_id" : "Ken", "value" : { "amount" : "1547.77", "count" : 313, "avgprice" : "4.94" } }
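The amount and avgprice fields come back as quoted strings because toFixed returns a string. The plain-JavaScript sketch below reproduces the finalize step for one reduced value taken from example 3, and then shows a numeric variant. finalizeNumeric and round2 are hypothetical names and are not part of the original example; they are only one possible way to keep the fields as numbers.

```javascript
// Reduced value for "Joe", in the shape produced by the reduce function above.
const reduced = { amount: 2053.8899999999994, count: 395, avgprice: 0 };

function finalizeFun(key, reduceResult) {
  // toFixed returns a string, so amount is stored as a string.
  reduceResult.amount = reduceResult.amount.toFixed(2);
  // The division coerces the string back to a number before formatting again.
  reduceResult.avgprice = (reduceResult.amount / reduceResult.count).toFixed(2);
  return reduceResult;
}

const asStrings = finalizeFun("Joe", { ...reduced });
console.log(asStrings); // { amount: '2053.89', count: 395, avgprice: '5.20' }

// Variant (an assumption, not the article's code): round with Math.round
// so that the stored fields stay numeric.
function finalizeNumeric(key, r) {
  const round2 = (x) => Math.round(x * 100) / 100;
  return {
    amount: round2(r.amount),
    count: r.count,
    avgprice: round2(r.amount / r.count),
  };
}

const asNumbers = finalizeNumeric("Joe", reduced);
console.log(typeof asNumbers.amount, typeof asNumbers.avgprice); // number number
```

Computing the average in finalize rather than in reduce is deliberate: reduce may run several times on partial results, and an average of partial averages is generally not the overall average, whereas sums and counts can be combined safely.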
5. For SKUs with a unit price greater than 6, how many did each user purchase? (MapReduce on a filtered subset of the data)
This is relatively simple: reuse the map and reduce functions from example 1 and add a query filter to the mapReduce call; everything else stays unchanged.
Code:
db.test.mapReduce(map, reduce, {query: {price: {"$gt": 6}}, out: "mr5"});
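The effect of the query option can be pictured as filtering the documents before they reach the map function. The sketch below is plain JavaScript with made-up documents; the filter callback is only a stand-in for the {price: {"$gt": 6}} condition, and the counting loop stands in for the map and reduce functions of example 1.

```javascript
// Made-up sample documents for illustration.
const purchases = [
  { user: "Joe", sku: 1, price: 7.25 },
  { user: "Joe", sku: 2, price: 3.1 },
  { user: "Ken", sku: 8, price: 6.01 },
  { user: "Ken", sku: 9, price: 6 },
];

// Stand-in for the query: strictly greater than 6, so a price of 6 is excluded.
const matched = purchases.filter((doc) => doc.price > 6);

// Same counting logic as example 1, applied to the matching documents only.
const countsByUser = {};
matched.forEach((doc) => {
  countsByUser[doc.user] = (countsByUser[doc.user] || 0) + 1;
});

console.log(countsByUser); // { Joe: 1, Ken: 1 }
```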
Summary
The MapReduce tool in MongoDB is very powerful, and the examples in this article only cover the basics. Its real strength shows when it is combined with sharding, so that multiple servers process the data in parallel.
If time permits, I hope to summarize and share more about MongoDB, and something about SQL Server as well.
This article is from the "Joe TJ" blog; please retain this source link: http://joetang.blog.51cto.com/2296191/1610373
MongoDB: MapReduce basics and examples