MongoDB: MapReduce Basics and Examples

Source: Internet
Author: User

Background

MapReduce is a flexible and powerful data aggregation tool. Its main advantage is that a single aggregation task can be split into many small tasks that are processed in parallel on multiple servers.

MongoDB also provides MapReduce, and the map, reduce, and finalize functions are written in JavaScript. MapReduce in MongoDB consists mainly of the following stages:

1. Map: Apply the map function to each document in the collection; it emits key/value pairs.

2. Shuffle: Group the emitted pairs by key, producing a list of one or more values for each distinct key.

3. Reduce: Combine the elements of a value list into a single value. The reduced value can be returned to the shuffle stage and reduced again together with other values for the same key; this repeats until each key has exactly one value, which is the MapReduce result for that key.

4. Finalize (optional): Once the final MapReduce result is available, apply further post-processing ("trimming") to the data.
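To make the four stages concrete, here is a minimal in-memory simulation in plain JavaScript. It is only a sketch under simplifying assumptions: `runMapReduce` and the sample documents are hypothetical, everything runs in one process, and it does not reproduce how MongoDB actually schedules map and reduce (no sharding, no repeated re-reduce). It does mirror two documented behaviours: map sees the current document as `this` and calls `emit`, and reduce is skipped for a key that has only one value.

```javascript
// Toy, single-process simulation of the MapReduce stages.
// runMapReduce is a hypothetical helper, not a MongoDB API.
function runMapReduce(docs, map, reduce, finalize) {
  // 1. Map: call map with `this` bound to each document.
  //    A temporary global emit() collects the key/value pairs.
  const emitted = [];
  globalThis.emit = function (key, value) {
    emitted.push({ key: key, value: value });
  };
  try {
    docs.forEach(function (doc) { map.call(doc); });
  } finally {
    delete globalThis.emit;
  }

  // 2. Shuffle: group emitted values by key. Keys are serialized so that
  //    composite (object) keys are grouped by value, not by reference.
  const groups = new Map();
  emitted.forEach(function (pair) {
    const k = JSON.stringify(pair.key);
    if (!groups.has(k)) groups.set(k, { key: pair.key, values: [] });
    groups.get(k).values.push(pair.value);
  });

  // 3. Reduce: collapse each value list to one value (skipped for a single value).
  // 4. Finalize (optional): post-process each reduced value.
  const results = [];
  groups.forEach(function (group) {
    let value = group.values.length === 1
      ? group.values[0]
      : reduce(group.key, group.values);
    if (finalize) value = finalize(group.key, value);
    results.push({ _id: group.key, value: value });
  });
  return results;
}

// Usage with made-up documents and the counting functions of example 1.
const docs = [
  { user: "Joe", sku: 1, price: 2.5 },
  { user: "Joe", sku: 2, price: 4.0 },
  { user: "Ken", sku: 8, price: 9.0 }
];
const map = function () { emit(this.user, { count: 1 }); };
const reduce = function (key, values) {
  let cnt = 0;
  values.forEach(function (val) { cnt += val.count; });
  return { count: cnt };
};
const result = runMapReduce(docs, map, reduce);
// Joe ends with { count: 2 }, Ken with { count: 1 }
```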

The map function passes key/value pairs to MapReduce by calling emit(key, value).

The reduce function takes two parameters, key and emits. key is the key passed to emit, and emits is an array whose elements are the values passed to emit for that key.

The value returned by reduce may be passed back into reduce again, mixed with values emitted by map, so it must have the same structure as the elements of emits.
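One way to see why the shapes must match is to reduce part of the values first and then reduce that partial result together with the rest, as a re-reduce would. The sketch below (plain JavaScript; `reduceCountBad`, `reReduceMatches`, and the sample values are hypothetical) compares this two-step result with a single pass for a reducer that returns `{ count }` and for one that returns a bare number.

```javascript
// Good: returns the same shape ({ count }) that map emits.
function reduceCount(key, values) {
  let cnt = 0;
  values.forEach(function (val) { cnt += val.count; });
  return { count: cnt };
}

// Bad: returns a bare number, so a later re-reduce reads (number).count,
// which is undefined, and the total becomes NaN.
function reduceCountBad(key, values) {
  let cnt = 0;
  values.forEach(function (val) { cnt += val.count; });
  return cnt;
}

// Reduce the first `split` values, then reduce that partial result together
// with the remaining values, and compare with reducing everything at once.
function reReduceMatches(reduce, key, values, split) {
  const once = reduce(key, values);
  const partial = reduce(key, values.slice(0, split));
  const twice = reduce(key, [partial].concat(values.slice(split)));
  return JSON.stringify(once) === JSON.stringify(twice);
}

const values = [{ count: 1 }, { count: 1 }, { count: 1 }, { count: 1 }];
console.log(reReduceMatches(reduceCount, "Joe", values, 2));    // true
console.log(reReduceMatches(reduceCountBad, "Joe", values, 2)); // false
```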

In the map function, the this keyword refers to the document currently being mapped.
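Because the document arrives as `this`, the map function should be an ordinary `function` expression, as in all the examples below. The plain-JavaScript sketch here (the `emit` collector and sample document are hypothetical stand-ins) binds `this` with `Function.prototype.call`, the way a runner would, and shows that an arrow function ignores that binding and keeps the `this` of the scope where it was created.

```javascript
// Hypothetical emit() that just records the emitted pairs.
const emitted = [];
function emit(key, value) { emitted.push([key, value]); }

const doc = { user: "Joe", sku: 2, price: 3.5 };

// Regular function: `this` is whatever object is passed to call().
const mapRegular = function () { emit(this.user, { count: 1 }); };

// Arrow function: `this` is fixed to the enclosing function's `this`.
function makeArrowMap() {
  return () => emit(this.user, { count: 1 });
}
const mapArrow = makeArrowMap.call({ user: "not-the-document" });

mapRegular.call(doc); // sees the document
mapArrow.call(doc);   // ignores the document
console.log(emitted.map(function (pair) { return pair[0]; })); // [ 'Joe', 'not-the-document' ]
```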

Example

Test data: the collection holds 1,000 purchases made by three users, recording the product (SKU) and the price of each purchase.

    // Insert 1,000 random purchases: SKUs 0-3 belong to Joe, 4-6 to Josh, 7-9 to Ken
    for (var i = 0; i < 1000; i++) {
        var rid = Math.floor(Math.random() * 10);
        var price = parseFloat((Math.random() * 10).toFixed(2));
        if (rid < 4) {
            db.test.insert({ "user": "Joe", "sku": rid, "price": price });
        } else if (rid >= 4 && rid < 7) {
            db.test.insert({ "user": "Josh", "sku": rid, "price": price });
        } else {
            db.test.insert({ "user": "Ken", "sku": rid, "price": price });
        }
    }

1. How many products did each user buy? (MapReduce on a single key)

    // SQL implementation
    SELECT user, COUNT(sku) FROM test GROUP BY user

    // MapReduce implementation
    map = function () {
        emit(this.user, { count: 1 });
    };
    reduce = function (key, values) {
        var cnt = 0;
        values.forEach(function (val) { cnt += val.count; });
        return { "count": cnt };
    };
    // Write the MapReduce results to the collection mr1
    db.test.mapReduce(map, reduce, { out: "mr1" });

    // View the MapReduce results
    > db.mr1.find()
    { "_id" : "Joe", "value" : { "count" : 416 } }
    { "_id" : "Josh", "value" : { "count" : 287 } }
    { "_id" : "Ken", "value" : { "count" : 297 } }

2. How many times did each user buy each product? (MapReduce on a composite key)

    // SQL implementation
    SELECT user, sku, COUNT(*) FROM test GROUP BY user, sku

    // MapReduce implementation
    map = function () {
        emit({ user: this.user, sku: this.sku }, { count: 1 });
    };
    reduce = function (key, values) {
        var cnt = 0;
        values.forEach(function (val) { cnt += val.count; });
        return { "count": cnt };
    };
    db.test.mapReduce(map, reduce, { out: "mr2" });

    > db.mr2.find()
    { "_id" : { "user" : "Joe", "sku" : 0 }, "value" : { "count" : 103 } }
    { "_id" : { "user" : "Joe", "sku" : 1 }, "value" : { "count" : 106 } }
    { "_id" : { "user" : "Joe", "sku" : 2 }, "value" : { "count" : 102 } }
    { "_id" : { "user" : "Joe", "sku" : 3 }, "value" : { "count" : 105 } }
    { "_id" : { "user" : "Josh", "sku" : 4 }, "value" : { "count" : 87 } }
    { "_id" : { "user" : "Josh", "sku" : 5 }, "value" : { "count" : 107 } }
    { "_id" : { "user" : "Josh", "sku" : 6 }, "value" : { "count" : 93 } }
    { "_id" : { "user" : "Ken", "sku" : 7 }, "value" : { "count" : 98 } }
    { "_id" : { "user" : "Ken", "sku" : 8 }, "value" : { "count" : 83 } }
    { "_id" : { "user" : "Ken", "sku" : 9 }, "value" : { "count" : 116 } }
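The output above has one row per distinct (user, sku) pair, i.e. the composite keys were grouped by their contents. When reproducing this in plain JavaScript, note that each call to the map function builds a new key object, and a `Map` compares object keys by reference, so nothing would ever merge. The sketch below (sample documents and `countBy` are hypothetical) contrasts that with grouping on a serialized key. `JSON.stringify` depends on field order, so the key must always be built in the same order, as the map function above does (user, then sku).

```javascript
const docs = [
  { user: "Joe", sku: 0 },
  { user: "Joe", sku: 0 },
  { user: "Joe", sku: 1 }
];

// Count documents per key, using whatever keyOf() returns as the Map key.
function countBy(docs, keyOf) {
  const groups = new Map();
  docs.forEach(function (doc) {
    const key = keyOf(doc);
    groups.set(key, (groups.get(key) || 0) + 1);
  });
  return groups;
}

// By reference: three distinct key objects, so three groups of 1.
const byReference = countBy(docs, function (d) { return { user: d.user, sku: d.sku }; });

// By value: the JSON text of { user, sku }, always built in the same field order.
const byValue = countBy(docs, function (d) { return JSON.stringify({ user: d.user, sku: d.sku }); });

console.log(byReference.size); // 3
console.log(byValue.size);     // 2
console.log(byValue.get('{"user":"Joe","sku":0}')); // 2
```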

3. How much did each user spend in total, and how many products did each buy? (a reduce that returns a compound result)

    // SQL implementation
    SELECT user, COUNT(sku), SUM(price) FROM test GROUP BY user

    // MapReduce implementation
    map = function () {
        emit(this.user, { amount: this.price, count: 1 });
    };
    reduce = function (key, values) {
        var res = { amount: 0, count: 0 };
        values.forEach(function (val) {
            res.amount += val.amount;
            res.count += val.count;
        });
        return res;
    };
    db.test.mapReduce(map, reduce, { out: "mr3" });

    > db.mr3.find()
    { "_id" : "Joe", "value" : { "amount" : 2053.8899999999994, "count" : 395 } }
    { "_id" : "Josh", "value" : { "amount" : 1409.2600000000002, "count" : 292 } }
    { "_id" : "Ken", "value" : { "amount" : 1547.7700000000002, "count" : 313 } }
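Amounts such as 2053.8899999999994 come from binary floating point: prices like 0.1 have no exact binary representation, so repeated addition accumulates tiny errors. A common workaround, sketched below in plain JavaScript with made-up prices, is to sum whole cents as integers and divide by 100 only at the end; integer sums stay exact as long as they remain below `Number.MAX_SAFE_INTEGER`.

```javascript
const prices = [0.1, 0.2, 0.3];

// Naive floating-point sum.
const floatSum = prices.reduce(function (acc, p) { return acc + p; }, 0);

// Sum in integer cents, then convert back once.
const centSum = prices.reduce(function (acc, p) {
  return acc + Math.round(p * 100); // each price as whole cents
}, 0) / 100;

console.log(floatSum); // 0.6000000000000001
console.log(centSum);  // 0.6
```

Applied to the MapReduce above, this would mean emitting `Math.round(this.price * 100)` as `amount` and dividing by 100 afterwards (for example in finalize); that is a suggested variation, not what the original example does.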

4. Round the amounts returned in example 3 to two decimal places, and also compute the average price per product. (using finalize to post-process the reduce results)

    // SQL implementation
    SELECT user,
           CAST(SUM(price) AS DECIMAL(10,2)) AS amount,
           COUNT(sku) AS [count],
           CAST(SUM(price) / COUNT(sku) AS DECIMAL(10,2)) AS avgprice
    FROM test
    GROUP BY user

    // MapReduce implementation
    map = function () {
        emit(this.user, { amount: this.price, count: 1, avgprice: 0 });
    };
    reduce = function (key, values) {
        var res = { amount: 0, count: 0, avgprice: 0 };
        values.forEach(function (val) {
            res.amount += val.amount;
            res.count += val.count;
        });
        return res;
    };
    // finalize runs on the fully reduced value of each key
    finalizefun = function (key, reduceresult) {
        // toFixed returns a string, so amount and avgprice are stored as strings;
        // the division converts the rounded amount string back to a number
        reduceresult.amount = (reduceresult.amount).toFixed(2);
        reduceresult.avgprice = (reduceresult.amount / reduceresult.count).toFixed(2);
        return reduceresult;
    };
    db.test.mapReduce(map, reduce, { out: "mr4", finalize: finalizefun });

    > db.mr4.find()
    { "_id" : "Joe", "value" : { "amount" : "2053.89", "count" : 395, "avgprice" : "5.20" } }
    { "_id" : "Josh", "value" : { "amount" : "1409.26", "count" : 292, "avgprice" : "4.83" } }
    { "_id" : "Ken", "value" : { "amount" : "1547.77", "count" : 313, "avgprice" : "4.94" } }
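Because `toFixed` returns strings, `amount` and `avgprice` end up quoted in `mr4`. If numeric fields are preferred, a finalize along the lines of the sketch below keeps them as numbers and computes the average from the unrounded total. `round2` and `finalizeNumeric` are hypothetical names, and the input uses the figures reported for Joe in example 3. Like `toFixed`, `Math.round(x * 100) / 100` rounds the binary value, so halfway cases such as 1.005 can round down.

```javascript
// Round to two decimal places while keeping a number (not a string).
function round2(x) {
  return Math.round(x * 100) / 100;
}

// Hypothetical numeric variant of finalizefun.
function finalizeNumeric(key, reduceresult) {
  const amount = reduceresult.amount; // unrounded total
  return {
    amount: round2(amount),
    count: reduceresult.count,
    avgprice: round2(amount / reduceresult.count)
  };
}

const joe = finalizeNumeric("Joe", { amount: 2053.8899999999994, count: 395, avgprice: 0 });
console.log(joe); // { amount: 2053.89, count: 395, avgprice: 5.2 }
```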

5. For each user, count the purchases with a unit price greater than 6. (MapReduce on a filtered subset of the data)

This one is simple: reuse the map and reduce functions from example 1 and pass a query option to filter the documents; nothing else changes.

    db.test.mapReduce(map, reduce, { query: { price: { "$gt": 6 } }, out: "mr5" });
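The query option selects documents before map runs. The plain-JavaScript sketch below (hypothetical sample data) compares that with skipping non-matching documents inside the map step: both give the same counts, but filtering first means the map step only sees the matching documents. It also shows that `$gt` is strict, so a price of exactly 6 is excluded.

```javascript
const docs = [
  { user: "Joe", price: 7.5 },
  { user: "Joe", price: 2.0 },
  { user: "Ken", price: 6.0 },
  { user: "Ken", price: 9.1 }
];

// Plain-JS equivalent of { price: { $gt: 6 } }: strictly greater than 6.
const matchesQuery = function (d) { return d.price > 6; };

// Option A: filter first (what the query option does), then count per user.
let mapCallsA = 0;
const countsA = {};
docs.filter(matchesQuery).forEach(function (d) {
  mapCallsA++;
  countsA[d.user] = (countsA[d.user] || 0) + 1;
});

// Option B: no query; the map step checks the price itself.
let mapCallsB = 0;
const countsB = {};
docs.forEach(function (d) {
  mapCallsB++;
  if (matchesQuery(d)) countsB[d.user] = (countsB[d.user] || 0) + 1;
});

console.log(countsA);              // { Joe: 1, Ken: 1 }
console.log(mapCallsA, mapCallsB); // 2 4
```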

Summary

MapReduce in MongoDB is very powerful, and the examples in this article only cover the basics. Its real strength shows when it is combined with sharding, so that multiple servers process the data in parallel.

If time permits, I hope to summarize and share more about MongoDB, as well as some topics on SQL Server.
