Testing MongoDB mapReduce

Source: Internet
Author: User

Recently, because of business requirements for our product, I needed to run computations over a fairly large amount of data. I took the opportunity to try MongoDB's mapReduce feature, and it worked quite well.

 

The following is an official example:

    $ ./mongo
    > db.things.insert( { _id : 1, tags : ['dog', 'cat'] } );
    > db.things.insert( { _id : 2, tags : ['cat'] } );
    > db.things.insert( { _id : 3, tags : ['mouse', 'cat', 'dog'] } );
    > db.things.insert( { _id : 4, tags : []  } );
    > // map function
    > m = function(){
    ...    this.tags.forEach(
    ...        function(z){
    ...            emit( z , { count : 1 } );
    ...        }
    ...    );
    ...};
    > // reduce function
    > r = function( key , values ){
    ...    var total = 0;
    ...    for ( var i=0; i<values.length; i++ )
    ...        total += values[i].count;
    ...    return { count : total };
    ...};
    > res = db.things.mapReduce(m,r);
    > res
    {"timeMillis.emit" : 9 , "result" : "mr.things.1254430454.3" , "numObjects" : 4 , "timeMillis" : 9 , "errmsg" : "" , "ok" : 0}
    > db[res.result].find()
    {"_id" : "cat" , "value" : {"count" : 3}}
    {"_id" : "dog" , "value" : {"count" : 2}}
    {"_id" : "mouse" , "value" : {"count" : 1}}
    > db[res.result].drop()
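To make the data flow of this example easier to follow, here is a minimal in-memory sketch in plain JavaScript. `runMapReduce` is a hypothetical helper written only for illustration; it is not a MongoDB API and it only imitates the grouping behaviour: every `emit(key, value)` is collected per key, and `reduce` is called for keys that received more than one value.

```javascript
// Hypothetical stand-in for db.things.mapReduce(m, r); runs without MongoDB.
let emit; // map functions call a global emit(), as they do on the server

function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> list of emitted values
  emit = function (key, value) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach(function (doc) { map.call(doc); }); // `this` is the document

  const out = [];
  groups.forEach(function (values, key) {
    // A key with a single emitted value is stored without calling reduce.
    out.push({ _id: key, value: values.length > 1 ? reduce(key, values) : values[0] });
  });
  return out.sort(function (a, b) { return a._id < b._id ? -1 : 1; });
}

// Same documents and functions as in the shell session above.
const things = [
  { _id: 1, tags: ['dog', 'cat'] },
  { _id: 2, tags: ['cat'] },
  { _id: 3, tags: ['mouse', 'cat', 'dog'] },
  { _id: 4, tags: [] },
];
const m = function () {
  this.tags.forEach(function (z) { emit(z, { count: 1 }); });
};
const r = function (key, values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) total += values[i].count;
  return { count: total };
};

const tagCounts = runMapReduce(things, m, r);
console.log(JSON.stringify(tagCounts));
```

The document with an empty tags array emits nothing, which is why only three keys appear in the output.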

mapReduce parameter description

    db.runCommand({
        mapreduce : <collection>,
        map : <mapfunction>,
        reduce : <reducefunction>
        [, query : <query filter object>]
        [, sort : <sort the query.  useful for optimization>]
        [, limit : <number of objects to return from collection>]
        [, out : <output-collection name>]
        [, keeptemp: <true|false>]
        [, finalize : <finalizefunction>]
        [, scope : <object where fields go into javascript global scope >]
        [, verbose : true]
    });

mapreduce: the collection that mapreduce processes.
map: the map function.
reduce: the reduce function.
query: a filter. Only documents that match it are passed to map; the filter is applied before the rest of the mapreduce process runs.
sort: sorts the input documents selected by query. Sorting on the emit key is useful for optimization because the grouping step then needs fewer reduce calls.
limit: the maximum number of input documents taken from the collection, applied together with query and sort.
out: the name of the output collection. If it is not specified, a temporary collection with a generated name is created.
keeptemp: true or false; whether the temporary output collection is kept. By default (false) the collection is temporary and is deleted automatically when the client connection closes: with the mongo shell that means on exit, and with a script it means when the script exits or calls close. Set it to true to keep the collection.
finalize: a function, like map and reduce. It is called with the key and the reduced value after reduce has finished, and returns the final result for that key.
scope: an object whose fields become global variables visible in the map, reduce, and finalize functions.
verbose: prints debugging information during execution.
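As an illustration of how these options fit together, the following sketch builds a command document for the tags example. The field names come from the list above; the output collection name `tag_counts`, the `threshold` scope variable and the finalize logic are assumptions made up for this example. The command is only sent when the code runs inside the mongo shell, where `db` exists.

```javascript
const mapFn = function () {
  this.tags.forEach(function (z) { emit(z, { count: 1 }); });
};
const reduceFn = function (key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) total += values[i].count;
  return { count: total };
};
// finalize runs once per key after reduce; it can read variables from scope.
const finalizeFn = function (key, value) {
  value.popular = value.count >= threshold; // `threshold` is injected via scope
  return value;
};

const cmd = {
  mapreduce: 'things',                // collection to process (must be the first field)
  map: mapFn,
  reduce: reduceFn,
  query: { tags: { $exists: true } }, // filter applied before map
  limit: 1000,                        // cap on the number of input documents
  out: 'tag_counts',                  // hypothetical output collection name
  finalize: finalizeFn,
  scope: { threshold: 2 },            // visible inside map, reduce and finalize
  verbose: true,
};

// Only attempt to run the command inside the mongo shell.
if (typeof db !== 'undefined') {
  printjson(db.runCommand(cmd));
}
```

The sort option is left out here on purpose; if it is used, it should normally match an existing index on the collection.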

Return format:

    {
        result : <collection_name>,
        counts : {
            input :  <number of objects scanned>,
            emit  : <number of times emit was called>,
            output : <number of items in output collection>
        },
        timeMillis : <job_time>,
        ok : <1_if_ok>
        [, err : <errmsg_if_error>]
    }
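A script that calls mapReduce will usually want to check `ok` before reading the output collection. The sketch below shows one way to do that on a result document of the shape above; `summarizeResult` is a hypothetical helper, and the sample values are made up to match the small tags example (4 input documents, 6 emits, 3 output keys).

```javascript
// Hypothetical helper: validate and summarise a mapReduce result document.
function summarizeResult(res) {
  if (res.ok !== 1) {
    throw new Error('mapReduce failed: ' + (res.err || res.errmsg || 'unknown error'));
  }
  const c = res.counts;
  return {
    collection: res.result,                            // where the output was written
    emitsPerInput: c.input > 0 ? c.emit / c.input : 0, // average emits per scanned document
    keysOut: c.output,                                 // number of distinct keys
    seconds: res.timeMillis / 1000,
  };
}

const sampleResult = {
  result: 'tag_counts',
  counts: { input: 4, emit: 6, output: 3 },
  timeMillis: 9,
  ok: 1,
};
console.log(summarizeResult(sampleResult));
```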

 

The next example is slightly more complex: it counts how many times each house was shown (its exposure) on the housing list page.

Mongodb data format:

    {
        "_id" : ObjectId("50364d9fdec7d5ce4000198d"),
        "pn" : "Listing_V2_IndexPage_All",
        "guid" : "E200F425-30E7-0D97-9B3A-E047A08CE47C",
        "uguid" : "4455754C-B2A0-7EDA-6387-A50F0228DE7F",
        "url" : "http://shanghai.haozu.com/listing/pudong/?from=in_area",
        "referer" : "http://shanghai.haozu.com/",
        "site" : "haozu",
        "stamp" : "1345691212948",
        "cip" : "116.231.123.184",
        "sessid" : "B1197AA0-976C-F6EF-BB6F-9401D8E983DD",
        "cid" : "11",
        "cstamp" : "1345691178421",
        "cstparam" : "{\"found\":\"37695\",\"proids\":\"10290023|10353348|8448223|10310737|10311720|10250125|10320886|8507299|10332158|10341287|10266002|10322302|9185878|10273552|10272872|10282252|10270250|10336122|9350169|10196350|8533446|10250019|10335617|10222489\"}",
        "rfpn" : "Home_Index8Page",
        "agent" : "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; 360SE; 360SE)"
    }

The house ids are stored in the cstparam field, which is a string rather than a sub-document. The ids therefore have to be extracted from that string (here by splitting it) before they can be counted.
The corresponding map and reduce functions are as follows:

Map Method:

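    // Map: emit a count of 1 for every house id listed in cstparam.
    var map = function () {
        // cstparam looks like {"found":"...","proids":"id1|id2|..."}; after
        // splitting on double quotes, the id list is the second-to-last piece.
        var arr = this.cstparam.split("\"");
        var str_ids = arr[arr.length - 2];
        var arr_ids = str_ids.split("|");
        for (var i = 0; i < arr_ids.length; i++) {
            emit(arr_ids[i], 1);
        }
    };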
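The split on double quotes in the map function works because proids is the last quoted value in cstparam. Since the sample value is valid JSON, one alternative is to parse it instead. The sketch below assumes every cstparam is well-formed JSON with a pipe-separated proids string, which the article only shows for a single record; `extractHouseIds` is a hypothetical helper name.

```javascript
// Hypothetical helper: pull the house ids out of a cstparam string.
function extractHouseIds(cstparam) {
  let parsed;
  try {
    parsed = JSON.parse(cstparam);
  } catch (e) {
    return []; // not valid JSON: nothing to emit for this document
  }
  if (!parsed || typeof parsed.proids !== 'string' || parsed.proids === '') {
    return [];
  }
  return parsed.proids.split('|');
}

// Shortened version of the cstparam value from the sample document.
const sampleParam = '{"found":"37695","proids":"10290023|10353348|8448223"}';
const viaJson = extractHouseIds(sampleParam);
console.log(viaJson);

// The split-based extraction used by the map function, for comparison.
const parts = sampleParam.split('"');
const viaSplit = parts[parts.length - 2].split('|');
console.log(viaSplit);
```

To use this inside a map function, the parsing logic would have to live in the function body itself (or be passed in through scope), because the server only sees the functions it is given. Whether `JSON.parse` is available in server-side functions depends on the JavaScript engine of the MongoDB version in use, so that is worth checking first.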

Reduce method:

    // Reduce: add up the counts emitted for one house id.
    var reduce = function (key, emits) {
        var count = 0;
        for (var i = 0; i < emits.length; i++) {
            count += emits[i];
        }
        return count;
    };
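MongoDB may call reduce more than once for the same key and feed earlier reduce results back in as values. The reduce result therefore has to have the same shape as the emitted values, and the way the values are grouped must not change the outcome. The reduce function above meets this requirement because it sums plain numbers. A quick check in plain JavaScript (the function is copied so the snippet runs on its own, and the name `sumReduce` is just for this sketch):

```javascript
// Same logic as the reduce function above.
const sumReduce = function (key, emits) {
  let count = 0;
  for (let i = 0; i < emits.length; i++) count += emits[i];
  return count;
};

const emitted = [1, 1, 1, 1, 1]; // five emits for one house id

// One pass over all values...
const direct = sumReduce('10000016', emitted);

// ...must give the same answer as reducing two partial results.
const partialA = sumReduce('10000016', emitted.slice(0, 2));
const partialB = sumReduce('10000016', emitted.slice(2));
const rereduced = sumReduce('10000016', [partialA, partialB]);

console.log(direct, rereduced);
```

A reduce that returned `emits.length` instead of the sum would pass the first check but fail the second, because partial results are no longer all equal to 1.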

Run:

    // The regex only matches documents that have a cstparam string, so no separate $exists test is needed.
    db.log_soj.mapReduce(map, reduce, {out: 'result_tmp', query: {'cstparam': /proids/}});
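A note on the query option: a JavaScript object literal keeps only the last value when a key is repeated, so two conditions on the same field cannot be written as two entries with the same key. For this filter the regular expression alone is enough, because it cannot match a document that has no cstparam. The sketch below shows both points in plain JavaScript; `matchesFilter` is a hypothetical in-memory stand-in for the server-side filter.

```javascript
// Repeating a key does not combine the conditions: the second value wins.
const repeated = { cstparam: { $exists: true }, cstparam: /proids/ };
console.log(Object.keys(repeated).length, repeated.cstparam instanceof RegExp);

// Hypothetical in-memory version of the filter { cstparam: /proids/ }.
function matchesFilter(doc) {
  return typeof doc.cstparam === 'string' && /proids/.test(doc.cstparam);
}

const logDocs = [
  { pn: 'a', cstparam: '{"found":"1","proids":"10000003|10000016"}' },
  { pn: 'b', cstparam: '{"found":"0"}' }, // cstparam without proids
  { pn: 'c' },                            // no cstparam at all
];
const selected = logDocs.filter(matchesFilter).map(function (d) { return d.pn; });
console.log(selected);
```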

Returned results:

    {
        "result" : "result_tmp",
        "timeMillis" : 18888,
        "counts" : {
            "input" : 15742,
            "emit" : 333011,
            "reduce" : 103137,
            "output" : 150897
        },
        "ok" : 1,
    }

Result set:

    { "_id" : "10000003", "value" : 1 }
    { "_id" : "10000016", "value" : 2 }
    { "_id" : "10000032", "value" : 1 }
    { "_id" : "10000039", "value" : 1 }
    { "_id" : "10000043", "value" : 1 }
    { "_id" : "10000059", "value" : 1 }
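Once the counts are in result_tmp, a typical next step is to rank the houses by exposure. In the shell this would be something like `db.result_tmp.find().sort({ value : -1 }).limit(10)`. The sketch below does the same thing in memory on the rows shown above; `topExposed` is a hypothetical helper name.

```javascript
// Rows copied from the result set above.
const exposureRows = [
  { _id: '10000003', value: 1 },
  { _id: '10000016', value: 2 },
  { _id: '10000032', value: 1 },
  { _id: '10000039', value: 1 },
  { _id: '10000043', value: 1 },
  { _id: '10000059', value: 1 },
];

// Return the n rows with the highest exposure count.
function topExposed(rows, n) {
  return rows
    .slice() // copy so the input order is left untouched
    .sort(function (a, b) { return b.value - a.value; })
    .slice(0, n);
}

const best = topExposed(exposureRows, 1);
console.log(best);
```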

 

The next example is similar to the previous one, except that exposure is counted per house and per city, using the city id (cid) of each log record.

Map function:

    // Map: key each emit by "<house id>_<city id>" so counts are kept per house and city.
    var m1 = function () {
        var arr = this.cstparam.split("\"");
        var str_ids = arr[arr.length - 2];
        var arr_ids = str_ids.split("|");
        for (var i = 0; i < arr_ids.length; i++) {
            var key = arr_ids[i] + "_" + this.cid;
            emit(key, {prop_id: arr_ids[i], city_id: this.cid, count: 1});
        }
    };

Reduce function:

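    // Reduce: sum the counts; the result has the same shape as the emitted values.
    var r1 = function (key, emits) {
        var total = 0;
        for (var i = 0; i < emits.length; i++) {
            total += emits[i].count;
        }
        return {prop_id: emits[0].prop_id, city_id: emits[0].city_id, count: total};
    };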

Run:

    // The regex only matches documents that have a cstparam string, so no separate $exists test is needed.
    db.log_soj.mapReduce(m1, r1, {out: 'result_tmp', query: {'cstparam': /proids/}});

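Result (the first row has the key 10000003_undefined and a null city_id because that log document has no cid value):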

    { "_id" : "10000003_undefined", "value" : { "prop_id" : "10000003", "city_id" : null, "count" : 1 } }
    { "_id" : "10000016_14", "value" : { "prop_id" : "10000016", "city_id" : "14", "count" : 2 } }
    { "_id" : "10000032_15", "value" : { "prop_id" : "10000032", "city_id" : "15", "count" : 1 } }
    { "_id" : "10000039_15", "value" : { "prop_id" : "10000039", "city_id" : "15", "count" : 1 } }
    { "_id" : "10000043_11", "value" : { "prop_id" : "10000043", "city_id" : "11", "count" : 1 } }
    { "_id" : "10000059_17", "value" : { "prop_id" : "10000059", "city_id" : "17", "count" : 1 } }
    { "_id" : "10000068_11", "value" : { "prop_id" : "10000068", "city_id" : "11", "count" : 1 } }
    { "_id" : "10000099_15", "value" : { "prop_id" : "10000099", "city_id" : "15", "count" : 1 } }
    { "_id" : "10000100_18", "value" : { "prop_id" : "10000100", "city_id" : "18", "count" : 1 } }
    { "_id" : "10000106_14", "value" : { "prop_id" : "10000106", "city_id" : "14", "count" : 1 } }
    { "_id" : "10000109_18", "value" : { "prop_id" : "10000109", "city_id" : "18", "count" : 3 } }
    { "_id" : "10000112_15", "value" : { "prop_id" : "10000112", "city_id" : "15", "count" : 1 } }
    { "_id" : "10000118_15", "value" : { "prop_id" : "10000118", "city_id" : "15", "count" : 1 } }
    { "_id" : "10000156_11", "value" : { "prop_id" : "10000156", "city_id" : "11", "count" : 1 } }
    { "_id" : "10000224_14", "value" : { "prop_id" : "10000224", "city_id" : "14", "count" : 1 } }
    { "_id" : "10000250_22", "value" : { "prop_id" : "10000250", "city_id" : "22", "count" : 1 } }
    { "_id" : "10000262_25", "value" : { "prop_id" : "10000262", "city_id" : "25", "count" : 1 } }
    { "_id" : "10000267_14", "value" : { "prop_id" : "10000267", "city_id" : "14", "count" : 3 } }
    { "_id" : "10000305_14", "value" : { "prop_id" : "10000305", "city_id" : "14", "count" : 3 } }
    { "_id" : "10000323_11", "value" : { "prop_id" : "10000323", "city_id" : "11", "count" : 1 } }
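Because each value carries both city_id and count, the per-house rows can be rolled up into per-city totals afterwards. The sketch below does this in memory for the first five rows of the result above; `exposureByCity` is a hypothetical helper, and grouping rows without a city id under the label "unknown" is an assumption made for this example.

```javascript
// First five rows of the result above.
const cityRows = [
  { _id: '10000003_undefined', value: { prop_id: '10000003', city_id: null, count: 1 } },
  { _id: '10000016_14', value: { prop_id: '10000016', city_id: '14', count: 2 } },
  { _id: '10000032_15', value: { prop_id: '10000032', city_id: '15', count: 1 } },
  { _id: '10000039_15', value: { prop_id: '10000039', city_id: '15', count: 1 } },
  { _id: '10000043_11', value: { prop_id: '10000043', city_id: '11', count: 1 } },
];

// Sum the exposure counts per city id.
function exposureByCity(rows) {
  const totals = {};
  rows.forEach(function (row) {
    // == null catches both null and undefined city ids.
    const city = row.value.city_id == null ? 'unknown' : row.value.city_id;
    totals[city] = (totals[city] || 0) + row.value.count;
  });
  return totals;
}

const perCity = exposureByCity(cityRows);
console.log(perCity);
```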

If you reprint this article, please credit the source:

Http://www.cnblogs.com/xiazh/archive/2012/09/05/2671730.html
