Use of MapReduce in MongoDB

Source: Internet
Author: User
Tags: emit
Anyone who has worked with Hadoop will be familiar with MapReduce. MapReduce is powerful and flexible: it splits a large problem into many small ones, sends the small problems to different machines for processing, and, once every machine has finished its calculation, combines the results into a complete solution. This is known as distributed computing. In this article we look at how to use MapReduce in MongoDB.

MapReduce

In MongoDB, MapReduce can be used to implement more complex aggregations. Using it requires two functions: the map function and the reduce function. The map function generates a sequence of key-value pairs. Its results are passed to the reduce function, which then performs the further calculation. Suppose my data set is as follows:

{"_id": ObjectId("59fa71d71fd59c3b2cd908d7"), "name": "Lu Xun", "book": "Shout", "price": 38.0, "publisher": "People's Literature Press"}
{"_id": ObjectId("59fa71d71fd59c3b2cd908d8"), "name": "Cao Xueqin", "book": "Red Mansions", "price": 22.0, "publisher": "People's Literature Press"}
{"_id": ObjectId("59fa71d71fd59c3b2cd908d9"), "name": "Qian Zhongshu", "book": "Song Anthology Note", "price": 99.0, "publisher": "People's Literature Press"}
{"_id": ObjectId("59fa71d71fd59c3b2cd908da"), "name": "Qian Zhongshu", "book": "About art record", "price": 66.0, "publisher": "Joint Publishing"}
{"_id": ObjectId("59fa71d71fd59c3b2cd908db"), "name": "Lu Xun", "book": "Wandering", "price": 55.0, "publisher": "Huacheng Press"}

To find the total price of each author's books, proceed as follows:

var map = function () {
    emit(this.name, this.price);
};
var reduce = function (key, values) {
    return Array.sum(values);
};
var options = { out: "totalPrice" };
db.sang_books.mapReduce(map, reduce, options);
db.totalPrice.find();

The emit function implements the grouping. It takes two parameters: the first is the field to group by, and the second is the data to aggregate. The reduce function does the actual data processing. It also takes two parameters, which correspond to the two parameters of emit: the key, and an array of all the values emitted for that key. Here Array.sum adds up the price values. The options object names the collection that the results are written to, which is the collection we then query. By default this collection is persistent, so its data is preserved even after the database is restarted. The query results are as follows:

{ "_id": "Cao Xueqin", "value": 22.0 }
{ "_id": "Qian Zhongshu", "value": 165.0 }
{ "_id": "Lu Xun", "value": 93.0 }
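The flow described above can be imitated in plain JavaScript. The following is only a minimal in-memory sketch, not how MongoDB implements it: `runMapReduce` and the module-level `emit` variable are hypothetical helpers made up for illustration, and `Array.prototype.reduce` stands in for the `Array.sum` helper, which plain JavaScript does not have.

```javascript
// Sample documents from the article (the _id and publisher fields are omitted).
const books = [
  { name: "Lu Xun", book: "Shout", price: 38 },
  { name: "Cao Xueqin", book: "Red Mansions", price: 22 },
  { name: "Qian Zhongshu", book: "Song Anthology Note", price: 99 },
  { name: "Qian Zhongshu", book: "About art record", price: 66 },
  { name: "Lu Xun", book: "Wandering", price: 55 },
];

let emit; // assigned by runMapReduce; in MongoDB, emit is supplied by the server

// Hypothetical in-memory stand-in for mapReduce, for illustration only.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // Call map once per document, with `this` bound to the document.
  for (const doc of docs) map.call(doc);
  // In this sketch, reduce runs once per key with every value emitted for it.
  return [...groups].map(([key, values]) => ({ _id: key, value: reduce(key, values) }));
}

const map = function () { emit(this.name, this.price); };
// Plain-JavaScript replacement for Array.sum(values).
const reduce = function (key, values) { return values.reduce((a, b) => a + b, 0); };

const totals = runMapReduce(books, map, reduce);
console.log(totals);
```

The sketch yields one `{ _id, value }` entry per author, with the same totals as the query results above.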

To count how many books each author has:

var map = function () {
    emit(this.name, 1);
};
var reduce = function (key, values) {
    return Array.sum(values);
};
var options = { out: "bookNum" };
db.sang_books.mapReduce(map, reduce, options);
db.bookNum.find();

The query results are as follows:

{ "_id": "Cao Xueqin", "value": 1.0 }
{ "_id": "Qian Zhongshu", "value": 2.0 }
{ "_id": "Lu Xun", "value": 2.0 }
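Counting by emitting 1 and summing is not an arbitrary choice. MongoDB may call reduce more than once for the same key and feed an earlier result back in as one of the values, so a reduce function has to give the same answer either way. The sketch below checks that property in plain JavaScript for a made-up author with five books; `sumReduce`, `lengthReduce` and the key `"author"` are hypothetical names used only for this illustration.

```javascript
// Reduce that adds the emitted 1s (the approach used in the article).
const sumReduce = (key, values) => values.reduce((a, b) => a + b, 0);
// Tempting alternative that counts the array length instead.
const lengthReduce = (key, values) => values.length;

// Five emitted 1s for one hypothetical author.
const emitted = [1, 1, 1, 1, 1];

// Single pass: reduce sees all five values at once.
const sumOnePass = sumReduce("author", emitted);
const lengthOnePass = lengthReduce("author", emitted);

// Two passes: the first three values are reduced, then that partial
// result is reduced again together with the remaining two values.
const rereduce = (reduce) =>
  reduce("author", [reduce("author", emitted.slice(0, 3)), ...emitted.slice(3)]);
const sumTwoPass = rereduce(sumReduce);
const lengthTwoPass = rereduce(lengthReduce);

console.log({ sumOnePass, sumTwoPass });       // both passes agree
console.log({ lengthOnePass, lengthTwoPass }); // the passes disagree
```

Summing stays correct when partial results are reduced again, while counting the array length loses books as soon as a partial result is fed back in.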

To list each author's books:

var map = function () {
    emit(this.name, this.book);
};
var reduce = function (key, values) {
    return values.join(', ');
};
var options = { out: "books" };
db.sang_books.mapReduce(map, reduce, options);
db.books.find();

The results are as follows:

{ "_id": "Cao Xueqin", "value": "Red Mansions" }
{ "_id": "Qian Zhongshu", "value": "Song Anthology Note, About art record" }
{ "_id": "Lu Xun", "value": "Shout, Wandering" }

To list each author's books that are priced above ¥40:

var map = function () {
    emit(this.name, this.book);
};
var reduce = function (key, values) {
    return values.join(', ');
};
var options = { query: { price: { $gt: 40 } }, out: "books" };
db.sang_books.mapReduce(map, reduce, options);
db.books.find();

The query option filters the documents in the collection before they are passed to the map function.

The results are as follows:

{ "_id": "Qian Zhongshu", "value": "Song Anthology Note, About art record" }
{ "_id": "Lu Xun", "value": "Wandering" }
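The effect of the query option can be imitated in plain JavaScript by filtering the documents before grouping them. This is only an in-memory sketch under stated assumptions: a JavaScript predicate stands in for the query document `{ price: { $gt: 40 } }`, and the grouping loop stands in for map and emit. It also mirrors one documented MongoDB behaviour: reduce is not called for a key that has only a single value, so that value is stored as is.

```javascript
// Sample documents from the article (the _id and publisher fields are omitted).
const books = [
  { name: "Lu Xun", book: "Shout", price: 38 },
  { name: "Cao Xueqin", book: "Red Mansions", price: 22 },
  { name: "Qian Zhongshu", book: "Song Anthology Note", price: 99 },
  { name: "Qian Zhongshu", book: "About art record", price: 66 },
  { name: "Lu Xun", book: "Wandering", price: 55 },
];

// Stand-in for query: { price: { $gt: 40 } }, applied before any mapping.
const input = books.filter((doc) => doc.price > 40);

// Group the book titles by author, as emit(this.name, this.book) would.
const groups = new Map();
for (const doc of input) {
  if (!groups.has(doc.name)) groups.set(doc.name, []);
  groups.get(doc.name).push(doc.book);
}

const reduce = (key, values) => values.join(", ");

// A key with a single value skips reduce, mirroring MongoDB's behaviour.
const result = [...groups].map(([key, values]) => ({
  _id: key,
  value: values.length === 1 ? values[0] : reduce(key, values),
}));
console.log(result);
```

Cao Xueqin's only book costs less than ¥40, so it never reaches the grouping step and the author is absent from the output.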

runCommand implementation

We can also use the runCommand command to execute MapReduce. The format is as follows:

db.runCommand(
    {
        mapReduce: <collection>,
        map: <function>,
        reduce: <function>,
        finalize: <function>,
        out: <output>,
        query: <document>,
        sort: <document>,
        limit: <number>,
        scope: <document>,
        jsMode: <boolean>,
        verbose: <boolean>,
        bypassDocumentValidation: <boolean>,
        collation: <document>
    }
)

The parameters have the following meanings:

Parameter                  Meaning
mapReduce                  The collection to operate on
map                        The map function
reduce                     The reduce function
finalize                   The final processing function
out                        Where the output is written (the output collection)
query                      Filters the documents passed to map
sort                       Sorts the documents passed to map
limit                      Maximum number of documents passed to map
scope                      Defines variables that are visible in the map, reduce, and finalize functions
jsMode                     Whether intermediate data from map is kept as JavaScript objects instead of being converted to BSON; false by default
verbose                    Whether to include detailed timing statistics
bypassDocumentValidation   Whether to bypass document validation
collation                  The collation (language-specific string comparison rules) to use

The following runs a MapReduce operation on the collection with a limit on the number of documents. Note that the limit is applied first, before the calculation is performed:

var map = function () {
    emit(this.name, this.book);
};
var reduce = function (key, values) {
    return values.join(', ');
};
db.runCommand({ mapReduce: 'sang_books', map: map, reduce: reduce, out: "books", limit: 4, verbose: true });
db.books.find();

The results are as follows:

{ "_id": "Cao Xueqin", "value": "Red Mansions" }
{ "_id": "Qian Zhongshu", "value": "Song Anthology Note, About art record" }
{ "_id": "Lu Xun", "value": "Shout" }

As you can see, one of Lu Xun's books is missing. This is because limit first restricts the number of documents, and only then is the calculation performed.
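The order in which the input options take effect can be sketched in plain JavaScript. `selectInput` is a hypothetical helper written for this illustration, and it takes a JavaScript predicate and comparator in place of MongoDB's query and sort documents. It applies query, then sort, then limit, and only the documents it returns would reach the map function.

```javascript
// Sample documents from the article (the _id and publisher fields are omitted).
const books = [
  { name: "Lu Xun", book: "Shout", price: 38 },
  { name: "Cao Xueqin", book: "Red Mansions", price: 22 },
  { name: "Qian Zhongshu", book: "Song Anthology Note", price: 99 },
  { name: "Qian Zhongshu", book: "About art record", price: 66 },
  { name: "Lu Xun", book: "Wandering", price: 55 },
];

// Hypothetical helper: choose the documents that map will see.
function selectInput(docs, { query, sort, limit } = {}) {
  let input = docs.slice();                 // never modify the caller's array
  if (query) input = input.filter(query);   // 1. filter
  if (sort) input.sort(sort);               // 2. sort
  if (limit) input = input.slice(0, limit); // 3. cut off
  return input;
}

// limit: 4 keeps the first four documents, so "Wandering" is never mapped.
const firstFour = selectInput(books, { limit: 4 }).map((doc) => doc.book);
console.log(firstFour);

// Sorting by price (descending) before limiting picks the two most expensive books.
const topTwo = selectInput(books, {
  sort: (a, b) => b.price - a.price,
  limit: 2,
}).map((doc) => doc.book);
console.log(topTwo);
```

Because the fifth document is cut off before map runs, Lu Xun's group contains only "Shout", which matches the result above.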

The finalize option specifies a final processing function, as follows:

var f1 = function (key, reduceValue) {
    var obj = {};
    obj.author = key;
    obj.books = reduceValue;
    return obj;
};
var map = function () {
    emit(this.name, this.book);
};
var reduce = function (key, values) {
    return values.join(', ');
};
db.runCommand({ mapReduce: 'sang_books', map: map, reduce: reduce, out: "books", finalize: f1 });
db.books.find();

The first parameter of f1, key, is the first parameter passed to emit, and the second parameter is the result of reduce. We can process the result further in f1. The results are as follows:

{ "_id": "Cao Xueqin", "value": { "author": "Cao Xueqin", "books": "Red Mansions" } }
{ "_id": "Qian Zhongshu", "value": { "author": "Qian Zhongshu", "books": "Song Anthology Note, About art record" } }
{ "_id": "Lu Xun", "value": { "author": "Lu Xun", "books": "Shout, Wandering" } }
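What finalize adds can be shown in isolation with plain JavaScript. The sketch below assumes the reduce step has already produced one value per author (the `reduced` array is written out by hand for illustration) and then applies a finalize function once per key, which is the step that wraps each value in an object.

```javascript
// Hand-written stand-in for the output of the reduce step.
const reduced = [
  { _id: "Cao Xueqin", value: "Red Mansions" },
  { _id: "Qian Zhongshu", value: "Song Anthology Note, About art record" },
  { _id: "Lu Xun", value: "Shout, Wandering" },
];

// Same shape as f1 in the article: receives the key and the reduced value.
const finalize = function (key, reduceValue) {
  return { author: key, books: reduceValue };
};

// finalize runs once per key, including keys that had only one emitted value.
const finalized = reduced.map((entry) => ({
  _id: entry._id,
  value: finalize(entry._id, entry.value),
}));
console.log(JSON.stringify(finalized, null, 2));
```

The `_id` of each entry is unchanged; only the `value` field is replaced by whatever finalize returns.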

The scope option defines variables that are visible in the map, reduce, and finalize functions, as follows:

var f1 = function (key, reduceValue) {
    var obj = {};
    obj.author = key;
    obj.books = reduceValue;
    obj.sang = sang;
    return obj;
};
var map = function () {
    emit(this.name, this.book);
};
var reduce = function (key, values) {
    return values.join(',--' + sang + '--, ');
};
db.runCommand({ mapReduce: 'sang_books', map: map, reduce: reduce, out: "books", finalize: f1, scope: { sang: "Haha" } });
db.books.find();

The results are as follows:

{ "_id": "Cao Xueqin", "value": { "author": "Cao Xueqin", "books": "Red Mansions", "sang": "Haha" } }
{ "_id": "Qian Zhongshu", "value": { "author": "Qian Zhongshu", "books": "Song Anthology Note,--Haha--, About art record", "sang": "Haha" } }
{ "_id": "Lu Xun", "value": { "author": "Lu Xun", "books": "Shout,--Haha--, Wandering", "sang": "Haha" } }

I hope you found this article useful.
