Anyone who has worked with Hadoop will be familiar with MapReduce. MapReduce is powerful and flexible: it splits a big problem into many small problems and sends those small problems to different machines for processing. Once every machine has finished its calculation, the partial results are combined into a complete solution. This is what is called distributed computing. In this article we will look at how to use MapReduce in MongoDB.
MapReduce
In MongoDB, MapReduce can be used to implement aggregations that are more complex than the regular aggregation commands. A MapReduce job is built from two main functions, the map function and the reduce function. The map function produces a sequence of key-value pairs; these pairs are grouped by key and passed to the reduce function, which then computes further statistics on each group. Suppose my data set (the sang_books collection) is as follows:
{ "_id": ObjectId("59fa71d71fd59c3b2cd908d7"), "name": "Lu Xun", "book": "Shout", "price": 38.0, "publisher": "People's Literature Press" }
{ "_id": ObjectId("59fa71d71fd59c3b2cd908d8"), "name": "Cao Xueqin", "book": "Red Mansions", "price": 22.0, "publisher": "People's Literature Press" }
{ "_id": ObjectId("59fa71d71fd59c3b2cd908d9"), "name": "Qian Zhongshu", "book": "Song Anthology Note", "price": 99.0, "publisher": "People's Literature Press" }
{ "_id": ObjectId("59fa71d71fd59c3b2cd908da"), "name": "Qian Zhongshu", "book": "About art record", "price": 66.0, "publisher": "Joint Publishing" }
{ "_id": ObjectId("59fa71d71fd59c3b2cd908db"), "name": "Lu Xun", "book": "Wandering", "price": 55.0, "publisher": "Huacheng Press" }
To compute the total price of each author's books, proceed as follows:
var map = function () { emit(this.name, this.price); };
var reduce = function (key, values) { return Array.sum(values); };
var options = { out: "totalprice" };
db.sang_books.mapReduce(map, reduce, options);
db.totalprice.find()
The emit function does the grouping. It takes two parameters: the first is the field to group by, and the second is the data to be aggregated. The reduce function does the actual data processing. Its two parameters correspond to the two parameters of emit: the key, and an array of all the values emitted for that key. Here Array.sum adds up the prices in that array. (For a key with only one emitted value, such as Cao Xueqin, MongoDB does not call reduce at all and stores that value directly.) The options object names the collection that receives the output (out). The results are written to that collection, which is an ordinary collection, so the data persists even after the database is restarted and can be queried like any other collection. The query results are as follows:
{ "_id": "Cao Xueqin", "value": 22.0 }
{ "_id": "Qian Zhongshu", "value": 165.0 }
{ "_id": "Lu Xun", "value": 93.0 }
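To make the flow above concrete, here is a minimal sketch in plain JavaScript (runnable with Node.js, not in the mongo shell) that imitates what MongoDB does with map and reduce on the sample data. It is a model, not MongoDB's implementation: the helper name simulateMapReduce is hypothetical, emit is passed to map as an argument instead of being a global, and a plain array sum stands in for the shell's Array.sum.

```javascript
// A plain-JavaScript model of MongoDB's map/reduce flow (not MongoDB's engine).
// Sample documents from the sang_books collection (only the fields we need).
const books = [
  { name: "Lu Xun", book: "Shout", price: 38.0 },
  { name: "Cao Xueqin", book: "Red Mansions", price: 22.0 },
  { name: "Qian Zhongshu", book: "Song Anthology Note", price: 99.0 },
  { name: "Qian Zhongshu", book: "About art record", price: 66.0 },
  { name: "Lu Xun", book: "Wandering", price: 55.0 },
];

// Hypothetical helper: runs map on every document, groups the emitted
// values by key, then reduces each group.
function simulateMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // In MongoDB, `this` inside map is the current document; call() reproduces that.
  for (const doc of docs) map.call(doc, emit);

  const results = [];
  for (const [key, values] of groups) {
    // MongoDB does not call reduce for a key with only one emitted value.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value });
  }
  return results;
}

// Total price per author, mirroring emit(this.name, this.price) + Array.sum.
const totals = simulateMapReduce(
  books,
  function (emit) { emit(this.name, this.price); },
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);
console.log(totals); // totals: Lu Xun 93, Cao Xueqin 22, Qian Zhongshu 165
```

The map callback is written with the function keyword rather than an arrow function so that `this` can be bound to each document, just as the mongo shell version relies on.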
Similarly, to count how many books each author has:
var map = function () { emit(this.name, 1); };
var reduce = function (key, values) { return Array.sum(values); };
var options = { out: "booknum" };
db.sang_books.mapReduce(map, reduce, options);
db.booknum.find()
The query results are as follows:
{ "_id": "Cao Xueqin", "value": 1.0 }
{ "_id": "Qian Zhongshu", "value": 2.0 }
{ "_id": "Lu Xun", "value": 2.0 }
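One requirement to keep in mind when writing reduce: MongoDB may reduce the values for one key in several passes and then call reduce again on the partial results, so reduce must give the same answer whether it sees all the emitted values at once or previously reduced values. The sketch below (plain JavaScript for Node.js; the helper survivesReReduce and the buggy reduceLength are hypothetical illustrations, not MongoDB APIs) checks this for the counting job: summing the emitted 1s survives a re-reduce, while simply counting array elements does not.

```javascript
// Correct reduce for counting: sums the values, so partial counts add up.
const reduceSum = (key, values) => values.reduce((acc, v) => acc + v, 0);
// Buggy counter-example (hypothetical): counts array elements, ignoring their values.
const reduceLength = (key, values) => values.length;

// Compare reducing all values in one pass against reducing two halves
// separately and then reducing the two partial results (a "re-reduce").
function survivesReReduce(reduce, key, values, splitAt) {
  const inOnePass = reduce(key, values);
  const partA = reduce(key, values.slice(0, splitAt));
  const partB = reduce(key, values.slice(splitAt));
  const inTwoPasses = reduce(key, [partA, partB]);
  return inOnePass === inTwoPasses;
}

// Three emit(name, 1) calls for a hypothetical author with three books.
const emitted = [1, 1, 1];
console.log(survivesReReduce(reduceSum, "Hypothetical Author", emitted, 2));    // true
console.log(survivesReReduce(reduceLength, "Hypothetical Author", emitted, 2)); // false
```

With reduceLength, the second pass sees [2, 1] and returns 2 instead of 3, which is why the example above emits 1 per book and sums the values rather than counting them.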
To list each author's books:
var map = function () { emit(this.name, this.book); };
var reduce = function (key, values) { return values.join(', '); };
var options = { out: "books" };
db.sang_books.mapReduce(map, reduce, options);
db.books.find()
The results are as follows:
{ "_id": "Cao Xueqin", "value": "Red Mansions" }
{ "_id": "Qian Zhongshu", "value": "Song Anthology Note, About art record" }
{ "_id": "Lu Xun", "value": "Shout, Wandering" }
To list, for each author, only the books priced above ¥40:
var map = function () { emit(this.name, this.book); };
var reduce = function (key, values) { return values.join(', '); };
var options = { query: { price: { $gt: 40 } }, out: "books" };
db.sang_books.mapReduce(map, reduce, options);
db.books.find()
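The query option filters the documents of the source collection before they are passed to the map function, so only books priced above 40 are mapped and reduced.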
The results are as follows:
{ "_id": "Qian Zhongshu", "value": "Song Anthology Note, About art record" }
{ "_id": "Lu Xun", "value": "Wandering" }
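The effect of query can be modelled in a few lines of plain JavaScript (Node.js). This is a sketch under stated assumptions rather than MongoDB code: the filter callback stands in for the { price: { $gt: 40 } } query document, and reduce is skipped for single-value keys, as MongoDB does.

```javascript
const books = [
  { name: "Lu Xun", book: "Shout", price: 38.0 },
  { name: "Cao Xueqin", book: "Red Mansions", price: 22.0 },
  { name: "Qian Zhongshu", book: "Song Anthology Note", price: 99.0 },
  { name: "Qian Zhongshu", book: "About art record", price: 66.0 },
  { name: "Lu Xun", book: "Wandering", price: 55.0 },
];

// 1. `query` filters the input documents before map runs
//    (this predicate models the query document { price: { $gt: 40 } }).
const input = books.filter((doc) => doc.price > 40);

// 2. map: emit(this.name, this.book), grouped by key.
const groups = new Map();
for (const doc of input) {
  if (!groups.has(doc.name)) groups.set(doc.name, []);
  groups.get(doc.name).push(doc.book);
}

// 3. reduce: join the titles (skipped when a key has a single value).
const result = [...groups].map(([key, values]) => ({
  _id: key,
  value: values.length === 1 ? values[0] : values.join(", "),
}));
console.log(result); // Cao Xueqin is absent: his only book costs 22
```

Because filtering happens before map, Cao Xueqin never emits anything and so has no entry in the output at all, rather than an entry with an empty value.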
runCommand implementation
We can also run MapReduce through the runCommand command. The format is as follows:
db.runCommand(
  {
    mapReduce: <collection>,
    map: <function>,
    reduce: <function>,
    finalize: <function>,
    out: <output>,
    query: <document>,
    sort: <document>,
    limit: <number>,
    scope: <document>,
    jsMode: <boolean>,
    verbose: <boolean>,
    bypassDocumentValidation: <boolean>,
    collation: <document>
  }
)
The parameters mean the following:
Parameter | Meaning
--- | ---
mapReduce | Name of the collection to operate on
map | The map function
reduce | The reduce function
finalize | Final processing function, applied to the output of reduce
out | Where to write the output (for example, the name of the output collection)
query | Filters the input documents before they are passed to map
sort | Sorts the input documents before they are passed to map
limit | Maximum number of input documents passed to map
scope | Variables that are visible in the map, reduce and finalize functions
jsMode | Whether to keep the intermediate data between map and reduce as JavaScript objects instead of converting it to BSON; false by default
verbose | Whether to include detailed timing information in the result
bypassDocumentValidation | Whether to bypass document validation when writing the output
collation | Collation (language-specific rules for comparing strings) to use for the operation
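The table mixes options that act on the input (query, sort, limit) with an option that acts on the output (finalize) and one that is visible in all three functions (scope). As a rough sketch of that ordering, the plain JavaScript model below (Node.js, not the mongo shell) applies query, sort and limit to the input documents, then runs map, reduce and finalize. runMapReduce is a hypothetical helper: query is a predicate and sort a comparator instead of MongoDB documents, and scope variables are passed as an extra argument instead of becoming globals as they do in MongoDB. The two calls at the end mirror the limit and scope/finalize examples in this section.

```javascript
const books = [
  { name: "Lu Xun", book: "Shout", price: 38.0 },
  { name: "Cao Xueqin", book: "Red Mansions", price: 22.0 },
  { name: "Qian Zhongshu", book: "Song Anthology Note", price: 99.0 },
  { name: "Qian Zhongshu", book: "About art record", price: 66.0 },
  { name: "Lu Xun", book: "Wandering", price: 55.0 },
];

// Hypothetical model of how the mapReduce command applies its options.
function runMapReduce(docs, { map, reduce, finalize, query, sort, limit, scope = {} }) {
  // 1. query, sort and limit all act on the INPUT documents, in that order.
  let input = docs;
  if (query) input = input.filter(query);
  if (sort) input = [...input].sort(sort);
  if (limit) input = input.slice(0, limit);

  // 2. map: group the emitted values by key.
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of input) map.call(doc, emit, scope);

  // 3. reduce (skipped for single-value keys), then finalize once per key.
  const results = [];
  for (const [key, values] of groups) {
    let value = values.length === 1 ? values[0] : reduce(key, values, scope);
    if (finalize) value = finalize(key, value, scope);
    results.push({ _id: key, value });
  }
  return results;
}

const listBooks = function (emit) { emit(this.name, this.book); };

// limit: 4 keeps only the first four input documents, so "Wandering" is never mapped.
const limited = runMapReduce(books, {
  map: listBooks,
  reduce: (key, values) => values.join(", "),
  limit: 4,
});

// scope + finalize: `sang` is visible in reduce and finalize.
const withScope = runMapReduce(books, {
  map: listBooks,
  reduce: (key, values, scope) => values.join(",--" + scope.sang + "--, "),
  finalize: (key, reduced, scope) => ({ author: key, books: reduced, sang: scope.sang }),
  scope: { sang: "haha" },
});
console.log(limited, withScope);
```

Keeping the input-side steps separate from the output-side steps makes it easy to see why limit changes which books are counted, while finalize only reshapes results that already exist.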
The following example runs a MapReduce operation with a limit. Note that limit restricts the number of input documents before any statistics are computed:
var map = function () { emit(this.name, this.book); };
var reduce = function (key, values) { return values.join(', '); };
db.runCommand({ mapReduce: 'sang_books', map: map, reduce: reduce, out: "books", limit: 4, verbose: true });
db.books.find()
The result is as follows:
{ "_id": "Cao Xueqin", "value": "Red Mansions" }
{ "_id": "Qian Zhongshu", "value": "Song Anthology Note, About art record" }
{ "_id": "Lu Xun", "value": "Shout" }
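Notice that one of Lu Xun's books is missing. This is because limit first caps the number of input documents (here the first four documents read from the collection), and only then is the aggregation performed, so "Wandering", the fifth document, never reaches map.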
The finalize option specifies a final processing function, as follows:
var f1 = function (key, reduceValue) {
  var obj = {};
  obj.author = key;
  obj.books = reduceValue;
  return obj;
};
var map = function () { emit(this.name, this.book); };
var reduce = function (key, values) { return values.join(', '); };
db.runCommand({ mapReduce: 'sang_books', map: map, reduce: reduce, out: "books", finalize: f1 });
db.books.find()
The first parameter of f1, key, is the key passed as the first parameter of emit, and the second parameter is the result of reduce for that key. Inside f1 we can reshape that result further. The output is as follows:
{ "_id": "Cao Xueqin", "value": { "author": "Cao Xueqin", "books": "Red Mansions" } }
{ "_id": "Qian Zhongshu", "value": { "author": "Qian Zhongshu", "books": "Song Anthology Note, About art record" } }
{ "_id": "Lu Xun", "value": { "author": "Lu Xun", "books": "Shout, Wandering" } }
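Cao Xueqin's entry also gets the author/books wrapper, even though he has only one book to reduce. A small plain-JavaScript sketch (Node.js) shows why, assuming the grouped emit values below are what map produces for the sample data (the order within each array is also an assumption): reduce is skipped for a key with a single value, but finalize still runs once for every key.

```javascript
// Emitted values grouped by key, as emit(this.name, this.book) would produce them.
const grouped = new Map([
  ["Cao Xueqin", ["Red Mansions"]],
  ["Qian Zhongshu", ["Song Anthology Note", "About art record"]],
  ["Lu Xun", ["Shout", "Wandering"]],
]);

const reduce = (key, values) => values.join(", ");
// Same shape as f1 in the example above.
const f1 = (key, reduceValue) => ({ author: key, books: reduceValue });

let reduceCalls = 0;
const output = [];
for (const [key, values] of grouped) {
  let value;
  if (values.length === 1) {
    value = values[0]; // reduce is skipped for a single emitted value...
  } else {
    reduceCalls += 1;
    value = reduce(key, values);
  }
  // ...but finalize still runs once for every key.
  output.push({ _id: key, value: f1(key, value) });
}
console.log(reduceCalls); // 2
console.log(output);
```

This is also why f1 should accept both a reduced value and a raw emitted value: for single-value keys, it receives the emitted value directly.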
The scope option defines variables that are visible in map, reduce and finalize, as follows:
var f1 = function (key, reduceValue) {
  var obj = {};
  obj.author = key;
  obj.books = reduceValue;
  obj.sang = sang; // sang is defined through scope
  return obj;
};
var map = function () { emit(this.name, this.book); };
var reduce = function (key, values) { return values.join(',--' + sang + '--, '); };
db.runCommand({ mapReduce: 'sang_books', map: map, reduce: reduce, out: "books", finalize: f1, scope: { sang: "haha" } });
db.books.find()
The result is as follows:
{ "_id": "Cao Xueqin", "value": { "author": "Cao Xueqin", "books": "Red Mansions", "sang": "haha" } }
{ "_id": "Qian Zhongshu", "value": { "author": "Qian Zhongshu", "books": "Song Anthology Note,--haha--, About art record", "sang": "haha" } }
{ "_id": "Lu Xun", "value": { "author": "Lu Xun", "books": "Shout,--haha--, Wandering", "sang": "haha" } }
I hope you found this article useful.