MongoDB MapReduce Usage

Source: Internet
Author: User
Tags: emit

About MongoDB's MapReduce
Category: MongoDB, 2012-12-06 21:37

MapReduce is a computational model that, put simply, decomposes a large amount of work (data) into pieces (map) and then merges the partial results into the final result (reduce). The advantage is that once the task has been decomposed, the pieces can be computed in parallel by a large number of machines, which reduces the time of the whole operation.

That is the theory behind MapReduce. What follows is the practical side, illustrated with MongoDB's MapReduce.

Here is an example from the official MongoDB documentation:

> db.things.insert({_id: 1, tags: ['dog', 'cat']});
> db.things.insert({_id: 2, tags: ['cat']});
> db.things.insert({_id: 3, tags: ['mouse', 'cat', 'dog']});
> db.things.insert({_id: 4, tags: []});
> // map function
> map = function() {
...     this.tags.forEach(
...         function(z) {
...             emit(z, {count: 1});
...         }
...     );
... };
> // reduce function
> reduce = function(key, values) {
...     var total = 0;
...     for (var i = 0; i < values.length; i++)
...         total += values[i].count;
...     return {count: total};
... };
> db.things.mapReduce(map, reduce, {out: 'tmp'})
{
    "result" : "tmp",
    "timeMillis" : <job_time>,
    "counts" : {
        "input" : 4,
        "emit" : 6,
        "output" : 3
    },
    "ok" : 1,
}
> db.tmp.find()
{ "_id" : "cat", "value" : { "count" : 3 } }
{ "_id" : "dog", "value" : { "count" : 2 } }
{ "_id" : "mouse", "value" : { "count" : 1 } }

The example is simple: it counts the number of occurrences of each tag in a tagging system.

Apart from the emit function, everything here is standard JavaScript syntax. The emit function is very important. It can be understood this way: once the map function has run over every document that needs to be processed (documents can be filtered in the mapReduce call before map runs), the map phase yields key-values pairs, where key is the first argument passed to emit and values is an array holding the second arguments of all emit calls made with that same key. Each key-values pair is then passed to reduce as its first and second arguments.
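
To make the grouping concrete, here is a minimal sketch in plain JavaScript that runs outside MongoDB. The groupEmits helper and the emit passed into the map function are hypothetical stand-ins written for this illustration, not MongoDB APIs; they only imitate how values emitted under one key end up in one array.

```javascript
// Hypothetical helper (not a MongoDB API): runs a map function over an
// in-memory array of documents and groups the emitted values by key.
function groupEmits(docs, map) {
  const groups = new Map();
  // Stand-in for the emit() that MongoDB makes available inside map.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) {
    map.call(doc, emit); // `this` is the current document, as in MongoDB
  }
  return groups;
}

// The same four documents as in the article's example.
const things = [
  { _id: 1, tags: ["dog", "cat"] },
  { _id: 2, tags: ["cat"] },
  { _id: 3, tags: ["mouse", "cat", "dog"] },
  { _id: 4, tags: [] },
];

// Same logic as the article's map function; emit is passed in explicitly
// because there is no MongoDB server here to supply it.
function mapTags(emit) {
  this.tags.forEach((z) => emit(z, { count: 1 }));
}

const grouped = groupEmits(things, mapTags);
for (const [key, values] of grouped) {
  console.log(key, JSON.stringify(values));
}
```

Each printed line is one key-values pair: the key followed by the array that reduce would receive as its second argument.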

The task of the reduce function is to turn key-values into key-value, that is, to collapse the values array into a single value. When the values array of a key-values pair is too large, it is split into many smaller key-values chunks, reduce is run on each chunk separately, and the chunk results are combined into a new array that is passed as the second argument of reduce for a further reduce step. It follows that if the initial values array is very large, the results of the first round of chunks may themselves have to be reduced in chunks again. This resembles a multi-way merge sort. How many rounds are needed depends only on the amount of data.
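
The chunking can be imitated in plain JavaScript. In this sketch, reduceInChunks and the chunk size of 2 are assumptions chosen for illustration; how MongoDB actually splits the values is internal and is not described in this article.

```javascript
// The article's reduce function, unchanged in behavior.
function reduceCounts(key, values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) {
    total += values[i].count;
  }
  return { count: total };
}

// Hypothetical driver (not a MongoDB API): reduces `values` in chunks of
// `chunkSize`, then reduces the partial results again until one is left.
function reduceInChunks(key, values, reduce, chunkSize) {
  if (chunkSize < 2) throw new Error("chunkSize must be at least 2");
  let current = values;
  while (current.length > 1) {
    const partials = [];
    for (let i = 0; i < current.length; i += chunkSize) {
      partials.push(reduce(key, current.slice(i, i + chunkSize)));
    }
    current = partials; // partial results become the next round's values
  }
  return current[0];
}

const fiveOnes = Array.from({ length: 5 }, () => ({ count: 1 }));
const direct = reduceCounts("cat", fiveOnes);
const chunked = reduceInChunks("cat", fiveOnes, reduceCounts, 2);
console.log(direct.count, chunked.count);
```

With five values and chunks of two, the rounds produce partial counts 2, 2, 1, then 4, 1, then 5, so the chunked result matches the direct one.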

Reduce must be able to be called repeatedly, whether its input comes from the map phase or from a previous reduce phase. So the document returned by reduce must be usable as an element of reduce's second argument.

(When writing the map function, the second argument of emit must have the same form as the elements of reduce's second argument, and the return value of reduce must have the same form as the second argument of emit. The return values of several reduce calls may be assembled into an array and passed as the new second argument to another reduce call.)
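
A small plain-JavaScript sketch of why this consistency matters. badReduce is a deliberately wrong reducer invented for this illustration: it returns a bare number although map emits {count: 1} objects, so its own output cannot be fed back into it.

```javascript
// Correct: returns the same shape that map emits ({count: n}).
function goodReduce(key, values) {
  let total = 0;
  for (const v of values) total += v.count;
  return { count: total };
}

// Deliberately wrong (illustration only): returns a bare number.
function badReduce(key, values) {
  let total = 0;
  for (const v of values) total += v.count;
  return total;
}

const emitted = [{ count: 1 }, { count: 1 }, { count: 1 }];

// Simulate a re-reduce: the first two values are reduced on their own, and
// that partial result is reduced again together with the remaining value.
function reReduce(reduce) {
  const partial = reduce("cat", emitted.slice(0, 2));
  return reduce("cat", [partial, emitted[2]]);
}

const goodResult = reReduce(goodReduce);
const badResult = reReduce(badReduce);
console.log(goodResult); // { count: 3 }
console.log(badResult); // NaN
```

In the second case the partial result 2 has no count field, so the sum becomes NaN as soon as the partial result is reduced again.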

The parameter list of the mapReduce command is as follows:

db.runCommand({
    mapreduce: <collection>,
    map: <mapfunction>,
    reduce: <reducefunction>
    [, query: <query filter object>]
    [, sort: <sort the query. useful for optimization>]
    [, limit: <number of objects to return from collection>]
    [, out: <output-collection name>]
    [, keeptemp: <true|false>]
    [, finalize: <finalizefunction>]
    [, scope: <object where fields go into javascript global scope>]
    [, verbose: true]
});
Or write it like this:

db.collection.mapReduce(
    <map>,
    <reduce>,
    {
        out: <out>,
        query: <query>,
        sort: <sort>,
        limit: <limit>,
        finalize: <finalize>,
        scope: <scope>,
        jsMode: <jsMode>,
        verbose: <verbose>
    }
)

    • mapreduce: the collection that MapReduce is run on
    • map: the map function
    • reduce: the reduce function
    • out: the name of the collection that receives the result; if it is not specified, a collection with a random name is created by default (if you use the out option, you do not have to specify keeptemp: true, because it is already implied)
    • query: a filter condition; only documents that match it are passed to the map function (query, limit, and sort can be combined freely)
    • sort: a sort specification used together with limit (documents are sorted before they are sent to the map function); it can optimize the grouping mechanism
    • limit: the upper limit on the number of documents sent to the map function (without limit, sort on its own is not very useful)
    • keeptemp: true or false, indicating whether the output collection is temporary. To keep the collection after the connection is closed, set keeptemp to true. If you are connected through the mongo shell client, the collection is deleted only after the client exits. If you are running a script, the result collection is deleted automatically when the script exits or calls close
    • finalize: a function that processes each key and value once more after map and reduce have run and returns the final result. It is the last step in the process, so finalize is the right place to compute averages, trim arrays, and strip superfluous information
    • scope: variables to be used in the JavaScript code; the variables defined here are visible in the map, reduce, and finalize functions
    • verbose: verbose output option for debugging; set it to true if you want to see how the MapReduce run proceeds. You can also use print inside map, reduce, and finalize to write information to the server log.
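
How query, limit, and finalize fit around map and reduce can be sketched in plain JavaScript. Everything below is an in-memory imitation written for illustration, not MongoDB code: runMapReduce, the orders data, and its field names are hypothetical, query is a predicate function here (MongoDB takes a filter document), and sort, scope, and out are left out to keep the sketch short.

```javascript
// Hypothetical in-memory imitation of the option handling (not a MongoDB API).
function runMapReduce(docs, map, reduce, options = {}) {
  const { query = () => true, limit = Infinity, finalize } = options;
  // query and limit decide which documents reach the map function.
  const input = docs.filter(query).slice(0, limit);

  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  input.forEach((doc) => map.call(doc, emit));

  const output = [];
  for (const [key, values] of groups) {
    // In this sketch a key with a single value skips reduce.
    let value = values.length > 1 ? reduce(key, values) : values[0];
    // finalize runs once per key, after all reducing is done.
    if (finalize) value = finalize(key, value);
    output.push({ _id: key, value });
  }
  return output;
}

// Hypothetical sample data.
const orders = [
  { category: "book", price: 10, paid: true },
  { category: "book", price: 20, paid: true },
  { category: "toy", price: 8, paid: true },
  { category: "toy", price: 100, paid: false },
];

function mapOrder(emit) {
  emit(this.category, { sum: this.price, n: 1 });
}

function reduceOrder(key, values) {
  const acc = { sum: 0, n: 0 };
  for (const v of values) {
    acc.sum += v.sum;
    acc.n += v.n;
  }
  return acc; // same shape as the emitted value
}

// The average is computed in finalize rather than in reduce, because an
// average of partial averages is wrong when reduce is called repeatedly.
function finalizeOrder(key, value) {
  return { avg: value.sum / value.n, n: value.n };
}

const averages = runMapReduce(orders, mapOrder, reduceOrder, {
  query: (doc) => doc.paid,
  finalize: finalizeOrder,
});
console.log(JSON.stringify(averages));
```

Reduce only carries sums and counts, which stay correct under repeated reduction; the division happens exactly once per key in finalize.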

The document returned by running mapReduce has the following structure:

{
    result: <collection_name>,
    timeMillis: <job_time>,
    counts: {
        input: <number of objects scanned>,
        emit: <number of times emit was called>,
        output: <number of items in output collection>
    },
    ok: <1_if_ok>
    [, err: <errmsg_if_error>]
}

    • result: the name of the collection that stores the result; this is a temporary collection that is deleted automatically when the connection that ran MapReduce is closed
    • timeMillis: time spent on execution, in milliseconds
    • input: the number of documents that matched the condition and were sent to the map function
    • emit: the number of times emit was called in the map function, that is, the total number of values across all keys
    • output: the number of documents in the result collection (the counts are helpful for debugging)
    • ok: whether the run succeeded; 1 on success
    • err: if the run failed, the reason may be given here, but from experience the reason is rather vague and of little use
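
As a small illustration of reading these fields, here is a plain-JavaScript sketch. The summarizeResult helper is hypothetical and not part of MongoDB; it assumes only the document shape listed above.

```javascript
// Hypothetical helper (not part of MongoDB): checks a result document of the
// shape described above and returns a one-line summary of its counts.
function summarizeResult(res) {
  if (res.ok !== 1) {
    throw new Error("mapReduce failed: " + (res.err || "no error message"));
  }
  const { input, emit, output } = res.counts;
  return `${res.result}: ${input} docs in, ${emit} emits, ${output} keys out`;
}

// Counts taken from the tag-counting example earlier in the article;
// timeMillis is omitted because the helper does not use it.
const sample = {
  result: "tmp",
  counts: { input: 4, emit: 6, output: 3 },
  ok: 1,
};
console.log(summarizeResult(sample));
// prints "tmp: 4 docs in, 6 emits, 3 keys out"
```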

Running MapReduce from Java code:

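// Written against the legacy 2.x MongoDB Java driver (com.mongodb.Mongo).
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.MapReduceOutput;
import com.mongodb.Mongo;
import java.net.UnknownHostException;

public void mapReduce() throws UnknownHostException {
    Mongo mongo = new Mongo("localhost", 27017);
    DB db = mongo.getDB("qimiguangdb");
    DBCollection coll = db.getCollection("Collection1");
    // map: emit one {count: 1} per document, keyed by its name field
    String map = "function() { emit(this.name, {count: 1}); }";
    // reduce: sum the counts emitted for one key
    String reduce = "function(key, values) {";
    reduce = reduce + "var total = 0;";
    reduce = reduce + "for (var i = 0; i < values.length; i++) { total += values[i].count; }";
    reduce = reduce + "return {count: total}; }";
    // name of the output collection
    String result = "resultcollection";
    // the last argument is the query filter; null means no filter
    MapReduceOutput mapReduceOutput = coll.mapReduce(map, reduce, result, null);
    DBCollection resultColl = mapReduceOutput.getOutputCollection();
    DBCursor cursor = resultColl.find();
    while (cursor.hasNext()) {
        System.out.println(cursor.next());
    }
}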
