A Brief Introduction to MapReduce in MongoDB

Source: Internet
Author: User

MongoDB MapReduce

MapReduce is a computational model that, put simply, decomposes a large amount of work (data) into pieces (map) and then merges the partial results into the final result (reduce). The advantage is that once the task is decomposed, the pieces can be computed in parallel on many machines, which reduces the total running time.

That is the theory behind MapReduce. The rest of this article covers practical use, taking MongoDB's MapReduce as the example.

Here is an example from the official MongoDB documentation:


> db.things.insert({_id: 1, tags: ['dog', 'cat']});
> db.things.insert({_id: 2, tags: ['cat']});
> db.things.insert({_id: 3, tags: ['mouse', 'cat', 'dog']});
> db.things.insert({_id: 4, tags: []});

> // Map function
> map = function() {
...     this.tags.forEach(
...         function(z) {
...             emit(z, {count: 1});
...         }
...     );
... };

> // Reduce function
> reduce = function(key, values) {
...     var total = 0;
...     for (var i = 0; i < values.length; i++)
...         total += values[i].count;
...     return {count: total};
... };

> db.things.mapReduce(map, reduce, {out: 'tmp'})
{
    "result" : "tmp",
    "timeMillis" : 316,
    "counts" : {
        "input" : 4,
        "emit" : 6,
        "output" : 3
    },
    "ok" : 1,
}
> db.tmp.find()
{ "_id" : "cat", "value" : { "count" : 3 } }
{ "_id" : "dog", "value" : { "count" : 2 } }
{ "_id" : "mouse", "value" : { "count" : 1 } }

The example is simple: it counts how many times each tag appears in a tagging system.

In this example, everything except the emit function is standard JavaScript syntax. The emit function is very important. You can think of it this way: once the map function has run on every document that takes part in the computation (MapReduce can filter documents first, as discussed later), the emitted data forms key/values pairs. The key is the first argument passed to emit, and values is an array of the second arguments from every emit call that used that key. Each key/values pair is then passed to reduce as its first and second parameters.
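The grouping described above can be imitated in plain JavaScript. This is only a sketch of the idea, not how the server is implemented: the `grouped` map and the local `emit` function are stand-ins written for this illustration, and the documents are the four from the example.

```javascript
// Plain-JavaScript imitation of the map phase; no MongoDB server is involved.
const docs = [
  { _id: 1, tags: ['dog', 'cat'] },
  { _id: 2, tags: ['cat'] },
  { _id: 3, tags: ['mouse', 'cat', 'dog'] },
  { _id: 4, tags: [] },
];

// Hypothetical stand-in for the server-side grouping: key -> array of emitted values.
const grouped = new Map();

// Stand-in for MongoDB's emit(key, value): append the value to the key's array.
function emit(key, value) {
  if (!grouped.has(key)) grouped.set(key, []);
  grouped.get(key).push(value);
}

// Same map function as in the example.
const map = function () {
  this.tags.forEach(function (z) {
    emit(z, { count: 1 });
  });
};

// map is called once per input document, with `this` bound to that document.
docs.forEach((doc) => map.call(doc));

// Each entry is one key/values pair that reduce would receive.
for (const [key, values] of grouped) {
  console.log(key, JSON.stringify(values));
}
```

Document 4 has an empty tags array, so it counts as an input document but contributes no emit calls, which matches the counts of 4 inputs and 6 emits in the shell output.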

The task of the reduce function is to turn key/values into key/value, that is, to collapse the values array into a single value. When the values array for a key is too large, it is split into several smaller key/values chunks, reduce runs on each chunk, and the chunk results are combined into a new array that is passed to reduce again as its second argument. If the initial values array is very large, the results of that first round may themselves need to be reduced again, much like a multi-pass merge sort. How many rounds are needed depends on the amount of data.

The reduce function must therefore be safe to call repeatedly, whether its input comes from the map phase or from a previous reduce. So the document that reduce returns must be usable as an element of reduce's second parameter.

(In other words, when you write the map function, the second argument of emit determines the shape of the elements in reduce's second argument, and the return value of reduce must have the same shape as the second argument of emit. The return values of several reduce calls may be collected into an array and passed to reduce again as its new second argument.)
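A quick way to check this requirement is to reduce the same values in one pass and in chunks, then compare. The sketch below runs in plain JavaScript without a server; the five sample values and the `badReduce` function are made up for the illustration.

```javascript
// Same reduce function as in the example.
const reduce = function (key, values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) total += values[i].count;
  return { count: total };
};

// Five emitted values for one key (sample data for this illustration).
const values = [{ count: 1 }, { count: 1 }, { count: 1 }, { count: 1 }, { count: 1 }];

// One pass over all values.
const direct = reduce('cat', values);

// Two partial passes, then a second reduce over the partial results.
const partA = reduce('cat', values.slice(0, 2));
const partB = reduce('cat', values.slice(2));
const rereduced = reduce('cat', [partA, partB]);

console.log(direct.count, rereduced.count); // 5 5

// A reducer whose return value has a different shape from the emitted values
// gives a wrong answer when it is applied to its own output.
const badReduce = function (key, values) {
  return values.length;
};
const badRereduced = badReduce('cat', [
  badReduce('cat', values.slice(0, 2)),
  badReduce('cat', values.slice(2)),
]);

console.log(badRereduced); // 2, not 5
```

The correct reducer works in both cases because it returns `{count: ...}`, the same shape that emit produced.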

The argument list for the MapReduce function is as follows:


db.runCommand(
    { mapreduce : <collection>,
      map : <mapfunction>,
      reduce : <reducefunction>
      [, query : <query filter object>]
      [, sort : <sort the query. useful for optimization>]
      [, limit : <number of objects to return from collection>]
      [, out : <output-collection name>]
      [, keeptemp : <true|false>]
      [, finalize : <finalizefunction>]
      [, scope : <object where fields go into javascript global scope>]
      [, verbose : true]
    }
);

It can also be written like this:

db.collection.mapReduce(
    <map>,
    <reduce>,
    {
        <out>,
        <query>,
        <sort>,
        <limit>,
        <keeptemp>,
        <finalize>,
        <scope>,
        <jsMode>,
        <verbose>
    }
)

1. mapreduce: the collection to run MapReduce on.
2. map: the map function.
3. reduce: the reduce function.
4. out: the name of the output collection. If it is not specified, a collection with a random name is created by default. (If you use the out option, you do not need to specify keeptemp: true, because it is already implied.)
5. query: a filter condition; the map function is called only on documents that satisfy it. (query, limit, and sort can be combined freely.)
6. sort: a sort specification, typically combined with limit. The documents are sorted before they are sent to the map function, which can optimize the grouping.
7. limit: the maximum number of documents sent to the map function. (Without limit, using sort alone is of little use.)
8. keeptemp: true or false, indicating whether the result collection is temporary. To keep the collection after the connection is closed, set keeptemp to true. If you are connected with the mongo shell client, the collection is deleted only when you exit. If the job is run from a script, the result collection is deleted automatically when the script exits or calls close.
9. finalize: a function that is applied to each key and value after map and reduce and returns the final result. It is the last step of the process, so finalize is a good place to compute averages, trim arrays, and strip out extra information.
10. scope: variables to be used in the JavaScript code. Variables defined here are visible in the map, reduce, and finalize functions.
11. verbose: a detailed-output option for debugging. Set it to true if you want to see how the MapReduce job runs. You can also print information from the map, reduce, and finalize functions to the server log.
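As a sketch of how several of these options fit together, the snippet below builds an options object for the tag-counting job. The output collection name `tag_counts`, the `share` field, and the `totalTags` scope variable are all invented for this illustration. The snippet itself is plain JavaScript, and the actual shell call is shown only as a comment.

```javascript
// Same map and reduce functions as in the tag-counting example.
const map = function () {
  this.tags.forEach(function (z) {
    emit(z, { count: 1 });
  });
};

const reduce = function (key, values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) total += values[i].count;
  return { count: total };
};

// finalize runs once per key after the last reduce. Here it adds each tag's
// share of all tags, reading `totalTags` from the scope option.
const finalize = function (key, value) {
  value.share = value.count / totalTags;
  return value;
};

const options = {
  out: 'tag_counts',                  // hypothetical output collection name
  query: { tags: { $exists: true } }, // only documents with a tags field reach map
  finalize: finalize,
  scope: { totalTags: 6 },            // assumed total, visible inside map/reduce/finalize
};

// In the mongo shell, the job would then be started with:
//   db.things.mapReduce(map, reduce, options);

// Local check outside MongoDB: expose the scope variables as globals,
// which imitates what the scope option does on the server.
Object.assign(globalThis, options.scope);
console.log(finalize('cat', { count: 3 })); // { count: 3, share: 0.5 }
```

Because finalize runs only once per key, after all reducing is done, its return value does not have to match the shape of the emitted values.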

The document structure returned by performing the MapReduce function is as follows:


{ result : <collection_name>,
  timeMillis : <job_time>,
  counts : {
      input : <number of objects scanned>,
      emit : <number of times emit was called>,
      output : <number of items in output collection>
  },
  ok : <1_if_ok>
  [, err : <errmsg_if_error>]
}

1. result: the name of the collection that stores the results. It is a temporary collection and is deleted automatically after the MapReduce connection is closed.
2. timeMillis: the time spent on execution, in milliseconds.
3. input: the number of documents that met the conditions and were sent to the map function.
4. emit: the number of times emit was called in the map function, that is, the total number of values across all keys.
5. output: the number of documents in the result collection. (The counts are very helpful for debugging.)
6. ok: whether the job succeeded; 1 means success.
7. err: if the job failed, the reason for the failure may be given here, but in practice the message tends to be vague and not very helpful.
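A caller will typically check ok before reading the counts. The helper below is a minimal sketch of that check. `summarize` is a name made up for this illustration, and the sample document is copied from the shell output shown earlier.

```javascript
// Sample result document, copied from the run shown earlier.
const res = {
  result: 'tmp',
  timeMillis: 316,
  counts: { input: 4, emit: 6, output: 3 },
  ok: 1,
};

// Hypothetical helper: fail loudly when ok is not 1, otherwise describe the run.
function summarize(r) {
  if (r.ok !== 1) {
    throw new Error('mapReduce failed: ' + (r.err || 'no error message'));
  }
  const { input, emit, output } = r.counts;
  return `${input} documents in, ${emit} emits, ${output} keys out in ` +
    `${r.timeMillis} ms (collection "${r.result}")`;
}

console.log(summarize(res));
```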

Running MapReduce from Java code:

// Uses the legacy MongoDB Java driver classes from the com.mongodb package
// (Mongo, DB, DBCollection, DBCursor, MapReduceOutput).
public void mapReduce() throws Exception {
    Mongo mongo = new Mongo("localhost", 27017);
    DB db = mongo.getDB("Qimiguangdb");
    DBCollection coll = db.getCollection("Collection1");

    // Emit one {count: 1} per document, keyed by its name field.
    String map = "function() { emit(this.name, {count: 1}); }";

    // Sum the counts for each key.
    String reduce = "function(key, values) {";
    reduce = reduce + "var total = 0;";
    reduce = reduce + "for (var i = 0; i < values.length; i++) { total += values[i].count; }";
    reduce = reduce + "return {count: total}; }";

    // Name of the output collection.
    String result = "Resultcollection";

    // The last argument is the query filter; null means all documents.
    MapReduceOutput mapReduceOutput = coll.mapReduce(map, reduce, result, null);
    DBCollection resultColl = mapReduceOutput.getOutputCollection();
    DBCursor cursor = resultColl.find();
    while (cursor.hasNext()) {
        System.out.println(cursor.next());
    }
}
