MongoDB mapReduce example

Source: Internet
Author: User
Tags: emit

The example below was tested on a small data set. I also tried running it against tens of millions of records on a single machine, but after a long time the test still had not finished...

 

The data is in one collection: crawler.videos. Its documents have the fields _id, playurl, siteid, ...

Only _id is indexed. The siteid field identifies the website a video came from and takes the values 1, 2, and 3.

The goal is to count the number of _id values for each website in the database. This is equivalent to the MySQL statement select count(_id), siteid from videos group by siteid.

In other words, group by siteid and count the IDs in each group. In MongoDB you can do this with either the group() function or mapReduce. Both methods are recorded here:

 

Using the group() function

Syntax:
db.coll.group(
{key: {fieldtogroup: true},
cond: {condition_where},
reduce: function (obj, prev) {logical_sentence;},
initial: {initial_values}
});

For the requirement above, you can write:
use crawler;
db.videos.group(
{key: {siteid: true},
reduce: function (obj, prev) {prev.count++;},
initial: {count: 0}
});

Note: the cond field can be omitted because the requirement has no filter condition. The reduce function can also be defined separately, for example:
r = function (obj, prev) {
prev.count++;
};

db.videos.group(
{key: {siteid: true},
reduce: r,
initial: {count: 0}
});

The functions passed to MongoDB are written in ordinary JavaScript syntax.
Note that the reduce function must take two parameters (obj, prev): obj is the document currently being traversed,
and prev is the aggregation counter object for that document's group.

The result of the preceding command is:
[
{
"siteid": 1,
"count": 188603
},
{
"siteid": 3,
"count": 2198
},
{
"siteid": 2,
"count": 210
}
]
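
To make the roles of key, reduce, and initial more concrete, here is a minimal plain-JavaScript sketch that imitates the grouping in memory. It does not talk to MongoDB and is not how the server implements group(); the helper name groupSketch and the five sample documents are hypothetical, and the cond option is left out for brevity.

```javascript
// Hypothetical in-memory imitation of group(); NOT the MongoDB implementation.
function groupSketch(docs, spec) {
  var buckets = {};                       // one aggregation object per distinct key
  var order = [];                         // remembers the first-seen order of keys
  var keyFields = Object.keys(spec.key);  // e.g. ["siteid"]

  docs.forEach(function (obj) {
    // Build the group key from the fields listed in spec.key.
    var keyObj = {};
    keyFields.forEach(function (f) { keyObj[f] = obj[f]; });
    var id = JSON.stringify(keyObj);

    if (!(id in buckets)) {
      // Each group starts from the key fields plus a copy of initial.
      buckets[id] = Object.assign({}, keyObj, spec.initial);
      order.push(id);
    }
    // reduce receives the current document and mutates prev in place.
    spec.reduce(obj, buckets[id]);
  });

  return order.map(function (id) { return buckets[id]; });
}

// Hypothetical sample data standing in for crawler.videos.
var sampleVideos = [
  {_id: 1, siteid: 1}, {_id: 2, siteid: 1}, {_id: 3, siteid: 3},
  {_id: 4, siteid: 2}, {_id: 5, siteid: 1}
];

var counts = groupSketch(sampleVideos, {
  key: {siteid: true},
  reduce: function (obj, prev) { prev.count++; },
  initial: {count: 0}
});

console.log(JSON.stringify(counts));
// prints [{"siteid":1,"count":3},{"siteid":3,"count":1},{"siteid":2,"count":1}]
```

The point of the sketch is that reduce never returns anything: it only updates prev, which begins as a copy of initial for every new key.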

mapReduce:
In the mongo terminal, enter:
m = function () {
emit(this.siteid, 1);
};

r = function (key, values) {
var total = 0;
for (var i = 0; i < values.length; i++) {
total += values[i];
}
return total;
};

res = db.videos.mapReduce(m, r);

db[res.result].find(); // display the result
The functions above compute the total number of records for each siteid. The result is as follows:
{"_id": 1, "value": 188603}
{"_id": 2, "value": 210}
{"_id": 3, "value": 2198}

The statement res = db.videos.mapReduce(m, r); can also be written as follows:
res = db.runCommand({
mapreduce: "videos",
map: m,
reduce: r});
If you forget to assign the db.runCommand() result to res, you can find the collection name in the output printed after the command runs.
"result": "tmp.mr.mapreduce_1294253572_8" indicates that the result is saved in
the tmp.mr.mapreduce_1294253572_8 collection. To display it, type directly in the terminal:
db.tmp.mr.mapreduce_1294253572_8.find()

Use db[res.result].drop() to delete the final result.
In the map function, emit() passes results on to the reduce function. In emit(key, value), key is the group key,
and the type of value must match the type of the elements of values in the reduce function. The values emitted by map for the same key are merged
into an intermediate result (key, values), which is then passed to the reduce function.
The reduce function processes the intermediate results and saves the final results to the result collection.
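
The flow just described (map emits pairs, pairs with the same key are merged, reduce folds each list of values) can be sketched in plain JavaScript without a server. This is only an illustration under stated assumptions: the intermediate Map, the local emit() function, and the sample documents are hypothetical stand-ins for what MongoDB does internally. The last lines also show why the value types must match: reduce may be applied again to its own earlier output, so it has to return the same kind of value that map emits.

```javascript
// Hypothetical in-memory imitation of map -> merge by key -> reduce.
var intermediate = new Map();   // key -> array of emitted values

// Stand-in for the emit() that MongoDB provides to the map function.
function emit(key, value) {
  if (!intermediate.has(key)) intermediate.set(key, []);
  intermediate.get(key).push(value);
}

// Same map and reduce functions as in the article.
var m = function () {
  emit(this.siteid, 1);
};
var r = function (key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
};

// Hypothetical sample data standing in for crawler.videos.
var docs = [
  {_id: 1, siteid: 1}, {_id: 2, siteid: 1}, {_id: 3, siteid: 3},
  {_id: 4, siteid: 2}, {_id: 5, siteid: 1}
];

// Map phase: call m with each document bound to `this`.
docs.forEach(function (doc) { m.call(doc); });

// Reduce phase: one call per key over the merged values.
var results = [];
intermediate.forEach(function (values, key) {
  results.push({_id: key, value: r(key, values)});
});
results.sort(function (a, b) { return a._id - b._id; });

console.log(JSON.stringify(results));
// prints [{"_id":1,"value":3},{"_id":2,"value":1},{"_id":3,"value":1}]

// Re-reduce: feeding partial results back into r gives the same total,
// which only works because r returns the same type that m emits.
var reReduced = r(1, [r(1, [1, 1]), r(1, [1])]);
console.log(reReduced); // prints 3
```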

You can add conditions to mapReduce, for example db.videos.mapReduce(m, r[, option...]), or use the standard syntax from the official documentation:
db.runCommand(
{mapreduce: <collection>,
map: <mapfunction>,
reduce: <reducefunction>
[, query: <query filter object>]
[, sort: <sort the query. Useful for optimization>]
[, limit: <number of objects to return from collection>]
[, out: <output-collection name>]
[, outType: ("normal" | "merge" | "reduce")] -- since 1.7.3
[, keeptemp: <true | false>]
[, finalize: <finalizefunction>]
[, scope: <object where fields go into JavaScript global scope>]
[, verbose: true]
}
);
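
As a rough illustration of how some of these options fit together, the sketch below applies query, sort, and limit to the input before the map phase and finalize to each reduced value afterwards. It is plain JavaScript rather than a real MongoDB call, and several things in it are assumptions made for the example: the helper name mapReduceWithOptions, the sample documents, modeling query as a predicate function and sort as a comparator (in MongoDB both are documents), and passing emit to map as a parameter (in MongoDB it is available globally).

```javascript
// Hypothetical sketch of the order in which the options take effect.
function mapReduceWithOptions(docs, opts) {
  var input = docs.slice();
  if (opts.query) input = input.filter(opts.query);     // 1. filter the input
  if (opts.sort) input.sort(opts.sort);                 // 2. sort what is left
  if (opts.limit) input = input.slice(0, opts.limit);   // 3. cap the input size

  var groups = new Map();                               // key -> emitted values
  function emit(key, value) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  input.forEach(function (doc) { opts.map.call(doc, emit); });  // 4. map

  var out = [];
  groups.forEach(function (values, key) {
    var value = opts.reduce(key, values);                        // 5. reduce
    if (opts.finalize) value = opts.finalize(key, value);        // 6. finalize
    out.push({_id: key, value: value});
  });
  return out;
}

// Hypothetical sample data standing in for crawler.videos.
var sample = [
  {_id: 1, siteid: 1}, {_id: 2, siteid: 1}, {_id: 3, siteid: 3},
  {_id: 4, siteid: 2}, {_id: 5, siteid: 1}
];

var filtered = mapReduceWithOptions(sample, {
  map: function (emit) { emit(this.siteid, 1); },
  reduce: function (key, values) {
    return values.reduce(function (a, b) { return a + b; }, 0);
  },
  // Stand-in for a query document that excludes siteid 2.
  query: function (doc) { return doc.siteid !== 2; },
  // finalize reshapes each reduced value after reduce has run.
  finalize: function (key, value) { return {site: key, videos: value}; }
});

console.log(JSON.stringify(filtered));
// prints [{"_id":1,"value":{"site":1,"videos":3}},{"_id":3,"value":{"site":3,"videos":1}}]
```

Filtering with query before the map phase means fewer documents are emitted and reduced, which matters on large collections like the one mentioned at the start.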

