MongoDB mapReduce usage summary



This article is from my blog: MongoDB mapReduce usage summary

As we all know, MongoDB is a non-relational database. That is, each collection (table) in a MongoDB database exists independently, and there are no dependencies between collections. Besides the usual CRUD statements, MongoDB also provides aggregation and mapReduce for statistics. This article focuses on MongoDB's mapReduce operations.

I will not repeat the concept of mapReduce here; please look it up yourself.

In MongoDB, the mapReduce syntax is as follows:

    db.table.mapReduce(
        map,
        reduce,
        {
            query: query,
            out: out,           // specifies how the result set is stored. Options:
                                //   replace: if the output collection (table) exists, replace it
                                //   merge: if a record with the same key already exists, overwrite it
                                //   reduce: if a record with the same key already exists, run reduce on
                                //           the old and new records, then overwrite the old record
                                //   {inline: 1}: keep the results in memory and do not write them to disk
                                //           (for calculations over a small amount of data)
            sort: sort,
            limit: limit,
            finalize: function (key, values) {
                // mainly used to modify the data before it is stored in out
                // return modifiedValues;
            },
            scope: document,    // global variables that map, reduce and finalize can access
            jsMode: boolean,    // whether intermediate data stays as JavaScript objects between map and
                                // reduce instead of being converted to BSON. The default value is false.
                                // If you set it to true, remember the official note:
                                // You can only use jsMode for result sets with fewer than
                                // 500,000 distinct key arguments to the mapper's emit() function.
            verbose: boolean    // whether to include timing information in the result; the original
                                // notes it as included by default, which may differ between versions
        }
    )
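To make the three collection-based out modes more concrete, here is a minimal sketch that imitates them in plain JavaScript. It does not connect to MongoDB: the function applyOut, the Map objects and the sample keys are all hypothetical stand-ins for the output collection and a fresh batch of mapReduce results.

```javascript
// Simulates how the `replace`, `merge` and `reduce` out modes treat an
// output collection that already contains data. Plain JavaScript only;
// `applyOut` is a hypothetical helper, not a MongoDB API.
function applyOut(mode, existing, results, reduce) {
  if (mode === 'replace') {
    // Old contents are discarded entirely.
    return new Map(results);
  }
  const output = new Map(existing);
  for (const [key, value] of results) {
    if (mode === 'reduce' && output.has(key)) {
      // Combine the old and the new value with the reduce function.
      output.set(key, reduce(key, [output.get(key), value]));
    } else {
      // 'merge' (or a key not seen before): the new value wins.
      output.set(key, value);
    }
  }
  return output;
}

const sumReduce = (key, values) => values.reduce((a, b) => a + b, 0);
const existing = new Map([['a', 1], ['b', 2]]);
const results = new Map([['b', 10], ['c', 20]]);

const replaced = applyOut('replace', existing, results, sumReduce);
const mergedOut = applyOut('merge', existing, results, sumReduce);
const reducedOut = applyOut('reduce', existing, results, sumReduce);

console.log([...replaced]);
console.log([...mergedOut]);
console.log([...reducedOut]);
```

Only the key 'b' exists on both sides, so it is the one whose value differs between the merge and reduce modes.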

Make sure your query can use an index when you run mapReduce in MongoDB. Otherwise, with large data volumes, the statistics job can drag down the whole database. If you cannot create a suitable index, drop the query and instead filter the data that does not meet the conditions out of the result set.

The mapReduce syntax is actually very simple, but there are several points to note:

1. While map is running, MongoDB runs reduce about once for every 1000 records emitted, so reduce can be called several times for the same key.

2. In map, if you want to calculate a sum, you need to write the emit as follows:

    emit(this.key, {sum: 0});

Then, in reduce, add up the sum values carried over from earlier reduce passes and return {sum: sum}. If you do not, the result only ever reflects the last batch of fewer than 1000 records, and the earlier counts are lost.

3. Avoid mapReduce whenever you can. If the statistics can be computed in your application code, do not run frequent statistics in MongoDB.

4. The data format of the mapReduce result set is {_id: key, value: {...}}. Therefore, if you want to use this collection directly, you had better reorganize the data format afterwards and move the fields to the top level instead of querying through value.xxx.
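Points 1 and 2 can be illustrated with a small simulation in plain JavaScript. This is only a sketch under stated assumptions: runMapReduce is a hypothetical helper, a batch size of 4 stands in for the batching described in point 1, emit is passed to map as an argument (in the mongo shell it is a global), and the map function emits 1 per document, a common counting pattern that differs from the sum: 0 style used in this article. It also shows that a key with a single emitted value never goes through reduce, which MongoDB documents as normal behavior.

```javascript
// Simulates a map phase whose output is reduced in batches, so reduce
// receives its own earlier output mixed with fresh map output.
// `runMapReduce` is a hypothetical helper, not a MongoDB API.
function runMapReduce(docs, map, reduce, batchSize) {
  const pending = new Map(); // key -> values waiting to be reduced
  const emit = (key, value) => {
    if (!pending.has(key)) pending.set(key, []);
    const values = pending.get(key);
    values.push(value);
    // When a batch fills up, reduce it and keep only the reduced value.
    if (values.length >= batchSize) {
      pending.set(key, [reduce(key, values)]);
    }
  };
  docs.forEach((doc) => map.call(doc, emit));

  const out = new Map();
  for (const [key, values] of pending) {
    // A key with a single value is stored as-is; reduce is not called.
    out.set(key, values.length > 1 ? reduce(key, values) : values[0]);
  }
  return out;
}

// Emits a count of 1 per document, keyed by user.
const countMap = function (emit) {
  emit(this.user, { sum: 1 });
};

// Accumulates the partial sums carried over from earlier reduce passes.
const accumulatingReduce = (key, values) => ({
  sum: values.reduce((acc, v) => acc + v.sum, 0),
});

// Ignores earlier partial sums and only counts the current batch.
const lossyReduce = (key, values) => ({ sum: values.length });

// Ten documents for u1 and a single document for u2 (sample data).
const docs = Array.from({ length: 10 }, () => ({ user: 'u1' }));
docs.push({ user: 'u2' });

const good = runMapReduce(docs, countMap, accumulatingReduce, 4);
const lossy = runMapReduce(docs, countMap, lossyReduce, 4);

console.log('accumulating:', good.get('u1').sum); // 10
console.log('lossy:', lossy.get('u1').sum);       // 4
```

The lossy version ends at 4 because each reduce pass forgets the totals produced by the previous pass, which is the data loss point 2 warns about.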

The following is a mapReduce script that counts the content each user has published on our website. It is for reference only, as an example of the code format:

    var db = connect('127.0.0.1:27017/test');
    db.aAccounttemp.drop();

    var map = function () {
        emit(this.accountId, {
            sum: 0,
            reblogFlag: this.reblogFlag,
            dashboardFlag: this.dashboardFlag,
            dashboardType: this.dashboardType,
            photoNum: 0,
            postNum: 0,
            reblogNum: 0,
            videoNum: 0,
            videoShortNum: 0,
            musicNum: 0,
            questionNum: 0,
            appNum: 0,
            dialogNum: 0
        });
    };

    var reduce = function (key, values) {
        var sum = 0, photoNum = 0, postNum = 0, reblogNum = 0, videoNum = 0;
        var videoShortNum = 0, musicNum = 0, questionNum = 0, appNum = 0, dialogNum = 0;

        for (var i = 0; i < values.length; i++) {
            var data = values[i];
            var reblogFlag = data.reblogFlag;
            var dashboardFlag = data.dashboardFlag;
            var dashboardType = data.dashboardType;

            // carry over the partial totals from earlier reduce passes
            sum += data.sum;
            photoNum += data.photoNum;
            reblogNum += data.reblogNum;
            postNum += data.postNum;
            videoNum += data.videoNum;
            musicNum += data.musicNum;
            videoShortNum += data.videoShortNum;
            questionNum += data.questionNum;
            appNum += data.appNum;
            dialogNum += data.dialogNum;

            // count the current record by type
            if (!reblogFlag) {
                if (dashboardFlag) {
                    sum += 1;
                    if (dashboardType == 10) {
                        postNum += 1;
                    } else if (dashboardType == 20) {
                        photoNum += 1;
                    } else if (dashboardType == 30) {
                        videoNum += 1;
                    } else if (dashboardType == 31) {
                        videoShortNum += 1;
                    } else if (dashboardType == 40) {
                        musicNum += 1;
                    } else if (dashboardType == 60) {
                        questionNum += 1;
                    } else if (dashboardType == 100) {
                        appNum += 1;
                    } else if (dashboardType == 91) {
                        dialogNum += 1;
                    }
                } else {
                    if (dashboardType == 20) {
                        photoNum += 1;
                    }
                }
            } else if (reblogFlag && dashboardFlag) {
                reblogNum += 1;
            }
        }

        return {
            sum: NumberInt(sum),
            reblogNum: NumberInt(reblogNum),
            postNum: NumberInt(postNum),
            photoNum: NumberInt(photoNum),
            videoNum: NumberInt(videoNum),
            videoShortNum: NumberInt(videoShortNum),
            musicNum: NumberInt(musicNum),
            questionNum: NumberInt(questionNum),
            appNum: NumberInt(appNum),
            dialogNum: NumberInt(dialogNum)
        };
    };

    db.getMongo().setSlaveOk();
    db.dashboard_basic.mapReduce(map, reduce, {out: {merge: 'aAccounttemp'}});

    var results = db.aAccounttemp.find();
    // reshape the data format and save it to the regular collection
    while (results.hasNext()) {
        var obj = results.next();
        var value = obj.value;
        // the target collection name and the insert call are garbled in the
        // source text; 'sums' and insert() are a reconstruction
        db.sums.insert({
            accountId: obj._id,
            sum: NumberInt(value.sum),
            reblogNum: NumberInt(value.reblogNum),
            postNum: NumberInt(value.postNum),
            photoNum: NumberInt(value.photoNum),
            videoShortNum: NumberInt(value.videoShortNum),
            videoNum: NumberInt(value.videoNum),
            musicNum: NumberInt(value.musicNum),
            questionNum: NumberInt(value.questionNum),
            appNum: NumberInt(value.appNum),
            dialogNum: NumberInt(value.dialogNum)
        });
    }
    print('success insert total ' + results.count() + ' datas');
    db.aAccounttemp.drop();
    quit();
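The reshaping step at the end of the script, which turns {_id, value: {...}} records into flat records, can be sketched on its own in plain JavaScript. The helper flattenResult and the sample records below are hypothetical and only show the shape of the transformation; they are not taken from the original data.

```javascript
// Flattens a mapReduce-style result document of the form
// {_id, value: {...}} into a document with top-level fields.
// `flattenResult` is a hypothetical helper for illustration.
function flattenResult(doc, idField) {
  // The key moves to `idField`; the fields inside `value` move to the top level.
  return Object.assign({ [idField]: doc._id }, doc.value);
}

// Sample records in the mapReduce output shape (made-up values).
const rawResults = [
  { _id: 'acc1', value: { sum: 3, photoNum: 1, postNum: 2 } },
  { _id: 'acc2', value: { sum: 1, photoNum: 0, postNum: 1 } },
];

const flat = rawResults.map((doc) => flattenResult(doc, 'accountId'));
console.log(flat);
```

With the fields at the top level, a later query can filter on sum or photoNum directly instead of value.sum or value.photoNum.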



Can anyone explain MongoDB's mapReduce?

Map: it can be understood as choosing the data to feed in. In SQL terms, it is roughly like the rows selected by the WHERE condition, together with the key they are grouped by;
Reduce: it can be understood as computing the fields to be displayed for each key;

MapReduce is hard for beginners to understand, so we recommend starting with the simpler group method;

In addition, MapReduce performance is very low. Unless you are running background statistics, do not use MapReduce, and do not use it as the data access method for front-end queries.

What is the best way to handle multi-collection joins in MongoDB?

Our game logs include user registrations and user logins. I use mapReduce to collect the registration information into a collection called user_register, and to deduplicate the login information into another collection called user_login. Now I need to join the two collections on the user name to compute some statistics. I searched through a lot of material but did not find a good MongoDB solution for this. I also thought about using mapReduce, but from my experience with it so far, mapReduce seems to process only one collection at a time and cannot process two collections together. The one solution I have come up with is to read all the data from both collections and then process it in application code. This solves the problem for now, but it is certainly not the best approach. So I am taking the liberty of asking whether you can suggest a more reasonable method. Thank you!
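The application-side approach the asker describes, reading both collections and joining them in code, can be sketched as follows. This is only an illustration in plain JavaScript: joinByUsername, the field names registeredAt and lastLogin, and the sample records are assumptions, since the question does not show the actual document layout.

```javascript
// Joins two arrays of documents on a shared `username` field, using a Map
// as a lookup index (a hash join done in application code).
// Field names other than `username` are hypothetical.
function joinByUsername(registers, logins) {
  const loginsByUser = new Map();
  for (const login of logins) {
    loginsByUser.set(login.username, login);
  }
  return registers.map((reg) => ({
    username: reg.username,
    registeredAt: reg.registeredAt,
    // null when the user registered but has no login record
    lastLogin: loginsByUser.has(reg.username)
      ? loginsByUser.get(reg.username).lastLogin
      : null,
  }));
}

// Sample data standing in for user_register and user_login.
const userRegister = [
  { username: 'alice', registeredAt: '2014-01-01' },
  { username: 'bob', registeredAt: '2014-01-02' },
];
const userLogin = [{ username: 'alice', lastLogin: '2014-02-01' }];

const joined = joinByUsername(userRegister, userLogin);
console.log(joined);
```

Building the Map first keeps the join to one pass over each collection, instead of scanning all logins again for every registration.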
