How to speed up Map-Reduce in MongoDB

Source: Internet
Author: User
Tags: foreach, emit, mongodb, sort

First, let's talk about the principle of Map-Reduce.

For the basic principles of Map-Reduce, see the following figure:



The official figure above shows the whole data flow. First, a query selects the documents to process; next, map is applied to each selected document; finally, reduce is applied to the values that map emitted.
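The three stages can be sketched in plain JavaScript. This is a toy in-memory model, not MongoDB's implementation; the `mapReduce` helper, the `orders` data, and its field names are made up for illustration.

```javascript
// Toy in-memory model of the query -> map -> reduce flow (not MongoDB's implementation).
function mapReduce(docs, query, map, reduce) {
  const groups = new Map(); // key -> array of emitted values

  // emit() collects each value under its key, as the map phase does.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };

  // 1. Query: keep only the documents to process.
  // 2. Map: call map once per selected document.
  docs.filter(query).forEach((doc) => map(doc, emit));

  // 3. Reduce: collapse each key's values into a single value.
  //    A key with a single value is passed through without calling reduce.
  const out = [];
  for (const [key, values] of groups) {
    out.push({ _id: key, value: values.length === 1 ? values[0] : reduce(key, values) });
  }
  return out;
}

// Hypothetical sample data.
const orders = [
  { customer: "c1", amount: 5, paid: true },
  { customer: "c1", amount: 7, paid: true },
  { customer: "c2", amount: 3, paid: false },
  { customer: "c2", amount: 4, paid: true },
];

const totals = mapReduce(
  orders,
  (doc) => doc.paid,                                   // query
  (doc, emit) => emit(doc.customer, doc.amount),       // map
  (key, values) => values.reduce((a, b) => a + b, 0)   // reduce
);
console.log(totals); // [ { _id: 'c1', value: 12 }, { _id: 'c2', value: 4 } ]
```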

With that overview in place, let's walk through an example to get familiar with the whole process:

The basic data format is:

/* 0 */
{
"code": "A",
"uid": "id_1",
"count": 1
}
/* 1 */
{
"code": "A",
"uid": "id_1",
"count": 1
}
/* 2 */
{
"code": "B",
"uid": "id_1",
"count": 1
}
/* 3 */
{
"code": "B",
"uid": "id_2",
"count": 2
}

Goal: for each uid, compute the sum of count and record which code values contributed to that sum.

The map and reduce functions are quick to write:

var map = function () {
    // Emit one value per document, keyed by uid.
    emit(this.uid, {"code": this.code || "", count: this.count || 1});
};

var reduce = function (key, values) {
    var result = {code: {}, count: 0};
    values.forEach(function (val) {
        result.code[val.code] = 1;    // record each code seen for this uid
        result.count += val.count;    // sum the counts
    });
    return result;
};


Result:

/* 0 */
{
"_id": "id_1",
"value": {
"code": {
"A": 1,
"B": 1
},
"count": 3
}
}
/* 1 */
{
"_id": "id_2",
"value": {
"code": "B",
"count": 2
}
}
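Note that the value for id_2 keeps code as the plain string "B" rather than an object. MongoDB does not call reduce for a key that has only one emitted value, so the map output is returned unchanged. MongoDB may also call reduce more than once for the same key, feeding an earlier reduce result back in as one of the values. The sketch below is a variant, not the original author's code, in which map emits the same shape that reduce returns, so both cases give consistent output.

```javascript
// Variant (an assumption, not the original code): map emits the same shape
// that reduce returns, so single-value keys and re-reduced values agree.
var map = function () {
  var codes = {};
  codes[this.code || ""] = 1;
  emit(this.uid, { code: codes, count: this.count || 1 });
};

var reduce = function (key, values) {
  var result = { code: {}, count: 0 };
  values.forEach(function (val) {
    // Merge the code sets, whether val came from map or from an earlier reduce.
    Object.keys(val.code).forEach(function (c) {
      result.code[c] = 1;
    });
    result.count += val.count;
  });
  return result;
};
```

With this variant, id_2 would come out as {"code": {"B": 1}, "count": 2}, matching the shape of id_1.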

This time we skipped the query step and ran map and reduce directly. Let's look at what happens:

First, because there is no query, MongoDB scans the whole collection and visits every document. For each document, the value emitted by map is stored under its key (uid).

Next, MongoDB checks the size of the map as it goes (mongod checks every 100 records that the size of the map is not over 50KB; if it is, it runs reduce on ALL current keys. If the map is still over the size limit after that, it dumps all current contents to disk in an "incremental" collection).

Finally, reduce is run over the mapped data.
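The size check in the second step can be modelled roughly as follows. This is a toy sketch of the behaviour described above, not mongod's source code: the 100-record interval and the 50KB threshold come from the quoted description, while `runMapPhase`, the JSON-based size estimate, and the `dumpLimit` parameter (the second threshold, whose value is not given here) are assumptions for illustration.

```javascript
// Toy model of the periodic size check (not mongod's source code).
const CHECK_EVERY = 100;         // records between size checks (from the description)
const REDUCE_LIMIT = 50 * 1024;  // 50KB (from the description)

function runMapPhase(docs, map, reduce, dumpLimit) {
  const inMemory = new Map(); // key -> array of values
  const inc = [];             // stands in for the on-disk "incremental" collection
  // Rough size estimate: length of the JSON text (an assumption of this sketch).
  const size = () => JSON.stringify([...inMemory]).length;

  const emit = (key, value) => {
    if (!inMemory.has(key)) inMemory.set(key, []);
    inMemory.get(key).push(value);
  };

  docs.forEach((doc, i) => {
    map(doc, emit);
    if ((i + 1) % CHECK_EVERY !== 0 || size() <= REDUCE_LIMIT) return;
    // Over 50KB: reduce ALL current keys in place.
    for (const [key, values] of inMemory) {
      if (values.length > 1) inMemory.set(key, [reduce(key, values)]);
    }
    // Still too large: dump everything to the incremental collection.
    if (size() > dumpLimit) {
      for (const [key, values] of inMemory) inc.push({ key, value: values[0] });
      inMemory.clear();
    }
  });
  return { inMemory, inc };
}

// 60,000 hypothetical documents spread over two keys.
const docs = Array.from({ length: 60000 }, (_, i) => ({ uid: "id_" + (i % 2), count: 1 }));
const { inMemory, inc } = runMapPhase(
  docs,
  (doc, emit) => emit(doc.uid, doc.count),
  (key, values) => values.reduce((a, b) => a + b, 0),
  100 * 1024 // hypothetical second threshold
);
console.log(inMemory.get("id_0").length, inc.length);
```

With only two keys, the in-place reduce shrinks the map far below either threshold, so nothing is dumped to `inc`. Many distinct keys would leave the map large even after reducing, and that is when the dump to disk happens.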

Those three steps are the rough process. After the map phase, the values emitted for one key are grouped like this:


{key: "id_1", values: [{"code": "A", "count": 1}, {"code": "A", "count": 1}, {"code": "B", "count": 1}]}

This is the result of the emit calls, and it is worth paying attention to. The data is passed to reduce in this form, which is why reduce has to iterate over values with forEach.

From the process above, note the following: if there are many documents and their keys are distributed randomly, then when memory is limited MongoDB will write the data out to the incremental ("inc") collection.

For example, suppose there are three keys, A, B, and C, each with 100 documents, but the keys are randomly interleaved, such as ... B ... A ... B ... C ... A ... B ... C .... To finish key A, all documents with key A have to be gathered. When memory is very small, most of the data has to be kept on disk, and memory and disk keep exchanging data until every A document has been found (meanwhile, each portion of A that is processed in memory is emitted first).

This is certainly time-consuming. If we index the key and sort by it, the A documents arrive together and mostly stay in memory, which reduces the number of exchanges between memory and disk.
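The effect of key order can be illustrated with a toy model. This is an assumption for illustration, not mongod's actual algorithm: memory holds at most `capacity` emitted values, and when it fills up, every key currently in memory is reduced to one partial result that is written to disk. The function name and the capacity of 50 are made up.

```javascript
// Toy model (an assumption, not mongod's actual algorithm): count how many
// partial results are written to disk for a given order of keys.
function countPartialsWritten(keys, capacity) {
  let buffer = new Map(); // key -> number of values currently in memory
  let buffered = 0;
  let partialsWritten = 0;

  const flush = () => {
    partialsWritten += buffer.size; // one partial result per key in memory
    buffer = new Map();
    buffered = 0;
  };

  for (const key of keys) {
    buffer.set(key, (buffer.get(key) || 0) + 1);
    buffered += 1;
    if (buffered === capacity) flush();
  }
  if (buffered > 0) flush();
  return partialsWritten;
}

// Three keys with 100 documents each, interleaved versus sorted.
const interleaved = [];
for (let i = 0; i < 100; i++) interleaved.push("B", "A", "C");
const sorted = [...interleaved].sort();

const unsortedWrites = countPartialsWritten(interleaved, 50);
const sortedWrites = countPartialsWritten(sorted, 50);
console.log({ unsortedWrites, sortedWrites }); // { unsortedWrites: 18, sortedWrites: 6 }
```

In this model the interleaved input writes a partial result for all three keys on every flush, while the sorted input only ever has one key in memory at a time.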

So for large data sets, sorting by the key is worth doing. This one detail can improve the processing speed.
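In the mongo shell, indexing and sorting on the emit key might look like the sketch below. The `createIndex` method and the `sort` and `out` options of `mapReduce` are part of the shell's collection API; the wrapper function `runSortedMapReduce` and the collection name used in the usage note are hypothetical.

```javascript
// Sketch for the mongo shell. `coll` is a collection handle; `map` and `reduce`
// are the functions to run, keyed on uid as in the example in this article.
function runSortedMapReduce(coll, map, reduce) {
  // Index the emit key so the sort can use the index.
  coll.createIndex({ uid: 1 });

  // Feed documents to map in uid order, so each key's documents arrive together.
  return coll.mapReduce(map, reduce, {
    out: { inline: 1 },  // return the results inline instead of writing a collection
    sort: { uid: 1 },
  });
}
```

For example, with a hypothetical collection named `events`, this would be called as `runSortedMapReduce(db.events, map, reduce)`.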
