MongoDB Map-Reduce: Mongo Shell and C# (Part 1)

Last Update:2018-12-05 Source: Internet

Author: User

Tags mongo shell


I recently started learning MongoDB and came across some interesting material on map-reduce, so I am recording it here as a learning note.

Here I refer to the official website and to another article on what map-reduce is for, which is short and to the point.

1. Official Website: http://docs.mongodb.org/manual/tutorial/map-reduce-examples/

The map-reduce operation is composed of several tasks, including:

reads from the input collection,
executions of the map function,
executions of the reduce function,
writes to the output collection.

2. Another article: http://openmymind.net/2011/1/20/Understanding-Map-Reduce/

So what advantage does map reduce hold? The oft-cited benefit is that both the map and reduce operations can be distributed. So the code I've written above could be executed by multiple threads, multiple CPUs, or even thousands of servers as-is. This is key when dealing with millions and billions of records, or smaller sets with more complex logic. For the rest of us though, I think the real benefit is the power of being able to write these types of transforms using actual programming languages, with variables, conditional statements, methods and so on. It is a mind shift from the traditional approach, but I do think even slightly complex queries are cleaner and easier to write with map reduce. We didn't look at it here, but you'll commonly feed the output of a reduce function into another reduce function - each function further transforming it towards the end-result.
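The point about feeding reduce output into another reduce can be sketched in plain JavaScript, outside MongoDB. This is only an illustration under assumptions: the name sumReduce and the way the prices are split into two chunks are made up for the example. It shows that a summing reduce gives the same answer whether it sees all values at once or sees partial results from earlier reduce calls, which is what makes distributing the work possible.

```javascript
// A summing reduce: its output has the same shape as its input values,
// so it can safely be applied again to partial results.
function sumReduce(key, values) {
  return values.reduce((sum, price) => sum + price, 0);
}

// Prices for cusid 1, split into two hypothetical chunks
// (for example, processed on two different servers).
const partialA = sumReduce(1, [15, 10]);
const partialB = sumReduce(1, [30]);

// Reduce the partial results again.
const combined = sumReduce(1, [partialA, partialB]);

// Reduce everything in one pass for comparison.
const direct = sumReduce(1, [15, 10, 30]);

console.log(combined, direct); // both are 55
```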

In short, the official site describes the operations and steps map-reduce goes through during execution, while the second article covers the advantages of map-reduce over a traditional group by operation, along with the author's own opinions. It is worth reading.

Next, let's work through an example to deepen our understanding of map-reduce. Suppose we have a table (the familiar term from relational databases; NoSQL calls it a collection) with the fields _id, cusid, and price (each stored record is called a document in NoSQL), containing the following data:

Input:

======================================

{ cusid: 1, price: 15 }
{ cusid: 2, price: 30 }
{ cusid: 2, price: 45 }
{ cusid: 3, price: 45 }
{ cusid: 4, price: 5 }
{ cusid: 5, price: 65 }
{ cusid: 1, price: 10 }
{ cusid: 1, price: 30 }
{ cusid: 5, price: 30 }
{ cusid: 4, price: 100 }

======================================

What we want is the total price for each cusid. This could be done with a group by, but as the two references above point out, map-reduce has clear advantages, especially on large data sets, so we will use map-reduce here. The expected output is:

Output:

======================================

{ cusid: 1, price: 55 }
{ cusid: 2, price: 75 }
{ cusid: 3, price: 45 }
{ cusid: 4, price: 105 }
{ cusid: 5, price: 95 }

======================================

With the requirements laid out, let's implement them step by step, starting with the mongo shell.

I. Mongo shell version:

1. First, write the map function, which processes every document. (It is essentially a JavaScript function, with a few differences.)

The records marked in red are the data I inserted. Note the emit function inside the map function: it produces a key-value pair. Here it pairs the price of each document with that document's cusid.
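As a minimal sketch of what the map step does, the following plain JavaScript runs outside MongoDB. In the mongo shell, emit is provided by the server and this refers to the current document; the stand-in emit, the emitted map, and the docs array below are assumptions added only so the snippet can run on its own.

```javascript
// Map function in the form passed to mapReduce: `this` is the current document.
function mapFn() {
  emit(this.cusid, this.price);
}

// Hypothetical stand-in for the server-side emit: collects the values per key.
const emitted = new Map();
function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

// The sample documents from the input listing.
const docs = [
  { cusid: 1, price: 15 }, { cusid: 2, price: 30 },
  { cusid: 2, price: 45 }, { cusid: 3, price: 45 },
  { cusid: 4, price: 5 },  { cusid: 5, price: 65 },
  { cusid: 1, price: 10 }, { cusid: 1, price: 30 },
  { cusid: 5, price: 30 }, { cusid: 4, price: 100 },
];

// Call the map function once per document, binding `this` to the document.
for (const doc of docs) mapFn.call(doc);

console.log(emitted); // cusid 1 -> [15, 10, 30], cusid 3 -> [45], ...
```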

2. Write the corresponding reduce function. The function takes two parameters, key and values (note: values, not value). values is an array holding every value emitted for the same key, much like a group operation: each cusid maps to its list of prices.

The main job of the reduce function is to sum the prices for each distinct cusid and return the result.
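A reduce function of this kind might look like the sketch below. The name reduceFn is an assumption for illustration, and the loop is written in plain JavaScript so the snippet also runs outside the mongo shell.

```javascript
// Reduce function: receives one key and the array of all values emitted for it,
// and returns a single value of the same kind (here, a summed price).
function reduceFn(key, values) {
  let total = 0;
  for (const price of values) {
    total += price;
  }
  return total;
}

// Example: the three prices emitted for cusid 1.
console.log(reduceFn(1, [15, 10, 30])); // 55
```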

3. Execute the map-reduce operation and output the result to a temporary collection.
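In the shell, this step is a call of the form db.orders.mapReduce(mapFn, reduceFn, { out: "order_totals" }), where the collection names orders and order_totals are hypothetical. To make the flow visible without a running server, here is a rough emulation in plain JavaScript. The helper runMapReduce is an assumption and not MongoDB's implementation; emit is passed in as an argument only because there is no server to supply it. It groups the emitted values by key, reduces each group, and produces documents in the { _id, value } shape that map-reduce output uses.

```javascript
const orders = [
  { cusid: 1, price: 15 }, { cusid: 2, price: 30 },
  { cusid: 2, price: 45 }, { cusid: 3, price: 45 },
  { cusid: 4, price: 5 },  { cusid: 5, price: 65 },
  { cusid: 1, price: 10 }, { cusid: 1, price: 30 },
  { cusid: 5, price: 30 }, { cusid: 4, price: 100 },
];

// In the shell, emit is a global; here it is passed in as a parameter.
function mapFn(emit) {
  emit(this.cusid, this.price);
}

function reduceFn(key, values) {
  return values.reduce((sum, price) => sum + price, 0);
}

// Hypothetical emulation of the read -> map -> reduce -> write sequence.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // map phase

  const out = [];
  for (const [key, values] of groups) {         // reduce phase
    // A key with a single value is passed through without calling reduce.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    out.push({ _id: key, value });
  }
  return out.sort((a, b) => a._id - b._id);     // the "output collection"
}

const totals = runMapReduce(orders, mapFn, reduceFn);
console.log(totals);
```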

The part marked in red is the result, which matches the expected output. One thing stands out: for cusid = 3, the result is formatted differently from the others. This is probably because cusid = 3 has only one record, so no group-like operation is needed and the reduce function is never executed for it. To verify this guess, we can insert another record with cusid = 3 and see whether the result changes.

The test confirms the guess: the result for cusid = 3 now has the same format as the others.
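The same experiment can be mimicked in plain JavaScript by recording which keys the reduce function is actually called for. This is a sketch under assumptions: the helper finish imitates the behavior where reduce is skipped for a key with only one value, and the price 20 for the extra cusid = 3 record is made up, since the original does not say what was inserted.

```javascript
const reduceCalls = []; // keys for which reduce was actually invoked

function reduceFn(key, values) {
  reduceCalls.push(key);
  return values.reduce((sum, price) => sum + price, 0);
}

// Hypothetical helper: applies reduce only to keys with more than one value.
function finish(groups) {
  return groups.map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduceFn(key, values),
  }));
}

// Before: cusid 3 has a single record, so reduce runs for cusid 2 only.
const before = finish([[2, [30, 45]], [3, [45]]]);
console.log(before, reduceCalls.slice());

// After: a second record for cusid 3 (price 20 is a made-up value).
reduceCalls.length = 0;
const after = finish([[2, [30, 45]], [3, [45, 20]]]);
console.log(after, reduceCalls.slice()); // reduce now runs for cusid 2 and 3
```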

To be continued!
