MongoDB map-reduce: Mongo shell and C# (Part 1)

Source: Internet
Author: User
Tags mongo shell

I have recently been learning MongoDB and came across some interesting material on map-reduce, so I am recording it here as a study note.

Here I refer to the official documentation and to another article on what map-reduce does; both are short and to the point.

1. Official Website: http://docs.mongodb.org/manual/tutorial/map-reduce-examples/

A map-reduce operation is composed of many tasks, including:

  • reads from the input collection,
  • executions of the map function,
  • executions of the reduce function,
  • writes to the output collection.

2. Another article: http://openmymind.net/2011/1/20/Understanding-Map-Reduce/

So what advantage does map reduce hold? The oft-cited benefit is that both the map and reduce operations can be distributed. So the code I've written above could be executed by multiple threads, multiple CPUs, or even thousands of servers as-is. This is key when dealing with millions and billions of records, or smaller sets with more complex logic. For the rest of us though, I think the real benefit is the power of being able to write these types of transforms using actual programming languages, with variables, conditional statements, methods and so on. It is a mind shift from the traditional approach, but I do think even slightly complex queries are cleaner and easier to write with map reduce. We didn't look at it here, but you'll commonly feed the output of a reduce function into another reduce function, each function further transforming it towards the end-result.

In short, the official documentation describes the operations and steps that map-reduce performs during execution, while the second article covers the advantages of map-reduce over a traditional group by operation, along with the author's own opinions. The article is worth a read.

Next, an example to deepen our understanding of map-reduce. Suppose we have a table (the term from traditional databases is easier to picture; in NoSQL it is called a collection) with the fields _id, cusid, and price (in NoSQL each stored record is called a document), containing the following data:

Input:

======================================

{ cusid: 1, price: 15 }
{ cusid: 2, price: 30 }
{ cusid: 2, price: 45 }
{ cusid: 3, price: 45 }
{ cusid: 4, price: 5 }
{ cusid: 5, price: 65 }
{ cusid: 1, price: 10 }
{ cusid: 1, price: 30 }
{ cusid: 5, price: 30 }
{ cusid: 4, price: 100 }

======================================

What we want, however, is the total price per cusid, which a group by could produce. Since the two references above point out the advantages of map-reduce, which are most apparent on large data sets, we will implement it with map-reduce instead. The expected output is:

Output:

======================================

{ cusid: 1, price: 55 }
{ cusid: 2, price: 75 }
{ cusid: 3, price: 45 }
{ cusid: 4, price: 105 }
{ cusid: 5, price: 95 }

======================================
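As a quick sanity check on the expected totals, here is a minimal group-by-and-sum sketch in plain JavaScript, with no MongoDB involved. The function name sumByCustomer is my own invention for illustration; the data is copied from the Input listing.

```javascript
// Sample documents copied from the Input listing.
const orders = [
  { cusid: 1, price: 15 },
  { cusid: 2, price: 30 },
  { cusid: 2, price: 45 },
  { cusid: 3, price: 45 },
  { cusid: 4, price: 5 },
  { cusid: 5, price: 65 },
  { cusid: 1, price: 10 },
  { cusid: 1, price: 30 },
  { cusid: 5, price: 30 },
  { cusid: 4, price: 100 },
];

// Hypothetical helper: keeps one running total per cusid (a plain group by).
function sumByCustomer(docs) {
  const totalsByCusid = new Map();
  for (const doc of docs) {
    totalsByCusid.set(doc.cusid, (totalsByCusid.get(doc.cusid) || 0) + doc.price);
  }
  // Return documents shaped like the Output listing, sorted by cusid.
  return [...totalsByCusid.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([cusid, price]) => ({ cusid, price }));
}

const totals = sumByCustomer(orders);
console.log(totals);
```

The totals it prints match the Output listing, so the map-reduce version has a known answer to be compared against.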

With the basic requirements laid out, we will implement them step by step. First:

I. Mongo shell version:

1. First, write the map function, which processes every document (it is really just a JavaScript function, with a few differences).

In the screenshot, the part above the red box is the data I inserted. Note the emit function inside the map function: its job is to produce a key-value pair. Here the price of each document is emitted under its cusid, which is easy enough to understand.
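Since the screenshot is not reproduced here, the following is a minimal sketch of what such a map function could look like. In the mongo shell, emit is provided by the server; the emit defined below and the emitted array are stand-ins I added so the snippet also runs outside the shell, and the name mapFn is an assumption.

```javascript
// Stand-in for the emit() that MongoDB provides inside map functions.
// It only records each key-value pair so that we can inspect them.
const emitted = [];
function emit(key, value) {
  emitted.push([key, value]);
}

// The map function: MongoDB calls it once per document, with `this`
// bound to the current document.
function mapFn() {
  emit(this.cusid, this.price);
}

// Run the map function over three of the sample documents.
const sample = [
  { cusid: 1, price: 15 },
  { cusid: 2, price: 30 },
  { cusid: 1, price: 10 },
];
for (const doc of sample) {
  mapFn.call(doc);
}

console.log(emitted); // [ [ 1, 15 ], [ 2, 30 ], [ 1, 10 ] ]
```

Each document contributes exactly one (cusid, price) pair; grouping the pairs by key happens after the map phase.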

2. Write the corresponding reduce function. The function takes two parameters, key and values; note that it is values, not value. values is an array: as in a group operation, it holds all the prices that belong to one cusid (cusid to prices).

The main job of the reduce function is to sum the prices for each distinct cusid and return the result.
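A minimal sketch of such a reduce function follows; the name reduceFn is an assumption. The official examples use the shell helper Array.sum(values) for the summation; a plain loop is used here so the snippet also runs outside the mongo shell.

```javascript
// The reduce function: receives one key and the array of all values
// emitted for that key, and returns a single combined value.
function reduceFn(key, values) {
  let total = 0;
  for (const price of values) {
    total += price;
  }
  return total;
}

// All prices emitted for cusid 1 in the sample data.
const total = reduceFn(1, [15, 10, 30]);
console.log(total); // 55

// MongoDB may call reduce more than once for the same key and feed an
// earlier result back in, so the returned value must have the same
// shape as the emitted values. A sum of numbers satisfies this.
const partial = reduceFn(1, [15, 10]);
const rereduced = reduceFn(1, [partial, 30]);
console.log(rereduced); // 55
```

Because a partial sum is itself a number, reducing in one pass or in several passes gives the same total.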

3. Execute the map-reduce operation and output the result to a temporary collection.
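In the shell this step is a single call of the form db.orders.mapReduce(mapFn, reduceFn, { out: "order_totals" }), where the collection names orders and order_totals are my own placeholders. To show what happens between map and reduce, here is a rough in-memory imitation; runMapReduce and the module-level emit are hypothetical helpers, not MongoDB's implementation.

```javascript
// key -> array of emitted values; filled by emit() during the map phase.
let groups = new Map();
function emit(key, value) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}

function mapFn() {
  emit(this.cusid, this.price);
}

function reduceFn(key, values) {
  return values.reduce((sum, price) => sum + price, 0);
}

// Hypothetical stand-in for db.<collection>.mapReduce(map, reduce, { out }).
function runMapReduce(docs, map, reduce) {
  groups = new Map();
  for (const doc of docs) map.call(doc); // map phase: one call per document
  const out = [];
  for (const [key, values] of groups) {
    // reduce phase: skipped for keys with a single value, as in MongoDB
    const value = values.length > 1 ? reduce(key, values) : values[0];
    out.push({ _id: key, value }); // output documents have _id and value
  }
  return out.sort((a, b) => a._id - b._id);
}

const orders = [
  { cusid: 1, price: 15 }, { cusid: 2, price: 30 }, { cusid: 2, price: 45 },
  { cusid: 3, price: 45 }, { cusid: 4, price: 5 }, { cusid: 5, price: 65 },
  { cusid: 1, price: 10 }, { cusid: 1, price: 30 }, { cusid: 5, price: 30 },
  { cusid: 4, price: 100 },
];

const results = runMapReduce(orders, mapFn, reduceFn);
console.log(results);
```

Note that the output documents carry the key in _id and the reduced result in value, rather than the cusid and price field names of the input.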

The part marked in red is the result, which matches the expected output. One question remains: the result for cusid = 3 has a different format from the others. This is probably because cusid = 3 has only one record, so no group-like operation is performed; in other words, the reduce function is never executed for it. To verify this guess, we can insert another record with cusid = 3 and see whether the result changes.

The result confirms the guess: after the insert, the result for cusid = 3 has the same format as the others. ~_~
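MongoDB documents this behaviour: reduce is not called for a key that has only a single value. The sketch below imitates that rule in plain JavaScript. I do not know exactly what the original reduce function returned, so the object-returning reduceFn, the helper valueForKey, and the extra price of 20 are all assumptions chosen to reproduce a format difference.

```javascript
let reduceCalls = 0;

// Hypothetical reduce that returns an object, although map emitted a
// bare number. This mismatch is what makes single-value keys look odd.
function reduceFn(key, values) {
  reduceCalls += 1;
  return { total: values.reduce((sum, price) => sum + price, 0) };
}

// Imitates the documented rule: a key with a single emitted value is
// written out as-is, without calling reduce.
function valueForKey(key, values) {
  return values.length > 1 ? reduceFn(key, values) : values[0];
}

// cusid 3 with its single record: the emitted value passes through untouched.
const before = valueForKey(3, [45]);
const callsBefore = reduceCalls;
console.log(before, callsBefore); // 45 0

// After inserting a second record for cusid 3 (price 20 is made up),
// reduce runs and the value takes the reduced shape.
const after = valueForKey(3, [45, 20]);
console.log(after, reduceCalls); // { total: 65 } 1
```

One way to keep the output uniform is to have map emit a value of the same shape that reduce returns (for the hypothetical reduce above, emit { total: this.price } and sum the total fields), so that single-value keys and reduced keys look alike.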

 

To be continued.
