MapReduce parallel query based on MongoDB distributed storage

Last Update:2018-12-08 Source: Internet

The previous article introduced how to store relational data in a distributed way with MongoDB. Once data is stored, it has to be queried. Ordinary queries work, but today we will look at how to query with the MapReduce feature that MongoDB provides.
I have written an article about MongoDB MapReduce before; it is linked near the end of this post.

Today we will look at how to run a MapReduce query on top of the sharding mechanism. The official MongoDB documentation contains the following passage:

Sharded Environments
In sharded environments, data processing of map/reduce operations runs in parallel on all shards.

That is, the data processing of a map/reduce operation runs on all shards in parallel.
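The idea can be sketched in plain JavaScript without a MongoDB server. This is only an in-memory illustration, not how mongos is implemented: the `shards` array, the `topic` field and the helper names `groupAndReduce`, `runOnShard` and `shardedMapReduce` are all hypothetical. Each shard runs map and reduce over its own documents, and the partial results are then reduced once more when they are merged.

```javascript
// Hypothetical stand-in for a sharded collection: each inner array
// plays the role of one shard holding part of the data.
const shards = [
  [{ _id: "1", topic: "a" }, { _id: "2", topic: "b" }],
  [{ _id: "3", topic: "a" }],
  [{ _id: "4", topic: "a" }, { _id: "5", topic: "b" }],
];

let emit; // rebound by runOnShard before each map pass

// Map: called with `this` bound to the document, as in MongoDB.
function map() {
  emit(this.topic, 1);
}

// Reduce: sums the counts and returns the same type (a number) that map emits.
function reduce(key, values) {
  let count = 0;
  for (const v of values) count += v;
  return count;
}

// Groups [key, value] pairs by key and reduces every group with more than one value.
function groupAndReduce(pairs, reduceFn) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  const out = new Map();
  for (const [key, values] of groups) {
    out.set(key, values.length === 1 ? values[0] : reduceFn(key, values));
  }
  return out;
}

// One shard's work: map over its own documents, then reduce locally.
function runOnShard(docs, mapFn, reduceFn) {
  const pairs = [];
  emit = (key, value) => pairs.push([key, value]);
  for (const doc of docs) mapFn.call(doc);
  return groupAndReduce(pairs, reduceFn);
}

// The shard passes are independent of each other (so a real cluster can run
// them in parallel); their partial results are merged by reducing again.
function shardedMapReduce(allShards, mapFn, reduceFn) {
  const partials = allShards.map((docs) => runOnShard(docs, mapFn, reduceFn));
  const merged = [];
  for (const partial of partials) {
    for (const [key, value] of partial) merged.push([key, value]);
  }
  return groupAndReduce(merged, reduceFn);
}

const result = shardedMapReduce(shards, map, reduce);
console.log(Object.fromEntries(result)); // { a: 3, b: 2 }
```

Because the merge step feeds reduce output back into reduce, the reduce function has to accept its own return values as input.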
The following describes how to construct a MapReduce query using the environment set up in the previous article:

First of all, the results returned by MapReduce on sharded data differ somewhat in structure from those on non-sharded data. So far I have noticed that a custom JSON format is not supported for the returned data; that is, a reduce function ending as follows may cause problems:

return { count: total };

Note: this is what I have observed in my test environment so far.

You need to change it to:

return count;
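The original does not diagnose the cause. One likely explanation, stated here as an assumption rather than something verified in that test environment, is a rule from the MongoDB documentation: reduce may be invoked more than once for the same key, so its return value must have the same type as the values emitted by map. With shards, the partial results from each shard are reduced again. The sketch below simulates that re-reduce step in plain JavaScript; `reduceAcrossShards` and the sample values are hypothetical.

```javascript
// Returns a plain number, the same type that map emits (emit(key, 1)).
function reducePlain(key, values) {
  let total = 0;
  for (const v of values) total += v;
  return total;
}

// Returns a wrapped object although map emitted plain numbers.
function reduceWrapped(key, values) {
  let total = 0;
  for (const v of values) total += v;
  return { count: total };
}

// Object in, object out: works if map emits { count: 1 } instead of 1.
function reduceObjects(key, values) {
  let total = 0;
  for (const v of values) total += v.count;
  return { count: total };
}

// Hypothetical helper: each inner array holds the values emitted for one key
// on one shard. Every shard reduces locally, then the partials are re-reduced.
function reduceAcrossShards(reduceFn, key, valuesPerShard) {
  const partials = valuesPerShard.map((values) => reduceFn(key, values));
  return reduceFn(key, partials);
}

const emittedNumbers = [[1, 1], [1, 1, 1]];
const emittedObjects = [
  [{ count: 1 }, { count: 1 }],
  [{ count: 1 }, { count: 1 }, { count: 1 }],
];

const good = reduceAcrossShards(reducePlain, "post", emittedNumbers);
const bad = reduceAcrossShards(reduceWrapped, "post", emittedNumbers);
const fixed = reduceAcrossShards(reduceObjects, "post", emittedObjects);

console.log(good);  // 5
console.log(bad);   // count is a garbage string, not 5
console.log(fixed); // { count: 5 }
```

So returning a plain number is one fix; emitting `{ count: 1 }` from map and summing `v.count` in reduce is another that keeps the JSON shape.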

The following is the test code. First, count the matching documents by post ID (a group-style query):

public partial class getfile : System.Web.UI.Page
{
    public Mongo Mongo { get; set; }

    public IMongoDatabase DB
    {
        get
        {
            return this.Mongo["dnt_mongodb"];
        }
    }

    /// <summary>
    /// Sets up the test environment. You can override this to add custom initialization.
    /// </summary>
    public virtual void Init()
    {
        string connectionString = "Server=10.0.4.85:27017;ConnectTimeout=30000;ConnectionLifetime=300000;MinimumPoolSize=512;MaximumPoolSize=51200;Pooled=true";
        if (String.IsNullOrEmpty(connectionString))
            throw new ArgumentNullException("Connection string not found.");
        this.Mongo = new Mongo(connectionString);
        this.Mongo.Connect();
    }

    // Map: emit a 1 for the post whose _id matches.
    string mapfunction = "function() {\n" +
        "    if (this._id == '000000') { emit(this._id, 1); }\n" +
        "};";

    // Reduce: sum the emitted values and return a plain number.
    string reducefunction = "function(key, current) {" +
        "    var count = 0;" +
        "    for (var i in current) {" +
        "        count += current[i];" +
        "    }" +
        "    return count;\n" +
        "};";

    protected void Page_Load(object sender, EventArgs e)
    {
        Init();

        var mrb = DB["posts1"].MapReduce(); // attach_gfstream.files
        int groupCount = 0;
        using (var mr = mrb.Map(mapfunction).Reduce(reducefunction))
        {
            foreach (Document doc in mr.Documents)
            {
                // Each result document carries the reduced count in "value".
                groupCount = int.Parse(doc["value"].ToString());
            }
        }
        this.Mongo.Disconnect();
    }
}

The following is the running query result:

Next, we will demonstrate how to return the queried post documents and load them into a List collection. Here we query only the two posts with the IDs 548110 and 548111:

// Map: emit the whole document as the key for the two posts we want.
string mapfunction = "function() {\n" +
    "    if (this._id == '548110' || this._id == '548111') { emit(this, 1); }\n" +
    "};";

// Reduce: hand the document (the key) back as the result value.
string reducefunction = "function(doc, current) {" +
    "    return doc;\n" +
    "};";

protected void Page_Load(object sender, EventArgs e)
{
    Init();

    var mrb = DB["posts1"].MapReduce(); // attach_gfstream.files
    List<Document> postDoc = new List<Document>();
    using (var mr = mrb.Map(mapfunction).Reduce(reducefunction))
    {
        foreach (Document doc in mr.Documents)
        {
            postDoc.Add((Document)doc["value"]);
        }
    }
    this.Mongo.Disconnect();
}
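As an alternative sketch, and not the approach used in the code above, a more conventional shape is to emit the ID as the key and the document as the value. The MongoDB documentation notes that reduce is not called for a key that has only a single value, so with unique IDs the emitted document passes straight through to the result. The in-memory `mapReduce` helper, the `posts` sample data and the `title` field below are hypothetical and only simulate the behavior in plain JavaScript.

```javascript
// Hypothetical sample data standing in for the "posts1" collection.
const posts = [
  { _id: "548110", title: "first post" },
  { _id: "548111", title: "second post" },
  { _id: "548112", title: "third post" },
];

let emit; // bound inside mapReduce before map runs

// Map: key is the post ID, value is the whole document.
function map() {
  if (this._id == "548110" || this._id == "548111") {
    emit(this._id, this);
  }
}

// Reduce: only needed when a key has several values; it returns a document,
// the same type that map emits.
function reduce(key, values) {
  return values[0];
}

// Minimal single-node simulation of map/reduce over an array of documents.
function mapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) mapFn.call(doc);
  const out = [];
  for (const [key, values] of groups) {
    // Single-value keys skip reduce, mirroring the documented behavior.
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    out.push({ _id: key, value });
  }
  return out;
}

// Collect the "value" of every result row, like the List<Document> above.
const postDocs = mapReduce(posts, map, reduce).map((row) => row.value);
console.log(postDocs.map((d) => d._id)); // [ '548110', '548111' ]
```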

The following is the running query result:

Map/reduce can be used in many other ways. If you are interested, take a look at the following links:
http://cookbook.mongodb.org/patterns/unique_items_map_reduce/
http://www.mongodb.org/display/DOCS/MapReduce

And the article I wrote earlier: http://www.cnblogs.com/daizhj/archive/2010/06/10/1755761.html

Of course, some temporary files are generated when mongos performs map/reduce operations.

I guess these temporary files may improve performance when the query is run again (though I have not observed this so far).

I also tested this against MongoDB's GridFS system (which can be used to build a distributed file storage system, and which I introduced in an earlier article). Unfortunately the test did not succeed; it often reports errors such as:

Thu Sep 09 12:09:29 Assertion failure _grab client\parallel.cpp 461

It seems there are some problems when a MapReduce program connects to MongoDB, but I don't know whether the cause is MongoDB's own stability or my machine environment (memory, or a 64-bit mongos with a 32-bit client connection).

Well, that's all for today's article.

Link: http://www.cnblogs.com/daizhj/archive/2010/09/09/1822264.html

BLOG: http://daizhj.cnblogs.com/

Author: daizhj, Dai zhenjun
