MapReduce parallel query based on MongoDB distributed storage


The previous article showed how to store relational data in a distributed fashion with MongoDB. Once the data is stored, it has to be queried. Ordinary queries work, but today we will look at how to query with the MapReduce feature that MongoDB provides.
I have written about MongoDB MapReduce before (the link is at the end of this article).

Today we will look at how to run a MapReduce query on top of the sharding mechanism. The official MongoDB documentation says:

Sharded Environments
In sharded environments, data processing of map/reduce operations runs in parallel on all shards.


That is, the map/reduce operation runs on all shards in parallel.
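
To make the idea concrete, here is a minimal in-memory sketch of that flow, not MongoDB's actual implementation. It assumes each shard runs map and reduce over its own documents and that the partial results are then reduced once more when they are merged. The shards array, the fid field, and the helper functions (groupByKey, runOnShard, mapReduceSharded) are all hypothetical names used only for illustration.

```javascript
// Hypothetical stand-in for a sharded collection: each inner array is one shard.
const shards = [
  [{ _id: 1, fid: 10 }, { _id: 2, fid: 20 }],
  [{ _id: 3, fid: 10 }, { _id: 4, fid: 10 }],
];

// Collects the pairs produced by map, standing in for MongoDB's built-in emit().
let emitted = [];
function emit(key, value) {
  emitted.push([key, value]);
}

// Map: called once per document with `this` bound to the document.
function map() {
  emit(this.fid, 1);
}

// Reduce: sums the values and returns a plain number, the same type map emits.
function reduce(key, values) {
  let count = 0;
  for (const v of values) count += v;
  return count;
}

// Groups [key, value] pairs into a Map of key -> array of values.
function groupByKey(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Runs map over one shard's documents, then reduces each key locally.
function runOnShard(docs) {
  emitted = [];
  for (const doc of docs) map.call(doc);
  const partial = [];
  for (const [key, values] of groupByKey(emitted)) {
    partial.push([key, reduce(key, values)]);
  }
  return partial;
}

// Merges the per-shard partial results and reduces them a second time.
function mapReduceSharded(allShards) {
  const partials = allShards.flatMap(runOnShard);
  const result = {};
  for (const [key, values] of groupByKey(partials)) {
    result[key] = reduce(key, values);
  }
  return result;
}

const result = mapReduceSharded(shards);
console.log(result);
```

Note that reduce runs twice for fid 10: once on each shard and once more on the merged partial counts. This is why the reduce function has to accept its own output as input.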
The following shows how to build a MapReduce query using the environment set up in the previous article:

First, the results returned by MapReduce on a sharded collection differ somewhat in structure from those on an unsharded collection. So far I have noticed that returning a custom JSON object from the reduce function is not supported; that is, the following may cause problems:

return { count: total };


Note: so far I have only observed this behavior in my own test environment.

You need to change it to:

return count;
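
One likely explanation, which is an assumption and not something confirmed in the test above, is that the reduce function can be invoked again on its own output when partial results from several shards are merged. MongoDB's documentation requires the value returned by reduce to have the same type as the value emitted by map. The sketch below uses two hypothetical reduce functions and a hypothetical helper, reduceInTwoStages, to show what goes wrong when that rule is broken.

```javascript
// Reduce that returns a wrapped object, unlike the plain numbers that map emits.
function reduceWrapped(key, values) {
  let total = 0;
  for (const v of values) total += v;
  return { count: total };
}

// Reduce that returns a plain number, the same type that map emits.
function reducePlain(key, values) {
  let total = 0;
  for (const v of values) total += v;
  return total;
}

// Two simulated shards reduce their own emitted 1s,
// then the two partial results are reduced again.
function reduceInTwoStages(reduce) {
  const shardA = reduce("post", [1, 1]);
  const shardB = reduce("post", [1]);
  return reduce("post", [shardA, shardB]);
}

const plain = reduceInTwoStages(reducePlain);
const wrapped = reduceInTwoStages(reduceWrapped);

console.log(plain);   // a number: the partial counts add up correctly
console.log(wrapped); // count is no longer a number: objects were added to 0
```

With reducePlain the second stage simply adds the partial counts. With reduceWrapped the second stage tries to add objects to a number, so the count degrades into a concatenated string.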

The following is the test code. First, count the posts that match a given post ID (similar to a group query):

public partial class getfile : System.Web.UI.Page
{
    public Mongo Mongo { get; set; }

    public IMongoDatabase DB
    {
        get
        {
            return this.Mongo["dnt_mongodb"];
        }
    }

    /// <summary>
    /// Sets up the test environment. You can override this to add custom initialization.
    /// </summary>
    public virtual void Init()
    {
        string ConnectionString = "Server=10.0.4.85:27017;ConnectTimeout=30000;ConnectionLifetime=300000;MinimumPoolSize=512;MaximumPoolSize=51200;Pooled=true";
        if (String.IsNullOrEmpty(ConnectionString))
            throw new ArgumentNullException("Connection string not found.");
        this.Mongo = new Mongo(ConnectionString);
        this.Mongo.Connect();
    }

    // Map: emit a 1 for the post whose _id matches.
    string mapfunction = "function() {\n" +
        "    if (this._id == '000000') { emit(this._id, 1); }\n" +
        "};";

    // Reduce: sum the emitted values and return a plain number (not an object).
    string reducefunction = "function(key, current) {" +
        "    var count = 0;" +
        "    for (var i in current) {" +
        "        count += current[i];" +
        "    }" +
        "    return count;\n" +
        "};";

    protected void Page_Load(object sender, EventArgs e)
    {
        Init();

        var mrb = DB["posts1"].MapReduce(); // attach_gfstream.files
        int groupCount = 0;
        using (var mr = mrb.Map(mapfunction).Reduce(reducefunction))
        {
            // Each result document holds the key in "_id" and the reduced count in "value".
            foreach (Document doc in mr.Documents)
            {
                groupCount = int.Parse(doc["value"].ToString());
            }
        }
        this.Mongo.Disconnect();
    }
}

 


The following is the running query result:




Next, we will show how to return the matching posts themselves and load them into a list. Here we query only two posts, with the IDs 548110 and 548111:

// Map: emit the whole document for the two posts we are looking for.
string mapfunction = "function() {\n" +
    "    if (this._id == '548110' || this._id == '548111') { emit(this, 1); }\n" +
    "};";

// Reduce: the key is the document itself, so return it.
string reducefunction = "function(doc, current) {" +
    "    return doc;\n" +
    "};";

protected void Page_Load(object sender, EventArgs e)
{
    Init();

    var mrb = DB["posts1"].MapReduce(); // attach_gfstream.files
    List<Document> postDoc = new List<Document>();
    using (var mr = mrb.Map(mapfunction).Reduce(reducefunction))
    {
        foreach (Document doc in mr.Documents)
        {
            postDoc.Add((Document)doc["value"]);
        }
    }
    this.Mongo.Disconnect();
}
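
The map function above emits the whole document as the key. A common alternative, shown here only as a sketch and not as the approach tested in this article, is to emit the post ID as the key and the document as the value, so that reduce returns a value of the same shape as the one emitted. The sample posts, the title field, and the runMapReduce helper are hypothetical. The helper skips reduce for keys with a single value, which is how MongoDB's documentation describes the server's behavior.

```javascript
// Hypothetical sample posts; the title field is invented for illustration.
const posts = [
  { _id: "548110", title: "first" },
  { _id: "548111", title: "second" },
  { _id: "548112", title: "third" },
];

// Collects the pairs produced by map, standing in for MongoDB's built-in emit().
let emitted = [];
function emit(key, value) {
  emitted.push([key, value]);
}

// Map: the key is the post ID, the value is the whole document.
function map() {
  if (this._id == "548110" || this._id == "548111") {
    emit(this._id, this);
  }
}

// Reduce: every value for a given _id is the same post, so return the first one.
function reduce(key, values) {
  return values[0];
}

// Minimal stand-in for the server: group emitted pairs by key and
// call reduce only for keys that have more than one value.
function runMapReduce(docs) {
  emitted = [];
  for (const doc of docs) map.call(doc);
  const groups = new Map();
  for (const [key, value] of emitted) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  const out = [];
  for (const [key, values] of groups) {
    const value = values.length === 1 ? values[0] : reduce(key, values);
    out.push({ _id: key, value: value });
  }
  return out;
}

// Load the returned posts into a list, as the C# code does with postDoc.
const postDocs = runMapReduce(posts).map((row) => row.value);
console.log(postDocs);
```

Because the value is the document in both the map output and the reduce output, the result stays the same whether reduce is called zero, one, or several times for a key.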

 


The following is the running query result:


Map/reduce can be used in many other ways. If you are interested, see the following links:
http://cookbook.mongodb.org/patterns/unique_items_map_reduce/
http://www.mongodb.org/display/DOCS/MapReduce

And the article I wrote earlier: http://www.cnblogs.com/daizhj/archive/2010/06/10/1755761.html


Note that some temporary files are generated when mongos performs map/reduce operations.


My guess is that these temporary files may improve performance on later runs (although I have not observed this so far).

I also tested map/reduce against MongoDB's GridFS (which can be used to build a distributed file storage system, as I introduced in an earlier article), but unfortunately without success. It often reports errors such as:

 

 

Thu Sep 09 12:09:29 Assertion failure _grab client\parallel.cpp 461

 


It seems that there are still some problems when a MapReduce program connects to MongoDB, but I do not know whether the cause is MongoDB's own stability or my machine's environment settings (memory, or a 32-bit client connecting to mongos on a 64-bit system).

Well, that is all for today's article.

Link: http://www.cnblogs.com/daizhj/archive/2010/09/09/1822264.html

BLOG: http://daizhj.cnblogs.com/

Author: daizhj, Dai zhenjun
