MapReduce parallel queries on MongoDB distributed storage


Having introduced distributed storage of relational data on MongoDB, we now turn to querying that storage. Queries can be made in the ordinary way, but today's topic is querying with the MapReduce feature that MongoDB provides. I covered MongoDB's MapReduce earlier in the article "A First Glimpse of MongoDB MapReduce".

Today I describe how to run MapReduce queries on top of the sharding mechanism. The official MongoDB documentation contains this passage:

Sharded Environments
In sharded environments, data processing of map/reduce operations runs in parallel on all shards.


In other words, a map/reduce operation runs in parallel on all shards.
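
To make the quoted sentence concrete, here is a minimal sketch in plain JavaScript (no MongoDB involved) of how a sharded map/reduce can be pictured: each shard maps and reduces its own documents, and the partial results are then reduced once more into the final answer. The helper names `runOnShard` and `mergeShardResults`, the `topicId` field, and the sample documents are all made up for illustration, and `emit` is passed in as a parameter instead of being a global as it is inside MongoDB. This is a toy model, not a description of how mongos is implemented.

```javascript
// Illustrative only: a toy model of map/reduce over shards, not MongoDB code.
// runOnShard and mergeShardResults are hypothetical helper names.

// Map function in MongoDB style: `this` is the current document.
function mapFn(emit) {
  emit(this.topicId, 1);
}

// Reduce must accept its own output as input, because partial
// results from different shards are reduced again.
function reduceFn(key, values) {
  var count = 0;
  for (var i = 0; i < values.length; i++) {
    count += values[i];
  }
  return count;
}

// Run map and reduce over the documents held by one shard.
function runOnShard(docs) {
  var groups = {};
  docs.forEach(function (doc) {
    mapFn.call(doc, function (key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  var partial = {};
  Object.keys(groups).forEach(function (key) {
    partial[key] = reduceFn(key, groups[key]);
  });
  return partial;
}

// Combine per-shard partial results with one more reduce per key.
function mergeShardResults(partials) {
  var groups = {};
  partials.forEach(function (partial) {
    Object.keys(partial).forEach(function (key) {
      (groups[key] = groups[key] || []).push(partial[key]);
    });
  });
  var result = {};
  Object.keys(groups).forEach(function (key) {
    result[key] = reduceFn(key, groups[key]);
  });
  return result;
}

// Two pretend shards holding posts that belong to two topics.
var shardA = [{ _id: 1, topicId: "t1" }, { _id: 2, topicId: "t2" }];
var shardB = [{ _id: 3, topicId: "t1" }, { _id: 4, topicId: "t1" }];

var totals = mergeShardResults([runOnShard(shardA), runOnShard(shardB)]);
console.log(totals);
```

The point of the sketch is the second reduce pass: because per-shard results are merged by calling reduce again, the reduce function has to work on values it produced itself.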
Let's construct a MapReduce query using the environment built in the previous article.

First, note that MapReduce on sharded data differs somewhat from MapReduce on non-sharded data in its return structure. The main difference I have noticed is that returning a standard JSON-formatted object from the reduce function is not supported, so a statement like the following may cause problems:

return {count:total};


Note: this is what currently happens in my test environment (the screenshot from the original post is not reproduced here).

The statement needs to be changed to return the bare value:

return count;
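
One plausible explanation, which is an assumption and not something the test above confirms, is that in a sharded setup the output of reduce is fed back into reduce when partial results are merged, so the returned value must have the same shape as the emitted values. The plain JavaScript sketch below does not touch MongoDB; it only shows what happens when a reduce that returns `{count: total}` is applied to its own output. The function names `reduceObject` and `reduceNumber` are hypothetical.

```javascript
// Illustrative only: plain JavaScript, no MongoDB involved.

// Returns an object although the map step emitted plain numbers.
function reduceObject(key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return { count: total };
}

// Returns a plain number, the same shape as the emitted values.
function reduceNumber(key, values) {
  var count = 0;
  for (var i = 0; i < values.length; i++) {
    count += values[i];
  }
  return count;
}

// Pretend two shards each reduced their own emitted values ...
var badPartials = [reduceObject("548111", [1, 1]), reduceObject("548111", [1])];
var goodPartials = [reduceNumber("548111", [1, 1]), reduceNumber("548111", [1])];

// ... and the partial results are then reduced once more.
var badFinal = reduceObject("548111", badPartials);
var goodFinal = reduceNumber("548111", goodPartials);

console.log(typeof badFinal.count); // objects were concatenated as text, not summed
console.log(goodFinal);             // a numeric total
```

If an object result is really wanted, the usual remedy is to emit values of that same shape, for example `emit(key, {count: 1})`, and to sum `values[i].count` inside reduce.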

Here is the test code. The first example queries the count for a given post ID (it is modeled on a group-query example):

public partial class Getfile : System.Web.UI.Page
{
    public Mongo Mongo { get; set; }

    public IMongoDatabase DB
    {
        get
        {
            return this.Mongo["Dnt_mongodb"];
        }
    }

    /// <summary>
    /// Sets up the test environment. You can either override this OnInit to add custom initialization.
    /// </summary>
    public virtual void Init()
    {
        string ConnectionString = "server=10.0.4.85:27017;connecttimeout=30000;connectionlifetime=300000;minimumpoolsize=512;maximumpoolsize=51200;pooled=true";
        if (String.IsNullOrEmpty(ConnectionString))
            throw new ArgumentNullException("Connection string not found.");
        this.Mongo = new Mongo(ConnectionString);
        this.Mongo.Connect();
    }

    // Map: emit a 1 for the post whose _id matches.
    string mapfunction = "function() {\n" +
        "  if (this._id == '548111') { emit(this._id, 1); }\n" +
        "};";

    // Reduce: sum the emitted values and return a bare number.
    string reducefunction = "function(key, current) {" +
        "  var count = 0;" +
        "  for (var i in current) {" +
        "    count += current[i];" +
        "  }" +
        "  return count;\n" +
        "};";

    protected void Page_Load(object sender, EventArgs e)
    {
        Init();

        var mrb = DB["Posts1"].MapReduce(); // attach_gfstream.files
        int groupCount = 0;
        using (var mr = mrb.Map(mapfunction).Reduce(reducefunction))
        {
            foreach (Document doc in mr.Documents)
            {
                groupCount = int.Parse(doc["value"].ToString());
            }
        }
        this.Mongo.Disconnect();
    }
}


The original post shows a screenshot of the query result at run time here.




Next, let's see how to return the matched documents and load them into a List collection. Here only the two posts with IDs 548110 and 548111 are queried:

// Map: emit the whole document for the two wanted post IDs.
string mapfunction = "function() {\n" +
    "  if (this._id == '548110' || this._id == '548111') { emit(this, 1); }\n" +
    "};";

// Reduce: return the document that was used as the key.
string reducefunction = "function(doc, current) {" +
    "  return doc;\n" +
    "};";

protected void Page_Load(object sender, EventArgs e)
{
    Init();

    var mrb = DB["Posts1"].MapReduce(); // attach_gfstream.files
    List<Document> postDoc = new List<Document>();
    using (var mr = mrb.Map(mapfunction).Reduce(reducefunction))
    {
        foreach (Document doc in mr.Documents)
        {
            postDoc.Add((Document)doc["value"]);
        }
    }
    this.Mongo.Disconnect();
}
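
As a variation on the idea above (not the code used in this test), the matched documents can also be emitted as values under their `_id`, so that the `value` field of each result holds the document whether or not reduce is invoked for that key. The sketch is plain JavaScript that imitates `emit` with a local callback; `collectPosts`, the `title` field, and the sample posts are hypothetical.

```javascript
// Illustrative only: imitates the map/reduce flow in plain JavaScript.
// collectPosts and the sample data are hypothetical.

var wantedIds = ["548110", "548111"];

// Emit the whole document as the value, keyed by its _id.
function mapFn(emit) {
  if (wantedIds.indexOf(this._id) !== -1) {
    emit(this._id, this);
  }
}

// Every key is unique here, so any value for a key is the document itself.
function reduceFn(key, values) {
  return values[0];
}

function collectPosts(docs) {
  var groups = {};
  docs.forEach(function (doc) {
    mapFn.call(doc, function (key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  // Shape the output like map/reduce results: { _id: key, value: ... }.
  return Object.keys(groups).map(function (key) {
    return { _id: key, value: reduceFn(key, groups[key]) };
  });
}

var posts = [
  { _id: "548109", title: "first" },
  { _id: "548110", title: "second" },
  { _id: "548111", title: "third" }
];

// Load the matched documents into a list, as the C# loop does.
var postList = collectPosts(posts).map(function (r) { return r.value; });
console.log(postList.length);
```

Because reduce returns a value of the same shape as the emitted one, this form also stays correct if partial results are reduced a second time.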


The original post shows a screenshot of the query result at run time here.


There are many ways to write the map/reduce functions above. If you are interested, see the following links:
http://cookbook.mongodb.org/patterns/unique_items_map_reduce/
http://www.mongodb.org/display/DOCS/MapReduce

and my earlier article: http://www.cnblogs.com/daizhj/archive/2010/06/10/1755761.html


Note that when mongos performs map/reduce operations, some temporary files are generated (the original post shows them in a screenshot).

I suspect these temporary files may bring some performance gain when the same query is run again, although I have not observed one so far.

I also ran the same test against MongoDB's GridFS (which can be used to build a distributed file storage system, as I described in an earlier article), but unfortunately without success; it frequently reported errors such as:

Thu SEP 12:09:29 Assertion failure _grab client\parallel.cpp 461


It seems there are still some problems when a MapReduce program connects to MongoDB, but I do not know whether the cause is its own stability or my machine's environment (memory, or a 32-bit client connecting to mongos configured on a 64-bit system).

Well, that's all for today's article.
