In an earlier article I introduced distributed storage of relational data based on MongoDB. Once data is stored, it has to be queried. Ordinary queries work, but today I will introduce querying with the MapReduce feature that MongoDB provides.
I previously wrote an article about MongoDB's MapReduce, "A First Glimpse of MongoDB MapReduce".
Today I describe how to run MapReduce queries on top of the sharding mechanism. The official MongoDB documentation contains this passage:
Sharded Environments
In sharded environments, data processing of map/reduce operations runs in parallel on all shards.
That is, the map/reduce operation runs in parallel on all shards.
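To make the idea concrete, here is a minimal sketch in plain JavaScript that imitates the flow: each shard runs map and reduce over its own documents, and the partial results are then reduced once more into the final result. It is only an in-memory illustration, not MongoDB's actual implementation. The sample documents, the `fid` field, and the helper names `reducePairs` and `runOnShard` are all made up for this example.

```javascript
// Hypothetical sample data: two "shards", each holding part of a posts collection.
const shards = [
  [{ _id: "548110", fid: 1 }, { _id: "548111", fid: 2 }],
  [{ _id: "548112", fid: 2 }, { _id: "548113", fid: 2 }],
];

// map: called with `this` bound to each document; emits (key, value) pairs.
function map() {
  emit(this.fid, 1);
}

// reduce: sums the values collected for one key.
function reduce(key, values) {
  let count = 0;
  for (const v of values) count += v;
  return count;
}

// Groups (key, value) pairs by key and reduces each group to a single value.
// Unlike MongoDB, this helper also calls reduce for single-value groups,
// which is harmless for a sum.
function reducePairs(pairs, reduceFn) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return [...groups].map(([key, values]) => [key, reduceFn(key, values)]);
}

// Runs map + reduce over the documents of one shard.
function runOnShard(docs, mapFn, reduceFn) {
  const pairs = [];
  globalThis.emit = (key, value) => pairs.push([key, value]);
  for (const doc of docs) mapFn.call(doc);
  return reducePairs(pairs, reduceFn);
}

// Each shard produces partial results; on a real cluster these run in parallel.
const partials = shards.map((docs) => runOnShard(docs, map, reduce));

// Final step: the partial results are reduced once more into the overall result.
const result = new Map(reducePairs(partials.flat(), reduce));
console.log(result);
```

Note that the reduce function receives its own earlier output as input in the final step, so its return value has to be usable as an input value.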
Let's construct a MapReduce query using the environment that was built in the previous article:
The first thing to note is that MapReduce on sharded data and on non-sharded data differ somewhat in the structure of the returned data. The main difference I have noticed is that a standard JSON-formatted return value is not supported, so a return statement like the following may cause problems:
return {count:total};
Note: this is what currently happens in my test environment, for example:
It needs to be changed to: return count;
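One possible explanation, which I have not verified in this environment: the MongoDB documentation requires the value returned by reduce to have the same type as the values emitted by map, because the output of reduce may be passed back into reduce. The sketch below imitates that second pass in plain JavaScript. The key `"k"` and the helper `reduceAcrossShards` are hypothetical and exist only for this illustration.

```javascript
// Two reduce variants for values emitted as plain numbers, e.g. emit(key, 1).
function reduceToNumber(key, values) {
  let count = 0;
  for (const v of values) count += v;
  return count; // same shape as the emitted value
}

function reduceToObject(key, values) {
  let total = 0;
  for (const v of values) total += v;
  return { count: total }; // different shape from the emitted value
}

// Hypothetical two-shard run: each shard reduces its own emitted 1s,
// then the partial results are reduced a second time.
function reduceAcrossShards(reduceFn) {
  const shard1 = reduceFn("k", [1, 1]);
  const shard2 = reduceFn("k", [1]);
  return reduceFn("k", [shard1, shard2]);
}

const good = reduceAcrossShards(reduceToNumber);
const bad = reduceAcrossShards(reduceToObject);

console.log(good);             // a number: the three emitted 1s summed
console.log(typeof bad.count); // no longer a number: objects were "added" to 0
```

With the number-returning variant, the second pass simply adds the partial counts. With the object-returning variant, the second pass tries to add objects to a number, so the total is no longer numeric.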
Here is the test code. First, query the number of records for a given post ID (in the style of a group query):
public partial class Getfile : System.Web.UI.Page
{
    public Mongo Mongo { get; set; }

    public IMongoDatabase DB
    {
        get
        {
            return this.Mongo["Dnt_mongodb"];
        }
    }

    /// <summary>
    /// Sets up the test environment. You can override this OnInit to add custom initialization.
    /// </summary>
    public virtual void Init()
    {
        string connectionString = "server=10.0.4.85:27017; connecttimeout=30000; connectionlifetime=300000; minimumpoolsize=512; maximumpoolsize=51200; Pooled=true ";
        if (String.IsNullOrEmpty(connectionString))
            throw new ArgumentNullException("Connection string not found.");
        this.Mongo = new Mongo(connectionString);
        this.Mongo.Connect();
    }

    // Map: emit a 1 for the post whose _id is 548111.
    string mapfunction = "function() {\n" +
        "    if (this._id == '548111') { emit(this._id, 1); }\n" +
        "};";

    // Reduce: sum the emitted values and return a plain number.
    string reducefunction = "function(key, current) {" +
        "    var count = 0;" +
        "    for (var i in current) {" +
        "        count += current[i];" +
        "    }" +
        "    return count;\n" +
        "};";

    protected void Page_Load(object sender, EventArgs e)
    {
        Init();
        var mrb = DB["Posts1"].MapReduce(); // attach_gfstream.files
        int groupCount = 0;
        using (var mr = mrb.Map(mapfunction).Reduce(reducefunction))
        {
            foreach (Document doc in mr.Documents)
            {
                groupCount = int.Parse(doc["value"].ToString());
            }
        }
        this.Mongo.Disconnect();
    }
}
The result of the query at runtime is as follows:
Next, I'll show how to return the query results and load them into a List collection. Here only the two posts with IDs 548110 and 548111 are queried:
// Map: emit the whole document as the key for the two selected posts.
string mapfunction = "function() {\n" +
    "    if (this._id == '548110' || this._id == '548111') { emit(this, 1); }\n" +
    "};";

// Reduce: return the key, which is the document itself.
string reducefunction = "function(doc, current) {" +
    "    return doc;\n" +
    "};";

protected void Page_Load(object sender, EventArgs e)
{
    Init();
    var mrb = DB["Posts1"].MapReduce(); // attach_gfstream.files
    List<Document> postDoc = new List<Document>();
    using (var mr = mrb.Map(mapfunction).Reduce(reducefunction))
    {
        foreach (Document doc in mr.Documents)
        {
            postDoc.Add((Document)doc["value"]);
        }
    }
    this.Mongo.Disconnect();
}
The result of the query at runtime is as follows:
The Map/Reduce functions above can be written in many ways. If you are interested, take a look at the following links:
http://cookbook.mongodb.org/patterns/unique_items_map_reduce/
http://www.mongodb.org/display/DOCS/MapReduce
And an article I wrote earlier: http://www.cnblogs.com/daizhj/archive/2010/06/10/1755761.html
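As one example of another way to write it, the sketch below emits the `_id` as the key and the whole document as the value, so the value of each result is the post itself. This is only an alternative I am suggesting, not the version used in the test code above. It runs against a small in-memory stand-in rather than MongoDB, and the sample documents, the `title` field, and the `mapReduce` helper are hypothetical.

```javascript
// Hypothetical sample documents standing in for the posts collection.
const posts = [
  { _id: "548110", title: "first" },
  { _id: "548111", title: "second" },
  { _id: "548112", title: "third" },
];

// map: emit the _id as key and the whole document as value.
function map() {
  if (this._id === "548110" || this._id === "548111") {
    emit(this._id, this);
  }
}

// reduce: _id is unique, so every value for a key is the same document.
function reduce(key, values) {
  return values[0];
}

// Minimal in-memory stand-in for a map/reduce run (not the MongoDB engine).
// Like MongoDB, it skips reduce for keys that have only one value.
function mapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) mapFn.call(doc);
  return [...groups].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduceFn(key, values),
  }));
}

// Load the "value" of each result into a list, as the C# code does.
const postList = mapReduce(posts, map, reduce).map((r) => r.value);
console.log(postList.map((p) => p._id));
```

Because the reduce function returns one of its input values unchanged, its output has the same shape as what map emits, so it can safely be reduced again.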
Of course, when mongos performs map/reduce operations, some temporary files are generated, for example:
I guess these temporary files may bring some performance gain when the system is queried again, although I have not observed this so far.
I also ran the same test against MongoDB's GridFS system (which can be used to build a distributed file storage system, as I described in an earlier article). Unfortunately it was not successful, and it often reported errors such as:
Thu SEP 12:09:29 Assertion failure _grab client\parallel.cpp 461
It seems that there are some problems when the MapReduce program connects to MongoDB, but I do not know whether the cause is MongoDB's stability or my machine environment (memory, or a 32-bit client connecting to a mongos configured on a 64-bit system).
Well, that's all for today's article.
MapReduce parallel query based on MongoDB distributed storage