Writing group-style aggregations with mapreduce is slow!
1) Source: MongoDB: The Definitive Guide
The price of using MapReduce is speed: group is not particularly speedy, but
MapReduce is slower and is not supposed to be used in "real time." You run
MapReduce as a background job, it creates a collection of results, and then
you can query that collection in real time.
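The background-job pattern in this quote can be sketched in plain Python, with no MongoDB server involved. The names `map_fn`, `reduce_fn`, `run_map_reduce`, and the sample `orders` documents are hypothetical and only illustrate the idea: the slow pass runs once and builds a results table, and later reads touch only that table.

```python
from collections import defaultdict

# Hypothetical sample documents standing in for a MongoDB collection.
orders = [
    {"customer": "alice", "amount": 30},
    {"customer": "bob", "amount": 15},
    {"customer": "alice", "amount": 20},
]


def map_fn(doc):
    """Emit one (key, value) pair per document."""
    yield doc["customer"], doc["amount"]


def reduce_fn(key, values):
    """Fold all values emitted for one key into a single value."""
    return sum(values)


def run_map_reduce(docs, map_fn, reduce_fn):
    """Slow batch pass: scan every document and build the results table."""
    emitted = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            emitted[key].append(value)
    return {key: reduce_fn(key, values) for key, values in emitted.items()}


# Run once in the background (for example from a scheduled job).
totals_by_customer = run_map_reduce(orders, map_fn, reduce_fn)

# "Real-time" reads use only the precomputed results, never the raw data.
print(totals_by_customer["alice"])  # 50
```

The trade-off is staleness: results are only as fresh as the last batch run.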
2) http://stackoverflow.com/questions/2599943/mongodbs-performance-on-aggregation-queries
The idea is that you improve the performance of aggregation queries by using MapReduce on a sharded database that is distributed over multiple machines.
I did some comparisons of the performance of Mongo's MapReduce with a group-by-select statement in Oracle on the same machine. I did find that Mongo was approximately 25 times slower. This means that I have to shard the data over at least 25 machines to get the same performance with Mongo as Oracle delivers on a single machine. I used a collection/table with approximately 14 million documents/rows.
Exporting the data from Mongo via mongoexport.exe, using the exported data as an external table in Oracle, and doing a group-by in Oracle was much faster than using Mongo's own MapReduce.
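The "at least 25 machines" figure in this answer assumes that MapReduce throughput scales perfectly linearly with the number of shards. A small sketch of that arithmetic follows; the helper `shards_needed` and its `scaling_efficiency` parameter are hypothetical and not taken from the answer.

```python
import math


def shards_needed(slowdown_factor, scaling_efficiency=1.0):
    """Estimate the shards required to offset a given slowdown.

    scaling_efficiency is an assumed fraction (0 < e <= 1) of ideal
    linear speed-up actually obtained per added shard.
    """
    if not 0 < scaling_efficiency <= 1:
        raise ValueError("scaling_efficiency must be in (0, 1]")
    return math.ceil(slowdown_factor / scaling_efficiency)


print(shards_needed(25))       # 25, the ideal linear case from the answer
print(shards_needed(25, 0.8))  # 32, if each shard delivers only 80% of ideal
```

With any coordination overhead between shards, 25 is a lower bound rather than an estimate.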
3) http://blog.evilmonkeylabs.com/2011/01/27/MongoDB-1_8-MapReduce/
(Comments below)
No clever tricks, unfortunately. As long as mapreduce is single-threaded, we're not able to use it. We need to be able to run a few dozen or a few hundred at once, but since you only get one mapreduce running per shard, we had to go in another direction. It would be nice to take advantage of the new changes, but until it moves beyond being one giant blocking operation...
Any ideas as to when it'll be multithreaded?
----------------
I don't believe anything is scheduled currently to increase concurrency within mapreduce; much of the limitation exists in the JavaScript engine itself.
There are plans for the next major release series of MongoDB to include new aggregation features which will cover the common tasks people currently use mapreduce for in a much simpler interface.
-----------------
Unfortunately, MR in 1.8 will still only be single-threaded, meaning you can essentially only run one job per shard. These new features will be really useful once you can run MRs in parallel and distributed.
4) Indexes for map/reduce (http://groups.google.com/group/mongodb-user/browse_thread/thread/3327e58e92140407/a16a9a2fa4b143cf?show_docid=a16a9a2fa4b143cf)
The test in that thread shows that creating an index does not help map/reduce!
5) A related MongoDB issue ticket
http://jira.mongodb.org/browse/SERVER-1197
6) Why not use the group command directly?
Connecting directly to a shard server's port:
-------------------------
Thu Mar 17 17:14:07 uncaught exception: group command failed: {
"errmsg" : "exception: group() can't handle more than 10000 unique keys",
"code" : 10043,
"ok" : 0
}
-------------------
Connecting directly to the routing server (mongos) port:
Thu Mar 17 17:05:59 uncaught exception: group command failed: {
"ok" : 0,
"errmsg" : "can't do command: group on sharded collection"
}
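The unique-key limit in the first error message can be illustrated with an in-memory emulation. This is only a sketch in plain Python, not MongoDB's implementation: `group_count`, `TooManyKeysError`, and the sample documents are hypothetical, and only the 10000 figure comes from the error text.

```python
MAX_UNIQUE_KEYS = 10000  # limit quoted in the group() error message


class TooManyKeysError(Exception):
    """Raised when the emulated group exceeds the unique-key limit."""


def group_count(docs, key_field, max_keys=MAX_UNIQUE_KEYS):
    """Count documents per key, failing once the key limit is exceeded."""
    counts = {}
    for doc in docs:
        key = doc[key_field]
        # A new key beyond the limit aborts the whole operation.
        if key not in counts and len(counts) >= max_keys:
            raise TooManyKeysError(
                "can't handle more than %d unique keys" % max_keys
            )
        counts[key] = counts.get(key, 0) + 1
    return counts


# Low-cardinality key: works.
small = [{"user": i % 3} for i in range(9)]
print(group_count(small, "user"))  # {0: 3, 1: 3, 2: 3}

# High-cardinality key: fails, as in the shard-server log.
large = [{"user": i} for i in range(10001)]
try:
    group_count(large, "user")
except TooManyKeysError as exc:
    print("group failed:", exc)
```

Grouping on a high-cardinality field, such as a user id across millions of documents, hits this limit quickly, which is why the choice falls back to mapreduce despite its speed.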