Problems with implementing group using MongoDB MapReduce


A group written with MapReduce is just too slow!

1) Source: MongoDB: The Definitive Guide

The price of using MapReduce is speed: group is not particularly speedy, but
MapReduce is slower and is not supposed to be used in “real time.” You run
MapReduce as a background job, it creates a collection of results, and then
you can query that collection in real time.
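The background-job pattern described in that passage can be sketched as follows. This is a minimal illustration, not code from the book: the `category` field and the collection names `events` and `category_counts` are assumptions, and `runMapReduce` is a hypothetical in-memory stand-in for the server so that the map and reduce functions can be tried without a database.

```javascript
// Map and reduce functions written the way the mongo shell expects them.
// Inside map, `this` is the current document and emit() is supplied by the server.
function map() {
  emit(this.category, 1); // `category` is an assumed field name
}

// reduce must return a value of the same shape as the emitted values,
// because the server may feed partial results back into it.
function reduce(key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// Hypothetical in-memory stand-in for the server, for illustration only.
// (The real server skips reduce for keys with a single value; for a count
// the outcome is the same.)
function runMapReduce(docs, mapFn, reduceFn) {
  var emitted = {};
  // Expose emit() globally while mapping, as the server does for map().
  globalThis.emit = function (key, value) {
    (emitted[key] = emitted[key] || []).push(value);
  };
  docs.forEach(function (doc) { mapFn.call(doc); });
  delete globalThis.emit;
  // Output documents use the { _id, value } shape of a MapReduce result collection.
  return Object.keys(emitted).map(function (key) {
    return { _id: key, value: reduceFn(key, emitted[key]) };
  });
}

var sampleDocs = [{ category: "a" }, { category: "b" }, { category: "a" }];
var results = runMapReduce(sampleDocs, map, reduce);
console.log(results);

// Against a real server, the equivalent shell calls would be (names assumed):
//   db.events.mapReduce(map, reduce, { out: "category_counts" })  // slow, run in the background
//   db.category_counts.find()                                     // fast, query in real time
```

The slow step runs once and writes its output to a collection; real-time reads then go against that collection rather than re-running the job.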

2)http://stackoverflow.com/questions/2599943/mongodbs-performance-on-aggregation-queries

The idea is that you improve the performance of aggregation queries by using MapReduce on a sharded database that is distributed over multiple machines.

I did some comparisons of the performance of Mongo's Mapreduce with a group-by-select statement in Oracle on the same machine. I did find that Mongo was approximately 25 times slower. This means that I have to shard the data over at least 25 machines to get the same performance with Mongo as Oracle delivers on a single machine. I used a collection/table with approximately 14 million documents/rows.

Exporting the data from mongo via mongoexport.exe and using the exported data as an external table in Oracle and doing a group-by in Oracle was much faster than using Mongo's own MapReduce.

3)http://blog.evilmonkeylabs.com/2011/01/27/MongoDB-1_8-MapReduce/
(from the comments below the post)

No clever tricks, unfortunately. As long as MapReduce is single threaded, we're not able to use it. We need to be able to run a few dozen or a few hundred at once but since you only get one MapReduce running per shard, we had to go in another direction. It would be nice to take advantage of new changes, but until it moves beyond being one giant blocking operation...

Any ideas as to when it'll be multithreaded?

----------------

I don't believe anything is scheduled currently to increase concurrency within MapReduce; much of the limitation exists in the JavaScript engine itself.

There are plans for the next major release series of MongoDB to include new aggregation features which will cover many of the common tasks people currently use MapReduce for in a much simpler interface.

-----------------
Unfortunately, MR in 1.8 will still only be single-threaded, meaning you can essentially only run one job per shard. These new features will be really useful once you can run MRs in parallel and distributed.

4)indexes for map/reduce (http://groups.google.com/group/mongodb-user/browse_thread/thread/3327e58e92140407/a16a9a2fa4b143cf?show_docid=a16a9a2fa4b143cf)

Tests show that building an index does not help Map/Reduce!

5) A MongoDB issue ticket

http://jira.mongodb.org/browse/SERVER-1197

6) Why not just use the group command directly?

Connecting directly to the shard server port:

-------------------------

Thu Mar 17 17:14:07 uncaught exception: group command failed: {
        "errmsg" : "exception: group() can't handle more than 10000 unique keys",
        "code" : 10043,
        "ok" : 0
}

-------------------
Connecting directly to the route server (mongos) port:
Thu Mar 17 17:05:59 uncaught exception: group command failed: { "ok" : 0, "errmsg" : "can't do command: group on sharded collection"

}
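To make the first error above concrete, here is a rough sketch of why a group over a high-cardinality key fails. It is an in-memory stand-in, not the server's implementation: the `group` helper, its parameters, and the `userId` field are hypothetical, and only the 10000-key limit is taken from the error message. The mongos error is a separate restriction (group is not available on sharded collections) and is not modelled here.

```javascript
// Limit quoted in the error message from the shard server.
var MAX_GROUP_KEYS = 10000;

// Hypothetical in-memory stand-in for the group command: one accumulator
// per distinct key, all held in memory at once.
function group(docs, keyField, initial, reduceFn) {
  var groups = {};
  var count = 0;
  docs.forEach(function (doc) {
    var key = String(doc[keyField]);
    if (!Object.prototype.hasOwnProperty.call(groups, key)) {
      if (count >= MAX_GROUP_KEYS) {
        throw new Error("group() can't handle more than " +
                        MAX_GROUP_KEYS + " unique keys");
      }
      // A new accumulator starts as the key field plus a copy of `initial`.
      var acc = {};
      acc[keyField] = doc[keyField];
      groups[key] = Object.assign(acc, initial);
      count++;
    }
    // As with group's reduce, the function mutates the accumulator in place.
    reduceFn(doc, groups[key]);
  });
  return Object.keys(groups).map(function (key) { return groups[key]; });
}

function countReduce(doc, acc) {
  acc.count += 1;
}

var smallResult = group(
  [{ userId: "u1" }, { userId: "u2" }, { userId: "u1" }],
  "userId",
  { count: 0 },
  countReduce
);
console.log(smallResult);
```

With a few distinct keys this works; once the number of distinct keys passes the limit, the call fails in the same way as the shell session above, which is why group is not a drop-in replacement for MapReduce on large result sets.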
