MongoDB has two main storage engines, MMAPv1 and WiredTiger. Beyond these two, other storage engines exist, such as:
- In-memory engine: already included in current MongoDB versions. It serves mainly as a cache and for unit testing.
- Mongo-rocks: a key-value engine that acts as an adapter layer for Facebook's RocksDB.
- Fusion-io: created by SanDisk. It bypasses the operating system's file system layer as far as possible and writes directly to the storage device.
- TokuMX: maintained by Percona (originally developed by Tokutek). It uses fractal tree indexes instead of B-tree indexes.
- /dev/null: this storage engine discards everything you write and returns empty results for everything you read. That sounds foolish, but it is useful in some cases, for example to find performance bottlenecks in your application that are not related to the database.
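The /dev/null idea can be illustrated outside MongoDB with a small sketch. The class names, the `run_workload` function and the workload itself are hypothetical and are not MongoDB APIs. The point is only that running the same application code against an engine that does nothing shows how much time the application spends on its own.

```python
import time


class DictEngine:
    """Hypothetical in-process engine that really stores data."""

    def __init__(self):
        self._data = {}

    def write(self, key, value):
        self._data[key] = value

    def read(self, key):
        return self._data.get(key)


class NullEngine:
    """Discards every write and returns nothing for every read."""

    def write(self, key, value):
        pass

    def read(self, key):
        return None


def run_workload(engine, n=10000):
    """Hypothetical application loop; returns elapsed seconds."""
    start = time.perf_counter()
    for i in range(n):
        payload = str(i) * 10  # application-side work, independent of storage
        engine.write(i, payload)
        engine.read(i)
    return time.perf_counter() - start


if __name__ == "__main__":
    print("real engine:", run_workload(DictEngine()))
    print("null engine:", run_workload(NullEngine()))
```

If the two timings are close, the bottleneck is probably in the application code rather than in the storage layer.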
MMAPv1
MMAPv1 is named after the Linux system call mmap(), which maps files into virtual memory. This allows some use cases to be optimized. For example, when you have a large file but only need to read part of it, mmap() can be much faster than loading the whole file with read(), because only the pages that are actually accessed are brought into memory.
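The difference can be sketched with Python's standard `mmap` module. This is only an illustration of memory-mapped access in general, not of MongoDB's internals. The helper names and the demo file are made up for the example.

```python
import mmap
import os
import tempfile


def read_slice_with_mmap(path, offset, length):
    """Return `length` bytes at `offset` by mapping the file into memory."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are loaded only when touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]


def read_slice_with_full_read(path, offset, length):
    """Return the same bytes after copying the whole file into memory."""
    with open(path, "rb") as f:
        data = f.read()  # reads every byte of the file
    return data[offset:offset + length]


def make_demo_file(size=1 << 20):
    """Create a temporary file of `size` bytes with a marker in the middle."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * (size // 2))
        f.write(b"MARKER")
        f.write(b"\0" * (size - size // 2 - 6))
    return path


if __name__ == "__main__":
    path = make_demo_file()
    try:
        print(read_slice_with_mmap(path, (1 << 20) // 2, 6))
    finally:
        os.remove(path)
```

Both helpers return the same bytes. The mapped version avoids copying the parts of the file that are never accessed.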
MMAPv1 has collection-level locking but no document-level locking, so two processes cannot write to the same collection at the same time. Each write to a collection must wait for the previous write to finish. This collection-level locking is necessary because an MMAPv1 index spans multiple documents, and if its entries could not be updated together, the index would become inconsistent.
WiredTiger
MMAPv1 uses B-trees to store indexes. WiredTiger also uses B-trees, but it additionally supports LSM trees.
LSM trees are advantageous for workloads with a large number of random insertions, when your data is larger than the cache and the cost of background maintenance is acceptable.
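To make the LSM idea concrete, here is a toy sketch. It is not WiredTiger's implementation. `TinyLSM`, its methods and the `memtable_limit` parameter are invented for illustration. Random inserts go to an in-memory table, which is flushed as a sorted run in one sequential write, and a background-style `compact` step merges the runs.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: an in-memory memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit
        self.memtable = {}  # recent writes, unsorted
        self.runs = []      # sorted lists of (key, value), newest last

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # One sequential write of a sorted run instead of many random updates.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):  # the newest run wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        """Maintenance step: merge all runs into one, keeping newest values."""
        merged = {}
        for run in self.runs:  # oldest first, so newer values overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


if __name__ == "__main__":
    db = TinyLSM(memtable_limit=2)
    for k, v in [(5, "a"), (1, "b"), (5, "c"), (3, "d"), (9, "e")]:
        db.put(k, v)
    db.compact()
    print(db.get(5), db.runs)
```

Reads may have to check several runs until compaction merges them, which is the maintenance cost mentioned above.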
In the WiredTiger engine, if an element of a document needs to be updated, a whole new version of the document is written to disk and the old version is removed.
WiredTiger provides document-level concurrency, which means that two write operations can run at the same time as long as they do not affect the same document. If they do, one of the operations is returned to be retried. If such retries are rare, this is a very good performance optimization.
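The retry behaviour can be sketched as optimistic concurrency control with per-document version numbers. This is a simplified illustration under that assumption, not WiredTiger's actual mechanism. `VersionedStore` and `update_with_retry` are hypothetical names.

```python
import threading


class VersionedStore:
    """Toy document store with per-document versions (optimistic concurrency)."""

    def __init__(self):
        self._docs = {}                 # doc_id -> (version, document)
        self._guard = threading.Lock()  # protects only the short commit step

    def read(self, doc_id):
        with self._guard:
            return self._docs.get(doc_id, (0, {}))

    def try_commit(self, doc_id, expected_version, new_doc):
        """Commit only if nobody changed the document since it was read."""
        with self._guard:
            current_version, _ = self._docs.get(doc_id, (0, {}))
            if current_version != expected_version:
                return False  # write conflict: the caller must retry
            self._docs[doc_id] = (current_version + 1, new_doc)
            return True


def update_with_retry(store, doc_id, change):
    """Apply `change` to a document, retrying on conflict. Returns retry count."""
    retries = 0
    while True:
        version, doc = store.read(doc_id)
        new_doc = change(dict(doc))  # work on a copy: a whole new document
        if store.try_commit(doc_id, version, new_doc):
            return retries
        retries += 1


if __name__ == "__main__":
    store = VersionedStore()

    def bump(doc):
        doc["n"] = doc.get("n", 0) + 1
        return doc

    print(update_with_retry(store, "a", bump), store.read("a"))
```

Writers touching different documents never conflict here. Only writers racing on the same document pay the cost of a retry.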
Another feature unique to WiredTiger is compression of data and indexes on disk. It supports two compressors, snappy and zlib. Snappy is the default: compared with zlib it uses less CPU, but its compression ratio is lower.
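The CPU versus size trade-off can be explored with a rough sketch. Snappy is not part of Python's standard library, so this example assumes zlib level 1 as a stand-in for a fast compressor and zlib level 9 for a thorough one. It does not measure snappy itself, and the sample documents are invented.

```python
import json
import time
import zlib


def compress_report(payload, level):
    """Compress `payload` at the given zlib level; return (size, seconds)."""
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    assert zlib.decompress(compressed) == payload  # round-trip check
    return len(compressed), elapsed


def sample_documents(count=5000):
    """Hypothetical JSON-like documents with repetitive field names."""
    docs = [
        {"user_id": i, "status": "active", "tags": ["a", "b"]}
        for i in range(count)
    ]
    return json.dumps(docs).encode("utf-8")


if __name__ == "__main__":
    payload = sample_documents()
    for label, level in [("fast (level 1)", 1), ("thorough (level 9)", 9)]:
        size, seconds = compress_report(payload, level)
        print(f"{label}: {len(payload)} -> {size} bytes in {seconds:.4f}s")
```

The exact numbers depend on the data and the machine, so they are best measured on a sample of your own documents.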
Benchmarks
When WiredTiger was introduced into MongoDB, it was announced that write performance would be 7 to 10 times faster than before and that compression would reduce storage use by 80%, which is a big improvement.
[Figure: multi-threaded throughput]
Conclusion
If your application is read-heavy, use MMAPv1; if it is write-heavy, use WiredTiger.
An interesting possibility is that you can create a replica set with mixed storage engines. In the replica set, you can configure a WiredTiger node to absorb a heavy write load and configure another node with the MMAPv1 engine to serve reads. Data is replicated automatically between the primary and the other members, and their underlying storage engines are independent of one another.
Data files created by the MMAPv1 engine cannot be opened by a WiredTiger node, and the reverse is also true. The two engines store data in different ways, so you cannot reuse the same files and will need to create a new database. Within a replica set, however, exchanging data with the primary is no problem.
[MongoDB] Comparison of MMAPv1 and WiredTiger