How to use MongoDB as a pure in-memory database


This setup is very useful for the following kinds of applications:

A write-intensive cache in front of a slower RDBMS

Embedded systems

PCI-compliant systems where no data should be persisted

Unit testing, which needs a lightweight database whose data can be cleared away easily

If all this could be done, it would be elegant: we would be able to leverage MongoDB's query/retrieval features without involving any disk operations. As you may know, in 99% of cases disk I/O (especially random I/O) is the bottleneck of a system, and disk operations are unavoidable if you have to write data.

MongoDB rests on a very cool design decision: it uses memory-mapped files to handle read and write requests against data in disk files. This means that MongoDB does not differentiate between RAM and disk; it simply treats each file as a huge array, accesses the data byte by byte, and lets the operating system (OS) take care of the rest! It is this design decision that allows MongoDB to run in RAM without any modification.

Implementation

All this is done by using a special type of file system called tmpfs. On Linux it looks like a regular file system (FS), but it lives entirely in RAM (unless it grows larger than RAM, in which case it can swap to disk, which can be useful!). My server has 32GB of RAM; let's create a 16GB tmpfs:
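Something like the following should do it (the mount point /ramdata is just an illustrative choice, not from the original):

    # mkdir /ramdata
    # mount -t tmpfs -o size=16384m tmpfs /ramdata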

Next, start MongoDB with the appropriate settings. To reduce the amount of wasted RAM, you should set smallfiles and noprealloc to true. Since everything is RAM-based, this does not degrade performance at all. Using a journal makes no sense here, so you should set nojournal to true.
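A sketch of what the configuration file could look like (old ini-style options from the MMAPv1 era of mongod, assuming the tmpfs from above is mounted at /ramdata):

    dbpath = /ramdata
    nojournal = true
    smallfiles = true
    noprealloc = true

Then start the server with mongod --config pointing at that file.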

After MongoDB starts, you will find that it runs perfectly well, and the files show up in the file system as expected:
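For example, listing the data directory (using the illustrative /ramdata path from above; the exact file names depend on your databases):

    # ls -l /ramdata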

Now let's add some data and verify that everything works properly. We first create a 1KB document, then add it to MongoDB 4 million times:
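In the mongo shell, that could look like the following sketch (the collection name foo and the document layout are assumptions for illustration):

    > str = ""
    > for (var i = 0; i < 1024; i++) str += "a"    // build a ~1KB string
    > for (var i = 0; i < 4000000; i++) db.foo.insert({a: i, s: str})
    > db.foo.stats()    // reports average document size and total data/index sizes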

As you can see, the average document size is 1136 bytes, and the data takes up 5GB of space in total. The index on _id weighs in at about 130MB. Now we need to verify something very important: is the data duplicated in RAM, stored once inside MongoDB and once more in the file system? Remember that MongoDB does not cache any data inside its own process; its data is only ever cached in the file system cache. Let's drop the file system cache and see what is still in RAM:
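On Linux, dropping the page cache and then checking memory usage can be done like this:

    # echo 3 > /proc/sys/vm/drop_caches
    # free -g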

As you can see, of the 6.3GB of RAM in use, 5.8GB is attributed to the file system cache (buffers/cache). Why is there still 5.8GB of file system cache even after all caches were just dropped? The reason is that Linux is smart: it does not keep duplicate copies of the data in tmpfs and in the page cache. That's great! It means you have only one copy of the data in RAM. Let's read through all the documents and verify that RAM usage does not change:
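One way to touch every document from the mongo shell (again using the hypothetical foo collection), followed by another memory check:

    > db.foo.find().itcount()
    # free -g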

Sure enough! :)

What about replication?

Since the data in RAM is lost when the server restarts, you will probably want to use replication. A standard replica set gives you automatic failover as well as increased read capacity. If a server is rebooted, it can rebuild its data by resyncing from another server in the same replica set. Even with large amounts of data and indexes, this process is fast enough, since all the indexing is done in RAM :)
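A minimal sketch, assuming the same old-style config files and a hypothetical replica set name memdb: add to each member's configuration

    replSet = memdb

then initiate the set from the mongo shell on one member and add the others (host names are placeholders):

    > rs.initiate()
    > rs.add("host2:27017")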

One important thing to keep in mind is that writes go through a special collection called the oplog, which resides in the local database. By default, it is sized at 5% of the total data size. In my case, the oplog would occupy 5% of 16GB, i.e. 800MB. When in doubt, it is safer to choose a fixed oplog size using the oplogSize option. If a secondary server is down for longer than the oplog covers, it must be resynced from scratch. To set the oplog size to 1GB, you can do this:

    oplogSize = 1000

What about sharding?

Now that you have all the query capabilities of MongoDB, how do you use it to build a large service? You can use sharding freely to implement a large, scalable in-memory database. The config servers (which hold the chunk distribution) should still be disk-based, though: their activity is small, and having to rebuild the cluster from scratch because of them would be no fun.
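A sketch of the usual sharding commands in the mongo shell (the database name test, the foo collection, and the hashed shard key are illustrative choices; hashed shard keys assume MongoDB 2.4+):

    > sh.enableSharding("test")
    > sh.shardCollection("test.foo", {_id: "hashed"})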

Precautions

RAM is a scarce resource, and in this case you really want the entire dataset to fit in RAM. Although tmpfs can resort to disk swapping, the performance drop would be significant. To make the most of your RAM, you should consider:

Normalizing the storage buckets with the usePowerOf2Sizes collection option (see the shell sketch after this list)

Running the compact command periodically, or resyncing nodes

Keeping the schema design fairly normalized (to avoid very large documents)
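For instance, the first two points could look like this in the mongo shell (the collection name foo is again hypothetical):

    > db.runCommand({collMod: "foo", usePowerOf2Sizes: true})
    > db.runCommand({compact: "foo"})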

Conclusion

There you have it: you can now use MongoDB as an in-memory database, with all of its features! As for performance, it should be quite amazing: testing with a single thread/core, I could reach 20K writes per second, and adding more cores should multiply the write throughput accordingly.
