This approach is extremely useful for several kinds of applications:
- A write-heavy, high-speed cache in front of a slower RDBMS
- Embedded systems
- PCI-compliant systems where no data should be persisted
- Unit testing, which needs a lightweight database whose data can be cleared easily
If all this can be achieved, it is truly elegant: we get MongoDB's full query/retrieval capabilities without any disk operations at all. As you probably know, in 99% of cases disk I/O (especially random I/O) is the system bottleneck, and disk operations cannot be avoided if you want to write data.
MongoDB has a very cool design decision: it uses memory-mapped files to handle read and write requests against data in disk files. In other words, MongoDB does not treat RAM and disk differently; it simply views a file as a huge array and accesses the data byte by byte, leaving everything else to the operating system (OS)! This design decision is what allows MongoDB to run entirely in RAM without any modification.
Implementation Method
All of this is achieved with a special type of file system called tmpfs. On Linux it looks like a conventional file system (FS), but it lives entirely in RAM (unless its size exceeds the available RAM, in which case it can also swap to disk, which is very useful!). My server has 32 GB of RAM; let's create a 16 GB tmpfs:
- # mkdir /ramdata
- # mount -t tmpfs -o size=16000M tmpfs /ramdata/
- # df
- Filesystem 1K-blocks Used Available Use% Mounted on
- /dev/xvde1 5905712 4973924 871792 86% /
- none 15344936 0 15344936 0% /dev/shm
- tmpfs 16384000 0 16384000 0% /ramdata
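Note that a tmpfs mount does not survive a reboot. If you want the mount recreated automatically at boot, an /etc/fstab entry along these lines would do it (a sketch; adjust the size to your machine):

```shell
# /etc/fstab entry (one line) -- recreates the 16 GB tmpfs at boot
tmpfs  /ramdata  tmpfs  size=16000M  0  0
```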
Next, start MongoDB with appropriate settings. To reduce the amount of wasted RAM, set smallfiles and noprealloc to true. Since the storage is now RAM-based, this does not hurt performance at all. Journaling is pointless here, so set nojournal to true.
- dbpath=/ramdata
- nojournal = true
- smallFiles = true
- noprealloc = true
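Equivalently (assuming you prefer command-line flags to a config file), mongod can be started directly; the flags below mirror the settings above:

```shell
# Start mongod against the tmpfs mount (flags mirror the config above)
mongod --dbpath /ramdata --nojournal --smallfiles --noprealloc
```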
After MongoDB starts, you will find that it runs perfectly well, and the files show up in the file system as expected:
- # mongo
- MongoDB shell version: 2.3.2
- connecting to: test
- > db.test.insert({a:1})
- > db.test.find()
- { "_id" : ObjectId("51802115eafa5d80b5d2c145"), "a" : 1 }
- # ls -l /ramdata/
- total 65684
- -rw-------. 1 root root 16777216 Apr 30 15:52 local.0
- -rw-------. 1 root root 16777216 Apr 30 15:52 local.ns
- -rwxr-xr-x. 1 root root 5 Apr 30 15:52 mongod.lock
- -rw-------. 1 root root 16777216 Apr 30 15:52 test.0
- -rw-------. 1 root root 16777216 Apr 30 15:52 test.ns
- drwxr-xr-x. 2 root root 40 Apr 30 15:52 _tmp
Now let's add some data to verify that everything works normally. Create a 1 KB document and insert it into MongoDB 4 million times:
- > str = ""
- > aaa = "aaaaaaaaaa"
- aaaaaaaaaa
- > for (var i = 0; i < 100; ++i) { str += aaa; }
- > for (var i = 0; i < 4000000; ++i) { db.foo.insert({a: Math.random(), s: str});}
- > db.foo.stats()
- {
- "ns" : "test.foo",
- "count" : 4000000,
- "size" : 4544000160,
- "avgObjSize" : 1136.00004,
- "storageSize" : 5030768544,
- "numExtents" : 26,
- "nindexes" : 1,
- "lastExtentSize" : 536600560,
- "paddingFactor" : 1,
- "systemFlags" : 1,
- "userFlags" : 0,
- "totalIndexSize" : 129794000,
- "indexSizes" : {
- "_id_" : 129794000
- },
- "ok" : 1
- }
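As a quick sanity check on the numbers above (shell arithmetic, with the values copied from the stats output):

```shell
# 4M documents at an average of 1,136 bytes each should give ~4.5 GB of data
docs=4000000
avg_bytes=1136
echo $((docs * avg_bytes))   # 4544000000 bytes, matching "size" above up to rounding
```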
As you can see, the average document size is 1,136 bytes and the data occupies about 5 GB of storage in total; the index on _id is about 130 MB. Now we need to verify that the data is not duplicated in RAM: is one copy kept by MongoDB and another by the file system? Recall that MongoDB does not cache any data within its own process; its data is cached only in the file system cache. So let's drop the file system cache and see what is left in RAM:
- # echo 3 > /proc/sys/vm/drop_caches
- # free
- total used free shared buffers cached
- Mem: 30689876 6292780 24397096 0 1044 5817368
- -/+ buffers/cache: 474368 30215508
- Swap: 0 0 0
As you can see, of the roughly 6 GB of RAM in use, about 5.8 GB sits in the file system cache (the "cached" column). Why is there still almost 6 GB of file system cache even though all caches were just dropped? The reason is that Linux is smart: it does not keep duplicate copies of the data in tmpfs and the page cache. Great! This means you have exactly one copy of the data in RAM. Next, let's touch all the documents and verify that RAM usage does not change:
- > db.foo.find().itcount()
- 4000000
- # free
- total used free shared buffers cached
- Mem: 30689876 6327988 24361888 0 1324 5818012
- -/+ buffers/cache: 508652 30181224
- Swap: 0 0 0
- # ls -l /ramdata/
- total 5808780
- -rw-------. 1 root root 16777216 Apr 30 15:52 local.0
- -rw-------. 1 root root 16777216 Apr 30 15:52 local.ns
- -rwxr-xr-x. 1 root root 5 Apr 30 15:52 mongod.lock
- -rw-------. 1 root root 16777216 Apr 30 16:00 test.0
- -rw-------. 1 root root 33554432 Apr 30 16:00 test.1
- -rw-------. 1 root root 536608768 Apr 30 16:02 test.10
- -rw-------. 1 root root 536608768 Apr 30 16:03 test.11
- -rw-------. 1 root root 536608768 Apr 30 16:03 test.12
- -rw-------. 1 root root 536608768 Apr 30 16:04 test.13
- -rw-------. 1 root root 536608768 Apr 30 16:04 test.14
- -rw-------. 1 root root 67108864 Apr 30 16:00 test.2
- -rw-------. 1 root root 134217728 Apr 30 16:00 test.3
- -rw-------. 1 root root 268435456 Apr 30 16:00 test.4
- -rw-------. 1 root root 536608768 Apr 30 16:01 test.5
- -rw-------. 1 root root 536608768 Apr 30 16:01 test.6
- -rw-------. 1 root root 536608768 Apr 30 16:04 test.7
- -rw-------. 1 root root 536608768 Apr 30 16:03 test.8
- -rw-------. 1 root root 536608768 Apr 30 16:02 test.9
- -rw-------. 1 root root 16777216 Apr 30 15:52 test.ns
- drwxr-xr-x. 2 root root 40 Apr 30 16:04 _tmp
- # df
- Filesystem 1K-blocks Used Available Use% Mounted on
- /dev/xvde1 5905712 4973960 871756 86% /
- none 15344936 0 15344936 0% /dev/shm
- tmpfs 16384000 5808780 10575220 36% /ramdata
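As a further sanity check (shell arithmetic, using the figures from free and df above): the page-cache figure and the tmpfs usage should be nearly identical, since there is only one copy of the data in RAM:

```shell
cached_kb=5818012   # "cached" column from free above
tmpfs_kb=5808780    # tmpfs "Used" column from df above
echo $(( (cached_kb - tmpfs_kb) / 1024 ))   # difference in MB; prints 9
```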
Sure enough! :)
Replication?
Since the data in RAM is lost when the server restarts, you will probably want to use replication. A standard replica set gives you automatic failover and increased read capacity. If a server restarts, it can resync its data by reading from another member of the same replica set. Even with large amounts of data and indexes this process is fast enough, because the index operations all happen in RAM :)
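As a sketch of what that could look like (the host names node1/node2 and the set name rs0 are placeholders, not from the original setup): start each mongod with a replSet name, then initiate the set from one node:

```shell
# On each node (each with its dbpath on its own tmpfs mount):
mongod --dbpath /ramdata --nojournal --smallfiles --noprealloc --replSet rs0
# Then, from one node, initiate the replica set:
mongo --eval 'rs.initiate({_id: "rs0", members: [
    {_id: 0, host: "node1:27017"},
    {_id: 1, host: "node2:27017"}]})'
```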
It is important to know that write operations are also recorded in a special capped collection called the oplog, which lives in the local database and by default takes 5% of the total partition size. Here that is 5% of 16 GB, i.e. about 800 MB. If in doubt, you can pick a fixed size for the oplog with the oplogSize option. If a secondary is down for longer than the oplog covers, it must be resynced from scratch. To set the size to 1 GB:
- oplogSize = 1000
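The default size is easy to check with shell arithmetic (5% of the 16,000 MB tmpfs):

```shell
echo $((16000 * 5 / 100))   # prints 800 (MB) -- the default oplog size here
```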
Sharding?
Since all of MongoDB's query features are available, how do you use it to implement a large service? You can use sharding freely to build a large, scalable in-memory database. Keep the config servers disk-based, though: their activity is small, and it is no fun to rebuild a cluster from scratch.
Notes
RAM is a scarce resource, and in this case you must make sure the entire data set fits in RAM. Although tmpfs can swap to disk, the performance drop would be significant. To make the most of the RAM, consider the following:
- Use usePowerOf2Sizes to normalize storage allocation
- Run the compact command periodically (or resync the node)
- Keep the schema design fairly normalized (to avoid significant document growth)
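For the first two points, the corresponding commands look like this (a sketch, assuming a collection named foo in the test database; run from the system shell):

```shell
# Allocate records in power-of-2 sizes, so freed space is easily reusable
mongo test --eval 'printjson(db.runCommand({collMod: "foo", usePowerOf2Sizes: true}))'
# Defragment the collection (note: compact blocks the database while it runs)
mongo test --eval 'printjson(db.runCommand({compact: "foo"}))'
```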
Conclusion
There you have it: you can now use MongoDB as an in-memory database, with all of its features! Performance should be quite impressive: testing with a single thread/core, I was able to reach about 20k writes per second, and the write rate should scale with the number of cores.