Use MongoDB as a pure memory database

Source: Internet
Author: User
Tags: mongodb, query

This method is extremely useful for the following applications:

Write-intensive high-speed cache placed before a slow RDBMS System

Embedded System

PCI-compliant systems where no data may be persisted

Unit testing, which needs a lightweight database whose data can be cleared easily

If all of this can be achieved, it is really elegant: we get to use MongoDB's query/retrieval features without any disk operations. As you may know, in 99% of cases disk I/O (especially random I/O) is the bottleneck of the system, and disk operations cannot be avoided once you have to write data.

MongoDB has a very cool design decision: it uses memory-mapped files to handle read and write requests against data in disk files. In other words, MongoDB does not treat RAM and disk differently. It simply regards a file as a huge array, accesses the data byte by byte, and lets the operating system (OS) handle the rest. This design decision allows MongoDB to run in RAM without any modification.

Implementation Method

All of this is achieved by using a special type of file system called tmpfs. On Linux it looks like a conventional file system (FS), but it lives entirely in RAM (unless its size exceeds the available RAM, in which case it can also swap, which is very useful!). My server has 32 GB of RAM; let's create a 16 GB tmpfs:

 
 
    # mkdir /ramdata
    # mount -t tmpfs -o size=16000M tmpfs /ramdata/
    # df
    Filesystem           1K-blocks      Used Available Use% Mounted on
    /dev/xvde1             5905712   4973924    871792  86% /
    none                  15344936         0  15344936   0% /dev/shm
    tmpfs                 16384000         0  16384000   0% /ramdata
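Note that a mount created this way does not survive a reboot. If you want the tmpfs to be recreated automatically at boot, a matching /etc/fstab entry would look like the following (a sketch, reusing the same mount point and size as above):

```
# /etc/fstab — recreate the 16 GB tmpfs at boot
tmpfs  /ramdata  tmpfs  size=16000M  0  0
```

Of course, the data itself is still gone after a reboot; only the empty mount comes back.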

Next, start MongoDB with appropriate settings. To avoid wasting RAM, set smallFiles and noprealloc to true. Since everything is now RAM-based, this does not hurt performance at all. Journaling is pointless in this setup, so set nojournal to true.

 
 
    dbpath = /ramdata
    nojournal = true
    smallFiles = true
    noprealloc = true

After MongoDB is started, you will find that it runs very well and the files in the file system appear as expected:

 
 
    # mongo
    MongoDB shell version: 2.3.2
    connecting to: test
    > db.test.insert({a:1})
    > db.test.find()
    { "_id" : ObjectId("51802115eafa5d80b5d2c145"), "a" : 1 }
    # ls -l /ramdata/
    total 65684
    -rw-------. 1 root root 16777216 Apr 30 15:52 local.0
    -rw-------. 1 root root 16777216 Apr 30 15:52 local.ns
    -rwxr-xr-x. 1 root root        5 Apr 30 15:52 mongod.lock
    -rw-------. 1 root root 16777216 Apr 30 15:52 test.0
    -rw-------. 1 root root 16777216 Apr 30 15:52 test.ns
    drwxr-xr-x. 2 root root       40 Apr 30 15:52 _tmp

Now let's add some data and verify that everything works normally. Create a 1 KB document and insert it into MongoDB 4 million times:

 
 
    > str = ""
    > aaa = "aaaaaaaaaa"
    aaaaaaaaaa
    > for (var i = 0; i < 100; ++i) { str += aaa; }
    > for (var i = 0; i < 4000000; ++i) { db.foo.insert({a: Math.random(), s: str}); }
    > db.foo.stats()
    {
            "ns" : "test.foo",
            "count" : 4000000,
            "size" : 4544000160,
            "avgObjSize" : 1136.00004,
            "storageSize" : 5030768544,
            "numExtents" : 26,
            "nindexes" : 1,
            "lastExtentSize" : 536600560,
            "paddingFactor" : 1,
            "systemFlags" : 1,
            "userFlags" : 0,
            "totalIndexSize" : 129794000,
            "indexSizes" : {
                    "_id_" : 129794000
            },
            "ok" : 1
    }

As you can see, the average document size is 1136 bytes, and the data takes up a total of 5 GB of storage. The index on _id is 130 MB. Now we need to verify that the data is not duplicated in RAM: is one copy stored by MongoDB and another by the file system? Recall that MongoDB does not cache any data inside its own process; its data is cached only in the file system cache. So let's clear the file system cache and see what is left in RAM:
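The avgObjSize reported by db.foo.stats() is simply size divided by count, which is easy to verify from the numbers in the output above:

```shell
# Recompute avgObjSize from the stats above: size / count.
size=4544000160
count=4000000
awk -v s="$size" -v c="$count" 'BEGIN { printf "%.5f\n", s / c }'
# → 1136.00004
```

The 1 KB string plus the _id field, field names, and BSON overhead accounts for the extra ~136 bytes per document.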

 
 
    # echo 3 > /proc/sys/vm/drop_caches
    # free
                 total       used       free     shared    buffers     cached
    Mem:      30689876    6292780   24397096          0       1044    5817368
    -/+ buffers/cache:     474368   30215508
    Swap:            0          0          0

As you can see, of the roughly 6 GB of RAM in use, about 5.5 GB is file system cache (buffers). Why is there still file system cache in the system even after all caches were dropped? The reason is that Linux is smart: it does not keep duplicate copies of the data between tmpfs and the cache. Great! This means you have only one copy of the data in RAM. Next, let's access all the documents and verify that RAM usage does not change:
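The "-/+ buffers/cache" row in the free output above is derived arithmetic: used minus buffers minus cached. Plugging in the numbers shown confirms that the real (non-cache) memory footprint of the system is tiny:

```shell
# Numbers taken from the `free` output above (in KB).
used=6292780
buffers=1044
cached=5817368
# The "-/+ buffers/cache" used column is: used - buffers - cached
echo $(( used - buffers - cached ))
# → 474368 (matches the -/+ buffers/cache row)
```

In other words, almost everything "used" is the single cached copy of our tmpfs data.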

 
 
    > db.foo.find().itcount()
    4000000
    # free
                 total       used       free     shared    buffers     cached
    Mem:      30689876    6327988   24361888          0       1324    5818012
    -/+ buffers/cache:     508652   30181224
    Swap:            0          0          0
    # ls -l /ramdata/
    total 5808780
    -rw-------. 1 root root  16777216 Apr 30 15:52 local.0
    -rw-------. 1 root root  16777216 Apr 30 15:52 local.ns
    -rwxr-xr-x. 1 root root         5 Apr 30 15:52 mongod.lock
    -rw-------. 1 root root  16777216 Apr 30 16:00 test.0
    -rw-------. 1 root root  33554432 Apr 30 16:00 test.1
    -rw-------. 1 root root 536608768 Apr 30 16:02 test.10
    -rw-------. 1 root root 536608768 Apr 30 16:03 test.11
    -rw-------. 1 root root 536608768 Apr 30 16:03 test.12
    -rw-------. 1 root root 536608768 Apr 30 16:04 test.13
    -rw-------. 1 root root 536608768 Apr 30 16:04 test.14
    -rw-------. 1 root root  67108864 Apr 30 16:00 test.2
    -rw-------. 1 root root 134217728 Apr 30 16:00 test.3
    -rw-------. 1 root root 268435456 Apr 30 16:00 test.4
    -rw-------. 1 root root 536608768 Apr 30 16:01 test.5
    -rw-------. 1 root root 536608768 Apr 30 16:01 test.6
    -rw-------. 1 root root 536608768 Apr 30 16:04 test.7
    -rw-------. 1 root root 536608768 Apr 30 16:03 test.8
    -rw-------. 1 root root 536608768 Apr 30 16:02 test.9
    -rw-------. 1 root root  16777216 Apr 30 15:52 test.ns
    drwxr-xr-x. 2 root root        40 Apr 30 16:04 _tmp
    # df
    Filesystem           1K-blocks      Used Available Use% Mounted on
    /dev/xvde1             5905712   4973960    871756  86% /
    none                  15344936         0  15344936   0% /dev/shm
    tmpfs                 16384000   5808780  10575220  36% /ramdata

Sure enough! :)

Replication?

Since data in RAM is lost when the server restarts, you will probably want to use replication. A standard replica set gives you automatic failover and improved read capacity. If a server is restarted, it can resync its data from another member of the same replica set. Even when a large amount of data and indexes are involved, this process is fast enough, because the index builds happen in RAM :)

One important point: write operations are recorded in a special capped collection called the oplog, which lives in the local database. By default, its size is 5% of the total storage. In our case, the oplog would take 5% of 16 GB, i.e. 800 MB. If in doubt, it is safer to pick a fixed size with the oplogSize option. Note that a secondary whose downtime exceeds the window covered by the oplog must be fully resynced. To set the oplog size to 1 GB:

    oplogSize = 1000
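The 5% default mentioned above is easy to sanity-check with shell arithmetic (the 16000 MB figure is the size of the tmpfs we created earlier):

```shell
# Default oplog sizing: 5% of the data store, here our 16000 MB tmpfs.
datastore_mb=16000
oplog_mb=$(( datastore_mb * 5 / 100 ))
echo "default oplog size: ${oplog_mb} MB"
# → default oplog size: 800 MB
```

With ~1 KB documents and ~20K writes per second, an 800 MB oplog covers only a short outage window, which is why picking an explicit oplogSize is the safer choice here.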

Sharding?

Since all of MongoDB's query functionality is preserved, how do you use it to power a large service? Feel free to use sharding to build a large, scalable in-memory database. Keep the config servers disk-based, though: they see little activity anyway, and rebuilding a cluster from scratch is no fun.

Notes

RAM is a scarce resource, and in this case you must make sure the entire data set fits in RAM. Although tmpfs can swap to disk, the performance drop would be dramatic. To make the most of your RAM, consider the following:

Use usePowerOf2Sizes to normalize storage allocation

Run the compact command regularly (or resync the node)

Keep the schema design fairly normalized (to avoid large, ever-growing documents)
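For reference, the first two items correspond to mongo shell commands along the following lines (the collection name foo is just the example used earlier; this is a sketch, not captured output):

```
> db.runCommand({ collMod: "foo", usePowerOf2Sizes: true })
> db.runCommand({ compact: "foo" })
```

Note that compact blocks operations on the database while it runs, so on a replica set you would compact secondaries one at a time.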

Conclusion

Baby, you can now use MongoDB as an in-memory database, with all of its features available! Performance should be quite amazing: in my testing, a single thread/core reached about 20K writes per second, and adding cores should multiply the write throughput.
