MongoDB source code overview: memory management and storage engine


Data storage:

When I introduced the journal, I mentioned why MongoDB puts data into memory instead of writing directly to the database storage files. This is tied to how MongoDB stores and manages its database record files. MongoDB uses memory-mapped files (mmap), a facility provided by the underlying operating system, to access database record files. mmap maps a disk file into the process's memory space, so that every data record in the file has a corresponding address in memory. Reads and writes to the file can then be done through ordinary memory operations instead of fread/fwrite calls.

Note that mmap only maps the file into the process's address space; it does not load the whole file into physical memory. Only when a data block is actually accessed does the operating system page it into physical memory. This management is done entirely by the operating system and is transparent to MongoDB's developers. In fact, every function we can call, including functions implemented in the system kernel, operates on virtual memory, i.e. each process's so-called 4 GB address space (on a 32-bit system); physical memory is invisible to user code and cannot be operated on directly. This is why MongoDB can store more data than fits in memory, although it is not recommended to let the hot data exceed the memory size: if it does, the operating system must frequently swap pages of physical memory in and out, which seriously hurts MongoDB's performance.

(Figure: virtual memory layout of a 32-bit process.)

This memory-management approach greatly reduces the burden on MongoDB's developers by handing most of the memory-management work to the operating system. While writing this article I summarized its characteristics myself, but then found the book already had a summary, so I pasted it directly (with a few underlines). What can I say, they put it better than I could:

• MongoDB's code for managing memory is small and clean, because most of that work is pushed to the operating system.

• The virtual size of a MongoDB server process is often very large, exceeding the size of the entire data set. This is OK, because the operating system will handle keeping the amount of data resident in memory contained.

• MongoDB cannot control the order that data is written to disk, which makes it impossible to use a write-ahead log to provide single-server durability. Work is ongoing on an alternative storage engine for MongoDB to provide single-server durability.

• 32-bit MongoDB servers are limited to a total of about 2 GB of data per mongod. This is because all of the data must be addressable using only 32 bits.

(If you want to learn more about mmap, read Section 12.2 of UNIX Network Programming, Volume 2.)

 

That's it for the overview. Now for the hard part!

Storage source code analysis:

In the definition of the MongoMMF class (mongommf.h, line 29), pay attention to the following methods:

 
void* map(const char* filename, unsigned long& length, int options = 0);
// Maps the file into the process address space (called a "view") via mmap
// and returns its starting address in memory. If the file does not exist,
// it is created via CreateFile in mmap_win.

void flush(bool sync);
// Flushes the data mapped into the process space back to disk.

void* getView() const;
// Returns the starting address of the first view.

For the internal implementation of these three methods, one naturally expects calls into the operating system's APIs, with different signatures and parameters on different operating systems. I won't belabor this; the system APIs involved are easy to look up, so I won't paste the internally called system APIs here.

When does MongoDB map database files into memory? And when is the mapped data flushed back to disk for persistence? Let's analyze these two questions.

 

Map database files to memory:

When we insert a record into a not-yet-created database for the first time, the call chain is as follows:

 
DataFileMgr::insert() --> Database::allocExtent() --> Database::suitableFile() --> Database::getFile() --> MongoDataFile::open() --> MongoMMF::create()

I have omitted some methods before DataFileMgr::insert(). The call chain is long, but it eventually reaches the creation of the first database file.

 
bool MongoMMF::create(string fname, unsigned long& len, bool sequentialHint) {
    setPath(fname);
    _view_write = map(fname.c_str(), len, sequentialHint ? SEQUENTIAL : 0);
    // If the file does not exist, it is created via CreateFile in mmap_win
    // (the MemoryMappedFile::map method).
    return finishOpening();
}

Looking at the code, we find that create() simply calls map(), and map() itself handles file creation; once created, the file is mapped into memory.

If a record is inserted into an already existing database, openAllFiles() is called during Database construction, which enters the flow above at Database::getFile().

These two cases cover when MongoDB maps database record files into memory.

Flush data for persistence:

By default, MongoDB flushes once per minute for persistent storage; this interval can be changed with the "--syncdelay" startup parameter. The execution path is main() --> dataFileSync.go(). DataFileSync derives from BackgroundJob, whose go() method creates a new thread to run the virtual function run():

void run() {
    if (cmdLine.syncdelay == 0)
        log() << "warning: --syncdelay 0 is not recommended and can have strange performance" << endl;
    else if (cmdLine.syncdelay == 1)
        log() << "--syncdelay 1" << endl;
    else if (cmdLine.syncdelay != 60)  // 60 is the default
        log(1) << "--syncdelay " << cmdLine.syncdelay << endl;
    int time_flushing = 0;
    while (!inShutdown()) {
        flushDiagLog();
        if (cmdLine.syncdelay == 0) {
            // in case at some point we add an option to change at runtime
            sleepsecs(5);
            continue;
        }
        sleepmillis((long)std::max(0.0, (cmdLine.syncdelay * 1000) - time_flushing));
        if (inShutdown()) {
            // occasional issue trying to flush during shutdown when sleep interrupted
            break;
        }
        Date_t start = jsTime();
        // Every syncdelay seconds, the DataFileSync task flushes the in-memory
        // data to disk (MongoDB puts data in memory first via mmap).
        int numFiles = MemoryMappedFile::flushAll(true);
        time_flushing = (int)(jsTime() - start);
        globalFlushCounters.flushed(time_flushing);
        log(1) << "flushing mmap took " << time_flushing << "ms for " << numFiles << " files" << endl;
    }
}

run() ultimately calls the MemoryMappedFile::flushAll() method to flush all mapped files, persisting the changes to disk. This method was already covered when introducing MongoMMF, so I won't describe it again here.

As an aside, even if an mmap'd program never calls fsync to force data to disk, the operating system will flush it for us. On Linux, the dirty_writeback_centisecs parameter defines how long dirty data may stay in memory (the default is 500, i.e. 5 seconds); after that interval the system writes the data back to disk. All I/O is blocked while this automatic writeback runs, so if a large amount of dirty data must be written, the operation can take noticeable time; programs that use mmap may then see periodic timeouts. The usual optimization is to lower dirty_writeback_centisecs so dirty pages are flushed more frequently and each writeback is shorter. MongoDB flushes explicitly on a schedule, so it does not hit this problem.

Problem:

Now we know when the MongoDB storage engine maps database record files into the process's memory space and when it flushes them back to the original files. Have you spotted the problem? The persistence flush runs once per minute, but writes happen all the time. What if the server loses power at second 59? Are all database operations from those 59 seconds simply never committed to the persistent database files? Losing 59 seconds of data is not even the worst case: if the system goes down during the flushAll() at second 60, the data files may end up inconsistent, partly new data and partly old. In that case the database may be unusable.

Puzzlingly, during a clean exit (calling dbexit(EXIT_CLEAN)), MongoDB does not call MemoryMappedFile::flushAll() for persistence unless it was started in "--dur" mode. At first I thought the code of my version was incomplete, so I checked the 2.2 source, and found that in non-"--dur" mode the flush method is indeed never called; only MemoryMappedFile::closeAllFiles() is.

My personal understanding is that "--dur" is expected to be enabled in production environments, and in newer versions it is even enabled by default on 64-bit builds, so flushing in non-"--dur" mode is considered unnecessary.

If you use the Windows build of MongoDB to debug and verify the above, you will get the opposite result, and your first instinct may be that I am completely wrong. Indeed, most people would run a simple test like this:

    • Start mongod in non-"--dur" mode; ideally set --syncdelay to a large value such as 600 at startup.
    • Use the mongo shell to modify database data (e.g. updates and deletes).
    • Use the Task Manager to force-terminate the mongod process (simulating a crash).
    • Delete mongod.lock (left behind by the simulated crash) and restart mongod in non-"--dur" mode.
    • Use mongo to run db.collection.find() and check whether the earlier change took effect.

Running this test, you will be surprised to find that every change was persisted. Does that make everything I said above nonsense? At first I doubted the result myself, so I tested it many times and traced through the debugger. I found that even though MongoDB never ran flushAll() even once, and flush() was never called on any MongoMMF object (each of which represents a database record file), the changes were still persisted. At that point I began to suspect that on Windows, persistence does not depend on calling flush: changes written into the mapping (via memcpy) get persisted anyway. Searching the internet confirmed that others hit the same issue on Windows (a CSDN thread, "memory mapping, no FlushViewOfFile, yet still saved to the file", describes the same problem).

I won't dig further into this Windows peculiarity; it's enough to know the issue exists. Under this mechanism, the entire DataFileSync thread used to flush data to disk effectively goes unused on Windows. For Linux and UNIX, the summary above holds.

Solution:

In fact, some people have lost all their data to the problem described above, so starting with the 1.7 development branch the MongoDB team began improving single-server reliability. This is the journal/durability module, which exists to solve exactly this problem. (For more information, see the article "MongoDB data reliability: single-server reliability is expected to be enhanced after version 1.8".)

This journal/durability module was also mentioned in "MongoDB source code overview - logs", but that last part was left unfinished. Next time there will be a dedicated post discussing the follow-up issues.
