Original article: http://www.cnblogs.com/foxracle/p/3421893.html
To understand how MongoDB stores data, one concept must be clear first: memory-mapped files.
Memory-Mapped Files
(Figure: how the database interacts with the underlying operating system.)
- A memory-mapped file is created when the OS uses mmap to map a data file into a region of a process's virtual memory, so that the file's contents can be accessed as if they were in memory.
- Virtual memory is an abstraction of physical memory provided to each process; on a 64-bit system its address space is (theoretically) 2^64 bytes.
- The operating system uses mmap to map all the data the process needs into this address space (red lines in the figure), and then maps the data currently being processed into physical memory (gray lines).
- When the process accesses data that is not currently resident in physical memory, a page fault is triggered, and the OS loads the data from disk into physical memory behind the corresponding virtual pages.
- If physical memory is full, a swap-out is triggered and some pages must be evicted: pages of pure in-memory (anonymous) data are written to the swap partition, while modified pages of a mapped file are written back to that file on disk.
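To make the mechanism concrete, here is a minimal Python sketch using the standard-library mmap module. It illustrates OS memory mapping in general and is not MongoDB code: it maps a temporary file, writes through the mapping as if it were ordinary memory, and lets the OS carry the change back to the file.

```python
import mmap
import os
import tempfile


def write_through_mapping(payload: bytes, file_size: int = 4096) -> bytes:
    """Map a file into memory, modify it via the mapping, and return the file's bytes."""
    fd, path = tempfile.mkstemp()
    try:
        # Pre-size the file: a mapping cannot extend past the end of the file.
        os.ftruncate(fd, file_size)
        with mmap.mmap(fd, file_size) as mm:
            # A plain memory write; touching a page that is not yet resident
            # causes a page fault that the OS services transparently.
            mm[:len(payload)] = payload
            # Ask the OS to write dirty pages back to the file (like msync()).
            mm.flush()
        # Read the file with ordinary I/O to show the change reached the file.
        with open(path, "rb") as f:
            return f.read(len(payload))
    finally:
        os.close(fd)
        os.remove(path)


print(write_through_mapping(b"hello mmap"))  # prints b'hello mmap'
```

The program never issues a write() for the data itself; the kernel decides when the dirty page reaches disk, which is exactly the work MongoDB hands off to the OS.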
MongoDB's Storage Model
With memory-mapped files, the data MongoDB needs to access appears to be in memory, which simplifies the logic MongoDB uses to access and modify data.
MongoDB's reads and writes deal only with virtual memory; everything else is left to the OS.
Virtual memory size = total size of all data files + some other overhead (connections, thread stacks).
If journaling is enabled, the virtual memory size roughly doubles.
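The size formula above can be written as a small estimator. This is a rough sketch under stated assumptions: `overhead_bytes` is a hypothetical allowance you supply for connections and stacks, and the doubling with journaling is modeled on the common explanation that journaled MMAPv1 maps the data files a second time as a private view.

```python
def estimate_virtual_memory(data_file_sizes, overhead_bytes=0, journal=False):
    """Rough virtual-memory estimate for an MMAPv1-style mongod.

    data_file_sizes: sizes in bytes of every mapped file (.ns and numbered files).
    overhead_bytes:  hypothetical allowance for connections, thread stacks, etc.
    journal:         assumption: with journaling the files are mapped twice,
                     so the mapped portion roughly doubles.
    """
    mapped = sum(data_file_sizes)
    if journal:
        mapped *= 2
    return mapped + overhead_bytes


# Sizes of test.ns and test.0 .. test.8 taken from the directory listing in this article.
test_files = [16777216, 67108864, 134217728, 268435456, 536870912,
              1073741824] + [2146435072] * 4

gib = 1024 ** 3
print(round(estimate_virtual_memory(test_files) / gib, 2))
print(round(estimate_virtual_memory(test_files, journal=True) / gib, 2))
```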
Benefits of using memory-mapped files (MMF): 1) MongoDB does not have to manage memory and disk scheduling itself; 2) it gets the OS's LRU eviction policy for free; 3) the data stays cached by the OS when mongod restarts.
Drawbacks of using MMF: 1) RAM usage is affected by disk fragmentation, and a high read-ahead setting also wastes RAM; 2) MongoDB cannot optimize the page replacement algorithm itself and is limited to the OS's LRU.
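The LRU policy mentioned above can be modeled in a few lines. This is a textbook sketch, not what the kernel literally runs (Linux, for example, approximates LRU with active and inactive page lists): a fixed number of physical frames, a page fault on every miss, and eviction of the least recently used page when the frames are full.

```python
from collections import OrderedDict


class LRUPageCache:
    """Toy model of a page cache with a fixed number of physical page frames."""

    def __init__(self, frames: int):
        self.frames = frames
        self.resident = OrderedDict()  # page number -> None, least recently used first
        self.faults = 0
        self.evicted = []

    def access(self, page: int) -> bool:
        """Touch a page; return True on a hit, False on a page fault."""
        if page in self.resident:
            self.resident.move_to_end(page)  # mark as most recently used
            return True
        self.faults += 1  # page fault: the page must be read from disk
        if len(self.resident) >= self.frames:
            victim, _ = self.resident.popitem(last=False)  # evict the LRU page
            self.evicted.append(victim)
        self.resident[page] = None
        return False


cache = LRUPageCache(frames=3)
for p in [1, 2, 3, 1, 4, 2]:
    cache.access(p)
print(cache.faults, cache.evicted, list(cache.resident))  # prints 5 [2, 3] [1, 4, 2]
```

Because the policy belongs to the OS, MongoDB cannot, for instance, pin hot index pages ahead of cold data pages; that is the second drawback listed above.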
Data files on disk are made up of extents; space is allocated to a collection one extent at a time.
A collection has one or more extents.
In the .ns file, the namespace record of a collection points to the collection's first extent.
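The relationship between a namespace record and its extents can be sketched as a linked list. This is a deliberately simplified model: the field names (`file_no`, `offset`, `length`, `next_extent`) and the byte offsets in the example are hypothetical and do not reproduce MongoDB's on-disk layout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Extent:
    """A contiguous block of a data file allocated to one collection (simplified)."""
    file_no: int    # which data file, e.g. 0 for test.0
    offset: int     # byte offset inside that file (hypothetical)
    length: int     # size of the extent in bytes
    next_extent: Optional["Extent"] = None  # link to the collection's next extent


@dataclass
class NamespaceRecord:
    """Simplified .ns entry: a namespace name plus a pointer to its first extent."""
    name: str
    first_extent: Optional[Extent] = None

    def extents(self) -> List[Extent]:
        # Walk the chain starting from the first extent.
        out, cur = [], self.first_extent
        while cur is not None:
            out.append(cur)
            cur = cur.next_extent
        return out

    def add_extent(self, ext: Extent) -> None:
        # Space is granted one extent at a time; append it to the end of the chain.
        if self.first_extent is None:
            self.first_extent = ext
            return
        cur = self.first_extent
        while cur.next_extent is not None:
            cur = cur.next_extent
        cur.next_extent = ext


ns = NamespaceRecord("test.accesslog")
ns.add_extent(Extent(file_no=0, offset=8192, length=8192))
ns.add_extent(Extent(file_no=0, offset=16384, length=32768))
print(len(ns.extents()), sum(e.length for e in ns.extents()))  # prints 2 40960
```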
Data Files and Space Allocation
When you create a database (MongoDB has no explicit command for creating a database; a database is created automatically the first time you write to a collection in it), MongoDB allocates a set of data files on disk. All collections, indexes, and other metadata of the database are stored in these files. The data files are placed in the directory given by --dbpath at startup, which defaults to /data/db. A typical directory layout looks like this:
$ cd /data/db
$ ls -al
-rw------- 1 root root 16777216 09-18 00:54 local.ns
-rw------- 1 root root 67108864 09-18 00:54 local.0
-rw------- 1 root root 2146435072 09-18 00:55 local.1
-rw------- 1 root root 2146435072 09-18 00:56 local.2
-rw------- 1 root root 2146435072 09-18 00:57 local.3
-rw------- 1 root root 2146435072 09-18 00:58 local.4
-rw------- 1 root root 2146435072 09-18 00:59 local.5
-rw------- 1 root root 2146435072 09-18 01:01 local.6
-rw------- 1 root root 2146435072 09-18 01:02 local.7
-rw------- 1 root root 2146435072 09-18 01:03 local.8
-rw------- 1 root root 2146435072 09-18 01:04 local.9
-rw------- 1 root root 2146435072 09-18 01:05 local.10
-rw------- 1 root root 16777216 09-18 01:06 test.ns
-rw------- 1 root root 67108864 09-18 01:06 test.0
-rw------- 1 root root 134217728 09-18 01:06 test.1
-rw------- 1 root root 268435456 09-18 01:06 test.2
-rw------- 1 root root 536870912 09-18 01:06 test.3
-rw------- 1 root root 1073741824 09-18 01:07 test.4
-rw------- 1 root root 2146435072 09-18 01:07 test.5
-rw------- 1 root root 2146435072 09-18 01:09 test.6
-rw------- 1 root root 2146435072 09-18 01:11 test.7
-rw------- 1 root root 2146435072 09-18 01:13 test.8
...
-rwxr-xr-x 1 root root 6 09-18 13:54 mongod.lock
drwxr-xr-x 2 root root 4096 11-13 18:39 journal
drwxr-xr-x 2 root root 4096 11-13 19:02 _tmp
mongod.lock holds the server's process ID; it is the process lock file. Each data file is named after the database it belongs to.
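A small helper shows how the lock file can be inspected. It is a sketch assuming the lock file contains just the PID as text (consistent with the 6-byte mongod.lock in the listing) and that a cleanly shut-down mongod leaves it empty, as described for MMAPv1-era servers; `read_lock_pid` is a hypothetical name, not a MongoDB tool.

```python
import os
import tempfile
from typing import Optional


def read_lock_pid(dbpath: str = "/data/db") -> Optional[int]:
    """Return the PID recorded in <dbpath>/mongod.lock, or None if the file is empty."""
    with open(os.path.join(dbpath, "mongod.lock")) as f:
        content = f.read().strip()
    # Assumption: a running mongod writes its PID here, and a clean shutdown
    # leaves a zero-byte file behind.
    return int(content) if content else None


# Demo against a throwaway directory instead of a real dbpath.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "mongod.lock"), "w") as f:
        f.write("4242\n")
    print(read_lock_pid(d))  # prints 4242
```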
test.ns is the first file created (the .ns extension stands for namespace). Every collection and every index in the database has its own namespace, and the metadata for each namespace is stored in this file. By default the .ns file has a fixed size of 16 MB and can hold approximately 24,000 namespaces, which means the total number of collections and indexes in a database cannot exceed about 24,000. The file size can be changed with mongod's --nssize option.
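A back-of-the-envelope check of the namespace limit, assuming the 628-byte per-namespace entry size given in the MongoDB MMAPv1 documentation (verify it for your version). Dividing the file size by the entry size gives the raw slot count; the quoted figure of about 24,000 is lower, presumably because the file is organized as a hash table that cannot be filled completely (our interpretation).

```python
NAMESPACE_ENTRY_BYTES = 628  # per-namespace entry size, assumed from the MMAPv1 docs


def namespace_slots(nssize_mb: int = 16) -> int:
    """Raw number of namespace entries that fit in an .ns file of nssize_mb megabytes."""
    return nssize_mb * 1024 * 1024 // NAMESPACE_ENTRY_BYTES


print(namespace_slots())    # prints 26715 for the default 16 MB file
print(namespace_slots(64))  # e.g. a file created with --nssize 64
```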
Files such as test.0, whose names end in a 0-based integer, hold the collection and index data. MongoDB pre-allocates several files from the start, even if the database holds only a single document, so that data can be stored as contiguously as possible and disk fragmentation is reduced. As data is added to the database, MongoDB allocates more data files. Each new data file is twice the size of the previously allocated one (64 MB -> 128 MB -> 256 MB ...), up to a maximum pre-allocated file size of 2 GB. The reasoning is that if the total data size grows at a steady rate, the space allocated for data files should also grow gradually. This pre-allocation strategy can be turned off with --noprealloc, but that is not recommended in production.
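The doubling-with-a-cap rule can be checked against the listing. In this sketch the starting size (64 MiB) and the cap (2,146,435,072 bytes, i.e. 2 GiB minus 1 MiB) are read directly off test.0 and test.5 in the directory listing; `prealloc_sizes` is a hypothetical helper, not part of MongoDB.

```python
MIB = 1024 * 1024
FIRST_FILE = 64 * MIB   # size of test.0 in the listing
MAX_FILE = 2146435072   # size of test.5 onward in the listing (2 GiB - 1 MiB)


def prealloc_sizes(n_files: int, first: int = FIRST_FILE, cap: int = MAX_FILE) -> list:
    """Sizes of the first n_files data files: each doubles the previous, up to cap."""
    sizes, size = [], first
    for _ in range(n_files):
        sizes.append(min(size, cap))
        size *= 2
    return sizes


print(prealloc_sizes(9))  # test.0 .. test.8
```

The output reproduces the sizes of test.0 through test.8 shown in the listing.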
local is a default database whose contents are not replicated. When mongod is a member of a replica set, the local database contains a pre-allocated capped collection named oplog.rs, whose default size is 5% of free disk space. This size can be adjusted with --oplogSize. The oplog is used to replicate operations from the primary to the secondary members of the replica set; its size limits how far a secondary can fall behind the primary (or how long it can stay out of sync) before it must perform a full resync.
journal is the journal directory; journaling is enabled by default in version 2.4 (on 64-bit builds).
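The default oplog sizing rule can be sketched as follows. The 5% figure comes from the text above; the 1 GB minimum is an assumption taken from MongoDB's documentation for 64-bit Linux (check your version). Note that --oplogSize takes a value in megabytes.

```python
import shutil

GIB = 1024 ** 3


def default_oplog_bytes(free_disk_bytes: int, floor: int = GIB) -> int:
    """5% of free disk space, never less than `floor` (the 1 GB floor is an assumption)."""
    return max(free_disk_bytes * 5 // 100, floor)


def oplog_size_option_mb(oplog_bytes: int) -> int:
    """Convert a byte count into the megabyte value expected by --oplogSize."""
    return oplog_bytes // (1024 * 1024)


# Example: what the rule would pick for the filesystem holding "/".
print(oplog_size_option_mb(default_oplog_bytes(shutil.disk_usage("/").free)))
```

A larger oplog lets a secondary stay offline or lag longer before it needs a full resync, at the cost of the disk space it pre-allocates.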
You can use db.stats() to check how much space is used and how much has been allocated:
{
    "db": "test",
    "collections": 37,
    "objects": 317894523,            # total number of documents
    "avgObjSize": 232.3416429039893, # in bytes
    "dataSize": 73860135744,         # actual size of all data in the collections, including padding (the extra space allocated to each document so that it can grow). This value does not shrink when documents get smaller; it shrinks only when documents are deleted or a compact or repairDatabase operation is run
    "storageSize": 97834319392,      # space allocated to the collections, including space reserved for collection growth and deleted space not yet reused (it does not shrink when documents get smaller or are deleted). Space is handed from the data files to collections in blocks called extents, so this is the total size of the allocated extents
    "numExtents": 385,
    "indexes": 86,
    "indexSize": 58687466992,
    "fileSize": 182380920832,        # sum of the sizes of all data files, excluding the namespace (.ns) files
    "nsSizeMB": 16,
    "dataFileVersion": {
        "major": 4,
        "minor": 5
    },
    "ok": 1
}
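A few derived numbers make the db.stats() output easier to read. This sketch copies the values from the output above; the `unassigned_file_bytes` figure is our own interpretation (data-file space not yet handed to collection extents or indexes, such as free extents and pre-allocated space), not a field MongoDB reports.

```python
# Values copied from the db.stats() output in this article.
db_stats = {
    "objects": 317894523,
    "avgObjSize": 232.3416429039893,
    "dataSize": 73860135744,
    "storageSize": 97834319392,
    "indexSize": 58687466992,
    "fileSize": 182380920832,
}


def summarize(stats: dict) -> dict:
    return {
        # avgObjSize is dataSize divided by the number of documents
        "avg_obj_size": stats["dataSize"] / stats["objects"],
        # fraction of allocated extent space that actually holds documents (padding included)
        "data_to_storage": stats["dataSize"] / stats["storageSize"],
        # data-file bytes not assigned to collection extents or indexes (our interpretation)
        "unassigned_file_bytes": stats["fileSize"] - stats["storageSize"] - stats["indexSize"],
    }


for key, value in summarize(db_stats).items():
    print(key, value)
```

Here roughly a quarter of the extent space holds no live data, and about 26 GB of the data files is not yet assigned to any collection or index.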
Use db.accesslog.stats() to check the space usage of a single collection:
{
    "ns": "test.accesslog",
    "count": 145352932,
    "size": 37060264352,              # actual data size, excluding indexes
    "avgObjSize": 254.967435758365,
    "storageSize": 45794676448,       # storage space pre-allocated for the data
    "numExtents": 42,
    "nindexes": 4,
    "lastExtentSize": 2146426864,
    "paddingFactor": 1,               # padding reserved in advance makes updates that grow a document faster and reduces fragmentation (1 means no extra padding)
    "systemFlags": 1,
    "userFlags": 0,
    "totalIndexSize": 31897944512,
    "indexSizes": {
        "_id_": 6722168208,
        "action_1_time_1": 8606482752,
        "gz_id_1_action_1_time_1": 10753778336,
        "time_1": 5815515216
    },
    "ok": 1
}
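The collection statistics obey a few simple invariants, which this sketch checks using the values copied from the db.accesslog.stats() output above. The checks are our own reading of how the fields relate, not an official validation routine.

```python
# Values copied from the db.accesslog.stats() output in this article.
coll_stats = {
    "count": 145352932,
    "size": 37060264352,
    "avgObjSize": 254.967435758365,
    "storageSize": 45794676448,
    "totalIndexSize": 31897944512,
    "indexSizes": {
        "_id_": 6722168208,
        "action_1_time_1": 8606482752,
        "gz_id_1_action_1_time_1": 10753778336,
        "time_1": 5815515216,
    },
}


def check_collection_stats(s: dict) -> None:
    # avgObjSize is size / count
    assert abs(s["size"] / s["count"] - s["avgObjSize"]) < 1e-6
    # totalIndexSize is the sum of the per-index sizes
    assert sum(s["indexSizes"].values()) == s["totalIndexSize"]
    # allocated extents must be at least as large as the data they hold
    assert s["storageSize"] >= s["size"]


check_collection_stats(coll_stats)
largest = max(coll_stats["indexSizes"], key=coll_stats["indexSizes"].get)
print(largest)  # prints gz_id_1_action_1_time_1
```

Here the indexes alone take almost as much space as the documents, and the compound index is the largest of the four.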
How MongoDB stores data (reproduced)