The Storage Structure of MongoDB and Its Effect on Space Utilization


If you use MongoDB for a while, you will certainly notice that it tends to occupy much more disk space than the actual size of your data. If you check with the db.stats() command, MongoDB reports several different size figures, such as dataSize, storageSize, and fileSize. What do these sizes mean? Let's look at MongoDB's storage mechanism to understand what these numbers represent.

Database File Types

MongoDB database files mainly come in three kinds: journal files, the namespace file, and data and index files.

Journal files

Unlike in some traditional databases, MongoDB's journal files are used only to recover in-memory data that had not yet been flushed to disk when the system went down. The journal files are stored in a separate directory. At startup, MongoDB automatically pre-creates three journal files of 1 GB each (initially empty). Unless you really have a sustained, massive volume of concurrent writes, 3 GB is generally enough.

Namespace file: dbname.ns

This file stores the names of all collections and indexes in the database, together with the location of each collection and index within the data files. It is not large: the default is 16 MB, which can hold about 24,000 collection or index names. With this file, MongoDB knows where to start looking for, or inserting, a collection's data or index data. The file size can be raised to at most about 2 GB with a startup parameter (--nssize).
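A rough capacity check follows from the figures above: every collection takes one entry in the namespace file, and so does every index. The sketch below is hypothetical (the function names are invented for illustration) and assumes capacity scales linearly from the quoted 24,000 names per 16 MB; actual per-entry sizes may differ.

```python
# Hypothetical helpers for estimating namespace-file usage.
# Assumption: capacity scales linearly from ~24,000 names per 16 MB.
NAMES_PER_16_MB = 24_000


def namespace_capacity(ns_file_mb: int = 16) -> int:
    """Approximate number of collection and index names a .ns file holds."""
    return NAMES_PER_16_MB * ns_file_mb // 16


def namespaces_used(indexes_per_collection: dict[str, int]) -> int:
    """One name per collection plus one per index (count the _id index too)."""
    return sum(1 + n for n in indexes_per_collection.values())


def fits(indexes_per_collection: dict[str, int], ns_file_mb: int = 16) -> bool:
    """True if the estimated names fit in a namespace file of the given size."""
    return namespaces_used(indexes_per_collection) <= namespace_capacity(ns_file_mb)


if __name__ == "__main__":
    db = {"users": 3, "orders": 2}    # collection -> number of indexes
    print(namespaces_used(db))        # 7
    print(namespace_capacity(1024))   # 1536000
```

The linear-scaling assumption keeps the estimate simple; it is only meant to show that indexes consume namespace entries just like collections do.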

Data files: dbname.0, dbname.1, ..., dbname.n

MongoDB's data and indexes are stored in one or more data files. The first data file is named after the database with the suffix .0, for example my-db.0. Its default size is 64 MB, and as data approaches the end of these 64 MB, MongoDB generates the next data file, such as my-db.1. Each new data file is twice the size of the previous one: the second file is 128 MB, the third 256 MB, and so on. Once the size reaches 2 GB it stops doubling, and every new file after that is 2 GB.
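The naming and doubling rule can be written down as a small model. This is a sketch under stated assumptions: the helper names are hypothetical, sizes are in MB, and the early pre-allocation of the next file is ignored.

```python
# Hypothetical model of the data-file growth rule described above:
# start at 64 MB, double for each new file, stop doubling at 2 GB.
FIRST_FILE_MB = 64
MAX_FILE_MB = 2048


def data_files(db_name: str, count: int) -> list[tuple[str, int]]:
    """Names and sizes (MB) of the first `count` data files of a database."""
    files = []
    size = FIRST_FILE_MB
    for i in range(count):
        files.append((f"{db_name}.{i}", size))
        size = min(size * 2, MAX_FILE_MB)
    return files


def files_for(db_name: str, data_mb: float) -> list[tuple[str, int]]:
    """Smallest run of data files whose combined size can hold data_mb.

    Ignores the early pre-allocation of the next file.
    """
    files, total, size = [], 0, FIRST_FILE_MB
    while not files or total < data_mb:
        files.append((f"{db_name}.{len(files)}", size))
        total += size
        size = min(size * 2, MAX_FILE_MB)
    return files


if __name__ == "__main__":
    print(data_files("my-db", 7))
    print(files_for("my-db", 200))   # three files: 64 + 128 + 256 MB
```

Even half a megabyte of data needs the full 64 MB first file in this model, which is one reason fileSize can dwarf the real data size.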

MongoDB also creates some other files, such as _tmp and mongod.lock, but they are not relevant to this discussion.

Data File Structure

Extent

Within each data file, MongoDB organizes the stored BSON documents and B-tree indexes into logical containers called extents, as shown in the following figure (my-db.1 and my-db.2 are two data files of the database):

A file can contain multiple extents. Each extent contains only the data or the index of a single collection. The data or index of one collection can be spread across multiple extents, and those extents can in turn be spread across multiple files. The same extent never holds both data and index.
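The rules above can be captured in a toy data model. It is a hypothetical sketch (the Extent class and extents_of helper are invented for illustration, not MongoDB internals) that enforces "one collection, data or index" per extent and shows one collection's data spanning two files.

```python
from dataclasses import dataclass, field


@dataclass
class Extent:
    """Toy extent: belongs to one collection and holds data OR index, never both."""
    file_name: str                 # data file holding the extent, e.g. "my-db.1"
    collection: str                # the single collection this extent serves
    kind: str                      # "data" or "index"
    records: list = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.kind not in ("data", "index"):
            raise ValueError("an extent holds either data or index, not both")


def extents_of(extents: list[Extent], collection: str, kind: str) -> list[Extent]:
    """All extents (possibly in several files) holding one collection's data or index."""
    return [e for e in extents if e.collection == collection and e.kind == kind]


if __name__ == "__main__":
    layout = [
        Extent("my-db.1", "users", "data"),
        Extent("my-db.1", "users", "index"),
        Extent("my-db.2", "users", "data"),    # same collection, another file
        Extent("my-db.2", "orders", "data"),
    ]
    users_data = extents_of(layout, "users", "data")
    print(sorted({e.file_name for e in users_data}))  # ['my-db.1', 'my-db.2']
```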

Records

Each extent contains multiple records. Each record consists of a record header, a BSON document, and some extra padding space. Padding is extra, unused space that MongoDB allocates when inserting a record, so that if the document grows later it does not have to be moved elsewhere. The record header begins with the size of the entire record, followed by the record's own location and the locations of the previous and next records, so the records in an extent can be pictured as a doubly linked list.

Database Size Parameters

With this background, we can understand the meaning of the size parameters reported by db.stats().
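All three parameters are built from the records described earlier, so a toy model of the record chain is a useful starting point. In this hedged sketch, the class names, the 16-byte header, and the padding factor are assumptions for the model rather than MongoDB's actual code. Records carry their own size and links to their neighbours, and padding lets a document grow in place until it no longer fits.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

HEADER_BYTES = 16  # assumed header size for this model


@dataclass(eq=False)
class Record:
    offset: int              # the record's own location in the extent
    size: int                # whole record: header + document + padding
    doc_bytes: int           # size of the BSON document currently stored
    prev: Optional[Record] = field(default=None, repr=False)
    next: Optional[Record] = field(default=None, repr=False)

    @property
    def padding(self) -> int:
        return self.size - HEADER_BYTES - self.doc_bytes


class RecordChain:
    """Records of one extent, linked in both directions."""

    def __init__(self, padding_factor: float = 1.5) -> None:
        self.head: Optional[Record] = None
        self.tail: Optional[Record] = None
        self.end = 0                       # next free offset in the extent
        self.padding_factor = padding_factor

    def insert(self, doc_bytes: int) -> Record:
        # Reserve extra space so the document can grow without moving.
        size = HEADER_BYTES + int(doc_bytes * self.padding_factor)
        rec = Record(self.end, size, doc_bytes, prev=self.tail)
        if self.tail is None:
            self.head = rec
        else:
            self.tail.next = rec
        self.tail = rec
        self.end += size
        return rec

    def update_in_place(self, rec: Record, new_doc_bytes: int) -> bool:
        """Resize the document inside its record; False means it must move."""
        if HEADER_BYTES + new_doc_bytes > rec.size:
            return False
        rec.doc_bytes = new_doc_bytes
        return True

    def offsets(self) -> list[int]:
        """Walk the chain forward from the head."""
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.offset)
            cur = cur.next
        return out
```

For example, with a padding factor of 1.5 a 100-byte document gets a 166-byte record, so it can grow to 150 bytes before a move is needed.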

dataSize

dataSize is the parameter closest to the real data size, and you can use it to check how much data you have. It is the sum of the sizes of all records in the database (or collection). Note that each record carries the overhead of its header and padding in addition to the BSON document, so this value is somewhat larger than the space the real data actually occupies.

When a document is deleted, this value decreases, because it is the sum of the sizes of all records. If a document is not deleted but only some of its fields are removed or shrunk, dataSize is unaffected: the document is still in its record, and the record occupies the same amount of space; only the unused space inside the record grows.
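The two behaviours just described can be checked with a small hypothetical model. The class name, the 16-byte header, and the fixed 32-byte padding are illustrative assumptions; only the rule "dataSize is the sum of whole records" comes from the text.

```python
class CollectionStats:
    """Toy model of how dataSize reacts to deletes and in-place shrinking."""

    HEADER = 16  # assumed per-record header bytes

    def __init__(self, padding: int = 32) -> None:
        self.padding = padding
        self.records: dict[str, list[int]] = {}  # _id -> [record_bytes, doc_bytes]

    def insert(self, _id: str, doc_bytes: int) -> None:
        self.records[_id] = [self.HEADER + doc_bytes + self.padding, doc_bytes]

    def shrink(self, _id: str, new_doc_bytes: int) -> None:
        # Removing fields only shrinks the document; the record keeps its size.
        self.records[_id][1] = new_doc_bytes

    def delete(self, _id: str) -> None:
        del self.records[_id]

    def data_size(self) -> int:
        # Sum of whole records (header + document + padding), not just documents.
        return sum(record for record, _ in self.records.values())


if __name__ == "__main__":
    c = CollectionStats()
    c.insert("a", 100)
    c.insert("b", 200)
    print(c.data_size())   # 396
    c.shrink("a", 10)
    print(c.data_size())   # still 396
    c.delete("b")
    print(c.data_size())   # 148
```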

storageSize

This parameter is the sum of all data extents used by the database or collection. It will be larger than dataSize, because deleting documents leaves fragments (marked as deleted) inside the extents. Even if your storageSize is much larger than dataSize, that is not necessarily a bad sign. If a newly inserted document is no larger than a fragment, MongoDB reuses the fragment to store it. Until then, however, fragments stay where they are and keep occupying space. For this reason, this parameter does not decrease when you delete documents.
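A minimal sketch of this behaviour, assuming a first-fit reuse policy and fixed 4 KB extents (both are simplifications chosen for illustration, as is the ExtentSpace name): deletes turn records into fragments, storageSize stays put, and a later insert that fits reuses a fragment.

```python
class ExtentSpace:
    """Toy model of storageSize: extents are allocated, never released on delete."""

    def __init__(self, extent_bytes: int = 4096) -> None:
        self.extent_bytes = extent_bytes
        self.extents: list[int] = []     # sizes of allocated extents
        self.tail_free = 0               # untouched bytes at the end of the newest extent
        self.fragments: list[int] = []   # space left behind by deleted records
        self.live: dict[str, int] = {}   # _id -> bytes occupied

    def insert(self, _id: str, rec_bytes: int) -> str:
        # First fit: reuse a deleted fragment if the new record fits in it.
        for i, frag in enumerate(self.fragments):
            if rec_bytes <= frag:
                # Simplification: the record takes the whole fragment.
                self.live[_id] = self.fragments.pop(i)
                return "reused"
        if rec_bytes > self.tail_free:
            # Simplification: leftover tail space of the old extent is ignored.
            size = max(self.extent_bytes, rec_bytes)
            self.extents.append(size)
            self.tail_free = size
        self.tail_free -= rec_bytes
        self.live[_id] = rec_bytes
        return "appended"

    def delete(self, _id: str) -> None:
        self.fragments.append(self.live.pop(_id))

    def storage_size(self) -> int:
        return sum(self.extents)

    def data_size(self) -> int:
        return sum(self.live.values())
```

After inserting two 1000-byte records and deleting one, storage_size() is still 4096 while data_size() drops to 1000; an 800-byte insert then lands in the freed fragment instead of new space.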

Fragmentation can become severe as a deployment runs for a long time. You can clean up fragments with the compact command, or by building a new secondary node from scratch, letting it sync all the data, and then promoting it to primary.
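In the mongo shell, compact is run per collection, for example db.runCommand({ compact: "users" }), where "users" is a placeholder collection name. The sketch below is only a toy model of the idea behind defragmentation, not of how compact is implemented: live records are rewritten back to back so the gaps left by deletions disappear.

```python
from typing import Optional


def compact(layout: list[tuple[Optional[str], int]]) -> tuple[list[tuple[str, int, int]], int]:
    """Pack live records contiguously (toy defragmentation).

    layout: (record_id, size) pairs in on-disk order; a record_id of None
    marks a fragment left by a deleted document.
    Returns (packed, reclaimed), where packed holds (record_id, new_offset, size).
    """
    packed: list[tuple[str, int, int]] = []
    offset = reclaimed = 0
    for rec_id, size in layout:
        if rec_id is None:
            reclaimed += size          # the gap disappears after rewriting
            continue
        packed.append((rec_id, offset, size))
        offset += size
    return packed, reclaimed


if __name__ == "__main__":
    print(compact([("a", 100), (None, 50), ("b", 200), (None, 30)]))
    # ([('a', 0, 100), ('b', 100, 200)], 80)
```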

fileSize

This parameter is only available at the database level and refers to the size of the files the database uses on the actual file system. It includes the sum of all data extents, the sum of all index extents, and some unallocated space. As mentioned earlier, MongoDB pre-allocates database files when it creates them; the smallest is 64 MB even if you only have a few hundred KB of data, so this parameter can be much larger than the actual data size. This extra unused space lets MongoDB allocate new extents quickly when new data is written, avoiding delays caused by disk space allocation.

Note that this parameter does not decrease when you delete documents, or even whole collections and indexes. In other words, the disk space used by the database only grows (or stays the same); it never shrinks because data is deleted. This does not mean the space is wasted, only that a lot of space has been reserved.
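The fileSize behaviour can be sketched with a hypothetical DataFiles model. It assumes the 64 MB to 2 GB doubling rule from earlier, ignores that an extent must fit inside a single file, and ignores the early pre-allocation of the next file. Files are added as space is needed and are never removed, while freed space is simply reused.

```python
FIRST_FILE_MB = 64
MAX_FILE_MB = 2048


class DataFiles:
    """Toy model of fileSize: files are added as needed and never removed."""

    def __init__(self) -> None:
        self.files = [FIRST_FILE_MB]   # dbname.0 exists even for a tiny database
        self.used_mb = 0.0             # space currently handed out to extents

    def allocate(self, mb: float) -> None:
        self.used_mb += mb
        # Add files (doubling up to 2 GB) until the allocated space fits.
        while self.used_mb > sum(self.files):
            self.files.append(min(self.files[-1] * 2, MAX_FILE_MB))

    def free(self, mb: float) -> None:
        # Deleted space becomes reusable inside MongoDB; files are not shrunk.
        self.used_mb -= mb

    def file_size(self) -> int:
        return sum(self.files)


if __name__ == "__main__":
    d = DataFiles()
    d.allocate(0.5)
    print(d.file_size())   # 64, although only 0.5 MB is used
    d.allocate(100)
    print(d.file_size())   # 192: a 128 MB file was added
    d.free(100)
    print(d.file_size())   # still 192
```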

This article is adapted from: http://blog.mongolab.com/2014/01/how-big-is-your-mongodb/

Reposted from: http://www.mongoing.com/blog/file-storage
