MongoDB's storage structure and its effect on space utilization


Anyone who has used MongoDB for a while will have noticed that it tends to occupy far more disk space than the actual size of the data. If you run the db.stats() command, you will see that MongoDB reports several different sizes, such as dataSize, storageSize, and fileSize. What do these sizes mean? Let's work out the meaning of these values by looking at how MongoDB stores data.

Database File Types

MongoDB has three main kinds of database files: journal (log) files, the namespace file, and the data and index files.

Log files

Unlike in some traditional databases, MongoDB's log (journal) files are used only to recover in-memory data that had not yet been synced to disk when the system went down. The log files are stored in a separate directory. At startup, MongoDB automatically preallocates three log files of 1 GB each (initially empty). Unless you have sustained, massive concurrent writes, 3 GB is generally sufficient.

Namespace file dbname.ns

This file stores the names of all the collections and indexes in the database. It is not large: the default is 16 MB, which can hold about 24,000 collection or index names along with the location of each collection and index within the data files. Through this file, MongoDB knows where to start looking for or inserting the data of a collection or index. The file size can be raised to as much as 2 GB with a parameter.

Data files dbname.0, dbname.1, ... dbname.N

MongoDB's data and indexes are stored in one or more data files. The first data file is named "<database name>.0", for example my-db.0. Its default size is 64 MB, and MongoDB creates the next data file, such as my-db.1, shortly before the 64 MB is used up. Each new data file is twice the size of the previous one: the second is 128 MB and the third is 256 MB. The doubling stops at 2 GB, after which every new file is 2 GB.
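The doubling rule can be sketched in a few lines of Python. This is only an illustration of the sizes described above, not MongoDB code. The names data_file_sizes and allocated_for are made up for this example, and allocated_for ignores the fact that MongoDB creates the next file slightly ahead of need.

```python
from itertools import islice
from typing import Iterator, Tuple

MB = 1024 * 1024
FIRST_FILE_SIZE = 64 * MB   # size of dbname.0, as stated in the text
MAX_FILE_SIZE = 2048 * MB   # doubling stops at 2 GB, as stated in the text


def data_file_sizes() -> Iterator[int]:
    """Yield the size in bytes of dbname.0, dbname.1, ... in order."""
    size = FIRST_FILE_SIZE
    while True:
        yield size
        size = min(size * 2, MAX_FILE_SIZE)


def allocated_for(data_bytes: int) -> Tuple[int, int]:
    """Return (file count, total bytes on disk) needed to hold data_bytes.

    Simplification: the file that MongoDB creates ahead of need is ignored.
    """
    files, total = 0, 0
    for size in data_file_sizes():
        files += 1
        total += size
        if total >= data_bytes:
            return files, total
    raise AssertionError("unreachable: the generator never ends")


first_eight_mb = [size // MB for size in islice(data_file_sizes(), 8)]
small_db = allocated_for(300 * 1024)   # a few hundred KB of data
medium_db = allocated_for(200 * MB)    # 200 MB of data
```

Under this model, a database holding only a few hundred KB still occupies one full 64 MB file, and 200 MB of data needs three files (64 + 128 + 256 MB).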

MongoDB also generates some other files, such as _tmp and mongod.lock, but they are not very relevant to this discussion.

Data File Structure

Extent

Within each data file, MongoDB organizes the stored BSON documents and B-tree indexes into logical containers called extents, as shown in the following illustration (my-db.1 and my-db.2 are two data files of the database):

  • A file can contain multiple extents.
  • Each extent holds the data or the index of only one collection.
  • The data or index of a single collection can be spread across multiple extents, and those extents can be distributed across multiple files.
  • A single extent never holds both data and an index.

Records

Each extent contains multiple records. Each record consists of a header, a MongoDB BSON document, and some extra padding. Padding is unused space that MongoDB reserves when it inserts a record, so that the document does not have to be moved elsewhere when it grows. The header starts with the size of the whole record and also holds the position of the record itself and the positions of the previous and next records. You can think of the records as a doubly linked list.

Database size parameters

With this background, we can now understand the size parameters reported by db.stats().

dataSize

dataSize is the parameter closest to the real size of your data. You can use it to check how much data you have. It is the sum of the sizes of all records in the database (or collection). Note that each record carries the overhead of its header and padding in addition to the BSON document, so dataSize is slightly larger than the space the documents themselves occupy.

When you delete a document, this parameter shrinks, because it is the sum of the sizes of all records. If you do not delete the document but only remove or shrink fields inside it, dataSize is not affected. The document is still in its record, and the space occupied by the whole record does not change; only the unused space inside the record grows.
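The difference between deleting a document and shrinking one can be illustrated with a toy model. This is a sketch, not MongoDB's actual record layout: the Record class, its fields, and the 16-byte HEADER_BYTES value are all assumptions chosen only for this example.

```python
from dataclasses import dataclass
from typing import List

HEADER_BYTES = 16  # hypothetical header size, for illustration only


@dataclass
class Record:
    doc_bytes: int      # size of the BSON document
    padding_bytes: int  # unused space reserved for growth

    @property
    def size(self) -> int:
        """Total record size: header + document + padding."""
        return HEADER_BYTES + self.doc_bytes + self.padding_bytes

    def shrink_document(self, new_doc_bytes: int) -> None:
        """Shrink the document in place; the freed bytes become padding."""
        if new_doc_bytes > self.doc_bytes:
            raise ValueError("this sketch only models shrinking")
        self.padding_bytes += self.doc_bytes - new_doc_bytes
        self.doc_bytes = new_doc_bytes


def data_size(records: List[Record]) -> int:
    """dataSize in this model: the sum of all record sizes."""
    return sum(record.size for record in records)


records = [Record(doc_bytes=200, padding_bytes=40),
           Record(doc_bytes=500, padding_bytes=100)]
before = data_size(records)

records[0].shrink_document(120)    # remove some fields from the first document
after_shrink = data_size(records)  # unchanged: the record keeps its size

del records[1]                     # delete the second document entirely
after_delete = data_size(records)  # smaller: one record is gone
```

In this model, shrinking a document only moves bytes from doc_bytes to padding_bytes, so the sum stays the same, while deleting a record removes its whole size from the sum.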

storageSize

This parameter is the sum of all the data extents used by the database or collection. Note that it is larger than dataSize, because deleting a document leaves a fragment (the space of the deleted record) behind in the extent. Even if your storageSize is much larger than dataSize, that is not necessarily a bad situation. If a newly inserted document is no larger than a fragment, MongoDB reuses that fragment to store the new document. Until then, the fragment stays where it is and keeps occupying space. For this reason, this parameter does not shrink when you delete documents.
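The following toy model shows why storageSize stays flat after a delete and how a fragment can be reused. It is a sketch under simplifying assumptions and not MongoDB's allocator: ToyCollection is a made-up class, every extent has the same size, the first fragment that is large enough is taken, and a record that reuses a fragment takes it over completely.

```python
from typing import List


class ToyCollection:
    """Toy model of one collection's extents; all sizes are in bytes."""

    def __init__(self, extent_bytes: int) -> None:
        self.extent_bytes = extent_bytes  # every extent has this size (simplification)
        self.extent_count = 1             # extents allocated so far
        self.tail_free = extent_bytes     # untouched space at the end of the newest extent
        self.records: List[int] = []      # sizes of live records
        self.fragments: List[int] = []    # sizes of holes left behind by deletes

    def insert(self, size: int) -> str:
        if size > self.extent_bytes:
            raise ValueError("records larger than an extent are not modelled")
        # Reuse the first fragment that is large enough for the new record.
        for i, hole in enumerate(self.fragments):
            if size <= hole:
                # The record takes over the whole fragment; the rest acts as padding.
                self.records.append(self.fragments.pop(i))
                return "reused fragment"
        # Otherwise append at the tail, allocating a new extent if needed.
        if size > self.tail_free:
            self.extent_count += 1
            self.tail_free = self.extent_bytes
        self.tail_free -= size
        self.records.append(size)
        return "appended"

    def delete(self, index: int) -> None:
        """Deleting a record leaves a fragment of the same size."""
        self.fragments.append(self.records.pop(index))

    @property
    def data_size(self) -> int:
        return sum(self.records)

    @property
    def storage_size(self) -> int:
        return self.extent_count * self.extent_bytes


coll = ToyCollection(extent_bytes=1000)
for _ in range(3):
    coll.insert(400)  # the third insert forces a second extent
sizes_full = (coll.data_size, coll.storage_size)

coll.delete(0)        # dataSize drops, storageSize does not
sizes_after_delete = (coll.data_size, coll.storage_size)

outcome = coll.insert(300)  # fits into the 400-byte fragment
sizes_after_reuse = (coll.data_size, coll.storage_size)
```

The delete lowers data_size but leaves storage_size untouched, and the later insert fills the hole instead of consuming new space.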

Fragmentation can become serious as the database runs for longer. You can clean up the fragments with the compact command, or get rid of them by syncing all the data from a secondary onto a new machine and then promoting that machine to primary.

fileSize

This parameter is available only at the database level and is the size of the files actually used on the file system. It includes the sum of all data extents, the sum of all index extents, and some unallocated space. As mentioned earlier, MongoDB preallocates database files when it creates them: the minimum is 64 MB, even if you only have a few hundred KB of data. So this parameter can be much larger than the actual data size. The extra unused space ensures that MongoDB can quickly allocate new extents when new data is written, avoiding delays caused by allocating disk space.

Note that when you delete a document, or even a collection and its indexes, this parameter does not shrink. In other words, the disk space used by the database only grows (or stays the same) and never shrinks because data was deleted. Keep in mind that this does not mean the space is wasted; a large part of it is simply held in reserve.
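To see how the three values relate for a given database, you can derive a few numbers from the db.stats() output. The sketch below works on a plain dictionary so that it runs without a server. The function name summarize_stats and the sample numbers are made up for illustration. It also assumes the stats output contains an indexSize field, which the text above does not discuss, and it treats fileSize as optional because not every deployment reports it.

```python
from typing import Dict, Mapping


def summarize_stats(stats: Mapping[str, float]) -> Dict[str, float]:
    """Derive rough overhead figures from a db.stats()-style document."""
    data = stats["dataSize"]
    storage = stats["storageSize"]
    summary: Dict[str, float] = {
        # Space inside data extents that is not occupied by records (fragments).
        "fragment_bytes": storage - data,
    }
    file_size = stats.get("fileSize")
    if file_size is not None:
        # Assumption: fileSize = data extents + index extents + unallocated space.
        index = stats.get("indexSize", 0)
        summary["unallocated_bytes"] = file_size - storage - index
    return summary


# Made-up numbers: a few hundred KB of data in a single 64 MB data file.
sample_stats = {
    "dataSize": 300_000,
    "storageSize": 700_000,
    "indexSize": 8_192,
    "fileSize": 64 * 1024 * 1024,
}
summary = summarize_stats(sample_stats)

# With a live server the input could come from PyMongo, for example:
#   from pymongo import MongoClient
#   stats = MongoClient()["my-db"].command("dbstats")
```

With the sample numbers, most of the 64 MB file is unallocated space, which matches the point above that a large fileSize mostly reflects reserved room and not wasted room.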

This article is adapted from http://blog.mongolab.com/2014/01/how-big-is-your-mongodb/.

--This article from: http://www.mongoing.com/blog/file-storage
