MongoDB Data Models Design


1 Introduction to Data Modeling

MongoDB has a flexible schema and does not enforce a document structure on a collection. In practice, however, the documents in a collection usually share a similar structure.

The key to data modeling is balancing the needs of the application, the performance characteristics of the database engine, and the data retrieval patterns. When designing a data model, consider both how the application will use the data and the structure of the data itself.

1.1 Document Structure

The key decision in designing a data model for a MongoDB application is how to structure documents and how the application represents the relationships between its data. There are two tools for representing these relationships: references and embedded documents.

1.1.1 References:


A reference is analogous to a foreign key in a relational database: a field in one document typically stores the _id value of a document in a different collection.
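As a minimal sketch in the mongo shell (the customers/orders collections and the customer_id field are hypothetical names used only for illustration):

    // The "orders" document stores the _id of a "customers" document as a reference
    db.customers.insertOne({ _id: 1001, name: "Ada Lovelace" })
    db.orders.insertOne({ _id: 5001, customer_id: 1001, total: 25.50 })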

1.1.2 Embedded Data:


With embedded data, the value of a field is itself a BSON document, or an array whose elements are BSON documents. Related data is stored inside a single parent document.
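A minimal sketch of embedding, again with hypothetical collection and field names; the address sub-document and the phones array live inside the parent document:

    // One document holds the user plus related data as embedded BSON values
    db.users.insertOne({
      _id: 1,
      name: "Ada",
      address: { city: "London", zip: "SW1" },   // embedded document
      phones: [                                  // array of embedded documents
        { type: "home", number: "555-0100" },
        { type: "work", number: "555-0101" }
      ]
    })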

1.2 Atomic nature of write operations

Write operations in MongoDB are atomic at the level of a single document; no single write atomically modifies multiple documents or multiple collections. A denormalized model with embedded data keeps all of the data for an entity in one document, so an operation on that entity is an atomic operation on a single document. A normalized model spreads an entity across multiple collections, so modifying that entity may require multiple write operations, and those writes are not atomic as a group.
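As a sketch (reusing the hypothetical orders collection), both modifications below target one document, so they are applied together atomically; splitting the same entity across two collections would require two separate, non-atomic writes:

    // A single write: both changes succeed or fail together
    db.orders.updateOne(
      { _id: 5001 },
      {
        $set:  { status: "shipped" },
        $push: { history: { at: new Date(), state: "shipped" } }
      }
    )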

However, a structure chosen to facilitate atomic operations may limit how the application can use the data or restrict how it can be modified later.


1.3 Document Growth

Some update operations, such as appending elements to an array or adding new fields, increase the size of a document.
If the document grows beyond the space allocated to it, MongoDB relocates it on disk. The likelihood of document growth is a factor in choosing between a normalized and a denormalized model.


1.4 Use and performance of data

When designing a data model, consider how your application will use the database. For example, if your application only works with recently inserted documents, consider using a capped collection; if your application is mostly read-heavy, add indexes on commonly queried fields to improve performance, as in the sketch below.
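A brief sketch in the mongo shell (collection and field names are hypothetical):

    // Fixed-size collection that keeps only the most recently inserted documents
    db.createCollection("recent_events", { capped: true, size: 1048576 })  // 1 MB cap

    // Index for a query the read-heavy application runs frequently
    db.articles.createIndex({ author: 1, published_at: -1 })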


2 Data Modeling Concepts

2.1 Data Model Design

An efficient data model caters to the needs of the application. The key consideration for document structure is whether to use embedded documents or references.

2.1.1 Embedded Data Model

Embedding allows related pieces of information to be stored in a single document, so the application can satisfy its needs with fewer queries and update operations.

Embedded documents work well in the following situations:
1. There is a "contains" relationship between entities.
2. There is a one-to-many relationship between entities in which the "many" documents are always viewed in the context of, or as children of, the "one" parent document.



Embedded documents give good read performance: related data can be retrieved in a single database operation, and updating that related data is an atomic operation on one document.

Disadvantage: embedding can cause documents to grow after they are created, and each document must stay below the maximum BSON document size.

Use dot notation (the "." operator) to access the fields of embedded documents, as in the sketch below.
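A minimal sketch, reusing the hypothetical users collection from section 1.1.2:

    // Query into an embedded document and into an array of embedded documents
    db.users.find({ "address.city": "London" })
    db.users.find({ "phones.type": "work" })

    // Update a single embedded field in place
    db.users.updateOne({ _id: 1 }, { $set: { "address.city": "Cambridge" } })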


2.1.2 Normalized Data Model

A normalized data model uses references to describe relationships between documents.

The normalized data model is appropriate in the following situations:
1. When embedding would duplicate data, and the resulting read-performance advantage would not outweigh the cost of the duplication.
2. To represent more complex many-to-many relationships.
3. To model large hierarchical data sets.


References provide more flexibility than embedding. However, the application must issue follow-up queries to resolve the references, so a normalized data model requires more round trips between the application and MongoDB.
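A sketch of resolving a reference (hypothetical books/publishers collections); note the second query needed to follow the reference:

    // Publisher data is stored once and referenced from each book
    db.publishers.insertOne({ _id: "oreilly", name: "O'Reilly Media" })
    db.books.insertOne({ _id: 10, title: "MongoDB: The Definitive Guide", publisher_id: "oreilly" })

    // Resolving the relationship costs an extra round trip
    var book = db.books.findOne({ _id: 10 })
    var publisher = db.publishers.findOne({ _id: book.publisher_id })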

2.2 Operational factors and data models

When modeling data, the structure of the data itself must be weighed against operational factors of the database.


2.2.1 Document Growth

Operations such as pushing elements onto an array or adding new fields increase a document's size. When a document outgrows its allocated space, MongoDB reallocates space for it, which takes longer than an in-place update and can also fragment storage. Although MongoDB automatically adds padding around documents to reduce such relocations, you should avoid unbounded document growth as much as possible when designing a model.
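For example (hypothetical sessions collection), each call below makes the document larger, and unbounded growth eventually forces a relocation rather than an in-place update:

    // Appending to an array grows the document on every call
    db.sessions.updateOne(
      { _id: "abc" },
      { $push: { page_views: { url: "/home", at: new Date() } } }
    )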

2.2.2 Atomic operation

Atomicity was already covered in section 1.2 above, so it is not repeated here.

2.2.3 Sharding

MongoDB uses sharding to provide horizontal scaling. Sharded clusters support deployments with very large data sets and high-throughput operations. Sharding distributes a collection's documents across multiple mongod instances (shards), using a shard key to determine how the documents are partitioned.
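A minimal sketch of sharding a collection (the database name "app" and shard key field "user_id" are hypothetical), run against a mongos of an existing sharded cluster:

    // Enable sharding for the database, then partition the collection by shard key
    sh.enableSharding("app")
    // A non-empty collection must already have an index on the shard key
    sh.shardCollection("app.events", { user_id: 1 })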

2.2.4 Index

Use indexes to improve query performance. Typically, index the fields that are queried frequently and the fields used to sort results. MongoDB automatically creates a unique index on the _id field.
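For example (hypothetical orders collection), a single compound index can support both the filter and the sort of a common query:

    // Index on the fields used for filtering and for ordering results
    db.orders.createIndex({ customer_id: 1, created_at: -1 })

    // This query can use the index for the match and for the sort
    db.orders.find({ customer_id: 1001 }).sort({ created_at: -1 })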

Consider the following when building an index:
1. Each index requires at least 8 KB of data space.
2. Indexes can negatively impact write performance. For collections with a high write-to-read ratio, indexes are expensive because every insert must also update every index.
3. Collections with a high read-to-write ratio often benefit from additional indexes. Indexes do not affect un-indexed read operations.
4. Every active index consumes disk space and memory. This usage should be tracked, as it is significant for capacity planning, especially regarding the size of the working set.

2.2.5 Large Numbers of Collections

In some cases you may choose to store related information in several collections rather than in one, for example storing different kinds of log entries in separate log collections.
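For example, with hypothetical per-type log collections:

    // Separate collections per log type instead of one mixed "logs" collection
    db.logs_access.insertOne({ at: new Date(), path: "/home", status: 200 })
    db.logs_error.insertOne({ at: new Date(), service: "billing", msg: "timeout" })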



In general, a large number of collections causes no performance penalty and can even improve performance; keeping data in distinct collections is important for high-throughput batch processing.

When using a data model with a large number of collections, consider the following:
1. Each collection has a minimum overhead of a few kilobytes.
2. Each index, including the index on _id, requires at least 8 KB of space.
3. For each database, a single namespace file stores all of that database's metadata, and each collection and index has its own entry in the namespace file.
4. MongoDB limits the number of namespaces. To determine how many additional collections and indexes the database can support, check the current number of namespaces by running the following in the mongo shell: db.system.namespaces.count()


2.2.6 Data Lifecycle Management

Data modeling should take the data's lifecycle into account.
If your application only needs documents for a limited period of time, consider the TTL (time-to-live) feature, which expires documents automatically.
Also, if your application uses only recently inserted documents, consider capped collections. A capped collection provides FIFO (queue-like) management of documents and efficiently supports insert operations and reads that rely on insertion order.
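A sketch of a TTL index (hypothetical sessions collection and field name); documents expire roughly an hour after their created_at time:

    // TTL index: the server removes documents once created_at is older than 3600 seconds
    db.sessions.createIndex({ created_at: 1 }, { expireAfterSeconds: 3600 })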

2.3 GridFS

GridFS is a specification for storing and retrieving files that exceed the 16 MB BSON document size limit.
Instead of storing a file in a single document, GridFS divides the file into parts, or chunks, and stores each chunk as a separate document. By default, GridFS limits chunks to 255 KB. GridFS uses two collections to store a file: one holds the file's chunks of data, and the other holds the file's metadata.
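In the mongo shell you can inspect these two collections directly; with the default bucket name, the metadata lives in fs.files and the data blocks in fs.chunks:

    // Per-file metadata: length, chunkSize, uploadDate, filename, ...
    var f = db.fs.files.findOne()

    // The file's binary chunks, in order ("n" is the chunk sequence number)
    db.fs.chunks.find({ files_id: f._id }).sort({ n: 1 })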

When you query GridFS for a file, the driver or client reassembles the chunks as needed. You can run queries against files stored in GridFS and read arbitrary sections of a file, which, for example, lets you jump to the middle of a video or audio file.

GridFS is useful not only for files larger than 16 MB; it can store any file that you do not want to load entirely into memory when accessing it.


