MongoDB Operational Factors and Data Models


The data model of a MongoDB application depends on the data itself, but also on the operational characteristics of MongoDB. For example, different data models can improve application query efficiency, increase the throughput of insert and update operations, or make a sharded cluster more efficient.

These operational requirements sit outside the application itself, but they affect any application that uses MongoDB as its database. When creating a data model, consider the application's read and write operations in the following scenarios.

Document Growth

Updating a document may increase its size, for example by adding elements to an array or adding new fields. If the updated document no longer fits in the space allocated for it on disk, MongoDB relocates the document. Relocation takes longer than an in-place update and can cause fragmentation. Although MongoDB pads document allocations to minimize the likelihood of relocation, document growth should still be avoided where possible when modeling data.

For example, if your application's updates will grow documents, consider restructuring the data model to use references between documents instead of a denormalized (embedded) data model.

MongoDB's adaptive allocation strategies help reduce document relocation, but you may also want to use a pre-allocation pattern to avoid document growth altogether.
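The pre-allocation idea can be illustrated with a minimal in-memory sketch (the document shape and function names here are hypothetical, not MongoDB APIs): create the document at its final size with placeholder slots, so later updates overwrite a slot in place instead of growing the document.

```javascript
// Hypothetical sketch of pre-allocation: the document is created with a
// fixed-size placeholder array, so subsequent updates overwrite a slot
// in place rather than appending (which would grow the document).
function preallocatedDoc(userId, slots) {
  return {
    _id: userId,
    // Placeholder slots created up front, at the document's final size.
    history: new Array(slots).fill({ ts: null, event: null }),
    next: 0, // index of the next slot to overwrite
  };
}

// Record an event by overwriting the next placeholder slot (wraps around).
function recordEvent(doc, event) {
  doc.history[doc.next] = { ts: Date.now(), event };
  doc.next = (doc.next + 1) % doc.history.length;
  return doc;
}

const doc = preallocatedDoc("user1", 3);
recordEvent(doc, "login");
console.log(doc.history.length); // stays 3: the document never grows
```

The same pattern applies in the mongo shell: insert the document with its placeholder fields first, then use positional updates that replace existing values.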

Atomicity

In MongoDB, write operations are atomic at the document level. No write operation can atomically affect more than one document or collection. To modify multiple documents in a collection, you must perform the operation on each document individually. Store all fields that must be updated atomically in a single document; if the application can tolerate non-atomic updates, the data can be stored across multiple documents.

An embedded data model keeps related data in one document, which makes these atomic operations possible. With a data model that stores related data by reference, the application must issue separate read and write operations to query and modify the related data.
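The contrast can be made concrete with a small in-memory sketch (the order/items schema is a hypothetical example, not from the original text): with embedding, one document-level write covers everything; with references, the application must issue one write per document and loses cross-document atomicity.

```javascript
// Hypothetical sketch contrasting the two models. With embedding, an order
// and its items live in one document, so one atomic write updates both;
// with references, the same change needs multiple separate writes.

// Embedded model: one document, one atomic update.
const orderEmbedded = { _id: 1, status: "new", items: [{ sku: "a", qty: 1 }] };
function shipEmbedded(order) {
  // Both fields change in a single document-level operation.
  order.status = "shipped";
  order.items.forEach((i) => (i.shipped = true));
  return 1; // number of write operations needed
}

// Referenced model: items live in a second collection, so the application
// must issue one write per document, with no atomicity across them.
const orderRef = { _id: 1, status: "new" };
const itemsRef = [{ _id: 10, orderId: 1, sku: "a", qty: 1 }];
function shipReferenced(order, items) {
  order.status = "shipped";                 // write 1
  items.forEach((i) => (i.shipped = true)); // writes 2..n
  return 1 + items.length;
}

const embeddedWrites = shipEmbedded(orderEmbedded);
const referencedWrites = shipReferenced(orderRef, itemsRef);
console.log(embeddedWrites, referencedWrites); // 1 2
```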

Sharding

MongoDB provides horizontal scaling through sharding. Sharded clusters support deployments with large data sets and high-throughput operations. Sharding partitions a collection in a database across multiple mongod instances (shards).

MongoDB uses the shard key to distribute data and traffic. Selecting an appropriate shard key has significant performance implications: it can enable or prevent query isolation and increased write capacity. Consider the shard key field carefully.
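A toy sketch of why the shard key matters (the hash function and routing logic below are illustrative only, not MongoDB's actual md5-based hashed sharding): a stable hash of the shard key field picks the target shard, so writes spread across shards, while a query that includes the key can be routed to a single shard.

```javascript
// Hypothetical sketch of hashed shard-key routing. Not MongoDB internals:
// a tiny deterministic string hash stands in for the real hash function.
function hashKey(value) {
  let h = 0;
  for (const ch of String(value)) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h;
}

// Route a document to one of `shardCount` shards by hashing its shard key.
function routeToShard(doc, shardKeyField, shardCount) {
  return hashKey(doc[shardKeyField]) % shardCount;
}

const shard = routeToShard({ userId: "alice", total: 9 }, "userId", 3);
console.log(shard >= 0 && shard < 3); // true: always lands on a valid shard
```

Because the routing is deterministic, a query that supplies `userId` can be answered by one shard (query isolation), while a query without it must be broadcast to all shards.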

Indexes

Use indexes to improve the performance of common queries. Build indexes on fields that appear frequently in queries and on fields used to sort results. MongoDB automatically creates a unique index on the _id field.

When using indexes in MongoDB, consider the following:

(1) Each index requires at least 8 KB of data space.

(2) Adding an index has a negative impact on write performance. For collections with a high write-to-read ratio, indexes are expensive because each insert or update must also update the indexes.

(3) Collections with a high read-to-write ratio usually benefit from additional indexes; indexes do not affect un-indexed read operations.

(4) When active, each index consumes disk space and memory, and this usage can be significant. Track index usage as part of capacity planning, especially relative to the working set size.
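The read/write trade-off in points (2) and (3) can be sketched with a toy in-memory "index" (a Map from field value to document positions, purely illustrative): every insert pays an extra index-maintenance write, but lookups avoid scanning the whole collection.

```javascript
// Hypothetical sketch of the index trade-off: inserts must also update a
// secondary structure (extra write cost), while lookups go from a full
// scan to a direct probe (read benefit).
const docs = [];
const index = new Map(); // field value -> array of positions in `docs`

function insertWithIndex(doc) {
  docs.push(doc);               // the base write
  const pos = docs.length - 1;
  if (!index.has(doc.user)) index.set(doc.user, []);
  index.get(doc.user).push(pos); // the extra index-maintenance write
}

function findByUser(user) {
  // Indexed read: probe the Map, then fetch matching documents directly.
  return (index.get(user) || []).map((p) => docs[p]);
}

insertWithIndex({ user: "a", n: 1 });
insertWithIndex({ user: "b", n: 2 });
insertWithIndex({ user: "a", n: 3 });
console.log(findByUser("a").length); // 2
```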

Large Numbers of Collections

In some cases, you may want to store related data in several distinct collections rather than in a single collection.

Consider a logs collection that stores log documents for different environments and applications. The logs collection contains documents of the following form:

{ log: "dev", ts: ..., info: ... }

{ log: "debug", ts: ..., info: ... }

If the total number of documents is low, you can group them into collections by type: for logs, maintain separate collections such as logs_dev and logs_debug, where the logs_dev collection contains only documents related to the development environment.
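A minimal sketch of this per-environment split (the router function is hypothetical): a small helper maps each log document to a collection name, and the application writes to that collection.

```javascript
// Hypothetical router: map a log document to a per-environment collection
// name such as "logs_dev" or "logs_debug", following the split described above.
function collectionFor(logDoc) {
  // Assumes the environment is stored in the `log` field, as in the
  // example documents shown earlier.
  return "logs_" + logDoc.log;
}

console.log(collectionFor({ log: "dev", ts: 1, info: "x" }));   // "logs_dev"
console.log(collectionFor({ log: "debug", ts: 2, info: "y" })); // "logs_debug"
```

In the mongo shell the application would then insert into the returned collection, e.g. `db[collectionFor(doc)].insert(doc)`.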

In general, having a large number of collections carries no significant performance penalty and can result in very good performance. Distinct collections are important for high-throughput batch processing.

When designing models with a large number of collections, consider the following:

(1) Each collection has a minimum overhead of a few kilobytes.

(2) Each index, including the _id index, requires at least 8 KB of data space.

(3) Each database has a namespace file (.ns) that stores all of its metadata, and each index and collection has its own entry in it. The size of a MongoDB namespace file is limited and cannot exceed 2047 MB.

(4) MongoDB limits the number of namespaces: the limit equals the namespace file size divided by 628 (the bytes per namespace entry). To see how many namespaces are currently in use, and therefore how many remain available, run the following command in the mongo shell:

db.system.namespaces.count()

The number of available namespaces depends on the size of the .ns file. The default namespace file size is 16 MB.
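Applying the formula above, a small helper (hypothetical, for illustration) computes the namespace limit for a given .ns file size:

```javascript
// Hypothetical helper applying the formula above: maximum number of
// namespaces = namespace file size in bytes / 628 bytes per entry.
function maxNamespaces(nsFileSizeMB) {
  return Math.floor((nsFileSizeMB * 1024 * 1024) / 628);
}

console.log(maxNamespaces(16)); // 26715 with the default 16 MB file
```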

To change the namespace file size, pass the following option when starting the server:

--nssize <new size in MB>

For an existing database, start the server with the --nssize option and then run the following command:

db.repairDatabase()

Data lifecycle management

Data modeling should also take data lifecycle management into account.

The Time to Live (TTL) feature expires a collection's documents after a period of time. Consider using TTL if your application only needs data to persist in the database for a limited time.

In addition, if the application only uses recently inserted documents, consider capped collections. A capped collection works first-in-first-out: documents are inserted and read in insertion order, and the oldest documents are overwritten once the size limit is reached.
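The FIFO behavior can be sketched with a fixed-size in-memory buffer (illustrative only, not how MongoDB implements capped collections on disk): inserts past the cap evict the oldest document, and reads return documents in insertion order.

```javascript
// Hypothetical in-memory sketch of capped-collection behavior: a fixed-size
// buffer where inserts past the cap evict the oldest document (FIFO), and
// reads return documents in insertion (natural) order.
class CappedBuffer {
  constructor(max) {
    this.max = max;
    this.docs = [];
  }
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.max) this.docs.shift(); // drop the oldest
  }
  find() {
    return this.docs.slice(); // insertion order
  }
}

const c = new CappedBuffer(2);
c.insert({ n: 1 });
c.insert({ n: 2 });
c.insert({ n: 3 }); // evicts { n: 1 }
console.log(c.find().map((d) => d.n)); // [ 2, 3 ]
```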
