MongoDB Data Model and Index learning summary

Last Update:2015-03-19 Source: Internet

Author: User


1. MongoDB Data Model:

MongoDB data storage structure:
MongoDB stores and exchanges documents in BSON (Binary JSON), a binary-encoded data format; large files are stored through GridFS. BSON inherits JSON's schema-less character: the storage structure is loose, and there is no need to define storage metadata up front as in an RDB (relational database). BSON also adds support for more data types and is optimized to make reads and writes more efficient.
(1) Data types supported by BSON:

Double, String, Object, Array, Binary Data, Undefined, ObjectId, Boolean, Date, Null, Regular Expression, JavaScript, Symbol, JavaScript (with scope), 32-bit integer, Timestamp, 64-bit integer, Min key, Max key
(2) A BSON document has the following form:

{"_id": ObjectId("542c2b97bac0595474108b48"), "ts": Timestamp(1412180887, 1), "name": "steven"}
(3) BSON is both the communication protocol and the data storage format in MongoDB: clients and servers exchange BSON documents. For example, a query is written as:

db.steven.find({"name": "steven"})
An update is written as:

db.steven.update({"name": "steven"}, {$set: {"name": "jianying"}})
And a delete is written as:

db.steven.remove({"name": "steven"})
In short, every CRUD request in MongoDB is expressed as BSON documents, and stored documents use the same format:

{"_id": ObjectId("542c2b97bac0595474108b48"), "ts": Timestamp(1412180887, 1), "name": "steven"}
(4) BSON data format encoding:
BSON strings are encoded in UTF-8: both the keys of a key-value pair and string-typed values are stored as UTF-8, so data in other encodings must be transcoded first. A key may use any UTF-8 characters, subject to the following restrictions:

a. The key cannot contain \0 (the null character).
b. $ and . have special meanings and can only be used in specific circumstances.
c. Keys starting with an underscore "_" are reserved (not strictly enforced).
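The key rules above can be illustrated with a minimal Node.js sketch. It assumes a flat document whose values are all strings, and the helper names checkKey and encodeStringDoc are hypothetical, not MongoDB APIs. Treating only a leading $ as a problem is a simplification. The byte layout follows the BSON specification, where keys are null-terminated strings, which is why a key cannot contain \0.

```javascript
// Hypothetical helper: lists which of the key restrictions a field name violates.
function checkKey(key) {
  const problems = [];
  if (key.includes("\0")) problems.push("contains the null character");
  if (key.startsWith("$")) problems.push("starts with $ (reserved for operators)");
  if (key.includes(".")) problems.push("contains . (used in nested field paths)");
  return problems;
}

// Hypothetical helper: encodes a flat document with string values only.
function encodeStringDoc(doc) {
  const parts = [];
  for (const [key, value] of Object.entries(doc)) {
    if (typeof value !== "string") throw new Error("this sketch handles strings only");
    if (key.includes("\0")) throw new Error("key cannot contain the null character");
    const keyBytes = Buffer.from(key + "\0", "utf8");     // key: null-terminated UTF-8
    const valueBytes = Buffer.from(value + "\0", "utf8"); // value: UTF-8 plus trailing null
    const valueLength = Buffer.alloc(4);
    valueLength.writeInt32LE(valueBytes.length, 0);       // length includes the trailing null
    parts.push(Buffer.from([0x02]), keyBytes, valueLength, valueBytes); // 0x02 = string type
  }
  parts.push(Buffer.from([0x00]));                        // document terminator
  const body = Buffer.concat(parts);
  const totalLength = Buffer.alloc(4);
  totalLength.writeInt32LE(body.length + 4, 0);           // total size counts its own 4 bytes
  return Buffer.concat([totalLength, body]);
}

const encoded = encodeStringDoc({ name: "steven" });
console.log(encoded.length); // 22
console.log(encoded.toString("hex"));
console.log(checkKey("$set"));
```

A real driver handles every BSON type and nested documents; the sketch only shows how a UTF-8 key and a UTF-8 string value end up in the byte stream.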
Values of other types are encoded according to the built-in rules for each data type. In terms of how the data model is organized, MongoDB supports both referencing and nesting (embedding) documents, as described next.
Data model design patterns: references and nesting:
Storing data by reference is one way to model data in MongoDB: one document stores the information needed to retrieve another document. For example:

{
  _id: "joe",
  name: "Joe Bookreader"
}

{
  patron_id: "joe",
  street: "123 Fake Street",
  city: "Faketon",
  state: "MA",
  zip: "12345"
}
The first document holds user joe's information, and the second records his address. To retrieve the address from joe's name, you must fetch the first document and then use its _id to fetch the second. The nested (embedded) design looks like this:

{
  _id: "joe",
  name: "Joe Bookreader",
  addresses: [
    {
      street: "123 Fake Street",
      city: "Faketon",
      state: "MA",
      zip: "12345"
    }
  ]
}
Both design patterns have advantages and disadvantages. The reference pattern is the normalized one: it reduces redundancy in stored data and has a clean, simple structure that matches general design principles. However, fetching the complete data costs more round trips, and MongoDB (as of the 3.0 release discussed here) cannot guarantee atomicity for operations that span multiple documents. The denormalized nested pattern has the opposite characteristics: it reduces communication cost, and atomicity is guaranteed because everything lives in a single document, but the data is redundant. Choosing how to organize your data is a trade-off between the two.
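The difference in communication cost can be sketched with plain in-memory arrays standing in for collections. This is a minimal illustration under that assumption, not MongoDB driver code; makeCollection, addressesByReference, and addressesByEmbedding are hypothetical names, and each call to find counts as one round trip.

```javascript
// Hypothetical in-memory stand-in for a collection that counts round trips.
function makeCollection(docs) {
  let roundTrips = 0;
  return {
    find(predicate) {
      roundTrips += 1; // every query is one client/server round trip
      return docs.filter(predicate);
    },
    get roundTrips() {
      return roundTrips;
    },
  };
}

// Reference pattern: patron and address live in separate collections.
const patronsRef = makeCollection([{ _id: "joe", name: "Joe Bookreader" }]);
const addressesRef = makeCollection([
  { patron_id: "joe", street: "123 Fake Street", city: "Faketon", state: "MA", zip: "12345" },
]);

// Nested pattern: the addresses are embedded in the patron document.
const patronsEmbedded = makeCollection([
  {
    _id: "joe",
    name: "Joe Bookreader",
    addresses: [{ street: "123 Fake Street", city: "Faketon", state: "MA", zip: "12345" }],
  },
]);

function addressesByReference(name) {
  const [patron] = patronsRef.find((d) => d.name === name);  // first query: the patron
  if (!patron) return [];
  return addressesRef.find((d) => d.patron_id === patron._id); // second query: the addresses
}

function addressesByEmbedding(name) {
  const [patron] = patronsEmbedded.find((d) => d.name === name); // single query
  return patron ? patron.addresses : [];
}

const viaReference = addressesByReference("Joe Bookreader");
const viaEmbedding = addressesByEmbedding("Joe Bookreader");
console.log(viaReference[0].city, patronsRef.roundTrips + addressesRef.roundTrips);
console.log(viaEmbedding[0].city, patronsEmbedded.roundTrips);
```

Both lookups return the same address; the reference version needs two queries where the embedded version needs one.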

Notes:
(1) A MongoDB document cannot exceed 16 MB. If your data is larger than that, consider using GridFS.
(2) If an update grows a document beyond the space originally allocated to it, MongoDB moves the document to another location on disk. Moving a document takes longer than an in-place update and can also cause disk fragmentation.
(3) In MongoDB, atomicity of operations is guaranteed at the document level.
(4) BSON strings are encoded in UTF-8.

2. MongoDB index structure:
Index types supported by MongoDB:
MongoDB organizes indexes as B-trees, which efficiently support equality and range queries, and it can index any field in a document, whether the field holds a single value, an array, text, or a nested structure. In other words, indexing is supported across the whole BSON storage format. With so many powerful index types available, index design has a large impact on performance. MongoDB 3.0, the latest version at the time of writing, supports the following index types:

Index types:
Default _id index: MongoDB builds a unique index on the _id field by default. Every document has an _id field.
Single field index: Builds an index on one field of a document, or on one field of a nested document.
Compound index: Combines multiple fields into one index. The fields form a hierarchy within the index, each field ordered within the one before it.
Multikey index: The index structure for array fields; an index entry is created for each value in the array.
Geospatial index: Built for geographic coordinate data; it makes it efficient to locate coordinates within a range.
Text index: Supports search-engine-style text search, which involves word segmentation. Chinese is not supported, and the query syntax is fairly limited.
Hashed index: Designed to support hash-based sharding (a deployment method); it supports only equality matches, not range queries.
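To make the multikey entry above concrete, here is a minimal in-memory sketch of how one index entry per array value could be produced. It is an illustration only, not how MongoDB stores its B-tree; buildMultikeyIndex and lookupMultikey are hypothetical names, and the sample documents are invented.

```javascript
// Hypothetical sketch: build sorted (key, _id) entries, one per distinct array value.
function buildMultikeyIndex(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const raw = doc[field];
    const values = Array.isArray(raw) ? raw : [raw]; // a scalar yields a single entry
    for (const key of new Set(values)) {
      entries.push({ key, id: doc._id });            // one entry per distinct value
    }
  }
  // Keep entries ordered by key, as an ordered index would.
  entries.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return entries;
}

// Equality lookup: collect the _id of every entry with a matching key.
function lookupMultikey(index, key) {
  return index.filter((entry) => entry.key === key).map((entry) => entry.id);
}

const articles = [
  { _id: 1, tags: ["mongodb", "index"] },
  { _id: 2, tags: ["index", "bson"] },
  { _id: 3, tags: "bson" },
];

const tagIndex = buildMultikeyIndex(articles, "tags");
console.log(tagIndex.length);
console.log(lookupMultikey(tagIndex, "index"));
```

Because every array element gets its own entry, a query on a single tag value can find documents whose array contains that value without scanning the arrays themselves.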
Those are the index types. An index of any type can also carry the following properties:

Index properties:
(1) Unique index: The same concept as a unique index in an RDB (relational database); it prevents duplicate values.
Created like this:

db.members.createIndex({"user_id": 1}, {unique: true})
(2) Sparse index: A sparse index only contains entries for documents that have the indexed field, and ignores documents that do not.
Created like this:

db.addresses.createIndex({"xmpp_id": 1}, {sparse: true})
(3) TTL index: TTL stands for "time to live". Documents in the collection carry an expiration time and are deleted automatically once their lifetime is over. Log data, temporary data generated by the system, and session data all fit this scenario.
Created like this:

db.log_events.createIndex({"createdAt": 1}, {expireAfterSeconds: 3600})
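The three properties can be mimicked in a few lines of plain JavaScript. This is a rough sketch of the behavior described above, not MongoDB's implementation; buildIndex and expiredIds are hypothetical helpers, the sample documents are invented, and the expiry rule assumes a document expires once its date value plus expireAfterSeconds has passed.

```javascript
// Hypothetical sketch: build key -> [_id] entries honoring unique and sparse options.
function buildIndex(docs, field, options = {}) {
  const entries = new Map();
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (!hasField && options.sparse) continue;   // sparse: skip documents without the field
    const key = hasField ? doc[field] : null;    // otherwise a missing field is indexed as null
    if (options.unique && entries.has(key)) {
      throw new Error(`duplicate key: ${key}`);  // unique: reject a repeated value
    }
    if (!entries.has(key)) entries.set(key, []);
    entries.get(key).push(doc._id);
  }
  return entries;
}

// Hypothetical sketch of the TTL rule: which documents are past their lifetime at `now`.
function expiredIds(docs, field, expireAfterSeconds, now) {
  return docs
    .filter((doc) => doc[field] instanceof Date)  // only date values can expire
    .filter((doc) => doc[field].getTime() + expireAfterSeconds * 1000 <= now.getTime())
    .map((doc) => doc._id);
}

const sampleAddresses = [{ _id: 1, xmpp_id: "joe@example.com" }, { _id: 2 }];
console.log(buildIndex(sampleAddresses, "xmpp_id", { sparse: true }).size);
console.log(buildIndex(sampleAddresses, "xmpp_id").size);

const sampleEvents = [
  { _id: 1, createdAt: new Date("2015-03-19T10:30:00Z") },
  { _id: 2, createdAt: new Date("2015-03-19T11:30:00Z") },
];
console.log(expiredIds(sampleEvents, "createdAt", 3600, new Date("2015-03-19T12:00:00Z")));
```

With a 3600-second lifetime, only the event created 90 minutes before the reference time is reported as expired.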
Index structure and characteristics:
(1) B-tree structure and ordered storage: MongoDB indexes are organized as B-trees, which support efficient equality and range queries. The entries inside an index are kept in order, so results read through the index come back in index order.
(2) Sort order of indexes: When building an index you can specify whether its entries are ordered ascending or descending. For a single-field index the choice makes no difference, because the index can be traversed in either direction, but for a compound index it matters. A compound index is organized hierarchically, field by field, and choosing the wrong ascending or descending order can have a large impact on performance.
(3) Intersection of indexes: Since version 2.6, the query optimizer supports index intersection, so multiple indexes can be combined to retrieve data more efficiently. For example, you can build two separate indexes; when a query's conditions involve both of them, the query plan can combine the two indexes automatically.
For example, suppose the following two indexes exist:

{qty: 1}
{item: 1}
Then the following query can use both indexes:

db.orders.find({item: "abc123", qty: {$gt: 15}})
Index intersection also covers:

Index prefix intersection: This mainly concerns compound indexes; the query plan can use a prefix of a compound index when intersecting.
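The idea behind intersecting the {item: 1} and {qty: 1} indexes can be sketched in memory: scan each single-field index for the matching document ids, then keep only the ids present in both results. This is a simplified illustration with invented sample data, not the query planner's actual algorithm; buildFieldIndex, scanIndex, and intersectIds are hypothetical names.

```javascript
// Hypothetical sketch: a single-field index as a sorted list of (key, _id) entries.
function buildFieldIndex(docs, field) {
  return docs
    .map((doc) => ({ key: doc[field], id: doc._id }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Collect the ids of all index entries whose key satisfies the predicate.
function scanIndex(index, predicate) {
  return new Set(index.filter((entry) => predicate(entry.key)).map((entry) => entry.id));
}

// Keep only ids that appear in both scans.
function intersectIds(left, right) {
  return [...left].filter((id) => right.has(id)).sort((a, b) => a - b);
}

const sampleOrders = [
  { _id: 1, item: "abc123", qty: 20 },
  { _id: 2, item: "abc123", qty: 10 },
  { _id: 3, item: "xyz789", qty: 30 },
  { _id: 4, item: "abc123", qty: 16 },
];

const itemIndex = buildFieldIndex(sampleOrders, "item");
const qtyIndex = buildFieldIndex(sampleOrders, "qty");

// Equivalent in spirit to: db.orders.find({item: "abc123", qty: {$gt: 15}})
const matchingIds = intersectIds(
  scanIndex(itemIndex, (key) => key === "abc123"),
  scanIndex(qtyIndex, (key) => key > 15)
);
console.log(matchingIds);
```

Neither index alone answers the query, but each narrows the candidates, and only documents 1 and 4 satisfy both conditions.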
Index analysis method:
(1) Evaluate RAM capacity and try to make sure the indexes fit in memory:
Commands to query index size (in bytes):

db.collection.totalIndexSize()
db.collection.stats()
(2) Analyze the query plan to see how indexes are used:
Use explain to view, and hint to control, which index a query uses:

db.collection.find().explain()
The output shows which index the query plan chose and whether index intersection was used.

db.collection.find().hint({"name": 1})
hint forces the query to use the specified index.

(3) Index management information: Each DB contains a system.indexes collection, which records the metadata of every index built in that DB. For a single collection, db.collection.getIndexes() lists the same information.

db.system.indexes.find()
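When reading explain output for item (2), it can help to pull out just the index names. The sketch below walks a plan tree and collects them. It assumes the 3.0-style explain layout, in which the winning plan is a tree of stages linked through inputStage or inputStages and index scans carry stage "IXSCAN" and an indexName. The sample plan is hand-written for illustration, not captured from a server, and indexesUsed is a hypothetical helper.

```javascript
// Hypothetical helper: collect the index names used by IXSCAN stages in a plan tree.
function indexesUsed(stage, found = []) {
  if (!stage) return found;
  if (stage.stage === "IXSCAN" && stage.indexName) {
    found.push(stage.indexName);
  }
  if (stage.inputStage) indexesUsed(stage.inputStage, found); // single child stage
  for (const child of stage.inputStages || []) {              // several children, e.g. intersection
    indexesUsed(child, found);
  }
  return found;
}

// Hand-written sample shaped like explain() output; not real server output.
const sampleExplain = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: {
        stage: "AND_HASH",
        inputStages: [
          { stage: "IXSCAN", indexName: "item_1" },
          { stage: "IXSCAN", indexName: "qty_1" },
        ],
      },
    },
  },
};

console.log(indexesUsed(sampleExplain.queryPlanner.winningPlan));
```

A plan that reports two index names under one intersection stage indicates that index intersection was chosen; a plan with no IXSCAN stage at all means no index was used.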
Notes:
(1) Each index needs at least 8 KB of space.
(2) MongoDB automatically creates a unique index on the _id field.
(3) TTL collections are implemented with a special index type. TTL relies on a background thread in mongod, which reads the date values in the index and deletes expired documents from the collection.