MongoDB Indexing Principles

Source: Internet
Author: User
Tags createindex sorts

When you insert documents into a collection, each document is persisted by the underlying storage engine, which also produces location information that can later be used to read the document back. In the MMAPv1 engine, the location information is "file ID + offset within the file". In the WiredTiger storage engine (a key-value storage engine), the location information is a key that WiredTiger generates when it stores the document, and the document can be retrieved through this key. For simplicity, the rest of this article uses pos (short for position) to represent location information.

Location Information    Document
pos1    {"name": "jack", "age": 19}
pos2    {"name": "rose", "age": 20}
pos3    {"name": "jack", "age": 18}
pos4    {"name": "tony", "age": 21}
pos5    {"name": "adam", "age": 18}

Suppose you run the query db.person.find({age: 18}) to find everyone aged 18. Without an index, MongoDB must traverse all the documents (a "full table scan"), read each document by its location information, and compare its age field with 18. With only 5 documents the cost of a full scan is small, but when the collection grows to millions of documents or more, a full scan becomes very expensive, and a single query can take 10 seconds or even several minutes.
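The full scan described above can be sketched in plain JavaScript. This is only an illustration: the `storage` Map is a hypothetical in-memory stand-in for the storage engine, and `fullScan` is a made-up helper, not a MongoDB API.

```javascript
// Hypothetical stand-in for the storage engine: pos -> document.
const storage = new Map([
  ["pos1", { name: "jack", age: 19 }],
  ["pos2", { name: "rose", age: 20 }],
  ["pos3", { name: "jack", age: 18 }],
  ["pos4", { name: "tony", age: 21 }],
  ["pos5", { name: "adam", age: 18 }],
]);

// Full scan: read every document and compare its age field.
function fullScan(store, age) {
  const matches = [];
  let examined = 0;
  for (const [pos, doc] of store) {
    examined += 1;
    if (doc.age === age) matches.push(pos);
  }
  return { matches, examined };
}

const scanResult = fullScan(storage, 18);
console.log(scanResult.matches, scanResult.examined);
```

The number of documents examined always equals the size of the collection, which is why the cost grows with the data.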

To speed up db.person.find({age: 18}), consider creating an index on the age field of the person collection.

db.person.createIndex({age: 1})  // create an ascending index on the age field

After the index is created, MongoDB stores an additional copy of the index data, sorted by the age field, similar to the table below. The index is typically persisted in a B-tree-like structure so that the location information for a given age value can be found quickly, in O(log N) time. The corresponding document can then be read using that location information.

Age    Location Information
18    pos3
18    pos5
19    pos1
20    pos2
21    pos4
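As a rough sketch of why a sorted index makes lookups cheap, the table above can be modeled as a sorted array searched with binary search. A sorted array stands in for the B-tree here, which is an assumption made for brevity, and `lowerBound` and `indexLookup` are hypothetical helper names.

```javascript
// Index entries sorted by age, mirroring the table above: [age, pos].
const ageIndex = [
  [18, "pos3"],
  [18, "pos5"],
  [19, "pos1"],
  [20, "pos2"],
  [21, "pos4"],
];

// Binary search for the first entry whose key is >= target (O(log N)).
function lowerBound(entries, target) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (entries[mid][0] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Collect the positions of all entries whose key equals the target.
function indexLookup(entries, target) {
  const positions = [];
  let i = lowerBound(entries, target);
  while (i < entries.length && entries[i][0] === target) {
    positions.push(entries[i][1]);
    i += 1;
  }
  return positions;
}

console.log(indexLookup(ageIndex, 18));
```

Only the matching entries and a logarithmic number of probes are touched, instead of every document.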

Simply put, an index organizes the documents in the order of one or more fields so that queries on those fields are efficient. With an index, you can speed up at least the following scenarios:

    • Query: for example, find everyone aged 18
    • Update/delete: for example, update or delete the information of everyone aged 18. An update or delete must first find all the documents that match the criteria, so this is essentially query optimization
    • Sort: for example, sort everyone by age. Without an index, all the documents must be scanned and the scan results then sorted

MongoDB generates the _id field by default for every inserted document (if the application does not specify it). _id is the document's unique identifier, and so that documents can be looked up efficiently by _id, MongoDB creates an index on the _id field of every collection by default.

db.person.getIndexes()  // query the index information of a collection
[
    {
        "ns": "test.person",  // collection name
        "v": 1,               // index version
        "key": {              // indexed fields and sort direction
            "_id": 1          // ascending index on the _id field
        },
        "name": "_id_"        // index name
    }
]

MongoDB Index Types

MongoDB supports multiple types of indexes, including single-field indexes, compound indexes, multikey indexes, text indexes, and so on. Each type of index suits different usage scenarios.

Single Field Index

db.person.createIndex({age: 1})

The statement above creates a single-field index on age, which speeds up all kinds of queries on the age field. This is the most common form of index, and the _id index that MongoDB creates by default is also of this type.

{age: 1} denotes an ascending index. You can also specify a descending index with {age: -1}. For a single-field index, ascending and descending have the same effect.
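One way to see why the direction does not matter for a single field: a sorted index can be walked from either end. The sketch below models that with an array. `scanIndex` is a hypothetical helper, and the array again stands in for the real index structure.

```javascript
// Ascending single-field index on age: [age, pos].
const ascendingIndex = [
  [18, "pos3"],
  [18, "pos5"],
  [19, "pos1"],
  [20, "pos2"],
  [21, "pos4"],
];

// Walk the same index forward or backward and return the ages in scan order.
function scanIndex(entries, direction) {
  const ordered = direction === "backward" ? [...entries].reverse() : entries;
  return ordered.map(([age]) => age);
}

console.log(scanIndex(ascendingIndex, "forward"));
console.log(scanIndex(ascendingIndex, "backward"));
```

A forward walk yields ascending ages and a backward walk yields descending ages, so one index can serve both sort orders.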

Compound Index

A compound index is an upgraded version of the single-field index. It indexes several fields together: entries are sorted by the first field, entries with the same first-field value are sorted by the second field, and so on. For example, db.person.createIndex({age: 1, name: 1}) creates a compound index on the two fields age and name.

The data for the index above is organized like the following table. Entries with the same age are sorted by name, so, unlike in the {age: 1} index, the document at pos5 ranks before the document at pos3.

Age, Name    Location Information
18, adam    pos5
18, jack    pos3
19, jack    pos1
20, rose    pos2
21, tony    pos4
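The ordering in the table above can be reproduced with a small sketch that sorts by age first and by name second. The helper names are hypothetical, and a linear filter is used in place of a tree seek to keep the example short.

```javascript
// Sample documents from the article, with their location information.
const persons = [
  { pos: "pos1", name: "jack", age: 19 },
  { pos: "pos2", name: "rose", age: 20 },
  { pos: "pos3", name: "jack", age: 18 },
  { pos: "pos4", name: "tony", age: 21 },
  { pos: "pos5", name: "adam", age: 18 },
];

// Compare two strings by code unit, without locale rules.
function compareStrings(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Build [age, name, pos] entries sorted by age first, then by name.
function buildCompoundIndex(docs) {
  return docs
    .map((doc) => [doc.age, doc.name, doc.pos])
    .sort((a, b) => a[0] - b[0] || compareStrings(a[1], b[1]));
}

const compoundIndex = buildCompoundIndex(persons);

// Entries sharing the leading field (age) sit next to each other in the index.
// A linear filter is used here for brevity instead of a tree seek.
function findByAge(index, age) {
  return index.filter(([a]) => a === age).map((entry) => entry[2]);
}

console.log(compoundIndex.map((entry) => entry[2]));
console.log(findByAge(compoundIndex, 18));
```

Because all entries for one age are contiguous, a query on age alone can still use this index.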

A compound index can serve more kinds of queries than a single-field index. It can serve queries that combine several fields, for example db.person.find({age: 18, name: "jack"}), and it can also serve queries that match a prefix of the index. Here {age: 1} is a prefix of {age: 1, name: 1}, so a query such as db.person.find({age: 18}) is also accelerated by the index, but db.person.find({name: "jack"}) cannot use it. If you frequently query by the name field alone and by the combination of name and age, you should instead create a compound index such as db.person.createIndex({name: 1, age: 1}).

Besides the query requirements, the value distribution of the fields is another important consideration for the field order of an index. Even if every query on the person collection is a "name and age combination" query (specifying a particular name and age), the order of the fields still matters.

The age field has a limited range of values, so many documents share the same age, whereas the name field has far more distinct values, so few documents share the same name. It is clearly more efficient to look up by name first and then find the age among the documents with that name.
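Returning to the prefix rule for compound indexes, a simplified model is to count how many leading index fields a query specifies. This sketch only covers equality queries and is an assumption-laden approximation of the real query planner. `usablePrefixLength` is a hypothetical helper.

```javascript
// Count how many leading fields of the index are present in the query.
// A result of 0 means the index cannot narrow down this query in this model.
function usablePrefixLength(indexFields, queryFields) {
  let n = 0;
  while (n < indexFields.length && queryFields.includes(indexFields[n])) {
    n += 1;
  }
  return n;
}

// Index {age: 1, name: 1}
console.log(usablePrefixLength(["age", "name"], ["age", "name"]));
console.log(usablePrefixLength(["age", "name"], ["age"]));
console.log(usablePrefixLength(["age", "name"], ["name"]));

// Index {name: 1, age: 1}
console.log(usablePrefixLength(["name", "age"], ["name"]));
```

In this model, {name: 1, age: 1} covers both name-only queries and name-plus-age queries, while {age: 1, name: 1} gives no help to a name-only query.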

Multikey Index

When the indexed field is an array, the index created is called a multikey index. A multikey index creates an index entry for each element of the array. For example, if the person collection adds a habbit field (an array) that describes hobbies, a multikey index on the habbit field can be used to find people who share the same hobby.
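The "one entry per array element" behavior can be sketched as follows. The documents are hypothetical (the second one is invented for illustration), and `buildMultikeyIndex` and `findByElement` are made-up helper names rather than MongoDB APIs.

```javascript
// Hypothetical documents with an array field.
const hobbyDocs = [
  { pos: "pos1", name: "jack", habbit: ["football", "running"] },
  { pos: "pos2", name: "rose", habbit: ["running"] },
];

// Emit one [element, pos] entry per array element, sorted by element.
function buildMultikeyIndex(documents, field) {
  const entries = [];
  for (const doc of documents) {
    for (const value of doc[field] || []) {
      entries.push([value, doc.pos]);
    }
  }
  return entries.sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Find the positions of all documents whose array contains the element.
function findByElement(index, value) {
  return index.filter(([v]) => v === value).map((entry) => entry[1]);
}

const habbitIndex = buildMultikeyIndex(hobbyDocs, "habbit");
console.log(habbitIndex);
console.log(findByElement(habbitIndex, "running"));
```

A document with two hobbies contributes two index entries, so it can be found through either element.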

{"name": "jack", "age": 19, habbit: ["football", "running"]}
db.person.createIndex({habbit: 1})  // automatically creates a multikey index
db.person.find({habbit: "football"})

Other types of indexes

A hashed index builds the index on the hash value of a field. It is currently used mainly for hashed sharding in MongoDB sharded clusters. A hashed index can only serve exact-match queries on the field and cannot serve range queries.
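A toy model of this limitation is shown below: the index is keyed by a hash of the value, so only exact lookups are possible. `toyHash` is an invented string hash used purely for illustration and is not the hash function MongoDB uses. The other helper names are hypothetical as well.

```javascript
// Toy string hash for illustration only (NOT MongoDB's hash function).
function toyHash(value) {
  let h = 0;
  for (const ch of String(value)) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0;
  }
  return h;
}

// Build a Map from hash(value) to the positions of documents with that value.
function buildHashedIndex(documents, field) {
  const index = new Map();
  for (const doc of documents) {
    const key = toyHash(doc[field]);
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc.pos);
  }
  return index;
}

// Exact-match lookup: hash the value and read the bucket.
// A real engine would also re-check the fetched document against the query.
function findEqual(index, value) {
  return index.get(toyHash(value)) || [];
}

const hashedDocs = [
  { pos: "pos1", name: "jack" },
  { pos: "pos2", name: "rose" },
  { pos: "pos3", name: "jack" },
  { pos: "pos4", name: "tony" },
  { pos: "pos5", name: "adam" },
];
const nameHashIndex = buildHashedIndex(hashedDocs, "name");
console.log(findEqual(nameHashIndex, "jack"));
```

The Map offers no way to ask for "all names between a and k" without visiting every bucket, which mirrors why range queries cannot use a hashed index.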

A geospatial index is a good fit for application scenarios such as "find nearby restaurants" or "find the stations within an area".

A text index addresses the need for fast text search. For example, given a collection of blog posts that must be found quickly by their content, you can create a text index on the post content.

Index Properties

In addition to supporting many different types of indexes, MongoDB can also customize some special properties for indexes.

    • Unique index: guarantees that no two documents have the same value for the indexed field; the _id index is a unique index
    • TTL index: lets you specify, on a time field, when a document expires (after a given period or at a given point in time)
    • Partial index: indexes only the documents that meet a given filter condition; supported since version 3.2
    • Sparse index: indexes only the documents that contain the indexed field; it can be seen as a special case of a partial index
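The relationship between partial and sparse indexes in the list above can be sketched as a filter applied while building the index. The documents are hypothetical (one deliberately lacks the age field) and `buildPartialIndex` is a made-up helper. In the mongo shell, the corresponding createIndex options are partialFilterExpression and sparse.

```javascript
// Hypothetical documents; the one at pos3 has no age field.
const mixedDocs = [
  { pos: "pos1", name: "jack", age: 19 },
  { pos: "pos2", name: "rose", age: 20 },
  { pos: "pos3", name: "jack" },
  { pos: "pos4", name: "tony", age: 21 },
];

// Index only the documents accepted by the filter, sorted by the field.
function buildPartialIndex(documents, field, filter) {
  return documents
    .filter((doc) => filter(doc))
    .map((doc) => [doc[field], doc.pos])
    .sort((a, b) => a[0] - b[0]);
}

// Partial index: only documents with age >= 20.
const partialIndex = buildPartialIndex(mixedDocs, "age", (doc) => doc.age >= 20);

// Sparse index: a partial index whose filter is "the field exists".
const sparseIndex = buildPartialIndex(mixedDocs, "age", (doc) => doc.age !== undefined);

console.log(partialIndex);
console.log(sparseIndex);
```

Both indexes are smaller than a full index, at the price of serving only queries that fall within the filter.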

Index Optimization

DB Profiling

MongoDB supports profiling of database requests and currently offers 3 profiling levels.

    • 0: Profiling is off
    • 1: Requests whose processing time exceeds a threshold (100 ms by default) are logged to the system.profile collection of the database (similar to the slow log in MySQL and Redis)
    • 2: All requests are logged to the system.profile collection of the database (use with caution in production)
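The three levels above amount to a simple decision rule, sketched here in plain JavaScript. `shouldProfile` is a hypothetical helper that only models the rule. In the mongo shell the level and threshold are set with db.setProfilingLevel(level, slowms).

```javascript
// Decide whether a request would be recorded, given the profiling level,
// the slow threshold in milliseconds, and the request duration.
function shouldProfile(level, slowMs, durationMs) {
  if (level === 0) return false;                 // profiling off
  if (level === 1) return durationMs > slowMs;   // slow requests only
  return level === 2;                            // everything
}

console.log(shouldProfile(0, 100, 500));
console.log(shouldProfile(1, 100, 50));
console.log(shouldProfile(1, 100, 150));
console.log(shouldProfile(2, 100, 1));
```

With level 1, lowering the threshold captures more requests, so it is a trade-off between visibility and profiling overhead.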

In general, level 1 profiling is recommended for production, with a threshold chosen to suit your own needs, so that you can monitor slow requests and optimize indexes in time.

Ideally, you decide which indexes to create according to the business query requirements when the collection is created, but business requirements change, so indexes must be tuned according to actual usage. More indexes are not always better: too many indexes on a collection hurt write and update performance, because every write must update all the index data. Slow requests in your system.profile may therefore be caused either by insufficient indexing or by too many indexes.

Query plan

The index has been created, but the query is still slow. What now? In this case you need to analyze how the index is being used, and the detailed query plan tells you how to optimize. The query plan can reveal the following problems:

    1. The query filters on one or more fields that are not indexed
    2. The query filters on one or more fields for which several indexes exist, and the expected index is not used when the query executes

Before the index is created, db.person.find({age: 18}) must perform a COLLSCAN, that is, a full table scan.

mongo-9552:PRIMARY> db.person.find({age: 18}).explain()
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "test.person",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "age" : {
                "$eq" : 18
            }
        },
        "winningPlan" : {
            "stage" : "COLLSCAN",
            "filter" : {
                "age" : {
                    "$eq" : 18
                }
            },
            "direction" : "forward"
        },
        "rejectedPlans" : [ ]
    },
    "serverInfo" : {
        "host" : "localhost",
        "port" : 9552,
        "version" : "3.2.3",
        "gitVersion" : "b326ba837cf6f49d65c2f85e1b70f6f31ece7937"
    },
    "ok" : 1
}

After the index is created, the query plan shows an IXSCAN (a lookup in the index) followed by a FETCH, which reads the documents that satisfy the criteria.

mongo-9552:PRIMARY> db.person.find({age: 18}).explain()
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "test.person",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "age" : {
                "$eq" : 18
            }
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "inputStage" : {
                "stage" : "IXSCAN",
                "keyPattern" : {
                    "age" : 1
                },
                "indexName" : "age_1",
                "isMultiKey" : false,
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 1,
                "direction" : "forward",
                "indexBounds" : {
                    "age" : [
                        "[18.0, 18.0]"
                    ]
                }
            }
        },
        "rejectedPlans" : [ ]
    },
    "serverInfo" : {
        "host" : "localhost",
        "port" : 9552,
        "version" : "3.2.3",
        "gitVersion" : "b326ba837cf6f49d65c2f85e1b70f6f31ece7937"
    },
    "ok" : 1
}
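When checking many queries, reading the winning plan by eye gets tedious. The sketch below walks an explain-style winningPlan object and lists its stages. It assumes each stage has at most one child under inputStage, as in the two plans shown in this article, and `planStages` and `usesIndex` are hypothetical helpers, not part of MongoDB.

```javascript
// List the stage names from the top of the plan down to the leaf.
// Assumes a simple chain linked through the inputStage field.
function planStages(plan) {
  const stages = [];
  for (let node = plan; node; node = node.inputStage) {
    stages.push(node.stage);
  }
  return stages;
}

// A plan uses an index if any stage in the chain is an IXSCAN.
function usesIndex(plan) {
  return planStages(plan).includes("IXSCAN");
}

// Trimmed copies of the two winning plans discussed in this article.
const collscanPlan = {
  stage: "COLLSCAN",
  filter: { age: { $eq: 18 } },
  direction: "forward",
};
const indexedPlan = {
  stage: "FETCH",
  inputStage: { stage: "IXSCAN", keyPattern: { age: 1 }, indexName: "age_1" },
};

console.log(planStages(collscanPlan), usesIndex(collscanPlan));
console.log(planStages(indexedPlan), usesIndex(indexedPlan));
```

A COLLSCAN at the leaf is the signal that the query is not using any index.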
