Experience with creating indexes in MongoDB


It has been nearly a year since I first came into contact with MongoDB, and I have accumulated a fair amount of experience with indexing. On this dark night, I would like to summarize MongoDB indexes.

I. Index Introduction

MongoDB has two basic types of indexes: single-key indexes and composite (compound) indexes.

1. Single-key indexes are the simplest kind. The cost of creating a single-key index is much lower than that of a composite index. Single-key indexes are mainly used for queries that filter on a single field.

2. Composite indexes are built by combining several keys of the document. Creating such an index costs more space and more performance overhead, which shows up in the following ways:

1) Building a composite index over a large amount of data can block queries against the database, to say nothing of updates and inserts;

2) Every insert takes extra time, because the new document must also be added to the composite index;

3) The disk space occupied by a composite index depends on the data types and the number of keys. For example, a composite index built from five NumberInt keys may take no more space than one built from two NumberInt keys and one String key. When choosing data types for indexed fields, prefer NumberInt and index String fields as little as possible (see the sketch after this list).
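As a rough sketch of that last point (the collection and field names here are only examples), storing the indexed keys as NumberInt instead of strings keeps each index entry small:

// Compact numeric keys: each entry in the index stores a small fixed-size integer
db.demo.insert({type: NumberInt(1), sex: NumberInt(0), name: 'name_1'})
// Equivalent string keys: each index entry now stores the whole string, so the index grows
db.demo.insert({type: '1', sex: '0', name: 'name_1'})
db.demo.ensureIndex({type: 1, sex: 1})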


II. Create an Index

The statement for creating an index is simple.

1. Create a single-key index: db.test.ensureIndex({name: 1}, {name: 'index_name'})

2. Create a composite index: db.test.ensureIndex({name: 1, age: 1, sex: 1}, {name: 'index_nas'})
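After creating indexes, you can confirm what exists on the collection (and clean up experiments) with the standard shell helpers; a quick sketch:

db.test.getIndexes()                 // lists all indexes on the collection, including the default _id_ index
db.test.dropIndex('index_nas')       // drops an index by the name given at creation time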


III. Index Optimization

Index optimization is the main concern and needs to be explained in detail. I inserted 1 million test documents with the fields name, sex, type, time, and id.
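For reference, here is a minimal sketch of how such a test collection might be populated in the mongo shell; the value distributions (name_i, sex 0/1, a small set of type values, and so on) are assumptions and will not reproduce the exact counts shown in the plans below:

// Populate db.test with 1,000,000 documents; slow in one-by-one form, but simple
for (var i = 1; i <= 1000000; i++) {
    db.test.insert({
        id: NumberInt(i),
        name: 'name_' + i,
        sex: NumberInt(Math.floor(Math.random() * 2)),       // 0 or 1
        type: NumberInt(Math.floor(Math.random() * 18)),      // a small set of type values
        time: new Date(Date.now() - Math.floor(Math.random() * 86400000))
    });
}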

1. Let's start with a simple query: db.test.find({name: 'name_1'}). I believe everyone is familiar with this query. Now let's look at its execution plan:
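The plan below can be obtained by appending explain() to the query, just as the later examples do:

db.test.find({name: 'name_1'}).explain()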

{"Cursor": "BasicCursor", the index used by the query statement, while BasicCursor indicates that there is no index "isMultiKey": false, whether it is a composite index "n": 1, number of queried results "nscannedObjects": 1000000, number of scanned documents "nscanned": 1000000, number of scanned indexes "nscannedObjectsAllPlans": 1000000, // total number of scanned documents affected "nscannedAllPlans": 1000000, // total number of all scanned indexes "scanAndOrder": false, sorting? "indexOnly": false, "nYields": 2, "nChunkSkips": 0, "millis": 342, elapsed time "indexBounds" :{}, "server": "node1: 27017 "}

From this execution plan, we can see that the query has to scan the entire collection to find a single document, which is definitely a waste of effort. So we create a single-key index on this field:

db.test.ensureIndex({name: 1}, {name: 'index_name'})

After the index is created, check the execution plan of the query statement:

{"cursor" : "BtreeCursor index_name","isMultiKey" : false,"n" : 1,"nscannedObjects" : 1,"nscanned" : 1,"nscannedObjectsAllPlans" : 1,"nscannedAllPlans" : 1,"scanAndOrder" : false,"indexOnly" : false,"nYields" : 0,"nChunkSkips" : 0,"millis" : 0,"indexBounds" : {"name" : [["name_1","name_1"]]},"server" : "node1:27017"}

Simply amazing: nscanned and nscannedObjects dropped from 1,000,000 to 1. Only one document had to be scanned to answer the query, and it took 0 milliseconds instead of the 342 milliseconds without the index. Indexes are absolutely powerful.

2. Now I want to query on a combination of type and sex: db.test.find({type: 1, sex: 0}). Let's look at the execution plan of this statement:

{"cursor" : "BasicCursor","isMultiKey" : false,"n" : 55555,"nscannedObjects" : 1000000,"nscanned" : 1000000,"nscannedObjectsAllPlans" : 1000000,"nscannedAllPlans" : 1000000,"scanAndOrder" : false,"indexOnly" : false,"nYields" : 0,"nChunkSkips" : 0,"millis" : 529,"indexBounds" : {},"server" : "node1:27017"}

From this plan, we can see that to return tens of thousands of matching documents it again scans the entire collection. Clearly an index is needed:

db.test.ensureIndex({type: 1, sex: 1}, {name: 'index_ts'})

After creating the index, run the query statement to check the execution plan:

db.test.find({type:1,sex:0}).explain()
{"cursor" : "BtreeCursor index_ts","isMultiKey" : false,"n" : 55555,"nscannedObjects" : 55555,"nscanned" : 55555,"nscannedObjectsAllPlans" : 55555,"nscannedAllPlans" : 55555,"scanAndOrder" : false,"indexOnly" : false,"nYields" : 0,"nChunkSkips" : 0,"millis" : 112,"indexBounds" : {"type" : [[1,1]],"sex" : [[0,0]]},"server" : "node1:27017"}

This is clearly an ideal index: n = nscannedObjects = nscanned, and the query time has dropped from 529 milliseconds to 112 milliseconds, another qualitative leap. The plan shows it uses the newly created index_ts index.

Now I have another requirement: I want to sort the results by time. Let's execute db.test.find({type: 1, sex: 0}).sort({time: -1}) and look at its execution plan:

{"cursor" : "BtreeCursor index_ts","isMultiKey" : false,"n" : 55555,"nscannedObjects" : 1000000,"nscanned" : 1000000,"nscannedObjectsAllPlans" : 1000000,"nscannedAllPlans" : 1000000,"scanAndOrder" : true,"indexOnly" : false,"nYields" : 1,"nChunkSkips" : 0,"millis" : 695,"indexBounds" : {"type" : [[1,1]],"sex" : [[0,0]]},"server" : "node1:27017"}

Oh no, this is quite different from the plan we got right after creating the index: scanAndOrder is now true, and millis jumped to nearly 700 milliseconds, because after the matching documents are found a separate sort still has to be performed. How did the white swan turn back into an ugly duckling just by adding a sort? The key parameter is scanAndOrder, which means the results are sorted in memory. Fine, let's fight back with a composite index that covers the sort: db.test.ensureIndex({type: 1, sex: 1, time: -1}, {name: 'index_tst'})

{"cursor" : "BtreeCursor index_tst","isMultiKey" : false,"n" : 55555,"nscannedObjects" : 55555,"nscanned" : 55555,"nscannedObjectsAllPlans" : 55555,"nscannedAllPlans" : 55555,"scanAndOrder" : false,"indexOnly" : false,"nYields" : 0,"nChunkSkips" : 0,"millis" : 126,"indexBounds" : {"type" : [[1,1]],"sex" : [[0,0]],"time" : [[{"$maxElement" : 1},{"$minElement" : 1}]]},"server" : "node1:27017"}

See that? The parameters are back to their optimal state. At this point someone may ask: why put time at the end of the index rather than somewhere else? This follows from the rules for ordering fields in an index:

  1. Put fields used in equality conditions at the beginning

  2. Try to put the sort field before range fields

  3. $nin and $ne conditions cannot make effective use of indexes (see the sketch below)
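As a hedged illustration of the third rule, assuming the index_ts index created earlier is still in place, a negated condition gives the planner almost nothing to narrow the scan with:

db.test.find({type: {$ne: 1}, sex: 0}).explain()   // expect nscanned to stay close to the collection size
db.test.find({type: 1, sex: 0}).explain()          // for comparison: tight bounds [[1,1]] on type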

Next, let's add a range condition on id to the query: db.test.find({type: 1, sex: 0, id: {$gt: 1, $lt: 500000}}). The execution plan is as follows:

{"cursor" : "BasicCursor","isMultiKey" : false,"n" : 55555,"nscannedObjects" : 1000000,"nscanned" : 1000000,"nscannedObjectsAllPlans" : 1000000,"nscannedAllPlans" : 1000000,"scanAndOrder" : false,"indexOnly" : false,"nYields" : 2,"nChunkSkips" : 0,"millis" : 553,"indexBounds" : {},"server" : "node1:27017"}

As you can see, only some fifty thousand documents are returned, yet the entire collection is scanned. This is definitely a headache, so again we reach for an index:

db.test.ensureIndex({type: 1, sex: 1, id: 1}, {name: 'index_tis'})

{"cursor" : "BtreeCursor index_tis","isMultiKey" : false,"n" : 55555,"nscannedObjects" : 55555,"nscanned" : 55555,"nscannedObjectsAllPlans" : 55555,"nscannedAllPlans" : 55555,"scanAndOrder" : false,"indexOnly" : false,"nYields" : 1,"nChunkSkips" : 0,"millis" : 137,"indexBounds" : {"type" : [[1,1]],"sex" : [[0,0]],"id" : [[1,1000000]]},"server" : "node1:27017"}

Clearly this is a very good composite index. Why do I put id at the end rather than somewhere else? In MongoDB, a composite index is used from left to right, so the fields that filter out the most data should come first. The equality conditions on type and sex are far more selective here than the id range: the id range still matches a large fraction of the collection, while the type/sex combination narrows it down to the roughly 55,000 documents we actually want.
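To make the left-to-right point concrete, here is a hedged counter-example (the index name index_its is made up for this comparison): if the range field is placed first, the index has to walk every entry in the id range before type and sex can be applied, so nscanned is expected to be far larger than n.

db.test.ensureIndex({id: 1, type: 1, sex: 1}, {name: 'index_its'})
db.test.find({type: 1, sex: 0, id: {$gt: 1, $lt: 500000}}).hint('index_its').explain()
// nscanned should approach the number of documents in the id range,
// whereas index_tis above keeps nscanned equal to n
db.test.dropIndex('index_its')       // remove the experiment so it does not distract the planner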

Then add time-based sorting to the query: db.test.find({type: 1, sex: 1, id: {$gt: 0, $lt: 1000000}}).sort({time: -1}).explain()

{"cursor" : "BasicCursor","isMultiKey" : false,"n" : 55556,"nscannedObjects" : 1000000,"nscanned" : 1000000,"nscannedObjectsAllPlans" : 1000000,"nscannedAllPlans" : 1000000,"scanAndOrder" : true,"indexOnly" : false,"nYields" : 1,"nChunkSkips" : 0,"millis" : 725,"indexBounds" : {},"server" : "node1:27017"}

As you can see, this query is extremely slow and also requires an in-memory sort. So we create an index:

Let's first try this index and look at the execution plan: db.test.ensureIndex({type: 1, sex: 1, id: 1, time: -1}, {name: 'index_tist'})

{"cursor" : "BtreeCursor index_tist","isMultiKey" : false,"n" : 55556,"nscannedObjects" : 55556,"nscanned" : 55556,"nscannedObjectsAllPlans" : 55657,"nscannedAllPlans" : 55657,"scanAndOrder" : true,"indexOnly" : false,"nYields" : 0,"nChunkSkips" : 0,"millis" : 404,"indexBounds" : {"type" : [[1,1]],"sex" : [[1,1]],"id" : [[0,1000000]],"time" : [[{"$maxElement" : 1},{"$minElement" : 1}]]},"server" : "node1:27017"}

Not quite. Although the query time is shorter, scanAndOrder is still true, so the results are still being sorted in memory. Well, let's change the index:

db.test.ensureIndex({type: 1, sex: 1, time: -1, id: 1}, {name: 'index_tist'})
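Note that an index named index_tist with a different key pattern was already created above, and MongoDB will not reuse the same name for a different key spec, so the old one presumably has to be dropped first:

db.test.dropIndex('index_tist')      // remove the old {type, sex, id, time} version
db.test.ensureIndex({type: 1, sex: 1, time: -1, id: 1}, {name: 'index_tist'})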

{"cursor" : "BtreeCursor index_tist","isMultiKey" : false,"n" : 55556,"nscannedObjects" : 55556,"nscanned" : 55556,"nscannedObjectsAllPlans" : 55657,"nscannedAllPlans" : 55657,"scanAndOrder" : false,"indexOnly" : false,"nYields" : 0,"nChunkSkips" : 0,"millis" : 168,"indexBounds" : {"type" : [[1,1]],"sex" : [[1,1]],"time" : [[{"$maxElement" : 1},{"$minElement" : 1}]],"id" : [[0,1000000]]},"server" : "node1:27017"}

Look how much faster this query is, and how much better the plan parameters look, than with the previous index. Why does time behave so differently before and after id? After filtering on type and sex, the next index field decides what happens: if id comes before time, the index first collects all entries in the id range and the results must then be sorted in memory; if time comes before id, the index is already ordered by time within each (type, sex) prefix, so the engine can walk the index in sorted order and simply skip entries whose id falls outside the range.

IV. Summary

1. The difficulty in designing a MongoDB index lies in how to arrange the sort and range-query fields.

2. A MongoDB compound index is used from left to right as a prefix: with an index on {a: 1, b: 1, c: 1}, queries on {a}, {a, b}, and {a, b, c} can use the index, while queries that skip the leading fields will not (see the sketch after this list);

3. Create as few indexes as possible to protect overall database performance;

4. The optimizations above cover only part of what comes up in production; the right choices depend on your own business and workload.
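A small sketch of the prefix rule from point 2, using a throwaway collection and made-up field names:

db.prefixDemo.ensureIndex({a: 1, b: 1, c: 1})
db.prefixDemo.find({a: 1}).explain()               // uses the index: prefix {a}
db.prefixDemo.find({a: 1, b: 2}).explain()         // uses the index: prefix {a, b}
db.prefixDemo.find({a: 1, b: 2, c: 3}).explain()   // uses the index: full key
db.prefixDemo.find({b: 2}).explain()               // skips the leading field a, so the index is not used
db.prefixDemo.find({b: 2, c: 3}).explain()         // same: falls back to a full collection scan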
