MongoDB indexes are B-trees, much like MySQL indexes. So you have probably heard this advice: when creating an index, take sort operations into account and put the fields used for sorting at the end of the index. In some cases, however, this lowers query performance.
Problem
For example, we perform the following query:
db.collection.find({"country": "A"}).sort({"carsOwned": 1})
The query filters on {"country": "A"} and sorts the results by the carsOwned field in ascending order. The index for it is straightforward: create a compound index on the country and carsOwned fields, like this:
db.collection.createIndex({"country": 1, "carsOwned": 1})
Let's look at a slightly more complex query:
db.collection.find({"country": {"$in": ["A", "G"]}}).sort({"carsOwned": 1})
This time we want to query the data entries whose country is A or G. The results are also sorted by the carsOwned field.
If we keep the index above and analyze this query with explain(), the output contains a "scanAndOrder": true field. In addition, the value of nscanned may be much larger than expected, and even specifying a limit does not help.
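The check described above can be sketched as a small plain-JavaScript function. This is only an illustration: hasInMemorySort is a hypothetical helper, the sample documents are hand-written rather than real server output, and the handling of a newer explain format (a "SORT" stage inside queryPlanner.winningPlan) is an assumption, since the exact shape of the output varies by server version.

```javascript
// Hypothetical helper: returns true when an explain() document suggests that
// the sort was done as a separate step instead of being served by the index.
function hasInMemorySort(explainDoc) {
  // Legacy explain format, as described in this article.
  if (explainDoc.scanAndOrder === true) return true;

  // Assumed newer format: look for a SORT stage anywhere in the winning plan.
  const root = explainDoc.queryPlanner && explainDoc.queryPlanner.winningPlan;
  const stack = root ? [root] : [];
  while (stack.length > 0) {
    const stage = stack.pop();
    if (stage.stage === "SORT") return true;
    if (stage.queryPlan) stack.push(stage.queryPlan);
    if (stage.inputStage) stack.push(stage.inputStage);
    if (Array.isArray(stage.inputStages)) stack.push(...stage.inputStages);
  }
  return false;
}

// Hand-written sample documents, not real server output.
const legacySample = { scanAndOrder: true, nscanned: 4, n: 2 };
const indexOnlySample = {
  queryPlanner: { winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN" } } },
};

console.log(hasInMemorySort(legacySample));    // true
console.log(hasInMemorySort(indexOnlySample)); // false
```

In the shell, the document passed to such a helper would be the result of calling explain() on the query.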
Cause
What is the reason? Let's take a look at the figure below:
In the figure, the left side is an index created in the order {"country": 1, "carsOwned": 1}. The right side is an index created in the order {"carsOwned": 1, "country": 1}.
If we run the preceding query with the index on the left, we have to retrieve all child nodes whose country value is A (the left branch of the left tree) and all child nodes whose country value is G (the right branch of the left tree), and then sort the retrieved data by carsOwned.
That is why the explain output above contains "scanAndOrder": true. The query first scans the index to obtain the data and then performs a separate sort.
If we use the index on the right, the outcome is different. Here the sort field comes first and the filter field comes last. The query walks the index starting from the node whose carsOwned value is 1 (the left branch of the right tree), and every entry whose country is A or G goes straight into the result set. Once the number of results requested by limit has been collected, we can return them immediately, because they are already in ascending carsOwned order.
For the dataset above, suppose we need two results. With the index on the left, we have to scan four records and then sort them before returning the results. With the index on the right, we only need to scan two records and can return them directly, because the scan walks the index in the desired order.
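The comparison above can be reproduced with a toy model in plain JavaScript. This is only a sketch: the dataset is made up, a sorted array stands in for the B-tree, the function names are hypothetical, and optimizations that a real query planner may apply are ignored.

```javascript
// Hypothetical dataset; the values are made up for illustration.
const docs = [
  { country: "A", carsOwned: 1 },
  { country: "A", carsOwned: 2 },
  { country: "C", carsOwned: 2 },
  { country: "G", carsOwned: 1 },
  { country: "G", carsOwned: 3 },
];

// A sorted array of entries stands in for a B-tree index on `fields`.
function buildIndex(documents, fields) {
  return [...documents].sort((x, y) => {
    for (const f of fields) {
      if (x[f] < y[f]) return -1;
      if (x[f] > y[f]) return 1;
    }
    return 0;
  });
}

// Index {country: 1, carsOwned: 1}: read every entry of each wanted country,
// then sort the whole set before applying the limit.
function queryFilterFirst(index, countries, limit) {
  const matched = index.filter((e) => countries.includes(e.country));
  const results = [...matched]
    .sort((x, y) => x.carsOwned - y.carsOwned)
    .slice(0, limit);
  return { scanned: matched.length, extraSort: true, results };
}

// Index {carsOwned: 1, country: 1}: walk the entries in sort order, keep the
// ones that match the filter, and stop as soon as the limit is reached.
function querySortFirst(index, countries, limit) {
  const results = [];
  let scanned = 0;
  for (const e of index) {
    if (results.length >= limit) break;
    scanned += 1;
    if (countries.includes(e.country)) results.push(e);
  }
  return { scanned, extraSort: false, results };
}

const left = queryFilterFirst(buildIndex(docs, ["country", "carsOwned"]), ["A", "G"], 2);
const right = querySortFirst(buildIndex(docs, ["carsOwned", "country"]), ["A", "G"], 2);

console.log(left.scanned, right.scanned); // 4 2
```

With a different dataset, for example one where most of the entries with small carsOwned values belong to other countries, the second function would scan many more entries, which is why the choice depends on the data.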
Therefore, for a range query (including $in, $gt, $lt, and so on), appending the sort field to the end of the index is usually ineffective. The entries matched by the range are not ordered by the appended field, so an additional sort is required. In this case, building the index the other way round (sort field first, range field last) may be the better choice. Whether it really is better also depends on the specific dataset.
Summary
To sum up, let's take two examples.
When the query is:
db.test.find({a:1,b:2}).sort({c:1})
Then you can simply create a compound index of {a: 1, b: 1, c: 1} or {b: 1, a: 1, c: 1}.
If the query is:
db.test.find({a:1,b:{$in:[1,2]}}).sort({c:1})
In that case, a compound index of {a: 1, c: 1, b: 1} may be more appropriate. Of course, this is only one more option to consider. Whether to use it depends on your data.
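The two examples follow the same ordering: equality fields first, then sort fields, then range fields. A minimal sketch of that rule of thumb in plain JavaScript is shown here. suggestIndexKeys is a hypothetical helper, not part of any MongoDB API, and it assumes the caller has already classified each query field as equality, sort, or range.

```javascript
// Hypothetical helper: builds an index key specification in the order
// equality fields, then sort fields, then range fields.
function suggestIndexKeys({ equality = [], sort = {}, range = [] }) {
  const keys = {};
  for (const field of equality) keys[field] = 1;
  for (const [field, direction] of Object.entries(sort)) keys[field] = direction;
  for (const field of range) {
    if (!(field in keys)) keys[field] = 1; // keep the earlier position if already present
  }
  return keys;
}

// db.test.find({a:1,b:2}).sort({c:1})
console.log(JSON.stringify(suggestIndexKeys({ equality: ["a", "b"], sort: { c: 1 } })));
// {"a":1,"b":1,"c":1}

// db.test.find({a:1,b:{$in:[1,2]}}).sort({c:1})
console.log(JSON.stringify(suggestIndexKeys({ equality: ["a"], sort: { c: 1 }, range: ["b"] })));
// {"a":1,"c":1,"b":1}
```

The returned object could be passed to createIndex, but it is only a starting point. As noted above, whether it beats the alternative depends on the data.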