MongoDB Paging Queries and Performance
I have been busy lately and there is a lot to summarize. Parts four, five, and six of the Redis series are still pending: I have not finished Redis in Action yet, so I will write them up later rather than publish something shallow. Apologies to my readers.
My last posts on Redis were something of an introduction to NoSQL products. This time I change direction and study a true NoSQL database: MongoDB. Using MongoDB genuinely overturned my view of data management and program design. With a deep-rooted data-driven, table-driven mindset, I found it hard to keep up with MongoDB's pace at first. The point is not learning a few API calls or query idioms; it is about program design and business design: if the model is object-oriented, how well does it suit presentation, queries, aggregation operations, and so on. In short, with MongoDB the design of the program is what matters. Once the design is done, storage can almost be ignored, because MongoDB makes it that convenient.
Enough preamble. For getting-started and installation material, see the end of the article; I have attached some documents and will share more if needed. This article focuses on MongoDB paging queries. Why? Paging is a notorious killer: it is hard to get right, and when it is slow, customers complain and managers complain.
Traditional SQL Paging
In traditional SQL paging, almost every solution relies on ROW_NUMBER. For scenarios that need varied sorting and complex queries, ROW_NUMBER is the workhorse; even the paging SQL that Linq generates uses ROW_NUMBER. In addition, timestamp-based paging is commonly used on the web today for poll/push incremental loading. These two approaches can be called generic, and the latter is the best in both performance and complexity, because all it needs is a simple timestamp.
MongoDB Paging
My first thought on coming to Mongo was that paging cannot be that hard, and indeed there is nothing exotic about it: the logic is the same as SQL paging.
First, the test case used in this article. I imported documents of the following shape into the database; cust_id and amount are generated as ordered numbers, and 2 million records were imported:
public class Test
{
    /// <summary>
    /// The primary key; ObjectId is MongoDB's built-in primary key type.
    /// </summary>
    public ObjectId Id { get; set; }

    /// <summary>
    /// Customer ID.
    /// </summary>
    [BsonElement("cust_id")]
    public string CustomerId { get; set; }

    /// <summary>
    /// Total amount.
    /// </summary>
    [BsonElement("amount")]
    public int Amount { get; set; }

    /// <summary>
    /// Status.
    /// </summary>
    [BsonElement("status")]
    public string Status { get; set; }
}
The operations below were performed in a MongoDB GUI tool; see reference 3.
First, let's take a look at the parameters and results required for paging. The common parameters required for paging are:
- PageIndex current page
- PageSize number of records per page
- QueryParam[] other query fields
Therefore, following the ROW_NUMBER paging principle, that is, taking the records from (pageIndex * pageSize) to (pageIndex * pageSize + pageSize), the Linq expression looks like this:
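The offset arithmetic above can be sketched in a few lines (a plain Python illustration of the index math, not MongoDB or driver code):

```python
def page_slice(items, page_index, page_size):
    """Return one page: records [pageIndex*pageSize, pageIndex*pageSize + pageSize)."""
    start = page_index * page_size
    return items[start:start + page_size]

data = list(range(100))           # stand-in for a sorted result set
first_page = page_slice(data, 0, 10)   # records 0..9
fourth_page = page_slice(data, 3, 10)  # records 30..39
```

Every offset-based paging scheme, ROW_NUMBER or skip/limit, reduces to this computation.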
query.Where(xxx...xxx).Skip(pageIndex*pageSize).Take(pageSize)
After searching for the information, there are actually skip functions, and the Limit function can be found in References 1 and 2, so the paging query is easily implemented:
db.test.find({xxx...xxx}).sort({"amount": 1}).skip(10).limit(10)  // query condition omitted here
It was very fast, returning in just a few milliseconds; NoSQL efficiency really is top-notch. But wait: this skipped only 10 records over not much data. When I raised the skip to 100,000, the query took about 20 ms. If the study ended with something this simple, it would let down the inquisitive spirit of programmers. SQL paging solutions differ widely in efficiency between approaches; does Mongo have similar differences? Is there a hidden performance trap here? Redis already taught me this kind of lesson once, when I used KEYS carelessly.
After checking some materials, I found that all the materials are described as follows:
Do not use skip casually in queries; with a large amount of data it causes a sharp drop in performance. This is because skip walks over documents one at a time, so the more documents it has to pass over, the slower it naturally becomes.
So skip should be avoided, but how? Recall the last-timestamp paging scheme from SQL. It exploits the ordered nature of a field, querying from the last-seen value onward, which directly avoids scanning a large number of records. In other words, if we can attach such a condition, query efficiency should improve. Is that actually the case? Let's verify:
Suppose we query the 100,001st record, whose amount value is 2399927. We write the two statements as follows:
db.test.find().sort({"amount": 1}).skip(100000).limit(10)             // 183 ms
db.test.find({amount: {$gt: 2399927}}).sort({"amount": 1}).limit(10)  // 53 ms
The timings are in the comments. The latter takes roughly a third of the time of the former, a very large gap, which confirms the theory about skip's inefficiency.
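The skip-versus-range difference can be demonstrated even without a database. The sketch below (plain Python, not driver code) pages through a sorted in-memory list both ways: skip-based paging conceptually walks past every skipped record, while range-based paging jumps straight past the last value of the previous page. In MongoDB, of course, the range form only pays off when the sort field (amount here) is indexed.

```python
import bisect

def page_by_skip(sorted_items, skip, limit):
    # Mimics sort().skip().limit(): the server walks over `skip` records first.
    return sorted_items[skip:skip + limit]

def page_by_range(sorted_items, last_value, limit):
    # Mimics find({amount: {$gt: lastValue}}).sort().limit():
    # seek directly past the previous page's last value.
    start = bisect.bisect_right(sorted_items, last_value)
    return sorted_items[start:start + limit]

amounts = list(range(1, 2_000_001))   # 2 million ordered "amount" values
skip_page = page_by_skip(amounts, 100000, 10)
range_page = page_by_range(amounts, 100000, 10)
assert skip_page == range_page        # same page, very different cost pattern
```

Both calls return the identical page; the difference is that the skip version's cost grows with the skip count, while the range version's cost depends only on the page size.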
C# Implementation
Now that the MongoDB paging statements and their efficiency are covered, let's implement them with the C# driver.
This article uses the official C# (BSON) driver; see reference 4. The driver offers both a native query style, similar to ADO.NET, and a Linq style. Both are implemented here.
Solution 1: conditional query, native query implementation
var query = Query<Test>.GT(item => item.Amount, 2399927);
var result = collection.Find(query)
                       .SetLimit(100)
                       .SetSortOrder(SortBy.Ascending("amount"))
                       .ToList();
Console.WriteLine(result.First().ToJson()); // ToJson prints the document
Solution 2: Skip, native query implementation
var result = collection.FindAll()
                       .SetSkip(100000)
                       .SetLimit(100)
                       .SetSortOrder(SortBy.Ascending("amount"));
Console.WriteLine(result.ToList().First().ToJson());
Solution 3: conditional query using Linq
var result = collection.AsQueryable<Test>()
                       .OrderBy(item => item.Amount)
                       .Where(item => item.Amount > 2399927)
                       .Take(100);
Console.WriteLine(result.First().ToJson());
Solution 4: Linq Skip version
var result = collection.AsQueryable<Test>()
                       .OrderBy(item => item.Amount)
                       .Skip(100000)
                       .Take(100);
Console.WriteLine(result.First().ToJson());
Performance Comparison
The test code will be uploaded later. It uses Lao Zhao's (my idol) CodeTimer to measure performance, and I ran the code with the TestDriven plug-in.
Solution 1: pagination GT-Limit
{ "_id" : ObjectId("5472e383fc46de17c45d4682"), "cust_id" : "A12399997", "amount" : 2399928, "status" : "B" }
Time Elapsed: 1,322ms
CPU Cycles: 4,442,427,252
Gen 0: 0
Gen 1: 0
Gen 2: 0

Solution 2: pagination Skip-Limit
{ "_id" : ObjectId("5472e383fc46de17c45d4682"), "cust_id" : "A12399997", "amount" : 2399928, "status" : "B" }
Time Elapsed: 95ms
CPU Cycles: 18,280,728
Gen 0: 0
Gen 1: 0
Gen 2: 0

Solution 3: pagination Linq Where
{ "_id" : ObjectId("5472e383fc46de17c45d4682"), "cust_id" : "A12399997", "amount" : 2399928, "status" : "B" }
Time Elapsed: 76ms
CPU Cycles: 268,734,988
Gen 0: 0
Gen 1: 0
Gen 2: 0

Solution 4: pagination Linq Skip
{ "_id" : ObjectId("5472e383fc46de17c45d4682"), "cust_id" : "A12399997", "amount" : 2399928, "status" : "B" }
Time Elapsed: 97ms
CPU Cycles: 30,834,648
Gen 0: 0
Gen 1: 0
Gen 2: 0
These results are surprising and far from the theory. Why is the first run so much slower than the rest? At first I suspected the C# Mongo driver and tried switching drivers. Then, while reading MongoDB in Action these days, I found the explanation:
MongoDB loads a document's indexes and metadata into memory as queries run, and it is recommended that indexes and metadata always fit within the machine's memory; otherwise performance degrades.
With that in mind, I reworked my test: I ran the first solution once as a warm-up, then measured again and compared, and the result came back to normal.
Correction result of solution 1:
pagination GT-Limit
{ "_id" : ObjectId("5472e383fc46de17c45d4682"), "cust_id" : "A12399997", "amount" : 2399928, "status" : "B" }
Time Elapsed: 18ms
CPU Cycles: 54,753,796
Gen 0: 0
Gen 1: 0
Gen 2: 0
Summary
This article compared two paging methods: skip-based paging and paging on an ordered field. The latter adds a range condition on the sort field to improve performance. Skip is less efficient but more general; ordered-field paging requires some handling of that field in the page design, at the very least a way to obtain the field's boundary value for a given page. Let me add one more idea: combine the two. Use the last value on the currently displayed page as the range condition, plus a small skip to move several pages at once; this works even better. It is not implemented here. You are welcome to share performance comparisons and implementations of other methods. If anything in this article is missing or wrong, please leave a comment.
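The combined approach just mentioned can be sketched as follows (a hypothetical Python illustration over an in-memory sorted list, assuming a unique, ordered sort key): remember the last sort value shown on the current page, range-filter past it, then apply a skip of at most a few pages to jump ahead, so skip never has to cross a large number of records.

```python
import bisect

def jump_pages(sorted_items, last_value, pages_ahead, page_size):
    """Range filter on the last shown value plus a small skip to jump ahead.

    The equivalent idea in MongoDB shell (hypothetical) would be:
      db.test.find({amount: {$gt: lastValue}})
             .sort({amount: 1})
             .skip((pagesAhead - 1) * pageSize)
             .limit(pageSize)
    """
    start = bisect.bisect_right(sorted_items, last_value)
    offset = (pages_ahead - 1) * page_size  # skip spans a few pages at most
    return sorted_items[start + offset : start + offset + page_size]

amounts = list(range(1, 1001))
# Current page ends at value 100; jump 3 pages ahead with page size 10.
page = jump_pages(amounts, 100, 3, 10)
assert page == list(range(121, 131))
```

The skip cost is now bounded by how far the user can jump from the current page, rather than by the absolute page number.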
I almost forgot a small advertisement: my company is hiring. For details, see the subtitle of my blog!
References
1. MongoDB Skip function: http://docs.mongodb.org/manual/reference/operator/aggregation/skip/
2. MongoDB Limit function: http://docs.mongodb.org/manual/reference/operator/aggregation/limit/
3. MongoVue, a Windows GUI management tool for MongoDB (paid version available): http://www.mongovue.com/
4. Official C# driver: http://docs.mongodb.org/manual/applications/drivers/