MongoDB paging query method and Performance

Source: Internet
Author: User
Tags: mongodb, limit, mongodb gui


I have been a bit busy lately and have a lot to summarize. There should really be parts four, five, and six of the Redis series, but I have not finished Redis in Action yet, so I will write those later; otherwise they would be too thin. Sorry, readers.

Since last time's Redis posts were only a first step into NoSQL products, this time I am changing direction and studying a real NoSQL database: MongoDB. Using MongoDB genuinely overturned my view of data management and program design. It would shine in an OO-designed program; with my deeply rooted data-driven, table-driven mindset, it is hard to keep up with MongoDB's pace. The point is not calling a few APIs or writing a few queries, but how you design the program and the business: if you design in an OO way, how well the model suits presentation, queries, aggregation operations, and so on. In short, with MongoDB the design of the program is what matters; once the design is done, you can almost ignore the storage, because MongoDB is just that convenient.

Enough rambling. For getting started, installation, and other background, see the references at the end of the article; I have attached some material and will share more if needed. This article focuses on MongoDB paging queries. Why paging? Because paging is a classic killer: it is hard to get right, and when it goes wrong both customers and managers complain.

 

Traditional SQL Paging

In traditional SQL paging, row_number appears in almost every solution; for scenarios that need arbitrary sorting and complex filtering, row_number is the killer tool. In addition, web pages that load data by poll/push usually page on a timestamp. These two approaches cover most cases (even the paging SQL that LINQ generates uses row_number), and the timestamp approach is the better of the two in both performance and complexity, because all it needs is a simple ordered timestamp.
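As a reminder of what the timestamp approach looks like, here is a sketch only; the entity, field, and variable names (messages, CreatedAt, lastSeenTimestamp, pageSize) are illustrative and not part of this article's test data:

// Hypothetical "last timestamp" paging in LINQ: the client sends the timestamp of the
// last item it already has, and the next page is everything older than that.
var page = messages.Where(m => m.CreatedAt < lastSeenTimestamp)
                   .OrderByDescending(m => m.CreatedAt)
                   .Take(pageSize)
                   .ToList();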

MongoDB Paging

My first thought going into Mongo was that paging really is not difficult, so what is so special about it? And indeed there is nothing special: the logic is the same as SQL paging.

First, the test case used in this article. I imported the entities below into the database; cust_id and amount are generated as ordered numbers, and 2,000,000 records were imported:

using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;

public class Test
{
    /// <summary>
    /// Primary key; ObjectId is MongoDB's built-in primary key type.
    /// </summary>
    public ObjectId Id { get; set; }

    /// <summary>
    /// Customer ID.
    /// </summary>
    [BsonElement("cust_id")]
    public string CustomerId { get; set; }

    /// <summary>
    /// Total amount.
    /// </summary>
    [BsonElement("amount")]
    public int Amount { get; set; }

    /// <summary>
    /// Status.
    /// </summary>
    [BsonElement("status")]
    public string Status { get; set; }
}
The following shell operations were run in a MongoDB GUI tool; see reference 3.

First, let's look at the parameters and results a paging query needs. The common parameters are listed below (a minimal C# sketch of them follows the list):

  • PageIndex: the current page
  • PageSize: the number of records per page
  • QueryParam[]: other query fields
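For convenience these parameters can be grouped into a small request object. The sketch below is an assumption for illustration only; the class and property names (PagingRequest, QueryParams, SkipCount) are mine, not part of any driver API.

using MongoDB.Driver;   // for IMongoQuery (legacy 1.x driver)

// A hypothetical container for the paging parameters above.
public class PagingRequest
{
    public int PageIndex { get; set; }               // current page, zero-based
    public int PageSize { get; set; }                // number of records per page
    public IMongoQuery[] QueryParams { get; set; }   // other query conditions, if any

    // Number of documents a row_number-style pager would skip.
    public int SkipCount
    {
        get { return PageIndex * PageSize; }
    }
}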

Therefore, following the row_number paging principle, that is, taking the records from (pageIndex * pageSize) to (pageIndex * pageSize + pageSize), we can express it in LINQ as follows:

query.Where(xxx...xxx).Skip(pageIndex*pageSize).Take(pageSize)

After a bit of searching, it turns out Mongo does have a Skip function and a Limit function (see references 1 and 2), so the paging query is easy to write:

db.test.find({xxx...xxx}).sort({"amount": 1}).skip(10).limit(10)   // the filter itself is omitted here

It was very fast: the result came back in just a few milliseconds, seemingly top-notch NoSQL efficiency. But wait, I was only skipping 10 records, so very little data was involved. When I raised the skip to 100,000, the query took about 20 ms. If the study ended with such a simple test, it would let down the programmer's spirit of investigation. Different SQL paging solutions perform very differently depending on how they are used; does Mongo have the same kind of trap? Is there a hidden performance problem here? I had already been burned this way in Redis by carelessly using KEYS.

After checking more material, I found the same warning everywhere:

Do not use Skip casually in queries: with a large amount of data it causes a sharp drop in performance, because Skip still walks over the skipped documents one by one, and the more there are, the slower it naturally gets.

So Skip should be avoided, but how? First, recall the last-timestamp scheme from SQL paging. That method relies on the field being ordered and filters on it directly in the query, so the scan never has to pass over a large number of rows. In other words, if we can attach a condition like that, query efficiency should improve. Is that really the case? Let's verify:

Suppose we want the page starting at the 100,001st document, whose Amount value is 2399927. We write the following two statements:

db.test.find().sort({"amount": 1}).skip(100000).limit(10)              // 183 ms
db.test.find({amount: {$gt: 2399927}}).sort({"amount": 1}).limit(10)   // 53 ms

 

The timings are annotated in the comments. The second statement takes roughly one third of the time of the first, a very large gap, which confirms the claim about Skip's cost.
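One assumption worth making explicit, since it is not stated in the measurements above: the range query only wins if amount is indexed, so the test collection is presumed to have an ascending index on that field. A minimal sketch with the legacy C# driver, using the same collection variable as in the solutions below:

// Assumed setup: an ascending index on "amount", so the $gt filter plus sort
// can walk the index instead of scanning and skipping documents one by one.
collection.CreateIndex(IndexKeys.Ascending("amount"));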

C# Implementation

Now that the MongoDB paging statements and their efficiency are covered, let's implement them with the C# driver.

This article uses the official C# driver (see reference 4). The driver offers both a native query style, somewhat like ADO.NET, and a LINQ style; both are implemented here.
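All of the snippets below assume a collection variable has already been obtained. Here is a minimal setup sketch with the legacy 1.x C# driver; the connection string and the database and collection names ("test") are assumptions for illustration, not something the original article specifies.

using System;
using System.Linq;
using MongoDB.Bson;
using MongoDB.Driver;
using MongoDB.Driver.Builders;
using MongoDB.Driver.Linq;

// Assumed setup: a local mongod, a "test" database, and a "test" collection of Test documents.
var client = new MongoClient("mongodb://localhost:27017");
var server = client.GetServer();
var database = server.GetDatabase("test");
var collection = database.GetCollection<Test>("test");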

Solution 1: native query with a range condition (GT)

var query = Query<Test>.GT(item => item.Amount, 2399927);
var result = collection.Find(query)
                       .SetLimit(100)
                       .SetSortOrder(SortBy.Ascending("amount"))
                       .ToList();
Console.WriteLine(result.First().ToJson());   // ToJson renders the document as readable JSON

Solution 2: native query with Skip

var result = collection.FindAll()
                       .SetSkip(100000)
                       .SetLimit(100)
                       .SetSortOrder(SortBy.Ascending("amount"));
Console.WriteLine(result.ToList().First().ToJson());

Solution 3: LINQ with a Where condition

var result = collection.AsQueryable<Test>()
                       .OrderBy(item => item.Amount)
                       .Where(item => item.Amount > 2399927)
                       .Take(100);
Console.WriteLine(result.First().ToJson());

Solution 4: LINQ with Skip

var result = collection.AsQueryable<Test>()
                       .OrderBy(item => item.Amount)
                       .Skip(100000)
                       .Take(100);
Console.WriteLine(result.First().ToJson());

Performance Comparison

The test code will be uploaded later. Performance is measured with Lao Zhao's (my idol~) CodeTimer, and the tests are run with the TestDriven plug-in.
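For reference, each measurement was taken roughly like the sketch below. This is only an assumption about the harness: it presumes CodeTimer exposes the Initialize() and Time(name, iterations, action) members from Lao Zhao's well-known blog class; adjust it to whatever version of the class you actually use.

// Hypothetical harness around solution 1; CodeTimer is assumed to be Lao Zhao's
// helper class with Initialize() and Time(string name, int iteration, Action action).
CodeTimer.Initialize();   // warm up the timer itself

CodeTimer.Time("pagination GT-Limit", 1, () =>
{
    var query = Query<Test>.GT(item => item.Amount, 2399927);
    var result = collection.Find(query)
                           .SetLimit(100)
                           .SetSortOrder(SortBy.Ascending("amount"))
                           .ToList();
    Console.WriteLine(result.First().ToJson());
});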
Solution 1: pagination GT-Limit
{ "_id" : ObjectId("5472e383fc46de17c45d4682"), "cust_id" : "A12399997", "amount" : 2399928, "status" : "B" }
Time Elapsed:   1,322 ms
CPU Cycles:     4,442,427,252
Gen 0:          0
Gen 1:          0
Gen 2:          0

 

Solution 2: pagination Skip-Limit
{ "_id" : ObjectId("5472e383fc46de17c45d4682"), "cust_id" : "A12399997", "amount" : 2399928, "status" : "B" }
Time Elapsed:   95 ms
CPU Cycles:     18,280,728
Gen 0:          0
Gen 1:          0
Gen 2:          0

Solution 3: pagination Linq-Where
{ "_id" : ObjectId("5472e383fc46de17c45d4682"), "cust_id" : "A12399997", "amount" : 2399928, "status" : "B" }

Time Elapsed: 76 ms
CPU Cycles: 268,734,988
Gen 0: 0
Gen 1: 0
Gen 2: 0

Solution 4: pagination Linq-Skip
{ "_id" : ObjectId("5472e383fc46de17c45d4682"), "cust_id" : "A12399997", "amount" : 2399928, "status" : "B" }
Time Elapsed:   97 ms
CPU Cycles:     30,834,648
Gen 0:          0
Gen 1:          0
Gen 2:          0

The results above are surprising: solution 1 is far worse than the theory predicts. Why is the first run so much slower than the rest? At first I thought it was a problem with the C# Mongo driver and tried switching drivers. Then, while reading MongoDB in Action these days, I found the explanation:

MongoDB loads a document's indexes and metadata into memory as queries demand them, and it is recommended that this always fit within the machine's memory; otherwise performance degrades.

With that in mind I changed my test procedure: I ran the first solution's query once in advance as a warm-up, then compared again, and the result came back to normal.

Corrected result for solution 1:

pagination GT-Limit
{ "_id" : ObjectId("5472e383fc46de17c45d4682"), "cust_id" : "A12399997", "amount" : 2399928, "status" : "B" }
Time Elapsed:   18 ms
CPU Cycles:     54,753,796
Gen 0:          0
Gen 1:          0
Gen 2:          0

 

Summary

This article compared two paging methods: Skip-based paging and paging on an ordered field. The latter gains its speed by adding a range condition so the query never has to walk over the skipped documents. Skip is less efficient but more general; paging on an ordered field requires some design work on that field, since at a minimum you need to obtain its value for the page being requested. One practical compromise is to combine the two: take the last value of the ordered field on the page currently displayed, filter past it, and use only a small Skip on top of that. I have not implemented it in the tests here, but a sketch of the idea follows below. Performance comparisons and implementations of other approaches are very welcome, and if anything in this article is missing or wrong, please leave a comment.
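To make that combination concrete, here is a minimal sketch only, not the code measured above; the method name NextPages and its parameters are mine, and it assumes amount is the indexed sort key and that the user jumps forward only a few pages at a time.

// Hypothetical combination of range paging and a small Skip (legacy 1.x C# driver,
// same using directives as the setup sketch earlier, plus System.Collections.Generic).
// lastAmount : the "amount" of the last document on the page currently displayed
// pagesAhead : how many pages forward the user jumped (1 = the next page)
public static List<Test> NextPages(MongoCollection<Test> collection,
                                   int lastAmount, int pagesAhead, int pageSize)
{
    var query = Query<Test>.GT(item => item.Amount, lastAmount);
    return collection.Find(query)
                     .SetSortOrder(SortBy.Ascending("amount"))
                     .SetSkip((pagesAhead - 1) * pageSize)   // a small skip, never 100,000+
                     .SetLimit(pageSize)
                     .ToList();
}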

I almost forgot a small advertisement: my company is hiring. For details, see the subtitle of my blog!

References

1. MongoDB Skip function: http://docs.mongodb.org/manual/reference/operator/aggregation/skip/

2. MongoDB Limit function: http://docs.mongodb.org/manual/reference/operator/aggregation/limit/

3. MongoVUE Windows client management tool (paid version available): http://www.mongovue.com/

4. Official C# driver: http://docs.mongodb.org/manual/applications/drivers/
