Traditional SQL paging
Almost every traditional SQL paging solution uses row_number. For scenarios that require varied sorting and complex query conditions, row_number is the go-to tool. In addition, timestamp-based paging is commonly used on today's web for poll/push incremental loading. These two approaches cover the vast majority of cases — even the paging SQL generated by LINQ is row_number-based. The timestamp approach is the better of the two in both performance and complexity, because it only needs a simple timestamp comparison.
MongoDB paging
My first thought on moving to MongoDB was that paging could not be difficult — what's so special about it? And indeed, at first glance there is nothing to it: the logic is the same as SQL paging.
First, a description of the test case used in this article. I imported entities of the following shape into the database; cust_id and amount were generated as ordered sequences, and 2,000,000 records were imported:
The code is as follows:

```csharp
public class Test
{
    /// <summary>
    /// Primary key; ObjectId is MongoDB's built-in primary key type
    /// </summary>
    public ObjectId Id { get; set; }

    /// <summary>
    /// Customer ID
    /// </summary>
    [BsonElement("cust_id")]
    public string CustomerId { get; set; }

    /// <summary>
    /// Amount
    /// </summary>
    [BsonElement("amount")]
    public int Amount { get; set; }

    /// <summary>
    /// Status
    /// </summary>
    [BsonElement("status")]
    public string Status { get; set; }
}
```
The shell operations below were run in a MongoDB GUI tool; see reference 3.
First, let's look at the parameters a paging query needs. The common ones are:

pageIndex — the current page number
pageSize — the number of records per page
queryParam[] — other query fields
Following the row_number paging principle — take the rows from (pageIndex * pageSize) to (pageIndex * pageSize + pageSize) — we can express this in LINQ as:
The code is as follows:

```csharp
query.Where(xxx...xxx).Skip(pageIndex * pageSize).Take(pageSize)
```
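As a minimal in-memory sketch of that offset arithmetic (plain JavaScript, with a hypothetical `getPage` helper — not the Mongo driver), paging amounts to slicing a sorted result set at the computed offset:

```javascript
// Hypothetical offset-paging sketch: a page spans rows
// (pageIndex * pageSize) to (pageIndex * pageSize + pageSize).
function getPage(sortedRows, pageIndex, pageSize) {
  const start = pageIndex * pageSize;          // first row of the page
  return sortedRows.slice(start, start + pageSize);
}

// Ten rows with amounts 1..10, already sorted.
const rows = Array.from({ length: 10 }, (_, i) => ({ amount: i + 1 }));
console.log(getPage(rows, 2, 3).map(r => r.amount)); // → [ 7, 8, 9 ]
```

The same start/end arithmetic underlies both the LINQ expression above and the shell query that follows.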
A quick search shows MongoDB does provide Skip and Limit functions (see references 1 and 2), so the paging query is easy to write:
The code is as follows:

```javascript
db.test.find({xxx...xxx}).sort({"amount": 1}).skip(10).limit(10)  // query condition omitted here
```
At first glance it was very efficient — the result came back in just a few milliseconds, living up to NoSQL's reputation for speed. But hold on: I was only skipping 10 records, and the data set wasn't large. After raising the skip to 100,000 records, the query took about 20 ms. Ending the investigation with such a simple test would hardly do justice to a programmer's curiosity. SQL paging solutions vary widely in efficiency depending on how they are written; does MongoDB have similar pitfalls? Clearly this deserved a closer look — Redis users, after all, have been burned by the indiscriminate use of KEYS.
After checking some materials, I found they consistently warn:

Do not use Skip in queries casually; with large data sets it causes a sharp decline in performance. The reason is that Skip walks through the documents one by one before discarding them, so the more you skip, the slower it naturally gets.
So Skip should be avoided — but how? Recall the timestamp-based scheme from SQL paging: it exploits the ordered nature of a field, querying onward from the last seen value, which avoids scanning a large number of records entirely. In other words, if we can attach such a condition, query efficiency should improve. Is that actually the case? Let's verify:
Suppose we want to query starting from the 100,001st record, and the Amount value at that boundary is 2399927. We write the two statements as follows:
The code is as follows:

```javascript
db.test.find().sort({"amount": 1}).skip(100000).limit(10)             // 183 ms
db.test.find({amount: {$gt: 2399927}}).sort({"amount": 1}).limit(10)  // 53 ms
```
The timings are annotated in the comments. The latter clearly takes only about a third of the time of the former — a large gap, which confirms the theory about Skip's inefficiency.
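A small simulation makes the cost difference concrete (plain JavaScript, not MongoDB itself; `skipPage` and `rangePage` are illustrative names): offset paging must touch every discarded document, while a range condition on an ordered field can seek directly, much as a B-tree index does:

```javascript
// Offset paging: walk and discard `skip` documents before collecting.
function skipPage(sorted, skip, limit) {
  let scanned = 0;
  const out = [];
  for (const doc of sorted) {
    scanned++;
    if (scanned > skip) {
      out.push(doc);
      if (out.length === limit) break;
    }
  }
  return { out, scanned };  // `scanned` grows linearly with `skip`
}

// Range paging: binary-search the ordered field (an index seek in spirit).
function rangePage(sorted, lastAmount, limit) {
  let lo = 0, hi = sorted.length, probes = 0;
  while (lo < hi) {
    probes++;
    const mid = (lo + hi) >> 1;
    if (sorted[mid].amount <= lastAmount) lo = mid + 1;
    else hi = mid;
  }
  return { out: sorted.slice(lo, lo + limit), probes };  // probes ~ log2(n)
}

const data = Array.from({ length: 200000 }, (_, i) => ({ amount: i + 1 }));
console.log(skipPage(data, 100000, 10).scanned);         // 100010 documents touched
console.log(rangePage(data, 100000, 10).probes);         // ~18 probes (log2 of 200,000)
console.log(rangePage(data, 100000, 10).out[0].amount);  // → 100001
```

The simulation is only an analogy — MongoDB's actual skip happens server-side over index entries or documents — but the linear-versus-logarithmic shape of the work is the same story the 183 ms vs. 53 ms numbers tell.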
C# implementation
Having covered the MongoDB paging statements and their efficiency, let's implement them with the C# driver.
This article uses the official driver; see reference 4. The Mongo driver offers both a native query API, similar in spirit to ADO.NET, and a LINQ-style API. Both are implemented here.
Solution 1: conditional query with the native Query API
The code is as follows:

```csharp
var query = Query<Test>.GT(item => item.Amount, 2399927);
var result = collection.Find(query)
                       .SetLimit(100)
                       .SetSortOrder(SortBy.Ascending("amount"))
                       .ToList();
Console.WriteLine(result.First().ToJson()); // ToJson for readable output
```
Solution 2: Skip with the native Query API
The code is as follows:

```csharp
var result = collection.FindAll()
                       .SetSkip(100000)
                       .SetLimit(100)
                       .SetSortOrder(SortBy.Ascending("amount"));
Console.WriteLine(result.ToList().First().ToJson());
```
Solution 3: conditional query with LINQ
The code is as follows:

```csharp
var result = collection.AsQueryable<Test>()
                       .OrderBy(item => item.Amount)
                       .Where(item => item.Amount > 2399927)
                       .Take(100);
Console.WriteLine(result.First().ToJson());
```
Solution 4: Skip with LINQ
The code is as follows:

```csharp
var result = collection.AsQueryable<Test>()
                       .OrderBy(item => item.Amount)
                       .Skip(100000)
                       .Take(100);
Console.WriteLine(result.First().ToJson());
```
Performance Comparison
The test code will be uploaded later. The measurements use Lao Zhao's (my idol~) CodeTimer to compute performance, and the code is run via the TestDriven plug-in.
Solution 1:
The result is as follows:

```
Pagination GT-Limit
{ "_id" : ObjectId("5472e383fc46de17c45d4682"), "cust_id" : "A12399997", "amount" : 2399928, "status" : "B" }
Time Elapsed:   1,322 ms
CPU Cycles:     4,442,427,252
Gen 0: 0   Gen 1: 0   Gen 2: 0
```
Solution 2:
The result is as follows:

```
Pagination Skip-Limit
{ "_id" : ObjectId("5472e383fc46de17c45d4682"), "cust_id" : "A12399997", "amount" : 2399928, "status" : "B" }
Time Elapsed:   95 ms
CPU Cycles:     18,280,728
Gen 0: 0   Gen 1: 0   Gen 2: 0
```
Solution 3:
The result is as follows:

```
Pagination Linq Where
{ "_id" : ObjectId("5472e383fc46de17c45d4682"), "cust_id" : "A12399997", "amount" : 2399928, "status" : "B" }
Time Elapsed:   76 ms
CPU Cycles:     268,734,988
Gen 0: 0   Gen 1: 0   Gen 2: 0
```
Solution 4:
The result is as follows:

```
Pagination Linq Skip
{ "_id" : ObjectId("5472e383fc46de17c45d4682"), "cust_id" : "A12399997", "amount" : 2399928, "status" : "B" }
Time Elapsed:   97 ms
CPU Cycles:     30,834,648
Gen 0: 0   Gen 1: 0   Gen 2: 0
```
The results above are surprising — they differ far too much from the theory. Why is the first run so much slower than the rest? At first I suspected a problem with the C# Mongo driver and tried switching drivers. Then, while reading MongoDB in Action these days, I found:
MongoDB loads the indexes and document metadata needed by a query into memory, and it is recommended that they always fit within the machine's memory; otherwise performance will degrade.
With that theory in mind, I revised my test procedure: I ran the Solution 1 query once beforehand as a warm-up, then repeated the comparison, and the results came out normal.
Corrected result of Solution 1:
The result is as follows:

```
Pagination GT-Limit
{ "_id" : ObjectId("5472e383fc46de17c45d4682"), "cust_id" : "A12399997", "amount" : 2399928, "status" : "B" }
Time Elapsed:   18 ms
CPU Cycles:     54,753,796
Gen 0: 0   Gen 1: 0   Gen 2: 0
```
Summary
This article compared two paging methods: Skip-based paging and range queries over an ordered field. The latter gains its performance by exploiting the field's ordering, so the skipped records never need to be scanned at all. Skip is less efficient but more general; the ordered-field approach requires some design-time handling of that field — at minimum, you need a way to obtain the boundary value for a given page. One improvement is to combine the two: remember the last value of the page currently displayed, then use a range condition on it plus a small Skip to reach nearby pages. That works even better, though it is not implemented here. Performance comparisons and implementations of other approaches are welcome — thank you.
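The combined approach sketched above — seek past the last displayed value with a range condition, then skip only a few rows to jump a page or two — can be outlined as follows (plain JavaScript modeling the query logic; `hybridPage` is a hypothetical name, and a real implementation would issue a `$gt` query with a small `skip`):

```javascript
// Hybrid paging sketch: seek past the last seen value of the ordered
// field, then skip only (jumpPages - 1) * pageSize rows, instead of
// pageIndex * pageSize rows from the very beginning of the collection.
function hybridPage(sorted, lastAmount, jumpPages, pageSize) {
  const afterLast = sorted.filter(d => d.amount > lastAmount); // range seek
  const smallSkip = (jumpPages - 1) * pageSize;                // tiny offset
  return afterLast.slice(smallSkip, smallSkip + pageSize);
}

// 100 rows with amounts 1..100, already sorted.
const data = Array.from({ length: 100 }, (_, i) => ({ amount: i + 1 }));

// Currently viewing a page that ends at amount 50; jump forward
// 2 pages of 10 → the page should start at amount 61.
console.log(hybridPage(data, 50, 2, 10).map(d => d.amount)[0]); // → 61
```

The skip here is bounded by how far the user jumps, not by how deep the page is in the collection, which is why this hybrid keeps the generality of Skip while avoiding its worst-case cost.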
References
1. MongoDB Skip function: http://docs.mongodb.org/manual/reference/operator/aggregation/skip/
2. MongoDB Limit function: http://docs.mongodb.org/manual/reference/operator/aggregation/limit/
3. MongoVue, a Windows GUI client for MongoDB (paid version available): http://www.mongovue.com/
4. Official C# driver: http://docs.mongodb.org/manual/applications/drivers/