How to page queries and optimize their performance in the NoSQL database MongoDB

Traditional SQL paging

Traditional SQL paging solutions can hardly avoid row_number; for scenarios that need flexible sorting and complex filtering, row_number is the workhorse. On today's web, poll/push style incremental loading usually pages by timestamp instead. These two approaches cover most cases (even the paging SQL that LINQ generates is row_number based), and the timestamp approach wins on both performance and complexity, because all it needs is a simple timestamp condition.
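As a quick illustration of the two styles, here is a minimal LINQ sketch; Post, PostedAt and the other names are placeholders made up for illustration, not from the article:

using System;
using System.Collections.Generic;
using System.Linq;

public class Post
{
    public DateTime PostedAt { get; set; }
}

public static class PagingSketch
{
    // row_number / offset style: the provider still walks past every skipped row.
    public static List<Post> ByOffset(IQueryable<Post> posts, int pageIndex, int pageSize)
    {
        return posts.OrderBy(p => p.PostedAt)
                    .Skip(pageIndex * pageSize)
                    .Take(pageSize)
                    .ToList();
    }

    // Timestamp style: seek directly past the last record of the previous page.
    public static List<Post> ByTimestamp(IQueryable<Post> posts, DateTime lastTimestamp, int pageSize)
    {
        return posts.Where(p => p.PostedAt > lastTimestamp)
                    .OrderBy(p => p.PostedAt)
                    .Take(pageSize)
                    .ToList();
    }
}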

MongoDB paging

Coming to MongoDB, my first thought was that paging is nothing special; what could be so different? And indeed, at first glance it follows the same logic as SQL paging.

First, the test case used in this article: I imported documents of the following entity type into the database. The cust_id and amount values were generated as ordered, increasing numbers, and 2,000,000 records were imported:

The code is as follows:

public class Test
{
    /// <summary>
    /// Primary key; ObjectId is MongoDB's built-in primary key type
    /// </summary>
    public ObjectId Id { get; set; }

    /// <summary>
    /// Customer ID
    /// </summary>
    [BsonElement("cust_id")]
    public string CustomerId { get; set; }

    /// <summary>
    /// Total amount
    /// </summary>
    [BsonElement("amount")]
    public int Amount { get; set; }

    /// <summary>
    /// Status
    /// </summary>
    [BsonElement("status")]
    public string Status { get; set; }
}
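For reference, here is a minimal sketch of how the two million test documents might be seeded with the legacy C# driver. The connection string, the database/collection names, and the concrete cust_id/amount patterns are assumptions; the article only says that both fields increase in order.

// Requires the MongoDB.Bson, MongoDB.Driver and System.Collections.Generic namespaces.
var client = new MongoClient("mongodb://localhost");   // assumed local instance
var collection = client.GetServer().GetDatabase("test").GetCollection<Test>("test");

const int total = 2000000;
const int batchSize = 1000;
var batch = new List<Test>(batchSize);

for (int i = 0; i < total; i++)
{
    batch.Add(new Test
    {
        CustomerId = "A" + (12300000 + i),   // ordered value, made-up prefix
        Amount = 2300000 + i,                // ordered value, made-up offset
        Status = i % 2 == 0 ? "A" : "B"
    });

    if (batch.Count == batchSize)
    {
        collection.InsertBatch(batch);       // bulk insert in the legacy driver
        batch.Clear();
    }
}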


The operations below were performed in a MongoDB GUI tool; see reference 3.

First, let's look at the parameters and results a paged query needs. The common parameters are:

PageIndex: the current page number
PageSize: the number of records per page
QueryParam[]: the other query conditions

So, following the row_number paging principle, we want the records from (pageIndex * pageSize) to (pageIndex * pageSize + pageSize). Expressed in LINQ, that is:

The code is as follows:
query.Where(xxx...xxx).Skip(pageIndex * pageSize).Take(pageSize)



A quick look at the documentation shows that MongoDB does have a skip function, and a limit function as well (see references 1 and 2), so the paging query is easy to write:

The code is as follows:
db.test.find({xxx...xxx}).sort({"amount": 1}).skip(10).limit(10) // the query condition is omitted here



It was indeed fast: the result came back in just a few milliseconds, living up to NoSQL's reputation for efficiency. But hold on; I had only skipped 10 records over a small data set. When I pushed the skip up to 100,000, the query took about 20 ms. If the investigation ended with something that simple, it would hardly do justice to a programmer's curiosity. SQL paging solutions differ in efficiency depending on how they are used, so does MongoDB hide a similar trap? It very well might; Redis users have been burned the same way by careless use of KEYS.

After digging through some more material, I found that practically every source gives the same warning:

Do not use skip casually in queries: on large collections it causes a sharp drop in performance. The reason is that skip still walks over the skipped documents one at a time, so the more you skip, the slower it gets.

So skip should be avoided, but how? Recall the timestamp scheme from SQL paging: it exploits the ordered nature of a field and queries on it directly, which avoids scanning the large number of preceding records. In other words, if we can attach a condition like that, query efficiency should improve. Is that really the case? Let's verify:

Assume we want the page starting at the 100,001st record, whose amount value is 2399927. We write the two statements as follows:

The code is as follows:
db.test.find().sort({"amount": 1}).skip(100000).limit(10) // 183 ms

db.test.find({amount: {$gt: 2399927}}).sort({"amount": 1}).limit(10) // 53 ms



The timings are noted in the comments. The second statement takes only about a third of the time of the first, a substantial gap, which confirms the theory about the cost of skip.

 

C# implementation

Having covered MongoDB's paging statements and their efficiency, we can now implement them with the C# driver.

This article uses the official C# (BSON) driver; see reference 4. The driver offers both a native query API, somewhat like ADO.NET, and a LINQ interface; both are used below.
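The snippets below all assume a collection variable of type MongoCollection<Test>. A minimal setup sketch follows (the connection string and the database/collection names are assumptions); since both the sort and the range condition are on amount, an ascending index on that field is also assumed:

var client = new MongoClient("mongodb://localhost");        // assumed local instance
var database = client.GetServer().GetDatabase("test");      // assumed database name
var collection = database.GetCollection<Test>("test");      // assumed collection name

// The sort/range queries below are all on amount, so index it ascending.
collection.EnsureIndex(IndexKeys.Ascending("amount"));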

Solution 1: conditional query with the native Query API

The code is as follows:

var query = Query<Test>.GT(item => item.Amount, 2399927);
var result = collection.Find(query).SetLimit(100)
    .SetSortOrder(SortBy.Ascending("amount")).ToList();
Console.WriteLine(result.First().ToJson()); // print the first document as JSON



Solution 2: skip with the native Query API

The code is as follows:

var result = collection.FindAll().SetSkip(100000).SetLimit(100)
    .SetSortOrder(SortBy.Ascending("amount"));
Console.WriteLine(result.ToList().First().ToJson());



Solution 3: conditional query with LINQ

The code is as follows:

var result = collection.AsQueryable<Test>().OrderBy(item => item.Amount)
    .Where(item => item.Amount > 2399927).Take(100);
Console.WriteLine(result.First().ToJson());



Solution 4: skip with LINQ

The code is as follows:
var result = collection.AsQueryable<Test>().OrderBy(item => item.Amount).Skip(100000).Take(100);
Console.WriteLine(result.First().ToJson());


Performance Comparison

The test code will be uploaded later. The timings are measured with Lao Zhao's (my idol!) CodeTimer, and the code is run through the TestDriven.NET plug-in.
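As a sketch of how such a timing harness might be wired up (CodeTimer.Initialize and CodeTimer.Time come from Lao Zhao's utility class; wrapping Solution 1 this way is an assumption, though the label matches the output below):

CodeTimer.Initialize();   // stabilizes process/thread priority before measuring

CodeTimer.Time("Pagination GT-Limit", 1, () =>
{
    var query = Query<Test>.GT(item => item.Amount, 2399927);
    var result = collection.Find(query).SetLimit(100)
        .SetSortOrder(SortBy.Ascending("amount")).ToList();
    Console.WriteLine(result.First().ToJson());
});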

Solution 1:

The code is as follows:

Pagination GT-Limit
{"_ Id": ObjectId ("5472e383fc46de17c45d4682"), "cust_id": "A12399997", "amount": 2399928, "status": "B "}
Time Elapsed: 1,322 ms
CPU Cycles: 4,442,427,252
Gen 0: 0
Gen 1: 0
Gen 2: 0


Solution 2:

The code is as follows:

Pagination Skip-limit
{"_ Id": ObjectId ("5472e383fc46de17c45d4682"), "cust_id": "A12399997", "amount": 2399928, "status": "B "}
Time Elapsed: 95 ms
CPU Cycles: 18,280,728
Gen 0: 0
Gen 1: 0
Gen 2: 0


Solution 3:

The code is as follows:

Pagination Linq Where
{ "_id" : ObjectId("5472e383fc46de17c45d4682"), "cust_id" : "A12399997", "amount" : 2399928, "status" : "B" }

Time Elapsed: 76 ms
CPU Cycles: 268,734,988
Gen 0: 0
Gen 1: 0
Gen 2: 0


Solution 4:

The code is as follows:

Pagination Linq Skip
{"_ Id": ObjectId ("5472e383fc46de17c45d4682"), "cust_id": "A12399997", "amount": 2399928, "status": "B "}
Time Elapsed: 97 ms
CPU Cycles: 30,834,648
Gen 0: 0
Gen 1: 0
Gen 2: 0



The results above are surprising; they diverge far too much from the theory. Why is there such a gap between the first solution and the ones after it? At first I suspected the C# MongoDB driver and even tried switching drivers. Then, while reading MongoDB in Action recently, I came across this:

MongoDB loads indexes and document metadata into memory as queries require them, and it is recommended that these always fit within the machine's memory; otherwise performance degrades.

With that in mind, I changed my test procedure: I first ran the solution under test once to warm it up, then measured and compared again, and the results came back to normal.
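A small sketch of that adjustment, as I understand the description above (the warm-up call itself is an assumption about the procedure):

// Warm-up: run the query once so the relevant index and metadata are already
// in memory, then time the very same query with CodeTimer as before.
collection.Find(Query<Test>.GT(item => item.Amount, 2399927))
    .SetSortOrder(SortBy.Ascending("amount"))
    .SetLimit(100)
    .ToList();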

Corrected result for Solution 1:

The code is as follows:
Pagination GT-Limit
{"_ Id": ObjectId ("5472e383fc46de17c45d4682"), "cust_id": "A12399997", "amount
": 2399928," status ":" B "}
Time Elapsed: 18 ms
CPU Cycles: 54,753,796
Gen 0: 0
Gen 1: 0
Gen 2: 0



Summary

This article has compared two paging approaches: skip-based paging and paging on an ordered field. The latter adds a range condition so the skipped records are never scanned, which improves performance. Skip is less efficient but more general; the ordered-field approach requires some design work around that field, and at the very least you need a way to obtain the field's boundary value for a given page number. I would add a third option that combines the two: keep the last value of the page currently displayed and use skip only for the small remaining offset, which works even better, but it is not implemented here. You are welcome to share performance comparisons and implementations of other approaches. Thank you very much.
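As a sketch of the combined idea just described (the variable names and the page arithmetic are assumptions): remember the amount of the last record on the page currently displayed, filter past it, and use skip only for the small remaining offset when the user jumps ahead a few pages.

int pageSize = 10;
int pagesForward = 3;            // e.g. the user jumps from page N to page N + 3
int lastSeenAmount = 2399927;    // amount of the last record on the current page

var result = collection
    .Find(Query<Test>.GT(item => item.Amount, lastSeenAmount))
    .SetSortOrder(SortBy.Ascending("amount"))
    .SetSkip((pagesForward - 1) * pageSize)   // small, bounded skip
    .SetLimit(pageSize)
    .ToList();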

References

1. MongoDB skip: http://docs.mongodb.org/manual/reference/operator/aggregation/skip/
2. MongoDB limit: http://docs.mongodb.org/manual/reference/operator/aggregation/limit/
3. MongoVUE, a Windows GUI management tool for MongoDB (a paid version is available): http://www.mongovue.com/
4. Official C# driver: http://docs.mongodb.org/manual/applications/drivers/

 
