Considerations on the Java implementation and paging requirements of MongoDB paging

Last Update:2018-06-07 Source: Internet

Author: User

Tags createindex sorts

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Objective

The traditional relational database provides paging function based on row number, after switching MongoDB, if you want to implement paging, you need to modify the idea.

Traditional paging ideas

Suppose a page size is 10 bars. The

//page 11-10//page 211-20//page 321-30...//page n10*(n-1) +1 - 10*n

MongoDB provides the skip () and limit () methods.

Skip: Skips a specified amount of data. Can be used to skip the data before the current page, that is, skip pagesize* (n-1).
Limit: Specifies the number of record bars read from MongoDB, which can be pagesize as the page size.

So, paging can do this:

//Page 1db.users.find().limit (10)//Page 2db.users.find().skip(10).limit(10)//Page 3db.users.find().skip(20).limit(10)........

Problem

It seems that pagination has been implemented, but official documents are not recommended, saying that all documents will be scanned before returning the results.

The Cursor.skip () method requires the server to scan from the beginning of the input results set before beginning to Retur N results. As the offset increases, cursor.skip () would become slower.

So, a faster way is needed. In fact, and the number of MySQL is not recommended with limit m,n, the solution is to first find out the current page, and then the number of sequential pagesize bar. MongoDB is also the official recommendation.

The correct paging method

We assume that query comparisons are based on _id conditions. In fact, the baseline field for this comparison can be any ordinal field you want, such as a timestamp.

//Page 1db.users.find().limit(pageSize);//Find the id of the last document in this pagelast_id = ... //Page 2users = db.users.find({  '_id' :{ "$gt" :ObjectId("5b16c194666cd10add402c87")}}).limit(10)//Update the last id with the id of the last document in this pagelast_id = ...

Obviously, the first page is different from the later one. For building a paging API, we can ask the user to pass pagesize, LastID.

PageSize Page Size
LastID the ID of the last record on the previous page, and if not, it will be forced to the first page

Descending

_idDescending, the first page is the largest, and the next page's ID is smaller than the last ID of the previous page.

function printstudents(Startvalue,Nperpage){   LetEndvalue= NULL;  DB.Students.Find({ _id: { $lt:Startvalue} })             .Sort({ _id: -1 })             .Limit(Nperpage).ForEach(Student= {               Print(Student.name);Endvalue= Student._id;             });  returnEndvalue;}

Ascending

_idAscending, the next page ID is larger than the last record ID of the previous page.

function printstudents(Startvalue,Nperpage){   LetEndvalue= NULL;  DB.Students.Find({ _id: { $GT:Startvalue} })             .Sort({ _id: 1 })             .Limit(Nperpage).ForEach(Student= {               Print(Student.name);Endvalue= Student._id;             });  returnEndvalue;}

How many lines?

There are a total number of questions and how many pages. Therefore, you need to check how many count.

db.users.find().count();

On the order of Objectid

First look at the objectid generation rule:

Like what"_id" : ObjectId("5b1886f8965c44c78540a4fc")

Takes the first 4 bytes of an ID. Since the ID is 16 binary string,4 bytes are 32 bits, corresponding to the first 8 characters of the ID. That is 5b1886f8 , the conversion to 10 binary is 1528334072 . Plus 1970, is the current time.

In fact, the easier way is to look at the Objectid object in org.mongodb:bson:3.4.3.

 Public ObjectId(Date date) { This(Datetotimestampseconds(date), Machine_identifier, Process_identifier, Next_counter.getandincrement(),false);}//org.bson.types.objectid#datetotimestampsecondsPrivate Static int Datetotimestampseconds(Date time) {return(int) (Time.GetTime()/1000L);}//java.util.date#gettime/*** Returns The number of milliseconds since January 1, 1970, 00:00:00 GMT* Represented by this <tt>Date</tt> object. * * @returnthe number of milliseconds since January 1, 1970, 00:00:00 GMT* represented by this date. */ Public Long GetTime() {return Gettimeimpl();}

The objectid of MongoDB should be increased over time, that is, the ID inserted later is larger than the previous one. But considering the generation rule of ID, the minimum time sorting is the second, and the sort in the same second cannot be guaranteed. Of course, if the same process-generated object is the same machine, it is ordered.

If it is distributed machine, different machine clock synchronization and offset problem. So, if you have a field that is guaranteed to be orderly, it is best to use this field for sorting. _idis the last option.

If I had to jump the page

The above pagination looks ideal, although it is, but there is a just need not to indicate---how I jump page.

Our paging data is associated with the sort key, so you must have a sort datum to truncate the record. and jumping page, I only know the first few pages, the condition is insufficient, cannot be paged out.

Real business requirements do raise the need for page hopping, although almost no one is used, and people are more concerned with the beginning and the end, and the end can be reversed by the scheme to start. Therefore, the need for real paging should not exist. If you are looking for a record, query criteria search is the quickest scenario. If you do not know the query criteria, by the naked eye to one by one view, then the next page is enough.

Said so much, is to reverse the traditional concept of paging, in the development of the Internet today, most of the volume of data is huge, the demand for jumping pages will consume more memory and CPU, corresponding to the query is slow.

Of course, if the number is small, if you do not mind slow, then skip is not a problem, the key depends on the business scenario.

I received the demand today is to jump page, and the number is very small, then skip it, no trouble, also fast.

Let's see what the makers do.

Google is the most commonly used, it seems to have a jump page selection ah. Look again, only 10 pages, more must be the next page, and did not provide a total number of pages, jump to any page of choice. This is not our find-condition-then-limit plan, but he has more than one page, front end or back end of this page cut into 10 copies.

Similarly, looking at Facebook, although it provides the total count, it can only be the next page.

Other scenarios, such as Twitter, Weibo, friends, etc., do not have the concept of jumping pages at all.

Sorting and performance

The previous focus was on the implementation of paging, but the sorting was ignored. Since pagination must be paged in a certain order, it has to be sorted.

MongoDB's sort and find combination

db.bios.find().sort( { name: 1 } ).limit( 5 )db.bios.find().limit( 5 ).sort( { name: 1 } )

Both are equivalent, and the order does not affect the order of execution. That is, the find query matches the result of the condition first, and then sorts in the result set.

Our conditional queries are sometimes sorted by a field, such as by time. To query the data for a set of time series, we want to display the content in chronological order, you must first sort by the time field and then ascending by ID.

db.users.find({name: "Ryan"}).sort( { birth: 1, _id: 1 } ).limit( 5 )

We can achieve our paging function by birth ascending, then birth the same record and then follow the _id ascending order.

Multi-field sorting

db.records.sort({ a:1, b:-1})

Indicates that the first is followed by a ascending, followed by the B descending. That is, according to field a ascending, the same record for a, then B descending, instead of a row after the end of all by B row.

Example:

db.user.find();结果：{     "_id" : ObjectId("5b1886ac965c44c78540a4fb"),     "name" : "a",     "age" : 1.0,     "id" : "1"}{     "_id" : ObjectId("5b1886f8965c44c78540a4fc"),     "name" : "a",     "age" : 2.0,     "id" : "2"}{     "_id" : ObjectId("5b1886fa965c44c78540a4fd"),     "name" : "b",     "age" : 1.0,     "id" : "3"}{     "_id" : ObjectId("5b1886fd965c44c78540a4fe"),     "name" : "b",     "age" : 2.0,     "id" : "4"}{     "_id" : ObjectId("5b1886ff965c44c78540a4ff"),     "name" : "c",     "age" : 10.0,     "id" : "5"}

Ascending by name, and then descending by age

db.user.find({}).sort({name: 1, age: -1})结果：  {     "_id" : ObjectId("5b1886f8965c44c78540a4fc"),     "name" : "a",     "age" : 2.0,     "id" : "2"}{     "_id" : ObjectId("5b1886ac965c44c78540a4fb"),     "name" : "a",     "age" : 1.0,     "id" : "1"}{     "_id" : ObjectId("5b1886fd965c44c78540a4fe"),     "name" : "b",     "age" : 2.0,     "id" : "4"}{     "_id" : ObjectId("5b1886fa965c44c78540a4fd"),     "name" : "b",     "age" : 1.0,     "id" : "3"}{     "_id" : ObjectId("5b1886ff965c44c78540a4ff"),     "name" : "c",     "age" : 10.0,     "id" : "5"}

Optimizing Sorting with Indexes

The performance must be considered here.

$sort and Memory restrictions
The $sort stage has a limit of megabytes of RAM. By default, if the stage is exceeds this limit, $sort would produce an error. To allow for the handling of large datasets, set the allowDiskUse option to true to enable $sort operations to write to temporary Files. See the Allowdiskuse option in Db.collection.aggregate () method and the aggregate command for details.

Changed in version 2.6:the memory limit for $sort Changed from ten percent of RAM to megabytes of RAM.

Starting with 2.6, sort only sorts the data within 100M, and the error will be exceeded. You can set allowDiskUse to allow bulk data to be sorted.

Indexed sorting is faster than no index, so the official recommendation is to index the key that needs to be sorted.

Index

For single key sorting, set up a separate index

db.records.createIndex( { a: 1 } )

The index can support sort and reverse order

Indexes are sorted in ascending order (1) and Descending (-1), the sort direction of the index definition, and the reversal direction can support sort. For the above single key index A, you can support sort({a:1}) ascending and sort({a:-1}) descending.

For multi-field sorting

If you want to use an index. You can create a composite (compound index) index to

db.records.createIndex( { a: 1, b:-1 } )

The field order of a composite index must be the same as sort

The order of composite multi-field indexes is consistent with the field of sort to be able to walk the index. such as the index {a:1, b:1} , can support sort({a:1, b:1}) and reverse sort({a:-1, b:-1}) , but, does not support A/b upside down. That is, not supported sort({b:1, a:1}) .

Composite index supports sort and reverse order

Indexes {a:1, b:-1} can be supported sort({a:1, b:-1}) , or they can supportsort({a:-1, b:1})

Composite index can support sort with prefix subset

For multi-field composite indexes, you can split into multiple prefix subsets. such as the {a:1, b:1, c:1} equivalent

{ a: 1 }{ a: 1, b: 1 }{ a: 1, b: 1, c: 1 }

Example:

Example	Index Prefix
Db.data.find (). Sort ({a:1})	{a:1}
Db.data.find (). Sort ({A:-1})	{a:1}
Db.data.find (). Sort ({a:1, b:1})	{a:1, b:1}
Db.data.find (). Sort ({A:-1, B:-1})	{a:1, b:1}
Db.data.find (). Sort ({a:1, b:1, c:1})	{a:1, b:1, c:1}
Db.data.find ({A: {$gt: 4}}). Sort ({a:1, b:1})	{a:1, b:1}

A non-prefixed subset of a composite index can support sort, assuming that the elements of a prefix subset are equals in the query condition of find

This condition compares a non-prefixed subset of the composite index, as long as the fields of find and sort are indexed, and the criteria in find must be equal.

Example

Example	Index Prefix
Db.data.find ({a:5}). Sort ({b:1, c:1})	{a:1, b:1, c:1}
Db.data.find ({b:3, a:4}). Sort ({c:1})	{a:1, b:1, c:1}
Db.data.find ({a:5, B: {$lt: 3}}). sort ({b:1})	{a:1, b:1}

The Find and Sort fields add up to a subset of prefixes, and other fields can be used for non-equals comparisons in the Find condition.

is not a subset of prefixes, nor is it a find equality condition. The index is not valid. For example, for an index {a:1, b:1, c:1} . The following two ways do not go index.

db.data.find( { a: { $gt: 2 } } ).sort( { c: 1 } )db.data.find( { c: 5 } ).sort( { c: 1 } )

Java Code Paging

Due to the fact that there is a need to skip pages, there is no performance problems, still use skip for paging, of course, also compatible with the conditions of paging

 PublicPageresult<statbyclientrs>Findbydurationpage(Findbydurationpagerq RQ) {FinalCriteria criteriadefinition = criteria.where("duration"). is(RQ.getduration());FinalQuery query =NewQuery (criteriadefinition). with(New Sort(Lists.newarraylist(New Order(Direction.ASC,"_id"))));LongTotal = Mongotemplate.Count(Query, Statbyclient.class);//Paging logicInteger pageSize = RQ.getpagesize(); Integer pagenum = RQ.Getpagenum(); String LastID = RQ.Getlastid();FinalInteger pages = (int) Math.Ceil(Total/(Double) pageSize);if(pagenum<=0|| pagenum> pages) {pagenum =1; } list<statbyclient> statbyclientlist;if(StringUtils.Isnotblank(LastID)) {if(Pagenum! =1) {criteriadefinition. and("_id").GT(New ObjectId(LastID)); } query.Limit(pageSize); Statbyclientlist = Mongotemplate.Find(Query, Statbyclient.class); }Else{intSkip = PageSize * (Pagenum-1); Query.Skip(Skip).Limit(pageSize); Statbyclientlist = Mongotemplate.Find(Query, Statbyclient.class); } pageresult<statbyclientrs> Pageresult =NewPageresult<> (); Pageresult.Settotal(total); Pageresult.setpages(pages); Pageresult.setpagesize(pageSize); Pageresult.Setpagenum(Pagenum); Pageresult.setlist(Mapper.Maptolistrs(statbyclientlist));returnPageresult;}

In this example, the goal is to query the list according to duration, and the result set is paginated. When the request body lastId is included, go to the next page of the scenario. If you want to skip the page, do not preach lastId , you can jump.

Reference

Official pagination Recommendation
Official Sort Document
Official use Index to refine sort documents
Official composite Index
How to correctly view the demand of paging
Http://ian.wang/35.htm
https://cnodejs.org/topic/559a0bf493cb46f578f0a601

Considerations on the Java implementation and paging requirements of MongoDB paging

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More