MongoDB Paging optimization


MongoDB paging itself is very simple; this article is mainly about the problems paging runs into and how to optimize it.

From traditional web pages to mobile APIs, we all face the same constraints, such as the size of an Ajax GET response, which force you to paginate.

For example, my project uses Ratchet as its H5 framework; its push.js loads other pages via Ajax GET, and a page that is too large causes errors.

Pagination description

In a typical list API, pull-to-refresh fetches the latest items, and pulling up loads the next page.

A common API needs two interfaces:

    • get_latest(model, count)
    • get_with_page(number, size)

get_latest returns the newest data; this is the interface behind the usual pull-to-refresh. Because two refreshes can be far apart in time, the data it returns usually replaces whatever is in the current list.

Usual procedure

    • If another request arrives within n seconds (for example n = 30s), show a "recently updated" hint or simply return the current data; there is no need to query again
    • If new data is fetched, it replaces the current list so the data stays consistent (a minimal sketch follows this list)
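
A minimal mongo-shell sketch of that flow, assuming a hypothetical users collection and the 30-second window mentioned above (getLatest, refresh and lastRefresh are illustrative names, not part of the article's API):

// get_latest: the newest `count` documents, newest first (ObjectIds are roughly time-ordered)
function getLatest(count) {
    return db.users.find().sort({ _id: -1 }).limit(count).toArray();
}

// Throttled refresh: if the previous refresh was less than 30 seconds ago,
// keep the current list instead of querying again
var lastRefresh = 0;
function refresh(count) {
    if (Date.now() - lastRefresh < 30 * 1000) return null;   // caller keeps what it has
    lastRefresh = Date.now();
    return getLatest(count);
}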

How do you tell that you have reached the last page?

The common approach is to fetch the total count, divide it by the page size, and check whether the current page number equals the number of pages minus one:

total_pages = ceil(all_count / pagesize), and page n (counting from 0) is the last page when n == total_pages - 1

With a small data set you will not feel it, but once the volume is large, what do you think that count(*) costs?
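
Spelled out in mongo-shell terms, the count-based check looks roughly like this (a sketch; pageSize and currentPage are hypothetical, and the count() call is exactly the part that hurts on a big collection):

// Count-based last-page check -- simple, but count() gets expensive as the collection grows
var pageSize = 10;
var currentPage = 0;                               // whatever page the client is on
var totalCount = db.users.count();                 // the costly full count
var totalPages = Math.ceil(totalCount / pageSize);
var isLastPage = (currentPage >= totalPages - 1);  // pages numbered from 0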

A better practice is to page by _id and let the front end decide from the number of documents returned each time: if the count equals pageSize, there may be another page to fetch; if it is less than pageSize, there is no more data, i.e. you have reached the end. At that point, just disable the "next page" button.
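
Reusing the pageSize from the sketch above, the front-end check boils down to this (disableNextPageButton() is a hypothetical UI hook):

// Called after each page of results arrives
function reachedLastPage(docs) {
    // Fewer documents than pageSize means there is nothing left to fetch
    return docs.length < pageSize;
}

// if (reachedLastPage(page)) disableNextPageButton();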

Using skip() and limit()

// Page 1
db.users.find().limit(10)
// Page 2
db.users.find().skip(10).limit(10)
// Page 3
db.users.find().skip(20).limit(10)
...

In general, the code to retrieve page n looks like this:

db.users.find().skip(pagesize*(n-1)).limit(pagesize)

Of course, this assumes that no documents are inserted or deleted between two queries; does that actually hold in your system?

Most OLTP systems cannot guarantee that nothing changes in between, so skip is little more than a toy and not of much use.

Besides, skip + limit is only suitable for small amounts of data; once the data grows it slows down, even with indexes and other tuning, because the server still has to walk through every skipped document. Its flaw is that obvious.
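
You can see the cost for yourself by comparing executionStats for a deep skip against the _id range query introduced in the next section (the collection name, page depth and last_id value here are made up for illustration):

// Deep page via skip: the skipped documents are still scanned and discarded
db.users.find().sort({ _id: 1 }).skip(100000).limit(10).explain("executionStats")

// Range query on _id: the index seek jumps straight to the right place
var lastId = ObjectId("55940ae59c39572851075bfd");   // _id of the last document on the previous page
db.users.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(10).explain("executionStats")

// Compare totalKeysExamined / totalDocsExamined in the two outputs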

If you are dealing with large data sets, you need to consider another approach.

Using find() and limit()

Since skip() gives us no good way to handle large data sets, we need an alternative to skip.

The idea is to anchor the query on something that orders the documents, such as a timestamp or the _id field.

Here we will use _id (a timestamp works the same way; check whether your schema has a field such as created_at).

_id is a MongoDB ObjectId. An ObjectId occupies 12 bytes of storage, two hexadecimal characters per byte, so it prints as a 24-character string made up of a timestamp, machine id, process id, counter and so on. There is a section below on how it is composed and why it is unique.

The general idea of paging with _id is as follows:

    1. On the current page, find the _id of the last record and note it down as last_id
    2. Use that last_id as the query condition and fetch the records whose _id is greater than last_id as the next page

So, isn't it easy?

The code is as follows

// Page 1
db.users.find().limit(pageSize);
// Find the _id of the last document on this page
last_id = ...
// Page 2
users = db.users.find({ _id: { $gt: last_id } }).limit(pageSize);
// Update last_id with the _id of the last document on this page
last_id = ...

This is just demo code; here is what the query looks like in the Robomongo 0.8.4 client:

db.usermodels.find({ "_id": { "$gt": ObjectId("55940ae59c39572851075bfd") } }).limit(20).sort({ _id: -1 })

With this approach, the two interfaces we need to implement become (a minimal sketch follows the list):

    • get_latest(model, count)
    • get_next_page_with_last_id(last_id, size)
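
Continuing the sketch from earlier, the second interface could look like this in the mongo shell (the function name, the users collection and the ascending-_id convention are assumptions for illustration):

// get_next_page_with_last_id: records whose _id is greater than lastId,
// i.e. the page that follows the one ending at lastId
function getNextPageWithLastId(lastId, size) {
    return db.users.find({ _id: { $gt: lastId } })
                   .sort({ _id: 1 })
                   .limit(size)
                   .toArray();
}

// Usage: pass the _id of the last document on the current page
// var page2 = getNextPageWithLastId(lastIdOfPage1, 20);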

To better understand why paging on _id works, it helps to know how an ObjectId is composed.

About ObjectId composition

As mentioned above, _id is a MongoDB ObjectId: a 12-byte structure made up of a timestamp, machine id, process id, counter and so on.

![ObjectId structure](http://images.blogjava.net/blogjava_net/dongbule/46046/o_111.PNG)

TimeStamp

The first 4 bytes are a UNIX timestamp, an int. Take the first 4 bytes of the ObjectId in the example above, "4df2dcec", and convert them from hexadecimal to decimal: 1307761900. That number is a timestamp. To make it easier to read, convert it into the usual time format:

$ date -d '1970-01-01 UTC 1307761900 sec' -u
Sat Jun 11 03:11:40 UTC 2011

So the first 4 bytes actually hide the time the document was created. Because the timestamp sits at the front of the string, ObjectIds are roughly ordered by insert time, which is useful in several ways, for example when the field is used as an index to speed up queries. Another benefit is that some client drivers can parse the insertion time back out of the ObjectId. It also explains why, when you create several ObjectIds in quick succession, the first few characters barely change: they encode the current time. Some users worry about clock synchronization between servers, but the exact value of this timestamp does not really matter, as long as it keeps increasing.
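
In the mongo shell you can read that timestamp back directly with the standard getTimestamp() helper (the ObjectId below is the example value used earlier in this article):

// Extract the creation time hidden in the first 4 bytes
ObjectId("55940ae59c39572851075bfd").getTimestamp()
// ISODate("2015-07-01T15:44:37Z")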

Machine

The next three bytes, 2cdcd2 in the example, are a unique identifier of the host, usually a hash of the machine's hostname. Different hosts produce different machine hashes, which ensures there are no collisions in a distributed deployment. This is also why ObjectIds generated on the same machine all share the same middle section.

Pid

The machine bytes above ensure that ObjectIds generated on different machines do not collide, while the PID ensures that ObjectIds generated by different MongoDB processes on the same machine do not collide either. The next two bytes, 9362, are the identifier of the process that generated the ObjectId.

Increment

The preceding nine bytes guarantee that ObjectIds generated in the same second by different processes on different machines do not collide. The last three bytes, a8b817, are an auto-incrementing counter that guarantees uniqueness within the same second, allowing 256^3 = 16,777,216 unique ObjectIds per second.

Client generation

Generating ObjectIds has one more big advantage: MongoDB can generate them on the server, but they can also be generated by the client driver. If you read the documentation carefully you will notice that MongoDB's design trades space for time everywhere. Comparing ObjectIds is lightweight, but generating them on the server still takes time, so whatever can be moved from the server to the client driver should be moved there; pushing this work to the client reduces the load on the server. The other reason is that scaling the application layer is far easier than scaling the database layer.
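
As a small illustration (the shell is itself a client): an insert without an _id gets one assigned locally before the command is ever sent to the server. The document below is hypothetical demo data.

// The shell/driver fills in _id on the client side before sending the insert
var doc = { nickname: "demo" };
db.usermodels.insert(doc);
doc._id   // already populated with a freshly generated ObjectId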

Summary

The thinking behind MongoDB's ObjectId generation is worth borrowing in many situations, especially in large-scale distributed development: how to generate identifiers cheaply, how to shift the generation load around, and how to trade space for time to squeeze the most out of generation.

The point of saying all this is simple: MongoDB's _id is unique, on a single machine and across a cluster alike. That is all you really need to understand here.

Performance optimization

Index

Build indexes according to your business needs; see the official documentation: http://docs.mongodb.org/manual/core/indexes/
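
For the _id-based paging above no extra index is needed, since _id is always indexed. If you page on another field instead, such as the hypothetical created_at mentioned earlier, give it its own index, for example:

// Hypothetical: a descending index on created_at for time-based paging
db.usermodels.createIndex({ created_at: -1 })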

About explain

If you know execution plans from relational databases, MongoDB's explain will feel familiar; if not, here are a few words about it.

explain is a command MongoDB provides for inspecting how a query executes, and it is the main tool for performance tuning.

http://docs.mongodb.org/manual/reference/method/cursor.explain/

db.usermodels.find({ "_id": { "$gt": ObjectId("55940ae59c39572851075bfd") } }).explain()

/* 0 */
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "xbm-wechat-api.usermodels",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "_id" : {
        "$gt" : ObjectId("55940ae59c39572851075bfd")
      }
    },
    "winningPlan" : {
      "stage" : "FETCH",
      "inputStage" : {
        "stage" : "IXSCAN",
        "keyPattern" : {
          "_id" : 1
        },
        "indexName" : "_id_",
        "isMultiKey" : false,
        "direction" : "forward",
        "indexBounds" : {
          "_id" : [
            "(ObjectId('55940ae59c39572851075bfd'), ObjectId('ffffffffffffffffffffffff')]"
          ]
        }
      }
    },
    "rejectedPlans" : []
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 5,
    "executionTimeMillis" : 0,
    "totalKeysExamined" : 5,
    "totalDocsExamined" : 5,
    "executionStages" : {
      "stage" : "FETCH",
      "nReturned" : 5,
      "executionTimeMillisEstimate" : 0,
      "works" : 6,
      "advanced" : 5,
      "needTime" : 0,
      "needFetch" : 0,
      "saveState" : 0,
      "restoreState" : 0,
      "isEOF" : 1,
      "invalidates" : 0,
      "docsExamined" : 5,
      "alreadyHasObj" : 0,
      "inputStage" : {
        "stage" : "IXSCAN",
        "nReturned" : 5,
        "executionTimeMillisEstimate" : 0,
        "works" : 5,
        "advanced" : 5,
        "needTime" : 0,
        "needFetch" : 0,
        "saveState" : 0,
        "restoreState" : 0,
        "isEOF" : 1,
        "invalidates" : 0,
        "keyPattern" : {
          "_id" : 1
        },
        "indexName" : "_id_",
        "isMultiKey" : false,
        "direction" : "forward",
        "indexBounds" : {
          "_id" : [
            "(ObjectId('55940ae59c39572851075bfd'), ObjectId('ffffffffffffffffffffffff')]"
          ]
        },
        "keysExamined" : 5,
        "dupsTested" : 0,
        "dupsDropped" : 0,
        "seenInvalidated" : 0,
        "matchTested" : 0
      }
    },
    "allPlansExecution" : []
  },
  "serverInfo" : {
    "host" : "iZ251uvtr2b",
    "port" : 27017,
    "version" : "3.0.3",
    "gitVersion" : "b40106b36eecd1b4407eb1ad1af6bc60593c6105"
  }
}

Field Description:

The queryPlanner.winningPlan.inputStage.stage field shows the query strategy:

    • IXSCAN means the query used an index scan
    • COLLSCAN means a full collection scan, which is the slow case you want to avoid

The index name that the 2.6 explain output showed in the cursor field now appears in queryPlanner.winningPlan.inputStage.indexName.

In 3.0, executionStats.totalDocsExamined shows the total number of documents examined; it replaces nscanned from 2.6, i.e. the number of documents scanned.

    • nReturned: number of documents returned
    • executionTimeMillis: execution time in milliseconds
    • indexBounds: the index bounds that were used

Profiling

MongoDB also has a profiling feature; the second argument to setProfilingLevel is the slow-operation threshold in milliseconds:

db.setProfilingLevel(2, 20)

The profiler has three levels:

    • 0: profiling off
    • 1: log slow operations, by default those slower than 100 ms
    • 2: log all operations

The records are stored in the system.profile collection by default; query them with:

db['system.profile'].find()
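
For example, to pull the most recent operations slower than the 20 ms threshold set above out of system.profile (a sketch; millis and ts are standard fields of profile documents):

// Most recent operations that took longer than 20 ms
db.system.profile.find({ millis: { $gt: 20 } })
                 .sort({ ts: -1 })
                 .limit(5)
                 .pretty()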

Summary

    • explain lets you analyze performance while you are still writing code, during development
    • the profiler catches slow statements at runtime, making it easy to locate problems in the live product

Whatever problem you are targeting, the solution comes down to:

    • Adapt the schema structure to your business
    • Optimize the indexes

With this knowledge, I believe you can test the performance of your paging statements on your own.

End of full text

Welcome to follow my WeChat public account "node full stack".
