Java MongoDB Paging Optimization

Source: Internet
Author: User
Tags: mongodb, collection, uuid

A recent project had me building new-visitor statistics from the site's user data, which is stored in MongoDB. The data set is not really big, around 10 million records, but the company only allocated me a machine with 4 GB of memory, which left my program running out of breath... It's exhausting.

The most common problem is that queries against MongoDB overflow memory, so there is no way around paged queries. Everyone can come up with that idea; how to page well, however, takes some real know-how!

The approach you see most online, and the most common one, is the skip+limit combination. It is fine for small data sets, but against tens of millions of records it leaves you with nothing to do but sigh...

After digging through all kinds of material online and asking a teacher, I found a method fast enough to leave the skip+limit combination several streets behind.

The idea: conditional query + sort + limit. Query and sort as you go: after sorting a page, take the last record of that page and use it as the query condition for the next page, and so on...

First, the code:

    import com.google.common.base.Function;
    import com.google.common.collect.ArrayListMultimap;
    import com.google.common.collect.Multimap;
    import com.google.common.collect.Multimaps;
    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.DBCursor;
    import com.mongodb.DBObject;
    import org.bson.types.ObjectId;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    /**
     * Groups all access records earlier than the specified date by uuid.
     * @param date the specified date
     * @return all access records, as a Guava Multimap of uuid -> records
     */
    public static Multimap<String, Map<String, String>> getOldVisitors(String date) {
        // Number of records fetched per query
        int pageSize = 100000;
        // Hex string of the "_id" of the last record on the previous page
        String objectId = "";
        // Return value: a mutable Guava Multimap, so every page can be accumulated
        // (the original overwrote this on each page, returning only the last page)
        Multimap<String, Map<String, String>> mapLess = ArrayListMultimap.create();
        // Query condition and field selectors
        BasicDBObject queryLess = new BasicDBObject();
        BasicDBObject fields = new BasicDBObject();
        BasicDBObject field = new BasicDBObject();
        // Initialize the MongoDB collection object; this could sit behind a connection pool
        DBCollection dbCol = init();
        // Select only the needed fields: the fewer fields, the faster the query
        field.put("uuid", 1);
        fields.put("uuid", 1);
        fields.put("inittime", 1);
        // Condition: earlier than the specified date
        String conditionLess = TimeCond.getTimeCondLess(date);
        queryLess.put("$where", conditionLess);
        DBCursor cursorLess = dbCol.find(queryLess, field);
        // Total number of records earlier than the specified date
        int countLess = cursorLess.count();
        // Number of full pages; the loop below runs circleCountLess + 1 times
        int circleCountLess = countLess / pageSize;
        // Remainder: the size of the last, partial page
        int modLess = countLess % pageSize;
        // Traverse page by page
        for (int i = 1; i <= circleCountLess + 1; i++) {
            DBObject obj = null;
            // Collect this page's results in a list, ready for the Guava grouping below
            List<Map<String, String>> listOfMaps = new ArrayList<Map<String, String>>();
            // Once we have a last "_id", add it to the query.
            // This extra condition is the key to the paging.
            if (!"".equals(objectId)) {
                // obj.get("_id").toString() strips the ObjectId() wrapper, so rebuild it
                ObjectId id = new ObjectId(objectId);
                queryLess.append("_id", new BasicDBObject("$gt", id));
            }
            if (i < circleCountLess + 1) {
                cursorLess = dbCol.find(queryLess, fields)
                        .sort(new BasicDBObject("_id", 1)).limit(pageSize);
            } else if (modLess > 0) {
                // Last, partial page. Guard against modLess == 0: limit(0) means "no limit"
                cursorLess = dbCol.find(queryLess, fields)
                        .sort(new BasicDBObject("_id", 1)).limit(modLess);
            } else {
                break;
            }
            while (cursorLess.hasNext()) {
                obj = cursorLess.next();
                listOfMaps.add((Map<String, String>) obj);
            }
            // Remember the "_id" of the last record on this page;
            // it becomes the $gt condition of the next loop
            if (null != obj) {
                objectId = obj.get("_id").toString();
            }
            // Group this page by uuid and accumulate into the result
            mapLess.putAll(Multimaps.index(listOfMaps,
                    new Function<Map<String, String>, String>() {
                        public String apply(final Map<String, String> from) {
                            return from.get("uuid");
                        }
                    }));
        }
        return mapLess;
    }
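For illustration only, a hypothetical call site might look like this (it assumes init() points at the right collection and that the date string is in whatever format TimeCond.getTimeCondLess expects):

    // Group every visit recorded before the given date by uuid, then inspect the groups.
    Multimap<String, Map<String, String>> old = getOldVisitors("2011-09-14");
    for (String uuid : old.keySet()) {
        System.out.println(uuid + " -> " + old.get(uuid).size() + " visits");
    }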

I am a newcomer, so the code is not particularly elegant and may be hard to follow, but I think the comments make it reasonably clear. If you have questions, or a more elegant approach, please leave a comment...

Why use the "_id" field as the paging condition? I did try other fields, such as a time field; time strings can be compared for ordering too, but their efficiency is nowhere near as high as "_id"'s.
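To make the contrast concrete, here is a minimal sketch of the core query against the same legacy driver; nextPage and col are hypothetical names, not part of the project above. Because "_id" always carries a unique index, the $gt condition seeks straight to the start of the next page, while skip+limit forces the server to walk and discard every skipped document, getting slower as the offset grows:

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.DBCursor;
    import org.bson.types.ObjectId;

    // Fetch the page that starts right after lastId; null means "first page".
    static DBCursor nextPage(DBCollection col, ObjectId lastId, int pageSize) {
        BasicDBObject query = new BasicDBObject();
        if (lastId != null) {
            // $gt seeks into the default unique index on "_id"
            query.put("_id", new BasicDBObject("$gt", lastId));
        }
        return col.find(query)
                  .sort(new BasicDBObject("_id", 1)) // stable order from the same index
                  .limit(pageSize);
    }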

As for MongoDB's "_id", I had always ignored its role, and the direct result was that I wasted a lot of time and energy going around in circles only to end up back where I started. It was the feeling of searching for someone a thousand times in the crowd, then suddenly turning around to find that person standing right there where the lantern light is dim...

MongoDB ObjectId

The 24-character string "4e7020cb7cac81af7136236b" looks long and hard to read, but it is really just a group of hexadecimal characters: two hex characters per byte, for a total of 12 bytes of storage. That is indeed quite a few more bytes than MySQL's 4-byte INT type, but given today's storage devices, the extra bytes should not be a bottleneck; moreover, the design embodies MongoDB's idea of trading space for time. According to the official documentation, an ObjectId is composed of the following parts:

1) Time

Timestamp. Take the first 4 bytes (8 hex characters) of the ObjectId generated above, "4E7020CB", and convert from hexadecimal to decimal to get "1315971275", a Unix timestamp. Converting the timestamp makes the creation time easy to read.
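As a quick check, here is a small self-contained sketch of that conversion (expected output in the comments):

    import java.time.Instant;

    public class ObjectIdTimestamp {
        public static void main(String[] args) {
            // First 8 hex characters = 4-byte timestamp, seconds since the Unix epoch
            long seconds = Long.parseLong("4e7020cb", 16);
            System.out.println(seconds);                         // 1315971275
            System.out.println(Instant.ofEpochSecond(seconds));  // 2011-09-14T03:34:35Z
        }
    }

Depending on the driver version, the ObjectId class can also expose this directly (for example, getDate() on newer versions of org.bson.types.ObjectId).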

2) Machine

Machine identifier. The next three bytes are "7cac81", a unique identifier of the host, generally a hash of the machine's hostname. This guarantees that different hosts generate different machine values and therefore do not conflict in a distributed deployment, and it is also why ObjectIds generated on the same machine all share the same middle characters.

3) PID

The process ID. The machine bytes above ensure that ObjectIds generated on different machines do not conflict, while the PID ensures that ObjectIds generated by different MongoDB processes on the same machine do not conflict either. The next two bytes, "af71", identify the process that produced the ObjectId.

4) INC

Self-incrementing counter. The preceding nine bytes guarantee that ObjectIds generated in the same second by different processes on different machines do not conflict; the final three bytes, "36236b", are an automatically incrementing counter that keeps ObjectIds generated within the same second unique, allowing 256^3 = 16,777,216 records per second.

In total: the first 4 bytes of an ObjectId are the timestamp, recording when the document was created; the next 3 bytes are the unique host identifier, ensuring different hosts produce different ObjectIds; the next 2 bytes are the process ID, ensuring different MongoDB processes on the same machine produce different ObjectIds; and the final 3 bytes are a self-incrementing counter that guarantees uniqueness within the same second. ObjectId's primary-key generation strategy solves the uniqueness problem under high concurrency in a distributed environment, and it is well worth learning from.
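To make the layout concrete, here is a minimal sketch that slices the sample ObjectId from above into the four fields just described (4-byte time, 3-byte machine, 2-byte PID, 3-byte counter; two hex characters per byte):

    public class ObjectIdFields {
        public static void main(String[] args) {
            String id = "4e7020cb7cac81af7136236b";
            System.out.println("time    : " + id.substring(0, 8));   // 4e7020cb
            System.out.println("machine : " + id.substring(8, 14));  // 7cac81
            System.out.println("pid     : " + id.substring(14, 18)); // af71
            System.out.println("counter : " + id.substring(18, 24)); // 36236b
        }
    }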
