MongoDB Notes: Paging Optimization for MongoDB in Java

Source: Internet
Author: User
Tags mongodb collection uuid google guava

In a recent project I had to compute new-visitor statistics from the site's user data, which is stored in MongoDB. The data set is not really large, around 10 million records (1000W), but the company only gave me a machine with 4 GB of memory, so my program kept running out of breath. It was exhausting.

The most common problem was running out of memory while querying MongoDB, which left paged queries as the only option. Everyone thinks of paging, but how to page turns out to involve quite a few tricks.

The paging approach seen most often online is the skip + limit combination. It works fine for small data sets, but against tens of millions of records it becomes hopelessly slow.

After a lot of searching online and asking more experienced people, I found a method fast enough to leave skip + limit far behind.

Idea: conditional query + sort + limit. Query and sort as you go: after sorting, take the last record of the first page and use it as the condition for the second page's query, and so on.
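To put a rough number on that slowdown, the following sketch counts how many entries each strategy walks in total. It assumes a simplified model in which skip(n) has to walk past n entries before it returns a page, while a range query on an indexed key starts where the previous page ended. The class and method names are hypothetical, and the figures are the approximate collection size and page size mentioned in this article.

```java
// Hypothetical helper: counts how many entries each paging strategy walks in total.
class SkipCost {
    // skip(n).limit(pageSize): walk past n entries, then read one page.
    static long walkedBySkipLimit(long total, long pageSize) {
        long walked = 0;
        for (long skipped = 0; skipped < total; skipped += pageSize) {
            walked += skipped + Math.min(pageSize, total - skipped);
        }
        return walked;
    }

    // Range query on an indexed key: every entry is visited once.
    static long walkedByRangeQuery(long total) {
        return total;
    }

    public static void main(String[] args) {
        long total = 10_000_000L;  // roughly the collection size in this article
        long pageSize = 100_000L;  // page size used in this article's code
        System.out.println(walkedBySkipLimit(total, pageSize)); // 505000000
        System.out.println(walkedByRangeQuery(total));          // 10000000
    }
}
```

Under this model, skip + limit walks about 50 times more entries than a range query for the same 100 pages, and the gap widens as the collection grows.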

Here is the code first:

    import com.google.common.base.Function;
    import com.google.common.collect.ArrayListMultimap;
    import com.google.common.collect.Multimap;
    import com.google.common.collect.Multimaps;
    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCursor;
    import com.mongodb.DBObject;
    import org.bson.types.ObjectId;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    /**
     * All access records earlier than the specified date, grouped by UUID.
     * @param date the specified date
     * @return map of all access records
     */
    public static Multimap<String, Map<String, String>> getOldVisitors(String date) {
        // Number of records per query
        int pageSize = 100000;
        // "_id" of the last record of the previous page
        String objectId = "";
        // Return value of the method; Google Guava is used here
        Multimap<String, Map<String, String>> mapLess = ArrayListMultimap.create();
        // Query criteria and field projections
        BasicDBObject queryLess = new BasicDBObject(),
                fields = new BasicDBObject(),
                field = new BasicDBObject();
        // Initialize the MongoDB collection object (dbCol is a DBCollection field
        // declared elsewhere in the class); a connection pool could be used here
        dbCol = init();
        // Query only the needed fields: the fewer fields, the faster the query
        field.put("UUID", 1);
        fields.put("UUID", 1);
        fields.put("Inittime", 1);
        // Condition: earlier than the specified date (TimeCond is the author's helper)
        String conditionLess = TimeCond.getTimeCondLess(date);
        queryLess.put("$where", conditionLess);
        DBCursor cursorLess = dbCol.find(queryLess, field);
        // Total number of records earlier than the specified date
        int countLess = cursorLess.count();
        // Number of full pages; the loop runs circleCountLess + 1 times
        int circleCountLess = countLess / pageSize;
        // Remainder: number of records fetched by the last loop
        int modLess = countLess % pageSize;
        // Start the paged traversal
        for (int i = 1; i <= circleCountLess + 1; i++) {
            // Document object
            DBObject obj = null;
            // Collect the cursor's results in a list, ready for Guava grouping
            List<Map<String, String>> listOfMaps = new ArrayList<Map<String, String>>();
            // If a last "_id" is known, add it as an extra condition; this is the key to paging
            if (!"".equals(objectId)) {
                // The "_id" was stored as a plain string, so rebuild the ObjectId from it
                ObjectId id = new ObjectId(objectId);
                queryLess.append("_id", new BasicDBObject("$gt", id));
            }
            if (i < circleCountLess + 1) {
                cursorLess = dbCol.find(queryLess, fields)
                        .sort(new BasicDBObject("_id", 1))
                        .limit(pageSize);
            } else if (i == circleCountLess + 1) {
                // Last loop
                cursorLess = dbCol.find(queryLess, fields).limit(modLess);
            }
            while (cursorLess.hasNext()) {
                obj = cursorLess.next();
                listOfMaps.add((Map<String, String>) obj);
            }
            // Remember the "_id" of the page's last record as the condition for the next loop
            if (null != obj) {
                objectId = obj.get("_id").toString();
            }
            // Group this page by UUID and add it to the overall result
            mapLess.putAll(Multimaps.index(listOfMaps,
                    new Function<Map<String, String>, String>() {
                        public String apply(final Map<String, String> from) {
                            return from.get("UUID");
                        }
                    }));
        }
        return mapLess;
    }
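Stripped of the MongoDB driver, the paging loop can be sketched against an in-memory sorted set. This is only an illustration under assumptions: the TreeSet stands in for a collection indexed on "_id", `nextPage` mimics a query of the form "_id greater than the last one seen, sorted ascending, limited to one page", and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// In-memory stand-in for a collection indexed on "_id"; no MongoDB driver involved.
class KeysetPaging {
    // Mimics: find ids greater than lastId, sorted ascending, at most pageSize of them.
    // A null lastId means "start from the beginning".
    static List<Long> nextPage(TreeSet<Long> ids, Long lastId, int pageSize) {
        NavigableSet<Long> candidates = (lastId == null) ? ids : ids.tailSet(lastId, false);
        List<Long> page = new ArrayList<>();
        for (Long id : candidates) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(id);
        }
        return page;
    }

    // Walks the whole set page by page, carrying the last id of each page forward.
    static List<List<Long>> allPages(TreeSet<Long> ids, int pageSize) {
        List<List<Long>> pages = new ArrayList<>();
        Long lastId = null;
        while (true) {
            List<Long> page = nextPage(ids, lastId, pageSize);
            if (page.isEmpty()) {
                break; // nothing left after lastId
            }
            pages.add(page);
            lastId = page.get(page.size() - 1);
        }
        return pages;
    }

    public static void main(String[] args) {
        TreeSet<Long> ids = new TreeSet<>();
        for (long id = 1; id <= 10; id++) {
            ids.add(id * 7); // arbitrary, non-contiguous ids
        }
        System.out.println(allPages(ids, 4)); // [[7, 14, 21, 28], [35, 42, 49, 56], [63, 70]]
    }
}
```

One design difference from the code above: this sketch stops when a page comes back empty, so it needs neither an up-front count nor the remainder calculation for the last page.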

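Why use the "_id" field as the paging condition? I did try other fields, such as a time field (time strings can also be compared for order), but they were far less efficient than "_id". A likely reason is that MongoDB always keeps an index on "_id", whereas other fields are only indexed if you create the index yourself.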

I had long overlooked the role of MongoDB's "_id", and the direct result was a lot of wasted time and energy: I went around in a big circle and ended up back where I started. It felt like the old verse about searching the crowd for someone a thousand times, then turning around to find them standing where the lights are dim.

MongoDB ObjectId

The 24-character string "4e7020cb7cac81af7136236b" looks long and hard to read, but it is simply a sequence of hexadecimal digits. Each byte takes two hexadecimal digits, so the value occupies 12 bytes of storage. That is considerably more than the 4 bytes of a MySQL INT, but on current storage devices the extra bytes should not be a bottleneck. The design reflects the idea of trading space for time. The official documentation describes the layout of an ObjectId as follows:

1) Time

Timestamp. The first 4 bytes of the ObjectId above are "4e7020cb"; converted from hexadecimal to decimal this gives 1315971275, which is a Unix timestamp in seconds. Converting the timestamp yields a readable creation time.
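The conversion can be checked with a few lines of plain Java. This is a minimal sketch that works on the hexadecimal string directly rather than through the MongoDB driver; the class and method names are hypothetical.

```java
import java.time.Instant;

// Hypothetical helper: reads the timestamp out of an ObjectId's hex string.
class ObjectIdTimestamp {
    // The leading 4 bytes (8 hex characters) are seconds since the Unix epoch.
    static long epochSeconds(String objectIdHex) {
        if (objectIdHex.length() != 24) {
            throw new IllegalArgumentException("expected 24 hex characters");
        }
        return Long.parseLong(objectIdHex.substring(0, 8), 16);
    }

    public static void main(String[] args) {
        long seconds = epochSeconds("4e7020cb7cac81af7136236b");
        System.out.println(seconds);                        // 1315971275
        System.out.println(Instant.ofEpochSecond(seconds)); // 2011-09-14T03:34:35Z
    }
}
```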

2) Machine

Machine identifier. The next three bytes, "7cac81", uniquely identify the host. They are generally a hash of the machine's hostname, which ensures that different hosts produce different values and that ObjectIds do not collide in a distributed setup. This is why the middle part of the string is identical for all ObjectIds generated on the same machine.

3) PID

Process ID. The machine field above keeps ObjectIds generated on different machines from colliding; the PID keeps ObjectIds generated by different MongoDB processes on the same machine from colliding. The next two bytes, "af71", identify the process that generated the ObjectId.

4) INC

Auto-incrementing counter. The first nine bytes ensure that ObjectIds generated within the same second by different processes on different machines do not collide. The last three bytes, "36236b", are an auto-incrementing counter that keeps ObjectIds generated by the same process within the same second from colliding. Three bytes allow 256^3 = 16,777,216 unique values per second.

In summary: the first 4 bytes of an ObjectId are a timestamp recording when the document was created; the next 3 bytes are the unique identifier of the host, so different hosts produce different ObjectIds; the next 2 bytes are the process ID, so different MongoDB processes on one host produce different ObjectIds; and the final 3 bytes are an auto-incrementing counter that guarantees uniqueness within the same second. This primary key generation strategy solves the uniqueness problem under high concurrency in a distributed environment and is well worth learning from. Note that more recent MongoDB releases replace the machine and process fields with a single 5-byte random value; the timestamp and counter remain.
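As a small illustration, the four fields can be sliced out of the hexadecimal string by position. The sketch assumes the 4-3-2-3 byte layout described in this section; the class name and field names are hypothetical and simply follow the headings above.

```java
// Hypothetical value class; assumes the 4-3-2-3 byte layout described in this section.
class ObjectIdFields {
    // A 3-byte counter can take 256^3 distinct values.
    static final int COUNTER_VALUES = 1 << 24;

    final long timestamp; // bytes 0-3: seconds since the Unix epoch
    final int machine;    // bytes 4-6: machine identifier
    final int pid;        // bytes 7-8: process identifier
    final int counter;    // bytes 9-11: auto-incrementing counter

    private ObjectIdFields(long timestamp, int machine, int pid, int counter) {
        this.timestamp = timestamp;
        this.machine = machine;
        this.pid = pid;
        this.counter = counter;
    }

    // Each byte is two hex characters, so the offsets below are byte offsets times two.
    static ObjectIdFields parse(String hex) {
        if (hex.length() != 24) {
            throw new IllegalArgumentException("expected 24 hex characters");
        }
        return new ObjectIdFields(
                Long.parseLong(hex.substring(0, 8), 16),
                Integer.parseInt(hex.substring(8, 14), 16),
                Integer.parseInt(hex.substring(14, 18), 16),
                Integer.parseInt(hex.substring(18, 24), 16));
    }

    public static void main(String[] args) {
        ObjectIdFields f = parse("4e7020cb7cac81af7136236b");
        // prints: time=1315971275 machine=7cac81 pid=af71 counter=36236b
        System.out.printf("time=%d machine=%06x pid=%04x counter=%06x%n",
                f.timestamp, f.machine, f.pid, f.counter);
        System.out.println(COUNTER_VALUES); // 16777216
    }
}
```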
