"Chat" Set up Database index by MONGODB __ Database

Source: Internet
Author: User
Tags: bulk insert, local time, mongodb, set time

Recently, in a project with my teacher, I have been working with two collections: air quality data (pm) and meteorological data (meteo). pm contains 15 million records and meteo contains 3 million.

The goal is to find all the data whose time attribute matches across the two collections.

For example, both pm and meteo have data for 2013-12-01 12:00:00: pm has 920 air quality records for this moment (because there are many air quality monitoring sites) and meteo has 350 meteorological records for it (because there are many weather stations). In that case, keep the data and store each side in its own new collection.

If at 2013-12-02 12:00:00 pm has air quality data but meteo has no weather data for that moment, then the pm data for that moment is useless, because we require that for the same time, data from both sides must be present.

Then I started on the MongoDB operations. The idea is fairly simple:

(1) Set the time format, so that strings can later be converted to Date objects.

// Set the time format and time zone
SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
df.setCalendar(new GregorianCalendar(new SimpleTimeZone(0, "GMT")));

Tips: MongoDB stores a Date as GMT (UTC) time. When a time string is parsed in Java without an explicit time zone, it is treated as local time (at first I did not know what this was based on; it is the JVM's default time zone, which SimpleDateFormat uses unless told otherwise), and it is converted to GMT on the way in. For example, for those of us in China: if the time in your original .txt file is 2013-12-01 10:00:00 and you parse it and write it to MongoDB directly from Java, 8 hours are subtracted and it is stored as 2013-12-01 02:00:00.

But my raw data was given to me in GMT to begin with, and I wanted to store it as GMT, not Beijing time. If another 8 hours were subtracted from times that are already GMT, the data would end up at the wrong time. So be sure to set the time zone properly.
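To make the 8-hour shift concrete, here is a minimal sketch that parses the same string once as GMT and once as Beijing time. The class name TimeZoneParsing and the helper parseIn are made up for this illustration, and it uses setTimeZone on the formatter, which is one common way to pin the zone; it does not touch MongoDB at all.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper class for illustration only.
class TimeZoneParsing {
    // Parse a "yyyy-MM-dd HH:mm:ss" string, interpreting it in the given time zone.
    static Date parseIn(String zoneId, String text) {
        SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        df.setTimeZone(TimeZone.getTimeZone(zoneId));
        try {
            return df.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("unparseable time: " + text, e);
        }
    }

    public static void main(String[] args) {
        String raw = "2013-12-01 10:00:00";
        Date asGmt = parseIn("GMT", raw);               // 10:00 GMT
        Date asBeijing = parseIn("Asia/Shanghai", raw); // 10:00 Beijing time = 02:00 GMT
        long hours = (asGmt.getTime() - asBeijing.getTime()) / 3_600_000L;
        System.out.println("difference in hours: " + hours); // prints "difference in hours: 8"
    }
}
```

The same text therefore lands on two different instants depending on the zone the formatter assumes, which is why the zone has to be set before parsing GMT data.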

(2) Set the start and end moments, startDate and endDate:

Date startDate = df.parse("2013-12-01 00:00:00");
Date endDate = df.parse("2015-12-31 23:00:00");

Tips: Times in MongoDB are best stored as Date objects; when our raw data was processed, the time was stored in MongoDB as a JavaScript Date() object. Also, although my variables here are named xxxDate, the times are actually accurate to the moment (date and time), not just the day.

(3) Set the loop variable eachDate:

Calendar eachDate = Calendar.getInstance();
eachDate.setTime(startDate);

The Calendar class is used here because it provides an add(int field, int amount) method that correctly increments the year, month, day, and hour.

For example, adding one day to 2013-12-31 with Calendar's add() method gives 2014-01-01, not 2013-12-32. Adding an hour works the same way: adding one hour to 2013-12-31 23:00:00 gives 2014-01-01 00:00:00, which is very convenient.

(4) Loop from the start date:

Use a while loop: as long as eachDate is before endDate, do the detailed lookup and matching (the detailed operations are not covered here; they are discussed below).
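The rollover can be checked with a small standalone sketch. The class name CalendarRollover and the helper addAndFormat are hypothetical, and the calendar is pinned to GMT so the result does not depend on the machine's local zone.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical helper class for illustration only.
class CalendarRollover {
    // Parse text as GMT, add `amount` units of the given Calendar field, and format the result as GMT.
    static String addAndFormat(String text, int field, int amount) {
        SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        df.setTimeZone(TimeZone.getTimeZone("GMT"));
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("GMT"));
        try {
            cal.setTime(df.parse(text));
        } catch (ParseException e) {
            throw new IllegalArgumentException("unparseable time: " + text, e);
        }
        cal.add(field, amount); // add() carries over into day, month, and year as needed
        return df.format(cal.getTime());
    }

    public static void main(String[] args) {
        // One day after the last day of 2013
        System.out.println(addAndFormat("2013-12-31 00:00:00", Calendar.DAY_OF_MONTH, 1)); // 2014-01-01 00:00:00
        // One hour after the last hour of 2013
        System.out.println(addAndFormat("2013-12-31 23:00:00", Calendar.HOUR_OF_DAY, 1));  // 2014-01-01 00:00:00
    }
}
```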

while (eachDate.getTime().before(endDate)) {
    eachDate.add(Calendar.HOUR, 1); // advance one hour per iteration
    // note: the increment comes first, so the first time processed is startDate + 1 hour
    Date date = eachDate.getTime();

    // The detailed matching operation is discussed below; the statements above only show how to loop over time.
    // If you understand my problem you may have ideas of your own. I would be glad to discuss them; I am still a novice.
}

Tips: eachDate is a Calendar object, so call its getTime() method to get a Date object, then use the Date object's before() or after() method as the loop condition.

(5) Now for the important part: the statements inside the while loop.

My initial idea was very simple: with the eachDate loop set up, use each time value directly to query the pm and meteo collections for all documents at that time.
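As a self-contained sketch of this kind of loop, the variant below collects every whole-hour instant from the start through the end, both inclusive. It is a slight rearrangement rather than the exact loop above: it uses the current instant before incrementing and tests with !after(end), so the start instant itself is visited. The class name HourlyLoop and the method hourlyInstants are invented for the example.

```java
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.List;
import java.util.TimeZone;

// Hypothetical helper class for illustration only.
class HourlyLoop {
    // Return every instant from start through end (both inclusive), stepping one hour at a time.
    static List<Date> hourlyInstants(Date start, Date end) {
        List<Date> instants = new ArrayList<>();
        Calendar each = new GregorianCalendar(TimeZone.getTimeZone("GMT"));
        each.setTime(start);
        while (!each.getTime().after(end)) {
            instants.add(each.getTime());      // use the current instant first...
            each.add(Calendar.HOUR_OF_DAY, 1); // ...then step forward one hour
        }
        return instants;
    }

    public static void main(String[] args) {
        Date start = new Date(0L);            // arbitrary example start: 1970-01-01 00:00:00 GMT
        Date end = new Date(5L * 3_600_000L); // five hours later
        System.out.println(hourlyInstants(start, end).size()); // prints 6
    }
}
```

Inside the loop body, each instant would be the value passed to the per-hour query.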

/* Set up the MongoDB cursors (required) */
BasicDBObject queryObject = new BasicDBObject();
queryObject.put("Time", date);
DBCursor queryCursorMeteo = dbMeteoCollection.find(queryObject);
DBCursor queryCursorPm = dbPmCollection.find(queryObject);

/* If both collections contain this moment, insert the matching documents into their respective new collections */
if (queryCursorMeteo.hasNext() && queryCursorPm.hasNext()) { // does this moment exist on both sides?
    while (queryCursorMeteo.hasNext()) {
        dbMeteoSameTimeCollection.insert(queryCursorMeteo.next()); // meteo_data_same_time collection in the meteo database
    }
    while (queryCursorPm.hasNext()) {
        dbPmSameTimeCollection.insert(queryCursorPm.next()); // pm_data_same_time collection in the pmdata database
    }
}

My idea was naive: query both collections by eachDate, and if both return results (the if check above), use while loops to insert all the documents found into the new collections. I did no other processing: no index, no sort, no bulk insert, nothing at all.

The price of such a simple idea is that it takes a lot of time. With two collections of 15 million and 3 million documents, this code ran on my computer for 14 hours. That was a silly way to do it.

(6) OK, time to start optimizing the code.

Two ideas:

1. Building on (5), add an index on the time field. The cost of an index is that inserts into the indexed collection take longer, but I do not insert into the original collections; I insert into new ones. My only operation on the original collections is querying, so the index can save a great deal of time. Then, when inserting into the new collections, do not insert one document at a time in a while loop; use a bulk insert instead.

2. Sort both original collections by time and keep two position variables, indexPm and i, pointing into pm and meteo respectively. The outer loop uses i, the pointer into meteo, which has less data. Compare the pm time with the meteo time; if they are equal, move the pm pointer forward by one. The pseudocode is as follows:

// Sort both collections by time first; MongoDB's sort can do this.
// "collection climate" below is the meteo collection.
int indexPm = 0;

for (int i = 0; i < dateLength; i++) {
  while (date at indexPm of collection pm == date at i of collection climate) {
    indexPm++; // move the pm cursor forward by one
  }
  while (date at indexPm of collection pm != date at i+1 of collection climate) {
    delete current data of pm;
    indexPm++;
  }
}
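The second idea can be sketched as a runnable two-pointer pass over two sorted sequences. This is only an illustration under assumptions, not the code I ran: plain long arrays stand in for the time values of the sorted pm and meteo query results, the names SortedTimeMatch and commonTimes are made up, and the sketch goes slightly beyond the pseudocode by adding bounds checks and by also skipping meteo times that have no pm data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only.
class SortedTimeMatch {
    // Both arrays must be sorted ascending. Duplicates are allowed, because many
    // stations report at the same time. Returns each time present on both sides once.
    static List<Long> commonTimes(long[] pm, long[] meteo) {
        List<Long> common = new ArrayList<>();
        int i = 0; // position in pm
        int j = 0; // position in meteo
        while (i < pm.length && j < meteo.length) {
            if (pm[i] < meteo[j]) {
                i++; // pm time has no meteo counterpart: skip it
            } else if (pm[i] > meteo[j]) {
                j++; // meteo time has no pm counterpart: skip it
            } else {
                long t = pm[i];
                common.add(t);
                while (i < pm.length && pm[i] == t) i++;       // skip the rest of this time in pm
                while (j < meteo.length && meteo[j] == t) j++; // skip the rest of this time in meteo
            }
        }
        return common;
    }

    public static void main(String[] args) {
        long[] pm = {1, 1, 2, 4, 4, 5};
        long[] meteo = {1, 3, 4, 4, 4, 6};
        System.out.println(commonTimes(pm, meteo)); // prints [1, 4]
    }
}
```

Each side is read once from front to back, so the cost of the matching pass grows with the total number of records rather than with one query per hour.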