At the beginning of September, MongoDB officially released rmongodb (http://www.aliyun.com/zixun/aggregation/13461.html), an R driver, which means that a language built for numerical computation can now also work with this NoSQL product. However, since no company around me has actually used R and MongoDB together in production, I did not dare take the efficiency question lightly, so I ran the following test.
The test environment is an 8-core, 64-bit machine. The collection used for testing is unsharded, about 30 GB, and stores data such as user preferences and tag information.
library(rmongodb)

mongo <- mongo.create()
if (mongo.is.connected(mongo)) {
    ns <- "rivendell.user"

    print("Query on an unindexed field, one record")
    print(system.time(p <- mongo.find.one(mongo, ns, list(friend = 600L))))

    print("Query on an unindexed field, multiple records, without buffer")
    print(system.time(p <- mongo.find(mongo, ns, list(friend = 600L))))

    print("Check for a caching strategy")
    print(system.time(p <- mongo.find(mongo, ns, list(friend = 600L))))

    print("Query on an unindexed field, multiple records, with buffer")
    buf <- mongo.bson.buffer.create()
    mongo.bson.buffer.append(buf, "friend", 600L)
    query <- mongo.bson.from.buffer(buf)
    print(system.time(p <- mongo.find(mongo, ns, query)))

    print("Check for a caching strategy")
    buf <- mongo.bson.buffer.create()
    mongo.bson.buffer.append(buf, "friend", 600L)
    query <- mongo.bson.from.buffer(buf)
    print(system.time(p <- mongo.find(mongo, ns, query)))

    print("Greater-than query, one record")
    print(system.time(p <- mongo.find.one(mongo, ns, list(friend = list("$gt" = 600L)))))

    print("Greater-than query, multiple records")
    print(system.time(cursor <- mongo.find(mongo, ns, list(friend = list("$gt" = 600L)))))
    mongo.cursor.destroy(cursor)

    print("Query on an indexed field, one record")
    print(system.time(p <- mongo.find.one(mongo, ns, list("_id" = 3831809L))))

    print("Query on an indexed field, multiple records")
    print(system.time(p <- mongo.find(mongo, ns, list("_id" = 3831809L))))

    print("Insert one record")
    buf <- mongo.bson.buffer.create()
    mongo.bson.buffer.append(buf, "name", "huangxin")
    mongo.bson.buffer.append(buf, "age", 22L)
    p <- mongo.bson.from.buffer(buf)
    print(system.time(mongo.insert(mongo, ns, p)))

    print("Find the record just inserted")
    print(system.time(p <- mongo.find.one(mongo, ns, list("name" = "huangxin"))))
    if (!is.null(p)) {
        print("success")
    }

    print("Bulk insert")
    buf <- mongo.bson.buffer.create()
    mongo.bson.buffer.append(buf, "name", "huangxin")
    mongo.bson.buffer.append(buf, "age", 22L)
    p1 <- mongo.bson.from.buffer(buf)
    buf <- mongo.bson.buffer.create()
    mongo.bson.buffer.append(buf, "name", "huangxin")
    mongo.bson.buffer.append(buf, "age", 22L)
    p2 <- mongo.bson.from.buffer(buf)
    buf <- mongo.bson.buffer.create()
    mongo.bson.buffer.append(buf, "name", "huangxin")
    mongo.bson.buffer.append(buf, "age", 22L)
    p3 <- mongo.bson.from.buffer(buf)
    print(system.time(mongo.insert.batch(mongo, ns, list(p1, p2, p3))))

    print("Find the records just bulk-inserted")
    print(system.time(cursor <- mongo.find(mongo, ns, list("name" = "huangxin"))))
    i <- 0
    while (mongo.cursor.next(cursor)) {
        i <- i + 1
    }
    print(i)

    print("Batch update")
    print(system.time(mongo.update(mongo, ns, list(name = "huangxin"), list("name" = "kym"))))

    print("Check whether the update succeeded")
    print(system.time(p <- mongo.find.one(mongo, ns, list("name" = "kym"))))
    if (!is.null(p)) {
        print("success")
    }

    print("Bulk delete")
    print(system.time(mongo.remove(mongo, ns, list(name = "kym"))))
}

print(system.time(p <- mongo.find.one(mongo, ns, list("name" = "kym"))))
if (!is.null(p)) {
    print("success")
}
[1] "Query on an unindexed field, one record"
   user  system elapsed
  0.000   0.000   0.115
[1] "Query on an unindexed field, multiple records, without buffer"
   user  system elapsed
  0.000   0.000  32.513
[1] "Check for a caching strategy"
   user  system elapsed
  0.000   0.000  32.528
[1] "Query on an unindexed field, multiple records, with buffer"
   user  system elapsed
  0.000   0.000  32.685
[1] "Check for a caching strategy"
   user  system elapsed
  0.000   0.000  33.172
[1] "Greater-than query, one record"
   user  system elapsed
  0.000   0.000   0.001
[1] "Greater-than query, multiple records"
   user  system elapsed
  0.000   0.000   0.014
[1] "Query on an indexed field, one record"
   user  system elapsed
      0       0       0
[1] "Query on an indexed field, multiple records"
   user  system elapsed
      0       0       0
[1] "Insert one record"
   user  system elapsed
      0       0       0
[1] "Find the record just inserted"
   user  system elapsed
   0.00    0.00   35.42
[1] "success"
[1] "Bulk insert"
   user  system elapsed
      0       0       0
[1] "Find the records just bulk-inserted"
   user  system elapsed
  0.004   0.000  35.934
[1] 7
[1] "Batch update"
   user  system elapsed
  0.000   0.004   0.000
[1] "Check whether the update succeeded"
   user  system elapsed
  0.000   0.000  67.773
[1] "success"
[1] "Bulk delete"
   user  system elapsed
      0       0       0
   user  system elapsed
  0.000   0.000  91.396
What I could not understand at first was why the greater-than query and the equality query differed by so much. Later, when I ran the same test with Python, I found that Python's efficiency was about the same, which shows this is not a MongoDB problem; I do not believe that, at the database level, the driver for one language could make that much difference.
Later I discovered a difference between the Python and R MongoDB drivers. Python's find does not pull the whole matching result set back; it returns a cursor, so the find call itself consumes essentially no time. The query is actually executed only when you iterate, for example with while cursor.next().
R is not the same: R first considers the size of the result set (or some similar heuristic) and then, depending on the case, either returns a cursor or pulls the whole result set back. If we include the earlier while (mongo.cursor.next(cursor)) loop in the timing, the efficiency difference between the greater-than and equality operations is no longer obvious.
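This lazy-versus-eager difference can be sketched in Python without any MongoDB server at all. Here fake_query_results is a made-up generator standing in for a server-side cursor; it is not part of any real driver, just an illustration of why a lazy find returns instantly while the cost is paid during iteration:

```python
import time

def fake_query_results(n, delay_per_doc=0.001):
    """Stand-in for a server-side query: streams n matching documents."""
    for i in range(n):
        time.sleep(delay_per_doc)   # simulated per-document fetch cost
        yield {"_id": i, "friend": 600}

# Python-driver style: "find" only builds a cursor, so it returns immediately.
start = time.time()
cursor = fake_query_results(100)
lazy_find_time = time.time() - start

# The real work happens only while iterating (the while cursor.next() loop).
start = time.time()
count = sum(1 for _ in cursor)
iterate_time = time.time() - start

# Whole-result-set style: everything is materialised inside "find" itself.
start = time.time()
docs = list(fake_query_results(100))
eager_find_time = time.time() - start

print(count)  # 100 documents either way; only *when* the cost is paid differs
```

Both styles end up touching the same 100 documents; timing only the find call makes the lazy driver look dramatically faster, which is exactly the trap in the benchmark above.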
In practice, bulk insert is a very common application scenario, but for languages like R or Matlab, loop efficiency has always been a weak point. So next I will try using the apply family to replace R's loops; if that turns out to be feasible in practice, it will then be worth trying parallel-computing libraries to take full advantage of multiple cores.
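The idea behind that plan, building the whole batch functionally and then handing it to a single bulk-insert call, can be sketched in Python. The helper names and document fields below are invented for illustration, and no MongoDB connection is involved; the comprehension plays the role of R's lapply feeding mongo.insert.batch:

```python
# Loop style: build documents one at a time, as a naive for-loop would.
def build_docs_loop(n):
    docs = []
    for i in range(n):
        docs.append({"name": "huangxin", "age": 22, "seq": i})
    return docs

# Apply/comprehension style: the whole batch in one expression, analogous
# to lapply() in R producing the list passed to mongo.insert.batch.
def build_docs_apply(n):
    return [{"name": "huangxin", "age": 22, "seq": i} for i in range(n)]

batch = build_docs_apply(1000)
print(len(batch))                      # 1000 documents, ready for one bulk insert
print(batch == build_docs_loop(1000))  # True: both styles build the same batch
```

Either way the network round-trip happens once for the whole batch; the apply style just moves the per-document construction out of an explicit interpreted loop.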
Original link: http://www.cnblogs.com/kym/archive/2011/09/26/2191501.html