Many blog posts describe how to fetch a single random record from a MongoDB database, usually with one of the following three methods:
1. Skip a random number of records.
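// random is assumed to be a java.util.Random instance
DBCursor cursor = coll.find(query);
// nextInt throws if count() is 0, so guard against an empty result in real code
int rint = random.nextInt(cursor.count());
cursor.skip(rint);
DBObject word = null;
if (cursor.hasNext()) {
    word = cursor.next();
}
cursor.close(); // close the cursor whether or not a record was found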
Many people say this is not recommended, but if the amount of data is not that large, say up to about 100,000 records, this method is perfectly usable.
In fact, if I were writing the storage engine, I would make sure this method was the fastest. It only queries; it does not sort. If the fields used in the query are indexed, it should be at least as fast as method 2, because physically skipping along the index makes it easy to jump straight to the record you are looking for.
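The skip approach can also be sketched in shell-style JavaScript. This is only a sketch under assumptions: `randomBySkip` is a hypothetical helper name, and `coll` is assumed to expose the synchronous mongosh-style methods `countDocuments(query)` and `find(query).skip(n).limit(n).toArray()`. Unlike the snippet above, it checks for an empty result before picking an offset.

```javascript
// Sketch: pick one random document by skipping a random offset.
// randomBySkip is a hypothetical helper; coll is assumed to behave like a
// mongosh collection (synchronous countDocuments / find / skip / limit / toArray).
function randomBySkip(coll, query, rand = Math.random) {
  const total = coll.countDocuments(query);
  if (total === 0) {
    return null; // nothing matches, so there is nothing to skip to
  }
  // Uniform integer offset in [0, total).
  const offset = Math.floor(rand() * total);
  const docs = coll.find(query).skip(offset).limit(1).toArray();
  // The result can still be empty if documents were deleted after the count.
  return docs.length > 0 ? docs[0] : null;
}
```

In a shell session this would presumably be called as `randomBySkip(db.user, {})`; the `rand` parameter exists only so the offset can be fixed when experimenting.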
2. Add a random numeric field.
var random = Math.random();
var result = db.user.findOne({ "random": { "$lt": random } });
if (result == null) {
    result = db.user.findOne({ "random": { "$gte": random } });
}
Many people say this is the more efficient way, but if you believe that and use it without verifying it, you are walking straight into a trap! Let's analyze it.
findOne returns the first document of the query result. Is there anything wrong with that?
Are the internal query results sorted by the random field? That question is not so easy to answer! Experiments show that if the field is not indexed, the results come back in storage order. That is, whether the condition is less-than or greater-than-or-equal, you get the earliest-stored record that meets the criteria. This greatly reduces the randomness.
If the field is indexed, the results come back sorted by random in ascending order, and a less-than condition then always returns the record with the lowest random value. There is no randomness left at all, is there? In other words, with ascending order you need the greater-than condition, and with descending order you need the less-than condition.
But do you know how the results are ordered internally when you do not specify a sort? Are you sure? If you are not sure, you have to specify the sort explicitly. Working from the bottom up, specify a sort order that is consistent with the physical order of the index to get maximum efficiency.
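The bias is easy to reproduce without a database. The following plain-JavaScript simulation is only an illustration: the array `docs` stands in for an index scan in ascending `random` order, and the function names are hypothetical. Taking the first match of a less-than condition always yields the smallest value, while the greater-than-or-equal condition yields the record just above the drawn number.

```javascript
// Simulation only; no MongoDB involved.
// docs is kept sorted by random ascending, the order an index scan is assumed to use.
const docs = [0.1, 0.5, 0.9].map((random, i) => ({ _id: i, random }));

// Mimics findOne: return the first document that satisfies the condition.
function firstMatch(sortedDocs, predicate) {
  for (const doc of sortedDocs) {
    if (predicate(doc)) return doc;
  }
  return null;
}

// "random < r" scanned in ascending order: always hits the minimum.
function findOneLtAscending(sortedDocs, r) {
  return firstMatch(sortedDocs, (d) => d.random < r);
}

// "random >= r" scanned in ascending order: hits the record just above r.
function findOneGteAscending(sortedDocs, r) {
  return firstMatch(sortedDocs, (d) => d.random >= r);
}

console.log(findOneLtAscending(docs, 0.6).random);  // 0.1
console.log(findOneLtAscending(docs, 0.95).random); // 0.1 again
console.log(findOneGteAscending(docs, 0.6).random); // 0.9
console.log(findOneGteAscending(docs, 0.2).random); // 0.5
```

In the shell, the explicit form of the second variant would presumably be `db.user.find({ random: { $gte: r } }).sort({ random: 1 }).limit(1)`, which states the sort order instead of relying on whatever order the engine happens to use.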
3. Add a random spatial location field.
// ensureIndex is the older name; newer shells use createIndex
db.coll.ensureIndex({ random: "2d" });
result = db.coll.findOne({ random: { $near: [Math.random(), 0] } });
Make random a multi-valued field holding two values, and index it as a location. At query time, generate a random coordinate and fetch the document whose point is nearest to it.
Many people list this among the recommended methods, but is it really efficient? It is more trouble to set up, so many people have probably never verified it and simply copied it to their own blogs. If you think it through, this method is even less reliable than the second one. Try to design the algorithm yourself: find the coordinate closest to a given coordinate in a set of coordinates. Coordinate points are awkward to deal with. They have no size relationship, which means they have no natural order. And all those impressive search algorithms are sort-based (hash-based ones are, in essence, sort-based too).
So what can be done?
Approach one: Build two indexes, one on each component. Then construct a small rectangular region centered on the query point, and query with a range condition on each of the two components. If nothing is found, enlarge the rectangle; if too many results come back, shrink it. Binary search can be used here. The exact distance is then computed only for the points inside the rectangle.
Approach two: Cluster the points by distance and store the cluster centers. Use approach one to find the nearest cluster center, then use approach one again within that cluster.
Approach three: Find the bounding rectangle of all the points, then build a quadtree index by dividing the region into four blocks. Construction splits a region into four equal parts and then splits each part into four again, until the number of points in each region is below a given threshold. At query time this determines which region to search at each level. At the last level, distances are computed by brute force.
I can't think of a cleverer method than these!
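The first approach, the growing rectangle, can be sketched in plain JavaScript. This is a simplified illustration under assumptions, not how MongoDB implements `$near`: the function names are hypothetical, an array filter stands in for the two indexed range conditions, and the sketch only enlarges the box (it does not shrink it when there are many hits). One detail is easy to miss: a point in a corner of the box can be farther away than a point just outside it, so the sketch queries once more with a box whose half-width is the best distance found so far.

```javascript
// Stand-in for two indexed range conditions, one per component.
function inBox(points, q, h) {
  return points.filter(
    (p) => Math.abs(p.x - q.x) <= h && Math.abs(p.y - q.y) <= h
  );
}

function dist(a, b) {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

// Brute-force nearest point among a small candidate list.
function closest(cands, q) {
  return cands.reduce(
    (best, p) => (best === null || dist(p, q) < dist(best, q) ? p : best),
    null
  );
}

// startHalfWidth must be greater than 0, otherwise the box never grows.
function nearestByBox(points, q, startHalfWidth = 0.01) {
  if (points.length === 0) return null;
  let h = startHalfWidth;
  let cands = inBox(points, q, h);
  while (cands.length === 0) {
    h *= 2; // nothing found: enlarge the rectangle
    cands = inBox(points, q, h);
  }
  // Every point closer than the current best lies inside a box of
  // half-width d, so one more range query makes the answer exact.
  const d = dist(closest(cands, q), q);
  return closest(inBox(points, q, Math.max(h, d)), q);
}
```

For example, with the query point at the origin and the points (0.9, 0.9) and (1.1, 0), a box of half-width 1 contains only the first point, but the second query with the larger box finds that (1.1, 0) is actually nearer.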
Finding the nearest point in space is not easy!
The correct way to query a random record in MongoDB!