Misunderstandings about ObjectId in MongoDB and the problems they cause

Source: Internet
Author: User
Tags mongodb driver mongodb query

While recently reworking two applications, we ran into a series of problems when taking them online, some of which came from misunderstanding ObjectId.

First, let's look at how an ObjectId is put together:

TimeStamp: The first 4 bytes are a UNIX timestamp, stored as an int. Taking the first 4 bytes of the example ObjectId gives "4DF2DCEC"; converting that from hexadecimal to decimal gives 1307761900. This number is a timestamp. To make it more readable, we convert it into a familiar time format (accurate to the second):
$ date -d '1970-01-01 UTC 1307761900 sec' -u
Saturday, June 11, 2011 03:11:40 UTC
The first 4 bytes therefore record when the document was created. Because the timestamp comes first, ObjectIds sort roughly in insertion order, which can be useful, for example when the ObjectId serves as an index. Another benefit is that some client drivers can work out when a record was inserted from its ObjectId alone. This also explains why the leading digits rarely change when you create several ObjectIds in quick succession: they all use the current time. Many users worry about clock synchronization between servers, but the actual value of the timestamp is not important as long as it keeps increasing.
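
The hex-to-time conversion described above can also be done in code. This is a minimal sketch using only the Java standard library; the class and method names are made up for illustration and are not part of any MongoDB driver.

```java
import java.time.Instant;

class ObjectIdTimestamp {
    // Reads the first 8 hex characters (4 bytes) as seconds since the UNIX epoch.
    static long epochSeconds(String objectIdHex) {
        return Long.parseLong(objectIdHex.substring(0, 8), 16);
    }

    // Converts the embedded timestamp into an Instant (always UTC).
    static Instant createdAt(String objectIdHex) {
        return Instant.ofEpochSecond(epochSeconds(objectIdHex));
    }

    public static void main(String[] args) {
        String prefix = "4DF2DCEC"; // timestamp bytes quoted in the text
        System.out.println(epochSeconds(prefix)); // 1307761900
        System.out.println(createdAt(prefix));    // 2011-06-11T03:11:40Z
    }
}
```

Only the first 8 characters are read, so the same methods work on a full 24-character ObjectId string.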
Machine: The next three bytes, 2cdcd2, are a unique identifier for the host, usually a hash of the machine's hostname. Different hosts produce different hash values, which prevents conflicts in a distributed setup. This is why ObjectIds generated on the same machine share the same characters in the middle.

PID: The machine identifier above keeps ObjectIds generated on different machines from conflicting; the PID does the same for different processes on the same machine. The next two bytes, 0936, are the identifier of the process that produced the ObjectId.

Increment: The first nine bytes guarantee that ObjectIds generated in the same second by different processes on different machines do not conflict. The last three bytes, a8b817, are an auto-incrementing counter that keeps ObjectIds generated in the same second by the same process from conflicting. Three bytes allow 256^3 = 16,777,216 unique ObjectIds per process per second.
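
To make the four fields concrete, here is a rough sketch that splits a 24-character hex string along the byte boundaries described above. The class name is hypothetical, and the sample string is assembled from the field values quoted in the text; the two process-identifier bytes (0936) in particular are an assumption.

```java
class ObjectIdFields {
    final long timestamp;  // bytes 0-3: seconds since the UNIX epoch
    final String machine;  // bytes 4-6: machine identifier
    final String pid;      // bytes 7-8: process identifier
    final int counter;     // bytes 9-11: per-process counter

    ObjectIdFields(String hex) {
        if (hex.length() != 24) {
            throw new IllegalArgumentException("expected 24 hex characters");
        }
        timestamp = Long.parseLong(hex.substring(0, 8), 16);
        machine = hex.substring(8, 14);
        pid = hex.substring(14, 18);
        counter = Integer.parseInt(hex.substring(18, 24), 16);
    }

    // Number of distinct values a 3-byte counter can take: 256^3.
    static int counterSpace() {
        return 256 * 256 * 256;
    }

    public static void main(String[] args) {
        // Sample value assumed for illustration (see the note above).
        ObjectIdFields f = new ObjectIdFields("4df2dcec2cdcd20936a8b817");
        System.out.println(f.timestamp + " " + f.machine + " " + f.pid + " " + f.counter);
        System.out.println(counterSpace()); // 16777216
    }
}
```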
ObjectId uniqueness: You may feel that, to some extent, uniqueness is guaranteed on both the client side and the server side.

Misunderstanding one: is document order consistent with insert order?
In the single-threaded case, the timestamp, machine, PID, and counter fields keep ObjectIds unique and increasing, because everything runs in one process on one machine. The problem appears with multiple threads: when threads A, B, C, ... insert at the same time, there is no guarantee which one is stored first, so the documents end up out of order.
With multiple threads spread over several machines or processes, the machine and PID fields differ between writers as well, so the data is even more out of order.
Workaround: because documents in a collection (including a capped collection) are not guaranteed to be in ObjectId order, the simplest approach is to sort by ObjectId. There are two ways to do this.
1. A MongoDB query
    Query query = new Query();
    if (id != null) {
        // only fetch documents after the last _id already seen
        query.addCriteria(Criteria.where("_id").gt(id));
    }
    query.with(new Sort(Sort.Direction.ASC, "_id"));


2. java.util.PriorityQueue
    Comparator<DBObject> comparator = new Comparator<DBObject>() {
        @Override
        public int compare(DBObject o1, DBObject o2) {
            return ((ObjectId) o1.get("_id")).compareTo((ObjectId) o2.get("_id"));
        }
    };
    PriorityQueue<DBObject> queue = new PriorityQueue<DBObject>(200, comparator);
Misunderstanding two: with many clients and high concurrency, can the order be guaranteed (after sorting)?
If every write is guaranteed to happen well before the read (more than one second earlier), the results are never out of order. Otherwise they can be.
Let's look at an example.
Suppose the data is fetched in two batches.

First fetch:
4df2dcec aaaa ffff 36a8b813
4df2dcec aaaa eeee 36a8b813
4df2dcec bbbb 1111 36a8b814

Second fetch:
4df2dcec bbbb 1111 36a8b813
4df2dcec aaaa ffff 36a8b814
4df2dcec aaaa eeee 36a8b814

If the largest value from the first fetch (4df2dcec bbbb 1111 36a8b814) is then used as the starting point for the next query, all three records in the second batch are missed, because 4df2dcec bbbb 1111 36a8b814 is larger than every one of them. This results in data loss.
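
The scenario can be replayed without a database. The sketch below is a simplified simulation, not driver code: ids are plain lowercase hex strings (the example values with the spaces removed), and the hypothetical fetchAfter method stands in for a query of the form "_id greater than the last one seen, sorted ascending". Equal-length lowercase hex strings compare in the same order as the bytes they encode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CursorGapDemo {
    // Records visible at the time of the first fetch.
    static final List<String> FIRST = Arrays.asList(
            "4df2dcecaaaaffff36a8b813",
            "4df2dcecaaaaeeee36a8b813",
            "4df2dcecbbbb111136a8b814");

    // Records written in the same second that arrive after the first fetch.
    static final List<String> SECOND = Arrays.asList(
            "4df2dcecbbbb111136a8b813",
            "4df2dcecaaaaffff36a8b814",
            "4df2dcecaaaaeeee36a8b814");

    // Returns ids strictly greater than lastSeen (all ids if lastSeen is null), ascending.
    static List<String> fetchAfter(List<String> stored, String lastSeen) {
        List<String> out = new ArrayList<>();
        for (String id : stored) {
            if (lastSeen == null || id.compareTo(lastSeen) > 0) {
                out.add(id);
            }
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        List<String> firstBatch = fetchAfter(FIRST, null);
        String lastSeen = firstBatch.get(firstBatch.size() - 1);
        List<String> secondBatch = fetchAfter(SECOND, lastSeen);
        System.out.println("cursor after first fetch: " + lastSeen);
        System.out.println("records returned by second fetch: " + secondBatch.size()); // 0
    }
}
```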
Workaround: the ObjectId timestamp only has one-second resolution, and the bytes between the timestamp and the counter are the machine and process identifiers, so the order of ids within the same second depends on which machine and process generated them. There are two options.
1. Only process records older than a certain interval (more than one second). The machine and process identifiers can only scramble the order within the same second, so records from before the interval are no longer out of order.
2. Single-point insertion: route the inserts that were previously spread over several nodes through a single node, so that the machine and process identifiers are always the same and the counter keeps the records in order.
We used the first method.
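
One way to picture the first method is to compute an upper bound for each read: the smallest possible id for the second that lies a fixed interval before the current time. The sketch below assumes ids are handled as lowercase hex strings and that a read only takes ids below that bound; the class name, method names, and the two-second interval are illustrative choices, not taken from the original code.

```java
class SettledWindow {
    // Smallest possible 24-character id for a given second:
    // the 4-byte timestamp followed by 8 zero bytes.
    static String lowerBoundFor(long epochSeconds) {
        return String.format("%08x", epochSeconds) + "0000000000000000";
    }

    // Exclusive upper bound for a read: ids at or above it are too recent to trust.
    static String cutoff(long nowEpochSeconds, long settleSeconds) {
        return lowerBoundFor(nowEpochSeconds - settleSeconds);
    }

    // True if the id was created in a second that lies before the cutoff second.
    static boolean isSettled(String idHex, long nowEpochSeconds, long settleSeconds) {
        return idHex.compareTo(cutoff(nowEpochSeconds, settleSeconds)) < 0;
    }

    public static void main(String[] args) {
        String id = "4df2dcecaaaaffff36a8b813"; // created in second 1307761900
        System.out.println(isSettled(id, 1307761902L, 2L)); // false
        System.out.println(isSettled(id, 1307761903L, 2L)); // true
    }
}
```

A reader would then ask for ids greater than the last one processed and less than the cutoff, and use the largest returned id as the next starting point.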

Misunderstanding three: if _id is not set on the DBObject, does the MongoDB server set the ObjectId?
When you insert a new BasicDBObject(), the _id is not filled in unless you set it manually. So does the server set it? Let's look at the code for the insert operation.
Implementation class:
    public WriteResult insert(List<DBObject> list, com.mongodb.WriteConcern concern, DBEncoder encoder) {
        if (concern == null) {
            throw new IllegalArgumentException("Write concern can not be null");
        }
        return insert(list, true, concern, encoder);
    }

You can see that the overload below is called with shouldApply set to true, so by default the _id is added:
    protected WriteResult insert(List<DBObject> list, boolean shouldApply, com.mongodb.WriteConcern concern,
            DBEncoder encoder) {
        if (encoder == null)
            encoder = DefaultDBEncoder.FACTORY.create();

        if (willTrace()) {
            for (DBObject o : list) {
                trace("save:  " + _fullNameSpace + " " + JSON.serialize(o));
            }
        }

        if (shouldApply) {
            for (DBObject o : list) {
                apply(o);
                _checkObject(o, false, false);
                Object id = o.get("_id");
                if (id instanceof ObjectId) {
                    ((ObjectId) id).notNew();
                }
            }
        }

        WriteResult last = null;

        int cur = 0;
        int maxsize = _mongo.getMaxBsonObjectSize();
        while (cur < list.size()) {
            OutMessage om = OutMessage.insert(this, encoder, concern);

            for (; cur < list.size(); cur++) {
                DBObject o = list.get(cur);
                om.putObject(o);

                // limit for batch inserts is 4 x maxbson on server, use 2 x to be safe
                if (om.size() > 2 * maxsize) {
                    cur++;
                    break;
                }
            }

            last = _connector.say(_db, om, concern);
        }

        return last;
    }
The code that adds the ObjectId automatically:
    /**
     * calls {@link DBCollection#apply(com.mongodb.DBObject, boolean)} with ensureID=true
     * @param o <code>DBObject</code> to which to add fields
     * @return the modified parameter object
     */
    public Object apply(DBObject o) {
        return apply(o, true);
    }

    /**
     * calls {@link DBCollection#doapply(com.mongodb.DBObject)}, optionally adding an automatic _id field
     * @param jo object to add fields to
     * @param ensureID whether to add an <code>_id</code> field
     * @return the modified object <code>o</code>
     */
    public Object apply(DBObject jo, boolean ensureID) {
        Object id = jo.get("_id");
        if (ensureID && id == null) {
            id = ObjectId.get();
            jo.put("_id", id);
        }

        doapply(jo);

        return id;
    }
As you can see, the ObjectId is added automatically by the MongoDB driver package. The save method:
    public WriteResult save(DBObject jo, WriteConcern concern) {
        if (checkReadOnly(true))
            return null;

        _checkObject(jo, false, false);

        Object id = jo.get("_id");

        if (id == null || (id instanceof ObjectId && ((ObjectId) id).isNew())) {
            if (id != null && id instanceof ObjectId)
                ((ObjectId) id).notNew();
            if (concern == null)
                return insert(jo);
            else
                return insert(jo, concern);
        }

        DBObject q = new BasicDBObject();
        q.put("_id", id);
        if (concern == null)
            return update(q, jo, true, false);
        else
            return update(q, jo, true, false, concern);
    }

In summary, by default the ObjectId is generated by the client (the driver), not by the server.
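
To illustrate what generating the id on the client means, here is a simplified, self-contained sketch. It is not the driver's implementation: the class, its constructor arguments, and the use of a plain Map in place of DBObject are all assumptions made for the example. It only mirrors the idea in apply(jo, true): if the document has no _id, one is built locally from a timestamp, a machine value, a process value, and a counter before anything is sent to the server.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class ClientSideIdSketch {
    private final int machine;           // low 3 bytes are used
    private final int pid;               // low 2 bytes are used
    private final AtomicInteger counter; // low 3 bytes are used

    ClientSideIdSketch(int machine, int pid, int counterStart) {
        this.machine = machine;
        this.pid = pid;
        this.counter = new AtomicInteger(counterStart);
    }

    // Builds a 24-character hex id: timestamp, machine, pid, counter.
    String next(long epochSeconds) {
        int c = counter.getAndIncrement() & 0xFFFFFF;
        return String.format("%08x%06x%04x%06x",
                epochSeconds & 0xFFFFFFFFL, machine & 0xFFFFFF, pid & 0xFFFF, c);
    }

    // Adds an _id only when the document does not already carry one.
    Object ensureId(Map<String, Object> doc, long epochSeconds) {
        Object id = doc.get("_id");
        if (id == null) {
            id = next(epochSeconds);
            doc.put("_id", id);
        }
        return id;
    }

    public static void main(String[] args) {
        ClientSideIdSketch gen = new ClientSideIdSketch(0x2cdcd2, 0x0936, 0xa8b817);
        Map<String, Object> doc = new HashMap<>();
        System.out.println(gen.ensureId(doc, 1307761900L)); // id is created locally
        System.out.println(gen.ensureId(doc, 1307761999L)); // existing id is kept
    }
}
```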
Misunderstanding four: does findAndModify really return the incremented value?
    DBObject update = new BasicDBObject("$inc", new BasicDBObject("Counter", 1));
    DBObject query = new BasicDBObject("_id", key);
    DBObject result = getMongoTemplate().getCollection(collectionName).findAndModify(query, update);
    if (result == null) {
        DBObject doc = new BasicDBObject();
        doc.put("Counter", 1L);
        doc.put("_id", key);
        // insert(collectionName, doc);
        getMongoTemplate().save(doc, collectionName);
        return 1L;
    }
    return (Long) result.get("Counter");

We wrote the auto-increment counter this way, but found a problem once it was running. findAndModify performs the find first and then the modify, so it returns the document as it was before the update. That means that when result is null, the code should insert the document and return 0 rather than 1; otherwise the first two calls both return 1.
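
The effect is easy to reproduce with an in-memory stand-in. The sketch below does not use MongoDB at all: a HashMap plays the collection, and the hypothetical findAndIncrement method imitates a findAndModify call that returns the value from before the increment (and null when the key does not exist). It is single-threaded and only meant to show the sequence of return values.

```java
import java.util.HashMap;
import java.util.Map;

class CounterSketch {
    private final Map<String, Long> store = new HashMap<>();

    // Stand-in for findAndModify with $inc: returns the value BEFORE the
    // increment, or null if the key does not exist (nothing is created).
    Long findAndIncrement(String key) {
        Long old = store.get(key);
        if (old != null) {
            store.put(key, old + 1);
        }
        return old;
    }

    // Logic as originally written: the first call returns 1.
    long nextOriginal(String key) {
        Long old = findAndIncrement(key);
        if (old == null) {
            store.put(key, 1L);
            return 1L;
        }
        return old;
    }

    // Adjusted logic: the first call returns 0, matching the pre-update values that follow.
    long nextAdjusted(String key) {
        Long old = findAndIncrement(key);
        if (old == null) {
            store.put(key, 1L);
            return 0L;
        }
        return old;
    }

    public static void main(String[] args) {
        CounterSketch a = new CounterSketch();
        System.out.println(a.nextOriginal("k") + " " + a.nextOriginal("k") + " " + a.nextOriginal("k")); // 1 1 2
        CounterSketch b = new CounterSketch();
        System.out.println(b.nextAdjusted("k") + " " + b.nextAdjusted("k") + " " + b.nextAdjusted("k")); // 0 1 2
    }
}
```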
