When reprinting, please credit Joymufeng. You are welcome to visit the PlayScala Community (http://www.playscala.cn/).
Capped Collections
MongoDB has a special type of collection called a capped collection. Inserts into a capped collection are very fast, roughly on par with sequential disk writes, and it supports efficient queries in insertion order. A capped collection has a fixed size and works much like a circular buffer: when it runs out of space, it overwrites the oldest documents to make room for new ones.
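The circular-buffer behavior described above can be sketched in a few lines of Scala. This is an illustration of the overwrite-the-oldest semantics only, not MongoDB's actual implementation:

```scala
// A minimal fixed-size ring buffer, sketching the overwrite-oldest behavior
// that a capped collection exhibits once it is full (illustration only).
final class RingBuffer[A](capacity: Int) {
  private val buf = new Array[Any](capacity)
  private var next = 0   // slot the next insert will use
  private var count = 0  // number of elements currently stored

  def insert(a: A): Unit = {
    buf(next) = a                  // overwrites the oldest slot when full
    next = (next + 1) % capacity
    if (count < capacity) count += 1
  }

  /** Elements oldest-first, like a $natural-order scan of a capped collection. */
  def inInsertionOrder: List[A] = {
    val start = if (count < capacity) 0 else next
    (0 until count).map(i => buf((start + i) % capacity).asInstanceOf[A]).toList
  }
}

object RingBufferDemo {
  def main(args: Array[String]): Unit = {
    val rb = new RingBuffer[Int](3)
    (1 to 5).foreach(rb.insert)   // 4 and 5 overwrite 1 and 2
    println(rb.inInsertionOrder)  // List(3, 4, 5)
  }
}
```

Inserting five elements into a three-slot buffer leaves only the three most recent, in insertion order, which is what a full capped collection returns under natural order.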
Capped collections are characterized by efficient insertion and retrieval, so it is best not to add extra indexes on them; otherwise insertion speed suffers. Capped collections suit the following scenarios:
- Storing logs: the first-in-first-out behavior of capped collections matches the order in which log events are written;
- Caching small amounts of data: because a cache is read often and written rarely, an index can be used to improve read speed at an acceptable cost.
Restrictions on using capped collections:
- If you update documents, create an index for the update query to avoid a collection scan;
- An update must not change the size of a document. For example, if the name property is 'ABC', it can only be changed to another 3-character string; otherwise the operation fails;
- Individual documents cannot be deleted; to remove data you must drop the whole collection;
- Sharding is not supported;
- By default, results are returned only in natural order (that is, insertion order).
Capped collections support the $natural operator to return results in either insertion order or reverse insertion order:
db.test.find().sort({ "$natural": 1 })   // insertion order
db.test.find().sort({ "$natural": -1 })  // reverse insertion order
Oplog
The oplog is a special kind of capped collection. It is special because it is a system-level collection that records every write operation on the database; replica sets rely on the oplog for data synchronization. Its full name is local.oplog.rs, and it lives in the local database. Because users cannot be created in the local database, a user who wants to read the oplog must be created in another database and granted read permission on the local database, for example:
db.createUser({ "user": "...", "pwd": "******", "roles": [{ "role": "readWrite", "db": "play-community" }, { "role": "read", "db": "local" }] })
Oplog records are idempotent, which means replaying the same record multiple times yields the same result, without data loss or inconsistency. To achieve this, MongoDB automatically converts operations such as $inc into $set operations in the oplog. For example, suppose the original document is:
{ "_id": "0", "count": 1.0 }
Execute the following $inc operation:
db.test.update({ "_id": "0" }, { "$inc": { "count": 1 } })
The corresponding oplog record is:
{ "ts": Timestamp(1503110518, 1), "t": NumberLong(8), "h": NumberLong(-3967772133090765679), "v": NumberInt(2), "op": "u", "ns": "play-community.test", "o2": { "_id": "0" }, "o": { "$set": { "count": 2.0 } } }
This conversion guarantees the idempotency of the oplog. Note also that the oplog does not allow additional indexes, which ensures insert performance.
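The difference between replaying $inc and replaying $set can be sketched with documents modeled as plain Scala Maps. This is an illustration only, not how MongoDB stores documents:

```scala
// Sketch of why the oplog rewrites $inc as $set: replaying a recorded $set is
// idempotent, replaying a raw $inc is not (illustration with plain Maps).
object OplogIdempotency {
  type Doc = Map[String, Double]

  def applyInc(doc: Doc, field: String, by: Double): Doc =
    doc.updated(field, doc.getOrElse(field, 0.0) + by)

  def applySet(doc: Doc, field: String, value: Double): Doc =
    doc.updated(field, value)

  def main(args: Array[String]): Unit = {
    val original: Doc = Map("count" -> 1.0)

    // Replaying the raw $inc twice drifts away from the intended result...
    val incTwice = applyInc(applyInc(original, "count", 1.0), "count", 1.0)
    println(incTwice("count")) // 3.0 -- wrong if the log entry is applied twice

    // ...while replaying the recorded $set any number of times is safe.
    val setTwice = applySet(applySet(original, "count", 2.0), "count", 2.0)
    println(setTwice("count")) // 2.0 -- same result no matter how often replayed
  }
}
```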
Timestamps format
MongoDB has a special time format, the BSON timestamp, which is intended for internal use only, as seen in the oplog record above:
Timestamp(1503110518, 1)
A timestamp is 64 bits long:
- The first 32 bits are a time_t value, the number of seconds since the Unix epoch;
- The last 32 bits are an ordinal, an incrementing sequence number for the operations within a given second.
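The layout above can be illustrated with two bit operations. This is a standalone sketch; real drivers expose this value through their own BSON timestamp types:

```scala
// The 64-bit BSON timestamp packs the seconds in the high 32 bits and the
// per-second ordinal in the low 32 bits (a standalone encode/decode sketch).
object BsonTimestamp {
  def pack(seconds: Long, ordinal: Long): Long =
    (seconds << 32) | (ordinal & 0xFFFFFFFFL)

  def seconds(ts: Long): Long = ts >>> 32          // high 32 bits: time_t value
  def ordinal(ts: Long): Long = ts & 0xFFFFFFFFL   // low 32 bits: op counter

  def main(args: Array[String]): Unit = {
    // Timestamp(1503110518, 1) from the oplog record above
    val ts = pack(1503110518L, 1L)
    println(seconds(ts)) // 1503110518
    println(ordinal(ts)) // 1
  }
}
```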
Start Synchronizing Oplog
Before we start synchronizing the oplog, we need to be aware of the following points:
- Because the oplog has no indexes, the initial query can be expensive;
- When the oplog is large, periodically save the last processed ts value; on restart, filter on the saved ts to reduce the cost of the first query;
- The oplogReplay flag significantly speeds up queries that filter on ts, but it is only valid for oplog queries.
val tailingCursor = oplogCol
  .find(Json.obj(
    "ns" -> Json.obj("$in" -> Set(s"${db}.common-doc", s"${db}.common-article")),
    "ts" -> Json.obj("$gte" -> lastTS)
  ))
  .options(QueryOpts().tailable.oplogReplay.awaitData.noCursorTimeout)
  .cursor[BSONDocument]()

tailingCursor.fold(()) { (_, doc) =>
  try {
    val jsObj = doc.as[JsObject]
    (jsObj \ "op").as[String] match {
      case "i" => // insert
      case "u" => // update
      case "d" => // delete
    }
    // Periodically save the ts value so we can resume from it after a restart
    if (tailCount.get() % 10 == 0) { }
  } catch {
    case t: Throwable =>
      logger.error("tail oplog error: " + t.getMessage, t)
  }
}
Also note that the ReactiveMongo-Streaming Akka Stream implementation has a bug: if the initial query returns no data, it keeps resending the query, roughly dozens to hundreds of requests per second. Because oplog queries are expensive, this will eventually cause MongoDB to run out of memory. For details, see "Keep sending queries while the initial query result of a tailable cursor is empty".
Reference
- MongoDB Doc - Replica Set Oplog
- MongoDB Doc - Capped Collections
- MongoDB Doc - Tailable Cursors
A Development Guide to Real-Time MongoDB Oplog Synchronization