MongoDB Usage Summary: Sharing Some Less Common Experience

Source: Internet
Author: User
Tags: bulk insert, compact, create index, mongodb driver

The past year has been filled with data-processing work, much of it on MongoDB. Below are questions and answers collected along the way; the list will be updated from time to time.

1. Inaccurate count results

This happens when a sharded cluster is migrating data: count can return a wrong value, and you need an aggregate pipeline to obtain the correct total, for example:

db.collection.aggregate([{ $group: { _id: null, count: { $sum: 1 } } }])

Reference: "On a sharded cluster, count can result in an inaccurate count if orphaned documents exist or if a chunk migration is I N Progress. "

Reference: http://docs.mongodb.org/manual/reference/command/count/

2. Numbers updated or written from the shell become floating-point

Reference: "The numbers in the shell are treated by MongoDB as double precision numbers." This means that if you get a 32-bit integer from the database, and after the document is modified, the integer is replaced with a floating-point number, even if the integer remains intact. ”

Reference: MongoDB: The Definitive Guide, 1st Edition
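A minimal shell sketch of the workaround, assuming you want to preserve integer types (the collection name test is illustrative): the NumberInt and NumberLong wrappers keep 32-bit and 64-bit integers intact.

db.test.insert({ count: NumberInt(42) })      // stored as a 32-bit integer
db.test.insert({ count: NumberLong("42") })   // stored as a 64-bit integer
db.test.insert({ count: 42 })                 // the shell stores this as a double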

3. When restoring data to a new DB, do not create the indexes first

When you restore a BSON dump file into another db, be aware that you should not create the indexes before loading the data, otherwise performance is very poor. The mongorestore tool by default recreates the indexes recorded in the dump after the data has been restored, so you do not have to create them yourself; and if you want different indexes, you should likewise create them only after the data has been loaded.
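A sketch of this flow, with illustrative paths and names: load the data first, skipping the dumped index definitions via mongorestore's --noIndexRestore option, then build the indexes you want.

mongorestore --db newdb --noIndexRestore /path/to/dump/olddb
mongo newdb --eval 'db.mycoll.ensureIndex({ field: 1 })'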

4. Too many namespaces in a db prevent creating new collections

Error message: "error: hashtable namespace index max chain reached:1335". How do I fix it?
This happens when the db holds too many collections. In practice, budgeting about 8KB per collection (as suggested by the official documentation; it may also depend on indexes), a 256MB namespace file can support roughly 36,000 collections. The db.system.namespaces.count() command counts the namespaces in the current db. The number of collections a db can support is governed by the nssize parameter: it sets the size of the dbname.ns disk file (ns is short for namespace) and therefore the maximum number of collections the db can hold. The default nssize is 16MB.
If you restart mongod with a modified nssize, the new value only takes effect for newly created dbs, not for pre-existing ones. To apply a larger nssize to an existing db, you must restart with the increased nssize, create a new db, and then copy the old db's collections into the new one.
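A sketch of checking namespace usage and restarting with a larger ns file (the path and size are illustrative; remember the new size only affects dbs created afterwards):

// in the shell: count namespaces in the current db
use mydb
db.system.namespaces.count()

# restart mongod with a 32MB .ns file for newly created dbs
mongod --dbpath /data/db --nssize 32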
Namespace limits documentation: http://docs.mongodb.org/manual/reference/limits/#Number-of-namespaces

5. moveChunk failed because old data was not yet deleted

Error log: "Movechunk failed to engage To-shard in the data Transfer:can ' t accept new chunks because there is still 1 deletes From previous migration ".
It means the shard currently trying to accept the new chunk is still deleting data from a previous migration and cannot accept new chunks, so this migration fails. The log shows this as a warning, but sometimes the delete on a shard runs for more than ten days without finishing; inspecting the logs reveals the same chunk being deleted over and over. Restarting all the shards that could not accept new chunks solved the problem.
Reference:
Http://stackoverflow.com/questions/26640861/movechunk-failed-to-engage-to-shard-in-the-data-transfer-cant-accept-new-chunk
If you use the balancer's auto-balancing, you can add the _waitForDelete parameter, for example:
{"_id": "Balancer", "ActiveWindow": {"Start": "12:00pm", "Stop": "19:30"}, "stopped": false, "_waitfordelete": True }
With this setting the delete backlog will not cause subsequent migrations to fail, but you must of course consider whether the blocking affects normal operation. Use _waitForDelete carefully in practice: after enabling it we found migration performance became very poor, sometimes appearing stuck for more than ten hours. An outside client held a cursor on the migrated chunk, so the delete could not proceed and blocked all subsequent migration operations.
Log output when an open cursor prevents migrated data from being deleted promptly:
2015-03-07T10:21:20.118+0800 [RangeDeleter] rangeDeleter waiting for open cursors in: cswuyg_test.cswuyg_test, min: { _id: -6665031702664277348 }, max: { _id: -6651575076051867067 }, elapsedSecs: 6131244, cursors: [ 220477635588 ]
This can stay stuck for dozens of hours, or even indefinitely, blocking subsequent moveChunk operations and leaving data unevenly distributed.
Workaround: Restart.
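For reference, a minimal sketch of turning _waitForDelete on through the config database (run against a mongos; it writes the settings document shown above):

use config
db.settings.update(
    { "_id" : "balancer" },
    { $set : { "_waitForDelete" : true } },
    { upsert : true }
)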

6. BSON size cannot exceed the 16MB limit

The BSON size of a single document cannot exceed 16MB. Queries can hit this limit too: for example, with $in queries the array of values must not be too large. For some data sources, MapReduce combines intermediate data into the form "key: [value1, value2, ...]", and when one key has very many values you may also hit the 16MB limit. The restriction is everywhere, as noted: "the issue is that the 16MB document limit applies to everything - documents you store, documents MapReduce tries to generate, documents aggregation tries to return, etc."
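A small shell sketch, assuming you want to check how close a document is to the cap: Object.bsonsize() reports a document's BSON size in bytes.

var doc = db.test.findOne()
Object.bsonsize(doc)          // must stay below 16 * 1024 * 1024 bytes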

7. Bulk insert

Bulk inserts reduce the number of round trips to the server and improve performance. As a rule, a single batch should not exceed 48MB of BSON; if it does, the driver automatically splits it into multiple submissions to mongos.
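A minimal sketch using the shell's Bulk API (available from MongoDB 2.6; collection and fields are illustrative):

var bulk = db.test.initializeUnorderedBulkOp();
for (var i = 0; i < 10000; i++) {
    bulk.insert({ seq: i, value: "x" });   // queued locally, sent in batches
}
bulk.execute();                            // the shell splits the batch as needed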

8. Safe writes and their evolution

Keywords: acknowledge, write concern.

Before November 2012, the MongoDB drivers and the shell client defaulted to unsafe writes, i.e. fire-and-forget: once the operation was sent, the client did not care whether the write truly succeeded, so exceptions such as duplicate _id values or non-UTF-8 characters went unnoticed. Since November 2012 the default is safe writes, at a safety level equivalent to the parameter w=1, so the client knows whether the write succeeded. Legacy code that connects through the old Mongo or Connection interfaces still defaults to unsafe writes; the switch to safe writes by default came with the MongoClient connection interface.
Safe writes come in three levels:
The first level is the default safe write, which returns once the data is confirmed written to memory (w=N also belongs to this level);
The second level is journaled: before data is written to the db's disk files, MongoDB writes the operation to the journal file, and this level returns only after the journal write is confirmed;
The third level is fsync: all data is flushed to the db's disk files before the write returns.
The first level is generally sufficient; the second level guarantees that data is not lost if the machine loses power unexpectedly. On performance: unsafe writes are roughly 3 times faster than the default safe write. The fsync parameter costs even more performance and is generally not used.
With a replica set, the w=N parameter means the write returns only after it has safely reached N members of the set.
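A sketch of the three levels as expressed with the 2.x-era getLastError command in the shell (option names per the write-concern docs):

db.test.insert({ x: 1 })
db.runCommand({ getLastError: 1, w: 1 })         // level 1: acknowledged in memory
db.runCommand({ getLastError: 1, j: true })      // level 2: wait for the journal
db.runCommand({ getLastError: 1, fsync: true })  // level 3: flush the data files
db.runCommand({ getLastError: 1, w: 2 })         // replica set: wait for 2 members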
Reference:
http://docs.mongodb.org/manual/release-notes/drivers-write-concern/
http://docs.mongodb.org/manual/core/write-concern/
http://blog.mongodirector.com/understanding-durability-write-safety-in-mongodb/
http://whyjava.wordpress.com/2011/12/08/how-mongodb-different-write-concern-values-affect-performance-on-a-single-node/

9. Using indexes: probably not what you expect

When using compound indexes, if there are two candidate indexes, a query with a limit may behave differently from the conventional understanding:
The same query performs differently at different orders of magnitude depending on the index:
Compound index A: {"age": 1, "username": 1}
Compound index B: {"username": 1, "age": 1}
Full query: db.user.find({"age": {"$gte": 21, "$lte": 30}}).sort({"username": 1}) performs better with index A than with index B.
Limited query: db.user.find({"age": {"$gte": 21, "$lte": 30}}).sort({"username": 1}).limit(1000) performs better with index B than with index A.
With index A, the query first uses the index to find all age-matching documents and then sorts the result by username. With index B, it walks usernames in order and filters each document by age; the results come out already sorted by username, so once 1000 matches are found it can stop early.
Favoring the index that matches the sort key performs well for most applications.
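To verify which plan wins yourself, a sketch with hint() and explain() (same illustrative values as above):

db.user.find({ "age": { "$gte": 21, "$lte": 30 } }).sort({ "username": 1 }).limit(1000).hint({ "username": 1, "age": 1 }).explain()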
Reference: "mongodb--the Definitive Guide 2nd Edition" Page89

10. Field order in the query does not matter for index use

When you run find, the indexed field does not have to come first in the query document.
For example:
Assume field r is indexed in the db.test collection:
db.test.find({ "r": "aa", "h": "bb" }).limit(10).explain()
db.test.find({ "h": "bb", "r": "aa" }).limit(10).explain()
Both lookups use the r index.

11. Using a compound index as the shard key can greatly improve cluster performance

A compound index of "fixed value + increasing value" effectively achieves distributed multi-hotspot writes and reads. Here are my reading notes:
On a single MongoDB instance the most efficient writes are sequential, while a MongoDB cluster needs writes spread out so they distribute evenly across multiple instances. So the most efficient pattern is multiple local hotspots: writes are distributed across instances, but sequential within each instance. A compound index achieves this.
For example: the first part of the shard key is coarse, taking only a few distinct values; the second part is an increasing field. As data grows, there will be many chunks that share a first-part value and differ in the second part, and data is only written into the last chunk for each first-part value. With the different first-part values dispersed across shards, you get multi-hotspot writes. If more than one chunk on a shard can receive data, that means more than one hotspot there, and when hotspots become very numerous it degenerates into hotspot-less random writes. When a chunk splits, only one half stays hot; the other can no longer receive writes (otherwise there would be two hotspots), so the "dead" chunk gets no further inserts and subsequently only serves reads.
In my own practice, beyond the key combination described in the book, I added a pre-sharding strategy to avoid chunk splits and data migrations during early data growth. It also helps to write data with locality: for example, sorting data before writing it gave roughly a 30% improvement in update performance.
Pre-sharding works like this: based on the compound shard key, split the chunks in advance and move these empty chunks onto the shards, avoiding the data migrations that later automatic splits would cause.
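A minimal sketch of this setup, with illustrative names (a low-cardinality source prefix plus an increasing ts suffix), including pre-splitting and spreading the empty chunks:

sh.enableSharding("mydb")
sh.shardCollection("mydb.logs", { source: 1, ts: 1 })
// pre-split on the prefix values, then move the empty chunks onto the shards:
sh.splitAt("mydb.logs", { source: "web", ts: MinKey })
sh.moveChunk("mydb.logs", { source: "web", ts: MinKey }, "shard0001")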
Reference: "mongodb--the Definitive Guide 2nd Edition" page268

12. How should indexes be built to improve query performance?

For a query to use an index efficiently, pay attention to its cardinality (the higher the cardinality, the more selective the key), and in a compound index put the high-cardinality keys first. Note that this differs from choosing a shard key in a distributed environment. Here are my reading notes:
Index cardinality is the number of distinct values an indexed key has. The lower the cardinality, the more documents share each value and the worse the index works; an index with high cardinality can rule out far more non-matching documents, so the remaining comparisons run over a much smaller set, which is more efficient. Hence the usual advice: index keys with high cardinality, and within a compound index place the high-cardinality keys first.
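A minimal sketch with illustrative fields: email (high cardinality) leads, gender (roughly two values) follows.

db.user.ensureIndex({ email: 1, gender: 1 })   // high-cardinality key first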
Reference: "mongodb--the Definitive Guide 2nd Edition" Page98

13. Updates that cannot happen in place perform very poorly

When you update a document, if the new version needs more space than the old document plus its surrounding padding, the original location is abandoned and the data is copied to a new location.
Reference: "mongodb--the Definitive Guide 2nd Edition" page43

14. You cannot add an expiry to an index after it was created without one

If the index was created with an expiry time, you can later change the expiry like this: db.runCommand({ collMod: "a", index: { keyPattern: { "_": -1 }, expireAfterSeconds: 60 } }).

Note that collMod can modify the expiry time only if the index already has one. If the index was created without an expiry, it cannot be updated this way; you can only drop the index and rebuild it with an expiry time.
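For completeness, a sketch of creating the TTL index in the first place (collection and field names follow the example above; the expiry value is illustrative):

db.a.ensureIndex({ "_": -1 }, { expireAfterSeconds: 3600 })   // TTL set at creation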
Reference: http://docs.mongodb.org/manual/tutorial/expire-data/

15. _id Index cannot be deleted

Reference: "mongodb--the Definitive Guide 2nd Edition" page114

16. What is paddingFactor?

It is a storage redundancy factor: 1.0 means no padding, 1.5 means 50% extra space. Having padding lets documents grow faster (without triggering a reallocation of disk space and a document move). It typically lies between 1 and 4. You can see a collection's paddingFactor via db.collection.stats().
This value is managed by MongoDB itself; users cannot set paddingFactor directly. We can specify it for existing documents through the compact command, but that paddingFactor does not affect documents inserted afterwards.
repairDatabase, like compact, also removes padding to reduce storage space, but with less padding, subsequent updates that grow documents become slower.
Although we cannot set paddingFactor directly, we can use usePowerOf2Sizes to make allocations powers of two, which serves a similar purpose (usePowerOf2Sizes is enabled by default as of MongoDB 2.6).
Alternatively, implement padding manually: when inserting a document, fill a field with placeholder bytes, and $unset it once the real data is written.
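A sketch of inspecting and influencing padding on the 2.x MMAPv1 engine (collection name illustrative):

db.mycoll.stats().paddingFactor                                // current padding factor
db.runCommand({ compact: "mycoll", paddingFactor: 1.5 })       // applies to existing docs only
db.runCommand({ collMod: "mycoll", usePowerOf2Sizes: true })   // power-of-2 allocation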

Reference:
http://docs.mongodb.org/v2.4/core/record-padding/
http://docs.mongodb.org/v2.4/faq/developers/#faq-developers-manual-padding

17. What is usePowerOf2Sizes?

This parameter exists to reuse disk space more efficiently: allocated space is rounded up to a power of two, and past 4MB to the nearest whole megabyte above the computed size.
The userFlags value can be seen via db.collection.stats().
usePowerOf2Sizes is enabled by default since MongoDB 2.6.
Its effect can be seen in this presentation: http://www.slideshare.net/mongodb/use-powerof2sizes-27300759

18. The aggregate pipeline's output options fall short of MapReduce's

(As of MongoDB 2.6) MapReduce can direct its output to a collection in a specific db, for example (pymongo): out_put = bson.SON([("replace", "collection_name"), ("db", "xx_db")])
The aggregate pipeline can only specify a collection name, which means the result can only be written within the current db; moreover the result cannot be written to a capped collection or a sharded collection.
By comparison, the aggregate pipeline is much more limited, and if we need the result in another db, a second move is required:
db.runCommand({ renameCollection: "sourcedb.mycol", to: "targetdb.mycol" })
But!! The above command must be executed against the admin db, it can only move a collection to a db on the same shard, and the collection being moved cannot be sharded.
Attached error code information:
https://github.com/mongodb/mongo/blob/master/src/mongo/s/commands_public.cpp#L778
uassert(13140, "Don't recognize source or target DB", confFrom && confTo);
uassert(13138, "You can't rename a sharded collection", !confFrom->isSharded(fullnsFrom));
uassert(13139, "You can't rename to a sharded collection", !confTo->isSharded(fullnsTo));
uassert(13137, "Source and destination collections must be on same shard", shardFrom == shardTo);
Reference: http://docs.mongodb.org/manual/reference/method/db.collection.mapReduce/#mapreduce-OUT-MTD

19. Several ways to shut down a mongod process

1) Enter the mongo shell and run shutdown:
e.g.
$ mongo --port 10001
> use admin
> db.shutdownServer()
2) A one-line version of 1):
e.g. mongo admin --port 10001 --eval "db.shutdownServer()"
3) Shut down via the mongod command line, specifying the db path:
mongod --dbpath ./data/db --shutdown

20. Use hashed shard keys with caution

If your logs have a date attribute, do not use a hashed shard key; otherwise expired logs cannot be deleted as contiguous blocks, and log updates cannot exploit locality either, so find, update, and insert all become slow. In general, a hashed key relieves pressure while the data volume is small, but once it is large (hot data exceeding available memory), CRUD performance becomes extremely poor and the impact of fragmentation on performance is amplified: the data is highly scattered, and when expired logs are deleted, the freed space turns into fragments that may still be loaded into memory by the disk read-ahead policy. In addition, a hashed shard key also costs an extra index, wasting considerable space.
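A sketch of the contrast, with illustrative names: the hashed key scatters a day's logs everywhere, while a range key keeps them contiguous for block deletion.

sh.shardCollection("mydb.logs", { _id: "hashed" })   // scattered: fragmented deletes
sh.shardCollection("mydb.logs", { day: 1, _id: 1 })  // contiguous: a day's logs stay together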

21. Do not use too many replicas

If you need more than 12 replicas (more than 50 as of MongoDB 3.0.0), you have to fall back to master-slave replication, but that loses automatic failover: when the primary node fails, you must switch to a healthy node manually.

22. Do not use localhost or 127.0.0.1 in the mongos config server settings

When you start mongos, the config server addresses must not use localhost or 127.0.0.1, otherwise adding a shard on another machine fails with:
"can't use localhost as a shard since all shards need to communicate. Either use all shards and configdbs on localhost or all on actual IPs. host: xxxxx isLocalHost"

After pointing mongos at new config servers, you also need to restart the config servers, otherwise you may see:
"could not verify config servers were active and reachable before write"

If the "MONGOs specified a different config database string" error occurs after the change, the Mongod also needs to be restarted,

In short, changing the config servers requires restarting almost every instance. Likewise, never use localhost or 127.0.0.1 when configuring a replica set.
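A sketch of a mongos start line with the 2.x-style config server list using real hostnames (hostnames illustrative):

mongos --configdb cfg1.example.com:27019,cfg2.example.com:27019,cfg3.example.com:27019 --port 27017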
Reference: http://stackoverflow.com/questions/21226255/where-is-the-mongos-config-database-string-being-stored

23. Shard Key selection is closely related to update performance

In a sharded MongoDB deployment, the choice of shard key is strongly related to update performance, and even to whether updates work at all, so it needs attention.
1. When updating individual fields of a document, if the query part does not contain the shard key, performance is poor, because mongos has to send the update to every shard instance.
2. When the update's upsert parameter is true, the query part must contain the shard key, otherwise the statement fails. Example:
mongos> db.test.update({ "_id" : ".7269993106a92327a89abcd70d46ad5" }, { "$set" : { "p" : "aaa" }, "$setOnInsert" : { "test" : "a" } }, true)
WriteResult({
    "nMatched" : 0,
    "nUpserted" : 0,
    "nModified" : 0,
    "writeError" : {
        "code" : 61,
        "errmsg" : "upsert { q: { _id: \".7269993106a92327a89abcd70d46ad5\" }, u: { $set: { p: \"aaa\" }, $setOnInsert: { test: \"a\" }, multi: false, upsert: true } does not contain shard key for pattern { _: 1.0, b: 1.0 }"
    }
})
This is because without the shard key, mongos can neither execute the statement on all shards (each shard might insert a document) nor pick a single shard to run it on, so it raises an error.
Also note: with the pymongo driver, this failure is silent; the call simply does not take effect, and you only see the error message when you run the statement in the shell.
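A sketch of the fix for this example: include every shard-key field (here _ and b, per the error's key pattern; the value "xxx" is illustrative) in the query part so mongos can target a single shard.

db.test.update(
    { "_" : ".7269993106a92327a89abcd70d46ad5", "b" : "xxx" },   // full shard key present
    { "$set" : { "p" : "aaa" }, "$setOnInsert" : { "test" : "a" } },
    true
)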

Attached:
The following English excerpt comes from: https://jira.mongodb.org/browse/SERVER-13010
"It's actually not clear to me that this is something we can support - the problem is this:
> db.coll.update({ _id: 1 }, {}, true);
> db.coll.find()
{ "_id" : ObjectId("53176700a2bc4d46c176f14a") }
Upserts generate new _ids in response to this operation, and therefore we can't actually target this correctly in a sharded environment. The shard on which we need to perform the query may not be the shard on which the new _id is placed."
It means the upsert produced a new _id. If _id is the shard key but the query does not contain it, mongos does not know which shard should execute the command; the shard implied by the newly generated shard key may not be the shard the command was routed to.
Furthermore, even if _id is not the shard key, our example still fails: without the shard key, on which shard should the upsert run? It cannot be broadcast to all shards like an ordinary update, or multiple inserts could result.
Reference:
https://jira.mongodb.org/browse/SERVER-13010
http://docs.mongodb.org/manual/core/sharding-shard-key/
http://stackoverflow.com/questions/17190246/which-of-the-following-statements-are-true-about-choosing-and-using-a-shard-key

24. Improving performance with repairDatabase

From db.stats() you can see several fields related to fragmentation. dataSize is the size of the data, including padding space; storageSize is the space occupied on disk, including dataSize and the space of deleted records. The ratio storageSize/dataSize can be read as the disk fragmentation ratio; it grows as documents are deleted and updated. When it gets large, consider running repairDatabase to reduce fragmentation and make the data more compact; in practice this is extremely useful for improving CRUD performance. Note that repairDatabase copies the data elsewhere for processing, so before repairing, the disk holding the db directory needs free space equal to the data's size. If there is not enough disk space, you can stop the service, attach a new disk, and perform an instance-level repair with --repairpath pointing to the new disk, e.g.: mongod --dbpath /path/to/corrupt/data --repair --repairpath /media/external-hd/data/db, which copies the instance's data to /media/external-hd/data/db for processing.
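A small shell sketch, assuming you want a quick fragmentation estimate before deciding to repair:

var s = db.stats()
s.storageSize / s.dataSize    // a ratio well above 1 suggests reclaimable space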

Reference: "mongodb--the Definitive Guide 2nd Edition" page325

25. The length of the index field cannot be greater than 1024 bytes

The length of an indexed field's value cannot exceed 1024 bytes, or the shell reports an insert error: "errmsg" : "insertDocument :: caused by :: 17280 Btree::insert: key too large to index". Beware that with pymongo's "continue_on_error" parameter, no error is raised at all. Reference: http://docs.mongodb.org/manual/reference/limits/#Index-key-limit

26. Load balancing fails after modifying an index's expireAfterSeconds

After modifying an index's expireAfterSeconds, the load balancer failed with the error: "2015-06-05T09:59:49.056+0800 [migrateThread] warning: failed to create index before migrating data. idx: { v: 1, key: { _: -1 }, name: "__-1", ns: "cswuyg_test.cswuyg_test", expireAfterSeconds: 5227200 } error: IndexOptionsConflict Index with name: __-1 already exists with different options". We checked the two shards involved in the moveChunk and found no inconsistency, so we suspected a cache; restarting all shards resolved it.

27. config db cannot be written

The config db could not be modified, only read, which made drop and enableSharding fail.
Config server log: 2015-06-11T16:51:19.078+0800 [replmaster] local.oplog.$main Assertion failure isOk() src/mongo/db/storage/extent.h 80
mongos log: [LockPinger] warning: pinging failed for distributed lock pinger 'xxx:1234/xxx:1235:1433993544:1804289383'. :: caused by :: isOk()
This was a problem a colleague ran into, and it is unclear which operation caused it. Restarting and running repair on the config db did not help. It was finally resolved with dump and restore: (1) dump the old config db; (2) restore it to a new config server; (3) point mongos at the new config server; (4) restart all mongod instances.
http://www.cnblogs.com/cswuyg/p/4355948.html
http://www.cnblogs.com/cswuyg/p/4595799.html
