Summary by @Zheng, created 2012/9
Application Design:
1) If a query does not use the expected index, use hint to force a specific index.
The Product Center of the main site has documents with many fields and many indexes. When Sha Chuang was investigating slow queries, he was puzzled: why was the query still so slow even though a compound index had been explicitly specified? Hint example:
db.collection.find({"age": 18, "username": /.*/}).hint({"username": 1, "age": 1})
2) Design documents to be self-sufficient.
MongoDB should be a big, dumb data store. It does not need to do any processing; it is only responsible for storing and reading data. Stick to this purpose and avoid forcing MongoDB to do computation that clients can do. If you really need a value that is not explicitly stored in the document, you have two options:
- pay a severe performance penalty (have MongoDB compute it with JavaScript);
- store the value explicitly in the document.
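A minimal sketch of the second option, assuming a hypothetical order document. The field names (items, price, qty, total) and the helper withTotal are illustrative and do not come from the original article. The client computes the derived value before saving, so MongoDB only has to store it and hand it back.

```javascript
// Hypothetical helper: compute the derived field on the client
// and keep it in the document itself.
function withTotal(order) {
  const total = order.items.reduce(
    (sum, item) => sum + item.price * item.qty,
    0
  );
  // Return a copy so the caller's object is not mutated.
  return Object.assign({}, order, { total: total });
}

const order = withTotal({
  _id: "order-1001",
  items: [
    { sku: "A", price: 5, qty: 2 },
    { sku: "B", price: 3, qty: 1 },
  ],
});

// order.total is now stored explicitly, so a later read needs
// no server-side JavaScript to obtain it.
console.log(order.total);
```

In the mongo shell, a document built this way could then be written with something like db.orders.insert(order), where orders is an assumed collection name.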
Implementation:
3) Override _id when you have your own simple, unique id.
If your documents have a unique key value and you do not need the properties of ObjectId, use that value to override the default _id field. Otherwise, you would also need to create a separate unique index on your own unique id.
Optimization:
4) When the data volume is large, compare the query efficiency of ascending and descending key order when creating a compound index.
db.collection.ensureIndex({"store_id": -1, "shop_id": 1})
1 means ascending and -1 means descending. For an index on a single field, the direction does not matter. For sorts or range queries (including $in, $gt, and $lt) on a compound index, however, the direction of each key is important. Sun Guoxi believes it is best to simulate the business scenario with a large data set. The merchant center once found that "-1" was more than 10 times faster than "1".
5) How to create a compound index when the query also sorts.
MongoDB and MySQL both use B-Tree indexes, so a compound index can be designed as follows. For the query
db.collection.find({x: 1, y: 2}).sort({z: 1})
the index can be
db.collection.ensureIndex({x: 1, y: 1, z: 1}) // x + y + z
or
db.collection.ensureIndex({y: 1, x: 1, z: 1}) // y + x + z
The sort field always goes at the end of the compound index. Try to put the fields that filter out the most data first.
Be careful when range queries are involved, however. For the query
db.collection.find({"country": {"$in": ["ZH", "EN"]}}).sort({"cars": 1})
the better index is
db.collection.ensureIndex({"cars": 1, "country": 1})
For a range query (including $in, $gt, and $lt), deliberately appending the sort field to the end of the index is usually ineffective: the result set produced by the range scan is not ordered by the appended field, so an extra sort is still required. In this case, building the index the other way round (sort field first, range field after it) may be the better choice. Whether it really is better depends on the specific data set, so it still needs to be tested.
As another example, for the query
db.collection.find({x: 1, y: {$in: [1, 2]}}).sort({z: 1})
the better index is
db.collection.ensureIndex({x: 1, z: 1, y: 1}) // x + z + y
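The reasoning about range queries and sort order can be illustrated with a toy model. This is only a sketch: a sorted array stands in for the B-Tree, the sample documents are invented, and the helpers buildIndex and isSortedBy are hypothetical names, not MongoDB APIs. It does not show how MongoDB actually executes the query, only why the key order matters for the country/cars example.

```javascript
// Toy "index": the documents sorted by the given key order,
// the way entries are ordered inside a B-Tree.
function buildIndex(docs, keys) {
  return docs.slice().sort((a, b) => {
    for (const k of keys) {
      if (a[k] < b[k]) return -1;
      if (a[k] > b[k]) return 1;
    }
    return 0;
  });
}

// True when the rows are already in ascending order of `key`.
function isSortedBy(rows, key) {
  return rows.every((r, i) => i === 0 || rows[i - 1][key] <= r[key]);
}

// Invented sample data.
const docs = [
  { country: "ZH", cars: 3 },
  { country: "EN", cars: 1 },
  { country: "ZH", cars: 2 },
  { country: "EN", cars: 5 },
  { country: "FR", cars: 4 },
];
const wanted = new Set(["ZH", "EN"]); // the $in condition

// Range field first, sort field appended: {country: 1, cars: 1}
const rangeFirst = buildIndex(docs, ["country", "cars"])
  .filter((d) => wanted.has(d.country));

// Sort field first, range field after it: {cars: 1, country: 1}
const sortFirst = buildIndex(docs, ["cars", "country"])
  .filter((d) => wanted.has(d.country));

// Does walking the index in order still leave a sort to do?
const rangeFirstNeedsSort = !isSortedBy(rangeFirst, "cars");
const sortFirstNeedsSort = !isSortedBy(sortFirst, "cars");
console.log(rangeFirstNeedsSort, sortFirstNeedsSort);
```

In this model the sort-first index returns matching entries already ordered by cars, at the cost of walking entries for countries that are filtered out. That trade-off is why the result should still be tested against real data.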
6) Use explain to check whether the index is fully utilized.
Make a habit of checking whether your queries make full use of indexes. Run explain to inspect how a query is executed, and add the necessary indexes to avoid table scans.
7) Use paging for queries where possible, and release the cursor as soon as possible.
8) Avoid operators that do not hit the index, such as $nin.
Data Safety and Consistency:
9) Always use Replica Sets:
Background: Replica Sets provide high availability for MongoDB through automatic failover. A replica set is a form of asynchronous master/slave replication that adds automatic failover and automatic recovery of member nodes.
Application: if, for example, the primary fails, a secondary is elected as the new primary and the whole cluster continues to provide normal service. Our services do not support MongoDB deployments without a replication mechanism.
In addition, when using Replica Sets, it is best to add one arbiter.
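One way to see why an arbiter helps is to count votes. The sketch below assumes the usual rule that a primary can only be elected by a majority of the voting members; the function names majority and canElectPrimary are hypothetical and are not part of MongoDB.

```javascript
// Smallest number of votes that forms a majority.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Can the members that are still reachable elect a primary?
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers >= majority(votingMembers);
}

// Two data-bearing nodes, one of them down: 1 vote out of 2.
const twoNodesOneDown = canElectPrimary(2, 1);

// Two data-bearing nodes plus one arbiter, one data node down:
// the surviving node and the arbiter hold 2 votes out of 3.
const withArbiterOneDown = canElectPrimary(3, 2);

console.log(twoNodesOneDown, withArbiterOneDown);
```

The arbiter stores no data; in this model it only contributes the extra vote that keeps the set able to fail over.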
10) Keep journaling enabled:
Background: journaling is enabled by default for 64-bit MongoDB 1.9.2 and later. For 32-bit builds or versions earlier than 1.9.2, you must add --journal to the startup command line. Journaling was introduced after users ran kill -9 against MongoDB on a single machine and ended up with unusable data. When journaling is enabled, changes are first written to the journal, committed periodically in batches, and then applied to the real data files. If the server shuts down cleanly, the journal is cleared. When the server starts and a journal exists, it is replayed. This ensures that operations that were written to the journal but not yet applied before a crash are executed before users connect.
Application: Zheng Yu strongly recommends that you enable journaling in deployments.
Note the storage location of data files. Make sure that your data files are on persistent storage (such as the /data/mongodb directory). You can also use non-persistent devices to store data files, but be careful because this may affect your cluster architecture.
Hot data is best kept in memory. It is important to keep hot data (and index data) in memory, because this affects the performance of the whole cluster. If monitoring shows the number of page faults increasing, the hot data has probably grown beyond the available memory. When that happens, there are usually two solutions: add memory, or shard the data. We recommend that you add memory first and only then consider sharding.
Administration:
11) Keep versions up to date:
Application: it is important to keep up with new versions. 10gen fixes problems in each release to make MongoDB run better. For example, version 2.0.x greatly improved MongoDB's storage and concurrency performance, with a series of improvements including index optimizations, bug fixes, and a compaction command.
12) Do not use MongoDB on a 32-bit system:
Background: on 32-bit machines, MongoDB can only store a small amount of data, on the order of gigabytes. MongoDB uses memory-mapped files internally to improve performance, so the 32-bit address space itself limits the data capacity.
13) Upgrade the server configuration when the load is too high:
Application: if the server load reaches 65%, you should consider upgrading the server configuration. In daily use, it is best to keep the load below 65%. This also leaves room for data recovery and vertical scaling.
14) Determine the hot data size:
With MongoDB, make sure that your hot data is smaller than the memory of your machine, so that memory can hold all of it.
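A rough sizing check might look like the sketch below. The function hotDataFits, the example sizes, and the headroom parameter (a fraction of RAM left for the operating system and other processes) are all assumptions made for illustration; the article itself only says that hot data and indexes should fit in memory.

```javascript
const GB = 1024 * 1024 * 1024;

// True when hot data plus indexes fit in RAM after reserving
// `headroom` (a fraction between 0 and 1) for everything else.
function hotDataFits(hotDataBytes, indexBytes, ramBytes, headroom) {
  const usable = ramBytes * (1 - headroom);
  return hotDataBytes + indexBytes <= usable;
}

// Assumed figures: a 32 GB machine with 20% of RAM held back.
const tooBig = hotDataFits(20 * GB, 6 * GB, 32 * GB, 0.2); // 26 GB vs 25.6 GB usable
const fits = hotDataFits(18 * GB, 6 * GB, 32 * GB, 0.2);   // 24 GB vs 25.6 GB usable

console.log(tooBig, fits);
```

The inputs have to come from your own measurements. Collection statistics in the mongo shell report total data and index sizes, which give an upper bound rather than the hot subset.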
15) Choose the right file system:
MongoDB data files are pre-allocated. In replication, the master and the non-arbiter members of a Replica Set also create enough empty files in advance to store the operation log. These file allocations can be very slow on some file systems, which blocks the process. Therefore, choose a file system with fast space allocation. The conclusion here is: do not use ext3; use ext4 or xfs.
16) Choose an appropriate RAID level and disks:
Use RAID 10 whenever possible and avoid RAID 5. If the budget allows, it is best to use SSDs.
17) How to shut down MongoDB:
To shut down the MongoDB database, use kill -2 <mongo-pid>, or, in the mongo shell, switch to the admin database and run db.shutdownServer().
18) Use sharding with caution:
Application: the sharding strategy depends on how the data is accessed. Before sharding, it is best to clarify the data access patterns and check whether sharding is needed at all. The shard key has a great impact on performance, so choosing a good one is very important: it directly determines whether data is distributed evenly across the cluster and whether cluster performance is reasonable. A very important factor in choosing a shard key is how much data (how many chunks) is affected if a shard becomes completely inaccessible, even when the shard is a Replica Set.
Config Servers are vital to the healthy operation of the whole cluster. Once you choose to use sharding, make sure that the production environment has three Config Servers. Never delete the data of Config Servers, and always make sure that it is backed up frequently. If possible, use domain names for node addresses. For example, you can map local domain names in the /etc/hosts file, which makes the cluster configuration more flexible. The load on Config Servers is light, but they must still run on 64-bit instances. Do not place all three Config Servers on the same instance!

This article was first published in bystanders-zheng Jing's 55 Best Practices series. Link: http://www.cnblogs.com/zhengyun_ustc/archive/2012/12/15/mongodb_bp.html
References:
1) Programmer magazine: MongoDB best practices in the eyes of Engine Yard; http://www.engineyard.com/blog/2011/mongodb-best-practices/
2) Kristina Chodorow, 50 Tips and Tricks for MongoDB Developers
3) Part of the content is from Song Tao, Liu Kuibo and Sun Guoxi
4) nosqlfan, MongoDB data summary topic