MongoDB Online Practice Guide for Supporting Large-Scale Access
Document
Keys in a document must not contain any special characters other than _ (underscore).
Keep documents of the same type in one collection, and put different types of documents into separate collections.
Storing only one type of document per collection significantly improves index utilization; if document types are mixed, queries that require full collection scans are likely to occur.
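As a minimal sketch of this rule (using pymongo; the collection and field names here are illustrative, not from the original case):

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["demo"]

# Keep each document type in its own collection instead of mixing them.
db["users"].insert_one({"user_id": 1, "name": "alice"})    # user documents only
db["orders"].insert_one({"order_id": 100, "user_id": 1})   # order documents only

# An index on users now covers only user documents, so queries stay on the
# index instead of degenerating into scans over mixed document types.
db["users"].create_index([("user_id", 1)])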
Do not misuse _id, for example by writing custom content into _id.
"Case 3" the MongoDB of a business in volume after a serious write performance problem, roughly: write to reach 300/s when IO run full, the investigation found that the business in the design time for convenience, but the _id written in a disorderly similar MD5 data. MongoDB tables are similar to InnoDB, is the Index organization table, the data content is followed by the primary key, and _id is the default primary key in MongoDB, once _id value is not self increase, when the amount of data reaches a certain degree, each write may cause the primary key of the two-fork tree greatly adjusted, This will be a costly write, so the write will drop as the volume of data increases, so be sure not to write the custom content in the _id.
Avoid using array fields as query conditions.
Case 4: A business created an index on an array field of a collection, and after the index was created, the collection's storage volume grew substantially. The cause turned out to be the large increase in index size: when you add an index on an array field, MongoDB creates a separate index entry for every element of the array. For example, adding the index {a:1} to documents whose array field is {a:[x,y,z]} effectively creates the index entries:
{a:x}
{a:y}
{a:z}
This business had 11 elements per array field on average, so each document produced 11 index entries at once, which was the root cause of the significant increase in index size. In addition, if an array field appears in a compound index, MongoDB creates a separate index entry for each array element combined with the other fields. For example, adding the index {a:1,b:1} to documents with {a:[x,y,z]} and {b:QQQ} effectively creates the index entries:
{a:x, b:QQQ}
{a:y, b:QQQ}
{a:z, b:QQQ}
If a compound index contained two array fields, the number of index entries would be the Cartesian product of the elements of the two arrays, so MongoDB does not allow more than one array field in a single index.
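A small pymongo sketch of the multikey behaviour described above, including the rejection of two array fields in one index (field names follow the example; the extra array field c is hypothetical):

from pymongo import MongoClient
from pymongo.errors import OperationFailure

coll = MongoClient("mongodb://localhost:27017")["demo"]["multikey_demo"]

coll.insert_one({"a": ["x", "y", "z"], "b": "QQQ"})

# One index definition, but MongoDB stores one entry per array element:
# {a:x}, {a:y}, {a:z} for the first index, and {a:x,b:QQQ} etc. for the second.
coll.create_index([("a", 1)])
coll.create_index([("a", 1), ("b", 1)])

# Two array fields in one compound index are rejected ("cannot index parallel arrays").
coll.insert_one({"a": ["x", "y"], "c": [1, 2]})
try:
    coll.create_index([("a", 1), ("c", 1)])
except OperationFailure as exc:
    print("rejected:", exc)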
If a field is large, compress it whenever possible.
"Case 5" a business on the line has been very normal, but after 3 times times the volume found MongoDB server network card flow alarm, IO pressure alarm, found that the business said a very long text field stored in the MongoDB, and the average volume of this field reached 7 K. In a concurrent 2000QPS scenario, each fetch 1~20 data, causing the MongoDB to send nearly 100MB of data per second, and for the database, read and write are random io, so in such a large data throughput scenario, IO reached the alarm threshold.
Because the text is an easy to compress the sample, so we have to compress the field to store, so that the average volume reduced to 2K, and decompression at the business end of processing, and eventually reduce throughput to 20mb/s around.
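A sketch of the compress-before-store approach from Case 5 (zlib is an assumption here; any compressor that does well on text works, and the field names are illustrative):

import zlib
from bson.binary import Binary
from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["demo"]["articles"]

long_text = "some very long text ... " * 300   # stands in for the ~7 KB field

# Compress on the application side before writing.
coll.insert_one({"doc_id": 1, "body_zip": Binary(zlib.compress(long_text.encode("utf-8")))})

# Decompress on the application side after reading.
doc = coll.find_one({"doc_id": 1})
body = zlib.decompress(doc["body_zip"]).decode("utf-8")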
If a field is large and is used as a query condition, such as a very long URL string, store its MD5 digest instead whenever possible.
"Case 6" a business on the line before the pressure test, the test found that a scene of the query performance is not ideal, troubleshooting found that the scene of the query conditions similar to: {url:xxxx}, and the value of the URL field is very long, the average size of the field reached 0.5K, In this case the size of the index becomes so large that, although the request can be indexed but not efficient, the DBA optimizes the scenario with business development:
1. Change the contents of the field from the real URL to the URL content MD5 value, the field volume has been greatly reduced, fixed in the 32-bit
2. Query, the user request through the URL query, and at this time the program will be the URL to MD5, and then use the resulting value to query, as a result of a large reduction in volume, so the query speed has been greatly improved, optimized after the stress test again, performance standards for the previous 6 times times.
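A sketch of the MD5 approach from Case 6 (the field name url_md5 is illustrative):

import hashlib
from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["demo"]["pages"]

def url_key(url):
    # 32-character hex digest, no matter how long the URL is.
    return hashlib.md5(url.encode("utf-8")).hexdigest()

url = "https://example.com/a/very/long/path?with=many&query=params"

# Store the digest (the original URL can be kept for display) and index the digest.
coll.insert_one({"url": url, "url_md5": url_key(url)})
coll.create_index([("url_md5", 1)])

# Query: hash the requested URL first, then do an exact match on the small field.
doc = coll.find_one({"url_md5": url_key(url)})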
Because MongoDB is case-sensitive, if a field does not need to distinguish case, store the data with normalized case to improve query efficiency: for example, store everything in lowercase, or add an auxiliary field that holds the value in a uniform case.
"Case 7" a business needs to query based on the field {a:xxx}. The value of a in MongoDB is case-sensitive and cannot be configured to ignore case, but the business scenario needs to ignore case in order to satisfy query requirements, and this case sensitive conflict causes the business to use regular to match : {a:/xxx/i},i parameter in the regular is to ignore the case, on the line found that the query performance is very low, in a collection of 2 million documents a query needs to consume 2.8~7 seconds, concurrency reached 50QPS mongodb instance of the server's CPU ran to 973%.
MongoDB when using regular in query conditions, you can use indexes to achieve efficient queries like normal exact matches, but once you use the parameter I to ignore the case query optimizer, you need to adjust the capitalization of each data and then match it, and the request becomes a full table scan. This is the root cause of inefficiency.
For this scenario, you can create a new uniform size field, such as all lowercase: Assuming the original field: {A:aabb}, add a corresponding to it all lowercase: {A_low:aabb} and then through the field A_low query can achieve an exact match, after the improvement of the scheme, The query time for this scenario is reduced to 2 milliseconds although the new field causes the instance to become larger, it is worthwhile for a significant increase in performance.
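A sketch of the auxiliary lowercase field from Case 7 (keeping the field names A and A_low from the example):

from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["demo"]["case7"]

original = "aAbB"

# Store the original value plus an all-lowercase copy, and index the copy.
coll.insert_one({"A": original, "A_low": original.lower()})
coll.create_index([("A_low", 1)])

# Query: lowercase the user input, then do an indexed exact match
# instead of the case-insensitive regex {A: /xxx/i}.
user_input = "AABB"
doc = coll.find_one({"A_low": user_input.lower()})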
Do not store overly long strings; if the field is used as a query condition, make sure its value does not exceed 1 KB.
MongoDB's indexes only support keys within 1 KB; if a value larger than 1 KB is stored in the field, it will not be indexed.