This series of articles is translated from 50 Tips and Tricks for MongoDB Developers. I have not found a Chinese version yet, and I have been learning MongoDB recently, so I decided to translate it. On the one hand it reinforces my own learning, and on the other hand it lets everyone see what MongoDB users need to pay attention to.
First, I should say that my English is not very good. In addition, some English words do not translate well into Chinese, so the original English words may appear in the article, and the translation may be stiff or overly literal in places. The main purpose of translating this book is to help everyone learn and explore. If you find an inaccurate translation or know a better one, please point it out and I will correct it promptly. Thank you.
Tip #1. Duplicate data for speed, reference data for integrity
Data redundancy is for performance, and data reference is for integrity.
Data used by multiple documents can either be embedded directly in each document or referenced from it. Embedding is not necessarily better than referencing, and referencing is not necessarily better than embedding. Each option has its own trade-offs, and you should choose the one that suits your application.
The embedded structure can lead to inconsistent data. Suppose you need to change the fruit value in Figure 1.1 from apple to pear. If your application crashes after you have updated the fruit value in the food collection but before you have updated it everywhere else, the other copies still hold the old value. At that point there are two different fruit values in your application.
Figure 1.1. Embedded structure: the fruit value exists in both the food collection and the meals collection.
Inconsistency is not ideal, but how much of a problem it is comes in degrees, and the degree depends on what your users need. For many applications, a short period of inconsistency is acceptable. If a user changes his name, it is usually fine for his old posts to show the old name for a few hours. If even a short period of inconsistency is unacceptable, you need to consider a referenced structure.
Figure 1.2. Referenced structure: the fruit value exists only in the food collection. The meals collection stores the fruit id.
This is a trade-off. You cannot have both the best performance and immediate data consistency, so you must decide which matters more to your application.
For example, suppose we are designing a shopping cart application and have decided to store order information in MongoDB. What information should an order contain?
Referenced Structure
A product:
    {
        "_id": productId,
        "name": name,
        "price": price
    }
An order:
    {
        "_id": orderId,
        "user": userInfo,
        "items": [
            productId1,
            productId2
        ]
    }
The items array of each order holds product ids. To display an order, you first query the orders collection and then query the products collection by product id to get the corresponding product names. There is no way to get the complete order with a single query.
If a product's information is updated, every order that references the product shows the new information. The referenced structure slows down reads, but it keeps many orders consistent: a change that affects multiple orders is atomic, because only the single referenced document has to be modified.
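As a minimal sketch of the referenced structure, the Python below uses plain dictionaries as stand-ins for the products and orders collections; it does not talk to a real MongoDB server. The sample data, the ids, and the `load_order` helper are made up for illustration. It shows the two lookups needed to display an order, and that a single change to a product is seen by every order that references it.

```python
# In-memory stand-ins for two collections (hypothetical sample data).
products = {
    "p1": {"_id": "p1", "name": "apple", "price": 1.50},
    "p2": {"_id": "p2", "name": "pear", "price": 2.00},
}
orders = {
    "o1": {"_id": "o1", "user": "alice", "items": ["p1", "p2"]},
    "o2": {"_id": "o2", "user": "bob", "items": ["p1"]},
}


def load_order(order_id):
    """Return (order with product ids resolved, number of queries used)."""
    order = orders[order_id]                            # query 1: the order itself
    items = [products[pid] for pid in order["items"]]   # query 2: the referenced products
    return {**order, "items": items}, 2


# One write to the single product document...
products["p1"]["price"] = 1.20

# ...is visible in every order that references it.
order1, queries_used = load_order("o1")
order2, _ = load_order("o2")
print(queries_used)                                        # 2
print(order1["items"][0]["price"], order2["items"][0]["price"])
```

The design choice to notice is that the order never stores a product name or price, so there is only one place where they can be wrong.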
Embedded Structure
A product:
    {
        "_id": productId,
        "name": name,
        "price": price
    }
An order:
    {
        "_id": orderId,
        "user": userInfo,
        "items": [
            {
                "_id": productId1,
                "name": name,
                "price": price
            },
            {
                "_id": productId2,
                "name": name,
                "price": price
            }
        ]
    }
Here the product information is embedded in the order. Displaying an order takes only one query. If the product information changes and we want the change to reach the orders, we have to update many separate order documents.
The embedded structure speeds up reads, but consistency is weaker: product information cannot be changed atomically across multiple documents.
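The following is a rough sketch of the embedded structure, again with plain Python dictionaries standing in for collections rather than a real MongoDB server. The sample data, the `rename_product` helper, and its `fail_after` parameter are hypothetical; `fail_after` only simulates the application crashing partway through a multi-document update.

```python
import copy

# In-memory stand-ins for two collections (hypothetical sample data).
products = {"p1": {"_id": "p1", "name": "apple", "price": 1.50}}
orders = {
    "o1": {"_id": "o1", "user": "alice", "items": [copy.deepcopy(products["p1"])]},
    "o2": {"_id": "o2", "user": "bob", "items": [copy.deepcopy(products["p1"])]},
}


def rename_product(product_id, new_name, fail_after=None):
    """Rename the product, then every embedded copy, one order at a time.

    fail_after simulates a crash after that many orders have been updated.
    Returns the number of order documents that were updated.
    """
    products[product_id]["name"] = new_name
    updated = 0
    for order in orders.values():
        if fail_after is not None and updated >= fail_after:
            return updated  # simulated crash: remaining orders keep the old name
        for item in order["items"]:
            if item["_id"] == product_id:
                item["name"] = new_name
        updated += 1
    return updated


# Reading an order needs a single lookup: the product data is already inside it.
print(orders["o1"]["items"][0]["name"])   # apple

# A crash after the first order leaves the two copies disagreeing.
rename_product("p1", "pear", fail_after=1)
print([orders[o]["items"][0]["name"] for o in ("o1", "o2")])   # ['pear', 'apple']
```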
To decide between an embedded structure and a referenced structure, consider the following factors:
- Are you paying a price on every read for data that rarely changes?
Are you willing to take that penalty on each of, say, 1,000 reads for the sake of one rare update? In most applications the read load is much heavier than the write load, so work out what your own proportion is.
How often does the data you would reference actually change? The less it changes, the stronger the case for embedding. Referencing data that rarely changes, such as names, dates of birth, stock symbols, and addresses, is almost never worthwhile.
- How important is consistency?
If consistency is important, you should use a referenced structure. For example, suppose multiple documents need to see a change atomically. If we were designing a trading system in which certain securities can only be traded at certain times, we would need to lock all of them immediately once they can no longer be traded. This kind of logic may be better handled at the application level, though, because the application needs to know when to lock and unlock anyway.
In the order application above, however, consistency may actually be harmful. Suppose we want to put a product on sale at 20% off. We do not want to update the product information in existing orders. In this case what we want is a snapshot: the product information as it was when the order was placed.
- Do reads need to be fast?
If reads need to be as fast as possible, use the embedded structure. Real-time systems should generally embed as much as possible.
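One rough way to weigh the first question is to count operations under each structure. The sketch below is only a back-of-the-envelope model with made-up numbers, not a benchmark. It assumes that a referenced read costs two queries (the order, then its products), that an embedded read costs one, and that propagating a product change in the embedded design rewrites the product plus every order holding a copy. The function names and parameters are hypothetical, and real costs should be measured on your own data.

```python
def referenced_cost(reads, product_updates):
    """Total operations with the referenced structure (assumed model)."""
    queries = reads * 2              # order lookup + product lookup per read
    writes = product_updates * 1     # only the product document changes
    return queries + writes


def embedded_cost(reads, product_updates, copies_per_product):
    """Total operations with the embedded structure (assumed model)."""
    queries = reads * 1                                   # the order already holds the data
    writes = product_updates * (1 + copies_per_product)   # product + every embedded copy
    return queries + writes


# 1,000 reads for every single product update, 50 orders holding a copy.
print(referenced_cost(1000, 1))       # 2001
print(embedded_cost(1000, 1, 50))     # 1051
```

Under these assumptions the embedded structure does less total work as long as reads dominate, and the balance shifts only when updates are frequent or the copies are very numerous.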
The order document above is a good case for embedding. We do not want orders to change when product information changes, so the referenced structure brings us no benefit here.
In this example, the embedded structure is the best choice.
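The snapshot behaviour that makes embedding a good fit for orders can be sketched as follows. This is an illustration with plain Python dictionaries and a hypothetical `place_order` helper, not real MongoDB driver code: the product is copied into the order when the order is placed, so a later 20% discount changes the product but not the existing order.

```python
import copy

# Hypothetical product document.
product = {"_id": "p1", "name": "apple", "price": 10.0}


def place_order(order_id, user, chosen_products):
    """Build an order that embeds a copy of each product as it is right now."""
    return {
        "_id": order_id,
        "user": user,
        "items": [copy.deepcopy(p) for p in chosen_products],
    }


order = place_order("o1", "alice", [product])

# Put the product on sale at 20% off after the order was placed.
product["price"] = round(product["price"] * 0.8, 2)

print(product["price"])             # 8.0, the current price
print(order["items"][0]["price"])   # 10.0, the price when the order was placed
```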
Finally, for further reading, see the article Your Coffee Shop Doesn't Use Two-Phase Commit. It describes how consistency problems are handled in a real-world setting and how that relates to system design.