Contents
- The first question: a key-value database can have lots of keys, right? Wrong; for MongoDB that is a big mistake
- The second question: is findOne({_id: XXX}) always fast?
- The third question: a closer look at update
In the previous article we sorted out which claims about MongoDB are true and which are false; some of them were just slogans. Now it is time to put the ideas into practice, in a purely pragmatic spirit.
Scenario
Before we start, let's set up a scenario:
1. A web or mobile game with 1 million registered users: a lukewarm success, and the data volume is not yet huge, but this is also the point where a traditional MySQL database starts to feel the strain.
2. The database runs on a single ordinary server, with no read/write splitting, no horizontal scaling, and no in-memory cache layer. If 1 million registered users are neither spending nor playing much, the company is probably not flush with cash, and investment in the database is limited.
In this scenario each user owns 100 items, and users gain and lose items all the time.
Let's look at how these 100 million items should be handled.
Items are generally designed with the prototype/instance pattern, and the prototypes themselves are not a database concern.
Item 001 is a Dragon Saber with a price of 1500 and a base attack of 150; these values belong to the prototype item and live in a prototype data file.
Whether that prototype data file sits in a database or on local disk is no burden on the server and does not affect the database design, so we will not discuss it further.
Relational Database Design Method
A typical relational database design:
User table: fields XXX, userid, XXX; 1 million records
(XXX stands for the other fields; userid identifies the user.)
User item table: fields XXX, userid, itemtype, XXX; 100 million records
(XXX stands for the other fields, keyed by userid.)
Doesn't 100 million records sound like a headache? MySQL has to reach for all kinds of tricks at this scale.
MongoDB Design Method
With MongoDB, however, this requirement poses no problem.
First, the users collection, with username as _id: 1 million records.
Then we have two options for item organization.
1. Embed an items field in the users documents themselves and store the items in a BSON array (BSON is Mongo's native storage format, structurally the same as JSON).
Method 1: no additional records.
2. Create a separate useritems collection, again with username as _id, and give each useritems document a BSON array holding that user's items.
Method 2: one extra collection and 1 million extra records.
Our item data looks like the following:
{_id: XXX, items: [
    {itemtype: XXX, itempower: XXX},
    ...
]}
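
As a concrete sketch, here is how the two layouts might be created from the mongo shell (mongosh syntax; the username, field names, and values are placeholders, not from the original article):

// Method 1: items embedded directly in the users document.
db.users.insertOne({
    _id: "alice",                          // username as _id
    level: 10,                             // ...other user fields
    items: [
        {itemtype: 1, itempower: 150},
        {itemtype: 7, itempower: 42}
    ]
})

// Method 2: user fields and items in separate collections, sharing the same _id.
db.users.insertOne({_id: "alice", level: 10})
db.useritems.insertOne({
    _id: "alice",
    items: [
        {itemtype: 1, itempower: 150},
        {itemtype: 7, itempower: 42}
    ]
})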
Test Method
The test method: the test client picks a random user and checks that user's item count; if it is below 100 it adds one item, and if it is above 100 it removes one.
This runs 1 million times in a row, from 10 concurrent threads.
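
A single-threaded sketch of that loop, assuming the Method 2 layout, user ids of the form "user0" through "user999999", and modern mongosh syntax (none of these details are fixed by the article):

// One iteration: count a random user's items, then add or remove one.
// Assumes every user document already exists.
for (var i = 0; i < 1000000; i++) {
    var id = "user" + Math.floor(Math.random() * 1000000);
    var n = db.useritems.aggregate([
        {$match: {_id: id}},
        {$project: {n: {$size: "$items"}}}
    ]).toArray()[0].n;
    if (n < 100) {
        // gain a random item
        db.useritems.updateOne({_id: id},
            {$push: {items: {itemtype: Math.floor(Math.random() * 100), itempower: 1}}});
    } else if (n > 100) {
        // lose the last item
        db.useritems.updateOne({_id: id}, {$pop: {items: 1}});
    }
}

The real test would run this from 10 client threads in parallel.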
Implemented with the relational design on MySQL, this workload is punishing.
Implemented with the document design on MongoDB, the same test runs without breaking a sweat.
Notes
Even with a design this favorable, you can still manage to make it slow.
MongoDB's interface offers little guidance and few constraints, so if you are not careful you can still use it very inefficiently.
The first question: a key-value database can have lots of keys, right? Wrong; for MongoDB that is a big mistake
MongoDB indexes are expensive. How expensive?
1. Huge memory usage. One million index entries take roughly 50 MB of memory, so if every item were its own record, as in the relational design, the 100 million entries would need about 5 GB of memory for the index alone.
On our shoestring budget, nobody is handing us a server like that.
2. Huge performance loss. As in any database, everything is eventually written to disk, but without a relational table structure MongoDB's index write performance looks poor. With small records you can watch the carnage: add one index and write throughput drops to about 1/2; add a second and it drops to about 1/3.
An extra index is worth it only when queries on a second key are unavoidable, because without an index those queries are slower by several orders of magnitude, which is even worse than paying for the index.
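
A rough way to see this on your own hardware; a sketch assuming mongosh and a throwaway bench database (the collection name, document shape, and counts are arbitrary):

// Time n small inserts with a given set of extra indexes.
function timeInserts(coll, n, indexes) {
    coll.drop();
    indexes.forEach(function (ix) { coll.createIndex(ix); });
    var start = new Date();
    for (var i = 0; i < n; i++) {
        coll.insertOne({a: i, b: i % 97, c: "x" + i});
    }
    return new Date() - start;
}

var bench = db.getSiblingDB("bench");
print("0 extra indexes: " + timeInserts(bench.t, 50000, []) + " ms");
print("1 extra index:   " + timeInserts(bench.t, 50000, [{b: 1}]) + " ms");
print("2 extra indexes: " + timeInserts(bench.t, 50000, [{b: 1}, {c: 1}]) + " ms");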
Having chosen a key-value database, we should design so that extra indexes are rarely needed.
All indexes have to live in memory; reading records also means decoding BSON in memory; and memory has an even more important job, the read cache.
Memory is scarce to begin with, so we strictly limit the number of records and pack related data into BSON arrays instead.
So why would we even consider the second MongoDB design? An independent useritems collection means 1 million extra records, doesn't it?
That choice rests on two further considerations.
A. Processing BSON means repeatedly swapping records between disk and memory, and smaller records mean less I/O pressure. Memory and disk are the server's scarce resources, so splitting data into another collection can be the cheaper trade. The right size threshold has to be tested against your business and your server's memory and disks; let's use 1024 bytes for now. A single user's item list will certainly exceed 1024 bytes, so it is worth splitting into its own collection (see the size check sketched after this list).
B. A separate collection can be moved to another server without deploying a sharded cluster. If one server comfortably hosts 1 million users, can 2 million be far behind? Until you have the money for a sharded cluster, a second server is the more practical step.
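
A quick way to run the size check from consideration A, sketched for the mongo shell (Object.bsonsize() is the legacy-shell helper; db.collection.stats() works everywhere; the username is a placeholder):

// BSON size of one specific user's item document, in bytes:
print(Object.bsonsize(db.useritems.findOne({_id: "alice"})));

// Or the average document size across the whole collection:
print(db.useritems.stats().avgObjSize);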
The second question: is findOne({_id: XXX}) always fast?
Without a doubt, findOne({_id: XXX}) is the most direct way to fetch a value by its key.
Indeed, key-to-value lookup is the only access pattern we should lean on; anything else would stop being key-value usage.
But since we are deliberately keeping the number of keys down, each value is relatively large.
Do not elegantly write findOne({_id: XXX}).items[3].itemtype. That elegant code is deceiving you: it is very slow, and it devours almost all of your bandwidth.
No matter what you chain after it, findOne({_id: XXX}) always returns the complete value. For our 100 items that is 6 to 8 KB per query.
That is already heavy traffic, and with MongoDB Method 1, where a single value holds all of a user's data, it is heavier still.
If the querying client and the database server are not in the same data center, this traffic becomes a major bottleneck.
The form we should use is findOne({_id: XXX}, projection): the second argument tells the server which parts of the document to return, and everything else is filtered out before it is sent to you.
For example, findOne({_id: XXX}, {items: {$slice: [3, 1]}}) returns the same data as the elegant code above while consuming only a trickle of traffic.
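
Side by side, the wasteful and the projected version (mongosh syntax; the username is a placeholder):

// Wasteful: pulls the whole 6-8 KB document, then indexes into it client-side.
var bad = db.useritems.findOne({_id: "alice"}).items[3].itemtype;

// Frugal: the server slices out one array element before replying.
var doc = db.useritems.findOne(
    {_id: "alice"},
    {items: {$slice: [3, 1]}}     // skip 3 elements, return 1
);
var good = doc.items[0].itemtype; // the single element that came back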
The third question: a closer look at update
This is the mirror image of question 2: avoid the brute-force pattern of findOne followed by rewriting the whole document. MongoDB's raw performance is ferocious, but its ceiling is roughly the I/O ceiling, and brute-force whole-document updates burn traffic while slamming straight into that I/O limit.
Apart from the insert or save that first creates a document, every update should go through a modifier.
For example: update({_id: XXX}, {$set: {"items.3.itemhealth": 38}}) // modify the health value of item 3
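
The other writes in our scenario map onto modifiers the same way (mongosh syntax; the username and field names follow the itemtype/itempower convention above and are illustrative):

// Gain an item: append to the array.
db.useritems.updateOne({_id: "alice"}, {$push: {items: {itemtype: 7, itempower: 42}}});

// Lose an item: remove every element matching a condition.
db.useritems.updateOne({_id: "alice"}, {$pull: {items: {itemtype: 7}}});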
As for single versus batched modifications: MongoDB (2.x) flushes to disk every 100 ms by default, so two modifications issued close together will very likely be persisted in the same flush.
Still, merging beats relying on luck: merged modifications are guaranteed to be saved together. Whether merging is practical depends on your development model; if PHP is the data client, buffering several operations and submitting them as one is complicated to implement.
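
A minimal sketch of what merging looks like at the query level (mongosh syntax; ids and fields are placeholders):

// Two separate round trips: each may land in a different flush.
db.useritems.updateOne({_id: "alice"}, {$set: {"items.3.itemhealth": 38}});
db.useritems.updateOne({_id: "alice"}, {$set: {"items.7.itempower": 120}});

// Merged into one update: a single write, persisted together.
db.useritems.updateOne(
    {_id: "alice"},
    {$set: {"items.3.itemhealth": 38, "items.7.itempower": 120}}
);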
Note that with the three points above observed, 1 million registered users is no big deal: a MongoDB server with 4 GB of memory and GB-scale disk space copes with it easily.
The performance bottleneck is disk I/O, and RAID or solid-state disks can easily multiply the throughput several times over. Without heavy JS computation the CPU will not be a problem, and as long as the indexes are not allowed to balloon, neither will memory; you do not need a pile of cores and massive RAM. More memory does improve the cache, but it goes nowhere near as far as read/write splitting.
If query performance still falls short under high concurrency, you must deploy read/write splitting. When I/O becomes the bottleneck again, the remaining options are a MongoDB cluster with sharding enabled, or doing the database partitioning and key hashing yourself.