Let's talk about the database pitfalls of Momo Warcraft (MongoDB)

Source: Internet
Author: User
Tags mongodb client

Comments:

Using MongoDB well still comes down to design; it places higher demands on its users, not lower ones;
On the MongoDB protocol: integrating it into C++ code may not be as simple as with redis, but the MongoDB communication protocol is only a thin wrapper over a socket and is not complex;
On the cursor iteration problem: the findnext iteration would definitely throw an exception rather than continue the query, so I don't know what the author ran into;
The MongoDB shard key was obviously not well designed; even the final choice of log time + auto-incrementing id is still not ideal.

Our company started using mongodb not as the result of a technology evaluation, but because the developer of Crazy Blade, the first game we published as an agent, had chosen it. After our agency agreement was signed, the game went through a period of nearly a year during which many database-related problems surfaced, forcing us to become familiar with mongodb. During that period we naturally chose mongodb as the database for our operations platform as well, so that the maintenance staff could concentrate on a single database.

After asking around a little, I found that many game developers in China have likewise adopted mongodb. Why? My opinion is as follows:

Because game requirements keep changing, it is hard to design the data structures clearly from the very beginning. Many programmers in the game industry also have a different technical background from those in other fields. Before designing game servers, most of them worked on game clients: screen, keyboard and mouse interaction, and UI were where they spent most of their energy. They don't know much about how to use a database. This is when NoSQL databases such as mongodb appeared. Mongodb is document-based and does not require you to design tables, and it integrates more easily with dynamic languages. It looks wonderful: you just stuff a data object of any structure into the database and pray that the database system will take care of everything else for you. If the database doesn't work well or performance falls short, that is the database's responsibility and has nothing to do with me. And since published benchmarks show mongodb performing very well, there seems to be nothing to worry about.

In fact, no matter what system, in an environment that has performance requirements, it cannot be used completely as a black box.

This is especially true for games. As I mentioned in the previous article, it is absolutely impossible for us to throw all the changes in the game's data into the database. Traditional databases are not designed for games.

For example, if you synchronize the coordinates of a group of players to the database, can you query the list of players near a specific player? Mongodb provides a geo type, and you can use the near or within commands to query nearby users. But can it keep up with an update frequency of 10 Hz?

Can we store each player's buff formulas in the database one by one, then change some attribute values and query the result after the buffs are applied?

There are many such problems. Even if you can find a way to make the database do this work for you, the performance is worrying. And by the time we have solved them one by one inside a specialized database service, that database has effectively become a game server.

In our company, the colleague responsible for platform construction on the Crazy Blade project told me many anecdotes about mongodb being used incorrectly.

At the beginning, no indexes at all were created for the queries. While there is little data, even if every query is O(N) and traverses the entire database, nothing goes wrong. As you can imagine, performance dropped as soon as the number of users grew.
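To make the cost concrete, here is a minimal Python sketch, not mongodb code, of what a missing index means. The `users` list, the two lookup functions, and the `name_index` dict are hypothetical names invented for illustration. A real mongodb index is a B-tree rather than a hash table, but the contrast between examining every document and going straight to the match is the same.

```python
# Hypothetical in-memory "collection": a list of user documents.
users = [{"_id": i, "name": f"player{i}"} for i in range(100_000)]


def find_by_name_scan(docs, name):
    """No index: examine documents one by one until a match is found (O(N))."""
    examined = 0
    for doc in docs:
        examined += 1
        if doc["name"] == name:
            return doc, examined
    return None, examined


# An "index" is extra data kept alongside the documents, keyed by the queried field.
name_index = {doc["name"]: doc for doc in users}


def find_by_name_indexed(index, name):
    """With an index: one lookup, independent of how many documents exist."""
    return index.get(name)


doc, examined = find_by_name_scan(users, "player99999")
print(examined)  # how many documents the scan had to touch
print(find_by_name_indexed(name_index, "player99999") is doc)
```

The scan is invisible with a few hundred users and dominant with a few hundred thousand, which matches the way the problem only appeared once the user count grew.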

Later, the database accumulated a large number of useless indexes, and some badly chosen compound indexes made the system even worse. Assuming that every performance problem means a missing index is a habit that shows up easily in the later stages of a project. The remedy is actually simple: the designer only needs to calm down and think of the database system as a closed module that manages data. If you were managing that data yourself, what data structures would best serve the specific lookups, and what index data would you need?

The final problem is still the algorithm and data structure. The difference is that you don't need to implement it, but need to understand it.

In addition, databases are designed to be accessed concurrently, and concurrency is always complicated. Mongodb lacks transactions, so they have to be simulated with the atomicity of single-document operations. This is easy for inexperienced people to get wrong (it is a strange loop: the less database experience people have, the more they like mongodb, because it imposes fewer restrictions and feels more natural).

Crazy Blade had exactly this kind of bug: to keep user names unique, the registration code first queried the database to see whether the name already existed and, if it did not, created a user with that name. The check and the insert are two separate operations, so two concurrent registrations can both pass the check before either one inserts. Users with the same name appeared after only one day of live operation.
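The race can be reproduced deterministically in a few lines. The sketch below is an illustration only: `UserStore`, `DuplicateKey`, `racy_registration`, and `safe_register` are hypothetical names, and the store is a plain Python list rather than a mongodb collection. It also shows one common remedy, letting the store enforce uniqueness at insert time (in mongodb this role is played by a unique index); the source does not say which fix was actually adopted.

```python
class DuplicateKey(Exception):
    """Raised by the hypothetical store when a unique constraint is violated."""


class UserStore:
    def __init__(self, unique_names=False):
        self.docs = []
        self.unique_names = unique_names

    def exists(self, name):
        return any(d["name"] == name for d in self.docs)

    def insert(self, name):
        # With unique_names on, the check and the write happen as one step
        # inside the store, so no other request can slip in between them.
        if self.unique_names and self.exists(name):
            raise DuplicateKey(name)
        self.docs.append({"name": name})


def racy_registration(store, name):
    """Two requests, A and B, interleaved the way concurrent clients can be."""
    a_sees_free = not store.exists(name)  # A checks
    b_sees_free = not store.exists(name)  # B checks before A has inserted
    if a_sees_free:
        store.insert(name)  # A inserts
    if b_sees_free:
        store.insert(name)  # B inserts, acting on a stale check


def safe_register(store, name):
    """Skip the separate check; rely on the store rejecting duplicates."""
    try:
        store.insert(name)
        return True
    except DuplicateKey:
        return False


racy = UserStore()
racy_registration(racy, "alice")
print(len(racy.docs))  # both inserts went through

safe = UserStore(unique_names=True)
print(safe_register(safe, "alice"), safe_register(safe, "alice"))
```

The point of the second variant is that the decision is made where the data lives, in a single atomic operation, instead of in application code that acts on an answer that may already be out of date.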

I added a mongo driver to skynet because the company's projects required it. To be honest, I was not very fond of mongo while implementing this driver. In the end I implemented only the underlying communication protocol, and that part of the protocol is designed in a very ugly way. Even so, I still preferred to handle this part myself rather than use a ready-made driver.

The official mongo drivers all embed their own socket communication module, which makes it hard to extract the protocol parsing part on its own and attach it to your project's I/O model. (By the way, redis is much better in this regard: its protocol is simple enough that you can implement it in a few dozen lines of code without relying on a driver module.)
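The remark about redis is easy to check. The following sketch, which is not taken from skynet or from any redis client, encodes a command and parses a reply in the redis serialization protocol (RESP) without touching a socket, so it can be attached to any I/O model. The function names and the `RedisError` class are made up for illustration, and the parser assumes a complete reply is already in the buffer.

```python
class RedisError(Exception):
    """Wraps an error reply ('-ERR ...') sent by the server."""


def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def parse_reply(buf, pos=0):
    """Parse one RESP reply from buf starting at pos; return (value, next_pos)."""
    end = buf.index(b"\r\n", pos)
    kind, line = buf[pos:pos + 1], buf[pos + 1:end]
    pos = end + 2
    if kind == b"+":  # simple string
        return line.decode(), pos
    if kind == b"-":  # error, returned as an object rather than raised
        return RedisError(line.decode()), pos
    if kind == b":":  # integer
        return int(line), pos
    if kind == b"$":  # bulk string; length -1 means nil
        n = int(line)
        if n == -1:
            return None, pos
        return buf[pos:pos + n], pos + n + 2
    if kind == b"*":  # array of nested replies; length -1 means nil
        n = int(line)
        if n == -1:
            return None, pos
        items = []
        for _ in range(n):
            item, pos = parse_reply(buf, pos)
            items.append(item)
        return items, pos
    raise ValueError(f"unknown RESP type byte: {kind!r}")


print(encode_command("GET", "key"))
print(parse_reply(b"$5\r\nhello\r\n"))
```

Because both functions work on byte strings only, the caller decides how bytes reach the socket, which is the separation that is hard to get from a driver with its own built-in communication module.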

I was curious how mongodb's official C++ driver had been integrated with the boost.asio I/O layer used on the server. As expected, they opened an independent thread to process mongo data and then passed the data objects across threads. Studying this implementation revealed a problem: programmers can easily misunderstand what the mongodb client API does internally.

At first, the Crazy Blade developers assumed that once a set of query results had been obtained from mongo, calling findnext on the cursor only iterated over objects already in memory, with all results returned up front. So they thought it was fine to hand the bson object from the mongo thread to the main thread. This is not the case. Mongo returns only one batch of query results at a time; when that batch has been iterated, findnext automatically submits a new query request. By then, the object is no longer in the original mongo thread.
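The misunderstanding is easier to see in a toy model. The sketch below is not the mongodb C++ driver: `FakeServer`, `Cursor`, and the batch size are hypothetical, and only the method name `findnext` is borrowed from the text. It shows that iterating a batched cursor looks like a memory operation but occasionally performs a request on the connection.

```python
from collections import deque


class FakeServer:
    """Hypothetical stand-in for the database; counts round trips."""

    def __init__(self, docs, batch_size):
        self.docs = list(docs)
        self.batch_size = batch_size
        self.requests = 0

    def get_batch(self, offset):
        self.requests += 1  # each call models one request over the socket
        return self.docs[offset:offset + self.batch_size]


class Cursor:
    """Hypothetical cursor: holds one batch locally, fetches the next on demand."""

    def __init__(self, server):
        self.server = server
        self.batch = deque(server.get_batch(0))  # first batch comes with the reply
        self.offset = len(self.batch)

    def findnext(self):
        if not self.batch:
            # Local batch exhausted: this is I/O, not an in-memory step.
            self.batch.extend(self.server.get_batch(self.offset))
            self.offset += len(self.batch)
            if not self.batch:
                return None  # no more results
        return self.batch.popleft()


server = FakeServer(docs=range(5), batch_size=2)
cursor = Cursor(server)
first_two = [cursor.findnext(), cursor.findnext()]  # served from memory
requests_before = server.requests
third = cursor.findnext()  # silently triggers another request
requests_after = server.requests
print(first_two, third, requests_before, requests_after)
```

Whichever thread calls `findnext` at the batch boundary is the thread that ends up using the connection, which is why handing a live cursor to another thread is unsafe even though handing over fully materialized documents would be fine.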

Anyone who has worked with C++ can imagine how long it takes to find bugs by code-reviewing a C++ project you were not involved in. On top of that, the business flow is chopped into pieces by all kinds of boost.asio callback functions. So for a period last year, we had to stop all our other work and carefully read tens of thousands of lines of C++ code from the beginning.

It is not very gracious to keep gossiping about other people, so let's talk about our own mistakes.

The first server accident in Momo Warcraft happened on a weekend in mid-January 2014. Strictly speaking, it was not a major operational accident, because no player data was damaged and the service was not interrupted unexpectedly. However, it was the first time we found that our earlier design had not been thought through carefully enough.

Sunday, January 12. At around 17:00, our SA Aply found that our operation logs were arriving at the operations platform three hours late. However, data was still flowing in, and the system was stable.

Later, Liu Yang of the platform group reported that the operation data was now delayed by five hours, which got everyone's attention. Because it was a weekend, the developers had all gone home to rest, and Xiaojing came online. At this point it was found that the game server's memory usage was 10 GB higher than usual for that time of day and was still rising.

I then received a call and analyzed the situation over the phone. My guess was that the log data sent out by skynet's log service was piling up in a linked list inside the socket server. That code is not complex: appending newly written data is an O(1) operation, so there was no risk of blocking gameplay. At the rate logs were being produced, memory would not be exhausted in a short time. The game server was safe for the moment.

Although the cause of the accident had not yet been identified, we immediately adopted an emergency plan. An additional game server was started, and 80% of the players online on the old server were redirected to the new backup server. A new log database cluster was also started. We planned to wait for the regular maintenance window on Monday.

Later, log output latency also appeared on the newly started game server. Because the operation logs are written to a cluster managed by mongos, we tried deleting some indexes on the old cluster (which was no longer receiving new data but had not finished digesting the old data). This had no effect.

After midnight, a new backup cluster was started and mongos was dropped, so that each machine connected directly to its own separate mongodb instance. The situation finally improved.

The above is an excerpt from the accident record at that time.

It was not until Tuesday that we thoroughly understood the cause of the accident.

On the surface, a large number of database insert operations had piled up at the mongos service, overloading that single point. Our initial operation logging was indeed somewhat excessive. For example, every soldier trained produced a separate log entry, and in a game like Momo Warcraft that adds up to a huge volume. We cut and simplified some of the logs, but that did not seem to be the fundamental explanation for the accident.

The problem lay in the choice of shard key for mongos. Mongo lets you specify several fields of a document as the shard key. Mongos treats the key as an integer and divides documents into a number of buckets by integer range. The buckets are then distributed evenly across the machines behind it.

If your key follows a regular pattern and you do not want that pattern to skew the bucket allocation, you can apply a hash to the originally chosen key so that it is sufficiently scattered. At the beginning, we used the hash of the auto-increment id as the key.

The incorrect shard key selection is the cause of the accident.

Because we have a large number of sequential writes, smooth writing should take priority. If documents are placed by a scattered hash, old and new logs end up stored together. Mongo does not write data to disk one document at a time but in blocks, so mixing hot and cold data makes the disk write IO far larger than the actual log output.

In the end, we changed the shard key to separate documents by log time and auto-incrementing id, which reduced mongo's data IO volume by several orders of magnitude.
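The difference between the two key choices can be illustrated with a toy model. This is a sketch under stated assumptions, not a model of mongodb's actual storage engine: `DOCS_PER_BLOCK`, the use of MD5 as the hash, and the rule that a document's block is determined purely by its key are all simplifications chosen for illustration. It counts how many distinct blocks are dirtied when 1,000 new logs arrive on top of 100,000 old ones.

```python
import hashlib

DOCS_PER_BLOCK = 100  # assumed number of log documents per on-disk block


def time_ordered_block(log_id):
    """Key grows with time, so consecutive logs land in the same block."""
    return log_id // DOCS_PER_BLOCK


def hashed_block(log_id, total_blocks):
    """Key is a hash of the id, so consecutive logs scatter over all blocks."""
    digest = hashlib.md5(str(log_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % total_blocks


def blocks_dirtied(new_ids, total_docs):
    """Return (blocks touched with ordered key, blocks touched with hashed key)."""
    total_blocks = total_docs // DOCS_PER_BLOCK
    ordered = {time_ordered_block(i) for i in new_ids}
    hashed = {hashed_block(i, total_blocks) for i in new_ids}
    return len(ordered), len(hashed)


# 100,000 old logs are already stored; 1,000 new logs arrive.
new_ids = range(100_000, 101_000)
ordered_count, hashed_count = blocks_dirtied(new_ids, total_docs=101_000)
print(ordered_count, hashed_count)
```

With the time-ordered key the new logs fill ten adjacent blocks, while with the hashed key they are spread over a large share of all blocks, each of which is mostly cold data that has to be written back along with a single new log.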

It is important to understand how the system works. Reading the documentation is just as important: this issue is discussed in the MongoDB documentation.

P.S. After this accident, I added more monitoring to skynet to warn when a single module becomes overloaded. This helped us locate problems later on. Next, let's look at the stories about redis.

Source: http://blog.codingnow.com/2014/03/mmzb_mongodb.html

Original article address: Let's talk about the database pitfalls of Momo Warcraft (MongoDB). Thanks to the original author for sharing them.
