Talking about the database pitfalls of Momo Warcraft (Redis)

Comments:
Redis has quite a few pitfalls; this article shows many of its shortcomings.

Note: I did not participate in the detailed design of the Momo Warcraft database. I only joined some of the discussions and offered a few opinions; when problems arose, it was feilong, Xiaojing, and Aply who actually solved them. So most of my judgments about Redis come second-hand from their discussions. I have not read the Redis documentation carefully, and I read the Redis code only with my own guesses in mind. Although we did solve the problems in the end, the technical details described in this article may well be at odds with the facts, so please read with your own judgment.

We had not used Redis on a large scale before Momo Warcraft. It simply seemed, intuitively, that Redis fit our architecture well: our game does not rely on a database to process any data for us. The total data volume is large, but its growth rate is bounded, because a single server's processing capacity is limited and the game cannot be split into separate server worlds; players see one and the same world wherever and whenever they log in. We therefore need a data center that is independent of the game system and responsible only for data exchange and data persistence. Redis looked like the best choice: the game system only needs to index player data by player ID.

We split the data center into 32 databases, partitioned by player ID, so the data of different players is completely independent. At the design stage I firmly opposed funneling all access to the data center through a single point, and insisted that every game server node connect directly to every data warehouse; there was no need to create a single point of failure.
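As a rough illustration of this direct, shard-per-player-ID layout, here is a minimal Python sketch using the redis-py client. The host names, ports, and the modulo mapping are assumptions for illustration; the article does not spell out the exact partitioning scheme.

```python
# A minimal sketch of how a game node might pick one of the 32 data
# warehouses directly, with no single access point in between.
# Hosts, ports, and the modulo-32 mapping are hypothetical.
import redis

SHARD_HOSTS = ["db1", "db2", "db3", "db4"]   # 4 physical machines (hypothetical names)
BASE_PORT = 6379                             # 8 Redis processes per machine

def shard_for(player_id: int) -> redis.Redis:
    """Map a player ID to one of the 32 Redis instances."""
    shard = player_id % 32
    host = SHARD_HOSTS[shard // 8]           # which physical machine
    port = BASE_PORT + (shard % 8)           # which process on that machine
    return redis.Redis(host=host, port=port)

def load_player(player_id: int) -> dict:
    # Player data is indexed purely by player ID.
    return shard_for(player_id).hgetall(f"player:{player_id}")
```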

Based on our advance estimate of the game's data volume, we only needed to deploy the 32 data warehouses on four physical machines, running eight Redis processes on each. At first we used machines with 64 GB of memory and later upgraded to 96 GB. Measured in practice, each Redis service occupied 4-5 GB of memory, which seemed to leave plenty of headroom.

Because we understood Redis's persistence mechanism only from the documentation, we did not know what difficulties lay ahead. To be safe, we also set up four physical machines as slaves to synchronize with and back up the data on the masters.

Redis supports two persistence mechanisms. One is snapshotting: when a dump is triggered, Redis forks a process that writes the entire in-memory dataset to disk. The other is AOF, which records every write operation against the database. AOF is not suitable for our game, because our writes are too frequent and the data volume is huge.
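For reference, here is a hedged sketch of that choice expressed with the redis-py client: snapshots triggered explicitly, AOF switched off. The instance address and the idea of disabling automatic save points are assumptions; the article does not show the real configuration.

```python
# A sketch of the persistence choice described above: snapshotting on
# demand, AOF left off because writes are too frequent and too large.
import redis

r = redis.Redis(host="db1", port=6379)   # hypothetical data-warehouse instance

r.config_set("appendonly", "no")         # no AOF: too many writes, too much data
r.config_set("save", "")                 # no automatic snapshots; we decide when to dump

r.bgsave()                               # fork a child that dumps the whole dataset to disk
```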

The first accident happened on February 3, before the New Year holiday was even over. Operations had also relaxed a little, because the whole holiday had passed without incident.

At noon, one of the data service hosts became unreachable from the game servers, which prevented some users from logging in. We could not repair the connection online, so we had to take the service down for two hours of maintenance.

During maintenance we made a preliminary diagnosis: that morning, a slave machine had run out of memory, causing its Redis services to restart. When the slave reconnected to the master and its eight Redis instances sent SYNC at the same time, the master was overwhelmed.

There are two problems here, which need to be discussed separately:

Problem 1: the slave machine has the same hardware configuration as the master, so why did it run out of memory first?

Problem 2: why did the renewed SYNC overload the master?

Problem 1 we did not investigate deeply at the time. We had not estimated the user growth rate over the Chinese New Year accurately enough to deploy the database correctly; the database's memory demand had grown to a critical point, so running out of memory was likely on either the master or the slave, and it may simply have been chance that the slave crashed first. (Reflecting on it now, I suspect that is not the whole story, and that the cold-backup script was the real culprit.) In the early days we rotated BGSAVE on a timer. As the data volume grows, the BGSAVE interval should be increased accordingly, so that the Redis services on the same physical machine do not run BGSAVE at the same time and fork multiple processes that consume too much memory. This was overlooked because I had gone home for the Chinese New Year.

Problem 2 came down to not understanding the master-slave synchronization mechanism well enough:

Think about how you would implement synchronization yourself. Transferring the data takes time, and synchronization should interfere with normal service as little as possible, so locking the dataset to guarantee consistency is clearly out of the question. Redis therefore relies on fork here too: after a slave sends SYNC, the master forks so that it has a consistent point to synchronize from. When our slave machine restarted, its eight Redis instances all began synchronizing at once, which meant eight Redis processes forked on the master almost simultaneously. With each instance already holding 4-5 GB, the copy-on-write pages of so many concurrent forks sharply raise memory pressure, and with it the probability that the master's Redis processes get pushed into the swap partition.

After this accident we dropped the slaves entirely. They made deployment more complex and added many sources of instability without necessarily improving data safety. At the same time we improved the BGSAVE mechanism: instead of triggering it from a timer, a script makes sure that the BGSAVE runs of the multiple Redis instances on the same physical machine take turns. We also moved the old cold-backup job to the masters; fortunately, scripting lets us control when cold backups run and stagger them away from the I/O peaks of BGSAVE.
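The real script is not shown in the article, but the "take turns" idea might look roughly like this Python sketch, where each local instance finishes its BGSAVE before the next one starts. The ports and timing values are assumptions.

```python
# A minimal sketch of rotating BGSAVE across the eight local Redis
# instances so that no two forks overlap on the same physical machine.
import time
import redis

LOCAL_PORTS = range(6379, 6379 + 8)      # eight instances on this machine (assumed ports)

def bgsave_in_turn():
    for port in LOCAL_PORTS:
        r = redis.Redis(port=port)
        before = r.lastsave()            # timestamp of the last completed save
        r.bgsave()
        while r.lastsave() == before:    # wait until this instance has finished its dump
            time.sleep(5)

while True:
    bgsave_in_turn()                     # one full round, one instance at a time
    time.sleep(20 * 60)                  # then pause before the next round (assumed interval)
```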

The second accident happened just recently, on February 27.

We had adjusted the Redis deployment several times to make sure the data servers had enough memory, but there was still an accident. The final cause was that one Redis process, short of memory, started using the swap partition and its processing capacity dropped sharply. With data pouring in, an avalanche effect set in: Xiaojing had added a fallback rule to the script that originally controlled BGSAVE, forcing a save whenever an instance had gone 30 minutes without receiving a BGSAVE command (a rule I personally disagreed with). As a result, half an hour after the data server stopped responding to the outside world, multiple Redis services entered BGSAVE at the same time, exhausting the memory.

It took a day to track down the culprit: the cold-backup mechanism. We periodically package and copy the Redis database files as backups. While the operating system copies the files, it appears to use a large amount of memory for the file cache and does not release it promptly. As a result, when BGSAVE kicked in, the system's memory usage far exceeded what we had originally planned for.

This time we adjusted the operating system's kernel parameters and turned off the cache, which solved the problem for the time being.

After this accident I reflected on our data-persistence strategy. I think periodic BGSAVE is not a good solution; at the very least it is wasteful, because each BGSAVE writes all of the data to disk while most of the data in the in-memory database has not changed at all. For a save period of 10 to 20 minutes, only the players who were online during that window, plus the players they attacked (roughly one or two attacks every 20 minutes), are involved, and that number is far smaller than the total number of players.

I wanted to back up only the data that had changed, but I did not want to use the built-in AOF mechanism, because AOF keeps appending new copies of the same data, making disk usage grow quickly.

We also did not want to insert a middle layer between the game service and the database service, because that sacrifices read performance, and read performance is critical to the whole system. Forwarding only the write commands is not reliable either: lost or reordered commands could leave the data versions out of order.

What if, when the game server needs to write data, it sends a copy to both Redis and a separate data-landing service? First, we would need a version mechanism to establish the order of the write operations received at the two destinations (I remember a bug with out-of-order data versions in Mad Blade). Second, it doubles the write bandwidth between the game servers and the data servers.
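To make the version idea concrete, here is a small Python sketch of stamping each write with a per-player, monotonically increasing version so that a receiver can drop stale or duplicated copies. This only illustrates the dual-write scheme being discussed (and ultimately rejected); the names and structure are mine, not the article's.

```python
# Sketch: a per-player version counter attached to every write, and a
# receiver that keeps only the newest version it has seen.
import itertools

class VersionedWriter:
    def __init__(self):
        self._counters = {}                              # player_id -> itertools.count

    def make_write(self, player_id: int, payload: dict) -> dict:
        counter = self._counters.setdefault(player_id, itertools.count(1))
        return {"player": player_id, "version": next(counter), "data": payload}

def apply_write(store: dict, write: dict) -> bool:
    """Keep a write only if its version is newer than what we already hold."""
    current = store.get(write["player"], {"version": 0})
    if write["version"] <= current["version"]:
        return False                                     # stale or duplicated copy, drop it
    store[write["player"]] = write
    return True
```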

In the end I thought of a simple approach: run a monitoring service on the data server's physical machine. When the game server pushes data to the data service and the write is confirmed, it also sends the ID of that data set to the monitoring service, which then reads the data back from Redis and saves it to local disk.

Because this monitoring service is paired one-to-one with Redis on the same machine, and local disk write throughput exceeds the network bandwidth, it will not be overloaded. Redis itself becomes a pure in-memory database and never runs BGSAVE.

This monitoring process also handles persisting the data. For the on-disk store I chose UnQLite; it takes only a little code to wrap it for Lua, and it keeps everything in a single database file, which makes cold backup easier. LevelDB would also be a fine choice; if it were implemented in C rather than C++, I would consider it.

To connect with the game servers, I started an independent skynet process on the database machine to listen for the synchronization-ID requests. Because it only needs to handle a few simple Redis operations, I wrote the handling of those Redis commands by hand. The whole service is a single Lua script, which actually consists of three skynet services: one listens on the external port, one handles the Redis synchronization commands on each connection, and one is a single point that writes the data into UnQLite. To make data recovery efficient, I took special care to save player data in the form of the Redis commands needed to restore it, so recovery only requires reading the player's data out of UnQLite and sending it straight to Redis.
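The actual service is written as skynet/Lua and is not reproduced here; the following Python sketch only approximates the flow using the redis-py and unqlite bindings. The "player:<id>" hash layout and the function name are assumptions made for illustration.

```python
# Sketch of the landing service: on notification of a player ID, read the
# player's data back from the local Redis and persist it to UnQLite.
import json
import redis
from unqlite import UnQLite              # the "unqlite" pip package, standing in for the author's Lua wrapper

r = redis.Redis(host="127.0.0.1", port=6379, decode_responses=True)  # paired 1-to-1 with this monitor
db = UnQLite("players.unqlite")          # a single database file, which keeps cold backup simple

def land_player(player_id: int) -> None:
    """Called when the game server reports that this player was just written to Redis."""
    fields = r.hgetall(f"player:{player_id}")
    # Save the complete hash so recovery is simply "read it back and write it into Redis".
    db[f"player:{player_id}"] = json.dumps(fields)
    db.commit()
```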

With this in place we can also handle the cold and hot data in Redis. Players who have not logged in for a long time can periodically be cleared out of Redis; if such a player later logs back in, we just ask this service to restore the data.
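Continuing the sketch above, the cold/hot handling might look like this: evict long-inactive players from Redis and replay the landed record on their next login. Again, the key layout and function names are assumptions.

```python
# Sketch: evict a cold player from Redis, and restore from UnQLite on login.
def evict_player(player_id: int) -> None:
    """Drop a long-inactive player from Redis; the copy in UnQLite remains."""
    r.delete(f"player:{player_id}")

def restore_player(player_id: int) -> None:
    """On login, replay the landed record straight back into Redis if it is missing."""
    key = f"player:{player_id}"
    if not r.exists(key):
        fields = json.loads(db[key])
        if fields:
            r.hset(key, mapping=fields)
```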

Xiaojing does not like the skynet implementation. He first wanted to build the same thing in Python, and then became interested in Go and wanted to use this as a chance to learn the language. So as of today, this new mechanism has not yet been deployed in production.

Source: http://blog.codingnow.com/2014/03/mmzb_redis.html

