Distributed message queue based on Redis (3)

1. What is Redis?
Redis is a simple, efficient, distributed, memory-based cache tool.
The server is accessed over the network (much like a database) and provides a Key-Value cache service.

Simplicity is a distinctive feature of Redis.
Keeping things simple helps ensure that the core functions are stable and well built.

2. Performance
Redis is efficient enough.
Compared with Memcached, Redis delivers better performance when the data volume is small.
When the data volume reaches a certain level, Memcached performs slightly better.

Conclusion: the overall performance of Redis is good enough.

// Ref: Redis performance test
Premise: each value is no larger than 1,390 bytes.

The experiments show that:
List operations perform about the same as string operations; the difference is small enough to ignore.
With the Jedis built-in pool, the average time for "getting a connection from the pool and returning it" is about three times that of reusing a single connection. The underlying mechanism needs further study, with a more suitable experimental setup, to get more reliable numbers.
Using the Jedis built-in pool, performance meets the current traffic needs; we will explore this further when time allows.
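For reference, a minimal sketch of the two access patterns compared above, assuming the Jedis Java client; the host, port, key name, and iteration count are placeholders:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class PoolVsDirect {
    public static void main(String[] args) {
        // Pattern 1: reuse a single connection for every operation.
        try (Jedis direct = new Jedis("127.0.0.1", 6379)) {
            for (int i = 0; i < 1000; i++) {
                direct.lpush("bench:list", "value-" + i);
            }
        }

        // Pattern 2: get a connection from the built-in pool and return it each time.
        JedisPoolConfig config = new JedisPoolConfig();
        config.setMaxTotal(16);
        try (JedisPool pool = new JedisPool(config, "127.0.0.1", 6379)) {
            for (int i = 0; i < 1000; i++) {
                try (Jedis jedis = pool.getResource()) { // borrow from the pool
                    jedis.lpush("bench:list", "value-" + i);
                } // close() returns the connection to the pool
            }
        }
    }
}
```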

3. Data types
Redis supports five data types: String, Hash (Map), List, Set, and Sorted Set.
List is particularly suitable for implementing queues. It provides the following operations:
push an element on the left (or right), pop an element from the right (or left), read a range of elements, and delete elements in a range.

Elements in a Sorted Set are unique and can be looked up by name.
A Hash can be searched efficiently by key.
If we need to implement finishTask(taskId), we must find elements in the queue by name, so either of these two types may come in handy.
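As a sketch of how a List could back such a queue, assuming the Jedis client; the key names and the finishTask method below are illustrative, with LREM used to remove a task by name:

```java
import java.util.List;
import redis.clients.jedis.Jedis;

public class RedisTaskQueue {
    private final Jedis jedis;
    private final String key;

    public RedisTaskQueue(Jedis jedis, String key) {
        this.jedis = jedis;
        this.key = key;
    }

    // Put a task on the left of the List.
    public void push(String taskId) {
        jedis.lpush(key, taskId);
    }

    // Take a task from the right; BRPOP blocks until one is available or the timeout expires.
    public String pop(int timeoutSeconds) {
        List<String> result = jedis.brpop(timeoutSeconds, key);
        return result == null ? null : result.get(1); // result is [key, value]
    }

    // finishTask(taskId): remove a task from the queue by name using LREM.
    public boolean finishTask(String taskId) {
        return jedis.lrem(key, 1, taskId) > 0;
    }
}
```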

4. Atomic operations
The primary requirement for implementing a distributed queue is that there must be no concurrency issues.

Redis executes commands in a single thread, so command execution is atomic; it also supports transactions, which meets our needs.

The commands Redis provides directly are atomic operations, including LPUSH, RPOP, BLPOP, and BRPOP.

Redis supports transactions, with a syntax along the lines of begin ... [cancel] ... commit (MULTI ... [DISCARD] ... EXEC). The commands between begin and commit execute as one atomic unit, and the data changes they make are invisible to other operations until the transaction completes. This is comparable to a stored procedure in a relational database and effectively gives the highest level of transaction isolation.
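A minimal sketch of such a transaction with the Jedis client; the queue key names and the move-to-running step are illustrative, not from the article:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class QueueTransaction {
    // Atomically move a task from the waiting queue to the running queue.
    public static void moveToRunning(Jedis jedis, String taskId) {
        Transaction tx = jedis.multi();        // "begin" (MULTI)
        tx.lrem("queue:waiting", 1, taskId);   // queued, not yet visible to others
        tx.lpush("queue:running", taskId);
        tx.exec();                             // "commit" (EXEC); tx.discard() would cancel
    }
}
```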

Redis also supports scripts, and each script executes atomically.
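For example, a small Lua script run through Jedis eval; the pop-and-track logic and key names are just an illustration of an atomic multi-step operation:

```java
import redis.clients.jedis.Jedis;

public class AtomicScript {
    // Pop a task from one list and record it on another in a single atomic step.
    private static final String POP_AND_TRACK =
            "local v = redis.call('rpop', KEYS[1]) " +
            "if v then redis.call('lpush', KEYS[2], v) end " +
            "return v";

    public static Object popAndTrack(Jedis jedis, String from, String to) {
        // The whole script runs atomically on the server.
        return jedis.eval(POP_AND_TRACK, 2, from, to);
    }
}
```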

I ran a concurrency test:
I wrote a small program that randomly performs push or pop operations on a List, with slightly more pushes than pops.
It records the details of each operation in a database.
Finally, it pops all remaining data from the List, again recording the details of each pop.
It then checks whether the numbers of pushes and pops match, and whether each piece of data was pushed and popped exactly once.
At 500 concurrent clients, no concurrency issues appeared.
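A rough sketch of such a test harness, assuming Jedis; the 55/45 push-to-pop ratio, key name, and thread count are assumptions, and the database recording is left as comments:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class ConcurrencyTest {
    public static void main(String[] args) throws InterruptedException {
        JedisPoolConfig config = new JedisPoolConfig();
        config.setMaxTotal(500);
        JedisPool pool = new JedisPool(config, "127.0.0.1", 6379);
        ExecutorService executor = Executors.newFixedThreadPool(500);

        for (int i = 0; i < 500; i++) {
            final int id = i;
            executor.submit(() -> {
                try (Jedis jedis = pool.getResource()) {
                    // Slightly more pushes than pops, as in the test described above.
                    if (ThreadLocalRandom.current().nextInt(100) < 55) {
                        jedis.lpush("test:list", "task-" + id);
                        // record the push in the database here
                    } else {
                        String popped = jedis.rpop("test:list"); // may be null
                        // record the pop (including null) in the database here
                    }
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
        pool.close();
        // Afterwards: pop whatever is left, then check that pushes and pops match up.
    }
}
```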

5. Clusters
Another important requirement for a distributed queue is that a single point of failure (SPOF) is not allowed.

Redis supports Master-Slave data replication; on the slave server, set slaveof master-ip port.
The cluster behavior can be provided by the client.
The client relies on Sentinel to switch to a new master server automatically.
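A minimal sketch of a Sentinel-aware client, assuming Jedis; the Sentinel addresses, master name, and key are placeholders:

```java
import java.util.HashSet;
import java.util.Set;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisSentinelPool;

public class SentinelClient {
    public static void main(String[] args) {
        // Sentinel addresses and the master name are placeholders.
        Set<String> sentinels = new HashSet<>();
        sentinels.add("192.168.0.1:26379");
        sentinels.add("192.168.0.2:26379");
        sentinels.add("192.168.0.3:26379");

        try (JedisSentinelPool pool = new JedisSentinelPool("mymaster", sentinels)) {
            // The pool always hands out connections to the current master,
            // even after Sentinel has promoted a new one.
            try (Jedis jedis = pool.getResource()) {
                jedis.lpush("queue:waiting", "task-1");
            }
        }
    }
}
```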

Since all queue operations are write operations, the main purpose of the slave server is to back up the data and keep it safe.

If you want to run multiple masters with sharding, ZooKeeper can be used to coordinate them.

Redis 3.0 supports clustering. I have not looked at it closely yet, but it should be good news; if it turns out to be usable, it is worth trying.

What if the master goes down?
Sentinel selects a new master, and the message queue service is suspended during the switch.
In the most extreme case, every Redis instance is down. When the message queue detects that Redis is unavailable, it should throw an exception back to the business system and stop the queue service.
This affects the business: the business system will fail at placing orders, reviews, and similar operations. If that is acceptable, this is a workable solution.
The entire Redis cluster going down is very rare; if it happens, it is understandable for the business system to stop providing service.

Alternatively, the message queue can keep providing service even if the entire Redis cluster is down.
The approach is as follows:
Enable a backup storage mechanism, such as ZooKeeper, a relational database, or Memcached.
Local memory storage is not an option: first, synchronizing the in-memory data of multiple client JVMs is too complicated and amounts to reimplementing Redis; second, keeping in-memory data safe is also too complicated.
The backup storage mechanism is effectively another implementation of the message queue: the logic is the same, only the underlying storage differs. This implementation may offer lower performance, as long as it preserves the basic guarantees.
To avoid concurrency problems, object locks and method locks are useless because the message queue program runs in multiple JVMs at the same time; a lock mechanism independent of any single JVM is required, and ZooKeeper is a good choice (see the sketch after this list).
Setting a relational database to the highest transaction isolation level feels clumsy. Is there any other good option besides ZooKeeper?
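As a sketch of a JVM-independent lock on ZooKeeper, here is an example using Apache Curator's InterProcessMutex; the connection string and lock path are placeholders, and Curator is my choice of client rather than something named in the article:

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ZkQueueLock {
    public static void main(String[] args) throws Exception {
        // Connection string and lock path are placeholders.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "192.168.0.1:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        InterProcessMutex lock = new InterProcessMutex(client, "/queue/lock");
        lock.acquire();                 // blocks until this JVM owns the lock
        try {
            // operate on the backup store here; only one JVM at a time gets here
        } finally {
            lock.release();
        }
        client.close();
    }
}
```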

What if the Redis cluster is down and ZooKeeper is wiped out as well?
There is no end to this question: you can add a second backup store, a third, a fourth, and so on, and in theory they could all go down at the same time; what then?
Those with deep pockets can keep adding layers; with a limited budget, do what the budget allows.

6. Persistence
Distributed queues and caches have different application scenarios.

What if some data is never persisted?
From the business system's perspective, the message was handed to the queue successfully.
The message queue believes Redis received it properly.
But Redis had not yet written it to its log, had no time to tell its slaves, and went down, perhaps because the power failed or the process was killed.

What are the consequences?
A task that has already been executed is executed again.
A task that was put into the queue disappears.
A task that was marked as completed flips back to "in progress" and is executed again.
These consequences are unacceptable.

A distributed queue must not lose data.
From the business perspective, losing even one piece of data is unacceptable.
From the operations perspective, promptly detecting and remedying Redis data loss is acceptable.

From the architecture perspective, the queue is stored in Redis while business data (including task status) is stored in a relational database.
Task status is defined from the business point of view, and the message queue should not interfere with it. Unless business statuses are standardized and defined uniformly, only the business developers can compare the business data against the task queue to verify that it is complete and correct.
From the division-of-labor perspective, the purpose of the task queue is to manage task execution status; the business system has handed that responsibility over to the task queue, and the business system's own record of task status may not be accurate.
Conclusion: the task queue cannot shirk this responsibility; not losing data is a core function and cannot be compromised.

In Master-Slave replication mode, configure bgsave and enable AOF (append-only file) persistence.

Configure bgsave on the slave server so that the master's performance is not affected.

All queue operations are write operations and the master is already busy, so let the slave carry the persistence work instead of the master.

Use both the RDB and AOF mechanisms, for double insurance.
Set appendfsync to always. // Tested on a single node: averaging 100,000 consecutive operations, the performance loss compared with everysec is not large.
Performance may suffer slightly, but tasks are executed asynchronously and nothing waits on the sync, so this is a price worth paying for data safety.
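A minimal sketch of applying these settings at runtime through the Jedis CONFIG commands (equivalent to the appendonly and appendfsync directives in redis.conf); host and port are placeholders:

```java
import redis.clients.jedis.Jedis;

public class PersistenceConfig {
    public static void main(String[] args) {
        // Host and port are placeholders; run this against the slave that does the persistence.
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            jedis.configSet("appendonly", "yes");      // enable AOF
            jedis.configSet("appendfsync", "always");  // fsync on every write
            System.out.println(jedis.configGet("appendfsync"));
        }
    }
}
```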

When O&M needs to restart the master server, follow this sequence:
1. Use redis-cli shutdown to stop the master. The master finishes its in-flight commands and then shuts down, and Sentinel finds a new master.
Do not simply kill the process or use the firewall to cut the link between master and slave:
if the firewall merely blocks external service, the master role switches to another server while the original master keeps persisting and keeps sending its AOF to the original slaves.
2. Perform the maintenance work on the original master.
3. Start the original master; by now it is a slave. Wait patiently for it to fetch the latest data from the new master, and watch the Redis log output to confirm the data is safe.
4. Repeat steps 1-3 on the new master.
5. Turn the steps above into a script that runs automatically, to avoid human error.
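A hedged sketch of steps 1 and 3 using Jedis, in case they are automated from Java; the hosts are placeholders and the master_link_status check is one way, not the only way, to confirm the replica has caught up:

```java
import redis.clients.jedis.Jedis;

public class MasterRestartHelper {
    // Step 1: a clean shutdown, equivalent to "redis-cli shutdown".
    public static void shutdownCleanly(String host, int port) {
        try (Jedis jedis = new Jedis(host, port)) {
            jedis.shutdown();
        }
    }

    // Step 3: after the restart, confirm the old master (now a slave) is linked
    // to the new master by reading the replication section of INFO.
    public static boolean isLinkedToMaster(String host, int port) {
        try (Jedis jedis = new Jedis(host, port)) {
            String info = jedis.info("replication");
            return info.contains("master_link_status:up");
        }
    }
}
```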
