Delayed task scheduling system: technology selection and design (Part 1)

This article is from the NetEase Cloud community.

What are the scenarios for delayed tasks?
What are the existing solutions?
What are the problems?
What are the goals you want to achieve?
What are the possible solutions?

RabbitMQ implementation

Implemented through dead letters and dead-letter routing
Implemented through the delayed message plugin

Redis implementation
DelayQueue implementation
Time wheel implementation

Single-level time wheel
Hierarchical time wheel

Previous design (DB/DelayQueue/ZooKeeper)
Another design (DB/DelayQueue/ZooKeeper/MQ)

What are the scenarios for delayed tasks?

Three days before an exercise or exam closes, send a notice to users who have not submitted
Two hours before a program starts, send notifications to participating users
Make a questionnaire visible to users only once its collection period begins
Trigger follow-up actions when a questionnaire's collection period ends
Publish courseware at a specified time
When a course ends, calculate users' completion information
When a live broadcast is due to start, send a message to users
If a user has not paid within 30 minutes of placing an order, close the order
If an order has not shipped within 24 hours of payment, send a shipping reminder
If a user has not rated a taxi ride within 48 hours, automatically give it 5 stars
What these businesses have in common is delayed execution. A simple approach is to have a background thread scan for eligible business data and process it one item at a time. The difficulty is choosing the scan interval: too long an interval hurts accuracy, and too short an interval hurts efficiency and performance.
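The trade-off in the scan interval can be made concrete with a little arithmetic. The sketch below is only an illustration: it assumes scans run at times 0, interval, 2 x interval and so on, and the class and method names are hypothetical. A task is picked up by the first scan at or after its due time, so in the worst case it is late by almost one full interval.

```java
/** Illustrative arithmetic only; names and the scan schedule are assumptions. */
class ScanLatenessSketch {

    /** Time of the first scan at or after dueTime, with scans at 0, interval, 2*interval, ... */
    static long firstScanAtOrAfter(long dueTime, long interval) {
        long scans = (dueTime + interval - 1) / interval; // ceiling division for non-negative values
        return scans * interval;
    }

    /** How long after its due time the task is actually picked up. */
    static long lateness(long dueTime, long interval) {
        return firstScanAtOrAfter(dueTime, interval) - dueTime;
    }

    public static void main(String[] args) {
        // A task due at second 61, scanned every 60 seconds versus every 5 seconds.
        System.out.println(lateness(61, 60)); // 59
        System.out.println(lateness(61, 5));  // 4
    }
}
```

Shortening the interval from 60 to 5 seconds cuts the lateness in this example from 59 to 4 seconds, but the table is scanned twelve times as often.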

What are the existing solutions?

Trigger timed tasks with Linux crontab
Scan the business table, select the qualifying rows, and process them

What are the problems?

Tasks cannot be triggered precisely, because each type of task is bound to a scan interval
Scanning the business database interferes with normal business operations
Task executions cluster together, which puts intermittent load spikes on the server
There is a single point of failure: if the service that triggers the schedule goes down, no tasks are executed
The system is not fault tolerant: a task that is missed once is never executed again
There is no unified view of task execution
There are no alarms for failed tasks

What are the goals you want to achieve?

Accuracy (tasks are triggered at the specified time)
Versatility
High performance (cluster throughput of no less than 1000 TPS)
High availability (supports multi-instance deployment)
Scalability (tasks are reassigned when service instances are added or removed)
Retry support (failed tasks can be retried)
Multi-protocol (supports HTTP and Dubbo calls)
Manageability (business users can modify and delete tasks)
Alarming (an alarm is triggered when the number of failures reaches a threshold)
Unified view (task execution is easy to inspect, and tasks can be manually intervened in)

The technical solutions discussed below are premised on precise triggering, so we do not discuss the distributed scheduling systems currently used in the industry, such as Elastic-Job, XXL-JOB, and TBSchedule, which cannot solve the problem of triggering delayed tasks precisely.

What are the possible solutions?

RabbitMQ implementation

Implemented through dead letters and dead-letter routing

The principle is as follows:

What is a dead letter? A message becomes a dead letter when:

The message is rejected
The message has expired
The queue has reached its maximum length

RabbitMQ can set x-message-ttl on a queue and expiration on a message to control how long messages live. A message that times out becomes a dead letter.

What is dead-letter routing?

RabbitMQ can set two arguments on a queue: x-dead-letter-exchange and x-dead-letter-routing-key.
When a message in the queue becomes a dead letter, it is rerouted according to these two arguments so that it can be consumed again.

Instance operations:

Create a delay queue (with dead-letter routing configured)
Create a ready queue
Create a dead-letter exchange
Bind the dead-letter exchange to the ready queue
Send a delayed message
The message enters the ready queue after it expires

Advantages:

Efficient; it scales out easily using RabbitMQ's distributed features, and it supports persistence

Disadvantages:

Messages that have already been sent cannot be managed
If a message expires earlier than the messages ahead of it in the same queue, it is not dead-lettered first; it has to wait until it reaches the head of the queue

The business therefore has to guarantee that every task in a queue has the same delay. Tasks with different delays each need their own message queue, which is inflexible.
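The head-of-queue limitation can be illustrated with a small in-memory model. This is not RabbitMQ code: the class below is a hypothetical FIFO queue that, as an assumption mirroring the behavior described above, checks expiry only for the message at its head, and it uses plain numbers as logical time.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of a FIFO queue with per-message TTL; not RabbitMQ's implementation. */
class HeadOfLineTtlSketch {

    static final class Message {
        final String body;
        final long expiresAt; // logical time at which the TTL runs out

        Message(String body, long expiresAt) {
            this.body = body;
            this.expiresAt = expiresAt;
        }
    }

    private final Deque<Message> queue = new ArrayDeque<>();

    void publish(String body, long now, long ttl) {
        queue.addLast(new Message(body, now + ttl));
    }

    /** Dead-letters expired messages, but only while they sit at the head of the queue. */
    List<String> expireFromHead(long now) {
        List<String> deadLettered = new ArrayList<>();
        while (!queue.isEmpty() && queue.peekFirst().expiresAt <= now) {
            deadLettered.add(queue.pollFirst().body);
        }
        return deadLettered;
    }

    public static void main(String[] args) {
        HeadOfLineTtlSketch q = new HeadOfLineTtlSketch();
        q.publish("slow", 0, 30); // published first, expires at 30
        q.publish("fast", 0, 10); // published second, expires at 10
        System.out.println(q.expireFromHead(10)); // [] because "slow" still blocks the head
        System.out.println(q.expireFromHead(30)); // [slow, fast]
    }
}
```

In this model the message with the 10-unit delay is released only at time 30, which is why one queue per delay value is needed.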

Implemented through the delayed message plugin

The principle is as follows:

Core code flow:

Delayed messages are saved to a Mnesia table. Once the delay set in each message's x-delay header has elapsed, the exchange routes the message to the corresponding queue.

Instance operations:

Download the plugin
bintray.com/rabbitmq/community-plugins/download_file?file_path=rabbitmq_delayed_message_exchange-0.0.1.ez
Install the plugin

docker-compose.yml (mounts the plugin into the container)

    version: '2'
    services:
      rabbitmq:
        hostname: rabbitmq
        image: rabbitmq:3.6.8-management
        mem_limit: 200m
        ports:
          - "5672:5672"
          - "15672:15672"
        volumes:
          - ~/dockermapping/rabbitmq/lib:/var/lib/rabbitmq/
          - /Users/oldlu/workspace/document/docker-compose/rabbitmq/rabbitmq_delayed_message_exchange-0.0.1.ez:/usr/lib/rabbitmq/lib/rabbitmq_server-3.6.8/plugins/rabbitmq_delayed_message_exchange-0.0.1.ez

Enable the plugin

    rabbitmq-plugins enable rabbitmq_delayed_message_exchange

Create an exchange of type x-delayed-message
Create a ready queue
Bind the queue to the exchange
Publish a delayed message (set x-delay to the delay in milliseconds)

Source analysis
github.com/rabbitmq/rabbitmq-delayed-message-exchange/blob/master/src/rabbit_delayed_message.erl

Core functions
Message enqueue: internal_delay_message
Start the timer: maybe_delay_first
Message handling: handle_info

Advantages:

A message that expires earlier than other messages is routed to the queue earlier, so there is no need to create separate message queues for different delays.

Disadvantages:

Messages that have already been sent cannot be managed
There is only one copy of the data in the cluster (stored in the Mnesia table on the current node), so messages are lost if the node becomes unavailable or the plugin is disabled
Currently the plugin supports only disk nodes, not RAM nodes
Performance is slightly lower than that of a native exchange (a normal exchange routes a received message straight to the queue, whereas the delayed exchange must check whether the message is due, save messages that are not yet due in the table, and take them out and route them when the time comes)

Redis Implementation

The sorted set is a data structure provided by Redis that combines the characteristics of a set and a hash.
Each element is associated with a score, and elements are ordered by that score.
Internally it is implemented with two data structures: a hash table and a skip list.

Features of the skip list

It consists of multiple levels; the level of each element is generated randomly according to a fixed probability
Each level is an ordered linked list, ascending by default
The bottom-level linked list contains all the elements
If an element appears in the list at level i, it also appears in every list below level i
Each node contains two pointers: one to the next element in the same list, and one to the same element in the level below
Insertion and deletion take O(log n) time; beyond a certain data size, its efficiency is comparable to that of a red-black tree
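The properties listed above can be seen in a minimal skip list. The sketch below is illustrative only and is not how Redis implements its sorted set: it stores plain ints, keeps an array of forward pointers per node rather than explicit right and down pointers, and the height cap of 16 and promotion probability of 0.25 are assumed values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Minimal skip list of ints; an illustration, not the Redis implementation. */
class SkipListSketch {
    static final int MAX_LEVEL = 16; // assumed cap on the height of a node
    static final double P = 0.25;    // assumed probability of promoting a node one level

    static final class Node {
        final int value;
        final Node[] next; // next[i] is the successor of this node on level i

        Node(int value, int height) {
            this.value = value;
            this.next = new Node[height];
        }
    }

    private final Node head = new Node(Integer.MIN_VALUE, MAX_LEVEL);
    private final Random random;
    private int level = 1; // number of levels currently in use

    SkipListSketch(long seed) {
        this.random = new Random(seed);
    }

    /** Draws a height: each additional level is reached with probability P. */
    int randomLevel() {
        int height = 1;
        while (height < MAX_LEVEL && random.nextDouble() < P) height++;
        return height;
    }

    void insert(int value) {
        Node[] update = new Node[MAX_LEVEL]; // last node before the insert position, per level
        Node cur = head;
        for (int i = level - 1; i >= 0; i--) {
            while (cur.next[i] != null && cur.next[i].value < value) cur = cur.next[i];
            update[i] = cur;
        }
        int height = randomLevel();
        for (int i = level; i < height; i++) update[i] = head;
        if (height > level) level = height;
        Node node = new Node(value, height);
        for (int i = 0; i < height; i++) { // splice into every level up to the drawn height
            node.next[i] = update[i].next[i];
            update[i].next[i] = node;
        }
    }

    boolean contains(int value) {
        Node cur = head;
        for (int i = level - 1; i >= 0; i--) { // start high, drop a level when overshooting
            while (cur.next[i] != null && cur.next[i].value < value) cur = cur.next[i];
        }
        cur = cur.next[0];
        return cur != null && cur.value == value;
    }

    /** Walks the bottom level, which holds every element in ascending order. */
    List<Integer> toList() {
        List<Integer> out = new ArrayList<>();
        for (Node n = head.next[0]; n != null; n = n.next[0]) out.add(n.value);
        return out;
    }
}
```

Inserting 5, 1, 4, 2, 3 in that order and then calling toList() yields the values in ascending order, because the bottom level is always a complete sorted list.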

Main commands

ZADD: adds an element to the sorted set
ZREM: removes the specified element from the sorted set
ZRANGE: returns the elements within a specified range, ordered from smallest to largest score

Implementing a delay queue

Add each delayed task to the sorted set, using the time at which it is due as its score
Start a thread that repeatedly checks whether the score of the first element in the sorted set is less than or equal to the current time
If it is, remove the task from the sorted set and add it to the execution queue
If it is not, try again after a short sleep
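The polling logic can be sketched without a Redis server. The class below is a hypothetical in-memory stand-in for a sorted set (a map for lookup by member plus an ordered set for lookup by score); it is not a Redis client, and the method names only echo ZADD and ZREM. It assumes the score holds the task's due time as a millisecond timestamp, so a task is ready once its score is no later than the current time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** In-memory stand-in for a Redis sorted set used as a delay queue; illustrative only. */
class SortedSetDelayQueueSketch {

    static final class Entry {
        final String member;
        final long score; // due time in milliseconds (an assumption of this sketch)

        Entry(String member, long score) {
            this.member = member;
            this.score = score;
        }
    }

    private final Map<String, Entry> byMember = new HashMap<>();
    private final TreeSet<Entry> byScore = new TreeSet<>(
            Comparator.comparingLong((Entry e) -> e.score).thenComparing((Entry e) -> e.member));

    /** Like ZADD: inserts the member, or updates its score if it already exists. */
    synchronized void zadd(String member, long score) {
        zrem(member);
        Entry e = new Entry(member, score);
        byMember.put(member, e);
        byScore.add(e);
    }

    /** Like ZREM: returns true if the member was present and has been removed. */
    synchronized boolean zrem(String member) {
        Entry old = byMember.remove(member);
        return old != null && byScore.remove(old);
    }

    /** Removes and returns every task whose due time is at or before now, earliest first. */
    synchronized List<String> pollDue(long nowMillis) {
        List<String> due = new ArrayList<>();
        while (!byScore.isEmpty() && byScore.first().score <= nowMillis) {
            Entry e = byScore.pollFirst();
            byMember.remove(e.member);
            due.add(e.member);
        }
        return due;
    }
}
```

A polling thread would call pollDue(System.currentTimeMillis()), hand the returned tasks to an execution queue, and sleep briefly when the list is empty. Against a real Redis instance, the check and the removal would also have to be made atomic to keep two nodes from taking the same task.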

Instance operations

    root@redis:/usr/local/bin# redis-cli
    127.0.0.1:6379> zadd delayqueue 1 task1
    (integer) 1
    127.0.0.1:6379> zadd delayqueue 2 task2
    (integer) 1
    127.0.0.1:6379> zadd delayqueue 4 task4
    (integer) 1
    127.0.0.1:6379> zadd delayqueue 3 task3
    (integer) 1
    127.0.0.1:6379>
    127.0.0.1:6379> zrange delayqueue 0 0 withscores
    1) "task1"

Advantages:

Simple to implement
Tasks can be managed (delete, modify tasks)

Disadvantages:

A thread has to poll at short intervals to check whether the first element is due, which wastes CPU
In a distributed deployment, multiple nodes can easily read the same task

DelayQueue implementation

DelayQueue is a BlockingQueue implemented on top of a priority queue that orders elements by time. It stores objects that implement the Delayed interface, and an object can be taken from the queue only after its delay has expired.
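A minimal usage sketch of java.util.concurrent.DelayQueue follows. The DelayedTask class, its fields, and the takeInOrder helper are hypothetical names introduced for illustration; only DelayQueue, Delayed, and TimeUnit come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

class DelayQueueDemo {

    /** A task that becomes available delayMillis after it is created. */
    static final class DelayedTask implements Delayed {
        final String name;
        final long dueNanos; // absolute deadline on the System.nanoTime() clock

        DelayedTask(String name, long delayMillis) {
            this.name = name;
            this.dueNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMillis);
        }

        @Override
        public long getDelay(TimeUnit unit) {
            // Remaining delay; zero or negative means the task is due.
            return unit.convert(dueNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    /** Blocks until count tasks have come due and returns their names in that order. */
    static List<String> takeInOrder(DelayQueue<DelayedTask> queue, int count) {
        List<String> names = new ArrayList<>();
        try {
            for (int i = 0; i < count; i++) {
                names.add(queue.take().name); // take() blocks until the head has expired
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a task", e);
        }
        return names;
    }

    public static void main(String[] args) {
        DelayQueue<DelayedTask> queue = new DelayQueue<>();
        queue.put(new DelayedTask("slow", 300));
        queue.put(new DelayedTask("fast", 100));
        System.out.println(takeInOrder(queue, 2)); // "fast" is due first despite being added second
    }
}
```

Unlike the fixed-TTL queue approach, tasks here come out in order of their own deadlines, regardless of insertion order.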

Internal structure

A reentrant lock
A priority queue that orders elements by delay time
A leader thread, used to optimize blocking and notification
A Condition object that implements blocking and notification

Leader/Followers
Leader/Followers is a pattern in which multiple worker threads take turns listening for, dispatching, and processing events. Its biggest advantage is that an event is received and processed in the same thread, so no data has to be passed between threads and the overhead of frequent thread switching is avoided.

At any point in time, only one thread is the leader and is responsible for listening for events; the other threads are followers, which sleep while waiting to become the leader. A worker thread in this pattern is always in exactly one of three states:

Leading: the thread is the leader and is responsible for listening for events. When the leader detects an event, it can handle it in one of two ways:

It can move to the processing state, handle the event itself, and invoke a method to elect a new leader.
It can designate a follower to handle the event, in which case it remains the leader.

Processing: the thread is handling an event. When it finishes, it becomes the new leader if the thread set currently has no leader; otherwise it becomes a follower.
Following: the thread is a follower, waiting to become the new leader. It may also be designated by the leader to handle a new event.

Core source analysis:

Enqueue

    public boolean offer(E e) {
        final ReentrantLock lock = this.lock;
        lock.lock();
        try {
            q.offer(e);
            if (q.peek() == e) {        // the new element has the shortest delay in the queue
                leader = null;          // reset the leader
                available.signal();     // wake one thread to wait on the newly added element
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

Dequeue

    public E take() throws InterruptedException {
        final ReentrantLock lock = this.lock;
        lock.lockInterruptibly();
        try {
            for (;;) {
                E first = q.peek();
                if (first == null)
                    available.await();                  // queue is empty: wait indefinitely
                else {
                    long delay = first.getDelay(TimeUnit.NANOSECONDS);
                    if (delay <= 0)                     // delay has elapsed: return immediately
                        return q.poll();
                    else if (leader != null)            // a leader is already waiting: wait indefinitely
                        available.await();
                    else {
                        Thread thisThread = Thread.currentThread();
                        leader = thisThread;            // the current thread becomes the leader
                        try {
                            available.awaitNanos(delay);    // wake up after delay nanoseconds
                        } finally {
                            if (leader == thisThread)   // leader is cleared when an element with the smallest delay is enqueued
                                leader = null;
                        }
                    }
                }
            }
        } finally {
            if (leader == null && q.peek() != null)     // no leader and the queue is not empty: wake a follower to become the leader
                available.signal();
            lock.unlock();
        }
    }

Advantages:

High efficiency and low trigger latency

Disadvantages:

Data is held in memory, so persistence has to be implemented separately
It has no distributed capability, so high availability has to be implemented separately

To be continued.

This article comes from the NetEase Cloud community and is published with the authorization of its author, Zhiliang.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
