Delayed Task Scheduling System: Technology Selection and Design (Part 2)

Last Update:2018-08-17 Source: Internet

Author: User

Tags message queue zookeeper

This article comes from NetEase cloud community.

Time Wheel Implementation

A time wheel is a circular data structure divided into multiple slots.
Each slot represents a span of time; the shorter the span, the higher the precision.
Each slot holds a list of the tasks that expire in that slot.
A pointer advances as time passes and executes the expired tasks in the slot it reaches.

Terminology:

Slot: a block in the circular structure that stores delayed tasks
Pointer: points to the slot currently being processed, representing the current time
Slot count: the number of slots in the time wheel
Interval: the time span of each slot, which determines the precision the time wheel can achieve
Total interval: slot count * interval, which is the time range the time wheel can express

Single-Level Time Wheel

For example, with 8 slots of 1 second each, the whole time wheel covers a period of 8s. If the pointer is currently at slot 2 and a task has to run 3s from now, the task goes into slot 5 (2+3) and runs after the pointer advances 3 times.
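The single-level wheel described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation: the class name `SimpleTimeWheel` and its methods are hypothetical, delays are counted in ticks of one interval, and tasks are plain values rather than callbacks.

```python
from collections import defaultdict


class SimpleTimeWheel:
    """Single-level time wheel: `slots` slots, one tick per slot."""

    def __init__(self, slots=8):
        self.slots = slots
        self.pointer = 0                    # slot that represents "now"
        self.buckets = defaultdict(list)    # slot index -> tasks due there

    def add(self, delay, task):
        # A delay longer than one full rotation cannot be represented.
        if not 0 < delay <= self.slots:
            raise ValueError("delay overflows the wheel")
        slot = (self.pointer + delay) % self.slots
        self.buckets[slot].append(task)
        return slot

    def tick(self):
        # Advance one slot and hand back the tasks that expire there.
        self.pointer = (self.pointer + 1) % self.slots
        return self.buckets.pop(self.pointer, [])
```

With the pointer at slot 2, `add(3, "job")` returns slot 5, and the third call to `tick()` returns `["job"]`. A delay of 10 raises `ValueError`, which is the overflow case.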

The problem with the single-level time wheel is:
The number of slots is limited, so the time range it can represent is limited. What happens when a task that expires after 10s has to be stored? The time wheel overflows.
One solution is to store round information with each task in the slot's list.

If a task is to run after 10s, the round is 10/8 = 1 (integer division) and the slot is 10%8 = 2, so the task goes into slot 2 (assuming the pointer starts at slot 0).
When checking for expired tasks, execute only the tasks whose round is 0 and decrement the round of the other tasks in the list.
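The round bookkeeping can be sketched as follows. The class name `RoundTimeWheel` is hypothetical and delays are whole ticks. One detail is an assumption beyond the text: the round is computed as `(delay - 1) // slots`, which equals 10/8 = 1 for the example but avoids waiting one extra rotation when the delay is an exact multiple of the slot count.

```python
class RoundTimeWheel:
    """Single-level wheel that stores a remaining-round counter per task."""

    def __init__(self, slots=8):
        self.slots = slots
        self.pointer = 0
        self.buckets = [[] for _ in range(slots)]

    def add(self, delay, task):
        if delay < 1:
            raise ValueError("delay must be at least one tick")
        rounds = (delay - 1) // self.slots          # full rotations to skip
        slot = (self.pointer + delay) % self.slots
        self.buckets[slot].append([rounds, task])
        return rounds, slot

    def tick(self):
        self.pointer = (self.pointer + 1) % self.slots
        due, waiting = [], []
        for entry in self.buckets[self.pointer]:
            if entry[0] == 0:
                due.append(entry[1])    # round 0: the task expires now
            else:
                entry[0] -= 1           # not yet: one rotation fewer to wait
                waiting.append(entry)
        self.buckets[self.pointer] = waiting
        return due
```

Every tick walks the whole list of the current slot, including entries that only get decremented. That scan is the cost the next paragraph refers to.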
The problem with the round-based single-level time wheel is:
If tasks span a long time range and there are many of them, the round values become very large and the list in a single slot becomes very long. Every check then scans many tasks that are not yet due. What can be done?

Layered Time Wheel

Expired tasks are always executed in the lowest-level wheel; tasks in the other wheels are progressively demoted to lower-level wheels as they approach expiration.
Each wheel in a layered time wheel has its own slot count and interval. When a wheel completes one full rotation, the wheel one level above it advances one slot.
The layered time wheel greatly increases the time range that can be represented while reducing space consumption.

As an example:
A layered time wheel with three 8-slot wheels can express a time range of 8 * 8 * 8 = 512s. A single-level time wheel would need 512 slots for the same range, while the layered time wheel needs only 8 + 8 + 8 = 24 slots. For a layered time wheel with a range of 1 day, the three wheels would have 24, 60, and 60 slots.
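The arithmetic behind this comparison is a product versus a sum. A small sketch, with hypothetical helper names and a 1-second base interval assumed:

```python
from math import prod


def wheel_range(slots_per_level, tick=1):
    """Total time range: the base interval times the product of slot counts."""
    return tick * prod(slots_per_level)


def slot_count(slots_per_level):
    """Memory cost in slots: the sum of slot counts over all levels."""
    return sum(slots_per_level)
```

For three 8-slot wheels this gives a range of 512 with 24 slots. For wheels of 60, 60, and 24 slots it gives 86400 seconds (one day) with 144 slots.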

Working principle:
There are two ways to rotate the time wheel pointers:

Each wheel rotates on its own interval (the second wheel advances 1 slot per second, the minute wheel 1 slot per minute, the hour wheel 1 slot per hour)
Each wheel is driven by the wheel below it (when the second wheel completes 1 lap, the minute wheel advances 1 slot; when the minute wheel completes 1 lap, the hour wheel advances 1 slot)

When a pointer reaches a slot, there are two cases:

If it is the lowest-level wheel, the tasks on the linked list in that slot have expired and are executed
If it is any other wheel, the tasks in the slot are moved to a wheel with finer precision, for example from the hour wheel to the minute wheel

As an example:

Add a task to run after 5s (each wheel here has 8 slots, so the intervals are 1s, 8s, and 64s)

The task belongs in slot 5 of the second wheel
The task runs after the second wheel pointer advances 5 times

Add a task to run after 50s

The delay overflows the second wheel
50/8 = 6, so the task is stored in slot 6 of the minute wheel
After the second wheel completes 6 laps (6*8s = 48s), the minute wheel pointer points to slot 6
The task in that slot is then demoted to the second wheel; since 50%8 = 2, it moves to slot 2 of the second wheel
After the second wheel pointer advances 2 more times (50s in total), the task runs

Add a task to run after 250s

The delay overflows the minute wheel
250/8/8 = 3, so the task is stored in slot 3 of the hour wheel
After the minute wheel completes 3 laps (3*64s = 192s), the hour wheel pointer points to slot 3
The task in that slot is then demoted to the minute wheel; since (250-192)/8 = 7, it moves to slot 7 of the minute wheel
After the second wheel completes 7 more laps (7*8s = 56s), the minute wheel pointer points to slot 7
The task in that slot is then demoted to the second wheel; since 250-192-56 = 2, it moves to slot 2 of the second wheel
After the second wheel pointer advances 2 more times, the task runs
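The three walkthroughs above follow the same rule: divide the remaining delay by the interval of the highest wheel that can hold it, wait there, then repeat with the remainder on the next wheel down. The sketch below computes that sequence of stops. The function name `cascade_path` is hypothetical, level 0 is the second wheel, level 1 the minute wheel, level 2 the hour wheel, and all pointers are assumed to start at slot 0 as in the examples.

```python
def cascade_path(delay, slots=8, levels=3, tick=1):
    """Return the (level, slot) stops a task makes before it runs."""
    intervals = [tick * slots ** i for i in range(levels)]   # [1, 8, 64]
    if not 0 < delay < intervals[-1] * slots:
        raise ValueError("delay is outside the time wheel range")
    path = []
    remaining = delay
    for level in reversed(range(levels)):
        slot, remaining = divmod(remaining, intervals[level])
        if slot > 0:
            # The task waits in this wheel, then is demoted with `remaining`.
            path.append((level, slot))
    return path
```

`cascade_path(250)` returns `[(2, 3), (1, 7), (0, 2)]`, matching slot 3 of the hour wheel, slot 7 of the minute wheel, and slot 2 of the second wheel. `cascade_path(50)` returns `[(1, 6), (0, 2)]`.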

Advantages:

High performance: inserting and deleting a task are both O(1), whereas DelayQueue keeps tasks sorted, so insertion and removal are O(log N)

Disadvantages:

Data is kept in memory, so persistence has to be implemented separately
There is no built-in distributed capability, so high availability has to be implemented separately
The maximum delay of a task is limited by the total interval of the time wheel

Out-of-range tasks can be placed in a buffer (implemented with a queue, Redis, or a database). Each time the highest-level wheel advances one slot, the tasks that now fall within range are moved from the buffer into the time wheel.

For example:

Add task A to run after 600s

The delay overflows the whole time wheel (range 512s)
So the task is saved to the buffer queue
Each time the hour wheel advances 1 slot, 64s is subtracted from the delay of every task in the buffer queue, and the tasks that now fall within range are moved into the time wheel
After the first advance, task A's delay is 600-64 = 536s, still greater than the time wheel range, so it stays in the queue
After the hour wheel advances 1 more slot, task A's delay is 536-64 = 472s, which is within the time wheel range, so it is placed into the hour wheel
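The buffer handling in this example can be sketched as a function that runs once per advance of the highest-level wheel. The name `drain_buffer` and the dict-based buffer are assumptions for illustration; a real buffer would be a queue, Redis, or a database as noted above.

```python
def drain_buffer(buffer, wheel_range=512, step=64):
    """Age every buffered task by `step` seconds and split the result.

    buffer maps task name -> remaining delay in seconds.
    Returns (still_waiting, admitted); admitted tasks now fit in the wheel.
    """
    still_waiting, admitted = {}, {}
    for name, remaining in buffer.items():
        remaining -= step
        if remaining < wheel_range:
            admitted[name] = remaining
        else:
            still_waiting[name] = remaining
    return still_waiting, admitted
```

Starting from `{"A": 600}`, the first call leaves A in the buffer with 536s remaining, and the second call admits it with 472s remaining.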

Previous design (DB/DelayQueue/ZooKeeper)

The scheduling system provides task operation interfaces that business systems use to submit tasks, cancel tasks, report execution results, and so on.
For Dubbo calls, a task is abstracted as the JobCallbackService interface, which the business system implements and registers as a service.

Overall architecture

Database:

Responsible for saving all the task data

Memory Queue:

Implemented with DelayQueue, which is the mechanism that actually triggers delayed tasks
Holds only tasks that expire within the next N minutes, up to 1000 tasks

ZooKeeper:

Manages the entire scheduling cluster
Stores scheduling node information
Stores node shard information

Master node:

Re-shards the data when nodes come online or go offline

Scheduling node:

Provides Dubbo and HTTP interfaces that business systems call to submit tasks, cancel tasks, report execution results, and so on
Obtains the shard information of the current node from the ZK registry, then pulls the tasks that are about to expire from the database into the DelayQueue
Invokes the callback service interface registered by the business system to initiate a dispatch request
Receives feedback from business systems, updates execution results, and removes tasks or initiates retries

Business System:

Implements the callback interface JobCallbackService as the service to be scheduled and registers it as a Dubbo service provider
Calls the scheduling system's interfaces to operate on tasks in scenarios that need delayed tasks

Database design

Table description

Job_callback_service: service configuration table; configures the business callback service, including service protocol, callback service, and retry count
Job_delay_task: delayed task table; stores delayed tasks, including task shard number, callback service, total number of calls, number of failures, task status, callback parameters, etc.
Job_delay_task_execlog: delayed task execution log table; records every callback initiated by the scheduling system
Job_delay_task_backlog: delayed task result table; records the final status of each task and related information

Master-slave switching
Using ZooKeeper ephemeral sequential nodes, the node with the smallest sequence number is the master node and the other nodes are slave nodes.
The master node watches the cluster state and re-shards when the cluster state changes.
Each slave node watches the sibling node whose sequence number is just below its own; when that sibling changes, the slave finds the new sibling and re-establishes the watch.
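The election rule can be illustrated without a ZooKeeper client, as a pure function over the list of child node names. The `node-` prefix and the function names are hypothetical; the sketch relies only on ZooKeeper appending a zero-padded sequence number to sequential node names.

```python
def sequence_of(name):
    """Extract the numeric suffix of a sequential node name."""
    return int(name.rsplit("-", 1)[1])


def election_view(children, me):
    """Return (master, node_to_watch) from the point of view of `me`.

    The master has the smallest sequence number. Every other node watches
    the sibling immediately below it; the master watches no sibling.
    """
    ordered = sorted(children, key=sequence_of)
    index = ordered.index(me)
    master = ordered[0]
    watch = None if index == 0 else ordered[index - 1]
    return master, watch
```

With children numbered 3, 5, and 7, node 7 watches node 5. If node 5's session ends, recomputing the view makes node 7 watch node 3, and the master does not change. Watching only one sibling means a single node departure wakes up only one other node.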

Data sharding

Task status

Delay: the initial state after a delayed task is submitted
Ready: the expiration time has passed and the message has been pushed into the ready queue
Running: a business subscriber has received the message and started processing it
Finished: business processing succeeded
Failed: business processing failed

Main process

Service Load

Read the service configuration from the DB
Dynamically construct consumer objects from the configuration and add them to the Spring container

Submit a Task

Business systems submit tasks through the Dubbo or HTTP interface
Determine whether the task's expiration time falls within one scan cycle
If it does,

Set the shard number (picked at random from the shards the current node is responsible for)
Add the task to the memory queue
Save the task to the Job_delay_task table

If it does not,

Set the shard number (computed from the total number of shards with a random algorithm)
Save the task to the Job_delay_task table
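The branching in the submit flow can be sketched as a small routing function. All names here are hypothetical, shard numbers are assumed to run from 0 to `total_shards - 1`, and the database write that happens in both branches is left out.

```python
import random


def route_submission(expire_at, now, scan_interval, local_shards,
                     total_shards, rng=random):
    """Decide the shard number and whether to enqueue in memory right away."""
    if expire_at - now <= scan_interval:
        # Due within one scan cycle: keep it on this node so that the
        # in-memory queue can trigger it without waiting for the next scan.
        return {"shard": rng.choice(sorted(local_shards)), "enqueue": True}
    # Due later: spread it over all shards; whichever node owns the shard
    # at that time will load it from the database.
    return {"shard": rng.randrange(total_shards), "enqueue": False}
```

A task due in 5 seconds with a 60-second scan interval is enqueued locally on one of this node's shards; a task due in an hour is only written to the database with a shard drawn from the full range.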

Timer

Managed by a single thread
Set the timer's execution cycle from the configured scan interval
Compute this cycle's expiration cutoff, x-delay, from the current time and the scan interval
Fetch from the DB all tasks whose expiration time is before x-delay and put them into the DelayQueue
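One scan cycle might look like the sketch below. Several things are assumptions rather than the article's design: the cutoff is taken to be the current time plus the scan interval, a `heapq` list stands in for the Java DelayQueue, the task fields `id`, `shard`, and `expire_at` are hypothetical, and the `queued` set that skips tasks already held in memory is an added safeguard.

```python
import heapq


def scan_once(db_tasks, now, scan_interval, my_shards, delay_queue, queued):
    """Load this node's soon-to-expire tasks into the in-memory queue."""
    cutoff = now + scan_interval            # assumed meaning of "x-delay"
    for task in db_tasks:
        mine = task["shard"] in my_shards
        due_soon = task["expire_at"] < cutoff
        if mine and due_soon and task["id"] not in queued:
            # Ordered by expiration time, earliest first.
            heapq.heappush(delay_queue, (task["expire_at"], task["id"]))
            queued.add(task["id"])
    return cutoff
```

Only the task identifier is queued, which fits the dispatch step re-reading the task from the DB before invoking anything.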

Scheduling tasks

Managed by a single thread pool
All threads block on the take method of the DelayQueue
After taking a task, look it up in the DB to determine whether it still exists
If it does not exist, do nothing (the task has already executed successfully or has been deleted)
If it exists, determine whether the number of calls exceeds the configured limit
If the limit is not exceeded:

Invoke the business callback service

Get the callback service configuration from the task
Get the corresponding consumer object from the container
Call the business callback service asynchronously

Set the next retry time and record the call in Job_delay_task_execlog

If the limit is exceeded, move the task to Job_delay_task_backlog
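The decision tree a worker follows after taking a task can be sketched with plain dicts standing in for the tables. The function and field names are hypothetical, and `invoke` is an ordinary callable here, whereas the text describes the callback as asynchronous.

```python
def dispatch(task_id, tasks, backlog, execlog, max_calls, retry_delay,
             now, invoke):
    """Handle one task id taken from the in-memory queue."""
    task = tasks.get(task_id)
    if task is None:
        return "skipped"                 # already finished or deleted
    if task["calls"] >= max_calls:
        # Call limit reached: park the task in the backlog table.
        backlog[task_id] = tasks.pop(task_id)
        return "backlog"
    invoke(task["service"], task["params"])
    task["calls"] += 1
    task["next_retry_at"] = now + retry_delay
    execlog.append((task_id, now))       # one log row per callback attempt
    return "invoked"
```

The task stays in the task table after a call. It is removed only when the business system reports a result or when the call limit moves it to the backlog.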

Task Feedback

Update the task's invocation result

Advantages

Full-featured, highly available, easy to scale, with retry support

Disadvantages

Slightly complex

Service configurations have to be turned into consumer objects dynamically
Adding a new service requires notifying all scheduling nodes to refresh
There is some coupling (business services are invoked directly, so the protocol is coupled). What if a system being integrated uses the Thrift protocol?

Task retries are handled by the scheduling system
The scheduling system calls back business services directly. If a business service is unavailable, this can lead to blind retries, and traffic is hard to control (the scheduling system does not know the processing capacity of the business services)

What if MQ is introduced to decouple the protocol and the service invocation, with MQ guaranteeing task retries and consumers controlling traffic according to their own processing capacity?

Another design (DB/DelayQueue/ZooKeeper/MQ)

Overall architecture

Database design

Main process

Scheduling tasks

Managed by a single thread pool
All threads block on the take method of the DelayQueue
After taking a task, look it up in the DB to determine whether it still exists
If it does not exist, do nothing (the task has already executed successfully or has been deleted)
If it exists, move the task to Job_delay_task_execlog and post a message to the message queue

Disadvantages
Requires the business system to depend on MQ
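With MQ in the path, the dispatch step reduces to a move plus a publish. In this sketch `publish` is a placeholder for whatever MQ client is used, the `topic` field is hypothetical, and dicts stand in for the Job_delay_task and Job_delay_task_execlog tables.

```python
def dispatch_via_mq(task_id, tasks, execlog, publish):
    """Move an expired task to the execution log and hand it to the MQ."""
    task = tasks.pop(task_id, None)
    if task is None:
        return False                     # already finished or deleted
    execlog[task_id] = task
    # Retries and flow control are left to the MQ and its consumers.
    publish(task["topic"], task["params"])
    return True
```

Compared with the direct-callback flow, there is no call counter or retry timer here, which matches the idea of letting MQ take over retries.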

This article comes from the NetEase cloud community and is published with the authorization of its author, Zhiliang.

Delayed task scheduling System (technology selection and design)
