A Simple, Efficient, and General Asynchronous Task Processing System Based on Spring

Last Update: 2018-12-03 Source: Internet

Author: User


Background

As an application system keeps gaining features, some of those features have modest real-time requirements but complex logic and relatively long execution times, for example because they call external systems or involve multiple data sources. We want such business logic to run in the background so that the user-facing front end does not have to wait for it, which improves the user experience.

In addition, from an architectural perspective, we want to split the system by function to keep coupling low: a problem in one subsystem should not affect the others, and each subsystem can be optimized and monitored independently. We therefore need a general and efficient asynchronous task processing system.

Design Objectives

Build a lightweight, simple, efficient, general, highly scalable, and highly reliable asynchronous task processing system.

System Design

To build an asynchronous processing system like this, the first technology most people think of is JMS. Alibaba already has a JMS-based asynchronous processing system that is widely used in its online store systems. However, Alibaba Software currently uses a different technical framework, so that system cannot be reused directly. Moreover, to process tasks concurrently, that system combines JMS with message-driven beans (MDB), which makes it depend on EJB and become heavyweight: it must be deployed on an application server that supports EJB, so lightweight application servers without EJB support cannot host it.

With these objectives in mind, our design approach is: persist tasks in a database, and use Spring to wrap both job scheduling and a thread pool.

- Task persistence in the DB: Tasks waiting to be processed are saved in a database we trust. As long as the number of tasks stays below tens of millions, the task tables can live in the same database as the business data. This ensures that if the task processing server goes down, tasks that have not started or have not yet succeeded are not lost.
- Job scheduling wrapped by Spring: Once task information is persisted, it must be read back and the corresponding business logic executed. We use Spring's ScheduledExecutorFactoryBean to run the task scan periodically; for example, the list of pending tasks can be scanned every 5 minutes and any records found are executed. For more powerful scheduling, the Quartz scheduler integrated with Spring can be used instead.
- Thread pool wrapped by Spring: To improve throughput, the actual work of each task must be able to run concurrently. To keep the system lightweight, we use Spring's default wrapper around the JDK thread pool (ThreadPoolTaskExecutor) and tune its parameters through configuration.

For the system deployment diagram, see:

The following describes the specific system design:

First, create two tables to persist task information. The table structures and SQL statements below are based on Oracle. You can choose the table names yourself, for example tasks and tasks_fail_history. Both tables have the same fields; the recommended fields are:

Field	Type	Description	Nullable	Default
task_id	VARCHAR2(36)	Primary key; unique identifier, a UUID by default	No	
gmt_create	DATE	Creation time	No	
gmt_handle	DATE	Time at which the task should be executed	No	
task_handler	VARCHAR2(32)	Type of task to execute	No	
load_balance_num	NUMBER	Load-balancing value assigned to the task, used to spread work across multiple servers	No	0
task_params	VARCHAR2(4000)	Parameters required to execute the task	Yes	
retry_count	NUMBER	Number of retries; incremented by 1 on each retry	No	0
retry_reason	VARCHAR2(512)	Reason for the retry, that is, the cause of the previous failure, to aid troubleshooting	Yes	
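For readers mapping this table into Java, the following is a minimal sketch of one row as a Java 16+ record. The name TaskRecord, the newTask factory, and the type mapping (DATE to LocalDateTime, NUMBER to int) are assumptions for illustration, not part of the original system.

```java
import java.time.LocalDateTime;
import java.util.UUID;

// Hypothetical in-memory view of one row of the tasks (or tasks_fail_history) table.
// Column names follow the table above; the Java types are assumptions.
record TaskRecord(
        String taskId,            // task_id: UUID primary key, VARCHAR2(36)
        LocalDateTime gmtCreate,  // gmt_create: when the row was inserted
        LocalDateTime gmtHandle,  // gmt_handle: earliest time the task may run
        String taskHandler,       // task_handler: Spring bean ID of the handler, VARCHAR2(32)
        int loadBalanceNum,       // load_balance_num: 0 by default
        String taskParams,        // task_params: nullable, VARCHAR2(4000)
        int retryCount,           // retry_count: 0 by default
        String retryReason) {     // retry_reason: nullable, VARCHAR2(512)

    // Rough guards only: Oracle measures VARCHAR2 lengths in bytes by default, not chars.
    TaskRecord {
        if (taskHandler == null || taskHandler.isEmpty() || taskHandler.length() > 32) {
            throw new IllegalArgumentException("task_handler must be 1 to 32 characters");
        }
        if (taskParams != null && taskParams.length() > 4000) {
            throw new IllegalArgumentException("task_params is longer than VARCHAR2(4000)");
        }
    }

    // A new pending task using the column defaults listed above.
    static TaskRecord newTask(String handler, String params, LocalDateTime runAt) {
        return new TaskRecord(UUID.randomUUID().toString(), LocalDateTime.now(),
                runAt, handler, 0, params, 0, null);
    }
}
```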

The tasks table holds all tasks waiting to be executed. Each task belongs to one task type, identified by the task_handler field. Because the core of the system is built on Spring, we recommend using the bean ID of the task type's handler in the Spring container as this value.

All parameters needed to execute a task are carried in the task_params field. The application can assemble this string in any format, as long as the corresponding task implementation class can parse it.

The load_balance_num field is mainly intended for future growth in task volume. When several servers must share the load, each task is assigned a load-balancing value, and different servers process tasks with different values. The values should be distributed as evenly as possible across the whole table: for example, if the table holds 500 records and the values are 1, 2, ..., 10, each value should appear in about 50 records.
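One way to get the even spread described above is to hand out values in rotation rather than at random. The class below is a hypothetical sketch; the alternate load-balance-number strategy bean in the configuration below hints at something similar, but its real implementation is not shown in the article.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical "alternating" strategy: returns 1, 2, ..., buckets, 1, 2, ... in turn,
// so N inserted tasks are spread almost exactly evenly over the load_balance_num values.
final class RoundRobinLoadBalanceNum {
    private final int buckets;
    private final AtomicLong counter = new AtomicLong();

    RoundRobinLoadBalanceNum(int buckets) {
        if (buckets < 1) {
            throw new IllegalArgumentException("buckets must be >= 1");
        }
        this.buckets = buckets;
    }

    // Thread-safe: each call atomically claims the next sequence number.
    int next() {
        return (int) (counter.getAndIncrement() % buckets) + 1;
    }
}
```

Each producer JVM keeps its own counter, so with several producers the overall spread is only approximately even, which is all the design requires.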

Each task is deleted or updated according to its execution result.

The tasks_fail_history table stores tasks that failed and require manual intervention. Its records come from the tasks table: when a task's retry count exceeds a configured limit, the record is moved to the failure history table.
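The delete, update, or move decision can be captured in a small policy object. This is a sketch under stated assumptions: the class and enum names are hypothetical, the list of delays (for example 1, 5 and 20 minutes) stands in for a configurable retry strategy, and the length of that list doubles as the retry limit.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.List;

// Hypothetical retry policy: what to do with a tasks row after one execution attempt.
final class RetryPolicy {
    enum Action { DELETE, RESCHEDULE, MOVE_TO_HISTORY }

    // nextGmtHandle is only set for RESCHEDULE.
    record Decision(Action action, LocalDateTime nextGmtHandle) {}

    // delays.get(i) is the wait after the (i + 1)-th failure; its size is the retry limit.
    private final List<Duration> delays;

    RetryPolicy(List<Duration> delays) {
        this.delays = List.copyOf(delays);
    }

    // retryCount is the retry_count value stored in the row before this attempt.
    Decision decide(boolean succeeded, int retryCount, LocalDateTime now) {
        if (succeeded) {
            return new Decision(Action.DELETE, null);           // done: delete the row
        }
        if (retryCount >= delays.size()) {
            return new Decision(Action.MOVE_TO_HISTORY, null);  // give up: manual intervention
        }
        // Otherwise increment retry_count and push gmt_handle into the future.
        return new Decision(Action.RESCHEDULE, now.plus(delays.get(retryCount)));
    }
}
```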

Second, we need to define what task producers and consumers require:

A task producer must supply the execution time, the task type, and the parameters required to execute the task.

One field is optional: load_balance_num. When there are multiple consumer servers, this field can be used to distribute task processing. The producer sets its value according to some rule, for example a random number between 1 and 10, or any custom rule, as long as the values are evenly distributed across the whole table.
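On the producer side, enqueuing a task is a single INSERT. The sketch below uses plain JDBC; the TaskProducer class is hypothetical, the column list follows the recommended table above, and passing in the caller's Connection is an assumption made so that the insert can share the business transaction when the tasks table lives in the business database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical producer helper: a task is enqueued by inserting one row into tasks.
final class TaskProducer {
    static final String INSERT_SQL =
            "INSERT INTO tasks (task_id, gmt_create, gmt_handle, task_handler,"
            + " load_balance_num, task_params, retry_count)"
            + " VALUES (?, SYSDATE, ?, ?, ?, ?, 0)";

    // Optional rule from the text: a random load_balance_num in [1, 10].
    static int randomLoadBalanceNum() {
        return ThreadLocalRandom.current().nextInt(1, 11); // upper bound is exclusive
    }

    // Uses the caller's connection, so the insert commits or rolls back with its transaction.
    static String enqueue(Connection conn, String handler, String params, LocalDateTime runAt)
            throws SQLException {
        String taskId = UUID.randomUUID().toString();
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, taskId);
            ps.setTimestamp(2, Timestamp.valueOf(runAt));
            ps.setString(3, handler);
            ps.setInt(4, randomLoadBalanceNum());
            ps.setString(5, params);
            ps.executeUpdate();
        }
        return taskId;
    }
}
```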

For a task consumer, the general consumption process is as follows:

The logic of each step is described below:

Step 1: After the consumer service starts, it scans the tasks table on a schedule (implemented with Spring's built-in ScheduledExecutorFactoryBean) and retrieves up to a configurable number of rows, for example 1000. Two scheduling policies can be configured: fixedRate, which starts a scan at a fixed interval (for example every few minutes) measured from the start of the previous scan, and fixedDelay, which waits the same fixed delay after each scan completes before starting the next one.
The basic SQL statement is: SELECT * FROM tasks WHERE gmt_handle <= SYSDATE;
Depending on the scaling strategy, the query conditions of each scan differ, for example:
1. When there are few task types and few tasks, a single server can handle them all, and the query is:
    SELECT * FROM tasks WHERE gmt_handle <= SYSDATE AND ROWNUM <= ?;
2. When the number of task types and tasks grows beyond what one server can handle, the consumer servers must be scaled out linearly. Several scaling policies are available:
    1. Scaling by function: different consumer servers execute different task types, and the query condition is:
       WHERE gmt_handle <= SYSDATE AND task_handler IN (?) AND ROWNUM <= ?;
    2. Scaling by load: the load on every consumer server is kept even, so that no server is busy while others sit idle (under function-based scaling, servers can be loaded very unevenly). Each consumer server may then process several task types, and the query condition is:
       WHERE gmt_handle <= SYSDATE AND load_balance_num IN (?) AND ROWNUM <= ?;
    3. Combined: the two dimensions above can also be used together. This fits function-based scaling where some task types are too heavy for a single server; those task types are additionally scaled by load, and the query condition is:
       WHERE gmt_handle <= SYSDATE AND task_handler IN (?) AND load_balance_num IN (?) AND ROWNUM <= ?;
    4. Some readers may worry about query performance because these statements use IN. In practice this is not a concern: the number of task types and task servers is small (a few hundred task types at most), IN conditions can use an index, and ROWNUM further limits the rows read. When only a single value is involved, IN can be replaced with =.
Step 2: For each record returned, put its ID into a local cache (a static variable works, but it must be safe for concurrent access), look up the handler bean in the Spring container by the record's task_handler value, and hand the task to Spring's asynchronous thread pool for execution.
Step 3: When the handler finishes, it returns a result, and the task system then either updates the corresponding record in the tasks table (incrementing the retry count and setting the next execution time according to the retry policy) or deletes it (on success). Finally, the record ID is removed from the cache so that the cache does not grow without bound.
Step 4: When the next scheduled time arrives, the tasks table is scanned again using the rules in step 1 and the same processing repeats. The difference is that before a record is passed to its task_handler, the system first checks the local cache; if the record ID is already there, the task is still being processed and is skipped. This prevents long-running tasks from being executed repeatedly or concurrently.
Step 5: For retries after a failure, configure a retry policy with a different delay for each attempt, for example retry 1 minute after the first failure, 5 minutes after the second, 20 minutes after the third, and so on. After more than X failures, move the record to the history table and send an alert email.
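The scan-and-dispatch logic above can be sketched with plain JDK types. This is not the original NotifyScheduledMainExecutor: the ScanCycle class, its Row record, and the use of a Predicate as the handler type are hypothetical simplifications. buildScanSql produces the query variants listed above, the in-flight set plays the role of the local ID cache, and onResult is where the delete or retry update would go.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.function.BiConsumer;
import java.util.function.Predicate;

// Hypothetical sketch of one scan-and-dispatch cycle on a consumer server.
final class ScanCycle {
    // One scanned row; only the columns needed for dispatch.
    record Row(String taskId, String taskHandler, String taskParams) {}

    // Scan query for the scaling policy in use; a count of 0 omits that IN filter.
    static String buildScanSql(int handlerCount, int loadBalanceNumCount) {
        StringBuilder sql = new StringBuilder("SELECT * FROM tasks WHERE gmt_handle <= SYSDATE");
        if (handlerCount > 0) {
            sql.append(" AND task_handler IN (").append(marks(handlerCount)).append(')');
        }
        if (loadBalanceNumCount > 0) {
            sql.append(" AND load_balance_num IN (").append(marks(loadBalanceNumCount)).append(')');
        }
        return sql.append(" AND ROWNUM <= ?").toString();
    }

    // n JDBC placeholders separated by commas, e.g. "?, ?, ?".
    private static String marks(int n) {
        return String.join(", ", Collections.nCopies(n, "?"));
    }

    private final Set<String> inFlight = ConcurrentHashMap.newKeySet(); // task_ids running here
    private final Map<String, Predicate<String>> handlers; // task_handler -> handler; true = success
    private final Executor pool;
    private final BiConsumer<Row, Boolean> onResult;         // would delete the row or schedule a retry

    ScanCycle(Map<String, Predicate<String>> handlers, Executor pool,
              BiConsumer<Row, Boolean> onResult) {
        this.handlers = handlers;
        this.pool = pool;
        this.onResult = onResult;
    }

    // Submits each row unless its type is unknown or it is still running from an earlier scan.
    int dispatch(List<Row> rows) {
        int submitted = 0;
        for (Row row : rows) {
            Predicate<String> handler = handlers.get(row.taskHandler());
            if (handler == null || !inFlight.add(row.taskId())) {
                continue;
            }
            try {
                pool.execute(() -> {
                    boolean ok;
                    try {
                        ok = handler.test(row.taskParams());
                    } catch (RuntimeException e) {
                        ok = false; // this sketch treats an exception as an ordinary failure
                    }
                    try {
                        onResult.accept(row, ok);
                    } finally {
                        inFlight.remove(row.taskId()); // keep the cache from growing without bound
                    }
                });
                submitted++;
            } catch (RejectedExecutionException e) {
                inFlight.remove(row.taskId()); // not queued; a later scan will pick it up again
            }
        }
        return submitted;
    }
}
```

In a Spring deployment the handler map would be looked up from the container by bean ID, and the Executor would be the configured thread pool.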

Detailed Design

Based on the design above, we can draw a general class diagram; the implementation can follow it:
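As a rough textual companion to the class diagram, the interfaces below sketch the extension points that the Spring configuration exposes: the business handler, handler selection, load-balance value selection, and the in-progress ID cache. All names and signatures here are hypothetical illustrations, not the original code.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Implemented by each business task type and registered as a Spring bean whose ID
// matches task_handler. Returns true on success, false to request a retry.
interface TaskHandler {
    boolean execute(String taskParams);
}

// Which task_handler values this server scans for (empty list = all types).
interface HandlerStrategy {
    List<String> handlersToScan();
}

// Which load_balance_num values this server scans for (empty list = all values).
interface LoadBalanceNumStrategy {
    List<Integer> loadBalanceNumsToScan();
}

// Local record of task IDs currently executing, used to skip duplicates.
interface IdCacheStrategy {
    boolean tryMark(String taskId); // false if the task is already running on this server
    void unmark(String taskId);
}

// Default in-memory cache shared by all scheduling cycles in one JVM.
final class DefaultIdCacheStrategy implements IdCacheStrategy {
    private final Set<String> running = ConcurrentHashMap.newKeySet();

    @Override
    public boolean tryMark(String taskId) {
        return running.add(taskId);
    }

    @Override
    public void unmark(String taskId) {
        running.remove(taskId);
    }
}
```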

The following Spring configuration shows how the core classes in the class diagram are used:

<!-- The scheduled tasks are loaded here -->
<bean id="notifySpringScheduledExecutorFactoryBean"
      class="org.springframework.scheduling.concurrent.ScheduledExecutorFactoryBean">
    <property name="scheduledExecutorTasks">
        <list>
            <ref bean="notifySpringScheduledExecutorTask"/>
        </list>
    </property>
</bean>

<!-- Task to be scheduled by Spring -->
<bean id="notifySpringScheduledExecutorTask"
      class="org.springframework.scheduling.concurrent.ScheduledExecutorTask">
    <property name="runnable" ref="notifyScheduledMainExecutor"/>
    <!-- Delay before the first run, in ms. Default 0 (run immediately when loaded); here 1 min -->
    <property name="delay" value="60000"/>
    <!-- Interval between runs, in ms. By default the task runs only once; here 2 min -->
    <property name="period" value="120000"/>
    <!-- Whether to use fixedRate scheduling. Default false, that is, fixedDelay -->
    <!-- fixedRate: runs start at a fixed interval; fixedDelay: a fixed delay follows the end of each run -->
    <property name="fixedRate" value="true"/>
</bean>

<!-- Main scheduling thread -->
<bean id="notifyScheduledMainExecutor" class="com.alisoft.aep.notify.schedual.NotifyScheduledMainExecutor">
    <!-- Notify server service, used to update the retry information of notify records -->
    <property name="notifyServerService" ref="notifyServerService"/>
    <!-- Cache strategy for notify.notifyId values in progress; can be extended -->
    <property name="notifyIdCacheStrategy" ref="defaultNotifyIdCacheStrategy"/>
    <!-- Strategy for generating notify.load_balance_num and choosing its values in the scan's WHERE clause; can be extended -->
    <!-- Useful when there are multiple notify servers, to balance the load on each server; usually not needed -->
    <!-- <property name="loadBalanceNumStrategy" ref="alternateLoadBalanceNumStrategy"/> -->
    <!-- Strategy for choosing the notify.handler values in the scan's WHERE clause; can be extended -->
    <!-- Useful when there are multiple notify servers, to specify which handlers a server executes; usually not needed -->
    <!-- <property name="notifyHandlerStrategy" ref="defaultNotifyHandlerStrategy"/> -->
    <!-- Useful when there are multiple notify servers: maximum notify records a server reads per scan, overriding maxNum; usually not needed -->
    <!-- <property name="notifyMaxNumPerJobStrategy" ref="defaultNotifyMaxNumPerJobStrategy"/> -->
    <!-- Thread pool for concurrent execution -->
    <property name="notifyTaskExecutor" ref="notifyTaskExecutor"/>
    <!-- Maximum number of notify records read per scan; default 1000 -->
    <property name="maxNum" value="1000"/>
    <property name="notifyDao" ref="notifyDao"/>
</bean>

<!-- Asynchronous thread pool -->
<bean id="notifyTaskExecutor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
    <!-- Number of core threads; default 1 -->
    <property name="corePoolSize" value="10"/>
    <!-- Maximum number of threads; default Integer.MAX_VALUE -->
    <property name="maxPoolSize" value="50"/>
    <!-- Maximum queue length; should generally be >= NotifyScheduledMainExecutor.maxNum; default Integer.MAX_VALUE -->
    <property name="queueCapacity" value="1000"/>
    <!-- How long idle threads are kept alive, in s; default 60 -->
    <property name="keepAliveSeconds" value="300"/>
    <!-- Only AbortPolicy and CallerRunsPolicy are supported for rejected tasks here; Spring's own default is AbortPolicy -->
    <property name="rejectedExecutionHandler">
        <!-- AbortPolicy: throws java.util.concurrent.RejectedExecutionException directly -->
        <!-- CallerRunsPolicy: the submitting (main) thread runs the task itself, then tries to add the next task to the pool; this slows down the rate at which tasks are added -->
        <!-- DiscardOldestPolicy: discards the oldest queued task; not supported at the moment, a discarded task cannot be executed again -->
        <!-- DiscardPolicy: discards the current task; not supported at the moment, a discarded task cannot be executed again -->
        <bean class="java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy"/>
    </property>
</bean>

<bean id="notifyServerService" class="com.alisoft.aep.notify.service.impl.NotifyServerServiceImpl">
    <!-- Strategy for retrying a notify record after a task fails; can be extended -->
    <property name="notifyRetryStrategy" ref="defaultNotifyRetryStrategy"/>
    <!-- Strategy for handling exceptions after a task fails; can be extended -->
    <!-- By default nothing is remedied: if the handler returns null or throws, the notify record is moved straight to the history table without retrying -->
    <!-- <property name="notifyHandlerExceptionStrategy" ref="defaultNotifyHandlerExceptionStrategy"/> -->
    <!-- See NotifyScheduledMainExecutor -->
    <property name="notifyIdCacheStrategy" ref="defaultNotifyIdCacheStrategy"/>
    <!-- Transaction template; make sure the corresponding bean exists -->
    <property name="transactionTemplate" ref="transactionTemplate"/>
    <property name="notifyDao" ref="notifyDao"/>
</bean>
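For comparison, the thread-pool bean above corresponds roughly to the following plain-JDK construction; Spring's ThreadPoolTaskExecutor backs a positive queueCapacity with a LinkedBlockingQueue. TaskPoolFactory is a hypothetical name, and the numbers are copied from the configuration.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical plain-JDK equivalent of the thread-pool bean (values from the XML).
final class TaskPoolFactory {
    static ThreadPoolExecutor create() {
        return new ThreadPoolExecutor(
                10,                                    // corePoolSize
                50,                                    // maxPoolSize
                300, TimeUnit.SECONDS,                 // keepAliveSeconds for threads above the core size
                new LinkedBlockingQueue<>(1000),       // queueCapacity, sized >= maxNum rows per scan
                new ThreadPoolExecutor.CallerRunsPolicy()); // when full, the submitting thread runs the task
    }
}
```

One consequence of a bounded queue: the JDK pool only starts threads beyond corePoolSize once the queue is full, so with these numbers a single scan of up to 1000 rows is served by the 10 core threads, and maxPoolSize only matters when the queue fills up.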

Are the Design Objectives Achieved?

- Lightweight: The core implementation depends only on Spring and a DAO layer, and you are free to choose the persistence framework. It can be deployed in any web container, which is the biggest improvement over the JMS-based system.
- Simple: Task producers only need to insert records into the tasks table; no other communication protocol is involved.
  For task consumers, the system depends only on Spring, so it is easy to integrate into existing systems: add the JAR, and add the iBATIS and Spring configuration files to your system's load list.
  In addition, job scheduling is configured with Spring's scheduling support, which needs fewer configuration files than Quartz.
- Efficient: With fixedRate scheduling, throughput can be estimated: if 1000 records are fetched every minute, one server can process up to 1000 × 60 × 24 = 1,440,000 tasks per day. Of course, the actual execution time of each task must be considered, because the system may not finish 1000 tasks within one minute.
  With fixedDelay scheduling, throughput depends entirely on task execution time: for example, with a delay of 1 second after each scan completes, the system is effectively processing tasks all the time, which maximizes utilization.
  Some may wonder whether multiple consumer servers querying the tasks table causes performance problems. In our operating experience it does not: records are deleted once they execute successfully, so the table stays small unless consumer services are down across the board, which is rare. Even then, when the consumers restart, the extra load is moderate, because each query fetches only the first XX due records and indexes can help.
- General: The system implements only the core asynchronous processing and knows nothing about specific business logic. It loads the business implementation according to task_handler; a handler only needs to implement the corresponding interface and be configured as a Spring bean.
- Scalable: Distributed linear scaling can be based on either the task_handler or the load_balance_num field of the tasks table, or on both combined; the two fields correspond to two different scaling strategies. This is fully transparent to clients: task producers only need to apply the matching policy when inserting. In addition, by combining these two policies sensibly, new task types can be added without redeploying consumer services that are already running.
- Reliable: Pending tasks are stored in a reliable database that we maintain, so unprocessed tasks are not lost when a consumer service goes down. Compared with JMS servers that rely on periodically persisting an in-memory store, the reliability of a business database is on a different level.
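The throughput arithmetic and the two scheduling modes can be checked with plain JDK scheduling, which Spring's ScheduledExecutorTask delegates to. ScanScheduler is a hypothetical helper. Note that with scheduleAtFixedRate the JDK never runs two executions of the same task concurrently: a scan that overruns its period simply delays the next one.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper showing the fixedRate / fixedDelay choice with the plain JDK API.
final class ScanScheduler {
    // Upper bound on rows scanned per day under fixedRate: maxNum rows every period.
    static long dailyCapacity(int maxNumPerScan, long periodSeconds) {
        if (periodSeconds <= 0) {
            throw new IllegalArgumentException("period must be positive");
        }
        long scansPerDay = TimeUnit.DAYS.toSeconds(1) / periodSeconds;
        return scansPerDay * maxNumPerScan;
    }

    static ScheduledFuture<?> start(ScheduledExecutorService ses, Runnable scan,
                                    long initialDelayMs, long periodMs, boolean fixedRate) {
        if (fixedRate) {
            // Next start = previous start + period (a late scan delays the next; no overlap).
            return ses.scheduleAtFixedRate(scan, initialDelayMs, periodMs, TimeUnit.MILLISECONDS);
        }
        // Next start = previous end + period.
        return ses.scheduleWithFixedDelay(scan, initialDelayMs, periodMs, TimeUnit.MILLISECONDS);
    }
}
```

With the configuration above (a 120000 ms period and maxNum 1000), dailyCapacity(1000, 120) gives an upper bound of 720,000 scanned rows per day.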

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
