Design and Implementation of distributed scheduled tasks

http://netkiller.github.io/journal/scheduler.html
Mr. Neo Chen (netkiller), Chen Jingfeng (bg7nyt)


Xishan Meidi, Minzhi Street, Longhua New District, Shenzhen, Guangdong, China
518131
+86 13113668890
+86 755 29812080
<[Email protected]>

Copyright © 2014 http://netkiller.github.io

Copyright Notice

To reprint this article, contact the author. When reprinting, be sure to indicate the original source, the author information, and this statement.

Document Source:
http://netkiller.github.io
http://netkiller.sourceforge.net

2014-09-30

Summary

This article discusses distributed software development through the design of distributed scheduled-task software.

My Documents
Netkiller Architect handbook, Netkiller Developer handbook, Netkiller PHP handbook, Netkiller Python handbook, Netkiller Testing handbook, Netkiller Cryptography handbook,
Netkiller Linux handbook, Netkiller Debian handbook, Netkiller CentOS handbook, Netkiller FreeBSD handbook, Netkiller Shell handbook, Netkiller Security handbook,
Netkiller Web handbook, Netkiller Monitoring handbook, Netkiller Storage handbook, Netkiller Mail handbook, Netkiller Docbook handbook, Netkiller Version handbook,
Netkiller Database handbook, Netkiller PostgreSQL handbook, Netkiller MySQL handbook, Netkiller NoSQL handbook, Netkiller LDAP handbook, Netkiller Network handbook,
Netkiller Cisco IOS handbook, Netkiller H3C handbook, Netkiller Multimedia handbook, Netkiller Perl handbook, Netkiller Amateur Radio handbook, Netkiller DevOps handbook
Directory
  • 1. What is a distributed scheduled task?
  • 2. Why use distributed scheduled tasks?
  • 3. When to use distributed scheduled tasks
  • 4. Deploy distributed scheduled tasks
  • 5. Who will write distributed scheduled tasks?
  • 6. How to implement distributed scheduled tasks
    • 6.1. Distributed mutex lock
    • 6.2. Queue
    • 6.3. Miscellaneous
1. What is a distributed scheduled task?

First, let's explain what a scheduled task is. A scheduled task is a program that runs at a fixed time or on a recurring schedule. The most common examples are Linux's crontab and the Windows Task Scheduler. We often use them to implement our scheduled tasks because their time-scheduling mechanisms are mature and there is no need to develop another one.
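For reference, a crontab entry consists of five time fields (minute, hour, day of month, month, day of week) followed by the command to run. A minimal sketch (the script paths here are hypothetical):

```shell
# m    h  dom mon dow  command
# Run a hypothetical report script every day at 02:30
30 2 * * * /opt/app/bin/daily-report.sh

# Run a hypothetical cleanup job every 10 minutes
*/10 * * * * /opt/app/bin/cleanup.sh
```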

2. Why use distributed scheduled tasks?

At first, we used the crontab scheduler like most people, but as the project grew and the system became more complex, many problems arose.

The first is the high-availability (HA) requirement: when the server running the scheduled tasks fails, all scheduled tasks stop working.

The second is performance: more and more large scheduled tasks run CPU/IO-intensive operations, and a single node can no longer meet our needs.

To keep scheduled tasks running 365 days a year, 24 hours a day without interruption, there must be an effective solution. I realized that a new distributed scheduled-task framework had to be developed, so that developers would not need to worry about how tasks run in a distributed way and could concentrate on writing the tasks themselves.

I proposed that this framework must have the following features:
  1. Failover: there must be at least two nodes. When one node fails, another node automatically takes over its tasks through a health-check program.

  2. Distributed running: a task can run on multiple nodes at the same time, with control over running order, concurrency, and mutual exclusion.

  3. Dynamic node adjustment: with at least two nodes, you can add or remove nodes at any time.

  4. State sharing: tasks may need to communicate, for example to synchronize state.

3. When to use distributed scheduled tasks
  1. When you encounter performance problems, you may first think of splitting the load across servers, but many applications cannot run across multiple servers.

  2. High Availability: If one node fails, the other node will take over and continue running.

  3. Disaster recovery: you can deploy scheduled-task nodes in two or more data centers. If an entire data center fails, the nodes in the other data centers continue to run.

4. Deploy distributed scheduled tasks

Deploying two nodes

With two nodes you can implement a master/slave scheme, a queued (Queue) running scheme, or a parallel scheme. The parallel scheme divides into synchronous and asynchronous running, and may also involve mutual-exclusion operations.

Deploying more than two nodes

For multiple nodes, we recommend the queued running scheme or the parallel scheme, but not the mutually exclusive parallel scheme (it wastes resources).

5. Who will write distributed scheduled tasks?

Once our distributed scheduled-task framework is complete, writing tasks becomes very easy: you only need to inherit from the framework classes to gain distributed-running capability.

6. How to implement distributed scheduled tasks

Scheduled tasks are a complex topic. They include operating-system scheduled tasks, application scheduled tasks, TCP/IP-based and command-line-based invocation, one-off execution, periodic execution, and triggering on certain conditions. In short, their disaster-recovery planning is far more complex than that of web servers, caches, or databases.

Figure 1. Time-Sharing Solution

Strictly divide time into slices and run the scheduled tasks on the nodes alternately. When the main system goes down, the standby system still works. Disadvantage: the processing cycle is prolonged.
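The time-sharing idea can be sketched in a few lines: divide time into fixed slots and let slot k belong to node k mod N. This is an illustrative sketch, not code from the article; `node_index` and `slot_seconds` are assumed parameters.

```python
import time

def my_turn(node_index, node_count, slot_seconds=60, now=None):
    """Return True if this node owns the current time slice.

    Time is divided into fixed slots of slot_seconds; slot k belongs to
    node (k % node_count). If every node runs this check before starting
    a task, the nodes alternate automatically, and a dead node merely
    causes its slots to be skipped (the prolonged-cycle disadvantage).
    """
    if now is None:
        now = time.time()
    slot = int(now) // slot_seconds
    return slot % node_count == node_index
```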

Figure 2. HA high-availability solution

Under normal circumstances the main system works and the standby system waits; when heartbeat detection finds that the main system has failed, the standby system takes over. Disadvantages: only one system is active, so there is no load balancing; only vertical scaling (hardware upgrades) is possible, not horizontal scaling.

Figure 3. Multi-path heartbeat Scheme

The HA scheme above is implemented with layer-3 VIP technology. In the following scheme I use multiple heartbeat channels for service-level, process-level, and IP/port-level detection. Under normal circumstances the main system works and the standby system waits; when heartbeat detection finds that the main system has failed, the standby system takes over, and once the main system is detected again, execution is handed back to it. Disadvantages: complicated development and high robustness requirements on the program.

Figure 4. Task preemption plan

Servers A and B work at the same time. Whichever server starts first locks first; the other servers can only wait while monitoring the mutex lock. Once the lock is found to be released, they compete to acquire the exclusive lock before running. Advantage: it can be further optimized to achieve horizontal scaling across multiple servers. Disadvantages: complicated development, high robustness requirements, and sometimes the lock is not released.
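The preemption loop described above can be sketched with an in-memory stand-in for the shared lock (in a real cluster the lock would live in an external store such as Redis or ZooKeeper; `ClusterMutex` and `run_task` are hypothetical names):

```python
import threading

class ClusterMutex:
    """In-memory stand-in for a cluster-wide mutex (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None

    def try_acquire(self, node):
        # Non-blocking: whoever locks first wins the round.
        if self._lock.acquire(blocking=False):
            self.owner = node
            return True
        return False

    def release(self, node):
        # Only the current owner may release the lock.
        if self.owner == node:
            self.owner = None
            self._lock.release()

def run_task(mutex, node, results):
    """Preemption: run the task only if we grabbed the lock; always release."""
    if mutex.try_acquire(node):
        try:
            results.append(node)  # the scheduled work itself
        finally:
            mutex.release(node)
```

Note the `finally` clause: as the text warns, a lock that is never released blocks everyone, which is why real implementations also attach a timeout (see Section 6.1).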

Figure 5. Task Round Robin or task round robin + preemption queuing Scheme
  1. Each server joins the queue when it starts for the first time.

  2. Before running, each task first checks whether it is this node's turn to run.

  3. If it is not, the node checks whether it is still in the queue: if it is, it yields this round; if not, it re-joins the queue.
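The three steps above can be sketched as a small turn-taking ring (`TaskRing` is a hypothetical name; a real deployment would keep this queue in shared storage):

```python
from collections import deque

class TaskRing:
    """Round-robin turn-taking: nodes join a queue; the node at the head
    runs, then the turn rotates to the back (illustrative sketch)."""

    def __init__(self):
        self.queue = deque()

    def register(self, node):
        # Step 1: each server joins the queue when it first starts.
        if node not in self.queue:
            self.queue.append(node)

    def may_run(self, node):
        # Step 2: a task runs only when its node is at the head of the queue.
        if not self.queue or self.queue[0] != node:
            self.register(node)  # Step 3: re-join the queue if we dropped out.
            return False
        self.queue.rotate(-1)    # Hand the turn to the next node.
        return True
```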

6.1. Distributed mutex lock

A mutex lock is also called an exclusive lock. It manages concurrency among multiple processes or threads: only one process or thread may execute a given function at a time. If you understand what a mutex lock is, a distributed lock is easy to understand.

We extend in-process and in-thread locks out onto the network, locking and unlocking across processes or threads running on different nodes. In this way you can control the concurrency of processes or threads across nodes.

+------------------+                             +------------------+
| Server A         |                             | Server B         |
+------------------+      +---------------+      +------------------+
| Thread-1         |      | Cluster Mutex |      | Thread-1         |
| Thread-2         |----> +---------------+ <----| Thread-2         |
| Thread-3         |      | A Thread-2    |      | Thread-3         |
+------------------+      +---------------+      +------------------+
                                  |
                                  V
                          +---------------+
                          | Cluster Mutex |
                          +---------------+
                          | A Thread-2    |
                          +---------------+

Two servers are running tasks. Thread-2 on Server A has acquired the lock; every other thread must wait for it to release the lock before running.

You may ask: what if Server A goes down, will the lock be held forever? My answer is that each lock has a timeout threshold; once the lock times out, it is automatically released.
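The timeout threshold can be sketched as a lock that expires on its own (`ExpiringLock` is a hypothetical name; in practice this is what, for example, a Redis key with a TTL provides):

```python
import time

class ExpiringLock:
    """Lock with a timeout threshold: if the owner dies without unlocking,
    the lock expires and another node may take it (illustrative sketch)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.owner = None
        self.acquired_at = 0.0

    def acquire(self, node, now=None):
        now = time.time() if now is None else now
        # Free, or held past its timeout threshold: grant the lock.
        if self.owner is None or now - self.acquired_at >= self.ttl:
            self.owner, self.acquired_at = node, now
            return True
        return False

    def release(self, node):
        if self.owner == node:
            self.owner = None
```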

In addition, we need to consider the "domain" issue, also called a namespace, which mainly prevents locks with the same name from overwriting one another.

6.2. Queue

Queuing

+------------------+                             +------------------+
| Server A         |                             | Server B         |
+------------------+      +---------------+      +------------------+
| Thread-1         |      | Task Queue A  |      | Thread-1         |
| Thread-2         |----> +---------------+ <----| Thread-2         |
| Thread-3         |      | A Thread-2    |      | Thread-3         |
+------------------+      | B Thread-1    |      +------------------+
                          | B Thread-3    |
                          | A Thread-3    |
                          +---------------+
                                  |
                                  | <sync>
                                  V
                          +---------------+
                          | Task Queue B  |
                          +---------------+
                          | A Thread-2    |
                          | B Thread-1    |
                          | B Thread-3    |
                          | A Thread-3    |
                          +---------------+

As the figure shows, the tasks in the queue run in order from top to bottom.

Note that the task queue itself requires two nodes in a master/slave structure: node A synchronizes its state to node B in real time, and if node A fails, node B immediately takes over.

6.3. Miscellaneous

Making the scheduled tasks run in a distributed manner is not, by itself, a complete guarantee; the other services they depend on, such as databases and caches, must be adapted as well.

