Design and Implementation of distributed scheduled tasks

http://netkiller.github.io/journal/scheduler.html
Mr. Neo Chen (netkiller), Chen Jingfeng (bg7nyt)


Xishan Meidi, Minzhi Street, Longhua New District, Shenzhen, Guangdong, China
518131
+86 13113668890
+86 755 29812080
<[Email protected]>

Copyright © 2014 http://netkiller.github.io

Copyright Notice

To reprint this article, contact the author. When reprinting, be sure to indicate the original source, the author information, and this statement.

Document Source:
http://netkiller.github.io
http://netkiller.sourceforge.net

2014-09-30

Summary

This article discusses distributed software development through the design of distributed scheduled-task software.

My Documents
Netkiller Architect handbook, Netkiller Developer handbook, Netkiller PHP handbook, Netkiller Python handbook, Netkiller Testing handbook, Netkiller Cryptography handbook,
Netkiller Linux handbook, Netkiller Debian handbook, Netkiller CentOS handbook, Netkiller FreeBSD handbook, Netkiller Shell handbook, Netkiller Security handbook,
Netkiller Web handbook, Netkiller Monitoring handbook, Netkiller Storage handbook, Netkiller Mail handbook, Netkiller Docbook handbook, Netkiller Version handbook,
Netkiller Database handbook, Netkiller PostgreSQL handbook, Netkiller MySQL handbook, Netkiller NoSQL handbook, Netkiller LDAP handbook, Netkiller Network handbook,
Netkiller Cisco IOS handbook, Netkiller H3C handbook, Netkiller Multimedia handbook, Netkiller Perl handbook, Netkiller Amateur Radio handbook, Netkiller DevOps handbook
Directory
  • 1. What is a distributed scheduled task?
  • 2. Why use distributed scheduled tasks?
  • 3. When to use distributed scheduled tasks
  • 4. Deploy distributed scheduled tasks
  • 5. Who will write distributed scheduled tasks?
  • 6. How to implement distributed scheduled tasks
    • 6.1. Distributed mutex lock
    • 6.2. Queue
    • 6.3. Miscellaneous
1. What is a distributed scheduled task?

First, let's explain what a scheduled task is. A scheduled task is a program that runs at a fixed time or on a recurring schedule. The most common examples are Linux's crontab and the Windows Task Scheduler. We often use them to implement our scheduled tasks because their time-scheduling mechanisms are mature and there is no need to develop another one.
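For reference, a crontab entry consists of five time fields (minute, hour, day of month, month, day of week) followed by the command to run. A minimal sketch (the script paths here are hypothetical):

```shell
# m    h  dom mon dow  command
# Run a hypothetical report script every day at 02:30
30 2 * * * /opt/app/bin/daily-report.sh

# Run a hypothetical cleanup job every 10 minutes
*/10 * * * * /opt/app/bin/cleanup.sh
```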

2. Why use distributed scheduled tasks?

At first, we used the crontab scheduler like most people, but as the project grew and the system became more complex, many problems arose.

The first is the high-availability (HA) requirement: when the server running the scheduled tasks fails, all scheduled tasks stop working.

The second is performance: more and more large scheduled tasks run CPU/IO-intensive operations, and a single node can no longer meet our needs.

To keep scheduled tasks running 365 days a year, 24 hours a day without interruption, there must be an effective solution. I realized that a new distributed scheduled-task framework had to be developed, so that developers would not need to worry about how tasks run in a distributed way and could concentrate on writing the tasks themselves.

I proposed that this framework must have the following features:
  1. Failover: there must be at least two nodes. When one node fails, another node automatically takes over its tasks through a health-check program.

  2. Distributed running: a task can run on multiple nodes at the same time, with control over running order, concurrency, and mutual exclusion.

  3. Dynamic node adjustment: with at least two nodes, you can add or remove nodes at any time.

  4. State sharing: tasks may need to communicate, for example to synchronize state.

3. When to use distributed scheduled tasks
  1. When you encounter performance problems, you may first think of splitting the load across servers, but many applications cannot run across multiple servers.

  2. High Availability: If one node fails, the other node will take over and continue running.

  3. Disaster recovery: you can deploy scheduled-task nodes in two or more data centers. If an entire data center fails, the nodes in the other data centers continue to run.

4. Deploy distributed scheduled tasks

Deploying two nodes

With two nodes you can implement a master/slave scheme, a queued (Queue) running scheme, or a parallel scheme. The parallel scheme divides into synchronous and asynchronous running, and may also involve mutual-exclusion operations.

Deploying more than two nodes

For multiple nodes, we recommend the queued running scheme or the parallel scheme, but not the mutually exclusive parallel scheme (it wastes resources).

5. Who will write distributed scheduled tasks?

Once our distributed scheduled-task framework is complete, writing tasks becomes very easy: you only need to inherit from the framework classes to gain distributed-running capability.

6. How to implement distributed scheduled tasks

Scheduled tasks are a complex topic. They include operating-system scheduled tasks, application scheduled tasks, TCP/IP-based and command-line-based invocation, one-off execution, periodic execution, and triggering on certain conditions. In short, their disaster-recovery planning is far more complex than that of web servers, caches, or databases.

Figure 1. Time-Sharing Solution

Strictly divide time into slices and run the scheduled tasks on the nodes alternately. When the main system goes down, the standby system still works. Disadvantage: the processing cycle is prolonged.
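The time-sharing idea can be sketched in a few lines: divide time into fixed slots and let slot k belong to node k mod N. This is an illustrative sketch, not code from the article; `node_index` and `slot_seconds` are assumed parameters.

```python
import time

def my_turn(node_index, node_count, slot_seconds=60, now=None):
    """Return True if this node owns the current time slice.

    Time is divided into fixed slots of slot_seconds; slot k belongs to
    node (k % node_count). If every node runs this check before starting
    a task, the nodes alternate automatically, and a dead node merely
    causes its slots to be skipped (the prolonged-cycle disadvantage).
    """
    if now is None:
        now = time.time()
    slot = int(now) // slot_seconds
    return slot % node_count == node_index
```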

Figure 2. HA high-availability solution

Under normal circumstances the main system works and the standby system waits; when heartbeat detection finds that the main system has failed, the standby system takes over. Disadvantages: only one system is active, so there is no load balancing; only vertical scaling (hardware upgrades) is possible, not horizontal scaling.

Figure 3. Multi-path heartbeat Scheme

The HA scheme above is implemented with layer-3 VIP technology. In the following scheme I use multiple heartbeat channels for service-level, process-level, and IP/port-level detection. Under normal circumstances the main system works and the standby system waits; when heartbeat detection finds that the main system has failed, the standby system takes over, and once the main system is detected again, execution is handed back to it. Disadvantages: complicated development and high robustness requirements on the program.

Figure 4. Task preemption plan

Servers A and B work at the same time. Whichever server starts first locks first; the other servers can only wait while monitoring the mutex lock. Once the lock is found to be released, they compete to acquire the exclusive lock before running. Advantage: it can be further optimized to achieve horizontal scaling across multiple servers. Disadvantages: complicated development, high robustness requirements, and sometimes the lock is not released.
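The preemption loop described above can be sketched with an in-memory stand-in for the shared lock (in a real cluster the lock would live in an external store such as Redis or ZooKeeper; `ClusterMutex` and `run_task` are hypothetical names):

```python
import threading

class ClusterMutex:
    """In-memory stand-in for a cluster-wide mutex (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None

    def try_acquire(self, node):
        # Non-blocking: whoever locks first wins the round.
        if self._lock.acquire(blocking=False):
            self.owner = node
            return True
        return False

    def release(self, node):
        # Only the current owner may release the lock.
        if self.owner == node:
            self.owner = None
            self._lock.release()

def run_task(mutex, node, results):
    """Preemption: run the task only if we grabbed the lock; always release."""
    if mutex.try_acquire(node):
        try:
            results.append(node)  # the scheduled work itself
        finally:
            mutex.release(node)
```

Note the `finally` clause: as the text warns, a lock that is never released blocks everyone, which is why real implementations also attach a timeout (see Section 6.1).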

Figure 5. Task Round Robin or task round robin + preemption queuing Scheme
  1. Each server joins the queue when it starts for the first time.

  2. Before running, each task first checks whether it is this node's turn to run.

  3. If it is not, the node checks whether it is still in the queue: if it is, it yields this round; if not, it re-joins the queue.
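The three steps above can be sketched as a small turn-taking ring (`TaskRing` is a hypothetical name; a real deployment would keep this queue in shared storage):

```python
from collections import deque

class TaskRing:
    """Round-robin turn-taking: nodes join a queue; the node at the head
    runs, then the turn rotates to the back (illustrative sketch)."""

    def __init__(self):
        self.queue = deque()

    def register(self, node):
        # Step 1: each server joins the queue when it first starts.
        if node not in self.queue:
            self.queue.append(node)

    def may_run(self, node):
        # Step 2: a task runs only when its node is at the head of the queue.
        if not self.queue or self.queue[0] != node:
            self.register(node)  # Step 3: re-join the queue if we dropped out.
            return False
        self.queue.rotate(-1)    # Hand the turn to the next node.
        return True
```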

6.1. Distributed mutex lock

A mutex lock is also called an exclusive lock. It manages concurrency among multiple processes or threads: only one process or thread may execute a given function at a time. If you understand what a mutex lock is, a distributed lock is easy to understand.

We extend in-process and in-thread locks out onto the network, locking and unlocking across processes or threads running on different nodes. In this way you can control the concurrency of processes or threads across nodes.

+------------------+                             +------------------+
| Server A         |                             | Server B         |
+------------------+      +---------------+      +------------------+
| Thread-1         |      | Cluster Mutex |      | Thread-1         |
| Thread-2         |----> +---------------+ <----| Thread-2         |
| Thread-3         |      | A Thread-2    |      | Thread-3         |
+------------------+      +---------------+      +------------------+
                                  |
                                  V
                          +---------------+
                          | Cluster Mutex |
                          +---------------+
                          | A Thread-2    |
                          +---------------+

Two servers are running tasks. Thread-2 on Server A has acquired the lock; every other thread must wait for it to release the lock before running.

You may ask: what if Server A goes down, will the lock be held forever? My answer is that each lock has a timeout threshold; once the lock times out, it is automatically released.
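The timeout threshold can be sketched as a lock that expires on its own (`ExpiringLock` is a hypothetical name; in practice this is what, for example, a Redis key with a TTL provides):

```python
import time

class ExpiringLock:
    """Lock with a timeout threshold: if the owner dies without unlocking,
    the lock expires and another node may take it (illustrative sketch)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.owner = None
        self.acquired_at = 0.0

    def acquire(self, node, now=None):
        now = time.time() if now is None else now
        # Free, or held past its timeout threshold: grant the lock.
        if self.owner is None or now - self.acquired_at >= self.ttl:
            self.owner, self.acquired_at = node, now
            return True
        return False

    def release(self, node):
        if self.owner == node:
            self.owner = None
```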

In addition, we need to consider the "domain" issue, also called a namespace, which mainly prevents locks with the same name from overwriting one another.

6.2. Queue

Queuing

+------------------+                             +------------------+
| Server A         |                             | Server B         |
+------------------+      +---------------+      +------------------+
| Thread-1         |      | Task Queue A  |      | Thread-1         |
| Thread-2         |----> +---------------+ <----| Thread-2         |
| Thread-3         |      | A Thread-2    |      | Thread-3         |
+------------------+      | B Thread-1    |      +------------------+
                          | B Thread-3    |
                          | A Thread-3    |
                          +---------------+
                                  |
                                  | <sync>
                                  V
                          +---------------+
                          | Task Queue B  |
                          +---------------+
                          | A Thread-2    |
                          | B Thread-1    |
                          | B Thread-3    |
                          | A Thread-3    |
                          +---------------+

As the figure shows, the tasks in the queue run in order from top to bottom.

Note that the task queue itself requires two nodes in a master/slave structure: node A synchronizes its state to node B in real time, and if node A fails, node B immediately takes over.

6.3. Miscellaneous

Making the scheduled tasks run in a distributed manner is not, by itself, a complete guarantee; the other services they depend on, such as databases and caches, must be adapted as well.

