Implementing a Watchdog in a Multitasking System

Source: Internet
Author: User

Watchdogs are divided into hardware watchdogs and software watchdogs. A hardware watchdog uses a timer circuit whose timed output is connected to the reset pin of the circuit. The program clears the timer at regular intervals (commonly known as "feeding the dog"), so while the program runs normally the timer never overflows and no reset signal is generated. If the program fails and does not clear the watchdog within the timing period, the watchdog timer overflows, generates a reset signal, and restarts the system. A software watchdog works on the same principle, except that the timer on the hardware circuit is replaced by an internal timer of the processor. This simplifies the hardware design, but it is less reliable than a hardware timer: for example, a failure of the internal timer itself cannot be detected. Some designs use two timers that monitor each other, but this increases system overhead and still does not solve every problem. For example, if the interrupt system fails, the timer interrupts fail with it.

A watchdog is not a means of fixing system problems. Faults found during debugging should be traced to the design errors behind them and corrected. The purpose of a watchdog is to restore the system to normal operation automatically when it crashes because of latent program errors or severe environmental interference. A watchdog also cannot fully avoid the losses caused by a fault, because the system is out of service from the moment the fault occurs until the reset completes. In addition, some systems need to save their current state before the reset and restore it after the restart, which adds further hardware and software overhead.

Figure 1: (a) Multitasking system watchdog; (b) corresponding watchdog reset logic diagram.

In a single-task system the watchdog works as described above and is easy to implement. In a multitasking system the situation is somewhat more complicated. If every task feeds the watchdog the way a single-task program does, as shown in Figure 1(a), the watchdog timer does not overflow as long as one task still works normally and feeds the dog regularly. Only when all tasks have failed does the watchdog timer overflow and reset the system, as shown in Figure 1(b).

What is usually needed is for the system to reset as soon as any one task fails. It is also possible to select only a few critical tasks for monitoring, so that a fault in any of them resets the system, as shown in Figure 2(a). The corresponding watchdog reset logic is shown in Figure 2(b).
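The difference between the two reset conditions can be written as two boolean functions. This is a minimal sketch; the function names and the `failed[]` array are hypothetical and stand in for the logic diagrams in Figure 1(b) and Figure 2(b).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Figure 1(b) behavior: reset only when every task has stopped feeding. */
static bool reset_when_all_failed(const bool failed[], size_t n)
{
    assert(failed != NULL);
    for (size_t i = 0; i < n; i++)
        if (!failed[i])
            return false;   /* one healthy task keeps the dog fed */
    return n > 0;
}

/* Figure 2(b) behavior: reset as soon as any monitored task has failed. */
static bool reset_when_any_failed(const bool failed[], size_t n)
{
    assert(failed != NULL);
    for (size_t i = 0; i < n; i++)
        if (failed[i])
            return true;
    return false;
}
```

For three tasks of which only the second has failed, `reset_when_all_failed` returns false while `reset_when_any_failed` returns true, which is the behavior the text asks for.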

In the multitasking system, create a monitoring task, taskmonitor, with a higher priority than the monitored task group task1, task2, ..., taskn. While task1 to taskn work normally, taskmonitor clears the hardware watchdog timer at regular intervals. If some task_x in the monitored group fails, taskmonitor stops clearing the watchdog timer, so the system restarts automatically when a monitored task fails. If taskmonitor itself fails, it also cannot clear the watchdog timer in time, so the watchdog resets and restarts the system in that case as well. The remaining problem is how taskmonitor can monitor the task group effectively.

Figure 2: (a) Multitasking system watchdog; (b) correct watchdog reset logic diagram.

In taskmonitor, define an array of structs that simulates a group of watchdog timers:

    typedef struct
    {
        UINT32 curcnt, lastcnt;
        BOOL runstate;
        int taskid;
    } struct_watch_dog;

The struct holds the ID of the monitored task, the variables curcnt and lastcnt that simulate "feeding the dog" (their use is described below), and the watchdog status flag runstate, which controls whether the task is currently monitored.

Each of task1 to taskn calls the custom function createwatchdog(int taskid) to create its watchdog. A monitored task must feed the dog at regular intervals by calling resetwatchdog(int taskid); in essence, this operation adds 1 to the variable curcnt in the task's watchdog struct. taskmonitor spends most of its time in the delayed state. If the hardware watchdog times out after 2 seconds, the monitoring task can delay for 1.5 seconds and then check the created watchdogs one by one. Before the delay it saves the current value of each curcnt to lastcnt; after the delay it compares curcnt with lastcnt. If they differ, the task has fed the dog and is working normally. Note that if curcnt and lastcnt are too narrow and the dog is fed too frequently, the repeated increments may wrap curcnt all the way around so that it equals lastcnt again.
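The wraparound hazard mentioned at the end of the paragraph above is easy to reproduce with a deliberately narrow counter. The sketch below assumes an 8-bit counter purely for illustration; `looks_stalled_u8` is a hypothetical helper, not part of the original design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if an 8-bit feed counter looks unchanged to the monitor
 * after `feeds` feeds within one monitoring period.
 */
static bool looks_stalled_u8(uint8_t start, unsigned feeds)
{
    uint8_t lastcnt = start;   /* value saved before the delay */
    uint8_t curcnt = start;
    for (unsigned i = 0; i < feeds; i++)
        curcnt++;              /* unsigned arithmetic wraps from 255 to 0 */
    return curcnt == lastcnt;
}
```

A task that feeds exactly 256 times in one period would be judged dead even though it is healthy. With a 32-bit counter and a 1.5-second period, the same coincidence would need roughly 2.9 billion feeds per second, which is why a wide counter makes the problem negligible in practice.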

If curcnt equals lastcnt for any watchdog, the monitored task is deemed not to have fed the dog: it has failed and the system must be restarted. In that case taskmonitor either stops clearing the hardware watchdog timer or delays for a long time, such as 10 seconds, which is long enough for the system to be restarted. Otherwise the system is normal: task1 to taskn periodically feed taskmonitor, and taskmonitor periodically feeds the hardware watchdog, so the system is not reset. A monitored task can also call pausewatchdog(int taskid) to suspend its watchdog. This simply updates the runstate flag in the task's struct_watch_dog, which indicates whether the watchdog is active.

The maximum number of tasks that can be monitored this way is determined by the number of struct_watch_dog entries. The program should keep a variable that records the number of watchdogs created so far, so that checking whether task1 to taskn have fed the dog takes only n comparisons of curcnt and lastcnt.
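The software watchdog group described in the last few paragraphs might be sketched as follows. The article gives only the struct and the function names createwatchdog, resetwatchdog, and pausewatchdog; the function bodies, the table size `MAX_WATCHDOGS`, the names `wd_table` and `wd_count`, and the helpers `findwatchdog`, `snapshotwatchdogs`, and `allwatchdogsfed` are assumptions made for this example. Standard C types replace the VxWorks types so that the sketch compiles anywhere, and no locking between tasks is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WATCHDOGS 16          /* assumed table size */

typedef struct {
    uint32_t curcnt, lastcnt;     /* feed counter and its saved snapshot */
    bool     runstate;            /* true while this watchdog is monitored */
    int      taskid;              /* ID of the monitored task */
} struct_watch_dog;

static struct_watch_dog wd_table[MAX_WATCHDOGS];
static int wd_count = 0;          /* number of watchdogs created so far */

static struct_watch_dog *findwatchdog(int taskid)
{
    for (int i = 0; i < wd_count; i++)
        if (wd_table[i].taskid == taskid)
            return &wd_table[i];
    return NULL;
}

/* Register a watchdog for taskid; returns false if the table is full. */
static bool createwatchdog(int taskid)
{
    if (wd_count >= MAX_WATCHDOGS)
        return false;
    wd_table[wd_count++] = (struct_watch_dog){ 0, 0, true, taskid };
    return true;
}

/* "Feed the dog": called periodically by the monitored task. */
static void resetwatchdog(int taskid)
{
    struct_watch_dog *wd = findwatchdog(taskid);
    if (wd != NULL)
        wd->curcnt++;
}

/* Stop monitoring taskid by clearing its runstate flag. */
static void pausewatchdog(int taskid)
{
    struct_watch_dog *wd = findwatchdog(taskid);
    if (wd != NULL)
        wd->runstate = false;
}

/* Before the delay: save every counter to lastcnt. */
static void snapshotwatchdogs(void)
{
    for (int i = 0; i < wd_count; i++)
        wd_table[i].lastcnt = wd_table[i].curcnt;
}

/* After the delay: true only if every active watchdog was fed. */
static bool allwatchdogsfed(void)
{
    for (int i = 0; i < wd_count; i++)
        if (wd_table[i].runstate && wd_table[i].curcnt == wd_table[i].lastcnt)
            return false;
    return true;
}
```

In this sketch taskmonitor would call `snapshotwatchdogs()`, delay, and then feed the hardware watchdog only if `allwatchdogsfed()` returns true. The check loops over `wd_count` entries, which matches the n comparisons mentioned above.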

Figure 3: system reset logic diagram.

The hardware watchdog monitors taskmonitor, and taskmonitor in turn monitors the other tasks task1 to taskn, forming a chain. The resulting system fault logic is shown in Figure 3. The failures of task1 to taskn and of taskmonitor are combined by a logical OR, so a fault in any monitored task causes the hardware watchdog to reset the system.
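One link of this chain, a single taskmonitor cycle, can be sketched as below. The names `monitorcycle`, `feedhardwaredog`, and `hw_feeds` are hypothetical; on real hardware the feed would be a board-specific register write, which is replaced here by a counter so that the example can run anywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static unsigned hw_feeds = 0;     /* counts feeds of the simulated hardware dog */

/* Stand-in for the board-specific operation that clears the hardware timer. */
static void feedhardwaredog(void)
{
    hw_feeds++;
}

/*
 * One taskmonitor cycle, run after each delay: feed the hardware watchdog
 * only if every task counter moved since the previous cycle, then save the
 * counters for the next comparison.
 */
static bool monitorcycle(const uint32_t curcnt[], uint32_t lastcnt[], size_t n)
{
    bool all_fed = true;
    assert(curcnt != NULL && lastcnt != NULL);
    for (size_t i = 0; i < n; i++) {
        if (curcnt[i] == lastcnt[i])
            all_fed = false;      /* task i did not feed: failures are ORed */
        lastcnt[i] = curcnt[i];
    }
    if (all_fed)
        feedhardwaredog();
    return all_fed;
}
```

Because a single stalled counter is enough to withhold the hardware feed, the hardware timer overflows and resets the system whenever any monitored task, or taskmonitor itself, stops running.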

Because taskmonitor is added only to implement watchdog monitoring in the multitasking system, the overhead it introduces is also an important question. Assume taskmonitor has a monitoring period of 1.5 seconds. In each period it only has to save the current counter values and decide whether to feed the dog, so its CPU usage is very small. In one test, VxWorks was ported to a CPU (220 MB) running at 50 MHz; with the cache disabled and 10 tasks monitored, each monitoring cycle took about 240 microseconds. The task therefore spends most of its time in the delayed state.
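Taking the two figures quoted above at face value (about 240 microseconds of work in every 1.5-second period), the share of CPU time used by taskmonitor works out to roughly:

```latex
\[
\frac{240\ \mu\mathrm{s}}{1.5\ \mathrm{s}}
  = \frac{240 \times 10^{-6}}{1.5}
  = 1.6 \times 10^{-4}
  \approx 0.016\%
\]
```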

A monitored task may contain statements that receive a message or wait for a semaphore, and these waits are often indefinite. Such statements need to be modified. For example, in VxWorks, change an indefinite semaphore take such as

    semTake(semId, WAIT_FOREVER);    // WAIT_FOREVER means wait indefinitely

to

    do
    {
        resetwatchdog(taskid);    // feed the dog
    } while (semTake(semId, sysClkRateGet()) != OK);    // wait for the semaphore for at most 1 s

The indefinite wait becomes a series of semaphore takes, each limited to 1 second, which ensures that the dog is fed in time.
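The effect of slicing the wait can be checked off-target with stand-ins for the VxWorks calls. In the sketch below, `semtake_sim` and `resetwatchdog_sim` are hypothetical replacements for a timed semTake and for resetwatchdog, and `OK` and `ERROR` are defined locally so that the example does not depend on VxWorks headers.

```c
#include <assert.h>

#define OK     0
#define ERROR  (-1)

static int feeds = 0;             /* how many times the dog was fed */
static int attempts_until_given;  /* simulated: semaphore arrives on this attempt */

/* Stand-in for resetwatchdog(); it only counts the feeds. */
static void resetwatchdog_sim(void)
{
    feeds++;
}

/* Stand-in for a timed semaphore take: times out until the semaphore is given. */
static int semtake_sim(int timeout_ticks)
{
    (void)timeout_ticks;
    return (--attempts_until_given <= 0) ? OK : ERROR;
}

/* Wait for the semaphore in bounded slices, feeding the dog before each slice. */
static int waitwithfeeding(int ticks_per_slice)
{
    int slices = 0;
    assert(ticks_per_slice > 0);
    do {
        resetwatchdog_sim();
        slices++;
    } while (semtake_sim(ticks_per_slice) != OK);
    return slices;
}
```

If the semaphore arrives during the third slice, the dog has been fed three times, once per slice, so the longest gap between feeds is bounded by the slice length. A production version would probably also need to distinguish a timeout from other semTake errors, since the loop as written retries on any failure.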

Also note that if a task with a higher priority than taskmonitor stays in the running state for a long time, taskmonitor cannot be scheduled during that time and the watchdog resets the system by mistake. A good task partition should not contain high-priority tasks that execute for long stretches.

Author: Lin xianxian
