What is a Watchdog Timer?
A watchdog timer is a device that triggers a system reset if it detects that the system has hung. A program running on the system is supposed periodically to service the watchdog timer by writing a "service pulse." If the watchdog is not serviced within a particular period of time, the watchdog assumes that the system has hung, and triggers a system reset.
What is Softdog?
Usually, watchdog timers are implemented as add-on cards, or as on-chip peripherals within microcontrollers. But if there is no hardware watchdog, the Linux kernel can provide a software watchdog implemented using kernel timers.
Linux Watchdog Mechanism
In Linux, the watchdog driver provides a character driver interface to the user space. When some data is written to the watchdog driver, the watchdog driver services the watchdog hardware. The user space application periodically writes some data to the watchdog driver, depending upon the watchdog timeout period. If for some reason the user space application hangs, the watchdog device does not get serviced and hence triggers a system reset.
Usually the application that writes to the watchdog driver is a watchdog daemon which monitors processes in the system, as well as other parameters such as CPU utilization, memory utilization, and so on.
How Softdog Works
When the softdog driver is opened, softdog schedules a kernel timer to expire after a specified timer margin. When some data is written to the driver, the softdog driver re-schedules the timer. The user space watchdog daemon periodically writes to the driver, and the timer is continuously rescheduled and hence the timer callback is never called. If the watchdog daemon stops writing to the driver, the timer expires and the callback is called. In the timer callback, the system is restarted. Here's a link to an article on ways to use a watchdog timer effectively.
Source files
In the 2.6 kernel the softdog driver is located in
drivers/char/watchdog/softdog.c
. Here's a link to the softdog driver on LXR
Initialization
static int __init watchdog_init(void)
{
int ret;
/* Check that the soft_margin value is within it's range ;
if not reset to the default */
if (softdog_set_heartbeat(soft_margin)) {
softdog_set_heartbeat(TIMER_MARGIN);
printk(KERN_INFO PFX "soft_margin value must
be 0<soft_margin<65536, using %d\n",
TIMER_MARGIN);
}
ret = register_reboot_notifier(&softdog_notifier);
if (ret) {
printk (KERN_ERR PFX "cannot register
reboot notifier (err=%d)\n",
ret);
return ret;
}
ret = misc_register(&softdog_miscdev);
if (ret) {
printk (KERN_ERR PFX "cannot register miscdev
on minor=%d (err=%d)\n",
WATCHDOG_MINOR, ret);
unregister_reboot_notifier(&softdog_notifier);
return ret;
}
printk(banner, soft_noboot, soft_margin, nowayout);
return 0;
}