System watchdog timers are a critical but often overlooked component of reliable self-hosted infrastructure. A watchdog timer is a hardware or software mechanism that automatically reboots the system if it detects a hang, kernel panic, or unresponsive service. For self-hosted servers running unattended — home labs, edge devices, remote servers — watchdog management is the difference between a quick automatic recovery and days of downtime.

This guide compares the three primary watchdog management approaches on Linux: the traditional watchdog daemon, systemd’s built-in watchdog, and IPMI hardware watchdogs.

Understanding Linux Watchdog Timers

A watchdog timer works on a simple principle: a counter decrements toward zero, and a “pet” (keep-alive signal) must be sent periodically to reset the counter. If the counter reaches zero — meaning the system failed to send a pet — the watchdog triggers a system reset.
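
The countdown-and-pet principle can be tried out without touching a real device. The following is a toy simulation only: `simulate_watchdog` and its `tick`/`pet` events are hypothetical names invented for illustration, not part of any watchdog tool.

```shell
#!/bin/sh
# Toy model of a watchdog counter; no real device is involved.
# Usage: simulate_watchdog TIMEOUT EVENT...
#   tick = one second passes, pet = keep-alive resets the counter
simulate_watchdog() {
    timeout=$1
    shift
    counter=$timeout
    for event in "$@"; do
        case $event in
            pet)  counter=$timeout ;;
            tick) counter=$((counter - 1)) ;;
        esac
        # Reaching zero means no pet arrived in time: the system would reset.
        if [ "$counter" -le 0 ]; then
            echo "reset"
            return 0
        fi
    done
    echo "running (counter=$counter)"
}
```

With a 3-second timeout, `simulate_watchdog 3 tick tick pet tick tick` ends with the counter at 1, while `simulate_watchdog 3 tick tick tick` reports a reset.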

Linux supports watchdogs at three levels:

  • Hardware watchdog: a physical timer on the motherboard (a BMC reached through IPMI, a chipset timer such as Intel’s iTCO, or a dedicated watchdog IC). Most reliable; it keeps counting through kernel panics.
  • Software watchdog: a kernel module (softdog) that implements watchdog behavior in software. Less reliable than hardware, because it depends on the kernel still running, but available on virtually any system.
  • Userspace daemon: a process (like the watchdog daemon) that manages the watchdog device and monitors system health.
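
To see which of these a given machine actually exposes, the kernel's sysfs tree can be inspected. This is a minimal sketch, assuming the kernel was built with watchdog sysfs support (otherwise the `identity` and `timeout` files are absent and the function prints "unknown"); `list_watchdogs` is a name made up for this example.

```shell
#!/bin/sh
# Print each registered watchdog device with its driver identity and timeout.
# The optional argument overrides the sysfs root, which makes the function
# easy to try against a scratch directory.
list_watchdogs() {
    root=${1:-/sys/class/watchdog}
    for dir in "$root"/watchdog*; do
        [ -d "$dir" ] || continue          # glob did not match: nothing registered
        name=${dir##*/}
        identity=$(cat "$dir/identity" 2>/dev/null || echo unknown)
        timeout=$(cat "$dir/timeout" 2>/dev/null || echo unknown)
        echo "$name: $identity (timeout ${timeout}s)"
    done
}
```

Calling `list_watchdogs` with no argument reads /sys/class/watchdog. An empty result usually means no watchdog driver is loaded yet.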

Comparison Table: Watchdog Management Approaches

| Feature | watchdog Daemon | systemd Watchdog | IPMI Watchdog |
|---|---|---|---|
| Type | Userspace daemon | systemd built-in | Hardware (BMC) |
| Kernel Module | Any (/dev/watchdog) | softdog or hardware driver | ipmi_watchdog |
| Panic Recovery | Only with a hardware device | Only with a hardware device | Yes (hardware-level) |
| Service Monitoring | Yes (custom tests) | Yes (WatchdogSec) | No (heartbeat only) |
| Configuration | /etc/watchdog.conf | system.conf and service units | ipmitool commands |
| Network Failure Detection | Yes (ping test) | No (manual setup) | No |
| Disk I/O Monitoring | Yes (file change test) | No | No |
| Temperature Monitoring | Yes (thermal test) | No | Yes (via IPMI sensors) |
| Docker Compatible | Yes (sidecar container) | Yes (host-level) | Yes (via ipmitool) |
| Project Stars | SourceForge (legacy) | systemd (12,000+) | ipmitool (500+) |

watchdog Daemon: Traditional Userspace Monitoring

The watchdog daemon is the classic Linux watchdog management tool. It monitors system health through configurable tests (ping, file changes, temperature, load average) and manages the watchdog device.

Installation

# Debian/Ubuntu
sudo apt install watchdog

# Arch Linux
sudo pacman -S watchdog

# Fedora
sudo dnf install watchdog

Configuration

The watchdog daemon is configured via /etc/watchdog.conf:

# Watchdog device
watchdog-device = /dev/watchdog

# Timeout programmed into the device (seconds).
# The time between keep-alive writes is a separate setting, "interval".
watchdog-timeout = 15

# System tests
ping = 192.168.1.1
ping = 8.8.8.8
file = /var/log/syslog
change = 300

# Temperature monitoring (sensor file reports millidegrees Celsius)
temperature-sensor = /sys/class/thermal/thermal_zone0/temp
max-temperature = 80

# Load average
max-load-1 = 24
max-load-5 = 20
max-load-15 = 16

# Minimum free memory (in pages)
min-memory = 1

# Address that is mailed when the daemon decides to reboot
admin = admin@example.com

# Repair action before reboot
repair-binary = /sbin/watchdog-repair

Enabling the Daemon

# Load the software watchdog module
# (skip this if a hardware driver already provides /dev/watchdog)
sudo modprobe softdog

# Enable and start the daemon
sudo systemctl enable --now watchdog

# Verify it is running
sudo systemctl status watchdog

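The daemon writes to /dev/watchdog at the configured interval for as long as every test passes. If a test fails, it runs the repair command first. If the repair does not succeed, the daemon initiates a reboot itself, and the watchdog device resets the machine if that shutdown hangs as well.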
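
A repair script can be quite small. The sketch below assumes the documented convention that the daemon passes the error number of the failed check as the first argument and treats exit status 0 as "repaired". The choice of systemd-networkd as the network manager and the `RESTART_NETWORK` override are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of a repair handler for the watchdog daemon.
# $1 is the error number of the failed check, as passed by the daemon.
# RESTART_NETWORK is a hypothetical override so the logic can be tried
# without touching the real network stack.
repair() {
    code=${1:-1}
    case $code in
        100|101)
            # ENETDOWN (100) / ENETUNREACH (101) on Linux: a ping target
            # could not be reached. Assumes systemd-networkd manages the links.
            ${RESTART_NETWORK:-systemctl restart systemd-networkd}
            ;;
        *)
            # Unknown problem: a non-zero status tells the daemon the repair
            # failed, so it proceeds with the reboot.
            return "$code"
            ;;
    esac
}
```

Installed as a script, the file would end with `repair "$@"` so that the function's status becomes the script's exit status.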

systemd Watchdog: Built-In Reliability

systemd has shipped built-in hardware watchdog support since version 183. Unlike the watchdog daemon, systemd’s watchdog operates at the service-manager level: it can monitor individual services, not just the entire system.

System-Level Watchdog

Configure the hardware or software watchdog in systemd:

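# /etc/systemd/system.conf
[Manager]
RuntimeWatchdogSec=30s
# RebootWatchdogSec replaced ShutdownWatchdogSec in systemd 243;
# use the older name on earlier releases
RebootWatchdogSec=10min

This programs the watchdog device with a 30-second timeout, and systemd pets it at least once every half of that interval (every 15 seconds here). If systemd itself becomes unresponsive, the pets stop and the watchdog device triggers a reboot. RebootWatchdogSec applies the same protection to a reboot that hangs.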
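
Instead of editing system.conf directly, the same settings can live in a drop-in file under /etc/systemd/system.conf.d/, which survives package upgrades more cleanly. The following is a sketch under that assumption; the function name and the `watchdog.conf` file name are arbitrary choices for this example.

```shell
#!/bin/sh
# Write the manager watchdog settings as a drop-in file.
# Usage: write_watchdog_dropin [DIR] [RUNTIME] [REBOOT]
write_watchdog_dropin() {
    dir=${1:-/etc/systemd/system.conf.d}
    runtime=${2:-30s}
    reboot=${3:-10min}
    mkdir -p "$dir"
    # The heredoc body is written verbatim, so it is kept unindented.
    cat > "$dir/watchdog.conf" <<EOF
[Manager]
RuntimeWatchdogSec=$runtime
RebootWatchdogSec=$reboot
EOF
}
```

Run as root, `write_watchdog_dropin` with no arguments writes the values shown earlier. The change typically takes effect after a reboot or `sudo systemctl daemon-reexec`.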

Per-Service Watchdog

Monitor individual services for hangs:

# /etc/systemd/system/my-service.service
[Service]
WatchdogSec=30s
Restart=on-failure
RestartSec=5s

# In your application code, call sd_notify() with "WATCHDOG=1"
# to signal that the service is still alive:
# sd_notify(0, "WATCHDOG=1");

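If the service does not send a watchdog ping within 30 seconds, systemd treats it as failed, terminates it, and restarts it according to Restart=. A common convention is to ping at roughly half the WatchdogSec interval so that one delayed ping does not trigger a restart.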
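
For a service written as a shell script, the keep-alive can be sent with the `systemd-notify` command instead of calling sd_notify() from C. This is a minimal sketch: `service_is_healthy` is a placeholder probe, and it assumes the unit also sets `NotifyAccess=all`, since `systemd-notify` runs as a child process rather than as the main service process.

```shell
#!/bin/sh
# Keep-alive loop for a shell-based service that uses WatchdogSec=.

# Convert WATCHDOG_USEC (microseconds, set by systemd) into a ping interval
# of half the timeout, in whole seconds and never below 1.
watchdog_ping_interval() {
    usec=${1:-0}
    secs=$((usec / 2000000))
    [ "$secs" -ge 1 ] || secs=1
    echo "$secs"
}

# Hypothetical health probe; replace with a real check for the service.
service_is_healthy() {
    true
}

# Ping systemd only while the probe succeeds. When the probe fails the loop
# ends, the pings stop, and systemd's watchdog timeout takes over.
keepalive_loop() {
    interval=$(watchdog_ping_interval "${WATCHDOG_USEC:-0}")
    while service_is_healthy; do
        systemd-notify WATCHDOG=1
        sleep "$interval"
    done
}
```

With WatchdogSec=30s, systemd exports WATCHDOG_USEC=30000000, so the loop pings every 15 seconds.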

systemd and the softdog Kernel Module

When no hardware watchdog is available, systemd can use the softdog kernel module:

# Load softdog with a 30-second default timeout.
# soft_margin is a load-time module parameter; systemd overrides the
# timeout with RuntimeWatchdogSec once it opens the device.
sudo modprobe softdog soft_margin=30

# Verify the device and its timeout
sudo wdctl

Docker Integration

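systemd’s watchdog operates at the host level, so a watchdog reset restarts the host and every container on it. You can also run a watchdog monitor inside a container. The device accepts a single opener, so do not combine this with RuntimeWatchdogSec or the watchdog daemon on the same device. The loop below keeps /dev/watchdog open and pets it only while the health endpoint answers; once the pets stop, the timer expires and the host resets:

version: "3.8"
services:
  watchdog-monitor:
    image: alpine:latest
    container_name: system-watchdog
    # Host networking so that localhost refers to the host's services
    network_mode: host
    devices:
      - /dev/watchdog:/dev/watchdog
    security_opt:
      - apparmor:unconfined
    command:
      - sh
      - -c
      - |
        # Hold the device open on fd 3; opening it starts the timer
        exec 3> /dev/watchdog
        while true; do
          # Pet the watchdog only while the critical service responds
          if wget -q -O /dev/null http://localhost:8080/health; then
            echo . >&3
          else
            echo 'Health check failed - withholding keep-alive'
          fi
          sleep 10
        done
    restart: unless-stopped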

IPMI Watchdog: Hardware-Level Reliability

IPMI (Intelligent Platform Management Interface) provides hardware watchdog functionality through the Baseboard Management Controller (BMC). This is the most reliable option because the watchdog operates independently of the OS — it survives kernel panics, driver failures, and even OS crashes.

IPMI Tools Installation

# Debian/Ubuntu
sudo apt install ipmitool

# Arch Linux
sudo pacman -S ipmitool

# Fedora
sudo dnf install ipmitool

Configuring the IPMI Watchdog

# Check current watchdog status
sudo ipmitool mc watchdog get

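# Set the watchdog timer to 60 seconds (60000 ms).
# Note: the "set" subcommand exists only in newer ipmitool releases and its
# option names differ between versions; check "man ipmitool" before running this.
sudo ipmitool mc watchdog set \
  timer-use=SMS/OS \
  action=hard-reset \
  pre-timeout-interval=0 \
  timer-use-expiration-flags=do-not-log \
  initial-countdown=60000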

# Reset (pet) the watchdog
sudo ipmitool mc watchdog reset
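
The comparison table lists ipmi_watchdog as the kernel module for this approach. Loading it exposes the BMC timer as /dev/watchdog, so systemd or the watchdog daemon can pet it like any other device. The sketch below writes the module options to a modprobe.d file; the function name and file name are made up for this example, and it assumes the IPMI system interface drivers are already loaded.

```shell
#!/bin/sh
# Write load-time options for the ipmi_watchdog kernel module.
# Usage: write_ipmi_watchdog_options [DIR] [TIMEOUT_SECONDS] [ACTION]
write_ipmi_watchdog_options() {
    dir=${1:-/etc/modprobe.d}
    timeout=${2:-60}
    action=${3:-reset}
    # Actions accepted by the driver's "action" parameter
    case $action in
        reset|none|power_cycle|power_off) ;;
        *)
            echo "unsupported action: $action" >&2
            return 1
            ;;
    esac
    mkdir -p "$dir"
    echo "options ipmi_watchdog timeout=$timeout action=$action" \
        > "$dir/ipmi_watchdog.conf"
}
```

After writing the file as root, `sudo modprobe ipmi_watchdog` picks up the options. Whether the BMC honors every action depends on the hardware.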

Automated IPMI Watchdog Petting

Create a systemd service to periodically pet the IPMI watchdog:

# /etc/systemd/system/ipmi-watchdog.service
[Unit]
Description=IPMI Watchdog Pet Service
After=network.target

[Service]
Type=simple
ExecStart=/bin/bash -c 'while true; do ipmitool mc watchdog reset; sleep 30; done'
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
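
The unit above pets unconditionally, so it only catches a complete system hang. A variant that pets only while a health check passes could look like the sketch below. `pet_if_healthy` is a hypothetical helper, and both the check and the pet command are passed in as strings so the same function works for IPMI or any other petting command.

```shell
#!/bin/sh
# Run the pet command only if the health check succeeds.
# Usage: pet_if_healthy CHECK_COMMAND PET_COMMAND
pet_if_healthy() {
    if sh -c "$1" >/dev/null 2>&1; then
        sh -c "$2"
    else
        # No pet is sent, so the hardware timer keeps counting down.
        echo "health check failed; not petting" >&2
        return 1
    fi
}
```

A loop such as `while true; do pet_if_healthy 'systemctl is-active --quiet nginx' 'ipmitool mc watchdog reset' || true; sleep 30; done` could replace the ExecStart command, with nginx standing in for whichever service matters on the host.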

Docker Integration with IPMI

Run IPMI watchdog monitoring from a container:

version: "3.8"
services:
  ipmi-watchdog:
    image: linuxserver/ipmitool
    container_name: ipmi-watchdog
    devices:
      - /dev/ipmi0:/dev/ipmi0
    environment:
      - IPMI_HOST=localhost
    command: >
      sh -c "
      while true; do
        ipmitool mc watchdog reset
        sleep 30
      done
      "
    restart: unless-stopped

Why Self-Host with Watchdog Management?

For self-hosted servers running critical services — databases, email servers, file storage, home automation — uptime is paramount. A kernel panic, memory leak, or network freeze should trigger automatic recovery, not require a manual power cycle. Watchdog timers provide this safety net.

Hardware watchdogs (IPMI) are essential for remote or edge deployments where physical access is limited. For home labs and development servers, systemd’s built-in watchdog provides good coverage with zero additional dependencies. The traditional watchdog daemon excels when you need application-level health checks (ping tests, file monitoring, temperature thresholds) before deciding whether to trigger a reboot.

For related reading on system reliability, see our Linux service restart detection guide and systemd journal collection guide. For kernel-level security hardening, check our kernel security auditing guide.

FAQ

Do I need a hardware watchdog, or is the software watchdog enough?

A hardware watchdog (IPMI or dedicated watchdog IC) is more reliable because it operates independently of the OS. It survives kernel panics, driver crashes, and complete OS freezes. A software watchdog (softdog) runs inside the kernel — if the kernel itself hangs, the softdog cannot trigger a reboot. For production servers, hardware watchdog is strongly recommended. For development and testing, softdog is sufficient.

Can I use multiple watchdog methods simultaneously?

Yes, but each watchdog device should have exactly one process petting it. /dev/watchdog can be opened by only one process at a time, so the watchdog daemon and systemd’s RuntimeWatchdogSec cannot share a device. Layering works when the mechanisms are independent. A typical setup is systemd managing the hardware watchdog device, per-service WatchdogSec checks on top, and the BMC timer petted separately through ipmitool as a backup.

What happens when the watchdog timer expires?

The action depends on the watchdog configuration. Common actions include: hard reset (immediate system reboot), power cycle (full power off and on), power off (shutdown without reboot), or no action (just log the event). For self-hosted servers, hard reset is the recommended action — it recovers the system quickly.

How do I test if my watchdog is working?

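For a hardware watchdog: sudo ipmitool mc watchdog get shows the current timer state. For systemd: search the boot log for watchdog messages with journalctl -b | grep -i watchdog. For the watchdog daemon: check /var/log/syslog for watchdog activity. To force a test on a machine you can afford to reboot, make the petting process stop abruptly, for example by killing it with SIGKILL, and observe whether the system resets after the timeout period. A clean stop of the watchdog daemon is not a reliable test, because the daemon normally disarms the timer when it shuts down in an orderly way.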

Does the watchdog daemon work in Docker containers?

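Yes, but the container needs access to /dev/watchdog. Pass it through as a device (for example with the devices: key in Compose) and run the watchdog daemon or a custom health-check script inside the container. Note that a watchdog reset triggered from inside a container reboots the HOST system, not just the container, so use this carefully. The device also accepts only one opener, so the host must not be using it at the same time.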

What is the difference between watchdog and health checks?

A watchdog timer is a low-level mechanism that reboots the entire system if no keep-alive signal is received. A health check is an application-level test (HTTP endpoint, database connection, file freshness) that determines if a specific service is healthy. The watchdog daemon can run health checks and only pet the watchdog if all checks pass — combining both approaches for smarter recovery decisions.