Containers are designed to be ephemeral — when one fails, the orchestrator replaces it. But what happens when a container is running but unhealthy? Or when a new image is published and your running containers are stuck on a vulnerable old version? Or when a Kubernetes node needs to reboot for a kernel security update?
This guide compares three self-hosted tools that solve different aspects of container health monitoring and automated remediation: Watchtower (automatic Docker container image updates), Autoheal (Docker container health monitoring and restart), and Kured (Kubernetes node reboot management for security updates). Together, these tools form a comprehensive container health management stack.
For broader container monitoring strategies, see our Docker container monitoring comparison and Kubernetes node management guide.
Quick Comparison Table
| Feature | Watchtower | Autoheal | Kured |
|---|---|---|---|
| Stars | 15,500+ | 2,300+ | 2,541 |
| Platform | Docker | Docker | Kubernetes |
| Purpose | Auto-update containers | Auto-restart unhealthy containers | Auto-reboot nodes for updates |
| Trigger | New image in registry | Health check failure | Reboot required file |
| Update Strategy | Pull & replace container | Restart container | Cordon & drain, then reboot |
| Rollback | No | No (restart only) | No (reboot only) |
| Scheduling | Cron-based or polling | Continuous monitoring | Continuous monitoring |
| Notification | Slack, email, webhook, Gotify | Log-based | Kubernetes events |
| Graceful Shutdown | Yes (SIGTERM then SIGKILL) | Yes | Yes (cordon + drain) |
| Deployment | Single container (Docker Compose) | Single container (Docker Compose) | Helm chart / K8s manifest |
| Image Filtering | Labels, include/exclude | Labels, health status | Not applicable |
| License | Apache 2.0 | MIT | Apache 2.0 |
Watchtower: Automatic Docker Image Updates
Watchtower monitors your Docker containers for new image versions and automatically pulls and restarts them. It is the simplest way to keep your self-hosted services up to date without manual intervention.
Docker Compose Setup
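A minimal Compose sketch for running Watchtower follows. It assumes the commonly used `containrrr/watchtower` image and access to the host's Docker socket; the interval and cleanup values are illustrative choices rather than recommendations.

```yaml
# docker-compose.yml: minimal Watchtower sketch (values are illustrative)
services:
  watchtower:
    image: containrrr/watchtower
    container_name: watchtower
    restart: unless-stopped
    volumes:
      # Watchtower talks to the Docker daemon through its socket
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - WATCHTOWER_POLL_INTERVAL=3600   # check hourly instead of the 24h default
      - WATCHTOWER_CLEANUP=true         # remove superseded images after an update
```

Without further filtering, a setup like this considers every running container on the host, which is why the label-based approach described later is usually the safer starting point.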
Key Watchtower Configuration Options
| Environment Variable | Purpose | Default |
|---|---|---|
| WATCHTOWER_POLL_INTERVAL | Seconds between checks | 86400 (24h) |
| WATCHTOWER_CLEANUP | Remove old images after update | false |
| WATCHTOWER_LABEL_ENABLE | Only update containers with watchtower label | false |
| WATCHTOWER_INCLUDE_RESTARTING | Include restarting containers in checks | false |
| WATCHTOWER_NOTIFICATIONS | Notification channel | empty |
| WATCHTOWER_SCHEDULE | Cron schedule for checks | empty (polling) |
Selective Auto-Update with Labels
The safest approach is to enable label-based filtering so Watchtower only updates containers you explicitly mark:
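As a sketch of label-based filtering, the Compose file below turns on `WATCHTOWER_LABEL_ENABLE` and opts in a single service with the `com.centurylinklabs.watchtower.enable` label. The `web-app` and `db` services and their images are hypothetical stand-ins for your own workloads.

```yaml
services:
  watchtower:
    image: containrrr/watchtower
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - WATCHTOWER_LABEL_ENABLE=true    # only touch containers that opt in

  web-app:                              # hypothetical stateless service
    image: nginx:stable
    labels:
      - com.centurylinklabs.watchtower.enable=true   # opted in to auto-updates

  db:                                   # hypothetical database: pinned tag, no label
    image: postgres:16
```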
This prevents Watchtower from accidentally updating critical services like databases that require manual migration steps.
Autoheal: Container Health Check Remediation
Autoheal monitors Docker containers for health check failures and automatically restarts unhealthy containers. While Watchtower updates containers when new images are available, Autoheal acts when containers are running but unhealthy — a fundamentally different problem.
Docker Compose Setup
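A minimal Compose sketch for Autoheal might look like the following. It assumes the `willfarrell/autoheal` image; the timing values shown are illustrative and should be tuned to your services.

```yaml
# docker-compose.yml: minimal Autoheal sketch (values are illustrative)
services:
  autoheal:
    image: willfarrell/autoheal
    container_name: autoheal
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - AUTOHEAL_CONTAINER_LABEL=all        # watch every container that has a health check
      - AUTOHEAL_INTERVAL=5                 # seconds between checks
      - AUTOHEAL_START_PERIOD=0             # seconds to wait before the first check
      - AUTOHEAL_DEFAULT_STOP_TIMEOUT=10    # seconds allowed for a graceful stop
```

Setting `AUTOHEAL_CONTAINER_LABEL=all` is a deliberate choice here; leaving it unset limits Autoheal to containers that carry the `autoheal=true` label.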
How Autoheal Works
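Autoheal polls the Docker API at a fixed interval (AUTOHEAL_INTERVAL) for containers whose health status is unhealthy and restarts them. By default it only considers containers labeled autoheal=true; setting AUTOHEAL_CONTAINER_LABEL=all extends this to every container with a health check. The health check itself is defined on each monitored container.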
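The sketch below shows what a monitored service could look like. The `web-app` image, port, and `/health` endpoint are assumptions for illustration, and the check presumes `curl` is available inside the image.

```yaml
services:
  web-app:
    image: example/web-app:1.0    # hypothetical image that ships curl
    labels:
      - autoheal=true             # needed unless AUTOHEAL_CONTAINER_LABEL=all
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]   # assumed endpoint
      interval: 30s               # run the check every 30 seconds
      timeout: 5s                 # a check that takes longer counts as a failure
      retries: 3                  # marked unhealthy after 3 consecutive failures
      start_period: 20s           # grace period while the app boots
```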
When web-app fails its health check 3 consecutive times, Autoheal restarts it. The AUTOHEAL_DEFAULT_STOP_TIMEOUT gives the container time to gracefully shut down before a hard kill.
When to Use Autoheal vs Watchtower
| Scenario | Watchtower | Autoheal |
|---|---|---|
| New image published upstream | Updates container | No action |
| Application crashes / hangs | No action | Restarts container |
| Health endpoint returns 500 | No action | Restarts container |
| Container exits with code 1 | No action | Docker restart policy handles it (not Autoheal) |
| Image tag is updated | Updates container | No action |
Use both tools together: Watchtower keeps your images current, Autoheal keeps your running containers healthy.
Kured: Kubernetes Node Reboot Orchestration
Kured (Kubernetes Reboot Daemon) solves a different problem entirely: rebooting Kubernetes nodes safely when security updates require it. Unlike Watchtower and Autoheal, which operate at the container level, Kured operates at the node (host) level.
Deployment via Helm
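One way to deploy Kured is through the chart published in the kubereboot Helm repository. The values file below is a minimal sketch, with the install commands included as comments; the file name `kured-values.yaml` and the chosen values are assumptions, and key names should be checked against the chart version you install.

```yaml
# kured-values.yaml: minimal sketch of Helm values (illustrative)
#
# Install with the chart from the kubereboot repository:
#   helm repo add kubereboot https://kubereboot.github.io/charts
#   helm repo update
#   helm install kured kubereboot/kured --namespace kube-system -f kured-values.yaml
configuration:
  period: "1h"                                # how often each node checks for a pending reboot
  rebootSentinel: "/var/run/reboot-required"  # file whose presence signals a pending reboot
```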
How Kured Works
Kured runs as a DaemonSet on every Kubernetes node. Its workflow is:
- Check for reboot requirement: Runs the configured sentinel check (e.g., whether /var/run/reboot-required exists on Debian/Ubuntu, or whether needs-restarting --reboothint reports a pending reboot on RHEL/CentOS).
- Acquire a lock: Only one node reboots at a time to maintain cluster capacity.
- Cordon the node: Mark the node unschedulable so no new pods are placed on it.
- Drain the node: Evict all pods gracefully, respecting PodDisruptionBudgets.
- Reboot the node: Execute the reboot.
- Wait for node to return: Monitor until the node is Ready again.
- Release the lock and move to the next node.
Kured Configuration Options
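As a sketch of commonly tuned options, the Helm values below restrict reboots to a weekday maintenance window and add two safety controls. The specific times, timeout, and the `app=critical-batch` label are hypothetical; verify the key names against the values file of your chart version.

```yaml
# Helm values sketch: reboot window and drain safety (illustrative)
configuration:
  rebootDays: [mo, tu, we, th, fr]   # only reboot on weekdays
  startTime: "02:00"                 # reboot window opens
  endTime: "05:00"                   # reboot window closes
  timeZone: "UTC"                    # time zone used to evaluate the window
  drainTimeout: "30m"                # abort a drain that takes longer than this
  blockingPodSelector:
    - "app=critical-batch"           # hypothetical label; matching pods block reboots
```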
Combined Container Health Stack
For a comprehensive container health management approach across both Docker and Kubernetes environments:
| Layer | Tool | Responsibility |
|---|---|---|
| Docker image updates | Watchtower | Auto-pull new images, restart containers |
| Docker health remediation | Autoheal | Restart unhealthy containers |
| Kubernetes node maintenance | Kured | Safe node reboots for security patches |
| Container monitoring | cAdvisor/Dozzle | Visibility into container resource usage and logs |
| Kubernetes pod health | Native probes | Liveness/readiness probes + restart policy |
This stack ensures that containers stay updated, unhealthy containers get restarted, and nodes receive security patches without manual intervention.
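For the Kubernetes pod health layer in the table above, native probes play roughly the role Autoheal plays on Docker. The Pod manifest below is a minimal sketch; the image, port, and `/health` and `/ready` paths are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app                      # hypothetical workload
spec:
  restartPolicy: Always
  containers:
    - name: web-app
      image: example/web-app:1.0     # hypothetical image
      ports:
        - containerPort: 8080
      livenessProbe:                 # kubelet restarts the container when this fails
        httpGet:
          path: /health
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 15
        failureThreshold: 3
      readinessProbe:                # failing pods are removed from Service endpoints
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
```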
Why Self-Host Container Health Tools?
Self-hosting container health and auto-healing tools gives you complete control over update policies, maintenance windows, and notification routing. Cloud-based container management platforms often enforce their own update schedules and notification channels. With self-hosted tools, you decide when containers update, which services get auto-healed, and how your team gets alerted.
For organizations with compliance requirements, self-hosted tools ensure that update and reboot logs stay within your infrastructure. You can integrate with internal ticketing systems, custom Slack channels, or on-call rotation tools like Grafana OnCall — integrations that cloud platforms may not support.
For related infrastructure automation, see our Kubernetes automated update and restart guide which covers kured, Reloader, and Keel for Kubernetes automation patterns.
FAQ
Q: Is Watchtower safe for production use?
Watchtower is safe when used with label-based filtering and proper testing. The key risk is updating a container to a new image that has breaking changes or bugs. Mitigation strategies: (1) Use WATCHTOWER_LABEL_ENABLE=true so only explicitly labeled containers are updated. (2) Pin database containers to specific versions without auto-update labels. (3) Set up notifications so you know when updates happen. (4) Test new image versions in a staging environment before they reach production registries.
Q: Can Autoheal cause restart loops?
Yes. If a container has a persistent bug that causes health check failures, Autoheal will restart it indefinitely. To prevent this: (1) Set a reasonable health check start_period to give the container time to initialize. (2) Monitor restart counts; Autoheal logs each restart. (3) For containers that crash outright, use Docker’s restart: on-failure policy, which accepts a maximum retry count (for example on-failure:5); unless-stopped has no such limit. (4) Alert on high restart frequency so you can investigate the root cause rather than relying on Autoheal as a band-aid.
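A sketch of the first and third mitigations in Compose form follows. Retry caps are available with the `on-failure` restart policy, and that cap only applies when the container process exits with an error, so it does not limit restarts triggered by Autoheal. The service name, image, and endpoint are hypothetical.

```yaml
services:
  web-app:
    image: example/web-app:1.0      # hypothetical image that ships curl
    restart: on-failure:5           # Docker stops retrying after 5 failed starts
    labels:
      - autoheal=true
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]   # assumed endpoint
      interval: 30s
      retries: 3
      start_period: 60s             # failures during startup do not count toward retries
```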
Q: Does Kured work with cloud-managed Kubernetes (EKS, GKE, AKS)?
Kured is most useful on self-managed Kubernetes clusters where you control the node OS. Managed offerings generally have their own node patching mechanisms, typically replacing nodes with updated images rather than rebooting them in place: EKS managed node groups offer managed AMI updates, GKE has node auto-upgrade, and AKS has node OS auto-upgrade channels. Check how your provider and settings handle kernel updates before adding Kured. If you use self-managed node groups or self-managed Kubernetes on cloud VMs, Kured is still useful for coordinating safe reboots across your node pool.
Q: How do I prevent Watchtower from updating containers during business hours?
Use the WATCHTOWER_SCHEDULE environment variable with a cron expression:
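A sketch of a nightly schedule is shown below. Watchtower's cron expressions use six fields with seconds first, so `0 0 2 * * *` means 02:00:00 every day. The `TZ` value is an assumed example; omit `WATCHTOWER_POLL_INTERVAL` when a schedule is set, since the two options are alternatives.

```yaml
services:
  watchtower:
    image: containrrr/watchtower
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      # 6-field cron (seconds first): every day at 02:00:00
      - WATCHTOWER_SCHEDULE=0 0 2 * * *
      - TZ=Europe/Berlin    # assumed time zone; the schedule follows the container's clock
```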
This runs Watchtower checks only at 2:00 AM. You can also combine this with WATCHTOWER_NOTIFICATIONS to receive a summary of what was updated each morning.
Q: What happens if Kured fails to drain a node?
Kured respects PodDisruptionBudgets (PDBs). If a PDB prevents pod eviction (e.g., you only have 1 replica of a critical service), Kured keeps trying to drain the node until the configured drainTimeout expires. Check the default for your version: recent releases document 0, meaning no time limit, so it is worth setting explicitly. After a timeout, Kured logs the failure, releases the lock, and retries that node on a later check cycle, unless forceReboot is enabled, in which case it reboots anyway. This is intentional: Kured prioritizes application availability over timely security patching. You can also configure blockingPodSelector to explicitly block reboots while critical workloads are running.
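To make the PDB interaction concrete, here is a minimal sketch of a budget for a hypothetical `web-app` Deployment. With two or more replicas, `minAvailable: 1` lets a drain evict pods one at a time; with a single replica, the same budget blocks eviction and stalls the drain.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb          # hypothetical name
spec:
  minAvailable: 1            # at least one matching pod must stay running
  selector:
    matchLabels:
      app: web-app           # hypothetical label on the protected pods
```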
Q: Should I use Watchtower or Autoheal — or both?
They solve different problems and work best together. Watchtower handles the “new image available” scenario — keeping your containers updated with the latest patches and features. Autoheal handles the “container is running but unhealthy” scenario — restarting containers that have crashed, hung, or are returning errors. Without Watchtower, your containers run old (potentially vulnerable) images. Without Autoheal, your unhealthy containers stay running until manually restarted.