Drive failures don’t happen without warning. S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) attributes track everything from reallocated sectors to temperature spikes — if you know how to read them. The challenge isn’t that monitoring tools don’t exist; it’s that most homelab and small-team setups rely on a single CLI tool with zero historical trends and no alerting pipeline.
This guide compares three self-hosted approaches to disk health monitoring: Scrutiny (a web dashboard with historical trend analysis), smartd (the tried-and-true SMART daemon from smartmontools), and NVMe-cli (the specialized toolset for modern NVMe drives). We’ll cover installation, Docker deployment, configuration, and help you decide which tool — or combination — fits your infrastructure.
Why Self-Host Disk Health Monitoring?
Before diving into the tools, here’s why you shouldn’t rely on manufacturer utilities or cloud-based monitoring for your drives:
- Data sovereignty: SMART data never leaves your network. You’re not sending drive health telemetry to third-party servers.
- Proactive failure detection: Historical trend analysis catches degrading drives weeks before they fail, giving you time to replace them and restore from backups.
- No subscription fees: Enterprise monitoring platforms charge per-device. Self-hosted tools are free and run on a Raspberry Pi or a low-power VPS.
- Custom alerting: Route notifications to your existing pipeline — Discord, Slack, Telegram, email, or PagerDuty — without vendor lock-in.
- NVMe-specific insights: Modern NVMe drives expose different metrics than SATA drives. Generic tools often miss critical NVMe-specific attributes like media wear percentage and thermal throttling events.
If you’re running a NAS, a homelab, or managing bare-metal servers, having visibility into drive health is as important as monitoring CPU and memory. After all, a dead drive is often the single point of failure that takes your entire service stack offline.
Tool Comparison at a Glance
| Feature | Scrutiny | smartd (smartmontools) | NVMe-cli |
|---|---|---|---|
| GitHub Stars | 7,692 | 1,097 | 1,775 |
| Last Updated | April 2026 | April 2026 | April 2026 |
| Language | Go | C++ | C |
| Web UI | Yes (built-in) | No (CLI + email) | No (CLI only) |
| Historical Trends | Yes (InfluxDB) | No | No |
| Docker Support | First-class | Manual (host access) | Manual |
| SATA Support | Yes | Yes | Limited |
| NVMe Support | Yes (via smartmontools) | Yes | Yes (native) |
| Alerting | Multi-channel (Shoutrrr) | Email only | None (manual) |
| Multi-Host | Yes (hub/spoke) | Per-host | Per-host |
| Config Complexity | Medium | Low | Low |
| Best For | Full monitoring stack | Simple daemon alerts | NVMe-specific diagnostics |
Scrutiny: Web-Based Disk Health Dashboard
Scrutiny is the most feature-complete self-hosted disk monitoring solution. It combines smartmontools’ data collection with an InfluxDB time-series backend and a polished Angular web UI, giving you historical graphs, failure threshold analysis, and multi-channel notifications — all from a single Docker container.
The project offers two deployment modes:
- Omnibus: Single container with web UI, collector, and InfluxDB bundled together
- Hub/Spoke: Separate web + InfluxDB on a central host, with lightweight collectors on each monitored server
Docker Compose Deployment (Omnibus)
The simplest way to get started is the omnibus image:
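A minimal docker-compose.yml for the omnibus image looks like this. The device paths and volume locations are examples; list the drives actually present on your host:

```yaml
version: "3.5"
services:
  scrutiny:
    image: ghcr.io/analogj/scrutiny:master-omnibus
    container_name: scrutiny
    cap_add:
      - SYS_RAWIO            # required to issue raw SMART ioctls
    ports:
      - "8080:8080"          # web UI
      - "8086:8086"          # bundled InfluxDB
    volumes:
      - /run/udev:/run/udev:ro
      - ./scrutiny/config:/opt/scrutiny/config
      - ./scrutiny/influxdb:/opt/scrutiny/influxdb
    devices:                 # pass each drive explicitly
      - /dev/sda
      - /dev/sdb
      - /dev/nvme0
```

Start it with docker compose up -d and browse to http://localhost:8080.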
For multi-server setups, the hub/spoke architecture runs the collector on each host:
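On each monitored host, a collector-only service points back at the central hub. A sketch, with the hub hostname as a placeholder:

```yaml
services:
  collector:
    image: ghcr.io/analogj/scrutiny:master-collector
    cap_add:
      - SYS_RAWIO
    volumes:
      - /run/udev:/run/udev:ro
    environment:
      # address of the central Scrutiny web/InfluxDB host
      COLLECTOR_API_ENDPOINT: "http://scrutiny-hub.lan:8080"
    devices:
      - /dev/sda
      - /dev/nvme0
```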
Scrutiny Configuration
The scrutiny.yaml config file controls the web server, database, and notification channels:
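A sketch of scrutiny.yaml; the notification URLs follow Shoutrrr’s scheme syntax, and the tokens and addresses shown here are placeholders:

```yaml
version: 1
web:
  listen:
    port: 8080
    host: 0.0.0.0
    basepath: ""            # e.g. "/scrutiny" when behind a reverse proxy
  database:
    location: /opt/scrutiny/config/scrutiny.db
  influxdb:
    host: localhost
    port: 8086
notify:
  urls:
    - "discord://<token>@<channel-id>"
    - "smtp://user:password@mail.example.com:587/?from=scrutiny@example.com&to=admin@example.com"
```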
The notification system uses Shoutrrr, which supports Discord, Telegram, Slack, Matrix, SMTP, Gotify, Ntfy.sh, Pushover, PagerDuty, Opsgenie, and more. Usernames and passwords containing special characters must be URL-encoded.
Key Features
- Historical trend graphs: See how Reallocated_Sector_Count, Temperature_Celsius, and Media_Wearout_Indicator change over weeks and months
- Vendor-specific thresholds: Scrutiny uses real-world failure data, not just manufacturer specs, to determine drive health status
- Multi-device passthrough: Map individual block devices (/dev/sda, /dev/nvme0n1) into the container for direct SMART access
- Reverse proxy support: Configure basepath to run behind Traefik, Caddy, or Nginx at a custom URL path
- REST API: Query drive status programmatically via /api/devices and /api/summary
smartd: The SMART Daemon
smartd is the daemon component of the smartmontools project. It’s been around since 2002, runs on virtually every Unix-like system, and provides a simple but effective approach: periodically poll SMART attributes and send email alerts when thresholds are crossed.
Installation
On Debian/Ubuntu:
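For example:

```shell
sudo apt update
sudo apt install smartmontools
# the daemon is typically enabled automatically; verify with:
systemctl status smartd
```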
On RHEL/CentOS:
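For example:

```shell
sudo dnf install smartmontools
sudo systemctl enable --now smartd
```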
Configuration
The configuration lives in /etc/smartd.conf. Here’s a typical setup:
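A sketch of a per-device configuration (the mail address is a placeholder). Note that if a DEVICESCAN line is present, smartd ignores everything after it, so comment it out when listing drives explicitly:

```
# /etc/smartd.conf
# -a: monitor all SMART attributes; -o on: enable automatic offline testing
# -S on: enable attribute autosave; -n standby,q: skip polls while the disk sleeps
# -W 4,40,50: warn on a 4C temperature change, log at 40C, alert at 50C
/dev/sda   -a -o on -S on -n standby,q -W 4,40,50 -m admin@example.com
/dev/sdb   -a -o on -S on -n standby,q -W 4,40,50 -m admin@example.com
/dev/nvme0 -d nvme -a -m admin@example.com
```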
Run a short self-test daily and a long self-test weekly:
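The -s directive takes a regex over T/MM/DD/d/HH fields (test type, month, day of month, day of week, hour), for example:

```
# S = short test, L = long test; weekday 6 = Saturday
/dev/sda -a -s (S/../.././02|L/../../6/03) -m admin@example.com
```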
This schedules a short test every day at 2:00 AM and a long test every Saturday at 3:00 AM.
Email Configuration
smartd sends alerts via the system’s mail command. On a minimal server, you’ll need an MTA like Postfix or use a relay:
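One minimal path on Debian/Ubuntu, with a -M test directive so smartd mails a test alert at startup to prove the pipeline works (the address is a placeholder):

```shell
# Install an MTA and the mail(1) client; choose "Satellite system"
# during Postfix setup to relay through an upstream smarthost
sudo apt install postfix mailutils

# In /etc/smartd.conf, add -M test for a one-off test mail at startup:
#   DEVICESCAN -a -m admin@example.com -M test
sudo systemctl restart smartd
```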
Pros and Limitations
smartd excels at simplicity: one config file, one daemon, email alerts. It uses virtually no resources and runs on anything from a Raspberry Pi to an enterprise server. However, it lacks historical data visualization, provides no web interface, and email-only alerting doesn’t integrate with modern notification platforms without additional scripting.
NVMe-cli: Native NVMe Drive Management
NVMe-cli is the official Linux command-line toolset for NVMe (Non-Volatile Memory Express) drives. While smartmontools can read NVMe SMART data through translation layers, NVMe-cli speaks the NVMe protocol natively, exposing drive-specific features that generic tools can’t access.
Installation
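The package is available in the standard repositories on most distributions:

```shell
# Debian/Ubuntu
sudo apt install nvme-cli
# RHEL/Fedora
sudo dnf install nvme-cli
# confirm the install
nvme version
```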
Essential Commands
Check overall drive health:
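Assuming your first controller is /dev/nvme0:

```shell
sudo nvme smart-log /dev/nvme0
```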
Sample output:
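The output looks roughly like this (values are illustrative, not from a real drive):

```
Smart Log for NVME device:nvme0 namespace-id:ffffffff
critical_warning                        : 0
temperature                             : 38 C
available_spare                         : 100%
available_spare_threshold               : 10%
percentage_used                         : 3%
data_units_read                         : 11,234,567
data_units_written                      : 9,876,543
power_cycles                            : 214
power_on_hours                          : 8,760
unsafe_shutdowns                        : 12
media_errors                            : 0
num_err_log_entries                     : 0
```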
List all NVMe devices with detailed information:
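For example:

```shell
nvme list
# full controller identify data for one device, human-readable
sudo nvme id-ctrl /dev/nvme0 -H
```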
Run a drive self-test:
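For example (requires a drive that implements the NVMe device self-test operation, introduced in NVMe 1.3):

```shell
# -s 1 = short self-test, -s 2 = extended
sudo nvme device-self-test /dev/nvme0 -s 1
# check progress and results
sudo nvme self-test-log /dev/nvme0
```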
Check firmware version and update:
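A sketch of the firmware workflow; firmware.bin and the slot/action values are examples, so check your vendor’s instructions before committing:

```shell
# show firmware slots and the active revision
sudo nvme fw-log /dev/nvme0
# stage a vendor-supplied image, then commit it to a slot
sudo nvme fw-download /dev/nvme0 --fw=firmware.bin
sudo nvme fw-commit /dev/nvme0 --slot=1 --action=1
```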
Key NVMe Metrics to Monitor
| Metric | What It Means | Warning Threshold |
|---|---|---|
| percentage_used | Drive wear level (0-100%) | > 80% |
| available_spare | Remaining spare blocks (%) | < 10% |
| media_errors | Uncorrectable data errors | > 0 |
| critical_warning | Bitmap of critical alerts | Any non-zero value |
| temperature | Current drive temperature | > 70°C |
| unsafe_shutdowns | Power losses without clean shutdown | > 10 |
| power_on_hours | Total operational hours | Track trend |
Choosing the Right Tool
Use Scrutiny When:
- You want a visual dashboard with historical graphs
- You manage multiple servers and need a centralized view
- You want multi-channel notifications (Discord, Telegram, Slack)
- You need vendor-specific failure thresholds based on real-world data
- You’re comfortable running Docker containers
Use smartd When:
- You want the simplest possible setup — one config file, one daemon
- You’re on a resource-constrained system (smartd uses < 5 MB RAM)
- Email alerts are sufficient for your workflow
- You need to monitor SATA, SAS, and NVMe drives on the same host
- You can’t run Docker (bare metal, minimal containers)
Use NVMe-cli When:
- You need NVMe-specific diagnostics that smartmontools can’t provide
- You want to update firmware on NVMe drives
- You’re building custom monitoring scripts with direct protocol access
- You need to run namespace management or format operations
- You want to verify security features like TCG Opal self-encryption
Recommended: Layered Approach
For a production homelab or small team, the best setup combines all three:
- Scrutiny as your primary dashboard for visual monitoring and alerting
- smartd as a lightweight backup daemon that sends email even if Scrutiny is down
- NVMe-cli in a monthly cron job for detailed NVMe diagnostics and firmware checks
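The third layer could be a simple script dropped into /etc/cron.monthly; the device list and mail address here are examples:

```shell
#!/bin/sh
# /etc/cron.monthly/nvme-audit: monthly NVMe health + firmware report
LOG="$(mktemp)"
for dev in /dev/nvme0 /dev/nvme1; do
    [ -e "$dev" ] || continue
    echo "== $dev =="      >> "$LOG"
    nvme smart-log "$dev"  >> "$LOG" 2>&1
    nvme fw-log "$dev"     >> "$LOG" 2>&1
done
# deliver via the same MTA smartd uses
mail -s "Monthly NVMe report: $(hostname)" admin@example.com < "$LOG"
rm -f "$LOG"
```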
Integrating with Your Monitoring Stack
Scrutiny’s REST API makes it easy to integrate with existing monitoring platforms. Here are two common patterns:
Prometheus Exporter Pattern
While Scrutiny doesn’t have a native Prometheus exporter, you can use the generic JSON exporter:
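A sketch using the community json_exporter. The JSONPath expressions below are assumptions about the /api/summary payload; inspect the actual response from your instance and adjust them before use:

```yaml
# json-exporter config: map Scrutiny JSON fields to Prometheus metrics
modules:
  default:
    metrics:
      - name: scrutiny_device_temperature
        type: object
        path: '{.data.summary.*}'     # assumption: verify against your payload
        values:
          temperature: '{.temp}'
---
# prometheus.yml scrape job pointing json_exporter at Scrutiny
scrape_configs:
  - job_name: scrutiny
    metrics_path: /probe
    params:
      module: [default]
    static_configs:
      - targets: ["http://scrutiny:8080/api/summary"]
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: json-exporter:7979   # json_exporter's listen address
```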
Grafana Dashboard
Point Grafana directly at Scrutiny’s InfluxDB instance:
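A provisioned datasource might look like the following. The organization, bucket, and token depend on your deployment; the org/bucket names shown are Scrutiny’s defaults to the best of our knowledge, so verify them against your InfluxDB instance:

```yaml
# grafana/provisioning/datasources/scrutiny.yml
apiVersion: 1
datasources:
  - name: Scrutiny InfluxDB
    type: influxdb
    access: proxy
    url: http://scrutiny:8086
    jsonData:
      version: Flux
      organization: scrutiny
      defaultBucket: metrics
    secureJsonData:
      token: "<influxdb-api-token>"
```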
Then configure the InfluxDB datasource in Grafana and import the community Scrutiny dashboard templates.
Troubleshooting Common Issues
Drives Not Showing in Scrutiny Container
This usually means the device nodes aren’t passed through. Add each drive explicitly:
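In your compose file, each drive needs its own entry (device names are examples):

```yaml
services:
  scrutiny:
    # ...
    devices:
      - /dev/sda
      - /dev/sdb
      - /dev/nvme0   # for NVMe, pass the controller device, not just the namespace
```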
Also ensure /run/udev is mounted into the container (read-only) so the collector can enumerate devices.
smartd Not Sending Email Alerts
Verify the MTA is working:
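For example (the address is a placeholder):

```shell
echo "MTA smoke test" | mail -s "test from $(hostname)" admin@example.com
# then check the local mail queue for stuck messages
mailq
```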
Check smartd’s log:
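On a systemd host:

```shell
sudo journalctl -u smartd --since "24 hours ago"
# on non-systemd hosts, smartd logs via syslog:
grep smartd /var/log/syslog
```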
NVMe-cli Permission Denied
NVMe commands require root or the disk group:
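For example:

```shell
# quickest: run under sudo
sudo nvme smart-log /dev/nvme0
# or add your user to the disk group (re-login required; note that
# disk-group membership grants raw read access to all block devices)
sudo usermod -aG disk "$USER"
```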
For related infrastructure monitoring, check our GPU monitoring guide with nvtop and netdata and our NAS solutions comparison covering TrueNAS and OpenMediaVault. If you’re also planning your backup strategy, our encrypted backup comparison covers the tools that protect your data once drive failure strikes.
FAQ
Does Scrutiny work with hardware RAID controllers?
Scrutiny relies on the Linux kernel’s /dev/sdX and /dev/nvmeXnY device nodes. Hardware RAID controllers that present a single logical volume (e.g., RAID 5 array) will show the virtual drive’s SMART data, not individual physical disks. For per-disk monitoring behind a hardware RAID controller, you’ll need the controller’s management tools (e.g., storcli for MegaRAID, hpssacli for HP Smart Array) alongside Scrutiny.
Can smartd monitor NVMe drives?
Yes. Modern versions of smartmontools (7.2+) support NVMe drives. Use the -d nvme device type in /etc/smartd.conf:
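A minimal line, with the mail address as a placeholder:

```
# /etc/smartd.conf
/dev/nvme0 -d nvme -a -m admin@example.com
```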
However, NVMe-cli provides more granular NVMe-specific data that smartd cannot access, such as namespace-level metrics and firmware management.
How often does Scrutiny collect SMART data?
The Scrutiny collector runs on a cron schedule inside the container, collecting SMART data every 12 hours by default. You can adjust this by setting the COLLECTOR_CRON_SCHEDULE environment variable. The collector also runs on container startup if COLLECTOR_RUN_STARTUP is set to true.
Do I need to pass through every drive device to the Scrutiny container?
Yes. Unlike typical Docker containers, the Scrutiny collector needs direct access to block devices to read SMART data via the ioctl() system call. Each drive you want to monitor must be listed under the devices: section of your Docker Compose file. If you add a new drive later, update the compose file and restart the container.
Is Scrutiny safe to run in production?
Scrutiny runs with the SYS_RAWIO capability to access raw I/O for SMART data collection, which is a security consideration. The recommended approach is to run Scrutiny on a dedicated monitoring host or in an isolated network segment. The hub/spoke deployment model helps here: the web UI and database run on a separate host from the collectors, which only need the SYS_RAWIO capability.
What happens when a drive’s percentage_used reaches 100%?
An NVMe drive at 100% percentage_used has exhausted its rated write endurance. This doesn’t mean immediate failure — many drives continue operating beyond this point — but the manufacturer’s warranty is void and the risk of failure increases significantly. You should plan a replacement and ensure your backup strategy is solid. For comprehensive backup strategies, see our backup verification guide.