DNS resolver performance directly impacts every application and user on your network. Cache hit rates, query latency, and upstream resolution times determine whether users experience instant page loads or frustrating delays. This guide covers how to monitor DNS cache statistics and resolver performance using three widely deployed open-source DNS resolvers: Unbound, PowerDNS Recursor, and BIND 9.
Why Monitor DNS Cache Performance?
DNS resolution is the first step in nearly every network request. A slow or misconfigured DNS resolver cascades delays across your entire infrastructure:
- Cache hit rate below 80% means your resolver is performing excessive upstream queries, increasing latency and upstream DNS server load
- High query latency (above 100 ms for cached queries) indicates resource constraints, disk I/O bottlenecks, or recursive resolver issues
- NXDOMAIN spikes can indicate misconfigured applications, DNS-based attacks, or reconnaissance activity
- Cache poisoning attempts manifest as anomalous query patterns that monitoring can detect early
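The two numeric thresholds in the list above can be turned into a simple check. The following is a minimal sketch, not a production health check: the `ResolverSample` structure and the function names are hypothetical, and the 80% and 100 ms limits are simply the values quoted above, which you would tune for your own environment.

```python
from dataclasses import dataclass

# Thresholds quoted in the list above; adjust for your environment.
MIN_CACHE_HIT_RATE = 80.0      # percent
MAX_CACHED_LATENCY_MS = 100.0  # milliseconds


@dataclass
class ResolverSample:
    """One observation of a resolver (hypothetical structure for this sketch)."""
    cache_hits: int
    cache_misses: int
    cached_latency_ms: float


def hit_rate_percent(sample: ResolverSample) -> float:
    """Return the cache hit rate as a percentage, or 0.0 if there were no lookups."""
    total = sample.cache_hits + sample.cache_misses
    return 100.0 * sample.cache_hits / total if total else 0.0


def check_sample(sample: ResolverSample) -> list:
    """Return a list of human-readable warnings for one sample."""
    warnings = []
    rate = hit_rate_percent(sample)
    if rate < MIN_CACHE_HIT_RATE:
        warnings.append(
            f"cache hit rate {rate:.1f}% is below {MIN_CACHE_HIT_RATE:.0f}%"
        )
    if sample.cached_latency_ms > MAX_CACHED_LATENCY_MS:
        warnings.append(
            f"cached query latency {sample.cached_latency_ms:.0f} ms "
            f"is above {MAX_CACHED_LATENCY_MS:.0f} ms"
        )
    return warnings
```

A healthy sample returns an empty list, so the result can be fed directly into whatever alerting mechanism you already use.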
1. Unbound Cache Statistics
Unbound is a validating, recursive, and caching DNS resolver developed by NLnet Labs. It exposes detailed cache statistics through its unbound-control command, and a third-party Prometheus exporter can publish those statistics as metrics.
Key Metrics Available
- Cache hit ratio: Percentage of queries served from cache vs. upstream resolution
- Query types: Distribution of A, AAAA, MX, TXT, and other query types
- Response codes: NOERROR, NXDOMAIN, SERVFAIL breakdown
- Memory usage: Cache size, RRset count, and memory allocation
- Upstream query time: Average latency for recursive resolution
Docker Compose Setup
Unbound Configuration (unbound.conf)
Collect Cache Stats
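As an illustration of working with Unbound's statistics, the sketch below parses the `key=value` lines that unbound-control prints and derives a cache hit ratio from the `total.num.cachehits` and `total.num.cachemiss` counters. The helper names and the sample text are made up for this example. It calls `unbound-control stats_noreset` so that reading the counters does not reset them, and it assumes unbound-control is on the PATH and remote control is enabled.

```python
import subprocess


def parse_unbound_stats(text: str) -> dict:
    """Parse 'key=value' lines into a dict of floats, skipping anything else."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue
        try:
            stats[key.strip()] = float(value)
        except ValueError:
            continue  # ignore non-numeric values
    return stats


def cache_hit_ratio(stats: dict) -> float:
    """Return hits / (hits + misses), or 0.0 when no queries were counted."""
    hits = stats.get("total.num.cachehits", 0.0)
    misses = stats.get("total.num.cachemiss", 0.0)
    total = hits + misses
    return hits / total if total else 0.0


def read_unbound_stats() -> str:
    """Run unbound-control and return its raw output (requires a running Unbound)."""
    result = subprocess.run(
        ["unbound-control", "stats_noreset"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Illustrative sample only; real output contains many more counters.
SAMPLE = """total.num.queries=1000
total.num.cachehits=850
total.num.cachemiss=150
total.recursion.time.avg=0.042000
"""

if __name__ == "__main__":
    ratio = cache_hit_ratio(parse_unbound_stats(SAMPLE))
    print(f"cache hit ratio: {ratio:.1%}")
```

To use it against a live resolver, pass the output of `read_unbound_stats()` to `parse_unbound_stats()` instead of the sample text.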
2. PowerDNS Recursor Statistics
PowerDNS Recursor is a high-performance, security-focused recursive DNS resolver. It provides statistics via HTTP API, SNMP, and built-in Prometheus metrics.
Key Metrics Available
- Question rate: Queries per second
- Cache size: Number of entries in the packet cache and negative cache
- CPU usage: User and system time breakdown
- Query distribution: Per-query-type and per-qclass statistics
- Slow query tracking: Queries exceeding configurable latency thresholds
- Throttled queries: Rate-limited and blocked queries
Docker Compose Setup
PowerDNS Recursor Configuration (recursor.conf)
Fetch Statistics via API
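The following sketch shows one way to read the Recursor's statistics endpoint and compute a record cache hit ratio. It assumes the built-in webserver is enabled and listening on 127.0.0.1:8082 with an API key configured; the URL, the helper names, and the sample payload are illustrative assumptions, so check them against your own recursor.conf. The endpoint returns a JSON list of items, and only the scalar `StatisticItem` entries are used here.

```python
import json
import urllib.request

# Assumed address for this sketch; match it to your webserver settings.
API_URL = "http://127.0.0.1:8082/api/v1/servers/localhost/statistics"


def fetch_statistics(api_key: str, url: str = API_URL) -> str:
    """Fetch the raw JSON statistics payload (requires a running Recursor)."""
    request = urllib.request.Request(url, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.read().decode("utf-8")


def parse_statistics(payload: str) -> dict:
    """Keep only scalar StatisticItem entries, converted to floats."""
    stats = {}
    for item in json.loads(payload):
        if item.get("type") != "StatisticItem":
            continue  # map and ring items carry lists, not a single number
        try:
            stats[item["name"]] = float(item["value"])
        except (KeyError, TypeError, ValueError):
            continue
    return stats


def record_cache_hit_ratio(stats: dict) -> float:
    """Return cache-hits / (cache-hits + cache-misses), or 0.0 if both are zero."""
    hits = stats.get("cache-hits", 0.0)
    misses = stats.get("cache-misses", 0.0)
    total = hits + misses
    return hits / total if total else 0.0


# Illustrative payload only; a real response has many more entries.
SAMPLE_PAYLOAD = json.dumps([
    {"name": "cache-hits", "type": "StatisticItem", "value": "900"},
    {"name": "cache-misses", "type": "StatisticItem", "value": "100"},
    {"name": "response-by-qtype", "type": "MapStatisticItem", "value": []},
])

if __name__ == "__main__":
    stats = parse_statistics(SAMPLE_PAYLOAD)
    print(f"record cache hit ratio: {record_cache_hit_ratio(stats):.1%}")
```

The same approach works for the packet cache by substituting the `packetcache-hits` and `packetcache-misses` counters.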
3. BIND 9 Statistics Channel
BIND 9 is one of the most widely deployed DNS servers and can act as both an authoritative server and a recursive resolver. Its statistics channel provides detailed metrics in XML and JSON formats.
Key Metrics Available
- Resolver statistics: Queries received, resolved, cached
- Cache database size: RRset and message cache sizes
- Client query statistics: Per-view, per-client statistics
- Memory management: Heap, mmap, and context memory usage
- TSIG and DNSSEC validation: Signature validation rates
Docker Compose Setup
BIND 9 Statistics Configuration (named.conf.options)
Collect Statistics via rndc
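Running rndc stats makes named append a text dump to its statistics file (named.stats by default) rather than printing to the terminal. The sketch below pulls the per-view counters out of the "Cache Statistics" section of such a dump. The sample text is an assumption about the dump layout, and counter labels can differ between BIND versions, so compare it with a dump from your own server before relying on it. The function names are hypothetical.

```python
import re

SECTION = re.compile(r"^\+\+ (.+) \+\+$")    # e.g. "++ Cache Statistics ++"
VIEW = re.compile(r"^\[View: (.+)\]$")       # e.g. "[View: default]"
COUNTER = re.compile(r"^\s*(\d+)\s+(.+)$")   # e.g. "     300 cache hits"


def parse_cache_statistics(text: str) -> dict:
    """Return {view: {label: count}} for the Cache Statistics section."""
    views = {}
    in_cache_section = False
    view = None
    for raw in text.splitlines():
        line = raw.rstrip()
        section = SECTION.match(line)
        if section:
            in_cache_section = section.group(1) == "Cache Statistics"
            view = None
            continue
        if not in_cache_section:
            continue
        view_match = VIEW.match(line)
        if view_match:
            view = view_match.group(1)
            views.setdefault(view, {})
            continue
        counter = COUNTER.match(line)
        if counter and view is not None:
            views[view][counter.group(2)] = int(counter.group(1))
    return views


def query_hit_ratio(counters: dict) -> float:
    """Hit ratio for lookups made on behalf of client queries."""
    hits = counters.get("cache hits (from query)", 0)
    misses = counters.get("cache misses (from query)", 0)
    total = hits + misses
    return hits / total if total else 0.0


# Assumed layout, shortened for illustration.
SAMPLE_DUMP = """+++ Statistics Dump +++ (1700000000)
++ Incoming Requests ++
                 500 QUERY
++ Cache Statistics ++
[View: default]
                 700 cache hits
                 250 cache misses
                 300 cache hits (from query)
                 200 cache misses (from query)
--- Statistics Dump --- (1700000000)
"""

if __name__ == "__main__":
    for name, counters in parse_cache_statistics(SAMPLE_DUMP).items():
        print(name, f"{query_hit_ratio(counters):.1%}")
```

Because the statistics file is appended to on every run, a real collector would typically read only the most recent dump.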
Comparison Table
| Feature | Unbound | PowerDNS Recursor | BIND 9 |
|---|---|---|---|
| Cache hit metrics | Via unbound-control | HTTP API + Prometheus | Statistics channel + rndc |
| Prometheus exporter | kumina/unbound_exporter | Built-in | prometheus-community/bind_exporter |
| Per-query logging | Optional (verbosity) | Lua scripting | Query logging category |
| DNSSEC validation stats | Yes | Yes | Yes |
| Cache size tuning | msg-cache-size, rrset-cache-size | max-cache-entries | max-cache-size |
| Slow query tracking | Via response time histogram | answers-slow metric | Via query logging |
| Negative cache stats | Yes (NXDOMAIN) | max-negative-ttl | Via cache DB RRsets |
| Resource usage | Low (~100MB RAM) | Low (~150MB RAM) | Medium (~300MB RAM) |
| Best for | Simple, secure caching | High-performance with rich API | Enterprise, multi-view setups |
Prometheus Dashboard Setup
Key Grafana Panels
- Cache Hit Rate: rate(unbound_response_cache_hits[5m]) / rate(unbound_response_total[5m]) * 100
- Queries per Second: rate(unbound_queries_total[1m])
- Upstream Query Latency: histogram_quantile(0.95, rate(unbound_response_time_seconds_bucket[5m]))
- Cache Memory Usage: unbound_memory_cache_bytes
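To make the cache hit rate panel easier to reason about, here is a rough sketch of the arithmetic behind dividing one counter's rate by another. It is a simplification: Prometheus's rate() also extrapolates to the window boundaries and handles counter resets between every pair of samples, while this version only looks at the first and last sample. The function names and sample values are hypothetical.

```python
def per_second_rate(first: tuple, last: tuple) -> float:
    """Average per-second increase between two (timestamp, counter) samples."""
    (t0, v0), (t1, v1) = first, last
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("last sample must be newer than first sample")
    increase = v1 - v0
    if increase < 0:
        # The counter went backwards, so assume the resolver restarted
        # and the counter began again from zero.
        increase = v1
    return increase / elapsed


def hit_rate_percent(hits_first, hits_last, total_first, total_last) -> float:
    """Cache hit rate in percent over the window covered by the samples."""
    total_rate = per_second_rate(total_first, total_last)
    if total_rate == 0:
        return 0.0
    return per_second_rate(hits_first, hits_last) / total_rate * 100


if __name__ == "__main__":
    # Two scrapes taken 300 seconds (5 minutes) apart.
    rate = hit_rate_percent((0, 1000), (300, 1850), (0, 2000), (300, 3000))
    print(f"cache hit rate over window: {rate:.1f}%")
```

Using rates rather than raw counter values means the panel reflects recent behavior instead of the average since the resolver started.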
Benefits of Proactive DNS Cache Monitoring
Proactive DNS cache monitoring prevents performance degradation before users notice it:
- Capacity planning: Track cache utilization trends to predict when you need to increase cache size or add resolver instances
- Anomaly detection: Sudden drops in cache hit rate often indicate upstream DNS issues, cache poisoning attempts, or application misconfigurations
- Cost optimization: Higher cache hit rates reduce upstream DNS queries, lowering bandwidth costs and improving response times for users
- SLA compliance: DNS resolution time is part of most application performance SLAs. Monitoring cache performance ensures you meet these commitments
- Security insights: NXDOMAIN spikes can reveal malware C2 communication, DGA-based domain generation, or DNS tunneling attempts
- Cache tuning validation: When you adjust TTL values or cache sizes, monitoring confirms whether the changes improve hit rates or waste memory
For DNS traffic analysis, see our DNS traffic collector comparison. If you need authoritative DNS management, check our DNS authoritative server comparison. For DNS-over-TLS setup, our DNS-over-TLS guide covers encrypted resolution.
FAQ
What is a good DNS cache hit rate?
For a typical internal DNS resolver serving a stable set of domains, a cache hit rate of 80-95% is expected. Enterprise environments with diverse client applications may see 60-80%. Below 50% suggests the cache is too small, TTLs are very short, or the resolver is serving too many unique domains.
How do I monitor DNS cache performance without a dashboard?
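All three resolvers provide command-line tools for quick checks: unbound-control stats, rec_control get-all, and rndc stats. Two details are worth knowing. By default, unbound-control stats resets Unbound's counters after printing them; use unbound-control stats_noreset (or enable statistics-cumulative) to read them without resetting. And rndc stats does not print to the terminal: it appends a dump to BIND's statistics file (named.stats by default). For automated monitoring, each resolver has a Prometheus exporter that feeds into time-series databases for alerting.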
Why is my DNS cache hit rate dropping suddenly?
Common causes include: (1) a new application or service querying many unique domains, (2) upstream TTL changes that shorten cache lifetime, (3) cache eviction due to memory pressure, (4) an accidental cache flush, or (5) DNS-based attacks generating random queries for non-existent domains.
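Catching a sudden drop early makes it easier to tie it to one of these causes. The sketch below compares each new hit rate reading against the average of the previous few readings and flags a drop larger than a chosen number of percentage points. The class name, window size, and 15-point threshold are hypothetical values picked for illustration, not recommendations.

```python
from collections import deque
from statistics import mean


class HitRateDropDetector:
    """Flag readings that fall well below the recent average hit rate."""

    def __init__(self, window: int = 12, max_drop: float = 15.0):
        self.history = deque(maxlen=window)  # most recent readings, in percent
        self.max_drop = max_drop             # allowed drop in percentage points

    def observe(self, hit_rate_percent: float) -> bool:
        """Record a reading and return True if it is a sudden drop."""
        baseline = mean(self.history) if self.history else None
        self.history.append(hit_rate_percent)
        if baseline is None:
            return False  # nothing to compare the first reading against
        return baseline - hit_rate_percent > self.max_drop


if __name__ == "__main__":
    detector = HitRateDropDetector(window=3, max_drop=10.0)
    for reading in [90.0, 91.0, 89.0, 60.0]:
        flagged = detector.observe(reading)
        print(f"{reading:5.1f}% -> {'DROP' if flagged else 'ok'}")
```

Note that a flagged reading is still added to the history, so a sustained drop gradually becomes the new baseline; whether that is desirable depends on how you want alerts to behave.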
How much memory should I allocate for DNS cache?
For Unbound, start with 128MB message cache and 256MB RRset cache for up to 1000 clients. Scale to 512MB/1GB for 10,000+ clients. PowerDNS Recursor uses max-cache-entries (default 1M entries ≈ 200MB). BIND 9’s max-cache-size defaults to 90% of available RAM but should be capped at 512MB-1GB for dedicated resolvers.
Can I monitor DNS cache performance across multiple resolvers?
Yes. Deploy a Prometheus instance that scrapes each resolver’s exporter, then use Grafana to aggregate and compare metrics. This is especially useful for anycast deployments where you want to compare cache hit rates across geographically distributed resolvers.
Does DNS cache monitoring help with security?
Absolutely. Monitoring cache statistics reveals security-relevant patterns: sudden spikes in NXDOMAIN responses may indicate malware using domain generation algorithms (DGAs), unusual TXT query volumes could signal DNS tunneling, and abnormal query patterns from single sources may reveal compromised hosts.