Horizontal Pod Autoscaling (HPA) is one of Kubernetes’ most powerful features, allowing your workloads to scale automatically based on demand. While CPU and memory-based scaling works well for simple use cases, real-world applications often need to scale based on application-specific metrics like queue depth, requests per second, or business KPIs.
This guide covers three approaches to custom metrics-based HPA in Kubernetes: the Prometheus Adapter, KEDA (Kubernetes Event-Driven Autoscaling), and the raw Custom Metrics API — comparing their architecture, configuration, and use cases.
Understanding Kubernetes HPA and Custom Metrics
The Kubernetes HPA controller natively supports CPU and memory metrics via the Metrics Server. However, for application-aware scaling, you need the custom metrics pipeline:
Application pods (expose metrics) → Prometheus or another metrics store (scrape and store) → Custom Metrics API adapter (translate) → HPA controller (scale decision)
The key components are:
- Metrics exporter: Your application exposes metrics (typically Prometheus format)
- Metrics store: Prometheus, VictoriaMetrics, or similar time-series database
- Custom Metrics API: Translates stored metrics into the Kubernetes metrics API
- HPA controller: Reads custom metrics and adjusts replica counts
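Once the pipeline is in place, you can check what the Custom Metrics API actually serves by querying it through the apiserver. The metric and namespace names below are hypothetical; substitute whatever your adapter exposes:

```bash
# Discover which custom metrics are registered with the aggregated API
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq '.resources[].name'

# Read a hypothetical per-pod metric in the default namespace
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_per_second"
```

If the second command returns a `MetricValueList`, the HPA controller can consume that metric.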
Prometheus Adapter for HPA
The Prometheus Adapter is the official Kubernetes SIG project that exposes Prometheus metrics through the custom.metrics.k8s.io API, enabling HPA to scale based on any Prometheus metric.
Architecture
Application pods (/metrics endpoint) → Prometheus (scrape and store) → Prometheus Adapter (serves custom.metrics.k8s.io via an APIService) → HPA controller
Deployment
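The adapter runs inside the cluster (it registers an APIService with the apiserver), so it is typically installed from the prometheus-community Helm chart. The namespace and Prometheus service URL below are assumptions about your setup:

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Point the adapter at your Prometheus instance (URL and port are examples)
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --set prometheus.url=http://prometheus-server.monitoring.svc \
  --set prometheus.port=80
```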
Adapter Configuration
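A minimal discovery rule, supplied through the adapter's ConfigMap or Helm values, might look like the following. The `http_requests_total` series name is an assumption about what your application exports:

```yaml
rules:
  # Find per-pod request counters in Prometheus...
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    # ...expose them as a rate-based metric name...
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    # ...and compute the value served to the HPA controller
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

This rule would surface a pods-type metric named `http_requests_per_second` through the custom metrics API.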
The Prometheus Adapter is ideal when you already run Prometheus and need fine-grained control over which metrics are exposed to HPA. It supports pod, object, and external metrics types.
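An HPA consuming a pods-type custom metric looks like this; the deployment name, metric name, and target value are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"   # scale so each pod averages ~100 req/s
```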
Pros: Official Kubernetes SIG project, tight Prometheus integration, flexible metric queries
Cons: Requires Prometheus, complex configuration for advanced scenarios, no event-driven scaling
KEDA (Kubernetes Event-Driven Autoscaling)
KEDA extends Kubernetes HPA with 50+ event source scalers, including Prometheus, Kafka, RabbitMQ, AWS SQS, and more. Unlike the Prometheus Adapter, KEDA can scale workloads to zero when no events are present.
Architecture
KEDA operates through two components:
- KEDA Operator: Manages HPA resources and connects to event sources
- Metrics Server: Exposes external metrics to the Kubernetes API
HPA with KEDA Prometheus Scaler
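A ScaledObject wiring the Prometheus scaler to a hypothetical webapp Deployment might look like this; the Prometheus address, query, and threshold are assumptions:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: webapp-scaler
spec:
  scaleTargetRef:
    name: webapp           # Deployment to scale
  minReplicaCount: 0       # scale to zero when idle
  maxReplicaCount: 20
  pollingInterval: 15      # seconds between metric checks
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-server.monitoring.svc:80
        query: sum(rate(http_requests_total{app="webapp"}[2m]))
        threshold: "100"
```

KEDA creates and manages the underlying HPA object for you; you never write the HPA manifest yourself.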
KEDA Deployment
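KEDA is typically installed from its official Helm chart:

```bash
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
```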
KEDA Kafka Scaler Example
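The Kafka scaler scales on consumer group lag. The broker address, topic, and consumer group names below are placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
spec:
  scaleTargetRef:
    name: order-consumer     # hypothetical consumer Deployment
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.default.svc:9092
        consumerGroup: order-processors
        topic: orders
        lagThreshold: "50"   # target lag per replica
```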
KEDA’s major advantage is scale-to-zero capability and its vast ecosystem of scalers. It’s the go-to choice for event-driven architectures.
Pros: Scale-to-zero, 50+ event source scalers, simple ScaledObject CRD, active community
Cons: Additional CRDs to manage, requires KEDA operator deployment, less flexible metric queries than Prometheus Adapter
Custom Metrics API (Manual Implementation)
For organizations that need maximum control or have non-standard metrics pipelines, implementing a custom metrics server directly is an option. This approach requires building a service that implements the custom.metrics.k8s.io API.
API Server Implementation
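Production implementations usually build on the kubernetes-sigs/custom-metrics-apiserver boilerplate in Go. The Python sketch below is only meant to show the `MetricValueList` JSON shape the HPA controller expects back from the API; in a real server this would sit behind TLS and be registered via an APIService object, and the metric and pod names here are made up:

```python
import json

def metric_value_list(namespace, metric_name, pod_values):
    """Build the custom.metrics.k8s.io/v1beta1 response for a pods metric."""
    return {
        "kind": "MetricValueList",
        "apiVersion": "custom.metrics.k8s.io/v1beta1",
        "metadata": {},
        "items": [
            {
                "describedObject": {
                    "kind": "Pod",
                    "namespace": namespace,
                    "name": pod,
                    "apiVersion": "/v1",
                },
                "metricName": metric_name,
                "timestamp": "2024-01-01T00:00:00Z",  # fixed for illustration
                "value": str(value),  # quantities are strings, e.g. "100" or "500m"
            }
            for pod, value in pod_values.items()
        ],
    }

# Hypothetical per-pod request rates
payload = metric_value_list("default", "http_requests_per_second",
                            {"webapp-abc123": 120, "webapp-def456": 80})
print(json.dumps(payload, indent=2))
```

The HPA controller fetches this payload from `/apis/custom.metrics.k8s.io/v1beta1/namespaces/{ns}/pods/*/{metric}` and averages the per-pod values against its target.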
HPA Using External Metrics
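An HPA consuming an External metric might look like the following; the metric name, selector, and target value are placeholders for whatever your metrics API serves:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: queue_depth        # served by your external metrics API
          selector:
            matchLabels:
              queue: orders
        target:
          type: AverageValue
          averageValue: "30"       # target ~30 queued items per replica
```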
This approach is rarely used in practice because building and maintaining a custom metrics API server is complex. Most teams should use the Prometheus Adapter or KEDA instead.
Pros: Complete control, can integrate any metrics backend
Cons: Significant development effort, TLS/APIService complexity, no community support
Comparison: Prometheus Adapter vs KEDA vs Custom Metrics API
| Feature | Prometheus Adapter | KEDA | Custom Metrics API |
|---|---|---|---|
| GitHub Stars | 2,073 | 10,186 | N/A (API spec) |
| Scale to Zero | No | Yes | Depends on implementation |
| Event Sources | Prometheus only | 50+ scalers | Custom |
| CRD Required | No | Yes (ScaledObject) | Yes (APIService) |
| Setup Complexity | Medium | Low | High |
| Metric Types | Pods, Objects, External | External | Any |
| Best For | Prometheus-centric shops | Event-driven apps | Custom metrics backends |
| Docker Support | Yes | Yes | Custom |
Why Self-Host Kubernetes Autoscaling?
Running your own autoscaling infrastructure gives you complete control over scaling policies, metric collection, and data privacy. Cloud-managed autoscaling services often lock you into specific metrics pipelines or charge premium rates for custom metric ingestion.
Self-hosted autoscaling with Prometheus Adapter or KEDA means:
- No vendor lock-in: Your scaling logic works identically across on-premises, edge, and multi-cloud environments
- Cost control: No per-metric ingestion fees from cloud providers
- Data sovereignty: All metrics stay within your infrastructure, critical for regulated industries
- Custom metric pipelines: Integrate with any internal monitoring system
For teams running Kubernetes on bare metal or in hybrid environments, self-hosted autoscaling is essential, not optional. The combination of Prometheus for metrics collection and KEDA for event-driven scaling provides enterprise-grade autoscaling without cloud dependencies.
For Kubernetes networking fundamentals, see our CNI comparison guide and ingress controller comparison.
Choosing the Right Autoscaling Approach
Use Prometheus Adapter when:
- You already run Prometheus for monitoring
- You need fine-grained control over metric aggregation and labeling
- Your scaling metrics are application-level (request rate, error rate, latency)
Use KEDA when:
- You need scale-to-zero for cost optimization
- Your workloads are event-driven (message queues, HTTP requests, cron jobs)
- You want pre-built scalers for popular event sources (Kafka, RabbitMQ, Redis, AWS)
Use Custom Metrics API when:
- Your metrics live in a non-Prometheus backend (InfluxDB, Datadog, custom database)
- You have unique scaling requirements not met by existing solutions
- You’re building a platform product with embedded autoscaling
FAQ
What is the difference between HPA and VPA in Kubernetes?
HPA (Horizontal Pod Autoscaler) adjusts the number of pod replicas based on metrics, while VPA (Vertical Pod Autoscaler) adjusts the CPU and memory requests/limits of individual pods. They can be used together but require careful configuration to avoid conflicts.
Can KEDA scale to zero pods?
Yes, KEDA supports scale-to-zero through the minReplicaCount: 0 setting in ScaledObject. When no events are present, KEDA scales the deployment to zero replicas, saving resources. The Prometheus Adapter does not support scale-to-zero natively.
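The relevant ScaledObject fields are shown below; the values are illustrative:

```yaml
spec:
  minReplicaCount: 0    # scale to zero when no events are present
  maxReplicaCount: 10
  cooldownPeriod: 300   # seconds of inactivity before scaling back to zero
```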
How often does the HPA controller check metrics?
By default, the HPA controller syncs every 15 seconds. You can adjust this with the --horizontal-pod-autoscaler-sync-period flag on the kube-controller-manager. KEDA’s polling interval is configurable per ScaledObject (default 15 seconds).
Do I need Prometheus to use KEDA?
No. KEDA has 50+ scalers that connect directly to event sources (Kafka, RabbitMQ, Redis, AWS SQS, etc.) without requiring Prometheus. The Prometheus scaler is just one of many options. KEDA also ships its own metrics API server for external metrics, so the standard metrics-server is only needed if you additionally scale on CPU or memory.
What happens if the Custom Metrics API server goes down?
If the Custom Metrics API server becomes unavailable, the HPA controller will fail to retrieve metrics and will stop scaling actions. It will maintain the current replica count but won’t scale up or down until the API server recovers. High availability deployments of the metrics API server are recommended for production.
Can I use multiple autoscaling approaches together?
Yes. You can run KEDA alongside the Prometheus Adapter. KEDA manages its own HPA resources and doesn’t interfere with HPAs created by the Prometheus Adapter. However, having multiple autoscalers targeting the same deployment can cause conflicts — ensure each HPA targets a different workload.