Application performance monitoring (APM) and distributed tracing are no longer optional for modern software teams. As applications grow from monoliths into collections of microservices, containerized workloads, and serverless functions, understanding how requests flow through your infrastructure becomes essential. The dominant players — Datadog, New Relic, Dynatrace, and AppDynamics — charge premium prices that scale with every host, container, and gigabyte of telemetry data you generate.
This guide covers three powerful open-source, self-hosted alternatives: SigNoz, Jaeger, and Uptrace. Each can replace commercial APM products while keeping your data on your own infrastructure, under your control, and without surprise bills when traffic spikes.
Why Self-Host Your APM and Distributed Tracing
Running your own application performance monitoring stack delivers tangible advantages that hosted platforms simply cannot match:
Cost predictability is the most immediate benefit. Commercial APM vendors price by host, container, ingestion volume, or retained data, sometimes several of these at once. A 50-microservice architecture processing millions of requests daily can easily generate $2,000–$10,000+ per month in APM costs. Self-hosted solutions run on your existing infrastructure with no per-host fees, no data caps, and no premium charges for long-term retention.
Data sovereignty and compliance matter for organizations in healthcare, finance, and government. Sending application traces, error logs, and performance metrics to a third-party cloud creates compliance overhead under GDPR, HIPAA, SOC 2, and industry-specific regulations. Self-hosting keeps every byte of telemetry within your security boundary.
Unlimited retention means you can keep tracing data for months or years instead of the 7–30 day windows typical of hosted APM. Long retention enables trend analysis, capacity planning, compliance audits, and post-incident forensics that would be cost-prohibitive with commercial vendors.
No vendor lock-in gives you the flexibility to modify, extend, or integrate your observability stack without being constrained by a vendor’s roadmap. Open-source APM tools implement the OpenTelemetry standard, ensuring your instrumentation code works across platforms.
What Is Distributed Tracing?
Distributed tracing tracks individual requests as they travel through a distributed system. When a user clicks “checkout” on an e-commerce site, that single action might trigger calls to an API gateway, authentication service, inventory database, payment processor, notification queue, and email service. A distributed trace captures this entire journey as a series of spans — individual units of work — organized in a trace that shows timing, dependencies, and failures.
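To make the span and trace relationship concrete, here is a minimal sketch in plain Python. It is not how any of the tools in this guide store data; the `Span` class, its field names, and the timings are all made up for illustration, loosely following the checkout example above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One unit of work inside a trace (illustrative fields, not a real SDK type)."""
    span_id: str
    name: str
    start_ms: int
    end_ms: int
    parent_id: Optional[str] = None
    error: bool = False

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def render_trace(spans: list[Span]) -> list[str]:
    """Return the trace as indented lines, with child spans under their parent."""
    children: dict[Optional[str], list[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines: list[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        # Visit siblings in start-time order so the output reads like a timeline.
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            status = "ERROR" if span.error else "ok"
            lines.append(f"{'  ' * depth}{span.name} {span.duration_ms}ms {status}")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# Hypothetical checkout request: one root span and three child spans.
checkout_trace = [
    Span("a", "api-gateway /checkout", 0, 180),
    Span("b", "auth-service verify", 5, 25, parent_id="a"),
    Span("c", "inventory-db reserve", 30, 70, parent_id="a"),
    Span("d", "payment-processor charge", 75, 170, parent_id="a", error=True),
]

for line in render_trace(checkout_trace):
    print(line)
```

The parent/child links are what let a tracing UI show which downstream call consumed most of the request time and where the failure occurred.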
APM goes further by combining traces with metrics (CPU, memory, request rates), logs, and application-level data (error rates, response time percentiles, throughput) to provide a unified view of system health.
The industry standard for instrumenting applications is OpenTelemetry (OTel), a CNCF project that provides vendor-neutral APIs and SDKs in virtually every programming language. All three platforms covered in this guide are fully compatible with OpenTelemetry, meaning you instrument your application once and can switch between observability backends without rewriting code.
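Switching backends "by configuration" usually means changing the OTLP endpoint the exporter points at. The sketch below only shows the idea of resolving that endpoint from the standard `OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_SERVICE_NAME` environment variables; the `exporter_settings` helper and the collector hostnames are hypothetical, and a real OpenTelemetry SDK reads these variables for you.

```python
import os
from typing import Mapping, Optional


def exporter_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve exporter settings from the environment (illustrative helper only)."""
    env = os.environ if env is None else env
    return {
        # 4317 is the conventional OTLP/gRPC port; 4318 is used for OTLP/HTTP.
        "endpoint": env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317"),
        "service_name": env.get("OTEL_SERVICE_NAME", "unknown_service"),
    }


# Same application code, two different backends (hostnames are assumptions).
to_signoz = exporter_settings({
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://signoz-otel-collector:4317",
    "OTEL_SERVICE_NAME": "checkout-api",
})
to_jaeger = exporter_settings({
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://jaeger-collector:4317",
    "OTEL_SERVICE_NAME": "checkout-api",
})
print(to_signoz["endpoint"], to_jaeger["endpoint"])
```

Only the environment differs between the two calls, which is the property that makes a parallel run or a later backend switch cheap.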
SigNoz: Full-Stack Open-Source APM
SigNoz is one of the most comprehensive open-source APM platforms available today. Built specifically as an open-source alternative to Datadog and New Relic, it provides unified application performance monitoring, distributed tracing, log management, and alerting in a single product. SigNoz uses ClickHouse as its storage backend, a columnar database well suited to fast queries over large telemetry datasets.
Key Features
- Unified APM dashboard combining metrics, traces, and logs in a single interface
- OpenTelemetry-native — instrument once, send to any backend
- Built-in service topology map showing dependencies between services
- Custom dashboards with a drag-and-drop query builder
- Automatic metric collection for RED (Rate, Errors, Duration) and USE (Utilization, Saturation, Errors) metrics
- Alerting engine with support for Slack, PagerDuty, webhooks, and email
- Exception tracking with stack traces and error grouping
- ClickHouse backend for fast queries over billions of data points
- SaaS and self-hosted deployment options
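The RED metrics in the feature list above can be derived directly from span data. The following is a rough sketch of that calculation, not SigNoz's implementation; `SpanRecord`, `red_metrics`, and the sample numbers are assumptions, and the percentile uses the simple nearest-rank method.

```python
from dataclasses import dataclass


@dataclass
class SpanRecord:
    duration_ms: float
    is_error: bool


def red_metrics(spans, window_seconds):
    """Compute Rate, Errors, and Duration (p95) for spans seen in one time window."""
    if not spans:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "p95_ms": 0.0}
    durations = sorted(s.duration_ms for s in spans)
    # Nearest-rank percentile: ceil(0.95 * n), done in integer arithmetic.
    rank = (95 * len(durations) + 99) // 100
    return {
        "rate_per_s": len(spans) / window_seconds,
        "error_ratio": sum(1 for s in spans if s.is_error) / len(spans),
        "p95_ms": durations[rank - 1],
    }


# Hypothetical 10-second window: 20 requests taking 10..200 ms, two of them failing.
window = [SpanRecord(duration_ms=10 * i, is_error=(i in (7, 13))) for i in range(1, 21)]
print(red_metrics(window, window_seconds=10))
```

A production backend computes these aggregates inside the database over much larger windows, but the definitions are the same.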
Architecture
SigNoz runs three main components:
- Query Service — handles API requests, query processing, and aggregation
- ClickHouse — columnar database for storing and querying telemetry data
- Frontend — React-based web UI for dashboards and exploration
All components communicate via gRPC and HTTP APIs. The OpenTelemetry Collector receives data from your instrumented applications and forwards it to SigNoz.
Docker Compose Installation
SigNoz provides an official Docker Compose setup that deploys the entire stack:
Once running, the SigNoz UI is available at http://localhost:3301.
OpenTelemetry Collector Configuration
Configure the OTel Collector to send data to SigNoz:
Application Instrumentation Example (Python)
For manual instrumentation with custom spans:
Resource Requirements
| Component | Minimum | Recommended |
|---|---|---|
| CPU | 2 cores | 4+ cores |
| RAM | 4 GB | 8+ GB |
| Storage (SSD) | 20 GB | 100+ GB |
| Network | 100 Mbps | 1 Gbps |
ClickHouse performance scales with available RAM — more memory means larger caches and faster query responses. For production workloads with millions of spans per day, allocate at least 16 GB RAM and fast NVMe storage.
Strengths
- Most feature-complete open-source APM — closest direct replacement for Datadog
- Excellent ClickHouse performance for high-volume telemetry data
- Unified UI for traces, metrics, and logs reduces tool sprawl
- Active development with regular releases and a growing community
- Supports OpenTelemetry natively across all signals
Limitations
- Higher resource requirements compared to Jaeger alone
- ClickHouse storage can grow quickly under sustained high-volume ingestion
- Fewer third-party integrations compared to commercial platforms
- Alerting system is functional but less mature than Datadog monitors
Jaeger: The CNCF Distributed Tracing Standard
Jaeger is one of the oldest and most widely adopted open-source distributed tracing platforms. Originally developed by Uber and later donated to the CNCF, Jaeger graduated as a top-level CNCF project in 2019. It focuses specifically on distributed tracing — collecting, storing, and visualizing traces — without the broader APM features of SigNoz.
Key Features
- CNCF Graduated project — mature, battle-tested at enterprise scale
- Multiple storage backends — Elasticsearch, OpenSearch, Cassandra, Badger (embedded), gRPC plugin
- Jaeger UI — dedicated trace visualization with flame graphs, service dependency diagrams, and search
- Sampling strategies — probabilistic, rate-limiting, and adaptive sampling
- HotROD demo — built-in demo application for testing and learning
- Cross-service correlation — trace propagation across service boundaries
- Wide SDK support through the OpenTelemetry SDKs for Go, Java, Python, Node.js, and C++ (Jaeger's own client libraries are deprecated in favor of OpenTelemetry)
Architecture
Jaeger consists of several components that can be deployed independently:
- Jaeger Agent, a lightweight daemon that receives spans from applications via UDP (deprecated in recent releases in favor of sending OTLP directly to the Collector)
- Jaeger Collector — receives spans, validates them, and writes to storage
- Query Service — retrieves traces from storage and serves the UI
- Storage Backend — Elasticsearch, OpenSearch, Cassandra, or Badger
The modern deployment pattern uses the Jaeger All-in-One container for development and a distributed architecture for production.
Docker Compose Installation
For development and testing, Jaeger All-in-One provides everything in a single container:
The Jaeger UI is available at http://localhost:16686.
Production Deployment with Elasticsearch
For production use, deploy Jaeger with Elasticsearch as the storage backend:
Index Lifecycle Management
For long-running Jaeger deployments, configure Elasticsearch index lifecycle management (ILM) to control data retention:
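A minimal sketch of such a policy, built as a Python dictionary in the shape accepted by Elasticsearch's `PUT _ilm/policy/<name>` API. The `build_ilm_policy` helper, the 14-day retention, and the daily rollover are assumptions chosen for illustration; check the Jaeger documentation for the policy name your version expects.

```python
import json


def build_ilm_policy(retention_days: int, rollover_max_age: str = "1d") -> dict:
    """Return an ILM policy body: roll indices over while hot, delete after retention."""
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Start a new index once the current one reaches this age.
                        "rollover": {"max_age": rollover_max_age}
                    }
                },
                "delete": {
                    # Age is measured from rollover, not from index creation.
                    "min_age": f"{retention_days}d",
                    "actions": {"delete": {}},
                },
            }
        }
    }


# Hypothetical retention target of two weeks.
policy = build_ilm_policy(retention_days=14)
print(json.dumps(policy, indent=2))
```

The printed JSON can be sent as the request body when creating the policy. Jaeger also needs to be started with ILM support enabled for the policy to take effect.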
Application Instrumentation (Go)
Sampling Configuration
Sampling reduces the volume of traces stored by keeping only a percentage. Configure sampling in Jaeger:
This configuration samples 10% of all traces by default, captures 100% of payment-service traces, and limits checkout-api to 5 traces per second.
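The behavior described above can be expressed in the JSON layout of Jaeger's sampling strategies file. The sketch below builds that structure in Python and adds a small lookup helper; `strategy_for` is a hypothetical function for illustration, not part of Jaeger, and the values simply restate the percentages and rate named in the text.

```python
import json

SAMPLING_STRATEGIES = {
    # Fallback for every service without its own entry: keep 10% of traces.
    "default_strategy": {"type": "probabilistic", "param": 0.1},
    "service_strategies": [
        # Keep every payment-service trace.
        {"service": "payment-service", "type": "probabilistic", "param": 1.0},
        # Cap checkout-api at 5 traces per second.
        {"service": "checkout-api", "type": "ratelimiting", "param": 5},
    ],
}


def strategy_for(service: str, strategies: dict = SAMPLING_STRATEGIES) -> tuple:
    """Return (type, param) for a service, falling back to the default strategy."""
    for entry in strategies["service_strategies"]:
        if entry["service"] == service:
            return entry["type"], entry["param"]
    default = strategies["default_strategy"]
    return default["type"], default["param"]


print(json.dumps(SAMPLING_STRATEGIES, indent=2))
print(strategy_for("inventory-service"))  # hypothetical service with no override
```

Jaeger v1 collectors accept a file in this format through the `--sampling.strategies-file` flag; verify the option against the documentation for the version you deploy.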
Resource Requirements
| Component | Minimum | Recommended |
|---|---|---|
| CPU (per collector) | 1 core | 2+ cores |
| RAM | 2 GB | 4+ GB |
| Storage | 10 GB | 50+ GB |
| Elasticsearch nodes | 1 | 3+ |
Jaeger’s resource footprint is lighter than SigNoz because it does not include metrics or log aggregation. However, Elasticsearch is resource-intensive and dominates the infrastructure requirements.
Strengths
- CNCF Graduated status with enterprise-grade maturity
- Proven at large scale: Jaeger originated at Uber, where it was built to trace a large production microservice fleet
- Flexible storage backend choice (Elasticsearch, Cassandra, Badger)
- Advanced sampling strategies including adaptive sampling
- Large ecosystem of integrations and community support
- Focus on tracing means fewer moving parts than full-stack APM
Limitations
- Tracing only: no metrics aggregation, no log management, no alerting
- Requires separate tools (Prometheus, Grafana, Loki) for a complete observability stack
- Elasticsearch storage can be expensive at scale
- UI is focused on trace exploration rather than operational dashboards
- No built-in service topology visualization in older versions
Uptrace: Lightweight OpenTelemetry Backend
Uptrace is a newer entrant in the open-source observability space. Written in Go and built on ClickHouse, Uptrace provides distributed tracing, metrics, and logging with a focus on simplicity and performance. Unlike SigNoz, which aims to replace Datadog feature-for-feature, Uptrace takes a lighter approach — fast ingestion, efficient storage, and a clean interface without enterprise bloat.
Uptrace offers both an open-source self-hosted version and a commercial SaaS. The self-hosted version includes the core observability features with no artificial limitations on data volume or retention.
Key Features
- OpenTelemetry-native — accepts OTLP natively via gRPC and HTTP
- ClickHouse storage — fast ingestion and queries with excellent compression
- Unified interface for traces, metrics, and logs
- Service topology with automatic dependency detection
- SQL-like query language for custom analysis
- Dashboard support with customizable widgets
- Alerting with notification channels
- Lightweight — significantly lower resource requirements than SigNoz
- Multi-tenancy support for shared deployments
Architecture
Uptrace uses a simpler architecture than SigNoz:
- Uptrace Server — single Go binary handling ingestion, querying, and the web UI
- ClickHouse — storage backend for all telemetry data
- OpenTelemetry Collector (optional) — receives data from applications and forwards to Uptrace
The single-binary design means fewer containers to manage, simpler networking, and easier upgrades.
Docker Compose Installation
Create the Uptrace configuration file:
Start the stack:
The Uptrace UI is available at http://localhost:8080. The default credentials are admin@example.com / uptrace.
Direct Instrumentation Without OTel Collector
One advantage of Uptrace is that applications can send data directly to the Uptrace server without running a separate OTel Collector:
ClickHouse Optimization for Production
For high-throughput deployments, tune ClickHouse for observability workloads:
These settings increase query concurrency, limit memory per query to 12 GB, and configure MergeTree to handle high insert rates gracefully.
Resource Requirements
| Component | Minimum | Recommended |
|---|---|---|
| CPU | 1 core | 2+ cores |
| RAM | 2 GB | 4+ GB |
| Storage (SSD) | 10 GB | 50+ GB |
| ClickHouse RAM | 2 GB | 8+ GB |
Uptrace is the lightest option of the three, requiring fewer containers and less memory overhead. The single-binary server design reduces operational complexity.
Strengths
- Lowest resource requirements — ideal for small teams and edge deployments
- Direct OTLP ingestion without requiring a separate OTel Collector
- Simple single-binary server reduces operational overhead
- Good ClickHouse compression reduces storage costs
- Clean, intuitive interface with fast query performance
- Multi-tenancy for shared team deployments
Limitations
- Smaller community and fewer third-party integrations
- Less mature than SigNoz and Jaeger — fewer enterprise features
- Alerting system is basic compared to commercial platforms
- Documentation is thinner, especially for advanced configurations
- Fewer deployment examples and community tutorials
Comparison: SigNoz vs Jaeger vs Uptrace
| Feature | SigNoz | Jaeger | Uptrace |
|---|---|---|---|
| Primary focus | Full-stack APM | Distributed tracing | Lightweight APM |
| Traces | Yes (OTLP) | Yes (OTLP, Jaeger) | Yes (OTLP) |
| Metrics | Yes | No | Yes |
| Logs | Yes | No | Yes |
| Storage | ClickHouse | ES, OpenSearch, Cassandra, Badger | ClickHouse |
| Alerting | Built-in | No | Built-in |
| Service map | Yes | Partial | Yes |
| Dashboards | Yes | No | Yes |
| Sampling | Head-based | Probabilistic, rate-limiting, adaptive | Head-based |
| OpenTelemetry | Native | Compatible | Native |
| Multi-tenancy | Limited | No | Yes |
| SaaS option | Yes | No | Yes |
| Resource usage | High | Medium | Low |
| Maturity | Growing | CNCF Graduated | Emerging |
| Best for | Teams replacing Datadog | Teams needing tracing only | Small teams, edge deployments |
When to Choose Each Platform
Choose SigNoz if you want a complete APM replacement for Datadog or New Relic. It provides traces, metrics, logs, alerting, and dashboards in a single product. The ClickHouse backend handles high-volume ingestion well, and the feature set covers most enterprise observability requirements. It is the best choice for teams that want one tool instead of assembling a custom stack from multiple components.
Choose Jaeger if your primary need is distributed tracing and you already have metrics and log solutions in place. If you run Prometheus for metrics and Grafana for dashboards, Jaeger fills the tracing gap perfectly. Its CNCF Graduated status means it is stable, well-tested, and supported by a large community. Use Jaeger when tracing is the missing piece of an existing observability puzzle.
Choose Uptrace if you have limited infrastructure resources or operate on the edge. Its single-binary design, low memory footprint, and simple deployment make it ideal for small teams, homelabs, and resource-constrained environments. When you need basic APM capabilities without the overhead of SigNoz or the complexity of running Jaeger plus complementary tools, Uptrace delivers.
Complete Deployment: Full Observability Stack
For a production-grade observability platform, combine your chosen APM with complementary tools. Here is a reference architecture using SigNoz as the central APM:
This stack gives you:
- SigNoz for APM, traces, and log aggregation
- Prometheus for infrastructure and application metrics
- Grafana for custom dashboards combining data from all sources
- OpenTelemetry Collector as a universal ingestion layer
Cost Comparison: Self-Hosted vs Commercial APM
For a 20-service architecture generating approximately 50 million spans per day:
| Cost Factor | Datadog APM | New Relic | SigNoz (self-hosted) | Jaeger (self-hosted) | Uptrace (self-hosted) |
|---|---|---|---|---|---|
| Monthly cost | $4,000–$8,000 | $2,500–$5,000 | Infrastructure only | Infrastructure only | Infrastructure only |
| Per-host fee | $31–$58 | $0–$25 | None | None | None |
| Data retention | 15 days (standard) | 30 days (standard) | Unlimited | Configurable | Unlimited |
| Additional hosts | +$31–$58/mo each | +$0–$25/mo each | Free | Free | Free |
| Alerting | Included | Included | Included | Not included | Included |
| Support cost | Included | Included | Community/Enterprise | Community | Community/SaaS |
For a workload of this size, the self-hosted option typically costs $200–$600/month in infrastructure (servers, storage, networking), compared to thousands for commercial equivalents. There are no per-host or per-service fees, although infrastructure cost still grows with data volume and retention.
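The ranges in the table can be combined into a rough savings estimate. This is only arithmetic over the figures quoted above, with a hypothetical host count; the helper names are made up, and real bills depend on contract terms and usage.

```python
def per_host_cost_range(hosts: int, fee_low: float, fee_high: float) -> tuple:
    """Monthly per-host charges for a given host count and fee range."""
    return hosts * fee_low, hosts * fee_high


def monthly_savings(commercial: tuple, self_hosted: tuple) -> tuple:
    """Return (pessimistic, optimistic) monthly savings from two (low, high) cost ranges."""
    pessimistic = commercial[0] - self_hosted[1]  # cheapest vendor bill vs. priciest infra
    optimistic = commercial[1] - self_hosted[0]   # priciest vendor bill vs. cheapest infra
    return pessimistic, optimistic


SELF_HOSTED = (200, 600)     # infrastructure range quoted in this section
DATADOG = (4000, 8000)       # monthly range from the table
NEW_RELIC = (2500, 5000)     # monthly range from the table

print("vs Datadog:", monthly_savings(DATADOG, SELF_HOSTED))
print("vs New Relic:", monthly_savings(NEW_RELIC, SELF_HOSTED))
# Hypothetical fleet of 40 hosts at the $31-$58 per-host fee from the table.
print("per-host fees:", per_host_cost_range(40, 31, 58))
```

The estimate leaves out engineering time spent operating the self-hosted stack, which should be added before making a decision.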
Migration Tips
If you are moving from a commercial APM to a self-hosted solution:
Run both systems in parallel for at least two weeks. Instrument your applications to send data to both the commercial platform and your new self-hosted instance simultaneously. This gives you time to validate that traces are complete and dashboards are accurate.
Start with OpenTelemetry in your applications. Even if you keep your current APM temporarily, using OTel as your instrumentation layer means you can switch backends by changing configuration, not code.
Migrate dashboards incrementally. Recreate your most-used dashboards first — the ones your team checks daily. Less frequently used dashboards can be rebuilt on demand.
Test alerting thoroughly. Create test alerts that fire on known conditions and verify they reach your notification channels. Self-hosted alerting may behave differently from commercial platforms.
Plan storage capacity. Monitor ClickHouse or Elasticsearch disk growth during the parallel run. Size your storage to handle your expected retention period with a 30% buffer.
Document your new procedures. Commercial APM platforms often have runbooks, SOPs, and incident response procedures. Update these documents to reflect your self-hosted stack.
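The storage-planning tip above reduces to simple arithmetic: measure daily disk growth during the parallel run, multiply by the retention period, and add the 30% buffer. A small sketch, with hypothetical measurements and helper names:

```python
def daily_growth_gb(start_gb: float, end_gb: float, days: int) -> float:
    """Average disk growth per day observed over the parallel run."""
    if days <= 0:
        raise ValueError("days must be positive")
    return (end_gb - start_gb) / days


def required_storage_gb(growth_gb_per_day: float, retention_days: int,
                        buffer: float = 0.30) -> float:
    """Storage needed to hold the retention window plus a safety buffer."""
    return growth_gb_per_day * retention_days * (1 + buffer)


# Hypothetical numbers: disk usage went from 40 GB to 208 GB over a 14-day parallel run.
growth = daily_growth_gb(start_gb=40, end_gb=208, days=14)
needed = required_storage_gb(growth, retention_days=90)
print(f"{growth:.1f} GB/day, about {needed:.0f} GB for 90 days of retention")
```

Growth measured over two weeks may miss seasonal peaks, so it is worth recalculating after the first month in production.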
Conclusion
Self-hosted APM and distributed tracing platforms have matured significantly. SigNoz leads the pack for teams wanting a complete Datadog replacement with traces, metrics, logs, and alerting in one product. Jaeger remains the gold standard for dedicated distributed tracing, especially when combined with existing metrics and logging infrastructure. Uptrace is the lightweight alternative for teams with limited resources who still need meaningful observability.
All three implement OpenTelemetry, so the instrumentation effort is the same regardless of which backend you choose. Start instrumenting with OTel today, deploy a self-hosted backend, and take back control of your application performance data.
Frequently Asked Questions (FAQ)
Which one should I choose in 2026?
The best choice depends on your specific requirements:
- For beginners: Start with the simplest option that covers your core use case
- For production: Choose the solution with the most active community and documentation
- For teams: Look for collaboration features and user management
- For privacy: Prefer fully open-source, self-hosted options with no telemetry
Refer to the comparison table above for detailed feature breakdowns.
Can I migrate between these tools?
Most tools support data import/export. Always:
- Back up your current data
- Test the migration on a staging environment
- Check official migration guides in the documentation
Are there free versions available?
All tools in this guide offer free, open-source editions. Some also provide paid plans with additional features, priority support, or managed hosting.
How do I get started?
- Review the comparison table to identify your requirements
- Visit each project's official documentation
- Start with a Docker Compose setup for easy testing
- Join the community forums for troubleshooting