When a data pipeline silently starts producing stale records, or a upstream schema change breaks downstream dashboards, the cost compounds with every minute the problem goes undetected. Data observability — the practice of monitoring data health across freshness, volume, schema, distribution, and lineage — has become essential for any team running self-hosted analytics infrastructure.
This guide compares three open-source data observability tools you can deploy on your own infrastructure: Elementary, Soda Core, and Great Expectations. Each takes a different approach to the same goal: knowing when your data is wrong before your users do.
If you are already running self-hosted data orchestration with Airflow, Prefect, or Dagster, adding an observability layer on top gives you early warning when orchestrated jobs produce bad output. For teams also managing data quality tests with Great Expectations, Soda, or dbt, the observability tools in this guide complement (rather than replace) those testing frameworks. And if you run a self-hosted data catalog like Amundsen, DataHub, or OpenMetadata, observability alerts feed directly into your catalog’s health dashboards.
Why Data Observability Matters for Self-Hosted Infrastructure
Self-hosted data stacks lack the managed monitoring that cloud platforms provide. When you own every layer — databases, orchestration, transformation, and BI — you also own the responsibility of detecting when something goes wrong. Common failure modes include:
- Freshness failures — a scheduled ETL job silently stops running or takes hours longer than expected
- Schema drift — an upstream API changes column names or types without notice
- Volume anomalies — a source system suddenly sends 10x or 0.01x the normal row count
- Distribution shifts — the statistical profile of a column changes (e.g., average order value drops 80%)
- Lineage breaks — a transformation pipeline dependency changes, invalidating downstream models
Without observability, these issues surface only when a dashboard shows wrong numbers or a downstream analyst files a ticket. Observability tools catch them at the data layer, automatically.
Tool Overview
| Feature | Elementary | Soda Core | Great Expectations |
|---|---|---|---|
| GitHub Stars | 2,313 | 2,338 | 11,436 |
| Last Updated | Apr 2026 | Apr 2026 | Apr 2026 |
| Primary Language | Python (HTML frontend) | Python | Python |
| License | Apache-2.0 | Proprietary (core is Apache-2.0) | Apache-2.0 |
| dbt Integration | Native (first-class) | Via soda-dbt plugin | Via ge-great-expectations CLI |
| Self-Hosted UI | Yes (built-in web app) | CLI only (no native UI) | Data Docs (static HTML) |
| Anomaly Detection | Built-in statistical tests | Via checks on historical profiles | Via Expectation Suites |
| Schema Monitoring | Automatic detection | Manual schema checks | Manual schema expectations |
| Alerting | Slack, email, webhook | CLI output, CI/CD integration | Email, Slack, custom callbacks |
| Data Lineage | dbt-native lineage graph | No built-in lineage | Limited via Data Docs |
| Database Support | PostgreSQL, BigQuery, Snowflake, Redshift, DuckDB | PostgreSQL, Snowflake, BigQuery, Spark, DuckDB | 20+ databases via Dialects |
Elementary: dbt-Native Data Observability
Elementary is designed specifically for teams already using dbt. It monitors dbt models for freshness, volume, schema changes, and test failures, then presents results in a self-hosted web dashboard.
Architecture
Elementary works in two modes:
- CLI mode — runs as a dbt post-hook, collecting metadata after each dbt run
- Full observability — deploys a separate monitoring service that polls your data warehouse on a schedule
The self-hosted version includes a PostgreSQL-backed web UI where you can see model health, test results, and anomaly alerts.
Installation via Docker Compose
| |
Start the service:
| |
Configuration in dbt
Add Elementary as a dbt package in your packages.yml:
| |
Run dbt deps to install, then configure monitoring in your dbt_project.yml:
| |
Run your normal dbt run — Elementary automatically collects metadata via post-hooks. Then generate a report:
| |
Strengths
- Zero-configuration monitoring — if you use dbt, Elementary detects anomalies without writing any test definitions
- Built-in web UI — the self-hosted dashboard shows model health, test results, and lineage in one place
- dbt lineage integration — automatically maps which upstream models are causing downstream failures
Soda Core: Data Contracts and Reliability
Soda Core is a data reliability engine that lets you define “data contracts” — explicit expectations about what your data should look like — and check them against actual data. It integrates with CI/CD pipelines and orchestration tools.
Installation
Soda Core is a Python package:
| |
Or use it in a Docker container:
| |
Configuration
Create a configuration.yml for your data source:
| |
Define checks in checks.yml:
| |
Run a scan:
| |
Soda outputs results to stdout (for CI/CD) or can push to Soda Cloud for dashboards.
Strengths
- Declarative check definitions — YAML-based checks are readable and version-controllable
- CI/CD integration — scans fail pipelines when data contracts are violated
- Wide database support — works with PostgreSQL, Snowflake, BigQuery, Spark, and DuckDB out of the box
Great Expectations: Statistical Data Profiling
Great Expectations is the most mature open-source data validation framework. While often used for data quality testing, its profiling and monitoring capabilities make it a powerful observability tool.
Installation
| |
Initialize a project:
| |
Configuration
Create an Expectation Suite that defines what “good” data looks like:
| |
Define expectations programmatically:
| |
Run validation and build Data Docs:
| |
Data Docs generates a static HTML site you can host on any web server or S3 bucket, showing pass/fail status for every expectation.
Strengths
- Most comprehensive expectation library — 100+ built-in expectations covering statistical, structural, and distributional checks
- Data Docs — beautiful, self-hostable HTML reports that non-technical stakeholders can read
- Large ecosystem — integrations with Airflow, dbt, Spark, Pandas, and most major orchestration tools
Head-to-Head: Which Tool Should You Choose?
| Criterion | Elementary | Soda Core | Great Expectations |
|---|---|---|---|
| Best For | dbt-centric teams | CI/CD-integrated data contracts | Comprehensive validation |
| Setup Time | Minutes (if using dbt) | Minutes | Hours (steep learning curve) |
| Anomaly Detection | Automatic (no config) | Manual threshold definition | Manual expectation definition |
| UI/Dashboard | Built-in web app | None (CLI only) | Static HTML (Data Docs) |
| Alerting | Slack, email, webhook | CI/CD exit codes | Custom callbacks |
| Learning Curve | Low | Low-Medium | High |
| Active Development | Very active | Active | Very active (largest community) |
Decision Matrix
Choose Elementary if:
- You already use dbt as your transformation layer
- You want automatic anomaly detection without writing test definitions
- You need a built-in web dashboard for monitoring
Choose Soda Core if:
- You want to enforce data contracts in CI/CD pipelines
- You prefer declarative YAML check definitions
- You need fast setup with minimal infrastructure
Choose Great Expectations if:
- You need the broadest range of data validation capabilities
- You want beautiful, stakeholder-facing Data Docs
- You have complex validation requirements beyond freshness and schema checks
Deployment Patterns for Self-Hosted Stacks
A typical self-hosted observability deployment combines these tools with existing infrastructure:
| |
Scheduled Monitoring with Cron
For tools without built-in schedulers, use cron to run periodic checks:
| |
Alerting via ntfy
Pipe scan results to a self-hosted notification service:
| |
For more on self-hosted push notifications, see our Gotify vs ntfy guide.
FAQ
What is data observability and how does it differ from data quality testing?
Data observability is the practice of continuously monitoring data pipelines for health signals like freshness, volume, schema changes, and distribution anomalies. Data quality testing validates that data meets specific business rules at a point in time. Observability is proactive and continuous; testing is reactive and scheduled. Think of observability as your dashboard warning lights, and testing as your pre-flight checklist.
Can I run these tools without dbt?
Elementary requires dbt for its core monitoring features (it uses dbt post-hooks to collect metadata). Soda Core and Great Expectations work independently of dbt — they connect directly to your data warehouse or lake and run checks on any table or query.
Do these tools replace data quality testing?
No. Observability tools complement testing. Elementary and Soda Core focus on detecting when something has changed or gone wrong (anomalies, freshness failures). Great Expectations does both — it can run anomaly detection AND enforce business rule validations. For comprehensive coverage, use observability for automated monitoring and testing frameworks for business-rule validation.
Which tool is easiest to set up for a small team?
Soda Core has the lowest barrier to entry: install the pip package, write a configuration.yml and checks.yml, and run soda scan. You can go from zero to monitoring in under 30 minutes. Elementary is equally fast if you already have dbt running — just add the package and configure monitors. Great Expectations requires more setup (project initialization, suite creation, checkpoint configuration) but offers the most flexibility once configured.
Can these tools monitor streaming data?
Great Expectations supports streaming validation via Spark Structured Streaming and Pandas on micro-batches. Soda Core can run on Spark DataFrames for near-real-time checks. Elementary is primarily designed for batch data via dbt runs and does not natively support streaming pipelines.
How do I host the monitoring dashboards?
Elementary includes a built-in web application (deployed via Docker Compose) that serves its dashboard on port 3000. Great Expectations generates static HTML Data Docs that you can host on any web server, S3 bucket, or internal wiki. Soda Core does not include a UI — results are output to the console or integrated into CI/CD pipeline logs.
Are these tools production-ready for large data warehouses?
Yes. All three tools are used in production at companies processing billions of rows. Great Expectations has the longest track record and largest community. Elementary is optimized for dbt workflows and scales with your dbt project. Soda Core is designed for CI/CD integration and can run checks on tables of any size (though scan time increases with data volume).