← Back to posts
comparison guide self-hosted · · 10 min read

Elementary vs Soda vs Great Expectations: Self-Hosted Data Observability Guide 2026

Compare Elementary, Soda Core, and Great Expectations for self-hosted data observability. Learn which open-source tool best detects pipeline failures, schema changes, and data anomalies with hands-on configuration guides.

OS
Editorial Team

When a data pipeline silently starts producing stale records, or a upstream schema change breaks downstream dashboards, the cost compounds with every minute the problem goes undetected. Data observability — the practice of monitoring data health across freshness, volume, schema, distribution, and lineage — has become essential for any team running self-hosted analytics infrastructure.

This guide compares three open-source data observability tools you can deploy on your own infrastructure: Elementary, Soda Core, and Great Expectations. Each takes a different approach to the same goal: knowing when your data is wrong before your users do.

If you are already running self-hosted data orchestration with Airflow, Prefect, or Dagster, adding an observability layer on top gives you early warning when orchestrated jobs produce bad output. For teams also managing data quality tests with Great Expectations, Soda, or dbt, the observability tools in this guide complement (rather than replace) those testing frameworks. And if you run a self-hosted data catalog like Amundsen, DataHub, or OpenMetadata, observability alerts feed directly into your catalog’s health dashboards.

Why Data Observability Matters for Self-Hosted Infrastructure

Self-hosted data stacks lack the managed monitoring that cloud platforms provide. When you own every layer — databases, orchestration, transformation, and BI — you also own the responsibility of detecting when something goes wrong. Common failure modes include:

  • Freshness failures — a scheduled ETL job silently stops running or takes hours longer than expected
  • Schema drift — an upstream API changes column names or types without notice
  • Volume anomalies — a source system suddenly sends 10x or 0.01x the normal row count
  • Distribution shifts — the statistical profile of a column changes (e.g., average order value drops 80%)
  • Lineage breaks — a transformation pipeline dependency changes, invalidating downstream models

Without observability, these issues surface only when a dashboard shows wrong numbers or a downstream analyst files a ticket. Observability tools catch them at the data layer, automatically.

Tool Overview

FeatureElementarySoda CoreGreat Expectations
GitHub Stars2,3132,33811,436
Last UpdatedApr 2026Apr 2026Apr 2026
Primary LanguagePython (HTML frontend)PythonPython
LicenseApache-2.0Proprietary (core is Apache-2.0)Apache-2.0
dbt IntegrationNative (first-class)Via soda-dbt pluginVia ge-great-expectations CLI
Self-Hosted UIYes (built-in web app)CLI only (no native UI)Data Docs (static HTML)
Anomaly DetectionBuilt-in statistical testsVia checks on historical profilesVia Expectation Suites
Schema MonitoringAutomatic detectionManual schema checksManual schema expectations
AlertingSlack, email, webhookCLI output, CI/CD integrationEmail, Slack, custom callbacks
Data Lineagedbt-native lineage graphNo built-in lineageLimited via Data Docs
Database SupportPostgreSQL, BigQuery, Snowflake, Redshift, DuckDBPostgreSQL, Snowflake, BigQuery, Spark, DuckDB20+ databases via Dialects

Elementary: dbt-Native Data Observability

Elementary is designed specifically for teams already using dbt. It monitors dbt models for freshness, volume, schema changes, and test failures, then presents results in a self-hosted web dashboard.

Architecture

Elementary works in two modes:

  1. CLI mode — runs as a dbt post-hook, collecting metadata after each dbt run
  2. Full observability — deploys a separate monitoring service that polls your data warehouse on a schedule

The self-hosted version includes a PostgreSQL-backed web UI where you can see model health, test results, and anomaly alerts.

Installation via Docker Compose

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
version: "3.8"

services:
  elementary-db:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: elementary
      POSTGRES_USER: elementary
      POSTGRES_PASSWORD: elementary_secret
    volumes:
      - elementary_data:/var/lib/postgresql/data
    ports:
      - "5433:5432"

  elementary-app:
    image: elementarydata/elementary:latest
    environment:
      ELEMENTARY_DATABASE_TYPE: postgres
      ELEMENTARY_DATABASE_HOST: elementary-db
      ELEMENTARY_DATABASE_PORT: 5432
      ELEMENTARY_DATABASE_NAME: elementary
      ELEMENTARY_DATABASE_USER: elementary
      ELEMENTARY_DATABASE_PASSWORD: elementary_secret
    ports:
      - "3000:3000"
    depends_on:
      - elementary-db

volumes:
  elementary_data:

Start the service:

1
docker compose up -d

Configuration in dbt

Add Elementary as a dbt package in your packages.yml:

1
2
3
packages:
  - package: elementary-data/elementary
    version: "0.15.0"

Run dbt deps to install, then configure monitoring in your dbt_project.yml:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
elementary:
  +enabled: true
  monitors:
    freshness:
      enabled: true
      anomaly_sensitivity: 2
    schema_changes:
      enabled: true
    volume_anomalies:
      enabled: true
      anomaly_sensitivity: 3

Run your normal dbt run — Elementary automatically collects metadata via post-hooks. Then generate a report:

1
2
edr monitor --target prod
edr report --output-path ./report

Strengths

  • Zero-configuration monitoring — if you use dbt, Elementary detects anomalies without writing any test definitions
  • Built-in web UI — the self-hosted dashboard shows model health, test results, and lineage in one place
  • dbt lineage integration — automatically maps which upstream models are causing downstream failures

Soda Core: Data Contracts and Reliability

Soda Core is a data reliability engine that lets you define “data contracts” — explicit expectations about what your data should look like — and check them against actual data. It integrates with CI/CD pipelines and orchestration tools.

Installation

Soda Core is a Python package:

1
pip install soda-core-postgres soda-core-spark-df

Or use it in a Docker container:

1
2
3
4
5
docker run --rm \
  -v $(pwd)/checks.yml:/checks/tables/checks.yml \
  -v $(pwd)/configuration.yml:/checks/configuration.yml \
  sodadata/soda-core:latest \
  scan -d my_datasource -c configuration.yml checks/

Configuration

Create a configuration.yml for your data source:

1
2
3
4
5
6
7
8
data_source my_postgres:
  type: postgres
  connection:
    host: localhost
    port: "5432"
    username: soda_user
    password: soda_secret
    database: analytics_db

Define checks in checks.yml:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
checks for orders:
  - freshness:
      column: created_at
      fail: when > 24h
      warn: when > 12h
  - row_count:
      fail: when < 100
      warn: when < 500
  - schema:
      fail:
        when wrong column names:
          - id
          - customer_id
          - amount
          - status
          - created_at
  - missing_count(customer_id):
      fail: when > 0
  - duplicate_count(id):
      fail: when > 0

Run a scan:

1
soda scan -d my_postgres -c configuration.yml checks.yml

Soda outputs results to stdout (for CI/CD) or can push to Soda Cloud for dashboards.

Strengths

  • Declarative check definitions — YAML-based checks are readable and version-controllable
  • CI/CD integration — scans fail pipelines when data contracts are violated
  • Wide database support — works with PostgreSQL, Snowflake, BigQuery, Spark, and DuckDB out of the box

Great Expectations: Statistical Data Profiling

Great Expectations is the most mature open-source data validation framework. While often used for data quality testing, its profiling and monitoring capabilities make it a powerful observability tool.

Installation

1
pip install great_expectations[postgresql]

Initialize a project:

1
great_expectations init

Configuration

Create an Expectation Suite that defines what “good” data looks like:

1
great_expectations suite new --suite_name orders_suite

Define expectations programmatically:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
import great_expectations as gx

context = gx.get_context()
validator = context.get_validator(
    datasource_name="my_postgres",
    data_connector_name="default_inferred_data_connector",
    data_asset_name="orders",
    expectation_suite_name="orders_suite"
)

# Freshness check
validator.expect_column_values_to_be_between(
    column="created_at",
    min_value="2026-04-20",
    mostly=0.95
)

# Volume check
validator.expect_table_row_count_to_be_between(
    min_value=100,
    max_value=1000000
)

# Schema check
validator.expect_table_columns_to_match_ordered_list(
    column_list=["id", "customer_id", "amount", "status", "created_at"]
)

validator.save_expectation_suite()

Run validation and build Data Docs:

1
2
great_expectations checkpoint run orders_checkpoint
great_expectations docs build

Data Docs generates a static HTML site you can host on any web server or S3 bucket, showing pass/fail status for every expectation.

Strengths

  • Most comprehensive expectation library — 100+ built-in expectations covering statistical, structural, and distributional checks
  • Data Docs — beautiful, self-hostable HTML reports that non-technical stakeholders can read
  • Large ecosystem — integrations with Airflow, dbt, Spark, Pandas, and most major orchestration tools

Head-to-Head: Which Tool Should You Choose?

CriterionElementarySoda CoreGreat Expectations
Best Fordbt-centric teamsCI/CD-integrated data contractsComprehensive validation
Setup TimeMinutes (if using dbt)MinutesHours (steep learning curve)
Anomaly DetectionAutomatic (no config)Manual threshold definitionManual expectation definition
UI/DashboardBuilt-in web appNone (CLI only)Static HTML (Data Docs)
AlertingSlack, email, webhookCI/CD exit codesCustom callbacks
Learning CurveLowLow-MediumHigh
Active DevelopmentVery activeActiveVery active (largest community)

Decision Matrix

Choose Elementary if:

  • You already use dbt as your transformation layer
  • You want automatic anomaly detection without writing test definitions
  • You need a built-in web dashboard for monitoring

Choose Soda Core if:

  • You want to enforce data contracts in CI/CD pipelines
  • You prefer declarative YAML check definitions
  • You need fast setup with minimal infrastructure

Choose Great Expectations if:

  • You need the broadest range of data validation capabilities
  • You want beautiful, stakeholder-facing Data Docs
  • You have complex validation requirements beyond freshness and schema checks

Deployment Patterns for Self-Hosted Stacks

A typical self-hosted observability deployment combines these tools with existing infrastructure:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
┌─────────────────────────────────────────────────┐
│              Data Observability                  │
├──────────────┬──────────────┬───────────────────┤
│  Elementary   │  Soda Core   │   Great Exp.      │
│  (dbt hooks)  │  (CI/CD)     │   (Checkpoints)   │
└──────┬───────┴──────┬───────┴────────┬──────────┘
       │              │                 │
       ▼              ▼                 ▼
┌──────────────────────────────────────────────┐
│          Data Warehouse / Lake               │
│     PostgreSQL / DuckDB / BigQuery           │
└──────────────────────────────────────────────┘
┌──────┴───────┐  ┌──────────────┐
│  Orchestration│  │  dbt Transform│
│  (Airflow)    │  │  (dbt Core)   │
└──────────────┘  └──────────────┘

Scheduled Monitoring with Cron

For tools without built-in schedulers, use cron to run periodic checks:

1
2
3
4
5
# Run Soda scan every 15 minutes
*/15 * * * * cd /opt/data-observability && soda scan -d my_postgres -c configuration.yml checks.yml >> /var/log/soda-scan.log 2>&1

# Run Great Expectations checkpoint hourly
0 * * * * great_expectations checkpoint run orders_checkpoint --no_usage_stats

Alerting via ntfy

Pipe scan results to a self-hosted notification service:

1
2
3
4
5
6
# Soda scan with ntfy alert
soda scan -d my_postgres -c configuration.yml checks.yml 2>&1 | \
  grep -q "FAIL" && \
  curl -H "Title: Data Quality Alert" \
       -d "Soda scan detected FAIL conditions in orders table" \
       http://localhost:8080/data-observability

For more on self-hosted push notifications, see our Gotify vs ntfy guide.

FAQ

What is data observability and how does it differ from data quality testing?

Data observability is the practice of continuously monitoring data pipelines for health signals like freshness, volume, schema changes, and distribution anomalies. Data quality testing validates that data meets specific business rules at a point in time. Observability is proactive and continuous; testing is reactive and scheduled. Think of observability as your dashboard warning lights, and testing as your pre-flight checklist.

Can I run these tools without dbt?

Elementary requires dbt for its core monitoring features (it uses dbt post-hooks to collect metadata). Soda Core and Great Expectations work independently of dbt — they connect directly to your data warehouse or lake and run checks on any table or query.

Do these tools replace data quality testing?

No. Observability tools complement testing. Elementary and Soda Core focus on detecting when something has changed or gone wrong (anomalies, freshness failures). Great Expectations does both — it can run anomaly detection AND enforce business rule validations. For comprehensive coverage, use observability for automated monitoring and testing frameworks for business-rule validation.

Which tool is easiest to set up for a small team?

Soda Core has the lowest barrier to entry: install the pip package, write a configuration.yml and checks.yml, and run soda scan. You can go from zero to monitoring in under 30 minutes. Elementary is equally fast if you already have dbt running — just add the package and configure monitors. Great Expectations requires more setup (project initialization, suite creation, checkpoint configuration) but offers the most flexibility once configured.

Can these tools monitor streaming data?

Great Expectations supports streaming validation via Spark Structured Streaming and Pandas on micro-batches. Soda Core can run on Spark DataFrames for near-real-time checks. Elementary is primarily designed for batch data via dbt runs and does not natively support streaming pipelines.

How do I host the monitoring dashboards?

Elementary includes a built-in web application (deployed via Docker Compose) that serves its dashboard on port 3000. Great Expectations generates static HTML Data Docs that you can host on any web server, S3 bucket, or internal wiki. Soda Core does not include a UI — results are output to the console or integrated into CI/CD pipeline logs.

Are these tools production-ready for large data warehouses?

Yes. All three tools are used in production at companies processing billions of rows. Great Expectations has the longest track record and largest community. Elementary is optimized for dbt workflows and scales with your dbt project. Soda Core is designed for CI/CD integration and can run checks on tables of any size (though scan time increases with data volume).

Advertise here
Advertise here