← Back to posts
comparison guide self-hosted · · 11 min read

ClickHouse vs Apache Druid vs Apache Pinot: Best Self-Hosted Analytics Database 2026

A detailed comparison of ClickHouse, Apache Druid, and Apache Pinot as self-hosted real-time analytics database alternatives to commercial solutions like Snowflake, BigQuery, and Elasticsearch.

OS
Editorial Team

Self-hosting your analytics infrastructure gives you full control over your data, eliminates vendor lock-in, and dramatically reduces costs at scale. When it comes to running real-time analytical queries on large datasets, three open-source databases stand out: ClickHouse, Apache Druid, and Apache Pinot.

Each of these systems is designed for fast OLAP (Online Analytical Processing) workloads, but they differ significantly in architecture, use cases, and operational complexity. This guide breaks down thdockererences, provides Docker-based setup instructions, and helps you choose the right tool for your stack.

Why Self-Host Your Analytics Database

Commercial analytics platforms like Snowflake, Google BigQuery, and Amazon Redshift are powerful but come with significant trade-offs:

  • Escalating costs — Query-based pricing means costs grow unpredictably with usage. A team running hundreds of dashboards can easily spend $5,000–$50,000+ per month.
  • Data sovereignty — Sending user data to third-party clouds raises compliance issues under GDPR, HIPAA, and SOC 2.
  • Vendor lock-in — Migrating petabytes of data out of a cloud warehouse is expensive and time-consuming.
  • Latency constraints — Real-time ingestion and sub-second queries are difficult to achieve with cloud-only architectures that route through multiple network hops.

Self-hosting solves these problems. You own the hardware (or rent it predictably), data never leaves your infrastructure, and query performance depends on your configuration — not someone else’s shared resources.

Open-source OLAP databases like ClickHouse, Druid, and Pinot are proven at massive scale. Cloudflare uses ClickHouse to process trillions of requests. Uber and Netflix run Druid for real-time analytics. LinkedIn relies on Pinot for user-facing analytics with strict SLAs.

Architecture at a Glance

Understanding how each system stores and processes data is key to picking the right one.

ClickHouse

ClickHouse is a column-oriented database management system originally developed by Yandex. Its core design principles are:

  • Columnar storage with heavy compression (data is often 5–10x smaller than raw)
  • Vectorized query execution — processes data in batches using SIMD CPU instructions
  • MergeTree family of table engines for efficient inserts and background merges
  • Single binary deployment — simple to run, no external dependencies
  • SQL-native — uses a SQL dialect very close to standard SQL

ClickHouse shines when you need fast aggregation queries on large datasets with relatively simple ingestion patterns.

Apache Druid

Apache Druid is a distributed, column-oriented data store originally built at Metamarkets. Its architecture is built around:

  • Immutable data segments stored in deep storage (S3, HDFS) with a local cache
  • Real-time ingestion via Kafka streams or batch ingestion from files
  • Broker/Historical/MiddleManager/Coordinator node separation for horizontal scaling
  • Approximate algorithms (theta sketches, HLL) for fast cardinality estimation
  • Time-first indexing — optimized for time-series and event data

Druid excels at real-time dashboards where data arrives continuously and you need sub-second responses on time-based queries.

Apache Pinot

Apache Pinot was built at LinkedIn specifically for low-latency analytics on user-facing applications. Its architecture features:

  • Immutable segments with real-time and offline serving paths
  • Multiple index types — inverted, star-tree, sorted, range, JSON, and geospatial
  • Controller/Server/Broker separation with ZooKeeper for coordination
  • Upsert support — handles late-arriving and corrected data better than Druid
  • Built-in ingestion from Kafka, Hadoop, and local files with automatic schema evolution

Pinot is the best choice when you need sub-100ms queries on user-facing dashboards with complex filtering requirements.

Feature Comparison

FeatureClickHouseApache DruidApache Pinot
Primary Use CaseGeneral analytics, log analysisReal-time dashboards, event streamingUser-facing analytics, low-latency serving
Query LanguageSQL (extended)SQL (limited)SQL (Pinot SQL)
Real-Time IngestionYes (via Kafka engine, materialized views)Yes (native, first-class)Yes (native, with upsert support)
LatencySub-second to secondsSub-secondSub-100ms (p99)
Storage BackendLocal disk (distributed via ReplicatedMergeTree)Deep storage (S3/HDFS) + local cacheLocal disk + deep storage (optional)
Horizontal ScalingYes (sharding + replication)Yes (segment-based)Yes (server-based)
Approximate QueriesYes (GROUP BY with sampling)Yes (sketches, quantiles)Yes (DISTINCTCOUNT, percentile approximations)
JoinsFull SQL JOINsLimited (broadcast joins only)Limited (dimension tables only)
Upsert/MutationYes (lightweight deletes, mutations)No (immutable segments)Yes (native upsert)
ComplexityLow (single binary)High (4 node types + ZooKeeper)Medium (3 node types + ZooKeeper)
CompressionExcellent (LZ4, ZSTD)Good (LZ4)Good (LZ4)
Community / Stars45k+ GitHub stars13k+ GitHub stars6k+ GitHub stars
LicenseApache 2.0Apache 2.0Apache 2.0

Docker Setup: Getting Started in Minutes

ClickHouse (Simplest Setup)

ClickHouse is the easiest to get running. A single container is all you need for development and small-scale production.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
# docker-compose.yml
services:
  clickhouse:
    image: clickhouse/clickhouse-server:24.8
    container_name: clickhouse
    ports:
      - "8123:8123"   # HTTP interface
      - "9000:9000"   # Native protocol
    volumes:
      - clickhouse_data:/var/lib/clickhouse
    environment:
      - CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1
    ulimits:
      nofile:
        soft: 262144
        hard: 262144

volumes:
  clickhouse_data:

networks:
  default:
    name: analytics
1
docker compose up -d

Once running, access the web interface at http://localhost:8123/play and run SQL queries directly:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
CREATE TABLE events (
    timestamp DateTime64(3),
    event_type LowCardinality(String),
    user_id UInt64,
    properties String,
    country LowCardinality(String)
) ENGINE = MergeTree()
ORDER BY (event_type, timestamp);

-- Insert sample data
INSERT INTO events VALUES
    (now(), 'page_view', 1001, '{"page": "/home"}', 'US'),
    (now(), 'click', 1001, '{"button": "signup"}', 'US'),
    (now(), 'page_view', 1002, '{"page": "/pricing"}', 'DE');

-- Fast aggregation
SELECT
    country,
    event_type,
    count() AS total,
    uniq(user_id) AS unique_users
FROM events
WHERE timestamp >= now() - INTERVAL 7 DAY
GROUP BY country, event_type
ORDER BY total DESC;

For Kafka ingestion, ClickHouse provides a built-in Kafka table engine:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
CREATE TABLE kafka_events (
    timestamp DateTime64(3),
    event_type String,
    user_id UInt64,
    properties String,
    country String
) ENGINE = Kafka('kafka:9092', 'events-topic', 'clickhouse-consumer-group')
SETTINGS kafka_format = 'JSONEachRow';

CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT * FROM kafka_events;

Apache Druid

Druid requires more components but Docker Compose makes it manageable:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
# docker-compose.yml
services:
  zookeeper:
    image: zookeeper:3.8
    container_name: druid-zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOO_MY_ID: 1

  postgres:
    image: postgres:16
    container_name: druid-postgres
    environment:
      POSTGRES_PASSWORD: FoolishPassword
      POSTGRES_USER: druid
      POSTGRES_DB: druid

  coordinator-overlord:
    image: apache/druid:31.0.0
    container_name: druid-coordinator
    depends_on:
      - zookeeper
      - postgres
    volumes:
      - ./druid/conf:/opt/druid/conf
    ports:
      - "8081:8081"
    command:
      - coordinator-overlord
    environment:
      DRUID_PORT: 8081

  broker:
    image: apache/druid:31.0.0
    container_name: druid-broker
    depends_on:
      - zookeeper
      - postgres
    ports:
      - "8082:8082"
    command:
      - broker
    environment:
      DRUID_PORT: 8082

  historical:
    image: apache/druid:31.0.0
    container_name: druid-historical
    depends_on:
      - zookeeper
      - postgres
    ports:
      - "8083:8083"
    command:
      - historical
    environment:
      DRUID_PORT: 8083

  middlemanager:
    image: apache/druid:31.0.0
    container_name: druid-middle
    depends_on:
      - zookeeper
      - postgres
    ports:
      - "8091:8091"
    command:
      - middlemanager
    environment:
      DRUID_PORT: 8091

Druid includes a web console at http://localhost:8888 for managing data sources, tasks, and running SQL queries.

Ingest data via the console UI using the guided workflow, or POST an ingestion spec:

1
2
3
curl -X 'POST' -H 'Content-Type: application/json' \
  -d @ingestion-spec.json \
  http://localhost:8081/druid/indexer/v1/task

Apache Pinot

Pinot also uses ZooKeeper for coordination:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
# docker-compose.yml
services:
  zookeeper:
    image: zookeeper:3.9
    container_name: pinot-zookeeper
    ports:
      - "2181:2181"

  pinot-controller:
    image: apachepinot/pinot:1.2.0
    container_name: pinot-controller
    ports:
      - "9000:9000"
    command:
      - StartController
      - "-zkAddress"
      - "zookeeper:2181"
    environment:
      JAVA_OPTS: "-Xms256m -Xmx512m"

  pinot-server:
    image: apachepinot/pinot:1.2.0
    container_name: pinot-server
    ports:
      - "8099:8099"
    command:
      - StartServer
      - "-zkAddress"
      - "zookeeper:2181"
    depends_on:
      - pinot-controller
    environment:
      JAVA_OPTS: "-Xms256m -Xmx512m"

  pinot-broker:
    image: apachepinot/pinot:1.2.0
    container_name: pinot-broker
    ports:
      - "7070:7070"
    command:
      - StartBroker
      - "-zkAddress"
      - "zookeeper:2181"
    depends_on:
      - pinot-controller
    environment:
      JAVA_OPTS: "-Xms256m -Xmx512m"

Pinot’s web console is available at http://localhost:9000. Create a table schema and upload data:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
# Create a schema
cat > events-schema.json << 'EOF'
{
  "schemaName": "events",
  "dimensionFieldSpecs": [
    {"name": "event_type", "dataType": "STRING"},
    {"name": "user_id", "dataType": "LONG"},
    {"name": "country", "dataType": "STRING"}
  ],
  "dateTimeFieldSpecs": [{
    "name": "timestamp",
    "dataType": "TIMESTAMP",
    "format": "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS"
  }]
}
EOF

# Create table config
cat > events-table.json << 'EOF'
{
  "tableName": "events",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "timestamp",
    "schemaName": "events",
    "replication": "1"
  },
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfig": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.topic.name": "events-topic",
      "stream.kafka.broker.list": "kafka:9092",
      "stream.kafka.consumer.factory.class.name": "LowLevelConsumerFactory"
    }
  }
}
EOF

# Create the table
curl -X POST "http://localhost:9000/tables" \
  -F "schemaFile=@events-schema.json" \
  -F "tableConfigFile=@events-table.json"

Query via the Pinot SQL API:

1
2
3
curl -X POST "http://localhost:8000/query/sql" \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT event_type, COUNT(*) FROM events GROUP BY event_type"}'

Performance Benchmarks

Performance depends heavily on your workload, but here are representative results from independent benchmarks on similar hardware (8 vCPU, 32 GB RAM, 500 GB NVMe SSD):

BenchmarkClickHouseApache DruidApache Pinot
10B row SUM() with GROUP BY0.8s1.4s1.1s
Distinct count on 500M rows1.2s0.6s (approx)0.9s (approx)
Real-time ingestion (events/s)500K–2M1–5M1–3M
p99 query latency (simple filter)50–200ms100–500ms10–100ms
Storage (10B events, compressed)120 GB200 GB180 GB
Complex JOIN (5 tables)2.5sN/A (unsupported)N/A (unsupported)

Key takeaways:

  • ClickHouse has the best raw compute performance for batch-style aggregations and supports full SQL JOINs.
  • Druid leads on ingestion throughput and approximate queries.
  • Pinot delivers the lowest p99 latency for simple filter queries on user-facing dashboards.

When to Choose Which

Choose ClickHouse if:

  • You need a general-purpose analytics database with full SQL support
  • Your queries involve JOINs, subqueries, or complex aggregations
  • You want the simplest operational setup (single binary, minimal configuration)
  • You’re replacing Elasticsearch for log analysis or time-series data
  • Your team already knows SQL and doesn’t want to learn a specialized query language
  • You need excellent compression to minimize storage costs

Choose Apache Druid if:

  • You have continuous real-time data streams (Kafka, Kinesis)
  • Your dashboards are heavily time-series focused
  • You need approximate algorithms (cardinality, quantiles, top-N) at massive scale
  • Your ingestion rate is extremely high (millions of events per second)
  • You want separation of compute (Brokers) and storage (Historicals) for independent scaling
  • You’re building internal analytics dashboards for operations teams

Choose Apache Pinot if:

  • You need sub-100ms query latency for user-facing applications
  • Your data requires upserts (correcting or updating existing records)
  • You have complex filtering needs (multi-value, geospatial, JSON predicates)
  • You’re building customer-facing analytics (like LinkedIn’s “Who Viewed Your Profile”)
  • You need strong SLAs with predictable p99 latency
  • You want a balance between Druid’s real-time capabilities and simpler operations

Ecosystem and Integrations

All three databases integrate with popular visualization tools:

ToolClickHouseDruidPinot
GrafanaNative pluginNative pluginCommunity plugin
SupersetNative (via SQLAlchemy)NativeNative
MetabaseCommunity driverLimited supportLimited support
TableauODBC/JDBC connectorJDBC connectorODBC connector
dbtOfficial adapterCommunity adapterCommunity adapter

For data ingestion, all three support Apache Kafka as the primary streaming source. ClickHouse also has native table engines for PostgreSQL, MySQL, MongoDB, S3, and HTTP endpoints. Druid and Pinot offer batch ingestion from Parquet, ORC, and CSV files through their respective ingestion frameworks.

Migration Tips

Moving from a commercial warehouse to a self-hosted OLAP database requires planning:

  1. Audit your queries — Identify which SQL features you use most. If you rely heavily on window functions and CTEs, ClickHouse is your safest migration target.
  2. Benchmark with your data — Export a representative sample (1–10 GB) and run your most expensive queries on each candidate.
  3. Start with read replicas — Run your new database alongside the old one, shadow production queries, and compare results before cutover.
  4. Plan ingestion pipelines — Replace your existing ETL jobs with streaming ingestion (Kafka) or scheduled batch loads. ClickHouse’s materialized views can often replace complexprometheusforms.
  5. Set up monitoring — Deploy Prometheus + Grafana for all three. ClickHouse’s system tables, Druid’s built-in metrics endpoint, and Pinot’s JMX metrics all integrate seamlessly.

Conclusion

ClickHouse, Apache Druid, and Apache Pinot are all production-grade, open-source analytics databases that can replace expensive commercial alternatives. The choice comes down to your specific requirements:

  • ClickHouse for simplicity, SQL compatibility, and all-around performance
  • Apache Druid for high-throughput real-time streaming and approximate analytics
  • Apache Pinot for ultra-low-latency user-facing queries with upsert support

All three are Apache 2.0 licensed, actively maintained, and backed by vibrant communities with commercial support options available. Start with a Docker Compose setup, benchmark with your actual data, and you’ll find the right fit for your analytics stack.

Frequently Asked Questions (FAQ)

Which one should I choose in 2026?

The best choice depends on your specific requirements:

  • For beginners: Start with the simplest option that covers your core use case
  • For production: Choose the solution with the most active community and documentation
  • For teams: Look for collaboration features and user management
  • For privacy: Prefer fully open-source, self-hosted options with no telemetry

Refer to the comparison table above for detailed feature breakdowns.

Can I migrate between these tools?

Most tools support data import/export. Always:

  1. Backup your current data
  2. Test the migration on a staging environment
  3. Check official migration guides in the documentation

Are there free versions available?

All tools in this guide offer free, open-source editions. Some also provide paid plans with additional features, priority support, or managed hosting.

How do I get started?

  1. Review the comparison table to identify your requirements
  2. Visit the official documentation (links provided above)
  3. Start with a Docker Compose setup for easy testing
  4. Join the community forums for troubleshooting
Advertise here