Self-hosting your analytics infrastructure gives you full control over your data, eliminates vendor lock-in, and dramatically reduces costs at scale. When it comes to running real-time analytical queries on large datasets, three open-source databases stand out: ClickHouse, Apache Druid, and Apache Pinot.
Each of these systems is designed for fast OLAP (Online Analytical Processing) workloads, but they differ significantly in architecture, use cases, and operational complexity. This guide breaks down the differences, provides Docker-based setup instructions, and helps you choose the right tool for your stack.
Why Self-Host Your Analytics Database
Commercial analytics platforms like Snowflake, Google BigQuery, and Amazon Redshift are powerful but come with significant trade-offs:
- Escalating costs — Query-based pricing means costs grow unpredictably with usage. A team running hundreds of dashboards can easily spend $5,000–$50,000+ per month.
- Data sovereignty — Sending user data to third-party clouds raises compliance issues under GDPR, HIPAA, and SOC 2.
- Vendor lock-in — Migrating petabytes of data out of a cloud warehouse is expensive and time-consuming.
- Latency constraints — Real-time ingestion and sub-second queries are difficult to achieve with cloud-only architectures that route through multiple network hops.
Self-hosting solves these problems. You own the hardware (or rent it predictably), data never leaves your infrastructure, and query performance depends on your configuration — not someone else’s shared resources.
Open-source OLAP databases like ClickHouse, Druid, and Pinot are proven at massive scale. Cloudflare uses ClickHouse to process trillions of requests. Uber and Netflix run Druid for real-time analytics. LinkedIn relies on Pinot for user-facing analytics with strict SLAs.
Architecture at a Glance
Understanding how each system stores and processes data is key to picking the right one.
ClickHouse
ClickHouse is a column-oriented database management system originally developed by Yandex. Its core design principles are:
- Columnar storage with heavy compression (data is often 5–10x smaller than raw)
- Vectorized query execution — processes data in batches using SIMD CPU instructions
- MergeTree family of table engines for efficient inserts and background merges
- Single binary deployment — simple to run, no external dependencies
- SQL-native — uses a SQL dialect very close to standard SQL
ClickHouse shines when you need fast aggregation queries on large datasets with relatively simple ingestion patterns.
Apache Druid
Apache Druid is a distributed, column-oriented data store originally built at Metamarkets. Its architecture is built around:
- Immutable data segments stored in deep storage (S3, HDFS) with a local cache
- Real-time ingestion via Kafka streams or batch ingestion from files
- Broker/Historical/MiddleManager/Coordinator node separation for horizontal scaling
- Approximate algorithms (theta sketches, HLL) for fast cardinality estimation
- Time-first indexing — optimized for time-series and event data
Druid excels at real-time dashboards where data arrives continuously and you need sub-second responses on time-based queries.
Apache Pinot
Apache Pinot was built at LinkedIn specifically for low-latency analytics on user-facing applications. Its architecture features:
- Immutable segments with real-time and offline serving paths
- Multiple index types — inverted, star-tree, sorted, range, JSON, and geospatial
- Controller/Server/Broker separation with ZooKeeper for coordination
- Upsert support — handles late-arriving and corrected data better than Druid
- Built-in ingestion from Kafka, Hadoop, and local files with automatic schema evolution
Pinot is the best choice when you need sub-100ms queries on user-facing dashboards with complex filtering requirements.
Feature Comparison
| Feature | ClickHouse | Apache Druid | Apache Pinot |
|---|---|---|---|
| Primary Use Case | General analytics, log analysis | Real-time dashboards, event streaming | User-facing analytics, low-latency serving |
| Query Language | SQL (extended) | SQL (limited) | SQL (Pinot SQL) |
| Real-Time Ingestion | Yes (via Kafka engine, materialized views) | Yes (native, first-class) | Yes (native, with upsert support) |
| Latency | Sub-second to seconds | Sub-second | Sub-100ms (p99) |
| Storage Backend | Local disk (distributed via ReplicatedMergeTree) | Deep storage (S3/HDFS) + local cache | Local disk + deep storage (optional) |
| Horizontal Scaling | Yes (sharding + replication) | Yes (segment-based) | Yes (server-based) |
| Approximate Queries | Yes (uniq, quantile, SAMPLE clause) | Yes (sketches, quantiles) | Yes (DISTINCTCOUNT, percentile approximations) |
| Joins | Full SQL JOINs | Limited (broadcast joins only) | Limited (dimension tables only) |
| Upsert/Mutation | Yes (lightweight deletes, mutations) | No (immutable segments) | Yes (native upsert) |
| Complexity | Low (single binary) | High (4 node types + ZooKeeper) | Medium (3 node types + ZooKeeper) |
| Compression | Excellent (LZ4, ZSTD) | Good (LZ4) | Good (LZ4) |
| Community / Stars | 45k+ GitHub stars | 13k+ GitHub stars | 6k+ GitHub stars |
| License | Apache 2.0 | Apache 2.0 | Apache 2.0 |
Docker Setup: Getting Started in Minutes
ClickHouse (Simplest Setup)
ClickHouse is the easiest to get running. A single container is all you need for development and small-scale production.
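A minimal single-node sketch using the official image (pin a specific version tag in production rather than `latest`):

```shell
# Run ClickHouse with the HTTP (8123) and native (9000) ports exposed,
# persisting data in a named volume
docker run -d --name clickhouse \
  -p 8123:8123 -p 9000:9000 \
  -v clickhouse-data:/var/lib/clickhouse \
  clickhouse/clickhouse-server:latest

# Open an interactive SQL client inside the container
docker exec -it clickhouse clickhouse-client
```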
Once running, open the web UI at http://localhost:8123/play and run SQL queries directly in the browser.
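A quick smoke test might look like the following (table and column names are purely illustrative):

```sql
-- Create a small MergeTree table and run an aggregation over it
CREATE TABLE page_views
(
    event_time DateTime,
    url        String,
    user_id    UInt64
)
ENGINE = MergeTree
ORDER BY (url, event_time);

INSERT INTO page_views VALUES
    (now(), '/home', 1),
    (now(), '/home', 2),
    (now(), '/pricing', 1);

-- count() and uniq() are ClickHouse's fast (approximate) aggregates
SELECT url, count() AS views, uniq(user_id) AS visitors
FROM page_views
GROUP BY url
ORDER BY views DESC;
```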
For streaming ingestion, ClickHouse provides a built-in Kafka table engine that consumes messages directly from a topic.
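A sketch of the usual pattern: a Kafka engine table acts as the consumer, and a materialized view continuously copies rows into a regular MergeTree table. The broker address, topic, and consumer group below are placeholders for your environment, and `page_views` is assumed to be an existing MergeTree table:

```sql
-- Consumer table: reads JSON messages from a Kafka topic
CREATE TABLE page_views_queue
(
    event_time DateTime,
    url        String,
    user_id    UInt64
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'page_views',
         kafka_group_name = 'clickhouse-consumer',
         kafka_format = 'JSONEachRow';

-- Materialized view: moves each consumed batch into the target table
CREATE MATERIALIZED VIEW page_views_mv TO page_views AS
SELECT event_time, url, user_id
FROM page_views_queue;
```

Queries should always hit the target MergeTree table, never the Kafka engine table itself, since reading from the queue consumes the messages.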
Apache Druid
Druid requires more moving parts, but Docker Compose makes the setup manageable.
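The Druid project ships a reference docker-compose.yml covering the Coordinator, Broker, Historical, MiddleManager, and Router, plus ZooKeeper and a PostgreSQL metadata store. A hedged sketch of the quickstart flow, assuming the file layout in the apache/druid repository:

```shell
# Fetch the reference Compose file and its environment file
# (paths follow Druid's Docker quickstart tutorial)
curl -O https://raw.githubusercontent.com/apache/druid/master/distribution/docker/docker-compose.yml
curl -o environment https://raw.githubusercontent.com/apache/druid/master/distribution/docker/environment

# Start the full cluster; the first run pulls several images
docker compose up -d
```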
Druid includes a web console at http://localhost:8888 for managing data sources, tasks, and running SQL queries.
Ingest data via the console UI using the guided workflow, or POST an ingestion spec to the router's task endpoint.
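A trimmed native-batch spec, submitted through the router; the datasource name, inline row, and granularity are illustrative only:

```shell
# Submit a minimal index_parallel task with a single inline JSON row
curl -X POST http://localhost:8888/druid/indexer/v1/task \
  -H 'Content-Type: application/json' \
  -d '{
    "type": "index_parallel",
    "spec": {
      "ioConfig": {
        "type": "index_parallel",
        "inputSource": {
          "type": "inline",
          "data": "{\"ts\":\"2024-01-01T00:00:00Z\",\"url\":\"/home\",\"user_id\":1}"
        },
        "inputFormat": { "type": "json" }
      },
      "dataSchema": {
        "dataSource": "page_views",
        "timestampSpec": { "column": "ts", "format": "iso" },
        "dimensionsSpec": { "dimensions": ["url", "user_id"] },
        "granularitySpec": { "segmentGranularity": "day" }
      }
    }
  }'
```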
Apache Pinot
Pinot also uses ZooKeeper for coordination, so the easiest local setup is a small Docker Compose cluster.
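A minimal Compose sketch with ZooKeeper plus the three Pinot roles; image tags are illustrative, and real deployments need tuned JVM heap sizes and persistent volumes:

```yaml
services:
  zookeeper:
    image: zookeeper:3.8
    ports: ["2181:2181"]
  controller:
    image: apachepinot/pinot:latest
    command: "StartController -zkAddress zookeeper:2181"
    ports: ["9000:9000"]
    depends_on: [zookeeper]
  broker:
    image: apachepinot/pinot:latest
    command: "StartBroker -zkAddress zookeeper:2181"
    ports: ["8099:8099"]
    depends_on: [controller]
  server:
    image: apachepinot/pinot:latest
    command: "StartServer -zkAddress zookeeper:2181"
    depends_on: [broker]
```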
Pinot’s web console is available at http://localhost:9000, where you can create a table schema and upload data.
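You can also register a schema and table against the controller's REST API; the JSON file names below are hypothetical placeholders for your own schema and table config:

```shell
# Register a schema, then a table, with the Pinot controller
curl -X POST http://localhost:9000/schemas \
  -H 'Content-Type: application/json' \
  -d @page_views_schema.json

curl -X POST http://localhost:9000/tables \
  -H 'Content-Type: application/json' \
  -d @page_views_table.json
```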
Query programmatically via the Pinot SQL API exposed by the broker.
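A sketch of a broker query, assuming the broker listens on its default port 8099 and a `page_views` table exists:

```shell
# POST SQL to the broker's query endpoint; results come back as JSON
curl -X POST http://localhost:8099/query/sql \
  -H 'Content-Type: application/json' \
  -d '{"sql": "SELECT url, COUNT(*) FROM page_views GROUP BY url LIMIT 10"}'
```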
Performance Benchmarks
Performance depends heavily on your workload, but here are representative results from independent benchmarks on similar hardware (8 vCPU, 32 GB RAM, 500 GB NVMe SSD):
| Benchmark | ClickHouse | Apache Druid | Apache Pinot |
|---|---|---|---|
| 10B row SUM() with GROUP BY | 0.8s | 1.4s | 1.1s |
| Distinct count on 500M rows | 1.2s | 0.6s (approx) | 0.9s (approx) |
| Real-time ingestion (events/s) | 500K–2M | 1–5M | 1–3M |
| p99 query latency (simple filter) | 50–200ms | 100–500ms | 10–100ms |
| Storage (10B events, compressed) | 120 GB | 200 GB | 180 GB |
| Complex JOIN (5 tables) | 2.5s | N/A (joins too limited) | N/A (joins too limited) |
Key takeaways:
- ClickHouse has the best raw compute performance for batch-style aggregations and supports full SQL JOINs.
- Druid leads on ingestion throughput and approximate queries.
- Pinot delivers the lowest p99 latency for simple filter queries on user-facing dashboards.
When to Choose Which
Choose ClickHouse if:
- You need a general-purpose analytics database with full SQL support
- Your queries involve JOINs, subqueries, or complex aggregations
- You want the simplest operational setup (single binary, minimal configuration)
- You’re replacing Elasticsearch for log analysis or time-series data
- Your team already knows SQL and doesn’t want to learn a specialized query language
- You need excellent compression to minimize storage costs
Choose Apache Druid if:
- You have continuous real-time data streams (Kafka, Kinesis)
- Your dashboards are heavily time-series focused
- You need approximate algorithms (cardinality, quantiles, top-N) at massive scale
- Your ingestion rate is extremely high (millions of events per second)
- You want separation of compute (Brokers) and storage (Historicals) for independent scaling
- You’re building internal analytics dashboards for operations teams
Choose Apache Pinot if:
- You need sub-100ms query latency for user-facing applications
- Your data requires upserts (correcting or updating existing records)
- You have complex filtering needs (multi-value, geospatial, JSON predicates)
- You’re building customer-facing analytics (like LinkedIn’s “Who Viewed Your Profile”)
- You need strong SLAs with predictable p99 latency
- You want a balance between Druid’s real-time capabilities and simpler operations
Ecosystem and Integrations
All three databases integrate with popular visualization tools:
| Tool | ClickHouse | Druid | Pinot |
|---|---|---|---|
| Grafana | Native plugin | Native plugin | Community plugin |
| Superset | Native (via SQLAlchemy) | Native | Native |
| Metabase | Community driver | Limited support | Limited support |
| Tableau | ODBC/JDBC connector | JDBC connector | ODBC connector |
| dbt | Official adapter | Community adapter | Community adapter |
For data ingestion, all three support Apache Kafka as the primary streaming source. ClickHouse also has native table engines for PostgreSQL, MySQL, MongoDB, S3, and HTTP endpoints. Druid and Pinot offer batch ingestion from Parquet, ORC, and CSV files through their respective ingestion frameworks.
Migration Tips
Moving from a commercial warehouse to a self-hosted OLAP database requires planning:
- Audit your queries — Identify which SQL features you use most. If you rely heavily on window functions and CTEs, ClickHouse is your safest migration target.
- Benchmark with your data — Export a representative sample (1–10 GB) and run your most expensive queries on each candidate.
- Start with read replicas — Run your new database alongside the old one, shadow production queries, and compare results before cutover.
- Plan ingestion pipelines — Replace your existing ETL jobs with streaming ingestion (Kafka) or scheduled batch loads. ClickHouse’s materialized views can often replace complex transforms.
- Set up monitoring — Deploy Prometheus + Grafana for all three. ClickHouse’s system tables, Druid’s built-in metrics endpoint, and Pinot’s JMX metrics all integrate seamlessly.
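As a concrete example of the monitoring point above, ClickHouse's system tables can be queried directly to spot slow queries; the one-hour window and 80-character preview here are arbitrary choices:

```sql
-- Find the slowest queries finished in the last hour
SELECT
    query_duration_ms,
    read_rows,
    substring(query, 1, 80) AS query_preview
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 HOUR
ORDER BY query_duration_ms DESC
LIMIT 10;
```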
Conclusion
ClickHouse, Apache Druid, and Apache Pinot are all production-grade, open-source analytics databases that can replace expensive commercial alternatives. The choice comes down to your specific requirements:
- ClickHouse for simplicity, SQL compatibility, and all-around performance
- Apache Druid for high-throughput real-time streaming and approximate analytics
- Apache Pinot for ultra-low-latency user-facing queries with upsert support
All three are Apache 2.0 licensed, actively maintained, and backed by vibrant communities with commercial support options available. Start with a Docker Compose setup, benchmark with your actual data, and you’ll find the right fit for your analytics stack.
Frequently Asked Questions (FAQ)
Which one should I choose in 2026?
The best choice depends on your specific requirements:
- For the simplest start: ClickHouse runs as a single binary with minimal configuration
- For high-throughput streaming: Druid’s native Kafka ingestion is first-class
- For user-facing latency: Pinot targets sub-100ms p99 queries with upsert support
- For privacy: all three are fully open source and self-hosted, so data never leaves your infrastructure
Refer to the comparison table above for detailed feature breakdowns.
Can I migrate between these tools?
Yes. All three can export and ingest common formats such as CSV and Parquet, so migration is mostly re-ingestion plus query porting. Before migrating, always:
- Backup your current data
- Test the migration on a staging environment
- Check official migration guides in the documentation
Are there free versions available?
All three databases in this guide are fully open source under the Apache 2.0 license. Commercial vendors (ClickHouse Inc., Imply for Druid, StarTree for Pinot) also offer managed hosting and paid support.
How do I get started?
- Review the comparison table to identify your requirements
- Visit each project’s official documentation
- Start with a Docker Compose setup for easy testing
- Join the community forums for troubleshooting