← Back to posts
comparison guide self-hosted · · 16 min read

etcd vs Consul vs ZooKeeper: Best Self-Hosted Service Discovery 2026

Complete comparison of etcd, Consul, and ZooKeeper for self-hosted service discovery, distributed coordination, and configuration management in 2026.

OS
Editorial Team

Every growing infrastructure eventually hits the same wall: services multiply, configurations scatter across machines, and keeping track of what’s running where becomes a nightmare. Hardcoded endpoints in config files don’t scale. That’s where distributed coordination and service discovery come in.

Three projects dominate this space: etcd, Consul, and Apache ZooKeeper. All three solve fundamentally the same problem — maintaining a consistent, shared state across multiple nodes — but they differ significantly in architecture, feature set, and operational complexity.

If you’re running a self-hosted infrastructure and need reliable service discovery, configuration management, or distributed locks, this guide will help you choose the right tool and deploy it today.

Why Self-Host Your Service Discovery Layer

Service discovery is the backbone of any modern infrastructure. Instead of hardcoding IP addresses and ports, services register themselves with a central directory and query it at runtime. But why self-host instead of using managed cloud services?

Complete data sovereignty. Service registries contain your entire infrastructure topology — every service name, every endpoint, every health check result. For regulated industries and privacy-conscious organizations, keeping this metadata on your own servers isn’t optional.

No vendor lock-in. Cloud-native service discovery tools tie you to a specific provider’s ecosystem. etcd, Consul, and ZooKeeper all run identically on bare metal, VMs, or any cloud. Move your entire stack between providers without rewriting a single config.

Cost at scale. Managed service discovery pricing scales with the number of registered services and API calls. A medium-sized infrastructure with hundreds of microservices can easily spend thousands per month on a managed solution. Self-hosted, the only cost is the hardware you already own.

Integration with existikubernetessted tools. Kubernetes uses etcd natively. Traefik and Caddy integrate with Consul. Kafka depends on ZooKeeper (historically). Running the same coordination layer across your stack simplifies operations and reduces the learning curve.

Works offline and in air-gapped environments. Self-hosted service discovery doesn’t need an internet connection. For edge computing, industrial IoT, or military-grade air-gapped networks, it’s the only option.

etcd: The Kubernetes Native Choice

etcd is a distributed, consistent key-value store designed specifically for distributed systems. It’s best known as the backing store for Kubernetes, but it stands perfectly well on its own.

Architecture

etcd uses the Raft consensus algorithm for strong consistency guarantees. Every node in an etcd cluster maintains an identical copy of the data. Writes go through the leader node, which replicates the log entry to followers. A write is committed only when a quorum (majority) of nodes acknowledge it.

The Raft protocol gives etcd several important properties:

  • Linearizable reads — every read returns the most recent committed value
  • Partition tolerance — the cluster remains available as long as a majority of nodes can communicate
  • Automatic leader election — if the leader fails, a new one is elected within seconds

Unlike Consul and ZooKeeper, etcd deliberately limits its scope. It provides a key-value API, watch mechanism, and lease system. Everything else — service discovery, leader election, distributed locks — is built on top of these primitives using conventions.

Self-Hosted Deployment

The simplestdockero run etcd is via Docker. Here’s a single-node setup for development:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
version: "3.8"

services:
  etcd:
    image: bitnami/etcd:latest
    container_name: etcd-single
    environment:
      - ALLOW_NONE_AUTHENTICATION=yes
      - ETCD_ADVERTISE_CLIENT_URLS=http://0.0.0.0:2379
      - ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
    ports:
      - "2379:2379"
      - "2380:2380"
    volumes:
      - etcd-data:/bitnami/etcd
    restart: unless-stopped

volumes:
  etcd-data:

For production, a three-node cluster is the minimum recommended configuration:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
# Node 1 - etcd-1 (192.168.1.10)
services:
  etcd:
    image: bitnami/etcd:latest
    environment:
      - ETCD_NAME=etcd-1
      - ETCD_INITIAL_ADVERTISE_PEER_URLS=http://192.168.1.10:2380
      - ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
      - ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
      - ETCD_ADVERTISE_CLIENT_URLS=http://192.168.1.10:2379
      - ETCD_INITIAL_CLUSTER=etcd-1=http://192.168.1.10:2380,etcd-2=http://192.168.1.11:2380,etcd-3=http://192.168.1.12:2380
      - ETCD_INITIAL_CLUSTER_STATE=new
      - ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster-prod
    ports:
      - "2379:2379"
      - "2380:2380"
    volumes:
      - /opt/etcd/data:/bitnami/etcd
    restart: unless-stopped

Adjust the IPs for nodes 2 and 3 accordingly. The key parameters are:

ParameterPurpose
ETCD_NAMEUnique node identifier in the cluster
ETCD_INITIAL_CLUSTERComplete member list for bootstrapping
ETCD_INITIAL_CLUSTER_STATEnew for fresh cluster, existing for adding nodes
ETCD_INITIAL_CLUSTER_TOKENShared token that prevents accidental cluster merges

Using etcd for Service Discovery

etcd doesn’t have a built-in “service registry” concept. Instead, you use its key-value store with conventions and the watch mechanism:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
# Register a service with a TTL lease
# First, create a 60-second lease
LEASE_ID=$(etcdctl lease grant 60 --endpoints=http://localhost:2379 -w json | jq -r '.ID')

# Register the service, tied to the lease
etcdctl put --lease=$LEASE_ID /services/web-app/instance-1 '{"host":"10.0.1.5","port":8080,"version":"2.3.1"}' \
  --endpoints=http://localhost:2379

# Keep the lease alive (run in a loop or cron)
etcdctl lease keep-alive $LEASE_ID --endpoints=http://localhost:2379

# Discover all instances of a service
etcdctl get /services/web-app/ --prefix --endpoints=http://localhost:2379

# Watch for changes (blocks until a key changes)
etcdctl watch /services/web-app/ --prefix --endpoints=http://localhost:2379

The pattern is simple: services write their own endpoint under a hierarchical key, attach it to a lease with a TTL, and periodically renew the lease. If a service crashes, the lease expires and the key is automatically deleted. Watching clients get notified immediately.

When etcd Shines

etcd excels when you need strong consistency, simple API semantics, and tight Kubernetes integration. Its watch mechanism is exceptionally reliable — clients get real-time notifications with guaranteed ordering and no missed events. The gRPC-based API is fast and supports multiplexed streams over a single connection.

The tradeoff is operational discipline: etcd requires careful sizing of its storage backend (it uses bbolt, an embedded B-tree), and it doesn’t tolerate high write volumes well. The recommended maximum is about 10,000 keys and 1.5 MB of total data. It’s designed for coordination, not for storing application data.

Consul: The Feature-Rich Contender

Consul, developed by HashiCorp, is a full-featured service mesh and service discovery platform. Where etcd provides primitives, Consul provides a complete solution out of the box.

Architecture

Consul uses the Raft consensus algorithm like etcd, but adds a gossip protocol (Serf) for intra-datacenter communication. This dual-protocol design gives Consul unique capabilities:

  • Multi-datacenter awareness. Consul natively federates across data centers with WAN gossip between them.
  • Built-in health checking. Consul actively probes registered services and automatically deregisters unhealthy instances.
  • DNS interface. In addition to HTTP API, Consul answers DNS queries — any application that can resolve a hostname can discover services without SDK changes.
  • Service mesh (Connect). Consul provides mTLS, traffic management, and observability between services through its Connect feature.
  • Key-value store. Like etcd, Consul includes a KV store for configuration management.

The gossip protocol means Consul can detect node failures faster than pure Raft-based systems. Serf sends lightweight heartbeat messages between all nodes, detecting failures in seconds rather than relying on Raft election timeouts.

Self-Hosted Deployment

Consul’s Docker image includes both server and agent modes. Here’s a production-ready single-datacenter deployment:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
version: "3.8"

services:
  consul:
    image: hashicorp/consul:latest
    container_name: consul-server-1
    command: "agent -server -ui -bootstrap-expect=3 -client=0.0.0.0"
    environment:
      - CONSUL_BIND_INTERFACE=eth0
    ports:
      - "8500:8500"   # HTTP API + UI
      - "8600:8600/udp" # DNS
      - "8301:8301"   # LAN gossip
      - "8302:8302"   # WAN gossip
      - "8300:8300"   # Server RPC
    volumes:
      - consul-data:/consul/data
      - consul-config:/consul/config
    restart: unless-stopped
    networks:
      consul-net:
        ipv4_address: 172.20.0.2

  # Client agent on every application host
  consul-client:
    image: hashicorp/consul:latest
    command: "agent -retry-join=consul-server-1 -client=0.0.0.0"
    network_mode: "host"
    volumes:
      - /etc/consul.d:/consul/config
    restart: unless-stopped

volumes:
  consul-data:
  consul-config:

networks:
  consul-net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16

For a multi-node server cluster, add more server containers and configure them to join via the retry-join parameter:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
# /etc/consul.d/server.hcl on each server node
datacenter = "dc1"
data_dir   = "/consul/data"
server     = true
bootstrap_expect = 3

retry_join = [
  "192.168.1.10",
  "192.168.1.11",
  "192.168.1.12"
]

encrypt = "pUqJrVyVRj5jsiYEkM/tFQYfWyJIv4s3XkvDwy7Cu5s="

acl = {
  enabled        = true
  default_policy = "deny"
  tokens = {
    initial_management = "root-token-change-me"
  }
}

Using Consul for Service Discovery

Consul’s service discovery is more ergonomic than etcd’s. You define services declaratively and Consul handles the rest:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
// /etc/consul.d/web-app.json
{
  "service": {
    "name": "web-app",
    "port": 8080,
    "tags": ["v2.3", "production"],
    "meta": {
      "version": "2.3.1",
      "environment": "production"
    },
    "check": {
      "http": "http://localhost:8080/health",
      "interval": "10s",
      "timeout": "5s"
    }
  }
}

Discover services via multiple interfaces:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
# HTTP API - get all healthy instances
curl http://localhost:8500/v1/catalog/service/web-app

# DNS interface - any DNS resolver can query this
dig @localhost -p 8600 web-app.service.consul

# DNS with tags for versioned lookups
dig @localhost -p 8600 web-app.production.service.consul

# Health endpoint - only returns passing checks
curl http://localhost:8500/v1/health/service/web-app?passing

# KV store for configuration
consul kv put config/web-app/database/url "postgres://db:5432/myapp"
consul kv get config/web-app/database/url
consul kv get -recurse config/

The DNS interface is Consul’s killer feature for self-hosted infrastructure. Legacy applications, database drivers, and third-party services that can’t integrate with an HTTP API still benefit from service discovery through standard DNS resolution.

When Consul Shines

Consul is the right choice when you need a complete service discovery and mesh platform with minimal custom development. Its built-in health checks, DNS interface, multi-datacenter support, and service mesh capabilities cover requirements that would take months to build on top of etcd.

The tradeoff is complexity. Consul has more moving parts (Raft + Serf + health check scheduler + DNS server + Connect proxies), which means more configuration surface area and more things to monitor. The resource footprint is also larger — a Consul server node typically uses 2-3x more memory than an etcd node under similar load.

Apache ZooKeeper: The Battle-Tested Veteran

ZooKeeper is the oldest of the three, originally developed at Yahoo! and now an Apache project. It pioneered the distributed coordination pattern and remains the backbone of major distributed systems including Apache Kafka, Apache HBase, and Apache Solr.

Architecture

ZooKeeper uses a custom consensus protocol called ZAB (ZooKeeper Atomic Broadcast), which predates Raft but shares similar principles. It maintains a hierarchical namespace of znodes (like a filesystem) that all servers in the ensemble replicate.

Key architectural features:

  • Hierarchical data model. Znodes form a tree structure (/services/web-app/instance-1) with support for ephemeral nodes (deleted when the client disconnects) and sequential nodes (auto-incrementing names).
  • Watchers. Clients can set one-time watches on znodes to receive notifications when data changes.
  • High write throughput. ZAB is optimized for write-heavy workloads and can sustain higher throughput than Raft-based systems under certain conditions.
  • Java ecosystem. Being a Java project, ZooKeeper integrates naturally with the broader Apache ecosystem.

ZooKeeper’s znode types give it unique expressiveness:

TypeBehaviorUse Case
PersistentSurvives client disconnectConfiguration data
EphemeralDeleted on client disconnectService registration
Persistent SequentialAuto-appends sequence numberDistributed queues
Ephemeral SequentialBoth ephemeral and sequentialLeader election

Self-Hosted Deployment

ZooKeeper’s Docker deployment requires more configuration than etcd or Consul, but the result is equally robust:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
version: "3.8"

services:
  zk1:
    image: bitnami/zookeeper:latest
    container_name: zk1
    ports:
      - "2181:2181"
      - "2888:2888"
      - "3888:3888"
    environment:
      - ZOO_SERVER_ID=1
      - ZOO_SERVERS=zk1:2888:3888,zk2:2889:3889,zk3:2890:3890
      - ALLOW_ANONYMOUS_LOGIN=yes
    volumes:
      - zk1-data:/bitnami/zookeeper
    restart: unless-stopped
    networks:
      zk-net:
        ipv4_address: 172.21.0.2

  zk2:
    image: bitnami/zookeeper:latest
    container_name: zk2
    ports:
      - "2182:2181"
      - "2889:2888"
      - "3889:3888"
    environment:
      - ZOO_SERVER_ID=2
      - ZOO_SERVERS=zk1:2888:3888,zk2:2889:3889,zk3:2890:3890
      - ALLOW_ANONYMOUS_LOGIN=yes
    volumes:
      - zk2-data:/bitnami/zookeeper
    restart: unless-stopped
    networks:
      zk-net:
        ipv4_address: 172.21.0.3

  zk3:
    image: bitnami/zookeeper:latest
    container_name: zk3
    ports:
      - "2183:2181"
      - "2890:2888"
      - "3890:3888"
    environment:
      - ZOO_SERVER_ID=3
      - ZOO_SERVERS=zk1:2888:3888,zk2:2889:3889,zk3:2890:3890
      - ALLOW_ANONYMOUS_LOGIN=yes
    volumes:
      - zk3-data:/bitnami/zookeeper
    restart: unless-stopped
    networks:
      zk-net:
        ipv4_address: 172.21.0.4

volumes:
  zk1-data:
  zk2-data:
  zk3-data:

networks:
  zk-net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.21.0.0/16

For bare metal deployment without Docker, the configuration uses zoo.cfg:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
# /opt/zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
maxClientCnxns=60
admin.enableServer=true
admin.serverPort=8080

# Ensemble members
server.1=192.168.1.10:2888:3888
server.2=192.168.1.11:2888:3888
server.3=192.168.1.12:2888:3888

# Write the server ID to myid file on each node
# On server 1: echo "1" > /var/lib/zookeeper/myid
# On server 2: echo "2" > /var/lib/zookeeper/myid
# On server 3: echo "3" > /var/lib/zookeeper/myid

Using ZooKeeper for Service Discovery

The ZooKeeper client API is more verbose than etcd or Consul, but the hierarchical model maps naturally to service discovery patterns:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
# Connect via the CLI
zkCli.sh -server localhost:2181

# Create a persistent parent node
create /services ""

# Create ephemeral nodes for service instances
create -e /services/web-app/instance-1 '{"host":"10.0.1.5","port":8080}'

# Watch for changes
stat /services/web-app watch

# List all instances
ls /services/web-app

# Get data from a specific instance
get /services/web-app/instance-1

# Create a sequential node (useful for leader election)
create -e -s /services/web-app/leader ""

# Using Apache Curator (Java) for higher-level abstractions:
# Leader election:
LeaderSelector selector = new LeaderSelector(client, "/services/web-app/leader", listener);
selector.start();

# Service discovery:
ServiceProvider<ServiceInstance> provider = 
    serviceDiscovery.serviceProviderBuilder()
        .serviceName("web-app")
        .build();
provider.start();
Collection<ServiceInstance> instances = provider.getAllInstances();

When ZooKeeper Shines

ZooKeeper is the best choice when you’re already running Apache ecosystem tools that depend on it (Kafka, HBase, Solr, Dubbo), or when you need high write throughput with a hierarchical data model. Its battle-tested track record — running in production at some of the largest tech companies for over 15 years — speaks to its reliability.

The main drawbacks are operational: ZooKeeper requires Java, has a steeper learning curve, and its one-time watch semantics (you must re-register a watch after each notification) add complexity compared to etcd’s continuous watches. The community has also slowed in recent years as newer alternatives have gained traction.

Head-to-Head Comparison

FeatureetcdConsulZooKeeper
Consensus ProtocolRaftRaft + Serf gossipZAB
APIgRPC + HTTP/JSONHTTP/JSON + DNS + gRPCCustom binary protocol
Data ModelFlat key-valueKey-value + service catalogHierarchical znodes
Health ChecksNo (build on leases)Built-in (HTTP, TCP, script, TTL)No (rely on ephemeral nodes)
Service DiscoveryConvention-basedFirst-class featureConvention-based
Multi-DatacenterNoNativeNo
Service MeshNo (use Istio separately)Built-in (Connect)No
Watch MechanismContinuousContinuousOne-time (must re-register)
KV Store Limits~10K keys, 1.5 MB totalNo strict limits~50 MB max znode size
Write Throughput~10K ops/sec~5K ops/sec~30K ops/sec
Resource UsageLow (~50 MB RAM per node)Medium (~150 MB RAM per node)High (~200 MB RAM per node + JVM)
LanguageGoGoJava
Binary Size~80 MB~120 MB~35 MB + JVM (~100 MB)
UI/Dashboardetcd-manager (third-party)Built-in web UINo (use third-party tools)
Kubernetes IntegrationNative (backing store)Via consul-k8sNot recommended
TLS/mTLSNativeNativeVia external proxy
ACL SystemRBAC (v3.5+)Built-in token-basedSASL/Digest
LicenseApache 2.0BUSL 1.1 (source-available)Apache 2.0

Important Licensing Note

As of 2026, Consul uses the Business Source License (BUSL 1.1) for releases after September 2023. This is a source-available license — not open source in the OSI definition. For most self-hosted internal use cases, BUSL is permissive enough. However, if you need to offer Consul as a managed service to customers, or if your organization has strict open-source-only policies, etcd or ZooKeeper (both Apache 2.0) are the safer choices.

Decision Framework

Choose etcd if:

  • You’re running Kubernetes and want a unified coordination layer
  • You need strong consistency with a minimal, well-defined API
  • Your data fits within the 10K key limit (configuration, not application data)
  • You prefer Apache 2.0 licensing
  • Your team knows Go and values simplicity

Choose Consul if:

  • You need a complete service discovery platform with health checks and DNS
  • You operate across multiple data centers
  • You want built-in service mesh capabilities (mTLS, traffic splitting)
  • You have legacy applications that benefit from DNS-based discovery
  • You value the built-in web UI for operational visibility

Choose ZooKeeper if:

  • You’re already running Kafka, HBase, or Solr
  • You need high write throughput with hierarchical data organization
  • You’re invested in the Java/Apache ecosystem and use Curator
  • You need the proven reliability of a 15+ year old system
  • Apache 2.0 licensing is a hard requirement

Monitoring Your Cluster

Whichever tool you choose, monitoring is essential. Here are the key metrics to track:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
# etcd monitoring (Prometheus endpoint built-in at :2379/metrics)
# Key metrics to alert on:
# - etcd_server_has_leader (should always be 1)
# - etcd_server_leader_changes_seen_total (alert on spikes)
# - etcd_disk_wal_fsync_duration_seconds (alert on > 100ms)
# - etcd_network_peer_round_trip_time_seconds (alert on > 500ms)
# - etcd_mvcc_db_total_size_in_bytes (alert at 80% of quota)

# Consul monitoring
# Built-in Prometheus metrics at :8500/v1/agent/metrics?format=prometheus
# Key metrics:
# - consul_raft_state (should be leader or follower, never candidate)
# - consul_raft_commitTime (alert on > 100ms p99)
# - consul_catalog_service_node_healthy (alert on drops)
# - consul_serf_lan_members (alert on count changes)

# ZooKeeper monitoring
# Four-letter words commands for quick checks:
echo ruok | nc localhost 2181  # Should respond "imok"
echo mntr | nc localhost 2181  # Returns metrics
echo conf | nc localhost 2181  # Shows configuration
# Key mntr metrics:
# - zk_server_state (leader/follower)
# - zk_num_alive_connections
# - zk_avg_latency / zk_max_latency
# - zk_outstanding_requests (alert on > 1000)

For all three, deploy a Grafana dashboard with pre-built panels. The Prometheus community maintains official dashboards for etcd and Consul, and the ZooKeeper exporter (prometheus-jmx-exporter) provides JVM-level metrics.

Final Thoughts

The self-hosted service discovery landscape has matured significantly. In 2026, you have three production-grade options, each with distinct strengths:

etcd wins on simplicity and Kubernetes integration. Its minimal API and small footprint make it easy to operate.

Consul wins on feature completeness. Its service mesh, health checks, and DNS interface provide a turnkey solution that reduces development overhead.

ZooKeeper wins on proven reliability and throughput. If your stack already includes Apache ecosystem tools, it’s often the pragmatic choice.

For most new self-hosted projects, etcd is the recommended starting point. It’s simple, well-documented, and the conventions for service discovery on top of its KV store are well-established. If you need the additional features — especially health checks and multi-datacenter support — Consul is worth the added complexity.

Whichever you choose, the key is consistency: run the same coordination layer across your entire infrastructure, automate the deployment with infrastructure-as-code, and invest in monitoring before issues arise.

Frequently Asked Questions (FAQ)

Which one should I choose in 2026?

The best choice depends on your specific requirements:

  • For beginners: Start with the simplest option that covers your core use case
  • For production: Choose the solution with the most active community and documentation
  • For teams: Look for collaboration features and user management
  • For privacy: Prefer fully open-source, self-hosted options with no telemetry

Refer to the comparison table above for detailed feature breakdowns.

Can I migrate between these tools?

Most tools support data import/export. Always:

  1. Backup your current data
  2. Test the migration on a staging environment
  3. Check official migration guides in the documentation

Are there free versions available?

All tools in this guide offer free, open-source editions. Some also provide paid plans with additional features, priority support, or managed hosting.

How do I get started?

  1. Review the comparison table to identify your requirements
  2. Visit the official documentation (links provided above)
  3. Start with a Docker Compose setup for easy testing
  4. Join the community forums for troubleshooting
Advertise here