Running Kubernetes across multiple clusters is increasingly common — for disaster recovery, geographic distribution, or separating workloads by team or environment. But connecting services across cluster boundaries introduces networking complexity. This guide compares three approaches to multi-cluster service connectivity: Cilium ClusterMesh, Istio Multi-Primary, and Linkerd Multicluster.
What Is Multi-Cluster Service Connectivity?
Multi-cluster service connectivity allows services running in one Kubernetes cluster to discover and communicate with services in another cluster, as if they were on the same network. Key requirements include:
- Service discovery — DNS resolution for services across cluster boundaries
- Secure communication — mTLS or encrypted tunnels between clusters
- Load balancing — distributing traffic across cluster endpoints
- Failover — automatic routing when a cluster becomes unavailable
Comparison Overview
| Feature | Cilium ClusterMesh | Istio Multi-Primary | Linkerd Multicluster |
|---|---|---|---|
| GitHub Stars | 24,300+ | 38,200+ | 11,300+ |
| Approach | eBPF-based L3/L4 | Envoy proxy sidecar | Lightweight proxy |
| Cross-cluster DNS | CoreDNS with etcd sync | Istio DNS federation | Mirror services |
| mTLS | Automatic (SPIFFE) | Automatic (Istio CA) | Automatic (Linkerd CA) |
| Service Discovery | Global service catalog | Federated control plane | Service mirror controller |
| Failover | Automatic (via eBPF) | Via traffic policies | Manual (failover script) |
| Network Overlay | eBPF (no overlay) | Envoy mesh | Linkerd proxy |
| Complexity | Medium | High | Low |
| Best For | eBPF-native clusters | Full service mesh needs | Simple, lightweight setups |
Cilium ClusterMesh: eBPF-Based Connectivity
Cilium ClusterMesh uses eBPF to connect pods and services across clusters transparently at the network layer, with no sidecar proxies. Services are automatically discovered and load-balanced using Cilium’s global service catalog, synchronized via etcd.
Architecture
Each cluster runs Cilium CNI with ClusterMesh enabled. A shared etcd cluster (or etcd instances connected via tunnel) synchronizes service endpoints. eBPF programs handle transparent service routing at the kernel level — no sidecar proxies required.
Installation
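A minimal sketch of the setup using the cilium CLI, assuming two kubeconfig contexts named cluster1 and cluster2 and LoadBalancer support for exposing the clustermesh-apiserver; names, IDs, and the example backend service are illustrative:

```bash
# Install Cilium in each cluster with a unique cluster name and ID
cilium install --context cluster1 --set cluster.name=cluster1 --set cluster.id=1
cilium install --context cluster2 --set cluster.name=cluster2 --set cluster.id=2

# Enable ClusterMesh in both clusters (deploys the clustermesh-apiserver)
cilium clustermesh enable --context cluster1 --service-type LoadBalancer
cilium clustermesh enable --context cluster2 --service-type LoadBalancer

# Connect the clusters and wait for the mesh to become ready
cilium clustermesh connect --context cluster1 --destination-context cluster2
cilium clustermesh status --context cluster1 --wait

# Mark a service as global so its endpoints are shared across both clusters
kubectl --context cluster1 -n default annotate service backend service.cilium.io/global="true"
kubectl --context cluster2 -n default annotate service backend service.cilium.io/global="true"
```

Once both clusters carry the annotated service, requests to backend.default.svc.cluster.local are load-balanced across endpoints in either cluster.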
Docker Compose (for etcd ClusterMesh gateway testing)
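For experimenting locally with the etcd that ClusterMesh state is synchronized through, a single-node etcd can be stood up with Docker Compose. This is an illustrative sketch only (image tag, ports, and the plaintext listener are assumptions); a real deployment exposes etcd through the clustermesh-apiserver with TLS:

```yaml
# docker-compose.yml: single-node etcd for local ClusterMesh gateway experiments
services:
  etcd:
    image: quay.io/coreos/etcd:v3.5.12
    command:
      - etcd
      - --name=clustermesh-etcd
      - --data-dir=/etcd-data
      - --listen-client-urls=http://0.0.0.0:2379
      - --advertise-client-urls=http://etcd:2379
      - --listen-peer-urls=http://0.0.0.0:2380
      - --initial-advertise-peer-urls=http://etcd:2380
      - --initial-cluster=clustermesh-etcd=http://etcd:2380
    ports:
      - "2379:2379"
    volumes:
      - etcd-data:/etcd-data

volumes:
  etcd-data:
```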
Key Features
- Global Service Discovery: Services are automatically synchronized across clusters via etcd
- No Sidecar Overhead: eBPF handles routing at kernel level — zero proxy overhead
- Network Policies: Cilium’s L3-L7 network policies work seamlessly across clusters (see the policy sketch after this list)
- Transparent to Applications: No code changes required — standard Kubernetes DNS works
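Cilium exposes the source cluster of a peer as the io.cilium.k8s.policy.cluster label, so a policy can admit traffic only from pods in a specific cluster. A sketch, with illustrative app labels and cluster name:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-from-cluster1
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
            # Only frontend pods running in cluster1 may reach backend
            io.cilium.k8s.policy.cluster: cluster1
```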
Istio Multi-Primary: Full Mesh Across Clusters
Istio Multi-Primary extends Istio’s service mesh across multiple clusters. Each cluster runs its own Istio control plane, and the meshes are connected via shared trust roots and endpoint synchronization.
Architecture
Each cluster has a full Istio installation (Istiod + Envoy sidecars). Control planes are connected through a shared root certificate. Services communicate via Envoy proxies with automatic mTLS.
Installation
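A minimal multi-primary sketch with istioctl, assuming kubeconfig contexts cluster1 and cluster2 on a single shared network and a common root CA already installed in both clusters; the mesh ID, network name, and cluster names are illustrative:

```bash
# Install a primary control plane in each cluster with a shared mesh ID
cat <<EOF > cluster1.yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster1
      network: network1
EOF
istioctl install --context=cluster1 -f cluster1.yaml

cat <<EOF > cluster2.yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster2
      network: network1
EOF
istioctl install --context=cluster2 -f cluster2.yaml

# Enable endpoint discovery: give each control plane credentials
# to read services and endpoints from the other cluster
istioctl create-remote-secret --context=cluster1 --name=cluster1 | \
  kubectl apply -f - --context=cluster2
istioctl create-remote-secret --context=cluster2 --name=cluster2 | \
  kubectl apply -f - --context=cluster1
```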
Envoy Sidecar Configuration
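One way to configure the sidecar layer, shown here as a sketch: injection is enabled per namespace (kubectl label namespace default istio-injection=enabled), and the injected Envoy's footprint can be capped with per-pod annotations. The Deployment, image, and resource values below are illustrative:

```yaml
# Requires the namespace to be labeled for injection first:
#   kubectl label namespace default istio-injection=enabled
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
      annotations:
        # Cap the injected Envoy sidecar's resource requests
        sidecar.istio.io/proxyCPU: "100m"
        sidecar.istio.io/proxyMemory: "128Mi"
    spec:
      containers:
        - name: backend
          image: nginx:1.25
          ports:
            - containerPort: 80
```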
Key Features
- Full Service Mesh: Traffic management, observability, and security across clusters
- Automatic mTLS: All cross-cluster traffic is encrypted by default
- Advanced Routing: Canary deployments, traffic splitting, and fault injection across clusters (a traffic-split sketch follows this list)
- High Overhead: Each pod runs an Envoy sidecar (~100MB memory, ~5-10ms latency)
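As referenced in the routing point above, a weighted split is expressed with a VirtualService and DestinationRule; in a multi-primary mesh the split applies across all matching endpoints, including those discovered in the remote cluster. Service and subset names are illustrative:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: backend-canary
spec:
  hosts:
    - backend.default.svc.cluster.local
  http:
    - route:
        - destination:
            host: backend.default.svc.cluster.local
            subset: v1
          weight: 90
        - destination:
            host: backend.default.svc.cluster.local
            subset: v2
          weight: 10
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: backend-subsets
spec:
  host: backend.default.svc.cluster.local
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
```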
Linkerd Multicluster: Lightweight Approach
Linkerd’s multicluster extension takes a simpler approach. A service mirror controller watches services exported from the remote cluster and creates “mirror” services locally, and cross-cluster traffic is routed through a lightweight Linkerd proxy gateway.
Architecture
Each cluster runs Linkerd. The linkerd-multicluster component installs a service mirror controller that watches services exported from the remote cluster and creates mirror services in the local cluster.
Installation
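A minimal sketch with the linkerd CLI, assuming Linkerd is already installed in both clusters with a shared trust anchor and the kubeconfig contexts are named east and west:

```bash
# Install the multicluster extension (gateway + service mirror RBAC) in each cluster
linkerd --context=east multicluster install | kubectl --context=east apply -f -
linkerd --context=west multicluster install | kubectl --context=west apply -f -

# Verify the extension is healthy in both clusters
linkerd --context=east multicluster check
linkerd --context=west multicluster check
```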
Link Multicluster
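Linking the two clusters and exporting a service, continuing the east/west naming from the installation sketch (the backend service and default namespace are illustrative):

```bash
# Generate a Link resource plus credentials for east and apply it in west;
# this starts a service mirror controller in west that watches east
linkerd --context=east multicluster link --cluster-name east | \
  kubectl --context=west apply -f -

# Export a service from east by labeling it; a mirror service
# named backend-east will appear in west
kubectl --context=east -n default label service backend \
  mirror.linkerd.io/exported=true

# Confirm the gateway and mirrored endpoints are reachable
linkerd --context=west multicluster gateways
```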
Docker Compose (for local testing)
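For trying the linking workflow without cloud clusters, two throwaway single-node clusters can be run locally. The sketch below uses k3s under Docker Compose (image tag, token, and ports are assumptions); k3d or kind are the more common tools for this:

```yaml
# docker-compose.yml: two single-node k3s clusters for local multicluster experiments
services:
  east:
    image: rancher/k3s:v1.29.4-k3s1
    command: server
    privileged: true
    environment:
      - K3S_TOKEN=local-dev-token
      - K3S_KUBECONFIG_OUTPUT=/output/east.yaml
      - K3S_KUBECONFIG_MODE=666
    volumes:
      - ./kubeconfig:/output
    ports:
      - "6443:6443"
  west:
    image: rancher/k3s:v1.29.4-k3s1
    command: server
    privileged: true
    environment:
      - K3S_TOKEN=local-dev-token
      - K3S_KUBECONFIG_OUTPUT=/output/west.yaml
      - K3S_KUBECONFIG_MODE=666
    volumes:
      - ./kubeconfig:/output
    ports:
      - "6444:6443"
```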
Key Features
- Minimal Overhead: Linkerd’s proxy is written in Rust — lower memory and latency than Envoy
- Simple Setup: the `linkerd multicluster link` command handles the entire connection
- Service Mirroring: Exported services appear as local services with DNS resolution
- No Traffic Management: Linkerd multicluster doesn’t support advanced traffic policies across clusters
Why Use Multi-Cluster Service Connectivity?
Running services across multiple Kubernetes clusters provides several critical advantages that single-cluster deployments cannot match.
Disaster recovery is the most compelling use case. When a cluster goes down — whether from a cloud provider outage, misconfiguration, or hardware failure — multi-cluster connectivity enables automatic failover to a healthy cluster. This is essential for services with strict SLA requirements where even minutes of downtime are unacceptable.
Geographic distribution reduces latency for globally distributed user bases. By running clusters in multiple regions and using multi-cluster connectivity, requests can be served from the nearest cluster while still maintaining a unified service catalog. For compliance and data residency requirements, multi-cluster setups allow you to keep user data in specific regions while maintaining centralized service discovery.
Team and workload isolation is another key driver. Organizations often separate clusters by team, environment, or workload type (e.g., production vs. staging). Multi-cluster connectivity allows these isolated clusters to communicate when needed — for example, a frontend service in one cluster calling a backend API in another — without sharing the same control plane.
For teams evaluating broader multi-cluster orchestration, our Karmada vs Liqo vs Submariner comparison covers federation-level management, while our service mesh observability guide explores monitoring across mesh deployments.
Choosing the Right Multi-Cluster Approach
| Scenario | Recommended Tool |
|---|---|
| eBPF-capable clusters | Cilium ClusterMesh — kernel-level performance |
| Full service mesh with observability | Istio Multi-Primary — complete feature set |
| Minimal overhead, simple setup | Linkerd Multicluster — lightweight proxy |
| Existing Cilium CNI | Cilium ClusterMesh — natural extension |
| Existing Istio deployment | Istio Multi-Primary — reuse existing mesh |
| No service mesh yet | Linkerd Multicluster — easiest to adopt |
FAQ
What is the difference between multi-cluster networking and multi-cluster federation?
Multi-cluster networking (the focus of this article) connects services across independent clusters so they can communicate. Multi-cluster federation (like KubeFed or Karmada) manages resources across clusters from a central control plane — deploying, scaling, and scheduling workloads. Networking focuses on connectivity; federation focuses on orchestration.
Can I mix different CNI plugins across clusters?
For Cilium ClusterMesh, all clusters must run Cilium as the CNI. For Istio and Linkerd, the underlying CNI can differ — Istio and Linkerd operate at the service mesh layer, above the CNI. However, using the same CNI across clusters simplifies troubleshooting and configuration.
How does service discovery work across clusters?
Each tool handles DNS differently. Cilium synchronizes service endpoints via etcd, so a service annotated as global resolves through standard CoreDNS and is load-balanced across endpoints in every connected cluster. Istio uses federated control planes that share endpoint information. Linkerd creates mirror services — when you export a service from cluster B, a mirror service appears in cluster A, named after the original service with the linked cluster’s name appended (for example, backend exported from a cluster linked as east appears as backend-east).
What is the performance overhead of each approach?
Cilium ClusterMesh has the lowest overhead because it uses eBPF at the kernel level — no proxy process per pod. Istio adds ~100MB memory and 5-10ms latency per Envoy sidecar. Linkerd’s Rust-based proxy adds ~50MB memory and 2-5ms latency per pod, making it lighter than Istio but heavier than Cilium.
Can I use multi-cluster connectivity for database replication?
Yes, but with caveats. Multi-cluster service connectivity provides network-level connectivity — it does not handle application-level replication. You would use the connectivity to link database replicas (e.g., PostgreSQL streaming replication) across clusters, but the replication logic is handled by the database itself, not the networking layer.
How do I handle secrets across clusters?
Multi-cluster connectivity does not automatically synchronize secrets. You need a separate secrets management solution — see our secrets rotation guide for options like Vault or External Secrets Operator that can distribute secrets across clusters.