Introduction

Every high-performance self-hosted application needs caching. Whether you’re building an API gateway, a database proxy, or a microservice orchestrator, in-memory caches reduce latency by orders of magnitude. Instead of hitting the database or recomputing expensive results on every request, a well-tuned cache serves pre-computed data directly from RAM.

But not all caching libraries are created equal. Java, a mainstay of self-hosted enterprise software, has four major contenders: Caffeine (the modern high-performance choice), Guava Cache (Google’s battle-tested utility library), cache2k (the lean and fast alternative), and OHC (off-heap caching for large datasets).

This guide compares these four libraries across performance, memory management, eviction policies, and production readiness.

Quick Comparison Table

| Feature | Caffeine | Guava Cache | cache2k | OHC |
|---|---|---|---|---|
| GitHub Stars | 17,709 | 51,480 | 744 | 1,093 |
| Latest Release | 2026 | 2026 | 2025 | 2024 |
| Max Throughput (reads/s) | ~250M | ~80M | ~300M | ~150M |
| Off-Heap Support | No | No | No | Yes |
| Eviction Algorithm | W-TinyLFU | LRU | LRU/Clock | N/A (off-heap) |
| Async Loading | Native | Manual | Native | No |
| Spring Boot Integration | First-class | Via Spring | Manual | Manual |
| License | Apache 2.0 | Apache 2.0 | Apache 2.0 | Apache 2.0 |

Caffeine: The Modern Gold Standard

Caffeine, created by Ben Manes, is widely considered the best-in-class caching library for the JVM. It introduced the W-TinyLFU eviction algorithm, which combines the frequency-based accuracy of TinyLFU with the recency awareness of LRU, achieving near-optimal hit rates.

// Caffeine cache with time-based expiration
Cache<String, User> cache = Caffeine.newBuilder()
    .expireAfterWrite(10, TimeUnit.MINUTES)
    .maximumSize(10_000)
    .recordStats()
    .build();

User user = cache.get("user:42", key -> loadFromDatabase(key));

Caffeine’s key strengths include asynchronous loading through AsyncCache and AsyncLoadingCache, hit and miss statistics via recordStats(), and seamless Spring Boot integration. For self-hosted services running on the JVM, Caffeine is the default choice for local caching. Its W-TinyLFU policy keeps a compact sketch of access frequencies and uses hill climbing to adjust the balance between its recency-oriented admission window and its frequency-oriented main space, which lets it outperform traditional LRU and LFU caches across diverse workloads.
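
To make the frequency-sketch idea concrete, here is a minimal sketch in plain JDK Java. It is not Caffeine’s implementation: the class name FrequencySketchDemo, the hash seeds, and the four-row layout are assumptions chosen for illustration. It shows the two core moves of a TinyLFU-style filter: estimating how often a key was seen from a few small counters, and admitting a new entry only if it looks more popular than the entry it would evict.

```java
/** Simplified frequency sketch with a TinyLFU-style admission check (illustrative only). */
final class FrequencySketchDemo {
    // Odd multipliers give each row a different hash of the same key (hypothetical choice).
    private static final int[] SEEDS = {0x9E3779B1, 0x85EBCA6B, 0xC2B2AE35, 0x27D4EB2F};
    private static final int MAX_COUNT = 15; // saturate like a 4-bit counter

    private final int[][] counters;
    private final int mask;

    FrequencySketchDemo(int width) {
        if (width <= 0 || Integer.bitCount(width) != 1) {
            throw new IllegalArgumentException("width must be a power of two");
        }
        counters = new int[SEEDS.length][width];
        mask = width - 1;
    }

    private int slot(Object key, int row) {
        int h = key.hashCode() * SEEDS[row];
        h ^= h >>> 16;
        return h & mask;
    }

    /** Increments one counter per row, stopping at MAX_COUNT. */
    void recordAccess(Object key) {
        for (int row = 0; row < SEEDS.length; row++) {
            int i = slot(key, row);
            if (counters[row][i] < MAX_COUNT) {
                counters[row][i]++;
            }
        }
    }

    /** The minimum across rows limits the damage done by hash collisions. */
    int estimate(Object key) {
        int min = Integer.MAX_VALUE;
        for (int row = 0; row < SEEDS.length; row++) {
            min = Math.min(min, counters[row][slot(key, row)]);
        }
        return min;
    }

    /** Halves every counter so that old popularity fades over time. */
    void age() {
        for (int[] row : counters) {
            for (int i = 0; i < row.length; i++) {
                row[i] >>>= 1;
            }
        }
    }

    /** Admit the candidate only if it appears more frequent than the eviction victim. */
    boolean admit(Object candidate, Object victim) {
        return estimate(candidate) > estimate(victim);
    }

    public static void main(String[] args) {
        FrequencySketchDemo sketch = new FrequencySketchDemo(1024);
        for (int i = 0; i < 10; i++) {
            sketch.recordAccess("user:42");
        }
        sketch.recordAccess("user:7");
        System.out.println("admit user:42 over user:7? " + sketch.admit("user:42", "user:7"));
    }
}
```

The estimate can overcount when keys collide, but it never undercounts, which is why taking the minimum across rows is a reasonable approximation.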

Guava Cache: The Trusted Workhorse

Guava Cache ships as part of Google’s Guava library, which has over 51,000 stars on GitHub and is a dependency of a large share of Java projects. Its cache implementation predates Caffeine and uses a simpler LRU eviction strategy applied independently to each segment of its hash table.

// Guava Cache with manual refresh
LoadingCache<String, User> cache = CacheBuilder.newBuilder()
    .maximumSize(10_000)
    .expireAfterAccess(15, TimeUnit.MINUTES)
    .refreshAfterWrite(5, TimeUnit.MINUTES)
    .build(new CacheLoader<String, User>() {
        @Override
        public User load(String key) {
            return loadFromDatabase(key);
        }
    });

Guava Cache’s primary advantage is ubiquity: if your project already depends on Guava, as many Java projects do, adding caching requires no additional dependency. Its refreshAfterWrite mechanism lets other readers keep receiving the old value while an entry is reloaded, which prevents the “thundering herd” problem where cache expiration causes cascading database loads. By default the reload runs on the thread that triggered it; to refresh in the background, override CacheLoader.reload or wrap the loader with CacheLoader.asyncReloading. However, its LRU eviction is measurably less efficient than Caffeine’s W-TinyLFU under skewed access patterns.
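
The refresh behaviour can be illustrated without any library. The following is a minimal sketch in plain JDK Java, not Guava’s implementation; RefreshingCacheDemo, its constructor parameters, and the tick-based clock are hypothetical names introduced here. It shows the idea of serving the existing value while a single reload per key is handed to an executor.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Serves possibly stale values while at most one reload per key runs on an executor. */
final class RefreshingCacheDemo<K, V> {
    private static final class Entry<V> {
        final V value;
        final long writtenAt;

        Entry(V value, long writtenAt) {
            this.value = value;
            this.writtenAt = writtenAt;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Set<K> refreshing = ConcurrentHashMap.newKeySet();
    private final Function<K, V> loader;
    private final long refreshAfterTicks;
    private final Executor executor;
    private final LongSupplier clock;

    RefreshingCacheDemo(Function<K, V> loader, long refreshAfterTicks,
                        Executor executor, LongSupplier clock) {
        this.loader = loader;
        this.refreshAfterTicks = refreshAfterTicks;
        this.executor = executor;
        this.clock = clock;
    }

    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            // First load: the caller waits, and computeIfAbsent runs the loader once per key.
            return entries.computeIfAbsent(
                    key, k -> new Entry<>(loader.apply(k), clock.getAsLong())).value;
        }
        boolean due = clock.getAsLong() - entry.writtenAt >= refreshAfterTicks;
        // refreshing.add returns false if another caller already scheduled the reload.
        if (due && refreshing.add(key)) {
            executor.execute(() -> {
                try {
                    entries.put(key, new Entry<>(loader.apply(key), clock.getAsLong()));
                } finally {
                    refreshing.remove(key);
                }
            });
        }
        return entry.value; // may be stale until the reload finishes
    }
}
```

Passing the executor and clock in from outside is a design choice made for this sketch: it keeps the refresh timing testable, and in a real service the executor would typically be a shared thread pool.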

cache2k: The Lightweight Speedster

cache2k is a lesser-known but impressively fast caching library that benchmarks at up to 300 million reads per second on modern hardware. It achieves this through lock-free internal data structures and aggressive memory optimization.

// cache2k with expiry and resilience (1.x builder API; cache2k 2.x configures resilience via the cache2k-addon module)
Cache<String, User> cache = Cache2kBuilder.of(String.class, User.class)
    .expireAfterWrite(10, TimeUnit.MINUTES)
    .resilienceDuration(30, TimeUnit.SECONDS)
    .loader(key -> loadFromDatabase(key))
    .build();

cache2k’s standout feature is its resilience mode: when the loader fails (e.g., database is down), cache2k continues serving stale but available data for the configured resilience duration. This makes it exceptionally robust for self-hosted production services where availability matters more than freshness during outages. The library also supports JCache (JSR-107) out of the box and integrates well with Spring and Micrometer for observability.
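
The resilience behaviour described above can be sketched in plain JDK Java. This is not cache2k’s implementation; ResilientCacheDemo, its parameters, and the tick-based clock are hypothetical and exist only to show the decision: when a reload fails, keep serving the expired value as long as the resilience window has not passed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Falls back to an expired value when the loader fails, for a limited time. */
final class ResilientCacheDemo<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAt;

        Entry(V value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final Function<K, V> loader;
    private final long ttlTicks;
    private final long resilienceTicks;
    private final LongSupplier clock;

    ResilientCacheDemo(Function<K, V> loader, long ttlTicks,
                       long resilienceTicks, LongSupplier clock) {
        this.loader = loader;
        this.ttlTicks = ttlTicks;
        this.resilienceTicks = resilienceTicks;
        this.clock = clock;
    }

    synchronized V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry != null && now < entry.expiresAt) {
            return entry.value; // still fresh
        }
        try {
            V loaded = loader.apply(key);
            entries.put(key, new Entry<>(loaded, now + ttlTicks));
            return loaded;
        } catch (RuntimeException failure) {
            // Loader failed: serve the expired value while inside the resilience window.
            if (entry != null && now < entry.expiresAt + resilienceTicks) {
                return entry.value;
            }
            throw failure; // nothing usable is cached, so surface the error
        }
    }
}
```

With a time-to-live of 10 ticks and a resilience window of 30 ticks, a value loaded at tick 0 is served until tick 40 if the loader keeps failing; after that the failure reaches the caller.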

OHC: Off-Heap Caching for Large Datasets

OHC (Off-Heap Cache) takes a fundamentally different approach: instead of storing cache entries on the JVM heap (where garbage collection pauses can become problematic), it allocates native memory outside the heap, through JNA by default or through sun.misc.Unsafe as an alternative allocator. This enables caching datasets of tens or hundreds of gigabytes without affecting GC behavior.

// OHC off-heap cache setup
OHCache<String, User> cache = OHCacheBuilder.<String, User>newBuilder()
    .keySerializer(new StringSerializer())
    .valueSerializer(new UserSerializer())
    .capacity(10L * 1024 * 1024 * 1024) // 10 GB off-heap
    .build();

cache.put("user:42", user);
User cached = cache.get("user:42");

OHC is ideal for self-hosted services that need very large local caches — think API response caches, computed data caches, or session stores running alongside the application. The trade-off is complexity: serialization/deserialization adds overhead, and off-heap memory management requires careful capacity planning. For datasets under 1 GB, a heap-based cache like Caffeine is simpler and faster.
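
The serialization trade-off is easier to see in a small example. The sketch below uses only the JDK’s direct ByteBuffer and is not how OHC manages memory; OffHeapStoreDemo and its append-only layout are assumptions made for illustration. Values live outside the Java heap as bytes, so every write encodes and every read decodes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Append-only off-heap value store; only the small key index stays on the heap. */
final class OffHeapStoreDemo {
    private final ByteBuffer arena;                            // direct buffer: native memory
    private final Map<String, int[]> index = new HashMap<>();  // key -> {offset, length}

    OffHeapStoreDemo(int capacityBytes) {
        arena = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Serializes the value into the arena; returns false when it does not fit. */
    boolean put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > arena.remaining()) {
            return false; // a real cache would evict here instead of refusing
        }
        int offset = arena.position();
        arena.put(bytes);
        // Overwriting a key leaves its old bytes behind; this sketch never reclaims space.
        index.put(key, new int[] {offset, bytes.length});
        return true;
    }

    /** Copies the bytes back onto the heap and deserializes them. */
    String get(String key) {
        int[] location = index.get(key);
        if (location == null) {
            return null;
        }
        byte[] bytes = new byte[location[1]];
        ByteBuffer view = arena.duplicate(); // independent position, shared content
        view.position(location[0]);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        OffHeapStoreDemo store = new OffHeapStoreDemo(1024);
        store.put("user:42", "Ada");
        System.out.println(store.get("user:42"));
    }
}
```

Each get allocates a fresh byte array and a new String, which is the per-read overhead that heap-based caches avoid by returning an object reference directly.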

When to Use Which Caching Library

For general-purpose local caching in JVM applications, Caffeine is the clear winner. Its W-TinyLFU algorithm delivers the best hit rates, it integrates seamlessly with Spring Boot and Micrometer, and it has the largest active community.

For projects already using Guava, the built-in cache is adequate and saves a dependency. It’s fine for simple caching needs but falls behind for performance-sensitive workloads.

For maximum throughput in read-heavy services, cache2k’s lock-free architecture delivers benchmark-leading speeds, and its resilience mode provides a safety net that neither Caffeine nor Guava offers natively.

For large cache datasets exceeding 2 GB, OHC’s off-heap approach eliminates GC pressure. It’s particularly well-suited for API gateway response caching and session stores where data volume would cause problematic GC pauses with heap-based caches.

Why Self-Host Your Caching Layer?

Running your own caching infrastructure gives you complete control over eviction policies, memory budgets, and data lifecycle, which cloud-managed caches abstract away. With a local in-memory cache, you eliminate network latency entirely: a local lookup typically completes in well under a microsecond, while a Redis round trip, even on localhost, usually costs tens to hundreds of microseconds. A gap of roughly two orders of magnitude matters for high-throughput self-hosted services processing millions of requests per hour.

For a broader perspective on distributed caching across your infrastructure, see our guide to self-hosted Redis alternatives. If you need database-level connection pooling to complement your caching strategy, check our database connection pooling comparison. For benchmarking your database performance before and after adding caching, see our database benchmarking tools guide.

Deployment Patterns for Self-Hosted Services

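A common pattern in self-hosted Java services uses Caffeine for the L1 (local) cache and Redis or Valkey for the L2 (distributed) cache. Here’s a Docker Compose file that runs a Spring Boot service alongside Valkey and passes the settings for both caches as environment variables. The SPRING_REDIS_* names apply to Spring Boot 2.x; Spring Boot 3 reads SPRING_DATA_REDIS_HOST and SPRING_DATA_REDIS_PORT instead.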

# docker-compose.yml for a Spring Boot service with local + distributed caching
version: "3.8"
services:
  app:
    image: your-app:latest
    environment:
      - SPRING_CACHE_CAFFEINE_SPEC=maximumSize=10000,expireAfterWrite=600s
      - SPRING_REDIS_HOST=valkey
      - SPRING_REDIS_PORT=6379
    ports:
      - "8080:8080"
    depends_on:
      - valkey

  valkey:
    image: valkey/valkey:7.2-alpine
    ports:
      - "6379:6379"
    volumes:
      - valkey_data:/data
    command: valkey-server --appendonly yes --maxmemory 512mb --maxmemory-policy allkeys-lru

volumes:
  valkey_data:

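This setup gives you fast in-process reads from Caffeine’s local cache, while the shared Valkey layer lets multiple application instances reuse the same cached data. Local entries are not invalidated automatically when the shared value changes, so keep the local expiry short for data that must stay consistent across instances.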
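
The lookup order in such a two-level setup can be sketched in plain JDK Java. This is an illustration under stated assumptions, not Spring’s or Caffeine’s API: TwoLevelCacheDemo is a hypothetical class, and a plain Map stands in for the Valkey client so the example stays self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Read-through lookup: local L1 first, then shared L2, then the source of truth. */
final class TwoLevelCacheDemo<K, V> {
    private final Map<K, V> l1 = new ConcurrentHashMap<>(); // per-instance, in-process
    private final Map<K, V> l2;                             // stand-in for a shared Valkey/Redis
    private final Function<K, V> source;                    // e.g. a database query

    TwoLevelCacheDemo(Map<K, V> sharedL2, Function<K, V> source) {
        this.l2 = sharedL2;
        this.source = source;
    }

    /** Assumes the source never returns null (ConcurrentHashMap rejects null values). */
    V get(K key) {
        V value = l1.get(key);
        if (value != null) {
            return value; // fastest path: no network, no serialization
        }
        value = l2.get(key);
        if (value == null) {
            value = source.apply(key);
            l2.put(key, value); // publish so other instances can skip the source
        }
        l1.put(key, value);
        return value;
    }

    /** Drops the local copy, for example after a change notification. */
    void invalidateLocal(K key) {
        l1.remove(key);
    }
}
```

Two instances that share the same L2 map load a given key from the source only once; the second instance finds it in L2 and then keeps its own L1 copy.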

Performance Benchmarks and Scaling Considerations

Caffeine’s W-TinyLFU algorithm excels under Zipfian access patterns (where a small set of keys receives the majority of requests) — which is exactly how most real-world caches behave. In synthetic benchmarks, Caffeine achieves hit rates within 1-2% of the theoretical optimum, while LRU-based caches (Guava) trail by 5-15% depending on workload.
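
To see why access skew matters so much for hit rates, the following sketch replays a Zipfian and a uniform key stream through a small LRU cache built on the JDK’s LinkedHashMap. It does not reproduce the Caffeine or Guava figures above; LruHitRateDemo, the key counts, and the Zipf exponent of 1 are assumptions picked for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.function.IntSupplier;

/** Measures the hit rate of a plain LRU cache under different key distributions. */
final class LruHitRateDemo {

    /** Replays `requests` keys through an LRU of `capacity` entries; returns hits / requests. */
    static double hitRate(final int capacity, int requests, IntSupplier keys) {
        // accessOrder = true makes LinkedHashMap reorder entries on get, giving LRU order.
        Map<Integer, Boolean> lru = new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
                return size() > capacity;
            }
        };
        int hits = 0;
        for (int i = 0; i < requests; i++) {
            int key = keys.getAsInt();
            if (lru.get(key) != null) {
                hits++;
            } else {
                lru.put(key, Boolean.TRUE);
            }
        }
        return (double) hits / requests;
    }

    /** Samples keys 0..n-1 where key k has weight 1 / (k + 1), i.e. Zipf with exponent 1. */
    static IntSupplier zipf(int n, Random rnd) {
        double[] cdf = new double[n];
        double sum = 0;
        for (int k = 0; k < n; k++) {
            sum += 1.0 / (k + 1);
            cdf[k] = sum;
        }
        final double total = sum;
        return () -> {
            double u = rnd.nextDouble() * total;
            int lo = 0;
            int hi = n - 1;
            while (lo < hi) { // binary search for the first cdf value >= u
                int mid = (lo + hi) >>> 1;
                if (cdf[mid] < u) {
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }
            return lo;
        };
    }

    /** Samples keys 0..n-1 with equal probability. */
    static IntSupplier uniform(int n, Random rnd) {
        return () -> rnd.nextInt(n);
    }

    public static void main(String[] args) {
        double skewed = hitRate(100, 100_000, zipf(1_000, new Random(42)));
        double flat = hitRate(100, 100_000, uniform(1_000, new Random(42)));
        System.out.printf("zipf: %.3f  uniform: %.3f%n", skewed, flat);
    }
}
```

With a cache that holds a tenth of the key space, the uniform stream can only hit about one request in ten, while the skewed stream hits far more often because a handful of keys dominate the traffic.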

cache2k shines under uniform access patterns where lock contention becomes the bottleneck. Its read-path avoids atomic operations entirely in the hot path, achieving throughput that saturates memory bandwidth rather than CPU. For services that need both L1 and L2 caching, cache2k’s native JCache support simplifies integration with distributed cache providers like Hazelcast or Infinispan.

OHC’s throughput varies significantly with entry size: small entries (<1KB) see ~150M ops/s, while larger entries (>100KB) drop to ~10M ops/s as serialization dominates. Capacity planning is critical — OHC pre-allocates its entire memory budget at startup, so over-provisioning wastes resources while under-provisioning causes evictions.

FAQ

Which caching library should I use for a new Spring Boot project?

Use Caffeine. Spring Boot auto-configures it as a cache provider (it superseded the earlier Guava support), it offers the W-TinyLFU eviction algorithm (best hit rates), and it integrates with Spring’s @Cacheable annotation out of the box. Add spring-boot-starter-cache and caffeine to your dependencies, and you’re ready to go.

Can I use multiple caching libraries in the same application?

Yes, and this is common in practice. A typical pattern uses Caffeine for local L1 caching and a distributed cache (Redis, Hazelcast) for L2. Spring’s CacheManager abstraction supports multiple cache managers, so you can configure each cache with the best library for its specific access pattern.

How does off-heap caching (OHC) affect garbage collection?

Off-heap caching eliminates GC pressure for cached data because entries are stored outside the JVM heap. However, off-heap memory isn’t managed by the JVM, so you need to monitor native memory usage separately. OHC’s capacity settings are hard limits — the cache will never exceed the configured size, which makes capacity planning predictable.

Is Guava Cache being deprecated in favor of Caffeine?

Guava Cache is not deprecated. Ben Manes, Caffeine’s creator, was also a major contributor to Guava Cache, and the recommendation from both projects is to use Caffeine for new development and to migrate from Guava Cache if you need better eviction performance. Guava Cache is still maintained and receives bug fixes.

How do I monitor cache hit rates in production?

All four libraries support hit rate tracking. Caffeine exposes statistics via Cache.stats() and Micrometer integration, and Guava Cache also uses Cache.stats(); in both cases the cache must be built with recordStats(), otherwise the counters stay at zero. cache2k provides JMX beans and Micrometer metrics. OHC tracks hits and misses via its OHCache.stats() method. Export these metrics to Prometheus for monitoring and alerting on declining hit rates.
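
Whatever library supplies the counters, the alerting logic is the same arithmetic. Here is a minimal sketch in plain JDK Java; HitRateCounter and the alert threshold are hypothetical, and reporting 1.0 for an idle cache is a convention chosen here so that a cache with no traffic does not trigger an alert.

```java
import java.util.concurrent.atomic.LongAdder;

/** Thread-safe hit and miss counters with a derived hit rate. */
final class HitRateCounter {
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();

    void recordHit() {
        hits.increment();
    }

    void recordMiss() {
        misses.increment();
    }

    /** hits / (hits + misses), or 1.0 when nothing has been requested yet. */
    double hitRate() {
        long h = hits.sum();
        long total = h + misses.sum();
        return total == 0 ? 1.0 : (double) h / total;
    }

    /** True when the hit rate has dropped below the given minimum. */
    boolean shouldAlert(double minimumHitRate) {
        return hitRate() < minimumHitRate;
    }

    public static void main(String[] args) {
        HitRateCounter counter = new HitRateCounter();
        counter.recordHit();
        counter.recordHit();
        counter.recordHit();
        counter.recordMiss();
        System.out.println("hit rate = " + counter.hitRate()); // 3 hits out of 4 requests
    }
}
```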

