When organizations outgrow their primary storage capacity, the cost of keeping all data on high-performance disks becomes unsustainable. Cold storage — storing infrequently accessed data on cheaper media while maintaining on-demand accessibility — solves this problem. Instead of paying AWS Glacier or Azure Archive Storage for long-term retention, you can build a self-hosted cold storage tier using open-source object stores with built-in lifecycle management.

This guide compares three leading self-hosted solutions: MinIO with its ILM (Information Lifecycle Management) engine, Ceph with tiered storage and lifecycle policies, and SeaweedFS with its hot/cold storage tiering architecture. Each handles data lifecycle differently, supports different storage backends, and targets different scale profiles.

Understanding Cold Storage Architecture

Cold storage refers to storing data that is accessed infrequently on lower-cost media while keeping it retrievable on demand. Unlike backup systems (which create point-in-time copies), cold storage is the primary location for aging data — just on cheaper hardware.

A proper cold storage architecture includes:

  • Lifecycle policies — rules that automatically transition data from hot to cold tiers based on age, access patterns, or metadata
  • Tiered storage — fast NVMe/SSD for recent data, slower HDD for warm data, and high-capacity drives or tape for cold data
  • S3 compatibility — so existing applications can access cold data without modification
  • Data integrity — checksums and erasure coding to protect data over its extended lifetime
  • Cost efficiency — the primary motivation for cold storage is reducing per-TB storage costs

MinIO ILM (Information Lifecycle Management)

MinIO is the most widely deployed self-hosted S3-compatible object store, with over 60,900 GitHub stars. Its ILM feature provides policy-driven data lifecycle management including automatic tier transitions, expiration rules, and non-current version management.

Key Features

  • Transition rules — automatically move objects to a designated cold tier (remote S3 target) after N days
  • Expiration rules — permanently delete objects after a retention period
  • Non-current version management — apply separate lifecycle rules to old versions of versioned objects
  • S3 API native — lifecycle rules use standard S3 PutBucketLifecycleConfiguration API calls
  • Multi-node distributed mode — erasure-coded storage across multiple nodes for durability

MinIO’s ILM works by defining a “remote tier” (another S3-compatible target, which could be a separate MinIO cluster with slower drives) and creating transition rules that move objects between tiers based on age or access patterns.

Docker Compose Deployment

MinIO’s official distributed deployment uses four nodes with erasure coding:

version: '3.7'

x-minio-common: &minio-common
  image: quay.io/minio/minio:latest
  command: server --console-address ":9001" http://minio{1...4}/data{1...2}
  expose:
    - "9000"
    - "9001"
  healthcheck:
    test: ["CMD", "mc", "ready", "local"]
    interval: 5s
    timeout: 5s
    retries: 5
  environment:
    MINIO_ROOT_USER: minioadmin
    MINIO_ROOT_PASSWORD: minioadmin-secret

services:
  minio1:
    <<: *minio-common
    hostname: minio1
    volumes:
      - data1-1:/data1
      - data1-2:/data2
  minio2:
    <<: *minio-common
    hostname: minio2
    volumes:
      - data2-1:/data1
      - data2-2:/data2
  minio3:
    <<: *minio-common
    hostname: minio3
    volumes:
      - data3-1:/data1
      - data3-2:/data2
  minio4:
    <<: *minio-common
    hostname: minio4
    volumes:
      - data4-1:/data1
      - data4-2:/data2

volumes:
  data1-1: {}
  data1-2: {}
  data2-1: {}
  data2-2: {}
  data3-1: {}
  data3-2: {}
  data4-1: {}
  data4-2: {}
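
Bring the cluster up and point an mc alias at it before defining tiers and rules. A minimal sketch, assuming you publish port 9000 from one node or front the nodes with a load balancer, since the compose file above only exposes ports on the internal network:

docker compose up -d

# Register an alias using the root credentials from the compose file
mc alias set minio http://localhost:9000 minioadmin minioadmin-secret

# Confirm all four nodes and eight drives are online
mc admin info minio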

Configure lifecycle rules using the mc CLI:

# Register a cold storage tier (a separate MinIO cluster with HDDs)
mc admin tier add s3 minio COLD \
  --endpoint https://cold-storage.internal:9000 \
  --bucket cold-objects \
  --access-key cold-admin \
  --secret-key cold-secret

# Create a transition rule: move objects to the COLD tier after 30 days
mc ilm rule add minio/mybucket \
  --transition-days 30 \
  --transition-tier COLD

# Set expiration: delete objects after 365 days
mc ilm rule add minio/mybucket \
  --expire-days 365
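
Because MinIO implements lifecycle management through the standard S3 API, the same rules can be applied with any S3 client instead of mc. A sketch using aws-cli; the endpoint is a placeholder, and the StorageClass value is assumed to carry the remote tier name (COLD) registered above:

# Same 30-day transition and 365-day expiry, expressed as standard S3 lifecycle JSON
cat > minio-lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "tier-then-expire",
      "Filter": { "Prefix": "" },
      "Status": "Enabled",
      "Transitions": [ { "Days": 30, "StorageClass": "COLD" } ],
      "Expiration": { "Days": 365 }
    }
  ]
}
EOF

aws --endpoint-url https://minio.internal:9000 \
  s3api put-bucket-lifecycle-configuration \
  --bucket mybucket \
  --lifecycle-configuration file://minio-lifecycle.json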

Ceph Object Gateway (RGW) Lifecycle Management

Ceph is a distributed storage platform providing object, block, and file storage from a single cluster. With 16,500+ GitHub stars, it’s the most mature open-source distributed storage system. Ceph’s Object Gateway (RGW) supports S3 lifecycle policies natively.

Key Features

  • Unified storage — object, block (RBD), and file (CephFS) from one cluster
  • Bucket lifecycle policies — S3-compatible lifecycle rules for object expiration and transitions
  • Storage classes — native support for different storage classes within the same cluster
  • Erasure coding — built-in data protection without requiring separate replica pools
  • Multi-site replication — active-active replication across geographic locations
  • CRUSH algorithm — intelligent data placement across heterogeneous storage devices

Ceph’s approach to tiering differs from MinIO’s. Instead of transitioning to a separate cluster, Ceph can use storage classes within the same cluster — placing cold data on cheaper drives using CRUSH rules that target specific OSD types.

Ceph Lifecycle Configuration

<!-- S3-compatible bucket lifecycle XML configuration -->
<LifecycleConfiguration>
  <Rule>
    <ID>transition-to-cold</ID>
    <Filter>
      <Prefix>logs/</Prefix>
    </Filter>
    <Status>Enabled</Status>
    <Transition>
      <Days>90</Days>
      <StorageClass>COLD</StorageClass>
    </Transition>
    <Expiration>
      <Days>730</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>

Apply the policy using aws-cli or s3cmd:

# Apply the lifecycle policy to a bucket
# (aws-cli expects the JSON form of the policy; s3cmd can apply the XML above as-is)
aws --endpoint-url https://rgw.ceph.internal:443 \
  s3api put-bucket-lifecycle-configuration \
  --bucket my-data \
  --lifecycle-configuration file://lifecycle.json

# Verify the policy
aws --endpoint-url https://rgw.ceph.internal:443 \
  s3api get-bucket-lifecycle-configuration \
  --bucket my-data
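
If you prefer to keep the policy in its XML form, s3cmd can push it directly. A sketch; endpoint settings normally live in ~/.s3cfg, and the host flags below are only needed when pointing at a non-default endpoint:

# Apply the XML lifecycle document and read it back
s3cmd --host=rgw.ceph.internal:443 --host-bucket=rgw.ceph.internal:443 \
  setlifecycle lifecycle.xml s3://my-data
s3cmd --host=rgw.ceph.internal:443 --host-bucket=rgw.ceph.internal:443 \
  getlifecycle s3://my-data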

Ceph Storage Class Configuration

# Create a CRUSH rule that targets HDD OSDs, then an HDD-backed pool for cold data
ceph osd crush rule create-replicated cold-hdd default host hdd
ceph osd pool create cold-storage 64 64
ceph osd pool set cold-storage crush_rule cold-hdd

# Register a COLD storage class with RGW and back it with the HDD pool
radosgw-admin zonegroup placement add \
  --rgw-zonegroup default \
  --placement-id default-placement \
  --storage-class COLD
radosgw-admin zone placement add \
  --rgw-zone default \
  --placement-id default-placement \
  --storage-class COLD \
  --data-pool cold-storage
# Restart the RGW daemons (or commit the period in a multisite setup) to apply

# Optional, for testing: run lifecycle processing every 30 seconds instead of on the daily schedule
ceph config set client.rgw rgw_lc_debug_interval 30
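
Once the COLD class exists, clients can write to it directly instead of waiting for a lifecycle transition, and you can watch lifecycle processing from the RGW node. A sketch; the bucket, key, and file names are placeholders:

# Upload an object straight into the COLD storage class (HDD-backed pool)
s3cmd --storage-class=COLD put 2024-01.tar.gz s3://my-data/logs/2024-01.tar.gz

# Inspect and (for testing) force lifecycle processing
radosgw-admin lc list
radosgw-admin lc process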

SeaweedFS Tiered Storage

SeaweedFS (32,100+ GitHub stars) is a distributed storage system designed for billions of files with O(1) disk seek time. Its unique architecture separates volume servers (data storage) from filer servers (metadata), enabling efficient hot/cold tiering through volume movement.

Key Features

  • O(1) disk access — unique volume-based architecture avoids directory tree traversal
  • Automatic tiering — volume server can replicate data across hot and cold nodes
  • S3 + Filer + Hadoop compatible — multiple access protocols from one cluster
  • Filer-to-S3 bridge — automatically replicate filer data to S3-compatible cold storage
  • Iceberg table support — native integration with Apache Iceberg for data lake use cases
  • Small file optimization — stores many small files efficiently (unlike HDFS)

SeaweedFS achieves tiering through its volume replication and filer remote storage features. Volumes can be replicated across nodes with different storage characteristics, and the filer can transparently offload old data to S3-compatible backends.

SeaweedFS Docker Compose

version: '3.8'

services:
  weed-master:
    image: chrislusf/seaweedfs:latest
    command: master -ip=weed-master -ip.bind=0.0.0.0 -defaultReplication=000
    ports:
      - "9333:9333"
      - "19333:19333"
    volumes:
      - master-data:/data

  weed-volume-hot:
    image: chrislusf/seaweedfs:latest
    command: volume -mserver=weed-master:9333 -ip=weed-volume-hot -port=8080 -dir=/data -max=100
    ports:
      - "8080:8080"
      - "18080:18080"
    volumes:
      - hot-data:/data  # SSD-backed
    deploy:
      resources:
        limits:
          memory: 4G

  weed-volume-cold:
    image: chrislusf/seaweedfs:latest
    command: volume -mserver=weed-master:9333 -ip=weed-volume-cold -port=8080 -dir=/data -max=200
    ports:
      - "8081:8080"
      - "18081:18080"
    volumes:
      - cold-data:/data  # HDD-backed
    deploy:
      resources:
        limits:
          memory: 2G

  weed-filer:
    image: chrislusf/seaweedfs:latest
    command: filer -master=weed-master:9333 -s3
    ports:
      - "8888:8888"
      - "8333:8333"
    volumes:
      - filer-data:/data
    depends_on:
      - weed-master

volumes:
  master-data: {}
  hot-data: {}
  cold-data: {}
  filer-data: {}
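
With the stack up, the filer's embedded S3 gateway (enabled by the -s3 flag) listens on port 8333. A quick smoke test, assuming no S3 identities have been configured yet so anonymous requests are accepted:

# Create a bucket and store an object through the SeaweedFS S3 gateway
aws --endpoint-url http://localhost:8333 --no-sign-request s3 mb s3://archive
aws --endpoint-url http://localhost:8333 --no-sign-request s3 cp ./report.pdf s3://archive/2024/report.pdf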

Configure the S3 offload target in the master configuration, then trigger volume tiering from weed shell:

# master.toml - register an S3-compatible backend for cold data offload
[storage.backend]
  [storage.backend.s3.default]
    enabled = true
    endpoint = "http://cold-storage:9000"
    region = "us-east-1"
    bucket = "seaweedfs-cold"
    aws_access_key_id = "admin"
    aws_secret_access_key = "password"

# From `weed shell`, move sealed volumes that have been quiet for a day
# onto the S3 backend (the collection name is an example):
#   volume.tier.upload -dest=s3 -collection=archive -fullPercent=95 -quietFor=24h
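
The filer-side route offloads per-path data through remote storage mounts rather than moving whole volumes. A sketch using weed shell; the target name, endpoint, and credentials are the same placeholders as above, and exact flags vary between SeaweedFS releases:

# weed shell: register the cold S3 target, mount it under a filer path,
# then drop the local copies so only metadata stays on the hot tier
remote.configure -name=cold -type=s3 -s3.endpoint=http://cold-storage:9000 -s3.region=us-east-1 -s3.access_key=admin -s3.secret_key=password
remote.mount -dir=/buckets/archive -remote=cold/seaweedfs-cold
remote.uncache -dir=/buckets/archive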

Comparison: Cold Storage Feature Matrix

| Feature | MinIO ILM | Ceph RGW Lifecycle | SeaweedFS Tiering |
|---|---|---|---|
| S3 API Compatibility | Full (native) | Full (via RGW) | Full (via S3 gateway) |
| Lifecycle Transitions | To remote S3 tier | Storage class within cluster | Volume movement + remote |
| Erasure Coding | Yes (built-in) | Yes (CRUSH-based) | Yes (EC volume mode) |
| Multi-Site Replication | Yes (site replication) | Yes (active-active) | Via volume replication |
| Small File Handling | Good | Moderate | Excellent (O(1) seek) |
| Deployment Complexity | Low | High | Moderate |
| Storage Classes | Via remote tiers | Native (via CRUSH) | Via volume placement |
| Object Expiration | Yes (ILM rules) | Yes (lifecycle XML) | Via filer TTL |
| Non-current Version Mgmt | Yes | Partial | Via filer versioning |
| GitHub Stars | 60,900+ | 16,500+ | 32,100+ |
| License | AGPLv3 | LGPL / GPL | Apache 2.0 |
| Best For | S3-native workloads | Unified storage needs | Billions of small files |

Choosing the Right Cold Storage Solution

Choose MinIO ILM when:

  • You need full S3 API compatibility with standard lifecycle rules
  • Your team already uses mc CLI and S3 SDKs
  • You want simple deployment with clear hot/cold cluster separation
  • You’re transitioning data to an external cold storage target (another MinIO cluster, Wasabi, Backblaze B2)

Choose Ceph when:

  • You need unified object, block, and file storage from one platform
  • You already run Ceph for other storage workloads (RBD, CephFS)
  • You need multi-site active-active replication with conflict resolution
  • You want storage-class-based tiering within a single cluster using CRUSH rules

Choose SeaweedFS when:

  • You store billions of small files (images, documents, logs)
  • You need O(1) disk access time for large directory structures
  • You want Apache Iceberg table integration for analytics workloads
  • You need lightweight deployment with minimal operational overhead

Why Self-Host Your Cold Storage?

Cloud archive storage pricing appears attractive at first glance, but costs accumulate quickly as data volumes grow. AWS Glacier charges $0.004/GB/month for storage, but retrieval costs range from $0.01 to $0.06 per GB depending on speed, and API requests add $0.05 per 1,000 operations. At 100 TB of archive data, you are looking at $400/month in storage alone, plus retrieval costs that can easily exceed $1,000 per restore operation.

Self-hosted cold storage changes the economics entirely. With commodity 18 TB hard drives at approximately $300 each, your cost per TB drops to roughly $17 for raw storage. Even accounting for power, cooling, and hardware depreciation, self-hosted archive storage typically costs 70-85% less than cloud equivalents at scale.

Data sovereignty requirements make self-hosted cold storage mandatory for many organizations. Healthcare data, financial records, and government archives often cannot leave your physical infrastructure regardless of cost. Building your own archive tier gives you complete control over data residency, encryption keys, and access policies.

For related reading on self-hosted storage, see our S3 object storage comparison and distributed filesystems guide. If you need backup verification strategies, our backup testing guide covers integrity checking for archived data.

FAQ

What is the difference between cold storage and backup storage?

Cold storage is the primary location for infrequently accessed data, kept online and accessible via standard APIs (S3, NFS, etc.). Backup storage creates point-in-time copies for disaster recovery purposes. Cold storage data is live and mutable; backup data is immutable and restorable. You need both: backups protect against accidental deletion, while cold storage reduces costs for aging data.

How does MinIO ILM transition data to cold storage?

MinIO ILM uses "remote tiers": separate S3-compatible endpoints that serve as cold storage targets. When a lifecycle rule triggers (based on object age), MinIO moves the object's data to the remote tier while keeping its metadata on the hot tier. Subsequent reads are served transparently, with MinIO fetching the data from the cold tier on demand.

Can Ceph store cold data on slower drives within the same cluster?

Yes. Ceph’s CRUSH algorithm allows you to create different storage pools backed by different drive types. You can create a “cold” pool using HDD OSDs and a “hot” pool using SSDs. Lifecycle policies transition objects between these pools, and CRUSH rules ensure data lands on the appropriate drive type. This is more efficient than MinIO’s approach since it does not require a separate cluster.

Does SeaweedFS support automatic data lifecycle management?

SeaweedFS handles lifecycle through its filer component. You can configure TTL (time-to-live) rules that automatically expire or move files after a specified duration. The filer can replicate data to remote S3-compatible storage for cold tiering. Volume servers also support replication factors that can be adjusted per volume to place data on nodes with appropriate storage characteristics.
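
For example, a path-scoped TTL and disk-type hint can be set from weed shell. This is a sketch; the prefix is illustrative and flag availability depends on your SeaweedFS version:

# weed shell: expire everything under /buckets/logs after 30 days and
# prefer HDD-tagged volumes for new writes under that prefix
fs.configure -locationPrefix=/buckets/logs -ttl=30d -diskType=hdd -apply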

What happens if a cold storage node fails?

All three solutions support data redundancy. MinIO uses erasure coding (configurable parity) that can survive multiple disk failures. Ceph's CRUSH algorithm with erasure coding or replication ensures data availability even with multiple OSD failures. SeaweedFS supports volume replication across nodes. The key is ensuring your cold tier has the same redundancy guarantees as your hot tier: data you rarely read still needs protection for the years it sits on disk.

How do I monitor cold storage health and capacity?

MinIO provides Prometheus metrics and a web console with storage utilization dashboards. Ceph has built-in health monitoring via ceph status, the Ceph Dashboard, and Prometheus exporters. SeaweedFS exposes metrics on its master and volume server HTTP endpoints. For all three, integrate with Grafana for unified monitoring across hot and cold tiers.
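
A few starting points, assuming the default ports and hostnames used earlier in this guide:

# MinIO: emit a Prometheus scrape config for the cluster metrics endpoint
mc admin prometheus generate minio

# Ceph: cluster health plus raw and per-pool capacity
ceph status
ceph df

# SeaweedFS: topology and volume counts from the master API
curl "http://weed-master:9333/dir/status?pretty=y"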