When organizations outgrow their primary storage capacity, the cost of keeping all data on high-performance disks becomes unsustainable. Cold storage — storing infrequently accessed data on cheaper media while maintaining on-demand accessibility — solves this problem. Instead of paying AWS Glacier or Azure Archive Storage for long-term retention, you can build a self-hosted cold storage tier using open-source object stores with built-in lifecycle management.

This guide compares three leading self-hosted solutions: MinIO with its ILM (Information Lifecycle Management) engine, Ceph with tiered storage and lifecycle policies, and SeaweedFS with its hot/cold storage tiering architecture. Each handles data lifecycle differently, supports different storage backends, and targets different scale profiles.

Understanding Cold Storage Architecture

Cold storage refers to storing data that is accessed infrequently on lower-cost media while keeping it retrievable on demand. Unlike backup systems (which create point-in-time copies), cold storage is the primary location for aging data — just on cheaper hardware.

A proper cold storage architecture includes:

  • Lifecycle policies — rules that automatically transition data from hot to cold tiers based on age, access patterns, or metadata
  • Tiered storage — fast NVMe/SSD for recent data, slower HDD for warm data, and high-capacity drives or tape for cold data
  • S3 compatibility — so existing applications can access cold data without modification
  • Data integrity — checksums and erasure coding to protect data over its extended lifetime
  • Cost efficiency — the primary motivation for cold storage is reducing per-TB storage costs

MinIO ILM (Information Lifecycle Management)

MinIO is the most widely deployed self-hosted S3-compatible object store, with over 60,900 GitHub stars. Its ILM feature provides policy-driven data lifecycle management including automatic tier transitions, expiration rules, and non-current version management.

Key Features

  • Transition rules — automatically move objects to a designated cold tier (remote S3 target) after N days
  • Expiration rules — permanently delete objects after a retention period
  • Non-current version management — apply separate lifecycle rules to old versions of versioned objects
  • S3 API native — lifecycle rules use standard S3 PutBucketLifecycleConfiguration API calls
  • Multi-node distributed mode — erasure-coded storage across multiple nodes for durability

MinIO’s ILM works by defining a “remote tier” (another S3-compatible target, which could be a separate MinIO cluster with slower drives) and creating transition rules that move objects between tiers based on age or access patterns.

Docker Compose Deployment

MinIO’s official distributed deployment uses four nodes with erasure coding:

version: '3.7'

x-minio-common: &minio-common
  image: quay.io/minio/minio:latest
  command: server --console-address ":9001" http://minio{1...4}/data{1...2}
  expose:
    - "9000"
    - "9001"
  healthcheck:
    test: ["CMD", "mc", "ready", "local"]
    interval: 5s
    timeout: 5s
    retries: 5
  environment:
    MINIO_ROOT_USER: minioadmin
    MINIO_ROOT_PASSWORD: minioadmin-secret

services:
  minio1:
    <<: *minio-common
    hostname: minio1
    volumes:
      - data1-1:/data1
      - data1-2:/data2
  minio2:
    <<: *minio-common
    hostname: minio2
    volumes:
      - data2-1:/data1
      - data2-2:/data2
  minio3:
    <<: *minio-common
    hostname: minio3
    volumes:
      - data3-1:/data1
      - data3-2:/data2
  minio4:
    <<: *minio-common
    hostname: minio4
    volumes:
      - data4-1:/data1
      - data4-2:/data2

volumes:
  data1-1: {}
  data1-2: {}
  data2-1: {}
  data2-2: {}
  data3-1: {}
  data3-2: {}
  data4-1: {}
  data4-2: {}
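
Bring the cluster up and point an mc alias at it before defining tiers and rules. A minimal sketch, assuming you publish port 9000 from one node or front the nodes with a load balancer, since the compose file above only exposes ports on the internal network:

docker compose up -d

# Register an alias using the root credentials from the compose file
mc alias set minio http://localhost:9000 minioadmin minioadmin-secret

# Confirm all four nodes and eight drives are online
mc admin info minio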

Configure lifecycle rules using the mc CLI:

# Register a cold storage tier (a separate MinIO cluster with HDDs)
mc admin tier add s3 minio COLD \
  --endpoint https://cold-storage.internal:9000 \
  --bucket cold-objects \
  --access-key cold-admin \
  --secret-key cold-secret

# Create a transition rule: move objects to the COLD tier after 30 days
mc ilm rule add minio/mybucket \
  --transition-days 30 \
  --transition-tier COLD

# Set expiration: delete objects after 365 days
mc ilm rule add minio/mybucket \
  --expire-days 365
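
Because MinIO implements lifecycle management through the standard S3 API, the same rules can be applied with any S3 client instead of mc. A sketch using aws-cli; the endpoint is a placeholder, and the StorageClass value is assumed to carry the remote tier name (COLD) registered above:

# Same 30-day transition and 365-day expiry, expressed as standard S3 lifecycle JSON
cat > minio-lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "tier-then-expire",
      "Filter": { "Prefix": "" },
      "Status": "Enabled",
      "Transitions": [ { "Days": 30, "StorageClass": "COLD" } ],
      "Expiration": { "Days": 365 }
    }
  ]
}
EOF

aws --endpoint-url https://minio.internal:9000 \
  s3api put-bucket-lifecycle-configuration \
  --bucket mybucket \
  --lifecycle-configuration file://minio-lifecycle.json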

Ceph Object Gateway (RGW) Lifecycle Management

Ceph is a distributed storage platform providing object, block, and file storage from a single cluster. With 16,500+ GitHub stars, it’s the most mature open-source distributed storage system. Ceph’s Object Gateway (RGW) supports S3 lifecycle policies natively.

Key Features

  • Unified storage — object, block (RBD), and file (CephFS) from one cluster
  • Bucket lifecycle policies — S3-compatible lifecycle rules for object expiration and transitions
  • Storage classes — native support for different storage classes within the same cluster
  • Erasure coding — built-in data protection without requiring separate replica pools
  • Multi-site replication — active-active replication across geographic locations
  • CRUSH algorithm — intelligent data placement across heterogeneous storage devices

Ceph’s approach to tiering differs from MinIO’s. Instead of transitioning to a separate cluster, Ceph can use storage classes within the same cluster — placing cold data on cheaper drives using CRUSH rules that target specific OSD types.

Ceph Lifecycle Configuration

<!-- S3-compatible bucket lifecycle XML configuration -->
<LifecycleConfiguration>
  <Rule>
    <ID>transition-to-cold</ID>
    <Filter>
      <Prefix>logs/</Prefix>
    </Filter>
    <Status>Enabled</Status>
    <Transition>
      <Days>90</Days>
      <StorageClass>COLD</StorageClass>
    </Transition>
    <Expiration>
      <Days>730</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>

Apply the policy using aws-cli or s3cmd:

# Apply the lifecycle policy to a bucket
# (aws-cli expects the JSON form of the policy; s3cmd can apply the XML above as-is)
aws --endpoint-url https://rgw.ceph.internal:443 \
  s3api put-bucket-lifecycle-configuration \
  --bucket my-data \
  --lifecycle-configuration file://lifecycle.json

# Verify the policy
aws --endpoint-url https://rgw.ceph.internal:443 \
  s3api get-bucket-lifecycle-configuration \
  --bucket my-data
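
If you prefer to keep the policy in its XML form, s3cmd can push it directly. A sketch; endpoint settings normally live in ~/.s3cfg, and the host flags below are only needed when pointing at a non-default endpoint:

# Apply the XML lifecycle document and read it back
s3cmd --host=rgw.ceph.internal:443 --host-bucket=rgw.ceph.internal:443 \
  setlifecycle lifecycle.xml s3://my-data
s3cmd --host=rgw.ceph.internal:443 --host-bucket=rgw.ceph.internal:443 \
  getlifecycle s3://my-data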

Ceph Storage Class Configuration

# Create a CRUSH rule that targets HDD OSDs, then an HDD-backed pool for cold data
ceph osd crush rule create-replicated cold-hdd default host hdd
ceph osd pool create cold-storage 64 64
ceph osd pool set cold-storage crush_rule cold-hdd

# Register a COLD storage class with RGW and back it with the HDD pool
radosgw-admin zonegroup placement add \
  --rgw-zonegroup default \
  --placement-id default-placement \
  --storage-class COLD
radosgw-admin zone placement add \
  --rgw-zone default \
  --placement-id default-placement \
  --storage-class COLD \
  --data-pool cold-storage
# Restart the RGW daemons (or commit the period in a multisite setup) to apply

# Optional, for testing: run lifecycle processing every 30 seconds instead of on the daily schedule
ceph config set client.rgw rgw_lc_debug_interval 30
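
Once the COLD class exists, clients can write to it directly instead of waiting for a lifecycle transition, and you can watch lifecycle processing from the RGW node. A sketch; the bucket, key, and file names are placeholders:

# Upload an object straight into the COLD storage class (HDD-backed pool)
s3cmd --storage-class=COLD put 2024-01.tar.gz s3://my-data/logs/2024-01.tar.gz

# Inspect and (for testing) force lifecycle processing
radosgw-admin lc list
radosgw-admin lc process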

SeaweedFS Tiered Storage

SeaweedFS (32,100+ GitHub stars) is a distributed storage system designed for billions of files with O(1) disk seek time. Its unique architecture separates volume servers (data storage) from filer servers (metadata), enabling efficient hot/cold tiering through volume movement.

Key Features

  • O(1) disk access — unique volume-based architecture avoids directory tree traversal
  • Automatic tiering — volume server can replicate data across hot and cold nodes
  • S3 + Filer + Hadoop compatible — multiple access protocols from one cluster
  • Filer-to-S3 bridge — automatically replicate filer data to S3-compatible cold storage
  • Iceberg table support — native integration with Apache Iceberg for data lake use cases
  • Small file optimization — stores many small files efficiently (unlike HDFS)

SeaweedFS achieves tiering through its volume replication and filer remote storage features. Volumes can be replicated across nodes with different storage characteristics, and the filer can transparently offload old data to S3-compatible backends.

SeaweedFS Docker Compose

version: '3.8'

services:
  weed-master:
    image: chrislusf/seaweedfs:latest
    command: master -ip=weed-master -ip.bind=0.0.0.0 -defaultReplication=000
    ports:
      - "9333:9333"
      - "19333:19333"
    volumes:
      - master-data:/data

  weed-volume-hot:
    image: chrislusf/seaweedfs:latest
    command: volume -mserver=weed-master:9333 -ip=weed-volume-hot -port=8080 -dir=/data -max=100
    ports:
      - "8080:8080"
      - "18080:18080"
    volumes:
      - hot-data:/data  # SSD-backed
    deploy:
      resources:
        limits:
          memory: 4G

  weed-volume-cold:
    image: chrislusf/seaweedfs:latest
    command: volume -mserver=weed-master:9333 -ip=weed-volume-cold -port=8080 -dir=/data -max=200
    ports:
      - "8081:8080"
      - "18081:18080"
    volumes:
      - cold-data:/data  # HDD-backed
    deploy:
      resources:
        limits:
          memory: 2G

  weed-filer:
    image: chrislusf/seaweedfs:latest
    command: filer -master=weed-master:9333 -s3
    ports:
      - "8888:8888"
      - "8333:8333"
    volumes:
      - filer-data:/data
    depends_on:
      - weed-master

volumes:
  master-data: {}
  hot-data: {}
  cold-data: {}
  filer-data: {}
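
With the stack up, the filer's embedded S3 gateway (enabled by the -s3 flag) listens on port 8333. A quick smoke test, assuming no S3 identities have been configured yet so anonymous requests are accepted:

# Create a bucket and store an object through the SeaweedFS S3 gateway
aws --endpoint-url http://localhost:8333 --no-sign-request s3 mb s3://archive
aws --endpoint-url http://localhost:8333 --no-sign-request s3 cp ./report.pdf s3://archive/2024/report.pdf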

Configure the S3 offload target in the master configuration, then trigger volume tiering from weed shell:

# master.toml - register an S3-compatible backend for cold data offload
[storage.backend]
  [storage.backend.s3.default]
    enabled = true
    endpoint = "http://cold-storage:9000"
    region = "us-east-1"
    bucket = "seaweedfs-cold"
    aws_access_key_id = "admin"
    aws_secret_access_key = "password"

# From `weed shell`, move sealed volumes that have been quiet for a day
# onto the S3 backend (the collection name is an example):
#   volume.tier.upload -dest=s3 -collection=archive -fullPercent=95 -quietFor=24h
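
The filer-side route offloads per-path data through remote storage mounts rather than moving whole volumes. A sketch using weed shell; the target name, endpoint, and credentials are the same placeholders as above, and exact flags vary between SeaweedFS releases:

# weed shell: register the cold S3 target, mount it under a filer path,
# then drop the local copies so only metadata stays on the hot tier
remote.configure -name=cold -type=s3 -s3.endpoint=http://cold-storage:9000 -s3.region=us-east-1 -s3.access_key=admin -s3.secret_key=password
remote.mount -dir=/buckets/archive -remote=cold/seaweedfs-cold
remote.uncache -dir=/buckets/archive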

Comparison: Cold Storage Feature Matrix

| Feature | MinIO ILM | Ceph RGW Lifecycle | SeaweedFS Tiering |
|---|---|---|---|
| S3 API Compatibility | Full (native) | Full (via RGW) | Full (via S3 gateway) |
| Lifecycle Transitions | To remote S3 tier | Storage class within cluster | Volume movement + remote |
| Erasure Coding | Yes (built-in) | Yes (CRUSH-based) | Yes (EC volume mode) |
| Multi-Site Replication | Yes (site replication) | Yes (active-active) | Via volume replication |
| Small File Handling | Good | Moderate | Excellent (O(1) seek) |
| Deployment Complexity | Low | High | Moderate |
| Storage Classes | Via remote tiers | Native (via CRUSH) | Via volume placement |
| Object Expiration | Yes (ILM rules) | Yes (lifecycle XML) | Via filer TTL |
| Non-current Version Mgmt | Yes | Partial | Via filer versioning |
| GitHub Stars | 60,900+ | 16,500+ | 32,100+ |
| License | AGPLv3 | LGPL / GPL | Apache 2.0 |
| Best For | S3-native workloads | Unified storage needs | Billions of small files |

Choosing the Right Cold Storage Solution

Choose MinIO ILM when:

  • You need full S3 API compatibility with standard lifecycle rules
  • Your team already uses mc CLI and S3 SDKs
  • You want simple deployment with clear hot/cold cluster separation
  • You’re transitioning data to an external cold storage target (another MinIO cluster, Wasabi, Backblaze B2)

Choose Ceph when:

  • You need unified object, block, and file storage from one platform
  • You already run Ceph for other storage workloads (RBD, CephFS)
  • You need multi-site active-active replication with conflict resolution
  • You want storage-class-based tiering within a single cluster using CRUSH rules

Choose SeaweedFS when:

  • You store billions of small files (images, documents, logs)
  • You need O(1) disk access time for large directory structures
  • You want Apache Iceberg table integration for analytics workloads
  • You need lightweight deployment with minimal operational overhead

Why Self-Host Your Cold Storage?

Cloud archive storage pricing appears attractive at first glance, but costs accumulate quickly as data volumes grow. AWS Glacier charges $0.004/GB/month for storage, but retrieval costs range from $0.01 to $0.06 per GB depending on speed, and API requests add $0.05 per 1,000 operations. At 100 TB of archive data, you are looking at $400/month in storage alone, plus retrieval costs that can easily exceed $1,000 per restore operation.

Self-hosted cold storage changes the economics entirely. With commodity 18 TB hard drives at approximately $300 each, your cost per TB drops to roughly $17 for raw storage. Even accounting for power, cooling, and hardware depreciation, self-hosted archive storage typically costs 70-85% less than cloud equivalents at scale.

Data sovereignty requirements make self-hosted cold storage mandatory for many organizations. Healthcare data, financial records, and government archives often cannot leave your physical infrastructure regardless of cost. Building your own archive tier gives you complete control over data residency, encryption keys, and access policies.

For related reading on self-hosted storage, see our S3 object storage comparison and distributed filesystems guide. If you need backup verification strategies, our backup testing guide covers integrity checking for archived data.

FAQ

What is the difference between cold storage and backup storage?

Cold storage is the primary location for infrequently accessed data, kept online and accessible via standard APIs (S3, NFS, etc.). Backup storage creates point-in-time copies for disaster recovery purposes. Cold storage data is live and mutable; backup data is immutable and restorable. You need both: backups protect against accidental deletion, while cold storage reduces costs for aging data.

How does MinIO ILM transition data to cold storage?

MinIO ILM uses "remote tiers": separate S3-compatible endpoints that serve as cold storage targets. When a lifecycle rule triggers (based on object age), MinIO moves the object's data to the remote tier while keeping its metadata on the hot tier. Subsequent reads are served transparently, with MinIO fetching the data from the cold tier on demand.

Can Ceph store cold data on slower drives within the same cluster?

Yes. Ceph’s CRUSH algorithm allows you to create different storage pools backed by different drive types. You can create a “cold” pool using HDD OSDs and a “hot” pool using SSDs. Lifecycle policies transition objects between these pools, and CRUSH rules ensure data lands on the appropriate drive type. This is more efficient than MinIO’s approach since it does not require a separate cluster.

Does SeaweedFS support automatic data lifecycle management?

SeaweedFS handles lifecycle through its filer component. You can configure TTL (time-to-live) rules that automatically expire or move files after a specified duration. The filer can replicate data to remote S3-compatible storage for cold tiering. Volume servers also support replication factors that can be adjusted per volume to place data on nodes with appropriate storage characteristics.
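
For example, a path-scoped TTL and disk-type hint can be set from weed shell. This is a sketch; the prefix is illustrative and flag availability depends on your SeaweedFS version:

# weed shell: expire everything under /buckets/logs after 30 days and
# prefer HDD-tagged volumes for new writes under that prefix
fs.configure -locationPrefix=/buckets/logs -ttl=30d -diskType=hdd -apply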

What happens if a cold storage node fails?

All three solutions support data redundancy. MinIO uses erasure coding (configurable parity) that can survive multiple disk failures. Ceph's CRUSH algorithm with erasure coding or replication ensures data availability even with multiple OSD failures. SeaweedFS supports volume replication across nodes. The key is ensuring your cold tier has the same redundancy guarantees as your hot tier: data you rarely read still needs protection for the years it sits on disk.

How do I monitor cold storage health and capacity?

MinIO provides Prometheus metrics and a web console with storage utilization dashboards. Ceph has built-in health monitoring via ceph status, the Ceph Dashboard, and Prometheus exporters. SeaweedFS exposes metrics on its master and volume server HTTP endpoints. For all three, integrate with Grafana for unified monitoring across hot and cold tiers.
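
A few starting points, assuming the default ports and hostnames used earlier in this guide:

# MinIO: emit a Prometheus scrape config for the cluster metrics endpoint
mc admin prometheus generate minio

# Ceph: cluster health plus raw and per-pool capacity
ceph status
ceph df

# SeaweedFS: topology and volume counts from the master API
curl "http://weed-master:9333/dir/status?pretty=y"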