Container checkpoint and restore is the ability to save the complete runtime state of a running container — memory, file descriptors, network connections, process state — to disk, then restore it later on the same or a different host. This capability enables live migration, zero-downtime updates, and stateful container recovery that traditional container restarts cannot achieve.

The technology behind this is CRIU (Checkpoint/Restore in Userspace), a Linux userspace tool that relies on kernel interfaces rather than being a kernel feature itself. It has matured significantly and is now integrated with most major container runtimes.

What Is Container Checkpoint & Restore?

When you checkpoint a container, the system captures:

  • Process memory — all RAM pages for running processes
  • Open file descriptors — sockets, pipes, file handles
  • Network state — TCP connections, listening ports
  • Process tree — parent-child relationships, PID mapping
  • Mount namespaces — filesystem mount points and state

When restored, the container resumes exactly where it left off — including established TCP connections, in-memory caches, and running computations. This is fundamentally different from docker stop && docker start, which terminates the process and loses all runtime state.
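
To get a feel for how much runtime state a single process carries, the sketch below reads a few of the values listed above straight from /proc. It is only an illustration under stated assumptions: show_runtime_state is a hypothetical helper name, it runs on Linux only, and it does not reproduce what CRIU actually serializes.

```shell
# Summarize the runtime state of one process as exposed under /proc (Linux only).
# show_runtime_state is a hypothetical helper name, not a CRIU command.
show_runtime_state() {
    local pid="$1"
    local proc="/proc/${pid}"

    if [ ! -d "$proc" ]; then
        echo "no such process: ${pid}" >&2
        return 1
    fi

    # Resident memory in KiB (kernel threads have no VmRSS line, so default to 0).
    local rss_kib
    rss_kib=$(awk '/^VmRSS:/ {print $2}' "${proc}/status")

    # Each entry in fd/ is one open descriptor: file, pipe, or socket.
    local fds
    fds=$(ls "${proc}/fd" 2>/dev/null | wc -l)

    # Each entry in task/ is one thread of the process.
    local threads
    threads=$(ls "${proc}/task" 2>/dev/null | wc -l)

    echo "pid=${pid}"
    echo "rss_kib=${rss_kib:-0}"
    echo "open_fds=$((fds))"
    echo "threads=$((threads))"
}
```

Calling show_runtime_state with the PID of a container's main process (run as root for processes owned by other users) gives a rough idea of what a checkpoint has to capture for that one process.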

CRIU — The Foundation

CRIU is the core Linux technology that enables checkpoint/restore in userspace. With 3,800+ GitHub stars, it is the foundation for all container-level checkpoint/restore implementations.

Key Features:

  • Checkpoint and restore of entire process trees
  • Network connection migration (TCP state preservation)
  • File descriptor migration (open files, pipes, sockets)
  • Namespace support (PID, mount, network, IPC)
  • Lazy pages support (page in memory on demand during restore)
  • User namespace support for rootless containers

Installation:

# Ubuntu/Debian
sudo apt install criu

# RHEL/CentOS/Fedora
sudo dnf install criu

# Alpine
sudo apk add criu

Basic Usage:

# Checkpoint a process (by PID)
sudo criu dump -t <PID> -D /tmp/checkpoint --tcp-established --shell-job

# Restore the process
sudo criu restore -D /tmp/checkpoint --tcp-established --shell-job

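# Live migration to another host (the target needs the same files on disk and
# the same IP address for established TCP connections to survive)
sudo criu dump -t <PID> -D /tmp/checkpoint --tcp-established
scp -r /tmp/checkpoint remote:/tmp/
ssh remote "sudo criu restore -D /tmp/checkpoint --tcp-established"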
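
Because a restore depends on every image file arriving intact, it may be worth verifying the checkpoint directory after the copy. The following is a minimal sketch, not part of CRIU: write_manifest and verify_manifest are hypothetical helper names, MANIFEST.sha256 is an arbitrary file name, and the code assumes GNU find, sort, xargs, and sha256sum.

```shell
# Write a SHA-256 manifest covering every file in a checkpoint directory.
# write_manifest / verify_manifest are hypothetical helper names.
write_manifest() {
    local dir="$1"
    # Paths are recorded relative to the directory so the manifest
    # still verifies after the directory is copied to another host.
    ( cd "$dir" && find . -type f ! -name MANIFEST.sha256 -print0 \
        | sort -z | xargs -0 -r sha256sum > MANIFEST.sha256 )
}

# Re-hash every listed file and fail if any checksum differs.
verify_manifest() {
    local dir="$1"
    ( cd "$dir" && sha256sum --check --quiet MANIFEST.sha256 )
}
```

One way to use this is to run write_manifest /tmp/checkpoint on the source host after the dump, and verify_manifest /tmp/checkpoint on the target before calling criu restore.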

Docker Compose Setup for CRIU:

version: "3.8"
services:
  checkpoint-manager:
    image: ubuntu:24.04
    privileged: true
    volumes:
      - /tmp/checkpoints:/checkpoints
      - /proc:/proc
    cap_add:
      - SYS_ADMIN
      - SYS_PTRACE
    command: >
      bash -c "apt update && apt install -y criu &&
               tail -f /dev/null"
    networks:
      - migration-network

networks:
  migration-network:
    driver: bridge

Podman Checkpoint & Restore

Podman has built-in support for CRIU-based checkpoint and restore, making it one of the easiest container runtimes to use for this workflow. Although Podman can run containers rootless, its checkpoint and restore commands currently require root, so run them as root (for example with sudo).

Installation:

# Ubuntu/Debian
sudo apt install podman criu

# RHEL/Fedora
sudo dnf install podman criu

Usage:

# Start a container (checkpointing requires a rootful container)
sudo podman run -d --name myapp nginx:latest

# Checkpoint the running container and export it to an archive
sudo podman container checkpoint myapp -e /tmp/myapp-checkpoint.tar.gz

# The container is now stopped, state saved to tarball

# Restore on the same host (the stopped container still exists, so no import is needed)
sudo podman container restore myapp

# Or restore on a different host by importing the archive
scp /tmp/myapp-checkpoint.tar.gz remote:/tmp/
ssh remote "sudo podman container restore -i /tmp/myapp-checkpoint.tar.gz"
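
Before restoring on another host, it can help to confirm that the transferred file is a readable checkpoint archive. The sketch below is an assumption-laden sanity check rather than a Podman feature: looks_like_checkpoint is a hypothetical helper, and it assumes the archive holds config.dump and spec.dump at its top level, a layout that is not guaranteed across versions. It calls tar without a compression flag so that tar can detect the compression itself.

```shell
# Check that an exported archive looks like a checkpoint before restoring it.
# Assumption: the archive contains config.dump and spec.dump at its top level.
# looks_like_checkpoint is a hypothetical helper name.
looks_like_checkpoint() {
    local archive="$1"
    local listing

    # Fail early if the file is missing or is not a readable tar archive.
    listing=$(tar -tf "$archive" 2>/dev/null) || return 1

    # Strip an optional leading "./" so both common tar layouts match.
    listing=$(printf '%s\n' "$listing" | sed 's|^\./||')

    printf '%s\n' "$listing" | grep -qx 'config.dump' || return 1
    printf '%s\n' "$listing" | grep -qx 'spec.dump' || return 1
}
```

A call such as looks_like_checkpoint /tmp/myapp-checkpoint.tar.gz returns a non-zero status for truncated or unrelated files, which makes it easy to guard the restore command with &&.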

Docker Compose Alternative (Podman Compose):

version: "3.8"
services:
  webapp:
    image: nginx:latest
    ports:
      - "8080:80"
    volumes:
      - ./html:/usr/share/nginx/html
  backup-service:
    image: ubuntu:24.04
    privileged: true
    volumes:
      - /var/lib/containers/storage:/var/lib/containers/storage
      - ./checkpoints:/checkpoints
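    # Assumption: podman is installed alongside criu so this container can act on
    # the shared container storage; the webapp container must be managed by Podman
    command: >
      bash -c "apt update && apt install -y criu podman &&
               podman container checkpoint webapp -e /checkpoints/webapp.tar.gz"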

Kubernetes Container Migration

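Kubernetes has only limited native support for container checkpoint/restore: the kubelet exposes a checkpoint API (introduced as an alpha feature in v1.25), but there is no built-in restore or pod migration. Several projects fill that gap and enable live migration of workloads between nodes: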

Kube-CRIU Integration

The CRI (Container Runtime Interface) defines a checkpoint operation, and some runtime implementations, such as CRI-O, provide it:

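# Pod manifest with checkpoint annotation
# (illustrative: upstream Kubernetes does not define this annotation key,
# so it only has an effect if a custom operator or controller reads it)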
apiVersion: v1
kind: Pod
metadata:
  name: migratable-pod
  annotations:
    checkpoint.kubernetes.io/enabled: "true"
spec:
  containers:
    - name: app
      image: myapp:latest
      resources:
        requests:
          memory: "256Mi"
          cpu: "250m"

Using crictl for Checkpoint:

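# Get the container ID
crictl ps --name myapp

# Checkpoint the container and export it to an archive
crictl checkpoint --export=/tmp/checkpoint.tar <container-id>

# crictl has no restore subcommand. On the target node, the archive is
# typically packaged as an OCI image and started as a new container from it.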
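
Recent Kubernetes releases also expose checkpointing through the kubelet API as POST /checkpoint/{namespace}/{pod}/{container}. The sketch below only builds that URL. It assumes the kubelet listens on its default port 10250 and that the ContainerCheckpoint feature is enabled on the node. kubelet_checkpoint_url is a hypothetical helper name, and node1 and the certificate paths in the comment are placeholders.

```shell
# Build the kubelet checkpoint URL for one container.
# Assumptions: the kubelet listens on its default port 10250 and the
# ContainerCheckpoint feature is enabled on the node.
kubelet_checkpoint_url() {
    local node="$1" namespace="$2" pod="$3" container="$4"
    printf 'https://%s:10250/checkpoint/%s/%s/%s\n' \
        "$node" "$namespace" "$pod" "$container"
}

# Example call (needs credentials authorized for the kubelet API; the
# certificate paths are placeholders):
#   curl -X POST --cacert ca.crt --cert client.crt --key client.key \
#       "$(kubelet_checkpoint_url node1 default migratable-pod app)"
```

On success the kubelet writes the checkpoint archive to its checkpoints directory on that node (by default under /var/lib/kubelet/checkpoints).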

KubeVirt Live Migration

For VM-based workloads, KubeVirt provides native live migration between Kubernetes nodes, which is often a more practical approach for production stateful workloads:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: migratable-vm
spec:
  running: true
  template:
    spec:
      domain:
        resources:
          requests:
            memory: 2Gi
        devices:
          disks:
            - name: rootfs
              disk:
                bus: virtio
      volumes:
        - name: rootfs
          persistentVolumeClaim:
            claimName: vm-rootfs

Trigger Migration:

# Create a VirtualMachineInstanceMigration resource
kubectl apply -f - <<EOF
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: vm-migration
spec:
  vmiName: migratable-vm
EOF

# Monitor migration status
kubectl get virtualmachineinstancemigration vm-migration

Comparison Table

| Feature | CRIU (Direct) | Podman Checkpoint | KubeVirt Migration |
|---|---|---|---|
| Checkpoint Target | Processes | Containers | Virtual Machines |
| Restore Location | Same or different host | Same or different host | Different K8s node |
| TCP State Preservation | ✅ Yes | ✅ Yes | ✅ Yes |
| File Descriptor Migration | ✅ Yes | ✅ Yes | ✅ Yes |
| Rootless Support | ✅ (with user ns) | ❌ No (requires root) | N/A (VM-level) |
| Orchestration Integration | Manual | Podman Compose | Kubernetes native |
| Live Migration | Manual (dump + copy + restore) | Manual (tar + copy + restore) | ✅ Zero-downtime |
| Setup Complexity | High | Low | Medium |
| Production Ready | ✅ Yes | ✅ Yes | ✅ Yes |
| GitHub Stars | 3,800+ | 31,500+ (Podman) | 17,000+ (KubeVirt) |
| Last Updated | April 2026 | May 2026 | May 2026 |

Use Cases

Zero-Downtime Container Updates

Instead of stopping a container and starting a new version (losing in-memory state), checkpoint the running container, start the new version, and restore the old container’s state under a separate name for comparison or debugging:

# Checkpoint current version (this stops the v1 container)
sudo podman container checkpoint myapp -e /tmp/myapp-v1.tar.gz

# Start new version
sudo podman run -d --name myapp-v2 myapp:v2

# Restore v1 state under a new name for comparison/debugging
sudo podman container restore --import /tmp/myapp-v1.tar.gz --name myapp-v1-debug

Disaster Recovery

Save container state regularly as a backup strategy. If a node fails, restore the checkpoint on a healthy node:

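# Cron job (root crontab) for regular checkpoints;
# --leave-running keeps the container running after each checkpoint
0 */6 * * * podman container checkpoint --leave-running myapp -e /backup/myapp-$(date +\%F-\%H).tar.gz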
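
Regular checkpoints accumulate quickly, so a backup schedule like this usually needs a retention step. The following is a minimal sketch under stated assumptions: prune_checkpoints is a hypothetical helper, it relies on GNU find (for -printf), and it assumes archive names end in .tar.gz and contain no newlines.

```shell
# Keep the newest N checkpoint archives in a directory and delete the rest.
# prune_checkpoints is a hypothetical helper; it assumes archive names
# end in .tar.gz and contain no newlines.
prune_checkpoints() {
    local dir="$1" keep="$2"
    # List archives newest first by modification time, skip the first
    # $keep entries, and remove whatever remains.
    find "$dir" -maxdepth 1 -type f -name '*.tar.gz' -printf '%T@ %p\n' \
        | sort -rn \
        | tail -n +"$((keep + 1))" \
        | cut -d' ' -f2- \
        | while IFS= read -r old; do
              rm -f -- "$old"
          done
}
```

For example, prune_checkpoints /backup 8 would keep two days of six-hourly checkpoints and could run from the same crontab shortly after each checkpoint.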

Cross-Datacenter Migration

Migrate workloads between datacenters by checkpointing on the source, transferring the checkpoint archive, and restoring on the destination. In-memory state carries over; established TCP connections survive only if the container keeps the same IP address on the destination network.

Why Container Checkpoint Matters

Traditional container orchestration treats containers as ephemeral and stateless. When you need to move a container, you stop it and start a fresh instance — losing all runtime state. Checkpoint and restore changes this paradigm:

  • Preserves TCP connections — clients don’t see connection drops during migration
  • Maintains in-memory caches — no cold-start performance penalty after restore
  • Enables live migration — move workloads between hosts without downtime
  • Simplifies debugging — save a failing container’s exact state for offline analysis

For container runtime comparisons, see our containerd vs CRI-O vs Podman guide and Incus vs LXD vs Podman comparison. For container sandboxing approaches that complement checkpoint/restore security, check our gVisor vs Kata Containers guide.

FAQ

What is the difference between checkpoint and snapshot?

A snapshot captures the filesystem state of a container at a point in time (like docker commit). A checkpoint captures the complete runtime state — memory, network connections, file descriptors, and process state. Snapshots let you restart from a filesystem baseline; checkpoints let you resume exactly where the container left off, including active network connections.

Does Docker support checkpoint and restore?

Docker added experimental checkpoint/restore support using CRIU in version 1.13 (docker checkpoint create, docker start --checkpoint), but it has never been promoted to stable and is available only when the daemon runs with experimental features enabled. Podman has better native support. For Docker-based workflows, you need to enable experimental mode, use CRIU directly, or switch to Podman.

What are the limitations of container checkpoint/restore?

Not all applications can be checkpointed. Limitations include: applications with open GPU contexts, some database engines with memory-mapped files, containers with mounted FUSE filesystems, and applications using certain Linux kernel features (userfaultfd, inotify). CRIU provides a pre-check (criu check) that verifies the kernel supports the features CRIU needs; whether a specific application can be checkpointed still has to be confirmed by testing it.

Can I checkpoint a running database?

Technically yes, but it is risky. Databases like PostgreSQL and MySQL use memory-mapped files and transaction logs. Checkpointing a running database may result in inconsistent state on restore. It is safer to use the database’s native backup tools (pg_dump, mysqldump) combined with WAL/archive logs for consistent backups.

How large is a checkpoint file?

Checkpoint size depends on the container’s memory usage. A container using 512MB of RAM will produce a checkpoint of roughly 200-500MB (compressed). With lazy pages support, you can restore without immediately loading all memory pages, reducing restore time significantly.
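
Since checkpoint size tracks memory usage, a quick pre-flight check can compare the resident memory of the processes to be checkpointed with the free space at the destination. The sketch below is a rough heuristic, not how CRIU sizes its images: has_room_for_checkpoint is a hypothetical helper, it reads VmRSS from /proc (Linux only), and it ignores compression, shared pages, and swapped-out memory.

```shell
# Succeed if the filesystem holding DIR has more free space than the
# combined resident memory (VmRSS) of the given PIDs.
# Rough heuristic only; has_room_for_checkpoint is a hypothetical helper.
has_room_for_checkpoint() {
    local dir="$1"; shift
    local total_kib=0 pid rss

    for pid in "$@"; do
        # Processes that have exited (or kernel threads) contribute 0.
        rss=$(awk '/^VmRSS:/ {print $2}' "/proc/${pid}/status" 2>/dev/null)
        total_kib=$((total_kib + ${rss:-0}))
    done

    # df -Pk prints available space in KiB in column 4 of the second line.
    local free_kib
    free_kib=$(df -Pk "$dir" | awk 'NR==2 {print $4}')

    echo "needed_kib=${total_kib} free_kib=${free_kib}"
    [ "$free_kib" -gt "$total_kib" ]
}
```

A call such as has_room_for_checkpoint /tmp/checkpoint <PID> prints both figures and returns a non-zero status when the destination is clearly too small.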

Is container checkpoint/restore production-ready?

CRIU and Podman checkpoint are production-ready for supported workloads. Kubernetes-native migration is still emerging — KubeVirt provides the most mature solution for VM workloads, while container-level migration in Kubernetes requires manual orchestration or custom operators. Always test checkpoint/restore with your specific application before relying on it in production.