MongoDB stores critical application data — user profiles, transaction records, content catalogs, and operational logs. Without a reliable backup strategy, hardware failures, operator errors, or software bugs can result in irreversible data loss. This guide compares three self-hosted MongoDB backup approaches: the built-in mongodump utility, Percona Backup for MongoDB (enterprise-grade consistency), and Mongosh-based scripting (flexible custom backups).
We will cover installation, Docker Compose deployments, backup consistency guarantees, recovery procedures, and the trade-offs between each approach.
mongodump: The Built-In Backup Utility
mongodump is MongoDB's official logical backup tool, distributed as part of the MongoDB Database Tools package (bundled with the server before MongoDB 4.4, a separate download since). It connects to a running MongoDB instance, reads all data, and writes BSON files to disk.
How It Works
mongodump iterates through every collection in every database, issuing queries to retrieve all documents. The output is a directory structure of BSON and JSON metadata files that can be restored using mongorestore. For replica sets, mongodump can connect to any member (primary or secondary) and will read from the connected node.
Docker Compose Deployment
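A minimal docker-compose.yml for this setup might look like the following sketch; the image tag, volume names, and the `rs0` replica set name are illustrative choices, not requirements:

```yaml
services:
  mongo:
    image: mongo:7
    command: ["mongod", "--replSet", "rs0", "--bind_ip_all"]
    ports:
      - "27017:27017"
    volumes:
      - mongo-data:/data/db
      # Init scripts run only on first startup of an empty data directory
      - ./init-replica.js:/docker-entrypoint-initdb.d/init-replica.js:ro
    healthcheck:
      test: ["CMD", "mongosh", "--quiet", "--eval", "db.adminCommand('ping')"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  mongo-data:
```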
Initialization script (init-replica.js) to set up a single-node replica set (required for oplog-based backups):
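A sketch of init-replica.js; the member host must match whatever address other nodes and clients use to reach this instance (the Compose service name `mongo` is assumed here):

```javascript
// Initialize a single-node replica set so the oplog exists
// (mongodump --oplog and PBM both depend on it).
try {
  rs.status();
  print("Replica set already initialized");
} catch (e) {
  rs.initiate({
    _id: "rs0",
    members: [{ _id: 0, host: "mongo:27017" }]
  });
}
```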
Running Backups
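Assuming the Compose service above, a full dump with oplog capture can be run with docker compose exec; the output path and connection string are placeholders to adapt:

```shell
# Full dump of all databases with oplog capture and gzip compression
docker compose exec mongo mongodump \
  --uri="mongodb://localhost:27017/?replicaSet=rs0" \
  --oplog --gzip --out=/data/backup/$(date +%Y%m%d-%H%M%S)

# Restore, replaying the oplog entries captured during the dump
docker compose exec mongo mongorestore \
  --uri="mongodb://localhost:27017/?replicaSet=rs0" \
  --oplogReplay --gzip /data/backup/<timestamp>
```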
Consistency and Limitations
mongodump does not produce a point-in-time snapshot by default. Because it reads collections sequentially, writes that occur during the backup may be partially captured. For point-in-time consistency, run it against a replica set member with --oplog, which records the oplog during the dump so that mongorestore --oplogReplay can roll the data forward to a single consistent point. Note that --oplog applies only to full-instance dumps; it cannot be combined with --db or --collection.
Key limitations:
- No built-in scheduling or retention management
- No incremental backups (each run is a full dump)
- Consistency requires oplog access (replica set or sharded cluster)
- Large databases can take hours to dump; the database is never locked, so data keeps changing while the dump runs
Percona Backup for MongoDB: Enterprise-Grade Consistency
Percona Backup for MongoDB (PBM) is an open-source tool designed for consistent, point-in-time backups of MongoDB replica sets and sharded clusters. It coordinates backups across all nodes, captures oplog data, and supports incremental backups with configurable retention policies.
How It Works
PBM runs a pbm-agent process alongside each mongod node. There is no separate controller server: the pbm CLI writes commands to control collections inside MongoDB itself, the agents coordinate through them, and one agent per replica set is elected to do the work, ensuring all shards snapshot at a consistent point. Backups are stored in S3-compatible object storage, Azure Blob Storage, or a shared filesystem, and PBM supports full, incremental, and point-in-time recovery.
Docker Compose Deployment
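One possible layout pairs the mongod service with a pbm-agent sidecar and MinIO as S3-compatible storage. Image tags, credentials, and paths below are placeholders; in a multi-node replica set, each mongod would get its own agent:

```yaml
services:
  mongo:
    image: percona/percona-server-mongodb:7.0
    command: ["mongod", "--replSet", "rs0", "--bind_ip_all"]
    volumes:
      - mongo-data:/data/db

  pbm-agent:
    image: percona/percona-backup-mongodb:2.4.1
    environment:
      # Each agent connects to its local mongod
      PBM_MONGODB_URI: "mongodb://mongo:27017/?replicaSet=rs0"
    volumes:
      - ./pbm-config.yaml:/etc/pbm/pbm-config.yaml:ro
    depends_on:
      - mongo

  minio:
    image: minio/minio
    command: ["server", "/data"]
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    volumes:
      - minio-data:/data

volumes:
  mongo-data:
  minio-data:
```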
PBM Configuration
After starting the services, configure PBM to use your S3-compatible storage:
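With the agents running, the storage configuration is applied through the pbm CLI, executed here inside the agent container (which already has the connection URI in its environment):

```shell
# Apply the storage configuration from a file
docker compose exec pbm-agent pbm config --file /etc/pbm/pbm-config.yaml

# Verify the resulting configuration and agent health
docker compose exec pbm-agent pbm config
docker compose exec pbm-agent pbm status
```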
PBM configuration file (pbm-config.yaml):
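A minimal sketch of pbm-config.yaml targeting the MinIO service above; bucket name, endpoint, and credentials are placeholders:

```yaml
storage:
  type: s3
  s3:
    region: us-east-1
    bucket: pbm-backups
    endpointUrl: http://minio:9000
    credentials:
      access-key-id: minioadmin
      secret-access-key: minioadmin
```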
Running Backups with PBM
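Typical PBM operations look like the following; the backup name and timestamps are illustrative:

```shell
# Take a full (logical) backup
pbm backup

# Take a physical base backup, then incrementals on top of it
pbm backup --type=incremental --base
pbm backup --type=incremental

# Enable continuous oplog capture for point-in-time recovery
pbm config --set pitr.enabled=true

# List backups and the PITR window, then restore
pbm list
pbm restore 2024-01-15T10:00:00Z
pbm restore --time="2024-01-15T12:30:00"
```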
Consistency Guarantees
PBM provides strictly consistent point-in-time snapshots. It coordinates across all replica set members to ensure a consistent view of data at the backup start time. The oplog is captured continuously, enabling recovery to any second within the oplog window. Incremental backups reduce storage usage by only capturing changes since the last full or incremental backup.
Mongosh-Based Custom Backup Scripting
mongosh (the MongoDB Shell) can be used to write custom backup scripts in JavaScript, enabling flexible, application-aware backups. This approach is ideal when you need selective backups, data transformations, or integration with external systems.
How It Works
Instead of dumping all data blindly, mongosh scripts can filter collections, aggregate data before export, apply transformations, and integrate with backup destinations (S3, GCS, local storage). The script runs as a scheduled job (cron, systemd timer) and can be tailored to specific business requirements.
Docker Compose Deployment
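A compact setup: the database plus a sidecar that has mongosh available and a mounted scripts directory. The image choice and paths are illustrative; scheduling is assumed to come from the host (cron execs into the sidecar):

```yaml
services:
  mongo:
    image: mongo:7
    volumes:
      - mongo-data:/data/db

  backup:
    image: mongo:7                      # reuse the official image for mongosh
    entrypoint: ["sleep", "infinity"]   # kept alive; host scheduler execs into it
    volumes:
      - ./scripts:/scripts:ro
      - backup-data:/backups
    depends_on:
      - mongo

volumes:
  mongo-data:
  backup-data:
```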
Advanced Backup Script Example
Create a file backup-script.js:
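A sketch of backup-script.js that exports selected collections as Extended JSON, one document per line. The database name, collection list, filters, and output directory are assumptions to adapt; mongosh exposes Node's fs module and the EJSON helper used here:

```javascript
// Selective, application-aware backup run with mongosh.
const fs = require("fs");

const outDir = `/backups/${new Date().toISOString().slice(0, 10)}`;
fs.mkdirSync(outDir, { recursive: true });

const appDb = db.getSiblingDB("app");

// Only back up these collections, and only orders from the last 30 days.
const targets = [
  { name: "users", filter: {} },
  { name: "orders", filter: { createdAt: { $gte: new Date(Date.now() - 30 * 864e5) } } },
];

for (const t of targets) {
  const path = `${outDir}/${t.name}.ejson`;
  const out = fs.createWriteStream(path);
  let count = 0;
  appDb.getCollection(t.name).find(t.filter).forEach((doc) => {
    out.write(EJSON.stringify(doc) + "\n");
    count++;
  });
  out.end();
  print(`${t.name}: ${count} documents -> ${path}`);
}
```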
Run with:
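Assuming the Compose layout above, the script can be executed against the mongo service like so (the connection string and schedule are illustrative):

```shell
# Execute the script inside the backup container against the mongo service
docker compose exec backup mongosh "mongodb://mongo:27017" /scripts/backup-script.js

# Typical scheduling: a host-side cron entry
# 0 2 * * * cd /opt/stack && docker compose exec -T backup mongosh "mongodb://mongo:27017" /scripts/backup-script.js
```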
When to Use Custom Scripts
Mongosh-based backups are ideal when:
- You need application-level data transformations before export
- You want to backup only specific collections or filtered subsets
- You need to integrate with custom retention or notification systems
- You want to generate backup reports (row counts, size estimates) before running full dumps
Comparison: mongodump vs Percona Backup vs Mongosh
| Feature | mongodump | Percona Backup (PBM) | Mongosh Scripts |
|---|---|---|---|
| Consistency | Fuzzy by default; point-in-time with --oplog | Strictly consistent point-in-time | Depends on script |
| Incremental backups | No | Yes | Custom implementation |
| Point-in-time recovery | No (without oplog replay) | Yes | No |
| Scheduling | Manual or external cron | Built-in scheduler | External cron/systemd |
| Retention management | Manual | Built-in policies | Custom implementation |
| Storage backend | Local filesystem | S3, Azure Blob, local | Any (script-dependent) |
| Sharded cluster support | Yes (via mongos) | Yes (coordinated across shards) | Custom implementation |
| Compression | gzip (--gzip) | gzip, snappy, s2, lz4, zstd | Depends on script |
| Parallel execution | Across collections (--numParallelCollections) | Yes (per-node agents) | Depends on script |
| Docker image | mongo:7 (official) | percona/percona-backup-mongodb | mongo:7 (official) |
| License | Apache 2.0 (Database Tools) | Apache 2.0 | Apache 2.0 |
| GitHub stars | N/A (mongodb/mongo-tools) | 550+ | N/A (mongodb-js/mongosh) |
| Best for | Simple full backups, ad-hoc dumps | Production-grade HA backups | Custom selective backups |
Why Self-Host Your MongoDB Backup Infrastructure?
Self-hosting backup solutions gives you complete control over backup schedules, retention policies, encryption, and storage locations. Cloud-managed MongoDB backup services are convenient but introduce vendor lock-in, egress costs, and limited customization. With self-hosted backups, you can:
- Store backups on-premises or in any S3-compatible storage (MinIO, Ceph, Backblaze B2)
- Encrypt backups with your own keys before transmission
- Implement custom retention policies aligned with compliance requirements
- Test restore procedures on your own infrastructure without cloud costs
- Avoid per-GB backup storage fees charged by managed database providers
For database sharding strategies that reduce the need for frequent full backups, see our Vitess vs Citus vs ShardingSphere guide. If you manage database high availability at the cluster level, our Patroni vs Galera vs Repmgr guide covers PostgreSQL. For Kubernetes-level backup orchestration that can complement database backups, our Velero vs Stash vs Volsync comparison covers persistent volume protection.
FAQ
Does mongodump lock the database during backup?
No. mongodump reads data without acquiring locks on the database. It uses cursors to iterate through collections, which means writes can continue during the backup. However, this also means the backup is not point-in-time consistent unless you use the --oplog flag on a replica set.
Can Percona Backup for MongoDB work with a single MongoDB node?
PBM requires a replica set because it reads the oplog, but a single-node replica set is sufficient. A standalone MongoDB instance cannot use PBM directly; convert it by starting mongod with --replSet and running rs.initiate().
How much storage does an incremental PBM backup consume?
Incremental backups only store the changes (WiredTiger data blocks) since the last full or incremental backup. For workloads with low write rates, incremental backups are typically 1-5% of the full backup size. For high-write workloads, they can be 10-30% of the full backup size.
Can I restore a mongodump to a different MongoDB version?
Generally yes: mongodump produces BSON files that are compatible across MongoDB versions. However, restoring to a newer major version may require additional steps if the new version has incompatible features. Always test restore procedures against your target version.
How do I automate mongodump with retention?
Use a cron job that runs mongodump with timestamped filenames, combined with a cleanup script that removes backups older than a threshold:
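One way to sketch this; BACKUP_DIR, the retention window, and the URI are placeholders:

```shell
#!/usr/bin/env sh
set -eu

BACKUP_DIR=/var/backups/mongodb
RETENTION_DAYS=14
STAMP=$(date +%Y%m%d-%H%M%S)

# Full dump with oplog capture into a timestamped directory
mongodump --uri="mongodb://localhost:27017/?replicaSet=rs0" \
  --oplog --gzip --out="$BACKUP_DIR/$STAMP"

# Remove backup directories older than the retention threshold
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +"$RETENTION_DAYS" \
  -exec rm -rf {} +

# Example crontab entry (daily at 02:00):
# 0 2 * * * /usr/local/bin/mongo-backup.sh >> /var/log/mongo-backup.log 2>&1
```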
Is mongosh backup scripting production-ready?
Mongosh-based backups are production-ready for selective or filtered backups where you need application-level control. However, they do not provide the consistency guarantees of PBM or the simplicity of mongodump. For full-database disaster recovery, prefer PBM or mongodump with oplog. Use mongosh scripts for supplementary backups (specific collections, transformed data, validation reports).