PostgreSQL’s MVCC (Multi-Version Concurrency Control) architecture means that every UPDATE and DELETE leaves behind dead tuples. Over time, these dead tuples accumulate as table bloat, degrading query performance and wasting disk space. PostgreSQL’s autovacuum daemon handles routine cleanup, but it never physically shrinks tables; the built-in alternative, VACUUM FULL, rebuilds them only under an exclusive lock that blocks all reads and writes.
For production databases that cannot afford downtime, three specialized tools provide online table reorganization: pg_repack, pg_squeeze, and pgcompacttable. Each uses a different approach to reclaim bloat and rebuild indexes without blocking concurrent queries.
This guide compares these tools for self-hosted PostgreSQL deployments, covering their architecture, performance characteristics, and operational requirements.
Understanding PostgreSQL Table Bloat
Before diving into the tools, it helps to understand why bloat occurs:
- MVCC dead tuples: UPDATE and DELETE operations mark old row versions as dead rather than removing them immediately
- Autovacuum limitations: Autovacuum marks dead space as reusable but does not physically shrink tables or rebuild indexes
- VACUUM FULL: Rebuilds the table from scratch but takes an ACCESS EXCLUSIVE lock, blocking all access
- Index bloat: B-tree indexes grow with every INSERT and are only partially cleaned by autovacuum
For databases with heavy write loads, bloat can consume 30-70% of table size. A 100 GB table with 50% bloat means you’re storing 50 GB of dead data, and queries must scan through it unnecessarily.
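To measure actual bloat before reaching for any of these tools, the pgstattuple extension (shipped with PostgreSQL contrib) reports exact dead-tuple and free-space percentages. A minimal check — the database and table names here are placeholders:

```shell
# pgstattuple ships with PostgreSQL contrib; install it once per database.
psql -d mydb -c "CREATE EXTENSION IF NOT EXISTS pgstattuple;"

# Exact bloat figures for one table ("public.orders" is a placeholder).
# Note: pgstattuple reads the entire table, so run it off-peak on large tables.
psql -d mydb -c "SELECT tuple_percent, dead_tuple_percent, free_percent
                 FROM pgstattuple('public.orders');"
```

A dead_tuple_percent plus free_percent well above autovacuum's steady-state baseline is the signal that a rebuild will actually reclaim space.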
pg_repack: The Battle-Tested Solution
GitHub: reorg/pg_repack · 2,242 stars · Active development
pg_repack is the most widely deployed online table reorganization tool for PostgreSQL. A fork of the earlier pg_reorg project, it uses a trigger-based approach to rebuild tables and indexes concurrently.
How pg_repack Works
- Creates a shadow copy of the target table with the same structure
- Attaches triggers to the original table to capture all INSERT/UPDATE/DELETE operations
- Copies existing rows to the shadow table
- Replays captured changes from the trigger log
- Swaps the shadow table with the original using a brief exclusive lock (typically < 1 second)
- Drops the old table
Installing pg_repack
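A typical installation from the PGDG packages looks like this; the version suffix and database name are illustrative, so match them to your server:

```shell
# Debian/Ubuntu with the PGDG apt repository (adjust "16" to your major version)
sudo apt-get install postgresql-16-repack

# RHEL/Rocky with the PGDG yum repository
sudo yum install pg_repack_16

# Register the extension in each database you intend to repack
psql -d mydb -c "CREATE EXTENSION pg_repack;"
```

The pg_repack client binary and the server-side extension must be the same version, so install both from the same package source.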
Docker Compose with pg_repack
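The official postgres images include the PGDG apt sources, so a small custom build can bake the extension in. A sketch — the image tag, credentials, and volume names are illustrative:

```shell
# Build a postgres image with pg_repack installed, then start it via compose.
cat > Dockerfile <<'EOF'
FROM postgres:16
RUN apt-get update \
    && apt-get install -y --no-install-recommends postgresql-16-repack \
    && rm -rf /var/lib/apt/lists/*
EOF

cat > docker-compose.yml <<'EOF'
services:
  db:
    build: .
    environment:
      POSTGRES_PASSWORD: example
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
EOF

docker compose up -d --build
docker compose exec db psql -U postgres -c "CREATE EXTENSION IF NOT EXISTS pg_repack;"
```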
Running pg_repack
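Typical invocations — the database and table names below are placeholders:

```shell
# Rebuild one table and its indexes online; only the final swap takes a lock
pg_repack --dbname=mydb --table=public.orders

# Rebuild only the indexes, leaving the heap alone
pg_repack --dbname=mydb --table=public.orders --only-indexes

# Repack all eligible tables in the database, four tables in parallel
pg_repack --dbname=mydb --jobs=4

# Show what would be repacked without doing it
pg_repack --dbname=mydb --dry-run
```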
pg_repack Limitations
| Constraint | Detail |
|---|---|
| Primary key required | Tables without PK or UNIQUE NOT NULL columns cannot be repacked |
| Superuser access | Requires superuser or pg_repack role membership |
| Disk space | Needs approximately 2x table size for the shadow copy |
| No DDL during repack | ALTER TABLE on the target table during repack will fail |
| TOAST tables | Supported since pg_repack 1.4.4 |
pg_squeeze: The Modern Alternative
GitHub: cybertec-postgresql/pg_squeeze · 666 stars
pg_squeeze, developed by Cybertec, takes a fundamentally different approach from pg_repack. Instead of using triggers, it leverages PostgreSQL’s logical decoding to capture changes during the copy process.
How pg_squeeze Works
- Creates a shadow table copy
- Uses logical decoding (via the test_decoding output plugin) to capture changes during the copy
- Replays captured changes into the shadow table
- Acquires a brief lock to swap tables
- Optionally removes unnecessary free space (the “squeeze” aspect)
Installing pg_squeeze
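PGDG also packages pg_squeeze, following the same version-suffix convention; the names below are illustrative:

```shell
# Debian/Ubuntu (PGDG apt repository; adjust "16" to your major version)
sudo apt-get install postgresql-16-squeeze

# RHEL/Rocky (PGDG yum repository)
sudo yum install pg_squeeze_16

# Create the extension in the target database (after the server config below)
psql -d mydb -c "CREATE EXTENSION pg_squeeze;"
```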
PostgreSQL Configuration for pg_squeeze
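Both required settings take effect only after a server restart. Applied via ALTER SYSTEM (editing postgresql.conf directly works just as well):

```shell
# wal_level=logical enables logical decoding; the library must be preloaded.
# If shared_preload_libraries already lists other libraries, append rather
# than replace — this command overwrites the whole setting.
psql -U postgres -c "ALTER SYSTEM SET wal_level = 'logical';"
psql -U postgres -c "ALTER SYSTEM SET shared_preload_libraries = 'pg_squeeze';"

# Restart for both settings to take effect (systemd example)
sudo systemctl restart postgresql
```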
Docker Compose with pg_squeeze
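The same custom-build pattern works here, with the two server settings passed as command-line flags so no config file edit is needed. A sketch with illustrative names:

```shell
cat > Dockerfile <<'EOF'
FROM postgres:16
RUN apt-get update \
    && apt-get install -y --no-install-recommends postgresql-16-squeeze \
    && rm -rf /var/lib/apt/lists/*
EOF

cat > docker-compose.yml <<'EOF'
services:
  db:
    build: .
    environment:
      POSTGRES_PASSWORD: example
    command:
      - postgres
      - -c
      - wal_level=logical
      - -c
      - shared_preload_libraries=pg_squeeze
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
EOF

docker compose up -d --build
```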
Running pg_squeeze
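pg_squeeze is driven from SQL rather than a command-line client. A one-off squeeze and the background scheduler look roughly like this (schema and table names are placeholders; check the extension's docs for the exact registration format):

```shell
# Ad-hoc: rebuild one table immediately
psql -d mydb -c "SELECT squeeze.squeeze_table('public', 'orders');"

# For threshold-based scheduled maintenance, register the table in the
# squeeze.tables catalog (see the pg_squeeze docs for the schedule syntax),
# then start the scheduler worker for this database:
psql -d mydb -c "SELECT squeeze.start_worker();"
```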
pg_squeeze Advantages Over pg_repack
- No triggers: Logical decoding has lower overhead than row-level triggers on heavily written tables
- Configurable thresholds: Only rebuilds tables exceeding a specified bloat percentage
- Automatic scheduling: Built-in scheduler runs maintenance on a configurable interval
- Better free space management: Actively squeezes free space, not just reorganizes
pgcompacttable: The Lightweight Script
GitHub: dataegret/pgcompacttable · 356 stars
pgcompacttable is a Perl script developed by Data Egret that takes a different approach entirely. Rather than creating shadow tables, it reduces bloat in place: it issues no-op UPDATEs that relocate rows from the end of the table into free space near the beginning, then lets ordinary VACUUM truncate the emptied trailing pages.
How pgcompacttable Works
- Analyzes table bloat via statistics (using pgstattuple for exact figures when it is installed)
- Relocates rows from the tail of the table into earlier free space with no-op UPDATEs
- Runs VACUUM so the emptied trailing pages can be truncated, shrinking the file
- Monitors lock wait times and backs off if contention is detected
Installing pgcompacttable
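There is no server-side component to install; the script and its Perl database drivers are all that is needed (Debian/Ubuntu package names shown):

```shell
# pgcompacttable is a standalone Perl script; no extension required.
git clone https://github.com/dataegret/pgcompacttable.git

# It depends on the Perl DBI and DBD::Pg modules
sudo apt-get install libdbi-perl libdbd-pg-perl

./pgcompacttable/bin/pgcompacttable --help
```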
Docker Compose with pgcompacttable
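Because nothing runs inside the database server, one pattern is to keep the stock postgres image and run the script from the host against the published port. A sketch — credentials, port mapping, and flag spellings are illustrative and should be checked against the script's usage output:

```shell
cat > docker-compose.yml <<'EOF'
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
EOF

docker compose up -d

# Run the script from the host against the published port
PGPASSWORD=example ./pgcompacttable/bin/pgcompacttable \
    --host localhost --user postgres --dbname mydb --all
```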
pgcompacttable Characteristics
| Feature | Detail |
|---|---|
| Lock impact | Minimal — uses standard VACUUM with lock timeout awareness |
| Effectiveness | Moderate — reduces bloat incrementally, doesn’t fully rebuild |
| Setup complexity | Low — no extensions or preload libraries required |
| Primary key needed | No |
| Disk overhead | None — works in-place |
Comprehensive Comparison
| Feature | pg_repack | pg_squeeze | pgcompacttable |
|---|---|---|---|
| Approach | Trigger-based shadow copy | Logical decoding shadow copy | Incremental VACUUM |
| Table rebuild | Full rebuild | Full rebuild | Partial reduction |
| Bloat elimination | Complete | Complete | Incremental |
| Index rebuild | Yes | Yes | No (relies on VACUUM) |
| Required extension | Yes (pg_repack) | Yes (pg_squeeze) | No |
| PostgreSQL config | shared_preload_libraries | shared_preload_libraries + wal_level=logical | None |
| Primary key required | Yes | No | No |
| Lock duration | ~1 second swap | ~1 second swap | None (uses VACUUM) |
| Disk overhead | 2x table size | 2x table size | None |
| Maturity | Very mature (since 2012) | Modern (active since 2019) | Moderate (2017, slower updates) |
Choosing the Right Maintenance Tool
Use pg_repack when:
- You need complete table and index rebuilds
- Your tables have primary keys or unique NOT NULL constraints
- You want the most battle-tested solution
- You have sufficient disk space for shadow copies
Use pg_squeeze when:
- You want automated, threshold-based maintenance
- Trigger overhead on busy tables is a concern
- You need maintenance without primary key constraints
- You’re running PostgreSQL 12+ with logical decoding support
Use pgcompacttable when:
- You cannot install extensions (shared hosting, managed services)
- You need a zero-overhead solution that works in-place
- Incremental bloat reduction is acceptable
- You want a simple Perl script with minimal dependencies
Why Self-Host PostgreSQL Maintenance Tools?
Running PostgreSQL maintenance tools on your own infrastructure ensures that database optimization happens on your schedule, not your cloud provider’s. Self-hosted PostgreSQL gives you direct access to the database server for running pg_repack, pg_squeeze, or pgcompacttable without API limitations or maintenance windows imposed by managed service providers.
For organizations with large PostgreSQL deployments, self-hosting maintenance tools provides predictable performance characteristics, full control over scheduling, and the ability to tune maintenance operations based on your specific workload patterns. You can integrate these tools with your existing monitoring stack to trigger maintenance automatically when bloat thresholds are exceeded.
For related PostgreSQL management, see our PostgreSQL backup guide for comprehensive backup strategies and PostgreSQL high availability setup for production deployments.
FAQ
Does pg_repack block reads and writes during table rebuild?
No. pg_repack uses triggers to capture changes while building a shadow copy. Reads and writes continue normally during the copy phase. Only the final table swap requires an ACCESS EXCLUSIVE lock, which typically lasts less than one second. This is dramatically faster than VACUUM FULL, which blocks all access for the entire rebuild duration.
Can I run pg_repack on tables without a primary key?
No. pg_repack requires either a primary key or a UNIQUE NOT NULL constraint to identify rows during the trigger-based change capture. If your table lacks both, you’ll need to add one before running pg_repack, or use pgcompacttable which has no such requirement.
How much disk space does pg_repack need?
Approximately twice the table size. pg_repack creates a complete shadow copy of the table, including all indexes. For a 50 GB table, you’ll need about 50 GB of free space for the shadow copy. After the swap, the old table is dropped and space is reclaimed.
Is pg_squeeze production-ready?
Yes. pg_squeeze has been in active development since 2019 and is used in production by Cybertec’s clients. It requires PostgreSQL 12+ with logical decoding enabled (wal_level = logical). The main consideration is the additional WAL generation during the copy phase, which may impact replication lag on busy systems.
Can I automate table maintenance on a schedule?
Yes. pg_squeeze has a built-in scheduler that runs maintenance on configurable intervals. For pg_repack, you can use cron, systemd timers, or Kubernetes CronJobs to schedule periodic runs. pgcompacttable can be wrapped in a cron job with the desired bloat threshold parameters.
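For the cron route with pg_repack, a single crontab entry is enough; the schedule, database name, and log path below are illustrative:

```shell
# Add to the postgres user's crontab (crontab -e).
# Weekly repack, Sundays at 03:00, two tables in parallel.
0 3 * * 0  pg_repack --dbname=mydb --jobs=2 >> /var/log/pg_repack.log 2>&1
```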
What happens if pg_repack is interrupted mid-operation?
If the process is killed during the copy phase, pg_repack cleans up the shadow table and removes its triggers. No data is lost — the original table remains intact. If interrupted during the final swap, the operation can be retried. However, it’s important to monitor disk space during large repack operations to prevent failures.
Should I disable autovacuum when using these tools?
No. Autovacuum and these tools serve different purposes. Autovacuum handles routine dead tuple cleanup and prevents transaction ID wraparound. Tools like pg_repack and pg_squeeze address structural bloat that autovacuum cannot fix. Keep autovacuum enabled and use these tools as supplementary maintenance on a weekly or monthly schedule.