Introduction
PostgreSQL generates a wealth of diagnostic data through its logging system — query execution times, error messages, connection events, checkpoint statistics, and autovacuum activity. However, raw PostgreSQL logs are verbose plaintext files that can reach gigabytes per day on busy systems. Making sense of this data requires specialized analysis tools.
In this guide, we compare three approaches to PostgreSQL log analysis: pgBadger, the industry-standard log analyzer that produces rich HTML reports; pg_stat_statements, PostgreSQL’s built-in query performance aggregator; and direct CSV log analysis using shell scripts and SQL for custom diagnostics. Each approach serves different use cases — from executive-friendly dashboards to deep forensic investigations.
Comparison Table
| Feature | pgBadger | pg_stat_statements | CSV Log Analysis |
|---|---|---|---|
| Purpose | HTML log reports & dashboards | Query performance aggregation | Custom log investigation |
| Type | Standalone Perl tool | PostgreSQL extension | Shell + SQL scripts |
| Stars | 4,023+ | N/A (contrib module shipped with PostgreSQL) | N/A (custom) |
| Last Updated | June 2026 | Active (PG 17) | N/A |
| Output Format | HTML, JSON, text | SQL view | Custom (CSV, JSON) |
| Resource Overhead | None on the server (processes existing logs) | Small shared-memory table sized by pg_stat_statements.max | Minimal |
| Setup Complexity | Low (single Perl script) | Low (preload library, restart, CREATE EXTENSION) | Medium (writing scripts) |
| Granularity | Per-query, per-session, per-database | Aggregated by query fingerprint | Fully customizable |
| Best For | Daily/weekly reports, trend analysis | Real-time query monitoring | Forensic debugging, custom metrics |
pgBadger: Rich Log Analysis Reports
pgBadger is the most widely used tool for PostgreSQL log analysis. It parses PostgreSQL log files and generates detailed HTML reports covering query performance, connection statistics, error rates, checkpoint activity, vacuum operations, and temporary file usage. It ships as a single Perl script, which makes it easy to deploy.
Installation
PostgreSQL Logging Configuration
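pgBadger can only parse logs whose line prefix it recognizes, and it can only report on events that PostgreSQL actually writes to the log. Both are controlled by the logging settings in postgresql.conf.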
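As a rough sketch, the script below checks a postgresql.conf against the logging settings that pgBadger's documentation commonly recommends. The values in RECOMMENDED are an assumption for illustration (logging every statement with a threshold of 0 can be expensive on busy systems), and the helper names parse_conf, missing_settings, and prefix_ok are hypothetical.

```python
import re

# Settings commonly recommended for pgBadger (assumed values; tune them,
# for example by raising log_min_duration_statement on a busy server).
RECOMMENDED = {
    "log_min_duration_statement": "0",
    "log_checkpoints": "on",
    "log_connections": "on",
    "log_disconnections": "on",
    "log_lock_waits": "on",
    "log_temp_files": "0",
    "log_autovacuum_min_duration": "0",
}


def parse_conf(text):
    """Return {name: value} for uncommented 'name = value' lines.

    Simplification: everything after '#' is treated as a comment, so a
    quoted value that itself contains '#' would be cut short.
    """
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        match = re.match(r"^(\w+)\s*[=\s]\s*(.+)$", line)
        if match:
            settings[match.group(1)] = match.group(2).strip().strip("'")
    return settings


def missing_settings(conf_text, recommended=RECOMMENDED):
    """List recommended settings that are absent or set to another value."""
    actual = parse_conf(conf_text)
    return sorted(name for name, value in recommended.items()
                  if actual.get(name) != value)


def prefix_ok(conf_text):
    """Check that log_line_prefix has a timestamp and a process escape."""
    prefix = parse_conf(conf_text).get("log_line_prefix", "")
    has_time = any(esc in prefix for esc in ("%t", "%m", "%n"))
    has_pid = any(esc in prefix for esc in ("%p", "%c"))
    return has_time and has_pid


sample = """
log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h '
log_min_duration_statement = 0
log_checkpoints = on
#log_connections = on
log_temp_files = 0   # log every temp file
"""
print(missing_settings(sample))
print(prefix_ok(sample))
```

Running a check like this before the first pgBadger run can save a day of collecting logs that turn out to be missing connection or autovacuum events.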
Docker Compose Setup
Generating Reports
pgBadger’s HTML reports include: hourly query volume charts, slowest queries ranked by duration, most frequent queries, connection spikes, checkpoint timing, autovacuum activity, error distribution, and temporary file usage. This makes it ideal for weekly performance reviews and identifying trends over time.
pg_stat_statements: Real-Time Query Monitoring
pg_stat_statements is a query performance extension that ships with PostgreSQL as a contrib module. Unlike pgBadger (which analyzes log files after the fact), pg_stat_statements aggregates query statistics in real time within the database itself. It normalizes queries (replacing literals with $1, $2, etc.), groups identical query patterns, and tracks execution counts, total time, rows returned, shared block hits/reads, and more.
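To make the normalization idea concrete, here is a toy sketch that replaces literals with numbered placeholders and aggregates timings per pattern. It is only an illustration: the real extension derives its query identifier from the parsed statement rather than from regular expressions, and the function names fingerprint and aggregate are hypothetical.

```python
import re
from collections import defaultdict

# One pattern, applied in a single left-to-right pass: quoted strings
# (with '' as an escaped quote) or standalone numbers.
_LITERAL = re.compile(r"'(?:[^']|'')*'|(?<![\w$.])\d+(?:\.\d+)?(?!\w)")


def fingerprint(sql):
    """Replace literals with $1, $2, ... and collapse whitespace."""
    counter = 0

    def placeholder(_match):
        nonlocal counter
        counter += 1
        return f"${counter}"

    return " ".join(_LITERAL.sub(placeholder, sql).split())


def aggregate(executions):
    """Group (sql, duration_ms) pairs by fingerprint and sum the counters."""
    stats = defaultdict(lambda: {"calls": 0, "total_ms": 0.0})
    for sql, duration_ms in executions:
        entry = stats[fingerprint(sql)]
        entry["calls"] += 1
        entry["total_ms"] += duration_ms
    return dict(stats)


stats = aggregate([
    ("SELECT * FROM users WHERE email = 'john@example.com'", 1.5),
    ("SELECT * FROM users  WHERE email = 'jane@example.com'", 2.0),
    ("SELECT * FROM orders WHERE id = 42", 0.5),
])
for pattern, entry in stats.items():
    print(entry["calls"], entry["total_ms"], pattern)
```

The two users queries collapse into one entry with two calls, which is the same grouping effect that makes the extension's output readable on systems that run millions of distinct statements.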
Enabling pg_stat_statements
Key Queries
pg_stat_statements is well suited to real-time dashboards: you can poll the pg_stat_statements view every 30 seconds and feed the results into Grafana, Prometheus, or a custom monitoring stack. For comprehensive PostgreSQL monitoring, see our PostgreSQL monitoring comparison.
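One detail matters when polling: the counters in pg_stat_statements are cumulative since the last reset, so a dashboard usually needs the difference between two snapshots. The sketch below assumes each snapshot has already been fetched into a dictionary keyed by queryid, holding the calls and total_exec_time columns; the function name interval_stats is hypothetical.

```python
def interval_stats(previous, current, interval_s):
    """Turn two cumulative snapshots into per-interval figures.

    Each snapshot maps queryid -> (calls, total_exec_time_ms).
    Returns queryid -> (calls_per_second, mean_ms_in_interval).
    """
    result = {}
    for queryid, (calls, total_ms) in current.items():
        prev_calls, prev_ms = previous.get(queryid, (0, 0.0))
        delta_calls = calls - prev_calls
        if delta_calls <= 0:
            # Query was idle, or the statistics were reset between polls.
            continue
        result[queryid] = (delta_calls / interval_s,
                           (total_ms - prev_ms) / delta_calls)
    return result


earlier = {101: (100, 500.0)}
later = {101: (160, 800.0), 202: (30, 60.0)}
print(interval_stats(earlier, later, interval_s=30))
```

Skipping non-positive deltas is a deliberate simplification: after a statistics reset the first interval is dropped rather than reported with misleading negative rates.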
CSV Log Analysis: Custom Forensic Investigation
For ad-hoc debugging or when you need metrics that neither pgBadger nor pg_stat_statements provides, direct CSV log analysis gives you complete flexibility. PostgreSQL can write logs in CSV format, which is easily importable into PostgreSQL itself for SQL-based analysis.
Enabling CSV Logging
Importing CSV Logs into PostgreSQL
This approach shines when you’re investigating a specific incident — for example, tracking down which user ran a destructive query at 3 AM, or identifying a connection flood pattern that doesn’t show up in aggregated statistics.
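As an illustration of that kind of investigation, the following sketch scans csvlog text for statements matching a pattern inside a time window and reports who ran them. It assumes statements are being logged at all (for example via log_statement or log_min_duration_statement), relies on the documented csvlog column order, and uses made-up sample rows that are truncated after the message column. The function name find_statements is hypothetical.

```python
import csv
import io
from datetime import datetime

# Column positions in PostgreSQL's csvlog format (0-based).
LOG_TIME, USER_NAME, DATABASE_NAME = 0, 1, 2
ERROR_SEVERITY, MESSAGE = 11, 13


def find_statements(csv_text, pattern, start, end):
    """Yield (log_time, user, database, message) for rows in [start, end)."""
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) <= MESSAGE:
            continue  # blank or malformed line
        # log_time looks like '2024-05-01 03:02:11.456 UTC'. Only the first
        # 19 characters are parsed, so the time zone is ignored here.
        when = datetime.strptime(row[LOG_TIME][:19], "%Y-%m-%d %H:%M:%S")
        if start <= when < end and pattern.lower() in row[MESSAGE].lower():
            yield when, row[USER_NAME], row[DATABASE_NAME], row[MESSAGE]


# Fabricated sample rows for demonstration only.
SAMPLE = (
    '2024-05-01 02:59:40.100 UTC,"app","shop",4101,"10.0.0.5:51234",'
    '6631a1c0.1005,1,"SELECT",2024-05-01 02:59:00 UTC,3/17,0,LOG,00000,'
    '"statement: SELECT count(*) FROM orders"\n'
    '2024-05-01 03:02:11.456 UTC,"alice","shop",4188,"10.0.0.9:40022",'
    '6631a2f3.105c,4,"DROP TABLE",2024-05-01 03:01:55 UTC,5/42,0,LOG,00000,'
    '"statement: DROP TABLE orders_archive"\n'
)

hits = list(find_statements(SAMPLE, "drop table",
                            datetime(2024, 5, 1, 3, 0),
                            datetime(2024, 5, 1, 4, 0)))
for when, user, database, message in hits:
    print(when, user, database, message)
```

For larger files the same filter is usually easier to express in SQL after importing the log into a table, but a small script like this is handy when the incident window is narrow and the file is already at hand.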
Why Self-Host Your Log Analysis
Running pgBadger and pg_stat_statements on your own infrastructure means you own your query performance data. Unlike managed PostgreSQL services that may meter or restrict access to log files, self-hosting gives you unlimited historical retention at the cost of your own storage. A busy PostgreSQL instance can generate 50-100 GB of logs per month — on a cloud service, that might mean choosing between high storage costs or losing diagnostic data.
For businesses handling sensitive data, log files contain query text that may include PII or business logic. Keeping logs on-premises eliminates the risk of exposing this data to a third-party analytics service. With pgBadger, you can even configure it to hash or anonymize query literals before generating reports, giving you performance insights without data leakage.
The open-source ecosystem also means flexibility. If pgBadger’s default reports don’t cover your specific needs, you can use its JSON output mode and pipe the data into your own visualization stack. For a complete PostgreSQL observability setup, pair pgBadger with our PostgreSQL admin tools guide. If you’re optimizing database performance, our database tuning guide covers the complementary task of configuration optimization.
Integration: A Complete Log Analysis Pipeline
For a comprehensive setup, use all three tools together:
- Real-time monitoring — pg_stat_statements feeding a Grafana dashboard with 30-second refresh for immediate query performance alerts.
- Daily reports — pgBadger cron job processing yesterday’s logs and emailing the HTML report to the team.
- Forensic investigation — CSV log import for ad-hoc analysis when debugging specific incidents.
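The daily report step can be sketched as a small helper that works out yesterday's log file and builds the pgBadger command line. The directory paths, the report file name, and the log_filename pattern postgresql-%Y-%m-%d.log are assumptions to adapt to your own setup; -j (parallel jobs) and -o (output file) are standard pgBadger options. The function name daily_report_command is hypothetical.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath


def daily_report_command(today, log_dir="/var/log/postgresql",
                         out_dir="/var/www/pgbadger", jobs=4):
    """Build the pgbadger argument list for yesterday's log file."""
    day = today - timedelta(days=1)
    # Assumes log_filename = 'postgresql-%Y-%m-%d.log' in postgresql.conf.
    log_file = PurePosixPath(log_dir) / f"postgresql-{day:%Y-%m-%d}.log"
    report = PurePosixPath(out_dir) / f"report-{day:%Y-%m-%d}.html"
    return ["pgbadger", "-j", str(jobs), "-o", str(report), str(log_file)]


command = daily_report_command(date(2024, 5, 2))
print(" ".join(command))

# From a cron-driven script, the command could then be executed with:
#   import subprocess
#   subprocess.run(daily_report_command(date.today()), check=True)
```

Building the argument list separately from running it keeps the date logic easy to test, and passing a list to subprocess.run avoids shell quoting issues with unusual paths.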
FAQ
How much overhead does pg_stat_statements add?
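pg_stat_statements adds little overhead. Its statistics live in a shared-memory hash table sized by pg_stat_statements.max (5,000 entries by default), and the query texts are kept in a file on disk rather than in shared_buffers. The per-query CPU cost is small: a fingerprint is computed from the parsed statement and a handful of counters are updated after execution. On modern hardware handling 10,000+ queries per second, the overhead is typically less than 2%, though it is worth measuring on your own workload. The extension has been battle-tested in production at companies like Instagram, Heroku, and GitLab.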
Can pgBadger handle logs from multiple PostgreSQL instances?
Yes. pgBadger can process logs from multiple servers simultaneously if you prefix log lines with the server name. Use the %a format specifier in log_line_prefix to include an application name, or set syslog_ident to distinguish instances. Then feed all log files to pgBadger at once — it will group statistics by server in the report.
How long should I retain PostgreSQL logs?
This depends on your compliance requirements and available storage. For performance analysis, 30-90 days is typical — enough to identify weekly/monthly patterns. For security auditing, you may need 1-7 years depending on regulations. pgBadger reports compress well (a month of logs becomes a ~5-20 MB HTML report), so retaining historical reports is cheap even if you rotate the raw logs.
What if my queries contain sensitive data in the log?
Use log_parameter_max_length_on_error (PostgreSQL 13+) to control how much bind-parameter data appears in the log when a statement fails. pg_stat_statements normalizes literals automatically (it replaces 'john@example.com' with $1). pgBadger can obscure query literals in its reports via --anonymize. For CSV logs, limit what leaves the database by exporting only the columns you need after import: COPY (SELECT log_time, error_severity, message FROM postgres_log) TO .... Keep in mind that the message column can itself contain query text, so review it before sharing.
Is pg_stat_statements enough, or do I need pgBadger too?
pg_stat_statements gives you real-time query-level aggregation but doesn’t capture session-level metrics (connection counts, disconnection times), checkpoint timing, autovacuum details, or error distributions. pgBadger excels at these broader operational metrics and produces shareable reports. For a complete picture, use both: pg_stat_statements for immediate query tuning, pgBadger for daily/weekly operational reviews.