Introduction
When a production service crashes, hangs, or behaves unexpectedly, a debugger is often the most direct way to see what is really happening inside the running process. For Linux server administrators and SRE teams, proficiency with debugging tools is what turns reactive firefighting into methodical incident resolution.
Modern Linux debuggers have evolved far beyond gdb's 1980s origins. Today's tools support reverse execution, Python scripting, remote debugging, and core dump analysis, capabilities that make them indispensable for diagnosing complex server-side issues.
In this guide, we compare three powerful Linux debugging tools: gdb (the GNU Debugger, the industry standard), lldb (LLVM’s modern debugger with superior scripting), and rr (Mozilla’s record-and-replay debugger for deterministic debugging).
Tool Comparison
gdb — The Universal Debugger
gdb is the longest-established and most widely deployed debugger for Linux systems. It supports dozens of target architectures, programming languages (C, C++, Go, Rust, Fortran, and more), and debugging scenarios from live process attachment to post-mortem core dump analysis.
Key Features:
- Live process attach/detach without restart
- Core dump analysis for post-mortem debugging
- Remote debugging via gdbserver (embedded, containers, VMs)
- Python scripting API for custom commands and automation
- Reverse debugging (limited, via process record)
- Thread-aware debugging with scheduler locking
- Expression evaluation in multiple languages
- TUI (Text User Interface) mode for terminal-based visual debugging
Strengths: Available on essentially every Linux distribution. Extensive scripting capabilities via Python. Mature and battle-tested over more than 30 years. The gold standard for core dump analysis.
Limitations: Command-line interface can be intimidating for newcomers. Reverse debugging is slow (single-stepping with record/replay). Limited structured output for automation (though the Machine Interface protocol helps).
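As an illustration of the Python scripting API, here is a minimal sketch of a custom gdb command that prints one line per thread. It uses documented parts of gdb's Python API (gdb.Command, gdb.selected_inferior, InferiorThread.switch, gdb.newest_frame); the command name thread-summary and the file name are our own illustrative choices. The formatting helper is kept separate from the gdb calls so it can be tested outside gdb, and the import guard lets the file load in a plain Python interpreter.

```python
# thread_summary.py (hypothetical file name): load inside gdb with
#   source thread_summary.py
# then run "thread-summary" while attached to a process or a core file.


def format_thread_line(num, name, function):
    """Render one thread as an aligned line; missing values become '?'."""
    return f"{num:>4}  {(name or '?'):<20}  {function or '?'}"


try:
    import gdb  # only importable when this file runs inside gdb
except ImportError:
    gdb = None

if gdb is not None:

    class ThreadSummary(gdb.Command):
        """Print gdb thread number, thread name, and innermost function."""

        def __init__(self):
            super().__init__("thread-summary", gdb.COMMAND_USER)

        def invoke(self, arg, from_tty):
            original = gdb.selected_thread()
            for thread in gdb.selected_inferior().threads():
                thread.switch()  # make it current so newest_frame() refers to it
                frame = gdb.newest_frame()
                print(format_thread_line(thread.num, thread.name, frame.name()))
            if original is not None:
                original.switch()  # restore the thread the user had selected

    ThreadSummary()  # registers the command with gdb
```

On a hung service this gives a compact overview before drilling into individual stacks with bt.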
lldb — The Modern Contender
lldb is LLVM’s debugger, designed from the ground up with modern architecture and excellent scripting capabilities. It uses a modular, library-based design that makes it ideal for integration into IDEs and custom tooling.
Key Features:
- Full Python scripting with deep debugger state access
- Stable SB (Scripting Bridge) API, usable from C++ and Python, for programmatic access to targets, threads, and frames
- Expression parser based on Clang/LLVM (excellent C++ support)
- Watchpoints with conditional triggers
- Reverse execution when replaying rr recordings (far less mature than rr's gdb support)
- REPL-style interface with command history and tab completion
- Parallel debugging of multiple targets
- Native macOS support (in addition to Linux)
Strengths: Superior scripting and automation capabilities. First-class C++ debugging with LLVM's expression parser. Clean, modular architecture makes it extensible and easy to embed in IDEs and custom tools.
Limitations: Smaller community and ecosystem compared to gdb. Some gdb-specific extensions and scripts require porting. Less legacy architecture support.
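A minimal sketch of an lldb Python command shows the SB API style. It counts threads by the function in their innermost frame, a quick way to see where a hung service's threads are piled up. The SB calls (SBDebugger.GetSelectedTarget, SBProcess.GetThreadAtIndex, SBFrame.GetFunctionName) and the __lldb_init_module hook are part of lldb's Python interface; the command name top-functions and the file name are hypothetical. The registration pattern (command script add -f) is lldb-specific, which is one reason gdb scripts need porting.

```python
# top_functions.py (hypothetical file name): load inside lldb with
#   command script import /path/to/top_functions.py
# then run "top-functions" while stopped in a process or core file.
from collections import Counter


def summarize(function_names):
    """Return (function, thread_count) pairs, most common first."""
    counts = Counter(name or "?" for name in function_names)
    return counts.most_common()


def top_functions(debugger, command, result, internal_dict):
    """lldb command: count threads by the function in their innermost frame."""
    process = debugger.GetSelectedTarget().GetProcess()
    names = [
        process.GetThreadAtIndex(i).GetFrameAtIndex(0).GetFunctionName()
        for i in range(process.GetNumThreads())
    ]
    for function, count in summarize(names):
        result.AppendMessage(f"{count:>4}  {function}")


def __lldb_init_module(debugger, internal_dict):
    # Called by "command script import"; registers the function as a command.
    debugger.HandleCommand(
        f"command script add -f {__name__}.top_functions top-functions"
    )
```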
rr — Deterministic Replay Debugging
rr (Record and Replay) takes a fundamentally different approach: instead of debugging a live process, rr records an entire process execution, then allows you to replay it deterministically — forward AND backward — as many times as needed. This transforms debugging from a one-shot investigation into a repeatable process.
Key Features:
- Full reverse execution (step backward, reverse-continue, reverse-finish)
- Deterministic replay — the same recording always produces the same execution
- Modest recording overhead on many workloads (around 1.3x), though rr runs all threads on a single core, so CPU-parallel programs slow down far more
- Watchpoints that work in reverse (find when a variable last changed)
- Chaos mode for finding race conditions
- Seamless integration with gdb (gdb frontend)
- Shared memory recording for multi-process applications
- Syscall buffer for high-throughput I/O workloads
Strengths: Reverse debugging is a game-changer for complex bugs. Deterministic replay means you can share recordings with teammates (after packing them with rr pack). Modest overhead on many workloads makes it practical in CI pipelines and, selectively, on production hosts. Chaos mode surfaces many timing-dependent bugs that rarely appear in ordinary runs.
Limitations: Linux only, on x86/x86-64 and, on supported CPU models, AArch64. Requires hardware performance counters (may conflict with profiling tools, and often unavailable in VMs without PMU virtualization). Recording overhead may still disturb timing-sensitive applications. Not all CPU features are supported (for example, some AVX-512 variants).
Comparison Table
| Feature | gdb | lldb | rr |
|---|---|---|---|
| Architecture | Monolithic C++ (originally C) | Modular C++ libraries (LLVM) | Record-replay engine |
| Reverse Debugging | Limited (process record, slow) | Limited (rr replay support is immature) | Full (native) |
| Python Scripting | Extensive API | Superior SB API | Via gdb frontend |
| Core Dump Analysis | Mature, comprehensive | Growing support | N/A (needs recording) |
| Live Process Attach | Yes | Yes | Record first, then debug |
| Production Overhead | Zero (when not attached) | Zero (when not attached) | ~1.3x on many workloads while recording; more for multithreaded code |
| Remote Debugging | gdbserver (mature) | lldb-server (good) | File-based |
| C++ Support | Good (GCC-focused) | Excellent (Clang/LLVM) | Inherits from frontend |
| Recording Portability | N/A | N/A | Portable after rr pack (compatible CPU required) |
| Best For | Core dumps, embedded, legacy | IDE integration, scripting | Complex/heisenbugs, intermittent failures |
Practical Server Debugging Workflows
Debugging a Running Service with gdb
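For a hung or misbehaving service, the usual first move is to attach gdb non-interactively and dump every thread's stack. The sketch below wraps that in Python with subprocess; the gdb flags (-nx, -batch, -p, -ex) are standard, while the helper names are hypothetical. Attaching pauses the process while gdb runs, and it must be done as root or as the service's user on a host whose kernel.yama.ptrace_scope allows attaching to non-child processes.

```python
import subprocess


def gdb_attach_backtrace_cmd(pid, gdb="gdb"):
    """Build a gdb command line that attaches to `pid` and prints all stacks."""
    return [
        gdb,
        "-nx",       # ignore ~/.gdbinit so the output is predictable
        "-batch",    # run the -ex commands, then exit (detaching from the process)
        "-p", str(pid),
        "-ex", "thread apply all bt",
    ]


def capture_backtraces(pid, timeout=60):
    """Run gdb and return its stdout (the backtraces) as text."""
    completed = subprocess.run(
        gdb_attach_backtrace_cmd(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    return completed.stdout
```

Because gdb detaches when batch mode ends, the service resumes afterward. Taking two or three snapshots a few seconds apart shows whether threads are stuck in the same place or making progress.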
Core Dump Analysis with gdb
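Post-mortem analysis follows the same batch pattern, with the binary and the core file passed as positional arguments. This sketch builds that invocation and extracts function names and locations from the backtrace text, for example to group crash reports by their top frame. The regular expression is our own simplification of gdb's frame format, not a stable interface, and the helper names are hypothetical.

```python
import re


def gdb_core_cmd(binary, core):
    """gdb command line that prints backtraces from a core file and exits."""
    return [
        "gdb", "-nx", "-batch",
        "-ex", "bt full",              # crashing thread: frames plus locals
        "-ex", "thread apply all bt",  # stacks of every other thread
        binary, core,
    ]


# Matches frame lines such as:
#   #0  0x00007f0000001234 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#   #1  0x0000555555555190 in handle_request (req=0x0) at server.c:42
_FRAME = re.compile(
    r"^#(?P<num>\d+)\s+(?:0x[0-9a-fA-F]+ in )?(?P<func>.+?) \(.*\)"
    r"(?: at (?P<file>\S+):(?P<line>\d+)| from (?P<lib>\S+))?"
)


def parse_backtrace(text):
    """Return (frame_number, function, location) tuples from gdb 'bt' output."""
    frames = []
    for line in text.splitlines():
        m = _FRAME.match(line.strip())
        if m is None:
            continue  # thread headers, locals from 'bt full', signal frames
        if m.group("file"):
            where = f"{m.group('file')}:{m.group('line')}"
        else:
            where = m.group("lib")  # shared library path, or None
        frames.append((int(m.group("num")), m.group("func"), where))
    return frames
```

Analyze the core with exactly the binary and shared libraries that produced it; with a mismatched build, the backtrace is unreliable.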
rr Recording for Intermittent Failures
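rr shines when debugging crashes that happen only once every thousand requests: record the service while it handles load, and when it finally crashes, replay that exact execution as many times as needed.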
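How much traffic does a recording need to cover? If each request fails independently with probability p (about 0.001 for "once every thousand requests"), the chance of seeing at least one failure in n requests is 1 - (1 - p)^n. Solving for n gives a rough recording budget. The independence assumption is ours; real heisenbugs are often correlated with load or timing, so treat the result as a rough guide. The function name is hypothetical.

```python
import math


def requests_needed(p, confidence=0.99):
    """Smallest n with 1 - (1 - p)**n >= confidence, assuming independent failures."""
    if not 0 < p < 1 or not 0 < confidence < 1:
        raise ValueError("p and confidence must be strictly between 0 and 1")
    # log1p(-x) computes log(1 - x) accurately even when x is tiny.
    return math.ceil(math.log1p(-confidence) / math.log1p(-p))
```

For example, requests_needed(0.001) returns 4603: a recording covering about 4,600 requests has a 99% chance of containing at least one failure. Since rr slows the service down, expect that many requests to take longer to arrive than they would without recording.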
Automated Bug Detection with rr Chaos Mode
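rr's chaos mode (enabled with rr record --chaos) randomly perturbs thread scheduling to expose race conditions. A common pattern is to rerun a flaky test under chaos-mode recording until it fails, then replay the failing recording.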
This technique can surface concurrency bugs that might otherwise go unnoticed in production for months.
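A minimal harness for the rerun-until-failure pattern, assuming rr record exits with the recorded program's exit status (worth confirming with your rr version). --chaos is rr record's chaos-mode flag; the function name and the injectable run parameter (which lets the loop be tested without rr) are our own.

```python
import subprocess


def record_until_failure(test_cmd, max_runs=100, chaos=True, run=subprocess.run):
    """Re-run test_cmd under 'rr record' until it exits non-zero.

    Returns the 1-based attempt number of the first failure, or None if all
    runs passed. rr keeps every recording, so after a failure the newest one
    (the trace that a plain 'rr replay' opens) is the failing run.
    Assumes 'rr record' propagates the recorded program's exit status.
    """
    argv = ["rr", "record"] + (["--chaos"] if chaos else []) + list(test_cmd)
    for attempt in range(1, max_runs + 1):
        completed = run(argv, check=False)
        if completed.returncode != 0:
            return attempt
    return None
```

For example, record_until_failure(["./race_test"], max_runs=500), where ./race_test stands in for your own test binary, stops at the first failing run; rr replay then opens that recording. Every attempt leaves a trace on disk, so clean up old recordings after a long campaign.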
Why Self-Host Your Debugging Infrastructure?
Running debugging tools on your own servers — rather than relying on cloud-based debugging services — offers several critical advantages for security-conscious organizations:
Data sovereignty: Core dumps and process recordings contain sensitive data, including memory contents, encryption keys, user data, and proprietary algorithms. Keeping all debugging data local simplifies compliance with regulations and frameworks such as GDPR, HIPAA, and SOC 2.
No external dependencies: When a production outage is costing thousands per minute, you cannot afford to wait for a cloud debugging service to become available. Self-hosted tools work even during network partitions or cloud provider outages.
Custom automation: Python scripting in gdb and lldb enables custom debugging workflows tailored to your specific application architecture. Automate repetitive investigation steps, generate structured reports, and integrate with your monitoring stack.
Zero-cost scaling: Debug as many processes as you need — there are no per-seat licenses, API rate limits, or usage quotas when using open-source debugging tools.
For complementary server diagnostics, see our network diagnostics tools guide and Linux network interface diagnostics.
FAQ
Can I use gdb and lldb interchangeably?
Not entirely. While both debug ELF binaries and support similar commands, their scripting APIs and advanced features differ significantly. gdb scripts won’t run in lldb without porting. For most basic debugging tasks (setting breakpoints, inspecting variables, backtraces), either tool works fine. Choose gdb if you need mature core dump support or remote embedded debugging. Choose lldb if you need superior Python scripting or C++ expression evaluation.
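Basic commands often carry over directly, since lldb accepts gdb-style aliases such as bt, b, and p, but many inspection commands are spelled differently. This small lookup table, drawn from the LLDB project's gdb-to-lldb command map, covers a few that come up often in server debugging. The table and helper are illustrative, not exhaustive.

```python
# gdb commands whose lldb spelling differs (argument-free forms only).
GDB_TO_LLDB = {
    "info threads": "thread list",
    "thread apply all bt": "thread backtrace all",
    "info registers": "register read",
    "info frame": "frame info",
    "info locals": "frame variable --no-args",
    "info args": "frame variable --no-locals",
    "info sharedlibrary": "image list",
}


def to_lldb(gdb_command):
    """Translate a gdb command via the table; return None if it is not listed."""
    normalized = " ".join(gdb_command.split())  # collapse repeated whitespace
    return GDB_TO_LLDB.get(normalized)
```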
Does rr work with Docker containers?
Yes, with some configuration. rr relies on ptrace and on CPU performance counters (via the perf_event_open system call), which Docker's default security profile blocks. Run the container with --cap-add=SYS_PTRACE --security-opt seccomp=unconfined, and make sure the host allows access to performance counters (kernel.perf_event_paranoid set to 1 or lower). For Kubernetes, you'll need a privileged container or a custom security context. Many teams run rr recordings inside the same container image as their production service to ensure binary compatibility.
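Before trying rr in a container, it helps to check the performance-counter policy first. kernel.perf_event_paranoid is a host-wide sysctl rather than a per-container one, so reading it inside a container reports the host's setting. A minimal preflight sketch; the function name is hypothetical, and the threshold of 1 is the level rr documents for recording as an unprivileged user.

```python
from pathlib import Path


def rr_counters_allowed(path=Path("/proc/sys/kernel/perf_event_paranoid")):
    """Return True if kernel.perf_event_paranoid is 1 or lower."""
    try:
        level = int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # Missing file (non-Linux host, restricted /proc) or unexpected contents.
        return False
    return level <= 1
```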
How do I debug a multi-threaded deadlock?
When the deadlock can be reproduced, rr is one of the most effective tools for analyzing it. Record the execution once, then replay it under gdb with breakpoints on the locking functions and step through the lock acquisition order in each thread involved. With reverse execution, you can trace backward from the deadlock point to see exactly which thread acquired which lock and in what order, something that is nearly impossible with live debugging alone. If the deadlock is happening right now in production, start by attaching gdb and running thread apply all bt to see where every thread is blocked.
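Once you know which mutex each blocked thread is waiting on and which thread owns each mutex (with glibc, printing a pthread_mutex_t's __data.__owner field in gdb shows the owner's LWP id), a deadlock is a cycle in the wait-for graph. A minimal sketch of that check; the function name, thread ids, and lock names in the test are hypothetical.

```python
def find_deadlock(owners, waiting):
    """Return thread ids forming a wait cycle, or None if there is none.

    owners:  lock name -> thread id currently holding it
    waiting: thread id -> lock name it is blocked on
    Each thread waits on at most one lock and each mutex has at most one
    owner, so following "waits for lock -> lock's owner" edges from any
    thread either reaches a thread that is not blocked or loops into a cycle.
    """
    for start in waiting:
        path, seen = [], set()
        tid = start
        while tid in waiting and tid not in seen:
            seen.add(tid)
            path.append(tid)
            tid = owners.get(waiting[tid])  # None if nobody holds that lock
        if tid in seen:
            return path[path.index(tid):]  # drop the lead-in, keep the cycle
    return None
```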
What’s the best way to debug a crashing daemon that only fails under load?
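Use rr to record the daemon under load until it crashes; because the recording is deterministic, you can replay the crash as many times as needed. Keep in mind that rr runs all threads on a single core, which changes the load profile. Alternatively, configure systemd to keep core dumps: set LimitCORE=infinity in the service unit file (or DefaultLimitCORE=infinity in /etc/systemd/system.conf to apply it to all services). On hosts that use systemd-coredump, the dumps can then be listed and opened with coredumpctl. Core dumps capture the exact state at crash time, allowing offline analysis with gdb.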
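To confirm that a running daemon would actually leave a core dump, read its effective limit from /proc/<pid>/limits and the kernel's /proc/sys/kernel/core_pattern. A minimal sketch; the function names are hypothetical, and the PID could come from, for example, systemctl show -p MainPID <unit>.

```python
from pathlib import Path


def soft_core_limit(limits_text):
    """Soft 'Max core file size' from /proc/<pid>/limits: bytes, or None if unlimited."""
    label = "Max core file size"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft = line[len(label):].split()[0]
            return None if soft == "unlimited" else int(soft)
    raise ValueError(f"no {label!r} line found")


def describe_core_setup(limit, core_pattern):
    """Summarize what a crash would produce under this limit and core_pattern."""
    if limit == 0:
        # A soft limit of 0 normally means no core is kept (raise it with LimitCORE=).
        return "disabled: core size limit is 0"
    size = "unlimited" if limit is None else f"up to {limit} bytes"
    if core_pattern.startswith("|"):
        # A leading '|' means the kernel pipes the dump to a helper program.
        return f"{size}, piped to {core_pattern[1:].split()[0]}"
    return f"{size}, written using pattern {core_pattern!r}"


def check_pid(pid):
    """Read the live settings for one process (Linux only)."""
    limits = Path(f"/proc/{pid}/limits").read_text()
    pattern = Path("/proc/sys/kernel/core_pattern").read_text().strip()
    return describe_core_setup(soft_core_limit(limits), pattern)
```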
How do I debug a service without stopping it?
gdb supports non-stop mode, in which threads you are not examining keep running instead of the whole process freezing. Enable it with set non-stop on before attaching (for example, gdb -iex "set non-stop on" -p PID). lldb exposes a target.non-stop-mode setting, but its non-stop support is much less mature than gdb's. This is critical for debugging production services where stopping all threads would trigger health check failures or client timeouts. Note that non-stop mode has limitations: you cannot reliably examine data structures that other threads are modifying concurrently.
Is there a web-based interface for Linux debugging?
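While gdb and lldb are fundamentally CLI tools, several projects add graphical and web front ends; gdbgui, for example, serves a browser-based interface to gdb. gdb's built-in TUI mode (gdb -tui) provides a terminal-based split-screen view with source code and command input. For remote collaboration, you can combine gdb with tty-share or tmux for shared debugging sessions. For automated analysis, gdb's Machine Interface (gdb -i=mi) emits structured, machine-parseable records. The format is GDB/MI's own line-oriented syntax rather than JSON, but libraries such as pygdbmi convert it into Python dictionaries that can power custom web dashboards.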
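To show what GDB/MI output looks like and why it is not JSON, here is a minimal sketch of a parser for MI result records such as ^done,bkpt={number="1",...}. It handles the value forms of the GDB/MI output syntax (C strings, {tuples}, [lists]) but deliberately ignores async records (*stopped, =thread-created) and stream records (~"..."). The class and function names are our own; for real tooling, a maintained library such as pygdbmi is the safer choice.

```python
class _MIParser:
    """Tiny recursive-descent parser for GDB/MI values."""

    def __init__(self, text):
        self.text, self.pos = text, 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, ch):
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r} at offset {self.pos}")
        self.pos += 1

    def name(self):
        start = self.pos
        while self.peek() and self.peek() not in '=,{}[]"':
            self.pos += 1
        return self.text[start:self.pos]

    def cstring(self):
        self.expect('"')
        out = []
        while True:
            ch = self.peek()
            if not ch:
                raise ValueError("unterminated string")
            self.pos += 1
            if ch == '"':
                return "".join(out)
            if ch == "\\":  # backslash escapes: \" \\ \n \t
                esc = self.peek()
                self.pos += 1
                out.append({"n": "\n", "t": "\t"}.get(esc, esc))
            else:
                out.append(ch)

    def result(self):
        key = self.name()
        self.expect("=")
        return key, self.value()

    def value(self):
        ch = self.peek()
        if ch == '"':
            return self.cstring()
        if ch not in ("{", "["):
            raise ValueError(f"unexpected {ch!r} at offset {self.pos}")
        close = "}" if ch == "{" else "]"
        self.pos += 1
        items = {} if ch == "{" else []
        while self.peek() != close:
            if ch == "{":
                key, val = self.result()
                items[key] = val
            elif self.peek() in '"{[':
                items.append(self.value())  # list of plain values
            else:
                key, val = self.result()
                items.append({key: val})  # list of name=value results
            if self.peek() == ",":
                self.pos += 1
            elif self.peek() != close:
                raise ValueError(f"expected ',' or {close!r} at offset {self.pos}")
        self.pos += 1
        return items


def parse_result_record(line):
    """Parse '[token]^class[,name=value]...' into (token, class, results)."""
    p = _MIParser(line.strip())
    while p.peek().isdigit():
        p.pos += 1
    token = int(p.text[:p.pos]) if p.pos else None
    p.expect("^")
    cls = p.name()
    results = {}
    while p.peek() == ",":
        p.pos += 1
        key, val = p.result()
        results[key] = val
    if p.peek():
        raise ValueError(f"trailing text at offset {p.pos}")
    return token, cls, results
```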