Every distributed system generates logs — application output, access logs, error traces, audit events, and metrics. Getting those logs from dozens or hundreds of services into a central location for search, alerting, and analysis is one of the most fundamental infrastructure challenges. That is where log shippers come in.
Log shippers (also called log forwarders or log agents) collect logs from various sources, transform and enrich them, and ship them to a central destination like Elasticsearch, Loki, OpenSearch, or a cloud storage bucket. Running this pipeline yourself means full control over what data leaves your servers, how it is processed, and where it ends up.
In this guide, we compare the three most widely used open-source log shippers: Vector (developed by Datadog), Fluent Bit (a CNCF graduated project), and Logstash (from Elastic). We cover installation, configuration, performance characteristics, and the architecture patterns where each tool shines.
Why Self-Host Log Shipping?
Log data is often the most sensitive telemetry you collect. It contains user identifiers, internal IP addresses, application secrets that accidentally leak to stderr, database queries, and the full topology of your infrastructure. Self-hosting your log pipeline gives you several advantages:
Data sovereignty. Logs never traverse third-party networks. For regulated industries (healthcare, finance, government), keeping log data within your own infrastructure is often a compliance requirement.
Cost control. Commercial log ingestion services charge per gigabyte ingested. At scale, this can become the single largest line item in your observability budget. Self-hosted shippers let you send logs to your own storage — S3-compatible buckets, local filesystems, or self-hosted Elasticsearch clusters — without per-GB fees.
Custom transformation. Every team has unique log formats. Self-hosted shippers let you write custom parsing rules, redact sensitive fields, add metadata, and reshape events before they reach your storage layer.
Resilience. When your central logging backend goes down, a self-hosted shipper can buffer logs locally and retry delivery, preventing data loss during outages.
No vendor lock-in. Your log pipeline configuration is portable. Switch your backend from Elasticsearch to Loki to OpenSearch without changing your collection layer.
Vector: Best for Performance and Reliability
Vector is a high-performance observability data pipeline built in Rust by Datadog. It collects, transforms, and routes logs, metrics, and traces with a focus on reliability, speed, and memory safety. Vector has become the go-to choice for teams that need to process large volumes of telemetry with minimal resource overhead.
Key Features
- Built in Rust — memory-safe, zero garbage collection pauses, minimal CPU overhead
- At-least-once delivery — disk-buffered queues ensure no data is lost during restarts or network failures
- Built-in transforms — remap language for parsing, redacting, enriching, and reshaping events
- Multi-format support — collects logs, metrics, and traces in a single agent
- Topology awareness — understand exactly how data flows through your pipeline with the vector top command
- Hot-reloadable configuration — change pipelines without restarting the agent
Installation with Docker Compose
Run Vector as a daemon on each node, collecting logs from local files, Docker containers, or systemd journal:
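A minimal docker-compose.yml for this setup might look like the following sketch; the image tag and volume paths are assumptions to adjust for your environment:

```yaml
services:
  vector:
    image: timberio/vector:0.41.1-debian   # pin a tag appropriate for your deployment
    container_name: vector
    restart: unless-stopped
    volumes:
      - ./vector.toml:/etc/vector/vector.toml:ro
      # Docker socket so the docker_logs source can read container output
      - /var/run/docker.sock:/var/run/docker.sock:ro
      # Persistent directory for disk buffers (at-least-once delivery)
      - vector-data:/var/lib/vector

volumes:
  vector-data:
```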
Configuration Example
Vector uses TOML configuration. Here is a production setup that collects Docker container logs, parses JSON output, redacts sensitive fields, and ships to Loki:
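One possible vector.toml matching that description; the Loki endpoint and the redacted field names are placeholders, not part of any standard schema:

```toml
# vector.toml (a sketch: adjust endpoint and field names to your environment)
data_dir = "/var/lib/vector"

[api]
enabled = true          # required for `vector top`

[sources.docker]
type = "docker_logs"

[transforms.parse]
type = "remap"
inputs = ["docker"]
source = '''
# Parse JSON stdout where possible, keep the raw message otherwise
parsed, err = parse_json(string!(.message))
if err == null && is_object(parsed) {
  . = merge!(., object!(parsed))
}
# Redact commonly sensitive fields (field names here are examples)
if exists(.password) { .password = "[REDACTED]" }
if exists(.api_key)  { .api_key  = "[REDACTED]" }
'''

[sinks.loki]
type = "loki"
inputs = ["parse"]
endpoint = "http://loki:3100"
encoding.codec = "json"
labels.job = "docker"
labels.container = "{{ container_name }}"

[sinks.loki.buffer]
type = "disk"
max_size = 268435488    # ~256 MB, the minimum size for Vector disk buffers
when_full = "block"
```

The disk buffer on the sink is what provides the at-least-once delivery guarantee: events survive agent restarts and Loki outages.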
Start the agent:
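Bringing the agent up is a standard Compose invocation; with the API enabled in the config, vector top shows live per-component event rates:

```sh
docker compose up -d
docker compose logs -f vector          # watch startup output
docker compose exec vector vector top  # live topology and throughput view
```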
On Kubernetes, Vector is typically deployed as a DaemonSet instead; with a manifest in hand, apply it with kubectl apply -f vector-k8s-daemonset.yaml.
Fluent Bit: Best for Lightweight Edge Collection
Fluent Bit is a fast and lightweight log processor and forwarder built in C. It is a CNCF graduated project and the de facto standard for edge log collection, especially in resource-constrained environments. Fluent Bit is designed to run with minimal memory (often under 10 MB) and low CPU usage.
Key Features
- Written in C — extremely low memory footprint and CPU usage
- CNCF graduated project — production-hardened, widely adopted in Kubernetes
- Rich plugin ecosystem — 100+ input, filter, and output plugins
- Native Kubernetes integration — automatic pod and namespace metadata enrichment
- Stream processing — SQL-like queries for real-time log analysis
- Multi-tenant routing — route different log streams to different backends
Installation with Docker Compose
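A compose file for a node-local Fluent Bit might look like this sketch; the image tag is an assumption, and the host log path assumes Docker's default json-file logging driver:

```yaml
services:
  fluent-bit:
    image: fluent/fluent-bit:3.1   # pin a tag appropriate for your deployment
    container_name: fluent-bit
    restart: unless-stopped
    volumes:
      - ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf:ro
      - ./parsers.conf:/fluent-bit/etc/parsers.conf:ro
      # Read container log files directly from the host
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    ports:
      - "2020:2020"   # HTTP monitoring endpoint
```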
Configuration Example
Fluent Bit uses an INI-style configuration with sections for inputs, filters, and outputs:
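A sketch of the two files, tailing Docker's JSON log files and shipping to Loki; hostnames, ports, and the tag pattern are assumptions:

```ini
# fluent-bit.conf
[SERVICE]
    Flush         5
    Log_Level     info
    Parsers_File  parsers.conf
    HTTP_Server   On
    HTTP_Listen   0.0.0.0
    HTTP_Port     2020

[INPUT]
    Name          tail
    Path          /var/lib/docker/containers/*/*-json.log
    Parser        docker
    Tag           docker.*
    Mem_Buf_Limit 10MB

[FILTER]
    Name          modify
    Match         docker.*
    Add           hostname ${HOSTNAME}

[OUTPUT]
    Name          loki
    Match         docker.*
    Host          loki
    Port          3100
    Labels        job=fluent-bit

# parsers.conf
[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
```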
Start the collector:
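With the HTTP server enabled in [SERVICE], the monitoring endpoint confirms the collector is healthy:

```sh
docker compose up -d
curl -s http://localhost:2020/   # returns build and uptime info as JSON
```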
Fluent Bit Stream Processing
Fluent Bit includes a stream processing engine that lets you run SQL-like queries on log streams in real time:
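Stream tasks live in a separate file referenced from [SERVICE] via Streams_File. This sketch counts error-level records per minute; the filename, tag pattern, and level field are assumptions:

```ini
# streams.conf (enable with `Streams_File streams.conf` under [SERVICE])
[STREAM_TASK]
    Name  error_counter
    Exec  CREATE STREAM errors WITH (tag='metrics.errors') AS SELECT COUNT(*) AS error_count FROM TAG:'docker.*' WINDOW TUMBLING (60 SECOND) WHERE level = 'error';
```

The resulting stream is re-emitted into the pipeline under the new tag, so a regular [OUTPUT] section can route the aggregated counts to a metrics backend.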
Logstash: Best for Complex Transformation Pipelines
Logstash is the oldest and most feature-rich log shipper in the Elastic ecosystem. Written in Java with JRuby for its plugin system, Logstash excels at complex data transformation, enrichment, and multi-stage processing pipelines. If your log processing requirements involve heavy parsing, external API lookups, geo-IP enrichment, or multi-step data manipulation, Logstash is the most capable option.
Key Features
- 200+ plugins — the largest plugin ecosystem of any log shipper
- Powerful filter pipeline — grok parsing, mutate, geoip, dns, ruby, and more
- JRuby plugin system — write custom filters in Ruby for unlimited flexibility
- Deep Elastic integration — native Elasticsearch, Beats, and Kibana support
- Dead letter queue — captures events that fail processing for later analysis
- Persistent queues — disk-based queues for reliable delivery
Installation with Docker Compose
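A compose sketch for a standalone Logstash; the image tag and heap sizes are assumptions, and the data volume matters because persistent queues and the dead letter queue live there:

```yaml
services:
  logstash:
    image: docker.elastic.co/logstash/logstash:8.15.0   # pin your version
    container_name: logstash
    restart: unless-stopped
    environment:
      LS_JAVA_OPTS: "-Xms1g -Xmx1g"   # Logstash needs real JVM heap headroom
    volumes:
      - ./pipeline:/usr/share/logstash/pipeline:ro
      - logstash-data:/usr/share/logstash/data   # persistent queues and DLQ
    ports:
      - "5044:5044"   # Beats input

volumes:
  logstash-data:
```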
Configuration Example
Logstash configuration uses a three-stage pipeline: input, filter, and output:
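A sketch of a pipeline that parses web access logs, enriches with geo-IP, and writes to Elasticsearch; it assumes ecs_compatibility is disabled so grok emits classic field names such as clientip:

```conf
# pipeline/logstash.conf
input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
  geoip {
    source => "clientip"
  }
  mutate {
    remove_field => ["timestamp"]   # the date filter has already consumed it
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "web-logs-%{+YYYY.MM.dd}"
  }
}
```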
Start Logstash:
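It is worth validating pipeline syntax before relying on a restart; the separate --path.data avoids a lock conflict with the running instance:

```sh
docker compose up -d
docker compose exec logstash bin/logstash --config.test_and_exit \
  --path.data /tmp/config-test -f /usr/share/logstash/pipeline/
```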
Using the Dead Letter Queue
Logstash can capture events that fail processing — useful for debugging pipeline issues:
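The DLQ is enabled in logstash.yml; note that it currently captures only events the elasticsearch output rejects (for example, mapping conflicts), not arbitrary filter failures:

```yaml
# logstash.yml (path matches the official Docker image layout)
dead_letter_queue.enable: true
path.dead_letter_queue: /usr/share/logstash/data/dead_letter_queue
dead_letter_queue.max_bytes: 1024mb
```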
Failed events are stored and can be consumed by a separate Logstash pipeline for analysis:
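A minimal consumer pipeline sketch, reading failed events back out and printing them with the failure metadata attached:

```conf
# dlq-pipeline.conf (run as a second pipeline for inspection)
input {
  dead_letter_queue {
    path => "/usr/share/logstash/data/dead_letter_queue"
    commit_offsets => true   # remember position across restarts
  }
}

output {
  stdout {
    codec => rubydebug { metadata => true }   # show why each event failed
  }
}
```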
Head-to-Head Comparison
| Feature | Vector | Fluent Bit | Logstash |
|---|---|---|---|
| Language | Rust | C | Java (JRuby plugins) |
| Memory usage | 50-150 MB | 5-30 MB | 500 MB - 2 GB |
| CPU overhead | Very low | Extremely low | Moderate to high |
| Throughput | 500K+ events/sec | 200K+ events/sec | 50K-100K events/sec |
| Guaranteed delivery | ✅ Disk-buffered queues | ⚠️ Limited buffering | ✅ Persistent queues |
| Log collection | ✅ Files, Docker, K8s, syslog, TCP/UDP, HTTP | ✅ Files, Docker, K8s, syslog, TCP/UDP, HTTP | ✅ Files, Beats, syslog, TCP/UDP, HTTP, JDBC |
| Transformation | VRL (Remap language) | Lua, Stream Processor, grep/nest/modify | Grok, mutate, geoip, Ruby, 200+ filters |
| Multi-format | ✅ Logs + metrics + traces | ✅ Logs + metrics + traces | ✅ Logs + metrics |
| Output destinations | 40+ (Loki, ES, S3, Kafka, HTTP, etc.) | 60+ (Loki, ES, S3, Kafka, HTTP, etc.) | 100+ (ES, Loki, S3, Kafka, HTTP, etc.) |
| Kubernetes native | ✅ DaemonSet with K8s source | ✅ CNCF graduated, K8s filter | ⚠️ Typically paired with Filebeat |
| Configuration | TOML (declarative) | INI-style (sections) | Ruby DSL (pipeline) |
| Hot reload | ✅ Config changes without restart | ⚠️ Limited | ⚠️ Pipeline reload (some plugins) |
| Metrics endpoint | ✅ Built-in Prometheus metrics | ✅ HTTP monitoring | ✅ HTTP + JMX |
| Plugin ecosystem | Growing (Rust-based) | Large (C-based) | Largest (JRuby-based) |
| Best for | High-throughput, low-resource pipelines | Edge collection, IoT, containers | Complex transformation, Elastic stack |
Choosing the Right Log Shipper
Choose Vector if:
- You need the highest throughput with the lowest resource usage
- You want guaranteed delivery with disk-buffered queues
- You value memory safety and zero garbage collection pauses
- You need a unified pipeline for logs, metrics, and traces
- You want a modern, actively developed tool with excellent ergonomics
Choose Fluent Bit if:
- You are running in resource-constrained environments (edge, IoT, small VMs)
- You need the smallest possible memory footprint
- You want a CNCF graduated project with massive community adoption
- You run Kubernetes and want the standard DaemonSet log collector
- You need stream processing with SQL-like queries on log data
Choose Logstash if:
- You need complex, multi-stage transformation pipelines
- You already run the Elastic Stack (Elasticsearch, Kibana, Beats)
- You need geo-IP enrichment, DNS lookups, or external API calls during processing
- You want the largest plugin ecosystem with 200+ inputs, filters, and outputs
- You need the dead letter queue for debugging failed events
Production Architecture: Combining All Three
In large-scale deployments, the best architecture often combines multiple shippers at different layers:
In this architecture:
- Fluent Bit runs on every node, collecting logs with minimal overhead
- Vector aggregates and routes logs from multiple nodes, handling retries and buffering
- Logstash performs heavy transformation, enrichment, and multi-output routing
- Logs reach Loki for fast search, S3 for long-term archival, and Elasticsearch for deep analysis
This layered approach gives you the lightweight collection of Fluent Bit, the reliability of Vector, and the transformation power of Logstash — each doing what it does best.
Monitoring Your Log Pipeline
Whichever shipper you choose, monitor the pipeline itself:
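Each shipper exposes its own health endpoint. These invocations assume default ports: 9598 for Vector's prometheus_exporter sink, 2020 for Fluent Bit's HTTP server, and 9600 for Logstash's monitoring API:

```sh
# Vector: Prometheus metrics (internal_metrics source + prometheus_exporter sink)
curl -s http://localhost:9598/metrics | grep vector_

# Fluent Bit: HTTP monitoring endpoint (HTTP_Server On in [SERVICE])
curl -s http://localhost:2020/api/v1/metrics

# Logstash: node stats API, including per-pipeline event counts
curl -s "http://localhost:9600/_node/stats/pipelines?pretty"
```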
Set up alerts for buffer fullness, error rates, and output lag:
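A sketch of Prometheus alerting rules along those lines; the metric names are examples from Vector's and Fluent Bit's Prometheus exports and vary by version, so verify them against your own /metrics output first:

```yaml
groups:
  - name: log-pipeline
    rules:
      - alert: VectorBufferDiscardingEvents
        expr: rate(vector_buffer_discarded_events_total[5m]) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Vector is dropping events from a full buffer on {{ $labels.instance }}"
      - alert: VectorComponentErrors
        expr: rate(vector_component_errors_total[5m]) > 0
        for: 10m
        labels:
          severity: warning
      - alert: FluentBitOutputErrors
        expr: rate(fluentbit_output_errors_total[5m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Fluent Bit output errors on {{ $labels.instance }}"
```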
Conclusion
Self-hosted log shipping is the foundation of any serious observability stack. Vector leads on performance and reliability with its Rust implementation and guaranteed delivery semantics. Fluent Bit dominates the edge collection space with its tiny footprint and CNCF pedigree. Logstash remains unmatched for complex transformation pipelines with its vast plugin ecosystem and deep Elastic Stack integration.
Start with the tool that matches your constraints: Fluent Bit for resource-limited environments, Vector for high-throughput production systems, or Logstash for heavy data transformation needs. As your infrastructure grows, combine them in a layered architecture where each handles the workload it was designed for. Your logs are too important to lose — build a pipeline you can trust.
Frequently Asked Questions (FAQ)
Which one should I choose in 2026?
The best choice depends on your constraints:
- For beginners: Fluent Bit has the smallest configuration surface and is the easiest to stand up
- For production: Vector's disk-buffered, at-least-once delivery makes it the safest default for high-volume pipelines
- For Elastic Stack users: Logstash integrates most deeply with Elasticsearch, Beats, and Kibana
- For privacy: all three are fully open source and self-hosted, so log data only goes where your configuration sends it
Refer to the comparison table above for detailed feature breakdowns.
Can I migrate between these tools?
Yes. Because these are collection-layer tools, migrating means translating configuration rather than moving data. Always:
- Run the old and new shipper side by side on a staging host and compare their output events
- Re-test parsing and redaction rules, since each tool has its own transformation language
- Cut over one node or log stream at a time rather than all at once
Are there free versions available?
All three shippers in this guide are free and open source. Paid offerings exist around them, such as managed hosting and enterprise support, but the shippers themselves cost nothing to run.
How do I get started?
- Review the comparison table to identify your requirements
- Visit the official documentation (links provided above)
- Start with a Docker Compose setup for easy testing
- Join the community forums for troubleshooting