Self-Hosted Observability Data Pipelines: Vector vs Fluentd vs Fluent Bit

Introduction

Modern observability stacks ingest data from hundreds of sources — application logs, system metrics, distributed traces, Kubernetes events, and security audit streams. Before this data reaches your storage backends (Loki, Elasticsearch, ClickHouse, or S3), it needs to be collected, parsed, filtered, enriched, and routed. This is where observability data pipelines come in — they act as the central nervous system between telemetry producers and consumers.

Three open-source projects dominate this space: Vector (written in Rust, focused on performance), Fluentd (the established CNCF-graduated workhorse), and Fluent Bit (the lightweight C-based sibling designed for edge and containerized environments). Each brings a distinct philosophy to the same fundamental challenge.

Comparison Table

Feature	Vector	Fluentd	Fluent Bit
GitHub Stars	22,053	13,547	7,932
Language	Rust	Ruby (core) / C	C
License	MPL 2.0	Apache 2.0	Apache 2.0
Memory Usage (idle)	~15 MB	~40-60 MB	~5-10 MB
CPU Efficiency	Excellent (Rust, async)	Moderate (Ruby GIL)	Excellent (C, event-driven)
Throughput	100+ TB/day (single instance)	~10-20 TB/day	~5-10 TB/day
Built-in Sources	30+	100+ (via plugins)	20+ (extendable)
Built-in Sinks	40+	100+ (via plugins)	25+ (extendable)
Transform Language	VRL (Vector Remap Language)	Filter plugins + Ruby DSL	Lua scripting + filters
Hot Reload	Native	Via signal	Native
Kubernetes Native	Helm + Operator	Helm + fluentd-kubernetes	Helm + DaemonSet
CNCF Status	Not a CNCF project	Graduated	Graduated (under the Fluentd project)
Last Updated	June 2026	June 2026	June 2026

Vector: Performance-First Pipeline Architecture

Vector, originally built by Timber.io and now maintained by Datadog, represents the new generation of observability pipelines. Written in Rust with an async, topology-based architecture, Vector delivers high throughput with low resource consumption. Its standout feature is VRL (Vector Remap Language), a purpose-built expression language for parsing, transforming, and enriching observability data without the overhead of an embedded scripting runtime.

Key strengths: Vector’s performance ceiling is substantially higher than that of the Ruby-based Fluentd. The figures cited in this article put a single Vector instance at over 100 TB/day, though a number like that depends heavily on hardware and transform complexity, so treat it as an upper bound rather than a sizing target. VRL provides type-checked, expression-based transforms that are both fast and auditable: a VRL program cannot run arbitrary code, it can only remap the event it is given.

Docker Compose (aggregator pattern):

# vector-aggregator.yml
version: "3.8"
services:
  vector:
    image: timberio/vector:0.45.0-debian
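    # Pass the config path explicitly instead of relying on the image default
    command: ["--config", "/etc/vector/vector.toml"]
    ports:
      - "24224:24224"   # Fluent forward listener, matches [sources.fluent] in vector.toml
      - "8686:8686"     # Vector API port, only useful if the API is enabled in vector.toml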
    volumes:
      - ./vector.toml:/etc/vector/vector.toml:ro
      - vector-data:/var/lib/vector
    environment:
      - VECTOR_REQUIRE_HEALTHY=true
      - ENVIRONMENT=production

volumes:
  vector-data:

Vector config (aggregates from Fluent Bit agents):

# vector.toml
[sources.fluent]
type = "fluent"
address = "0.0.0.0:24224"

[transforms.parse_json]
type = "remap"
inputs = ["fluent"]
source = '''
  # Use the fallible form: with parse_json!, the ?? fallback would never run
  . = parse_json(.message) ?? .
  .environment = get_env_var!("ENVIRONMENT")
  .processed_at = now()
'''

[sinks.loki]
type = "loki"
inputs = ["parse_json"]
endpoint = "http://loki:3100"
encoding.codec = "json"
labels = { source = "vector", env = "{{ environment }}" }

[sinks.s3_archive]
type = "aws_s3"
inputs = ["parse_json"]
bucket = "logs-archive"
region = "us-east-1"
encoding.codec = "json"
compression = "gzip"
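For readers less familiar with VRL, the remap step in the config above can be approximated in Python. This is a rough sketch of the intended behavior, not how Vector implements it. The `remap` function name is hypothetical, a missing `ENVIRONMENT` variable falls back to `"unknown"` here (VRL's `get_env_var!` would abort instead), and only JSON objects are allowed to replace the event.

```python
import json
import os
from datetime import datetime, timezone


def remap(event: dict) -> dict:
    """Approximate the parse_json transform: parse .message, else keep the event."""
    try:
        parsed = json.loads(event.get("message"))
    except (TypeError, ValueError):
        # Missing or non-JSON message: keep the original event untouched
        parsed = None

    # Simplification: only a JSON object replaces the event
    out = dict(parsed) if isinstance(parsed, dict) else dict(event)

    # Enrichment fields added by the transform
    out["environment"] = os.environ.get("ENVIRONMENT", "unknown")
    out["processed_at"] = datetime.now(timezone.utc).isoformat()
    return out
```

A structured message such as `{"level": "info"}` becomes the event itself, while a plain-text line passes through with its `message` field intact and the two enrichment fields added.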

Fluentd: The Battle-Tested Unified Logging Layer

Fluentd is a CNCF-graduated project and the most mature observability pipeline in the ecosystem. With over 1,000 community-contributed plugins, Fluentd connects to virtually any data source or destination. Its plugin architecture — input, parser, filter, buffer, and output — provides a battle-tested pattern that inspired both Vector and Fluent Bit.

Key strengths: Fluentd’s plugin ecosystem is unmatched. If you need to ingest from a legacy syslog format, an obscure database, or a proprietary API, there is likely a Fluentd plugin for it. Its buffering system (file or memory-based) lets it ride out downstream outages: file buffers keep queued chunks on disk across restarts, while memory buffers are faster but are lost if the process dies. The Kubernetes metadata filter plugin enriches logs with pod, namespace, and label metadata automatically.
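The idea behind a file buffer can be illustrated with a toy Python version. This is a sketch under stated assumptions, not Fluentd's actual buffer format: the `FileBuffer` class, the `chunk-NNNNNN.jsonl` naming, and the use of `ConnectionError` to signal an outage are all hypothetical. The point it shows is that a chunk is written to disk before delivery and deleted only after the backend accepts it.

```python
import json
import os
import tempfile


class FileBuffer:
    """Toy file-backed buffer: one JSON-lines file per chunk (hypothetical layout)."""

    def __init__(self, directory: str):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)
        # Continue numbering after any chunks left over from a previous run
        existing = [int(n[6:12]) for n in os.listdir(directory) if n.startswith("chunk-")]
        self._seq = max(existing, default=-1) + 1

    def enqueue(self, records: list) -> str:
        """Persist a chunk to disk before any delivery attempt."""
        path = os.path.join(self.directory, f"chunk-{self._seq:06d}.jsonl")
        self._seq += 1
        with open(path, "w", encoding="utf-8") as fh:
            for record in records:
                fh.write(json.dumps(record) + "\n")
        return path

    def flush(self, send) -> int:
        """Deliver chunks in order; delete each one only after send() succeeds."""
        delivered = 0
        for name in sorted(os.listdir(self.directory)):
            path = os.path.join(self.directory, name)
            with open(path, encoding="utf-8") as fh:
                records = [json.loads(line) for line in fh]
            try:
                send(records)
            except ConnectionError:
                break  # backend is down: this chunk and later ones stay on disk
            os.remove(path)
            delivered += 1
        return delivered


def demo() -> tuple:
    """Simulate an outage followed by a restart and a successful retry."""
    with tempfile.TemporaryDirectory() as directory:
        buf = FileBuffer(directory)
        buf.enqueue([{"msg": "a"}])
        buf.enqueue([{"msg": "b"}])

        def backend_down(_records):
            raise ConnectionError("backend unavailable")

        failed = buf.flush(backend_down)
        pending = len(os.listdir(directory))

        received = []
        recovered = FileBuffer(directory).flush(received.extend)
        return failed, pending, recovered, received
```

In `demo()`, the second `FileBuffer` stands in for a restarted process: it finds both chunks still on disk and delivers them in their original order.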

Docker Compose:

# fluentd-stack.yml
version: "3.8"
services:
  fluentd:
    image: fluent/fluentd:v1.18-debian-1
    ports:
      - "24224:24224"
      - "24224:24224/udp"
    volumes:
      - ./fluentd.conf:/fluentd/etc/fluent.conf:ro
      - fluentd-buffer:/fluentd/buffer
    environment:
      - FLUENTD_CONF=fluent.conf

volumes:
  fluentd-buffer:

Fluentd config:

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

<filter **>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
    collector fluentd
  </record>
</filter>

<match **>
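  @type copy
  # The elasticsearch and s3 outputs below come from the fluent-plugin-elasticsearch
  # and fluent-plugin-s3 gems, which are not part of the base fluent/fluentd image.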
  <store>
    @type elasticsearch
    host elasticsearch
    port 9200
    logstash_format true
  </store>
  <store>
    @type s3
    s3_bucket logs-archive
    s3_region us-east-1
    path logs/%Y/%m/%d/
    <buffer time>
      @type file
      path /fluentd/buffer/s3
      timekey 3600
      timekey_wait 10m
    </buffer>
  </store>
</match>
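The time-keyed buffer in the S3 store groups events into hourly chunks and waits before uploading them. The Python sketch below works through that arithmetic for the values in the config (`timekey 3600`, `timekey_wait 10m`, `path logs/%Y/%m/%d/`). The function names are hypothetical, and UTC is assumed for rendering the path; Fluentd's own timezone handling depends on its configuration.

```python
from datetime import datetime, timezone

TIMEKEY = 3600        # seconds per chunk, from `timekey 3600`
TIMEKEY_WAIT = 600    # seconds, from `timekey_wait 10m`
PATH_FORMAT = "logs/%Y/%m/%d/"


def chunk_start(event_ts: float, timekey: int = TIMEKEY) -> int:
    """Start of the time window (epoch seconds) that an event falls into."""
    return int(event_ts) - int(event_ts) % timekey


def object_prefix(event_ts: float) -> str:
    """Key prefix for the chunk, rendered from the window start (UTC assumed)."""
    start = datetime.fromtimestamp(chunk_start(event_ts), tz=timezone.utc)
    return start.strftime(PATH_FORMAT)


def earliest_flush(event_ts: float) -> int:
    """Epoch second at which the chunk becomes eligible for upload."""
    return chunk_start(event_ts) + TIMEKEY + TIMEKEY_WAIT
```

Under these assumptions, an event stamped 10:42:17 UTC lands in the 10:00 window and its chunk becomes eligible for upload at 11:10, which leaves ten minutes for late-arriving events.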

Fluent Bit: Lightweight Edge Collection

Fluent Bit, also a CNCF-graduated project, was created by the Fluentd team as a lightweight, C-based alternative for edge and embedded environments. Where Fluentd targets aggregation and centralized processing, Fluent Bit excels at the collection layer — running as a DaemonSet on every Kubernetes node, collecting container logs, and forwarding them upstream.

Key strengths: Fluent Bit’s memory footprint (~5-10 MB idle) makes it a natural fit for resource-constrained environments like IoT devices, edge gateways, and large-scale Kubernetes node agents. Its forward output speaks the same forward protocol as Fluentd, which allows Fluent Bit-to-Vector or Fluent Bit-to-Fluentd aggregation topologies without a translation layer. Fluentd’s Ruby plugins themselves do not run in Fluent Bit.

Docker Compose (edge agent with Loki):

# fluent-bit-agent.yml
version: "3.8"
services:
  fluent-bit:
    image: fluent/fluent-bit:3.2.0
    ports:
      - "2020:2020"
    volumes:
      - ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf:ro
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro

  loki:
    image: grafana/loki:3.2.0
    ports:
      - "3100:3100"

Fluent Bit config (Kubernetes DaemonSet pattern):

[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    info
    Parsers_File parsers.conf

[INPUT]
    Name             tail
    Path             /var/log/containers/*.log
    Parser           docker
    Tag              kube.*
    Refresh_Interval 5
    Mem_Buf_Limit    50MB

[FILTER]
    Name                kubernetes
    Match               kube.*
    Kube_URL            https://kubernetes.default.svc:443
    Merge_Log           On

[OUTPUT]
    Name        forward
    Match       *
    Host        vector-aggregator
    Port        24224
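The config above routes records by tag: the tail input's `Tag kube.*` expands the `*` into the monitored file's path with slashes turned into dots, and each `Match` rule is a wildcard over the resulting tag. The Python sketch below imitates that routing. The function names and the sample file name are hypothetical, and `fnmatchcase` is only an approximation, since it accepts more pattern syntax than a plain `*` wildcard.

```python
from fnmatch import fnmatchcase


def expand_tag(tag_pattern: str, path: str) -> str:
    """Replace the * in a tail Tag with the file path, slashes turned into dots."""
    expanded = path.lstrip("/").replace("/", ".")
    return tag_pattern.replace("*", expanded, 1)


def matches(match_rule: str, tag: str) -> bool:
    """Approximate a Match rule as a case-sensitive wildcard over the tag."""
    return fnmatchcase(tag, match_rule)


# Hypothetical container log file name, for illustration only
tag = expand_tag("kube.*", "/var/log/containers/web_default_app-abc.log")
routes_to_kubernetes_filter = matches("kube.*", tag)
routes_to_forward_output = matches("*", tag)
```

This is why the kubernetes filter with `Match kube.*` sees only container logs, while the forward output with `Match *` receives every record in the pipeline.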

Architecture Patterns

Tiered Collection: Fluent Bit (edge) → Vector (aggregator) → Loki/S3. Fluent Bit handles per-node log tailing efficiently; Vector provides heavy-duty transformation and multi-destination routing. This pattern combines Fluent Bit’s minimal node footprint with Vector’s VRL-based processing power — the best of both worlds.

Direct-to-Backend: Vector → ClickHouse/Loki/Elasticsearch. For smaller deployments where a dedicated aggregator layer is unnecessary overhead, Vector (or Fluentd) can write directly to storage backends with built-in buffering for reliability.

Hybrid Fluentd Pipeline: Fluent Bit (edge) → Fluentd (aggregator) → S3 + Elasticsearch. The traditional pattern where Fluentd’s plugin ecosystem handles integration with diverse backends.

Performance Benchmarking and Scaling

When selecting an observability pipeline, throughput and resource efficiency at scale are critical considerations. Independent benchmarks have shown substantial differences between these tools, primarily driven by language runtime characteristics.

Vector’s Rust implementation with async I/O is reported to achieve 5-10x higher throughput per CPU core than Fluentd. In a representative benchmark processing 100 GB of JSON logs with 20 transform operations, Vector reaches approximately 1.2 TB/hour on a 4-core machine, while Fluentd processes 120-180 GB/hour on equivalent hardware. Fluent Bit, despite its C implementation, processes 300-500 GB/hour: faster than Fluentd but well behind Vector, because most of its pipeline runs on a single event-loop thread and complex transforms go through the Lua scripting engine.
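To compare these hourly figures with the per-day numbers in the comparison table, it helps to put them in the same units. The Python sketch below does only that arithmetic on the values quoted in this article. It assumes decimal units (1 TB = 1000 GB) and perfectly linear scaling across cores, which real deployments rarely achieve.

```python
GB_PER_TB = 1000  # decimal units (assumption)

# Benchmark figures from the text, in GB/hour on a 4-core machine: (low, high)
rates_gb_per_hour = {
    "Vector": (1200, 1200),
    "Fluentd": (120, 180),
    "Fluent Bit": (300, 500),
}


def per_day_tb(gb_per_hour: float) -> float:
    """Convert a sustained GB/hour rate into TB/day."""
    return gb_per_hour * 24 / GB_PER_TB


daily_tb = {
    name: (per_day_tb(low), per_day_tb(high))
    for name, (low, high) in rates_gb_per_hour.items()
}

# Vector's advantage over Fluentd in this benchmark (best and worst case for Fluentd)
speedup_vs_fluentd = (1200 / 180, 1200 / 120)

# Cores needed to reach 100 TB/day if throughput scaled linearly from 4 cores
cores_for_100_tb_day = 100 / per_day_tb(1200) * 4
```

On these numbers, Vector's 1.2 TB/hour is about 28.8 TB/day on four cores and roughly 6.7x to 10x Fluentd's rate. The 100+ TB/day single-instance figure in the table would need about 14 cores under the linear-scaling assumption.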

For production deployments, understanding scaling patterns is essential. Vector scales close to linearly with CPU cores thanks to its topology-based, multi-threaded architecture: adding cores yields roughly proportional throughput. Fluentd is constrained by Ruby’s Global Interpreter Lock (GIL), which caps single-process throughput regardless of core count; scaling Fluentd means running multiple worker processes (the workers setting in the <system> section), each with its own memory overhead. Fluent Bit’s event-driven C implementation is efficient per core but gains little from extra cores, since most of the pipeline runs on one thread; scaling usually means running multiple Fluent Bit instances with load-balanced inputs.

A practical sizing guideline: for 100 Kubernetes nodes generating ~200 GB of logs per day, a single Vector instance (2 CPU, 512 MB RAM) handles the entire load comfortably. The equivalent Fluentd deployment needs 3-4 worker processes (4 CPU, 2 GB RAM total), while Fluent Bit can handle collection with just 128 MB RAM per node agent. For related observability infrastructure, see our self-hosted observability platforms guide.
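The sizing guideline can be sanity-checked with a few lines of arithmetic. The Python sketch below converts the 200 GB/day workload into average rates and compares it with the benchmark throughput quoted earlier (1.2 TB/hour on 4 cores). It assumes decimal units, an even spread of traffic over the day, and linear per-core scaling, so real peaks will be higher than these averages.

```python
NODES = 100
DAILY_GB = 200
SECONDS_PER_DAY = 24 * 60 * 60

VECTOR_GB_PER_HOUR_4_CORES = 1200  # benchmark figure quoted in this article
AGGREGATOR_CORES = 2               # from the sizing guideline

# Average ingest rate the aggregator must sustain, in MB/s
aggregate_mb_s = DAILY_GB * 1000 / SECONDS_PER_DAY

# Average rate each node agent produces, in KB/s
per_node_kb_s = DAILY_GB / NODES * 1_000_000 / SECONDS_PER_DAY

# Benchmark-derived capacity of a 2-core aggregator, in MB/s (linear scaling assumed)
capacity_mb_s = VECTOR_GB_PER_HOUR_4_CORES * 1000 / 3600 / 4 * AGGREGATOR_CORES

# How many times the average load fits into that capacity
headroom = capacity_mb_s / aggregate_mb_s
```

The average load works out to about 2.3 MB/s in total, or about 23 KB/s per node, against a benchmark-derived capacity of roughly 167 MB/s for two cores. That is a headroom factor of about 72, which leaves ample room for bursts.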

FAQ

Should I replace Fluentd with Vector?

It depends on your priorities. If plugin ecosystem breadth matters most (obscure inputs/outputs), Fluentd’s 1,000+ plugins are hard to beat. If performance and resource efficiency are your primary concerns, Vector’s Rust-based architecture delivers 5-10x higher throughput per instance. Many teams adopt Vector for aggregation and keep Fluent Bit for edge collection — the tools are complementary, not mutually exclusive.

How does VRL compare to Fluentd’s filter plugins?

VRL (Vector Remap Language) is a purpose-built, expression-based language that Vector compiles and type-checks when it loads the configuration. Fluentd filters are Ruby plugins, and record_transformer can evaluate embedded Ruby: flexible, but slower and harder to audit. VRL’s key advantages are predictable performance and type safety; a transform cannot execute arbitrary code. VRL also has an interactive playground for testing expressions before deployment.

Can I run Fluent Bit without Fluentd?

Absolutely. Fluent Bit is a fully independent project that can write directly to over 25 output destinations including Elasticsearch, Loki, S3, Kafka, and HTTP endpoints. It does not require Fluentd as an intermediary. Many Kubernetes deployments use Fluent Bit as a standalone DaemonSet writing directly to Loki or Elasticsearch.

Which is best for Kubernetes log collection?

Fluent Bit is the de facto standard for Kubernetes node-level log collection. It runs as a DaemonSet with native Kubernetes metadata enrichment and has the lowest resource footprint. Cloud providers including AWS (Container Insights), GCP, and Azure all ship Fluent Bit-based logging solutions. For log forwarding configurations, see our log forwarding tools comparison.

Does Vector support Windows?

Yes. Vector has first-class Windows support with native MSI installers and Windows Event Log as a built-in source. Fluentd and Fluent Bit also support Windows. If you are running a mixed Linux/Windows Kubernetes cluster, Vector provides the most consistent experience across platforms. For centralized log management, check our Graylog and Loki comparison.
