Network flow data is the backbone of traffic analysis, capacity planning, and security monitoring. Flow protocols like NetFlow (Cisco), sFlow (sampling-based), and IPFIX (IETF standard) export metadata about network traffic — source/destination IPs, ports, protocols, byte counts — without capturing full packet payloads. This makes flow data lightweight, scalable, and privacy-friendly.
In this guide, we compare three open-source flow collection tools: GoFlow2 (high-performance collector), softflowd (flow exporter probe), and nfdump (NetFlow processing toolkit). Each plays a different role in the flow collection pipeline.
Comparison Table
| Feature | GoFlow2 | softflowd | nfdump |
|---|---|---|---|
| GitHub Stars | 763+ | 209+ | 600+ |
| Role | Flow collector | Flow exporter (probe) | Flow processor/analyzer |
| Supported Protocols | sFlow v5, NetFlow v5/v9, IPFIX | NetFlow v5/v9, IPFIX | NetFlow v5/v9, IPFIX, sFlow (via sfcapd) |
| Architecture | Go, concurrent pipeline | C, single-process daemon | C, file-based processing |
| Output Format | Protobuf, Kafka, stdout | NetFlow to collector | NFDUMP binary format |
| Storage Backend | External (Kafka, file) | N/A (sends to collector) | Local file (nfcapd) |
| Scalability | High (multi-goroutine) | Single interface | File-based, batch processing |
| Docker Support | Official image available | Community Dockerfiles | Official packages |
| License | BSD 3-Clause | BSD | BSD |
| Best For | Centralized collection pipeline | Router/switch flow export | Offline flow analysis |
GoFlow2
GoFlow2 is a high-performance, concurrent flow collector written in Go by the team at Netsampler. It’s designed to ingest massive volumes of sFlow, NetFlow v5/v9, and IPFIX data and output structured data (Protobuf) for downstream processing.
Key Features
- Multi-Protocol Support: Ingests sFlow v5, NetFlow v5, NetFlow v9, and IPFIX simultaneously on the same port.
- Concurrent Pipeline: Go’s goroutine-based architecture enables parallel decoding and processing of flow records.
- Protobuf Output: Converts raw flow data to structured Protobuf messages, compatible with Kafka, file output, or stdout.
- Prometheus Metrics: Built-in metrics for monitoring collector health, ingestion rates, and error counts.
Docker Compose Setup
Sending Flow Data to Kafka
softflowd
softflowd is a software flow probe that builds NetFlow records from captured packets. Unlike GoFlow2, which collects flows exported by network devices, softflowd acts as a flow exporter: it captures packets on a network interface, aggregates them into flows, and exports those flows to a collector as NetFlow or IPFIX.
Key Features
- Packet Capture to Flow Conversion: Captures packets using pcap and generates Cisco-compatible NetFlow v5/v9 or IPFIX records.
- Interface Flexibility: Monitors any network interface, including virtual interfaces and bridges.
- Low Resource Usage: Written in C with minimal memory footprint — runs efficiently on routers and small appliances.
- Template-Based Export: NetFlow v9 and IPFIX support dynamic templates for custom field definitions.
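To illustrate what template-based export means in practice, here is a minimal Python sketch of how a collector might apply a NetFlow v9 template to one data record. The template is modeled as a list of (field type, length) pairs, and the field type IDs are the standard v9 ones. The function name `decode_record`, the `FIELD_NAMES` table, and the sample bytes are illustrative assumptions; this is not code from softflowd or any collector.

```python
import ipaddress

# A handful of standard NetFlow v9 field type IDs (illustrative subset).
FIELD_NAMES = {
    1: "in_bytes",
    2: "in_pkts",
    4: "protocol",
    7: "l4_src_port",
    8: "ipv4_src_addr",
    11: "l4_dst_port",
    12: "ipv4_dst_addr",
}
IPV4_FIELDS = {8, 12}


def decode_record(template, data):
    """Decode one data record using a template of (field_type, length) pairs."""
    record = {}
    offset = 0
    for field_type, length in template:
        raw = data[offset:offset + length]
        if len(raw) != length:
            raise ValueError("record is shorter than the template describes")
        name = FIELD_NAMES.get(field_type, f"field_{field_type}")
        if field_type in IPV4_FIELDS:
            record[name] = str(ipaddress.IPv4Address(raw))
        else:
            # All other fields here are unsigned big-endian integers.
            record[name] = int.from_bytes(raw, "big")
        offset += length
    return record


# Hypothetical template: src addr, dst addr, src port, dst port, protocol, bytes.
SAMPLE_TEMPLATE = [(8, 4), (12, 4), (7, 2), (11, 2), (4, 1), (1, 4)]
SAMPLE_DATA = (
    bytes([192, 0, 2, 10])
    + bytes([198, 51, 100, 5])
    + (51000).to_bytes(2, "big")
    + (443).to_bytes(2, "big")
    + (6).to_bytes(1, "big")
    + (120000).to_bytes(4, "big")
)

if __name__ == "__main__":
    print(decode_record(SAMPLE_TEMPLATE, SAMPLE_DATA))
```

Because the exporter sends the template separately from the data, a collector can decode record layouts it has never seen before, which is what NetFlow v5's fixed format cannot offer.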
Installation and Usage
Docker Deployment
nfdump
nfdump is a comprehensive NetFlow processing toolkit consisting of two main components: nfcapd (NetFlow capture daemon) and nfdump (flow analysis tool). It stores flow data in an optimized binary format and provides powerful query capabilities.
Key Features
- High-Speed Capture: nfcapd captures NetFlow v5/v9 and IPFIX at line rate with minimal CPU overhead.
- Efficient Binary Storage: Flow records are stored in compact binary files (nfcapd files), enabling fast queries.
- Powerful Filtering: tcpdump-style filter expressions for querying flow data by IP, port, protocol, time range, and more.
- Flow Aggregation: Aggregate flows by various dimensions (IP pairs, AS numbers, protocols) for traffic analysis.
Installation and Setup
Querying Flow Data
Complete Flow Collection Pipeline
Why Network Flow Collection is Essential for Self-Hosted Infrastructure
Network flow data provides visibility into traffic patterns without the overhead and privacy concerns of full packet capture. For self-hosted infrastructure operators, flow collection is the most practical way to understand bandwidth usage, detect anomalies, and plan capacity upgrades.
The Value of Flow Data
Flow records capture the “five-tuple” of network communication — source IP, destination IP, source port, destination port, and protocol — along with byte and packet counts. This metadata tells you:
- Who is talking to whom: Identify top talkers, unusual connections, and data transfer patterns.
- What services are consuming bandwidth: See which applications (HTTP, DNS, database queries) dominate your network.
- When traffic spikes occur: Correlate flow volume with time-of-day to plan capacity and identify off-hours anomalies.
- Where bottlenecks exist: Pinpoint congested links, overloaded servers, and misconfigured routing.
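As a rough illustration of the first two questions, the following Python sketch models a flow record as the five-tuple plus counters and derives top talkers and per-service byte totals. The `FlowRecord` class, the function names, and the sample data are hypothetical; real records would come from your collector.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """One flow: the five-tuple plus byte and packet counters."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IANA protocol number, e.g. 6 = TCP, 17 = UDP
    byte_count: int
    packet_count: int


def top_talkers(flows, n=3):
    """Return the n source IPs that sent the most bytes, largest first."""
    totals = Counter()
    for flow in flows:
        totals[flow.src_ip] += flow.byte_count
    return totals.most_common(n)


def bytes_by_service(flows):
    """Sum bytes per (protocol, destination port), a rough proxy for the service."""
    totals = Counter()
    for flow in flows:
        totals[(flow.protocol, flow.dst_port)] += flow.byte_count
    return dict(totals)


# Made-up sample records using documentation address ranges.
SAMPLE_FLOWS = [
    FlowRecord("192.0.2.10", "198.51.100.5", 51000, 443, 6, 120_000, 90),
    FlowRecord("192.0.2.10", "198.51.100.7", 51001, 53, 17, 400, 4),
    FlowRecord("192.0.2.20", "198.51.100.5", 40000, 443, 6, 30_000, 25),
]

if __name__ == "__main__":
    print(top_talkers(SAMPLE_FLOWS))
    print(bytes_by_service(SAMPLE_FLOWS))
```

Mapping a destination port to an application is only a heuristic, since services can run on non-standard ports.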
Flow Protocols Explained
NetFlow was developed by Cisco and exports aggregated flow records from network devices. Version 5 supports IPv4 only; version 9 and IPFIX support IPv6, MPLS, and custom field definitions. sFlow takes a different approach — it samples packets at a configurable rate (e.g., 1 in 1000), making it more suitable for high-speed links where full flow export would overwhelm the collector. IPFIX (RFC 7011) is the IETF standard that extends NetFlow v9 with extensible field definitions and is supported by most modern networking equipment.
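The protocols can be told apart on the wire by their version field. NetFlow v5, NetFlow v9, and IPFIX begin with a 16-bit version number (5, 9, and 10 respectively), while an sFlow v5 datagram begins with a 32-bit version number. The Python sketch below shows one way a collector could sniff the protocol and read the fixed 24-byte NetFlow v5 header. The function names are illustrative assumptions, and the check is a heuristic, not a full validator.

```python
import struct

# NetFlow v5 header: version, count, sys_uptime, unix_secs, unix_nsecs,
# flow_sequence, engine_type, engine_id, sampling_interval (24 bytes total).
NETFLOW_V5_HEADER = struct.Struct("!HHIIIIBBH")


def identify_flow_protocol(datagram: bytes) -> str:
    """Guess the export protocol from the first bytes of a UDP payload."""
    if len(datagram) < 4:
        return "unknown"
    # NetFlow and IPFIX start with a 16-bit version number.
    (version16,) = struct.unpack("!H", datagram[:2])
    if version16 == 5:
        return "NetFlow v5"
    if version16 == 9:
        return "NetFlow v9"
    if version16 == 10:
        return "IPFIX"
    # sFlow starts with a 32-bit version number.
    (version32,) = struct.unpack("!I", datagram[:4])
    if version32 == 5:
        return "sFlow v5"
    return "unknown"


def parse_netflow_v5_header(datagram: bytes) -> dict:
    """Parse the fixed NetFlow v5 header into a dictionary."""
    if len(datagram) < NETFLOW_V5_HEADER.size:
        raise ValueError("datagram too short for a NetFlow v5 header")
    (version, count, sys_uptime, unix_secs, unix_nsecs,
     flow_sequence, engine_type, engine_id, sampling_interval) = (
        NETFLOW_V5_HEADER.unpack_from(datagram)
    )
    if version != 5:
        raise ValueError(f"not a NetFlow v5 datagram (version {version})")
    return {
        "version": version,
        "count": count,  # number of flow records that follow the header
        "sys_uptime_ms": sys_uptime,
        "unix_secs": unix_secs,
        "unix_nsecs": unix_nsecs,
        "flow_sequence": flow_sequence,
        "engine_type": engine_type,
        "engine_id": engine_id,
        "sampling_interval": sampling_interval,
    }
```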
Self-Hosting vs Managed Flow Solutions
Commercial flow analysis platforms (SolarWinds NTA, Plixer Scrutinizer, ManageEngine NetFlow Analyzer) offer turnkey dashboards but typically require paid licenses, and any cloud-hosted edition sends your network metadata to the vendor. Self-hosted flow collection keeps all data on your infrastructure, leaves retention limited only by your own storage, and integrates with your existing monitoring stack (Prometheus, Grafana, Elasticsearch). The tools covered in this article (GoFlow2, softflowd, and nfdump) span the full flow collection pipeline from capture to analysis.
Building a Complete Flow Collection Architecture
A production flow monitoring system typically combines these tools:
- softflowd runs on each router, switch, or server to capture local traffic and export NetFlow records
- GoFlow2 acts as the central collector, ingesting flows from dozens of probes, converting to Protobuf, and forwarding to Kafka
- nfdump processes stored flow files for historical analysis, reporting, and security investigations
This architecture gives you real-time flow ingestion (GoFlow2), distributed packet capture (softflowd), and offline analysis capabilities (nfdump).
For related network monitoring topics, see our guides on network flow analysis with pmacct, nfdump, and fprobe, and on network diagnostics with fping, mtr, and nmap.
FAQ
What is the difference between NetFlow, sFlow, and IPFIX?
NetFlow (originally Cisco proprietary; v5 and v9 are the common versions) exports flow records based on traffic observed at a network interface. sFlow (an industry standard, currently v5) uses packet sampling: it captures a statistical sample of packets rather than tracking every flow, which makes it more scalable on high-speed links. IPFIX (IETF RFC 7011) is the standardized successor to NetFlow v9 and supports extensible field definitions. GoFlow2 collects all three; softflowd exports NetFlow v5/v9 and IPFIX; nfdump collects and processes all three.
Can softflowd run on production routers?
Yes. softflowd has a minimal resource footprint (a few MB of RAM, negligible CPU) and is commonly deployed on OpenWRT routers, pfSense firewalls, and Linux-based network appliances. However, on very high-throughput links (>1 Gbps), sampling-based approaches like sFlow may be more efficient than full packet capture.
How much disk space does nfdump require?
Flow data is extremely compact compared to full packet captures. A typical enterprise network generates 100 MB to 1 GB of nfdump data per day, depending on traffic volume and flow timeout settings. With compression enabled (the -z flag on nfcapd), storage requirements are reduced by 30-50%. Without compression, a 30-day retention policy therefore requires roughly 3-30 GB of disk space.
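The sizing arithmetic is simple enough to script. The Python sketch below multiplies daily volume by retention days and applies an optional compression saving; the function name and the decimal conversion (1 GB = 1000 MB) are assumptions made for illustration.

```python
def retention_gb(daily_mb: float, days: int, compression_saving: float = 0.0) -> float:
    """Estimate disk space in GB for a retention window.

    compression_saving is the fraction of space saved, e.g. 0.3 for 30%.
    """
    if not 0.0 <= compression_saving < 1.0:
        raise ValueError("compression_saving must be in [0, 1)")
    return daily_mb * days * (1.0 - compression_saving) / 1000.0


if __name__ == "__main__":
    for daily_mb in (100, 1000):
        plain = retention_gb(daily_mb, 30)
        compressed = retention_gb(daily_mb, 30, compression_saving=0.5)
        print(f"{daily_mb} MB/day over 30 days: {plain:.1f} GB raw, "
              f"{compressed:.1f} GB at 50% saving")
```

Plugging in the figures above, 100 MB/day and 1 GB/day over 30 days give the 3 GB and 30 GB endpoints, and a 50% compression saving halves each.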
Is GoFlow2 production-ready for large-scale deployments?
GoFlow2 is used in production by several ISPs and cloud providers. Its goroutine-based architecture handles hundreds of thousands of flow records per second. The key to scaling is pairing it with a robust downstream pipeline — Kafka for buffering, and a time-series database or data lake for storage. The Prometheus metrics endpoint enables monitoring collector health in real time.
How do I troubleshoot missing flow data?
Common causes include: (1) Firewall rules blocking UDP ports 2055/6343 — verify with tcpdump -i any udp port 2055. (2) Probe misconfiguration — ensure softflowd or your router is pointing to the correct collector IP and port. (3) Version mismatch — NetFlow v5 uses a different packet format than v9/IPFIX; verify your collector supports the protocol version. (4) Sampling rate — if using sFlow, a high sampling ratio (e.g., 1:10000) means you’ll only see 0.01% of traffic.
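The sampling arithmetic in point (4) can be sketched in a few lines of Python. With 1-in-N sampling, the collector sees roughly 1/N of the packets, and observed counts are scaled by N to estimate the true totals. The function names are illustrative, and the estimate assumes uniform random sampling.

```python
def sampled_percentage(sampling_rate: int) -> float:
    """Percentage of packets observed at 1-in-N sampling."""
    if sampling_rate < 1:
        raise ValueError("sampling_rate must be at least 1")
    return 100.0 / sampling_rate


def estimate_total(sampled_count: int, sampling_rate: int) -> int:
    """Scale a sampled packet or byte count up to an estimated true total."""
    if sampling_rate < 1:
        raise ValueError("sampling_rate must be at least 1")
    return sampled_count * sampling_rate


if __name__ == "__main__":
    rate = 10_000
    print(f"1:{rate} sampling observes {sampled_percentage(rate):.2f}% of packets")
    print(f"1500 sampled packets suggest about {estimate_total(1500, rate)} in total")
```

Short-lived or low-volume flows can fall entirely between samples, so an apparently missing flow on a heavily sampled link is not necessarily a collection fault.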
Can I use these tools for security monitoring?
Absolutely. Flow data is a cornerstone of network security operations. nfdump queries can detect port scanning (many unique destination ports from a single source), DDoS attacks (sudden traffic volume spikes), data exfiltration (unusual outbound traffic volumes), and command-and-control communication (periodic connections to known-bad IPs). Combined with threat intelligence feeds, flow analysis provides network visibility without the privacy concerns of full packet capture.
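As a sketch of the port-scan heuristic mentioned above (many unique destination ports from a single source), the following Python function flags sources that exceed a threshold. The function name, the tuple layout, and the default threshold of 100 ports are assumptions for illustration; a real deployment would tune the threshold and apply it per time window.

```python
from collections import defaultdict


def find_port_scanners(flows, min_unique_ports=100):
    """Return {src_ip: unique_port_count} for sources at or above the threshold.

    flows is an iterable of (src_ip, dst_ip, dst_port) tuples.
    """
    ports_by_source = defaultdict(set)
    for src_ip, _dst_ip, dst_port in flows:
        ports_by_source[src_ip].add(dst_port)
    return {
        src_ip: len(ports)
        for src_ip, ports in ports_by_source.items()
        if len(ports) >= min_unique_ports
    }


if __name__ == "__main__":
    # Made-up traffic: one source probing 200 ports, one normal HTTPS client.
    scan = [("203.0.113.9", "192.0.2.1", port) for port in range(1, 201)]
    normal = [("192.0.2.50", "192.0.2.1", 443)] * 50
    print(find_port_scanners(scan + normal))
```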