Every infrastructure team eventually faces the same question: can our services handle what’s coming? Whether you’re launching a new feature, preparing for a seasonal traffic spike, or validating architecture changes, you need real load data. Commercial platforms charge per virtual user, per test hour, or per report. The open-source alternatives cost nothing, run on your own hardware, and give you full control over every request.
This guide covers the best self-hosted load generation and traffic replay tools available in 2026: GoReplay for production traffic replay, Siege for simple concurrent load testing, wrk for high-performance HTTP benchmarking, Vegeta for constant-rate load generation, and k6 for developer-friendly scripting. You’ll learn when to use each tool, how to install and configure them, and how to interpret results.
Why Self-Host Your Load Testing Infrastructure
Running load tests against staging environments with cloud-based services introduces several problems that self-hosted tools eliminate:
- Data privacy: Cloud load testing services send your actual URLs, headers, and request payloads through their infrastructure. For internal APIs, healthcare endpoints, or financial services, this is often a compliance violation. Self-hosted tools keep all traffic within your network perimeter.
- Network realism: Cloud-based generators test your application from a single external IP and network path. Self-hosted tools let you generate traffic from the same network topology your real users experience — same latency, same NAT, same egress points.
- Cost at scale: A 24-hour soak test with 10,000 virtual users can cost hundreds of dollars on commercial platforms. Running the same test on a spare VM or a few containers costs the price of electricity.
- No rate limits or quotas: When you’re iterating on a fix, you want to run tests back-to-back. Cloud services throttle you; self-hosted tools don’t.
- Custom protocols: Most cloud platforms only support HTTP and HTTPS. Self-hosted tools like GoReplay and Vegeta can replay any TCP-based protocol, including WebSocket connections, gRPC calls, and raw socket traffic.
- Integration with internal monitoring: When your load generator runs inside the same datacenter, you can correlate test results with internal metrics from Prometheus, Grafana, or your APM system without cross-network noise.
Understanding the Different Approaches
Load generation tools fall into three categories, and choosing the right one depends on what you’re trying to validate:
| Category | Tools | Best For |
|---|---|---|
| Traffic Replay | GoReplay, tcpreplay | Capturing and replaying real production traffic patterns |
| HTTP Benchmarking | wrk, Siege, ab (ApacheBench) | Measuring raw throughput and latency under controlled conditions |
| Programmable Load | Vegeta, k6, Locust | Building custom test scenarios with variable rates, custom headers, and assertions |
The fundamental difference is that traffic replay tools don’t generate synthetic requests — they capture real ones and play them back. This means your tests automatically include the right mix of endpoints, request sizes, timing patterns, and header combinations that your actual users produce.
HTTP benchmarking tools are the fastest way to answer a simple question: how many requests per second can this endpoint handle? They’re ideal for comparing configurations, validating code changes, or establishing baseline numbers.
Programmable load tools give you the most control. You define the request rate, the payload mix, the ramp-up curve, and the pass/fail criteria in code. Use these when you need to test specific scenarios like “what happens when 80% of traffic hits the search endpoint simultaneously.”
GoReplay: Production Traffic Replay
GoReplay is the most powerful open-source tool for capturing live HTTP traffic and replaying it against a staging environment. It sits between your load balancer and backend servers, records requests and responses, and can replay them at any speed you choose.
How GoReplay Works
GoReplay captures HTTP traffic at the TCP level using libpcap. It reconstructs complete HTTP requests and responses, stores them in a binary format, and replays them with exact timing reproduction or at an accelerated/decelerated rate. Because it captures at the network level, it works with zero changes to your application code.
Capturing Production Traffic
The simplest capture command records all HTTP traffic on port 80 and writes it to a file:
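A minimal capture sketch, assuming the `gor` binary is installed and your service listens on port 80 (raw packet capture requires root or the CAP_NET_RAW capability):

```shell
# Capture every HTTP request on port 80 and append it to requests.gor
sudo gor --input-raw :80 --output-file requests.gor
```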
This creates timestamped request files that you can replay later. For production systems, you’ll want to filter and limit what you capture:
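A filtered capture might look like the following sketch; the `/api/` URL pattern, method list, and 256mb size cap are illustrative values to adjust for your system:

```shell
# Capture only /api/ traffic, track responses, and cap the output file
sudo gor --input-raw :80 \
  --input-raw-track-response \
  --http-allow-url /api/ \
  --http-allow-method POST \
  --http-allow-method PUT \
  --output-file requests.gor \
  --output-file-max-size-limit 256mb
```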
Key flags explained:
- `--input-raw-track-response`: Captures response bodies alongside requests, useful for validating that the replay target produces the same results
- `--http-allow-url`: Whitelist URL patterns, so you capture only API calls, not static assets
- `--http-allow-method`: Filter by HTTP method, for example capturing only writes if you're testing a database migration
- `--output-file-max-size-limit`: Cap file size to avoid filling disk on busy systems
Replaying Traffic Against Staging
Once you have captured traffic, replaying it is straightforward:
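A replay sketch, assuming a hypothetical staging host (`staging.internal` is a placeholder); the `|200%` suffix on the input file sets the playback speed:

```shell
# Replay the capture at 2x speed, looping forever, with 20 workers
gor --input-file "requests.gor|200%" \
  --input-file-loop \
  --output-http "http://staging.internal:8080" \
  --output-http-workers 20
```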
The --input-file-loop flag makes GoReplay cycle through the capture file continuously, which is useful for soak tests. --output-http-workers controls how many concurrent connections GoReplay uses when replaying — set this high enough to saturate your staging server.
Running GoReplay with Docker
For containerized deployments, GoReplay runs as a sidecar:
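A sketch of running the commonly published `buger/goreplay` image on the host network so it can see the application's traffic; the mount path is illustrative:

```shell
# Run GoReplay as a sidecar container on the host network
docker run --rm --net=host \
  -v "$(pwd)/capture:/capture" \
  buger/goreplay \
  --input-raw :80 \
  --output-file /capture/requests.gor
```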
Note that GoReplay requires network_mode: host or the NET_ADMIN capability because it captures traffic at the raw socket level:
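A Compose sketch along those lines; `my-app:latest` is a placeholder for the service under test, and the GoReplay image's entrypoint is the `gor` binary, so `command` carries only its flags:

```yaml
services:
  app:
    image: my-app:latest          # placeholder for the service under test
    ports:
      - "80:80"
  goreplay:
    image: buger/goreplay
    network_mode: host            # capture needs the host's interfaces
    cap_add:
      - NET_ADMIN
      - NET_RAW
    volumes:
      - ./capture:/capture
    command: --input-raw :80 --output-file /capture/requests.gor
```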
Advanced GoReplay Patterns
Middleware for request modification: You can modify requests on the fly by attaching an external middleware process:
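A sketch of attaching middleware; the script path and staging host are placeholders:

```shell
# Pipe every captured message through an executable middleware
gor --input-raw :80 \
  --middleware "/opt/gor/rewrite.py" \
  --output-http "http://staging.internal:8080"
```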
GoReplay streams each captured message to the middleware's stdin as a hex-encoded payload; the middleware decodes it, can modify headers, paths, or bodies, and writes the re-encoded message back to stdout. This is useful for stripping authentication tokens, changing host headers, or injecting test data.
Comparing production and staging responses: GoReplay can split traffic to both production and staging, then compare responses:
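A sketch of mirroring to two targets (hostnames are placeholders). By default every `--output-http` receives a copy of each request; the actual response comparison is typically done in middleware or by diffing logs on the two targets:

```shell
# Every --output-http gets a copy of each captured request
sudo gor --input-raw :80 \
  --input-raw-track-response \
  --output-http "http://production.internal" \
  --output-http "http://staging.internal"
```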
Siege: Simple Concurrent Load Testing
Siege is a veteran HTTP load testing tool that excels at one thing: simulating a specific number of concurrent users hitting a URL for a defined duration. Its simplicity makes it the fastest way to get a baseline performance number.
Installation
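Siege ships in the major package repositories:

```shell
sudo apt install siege     # Debian/Ubuntu
sudo dnf install siege     # Fedora/RHEL
brew install siege         # macOS
```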
Siege uses a URL file to define what it tests. Create a file called urls.txt:
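For example, with placeholder endpoints on a local service:

```shell
cat > urls.txt <<'EOF'
http://localhost:8080/
http://localhost:8080/api/products
http://localhost:8080/api/search?q=widget
EOF
```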
Running Tests
The basic syntax is siege [options] url_or_file. Here are the most useful patterns:
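Some representative invocations, with placeholder URLs and durations:

```shell
# 50 concurrent users for 2 minutes against one URL
siege -c 50 -t 2M http://localhost:8080/

# 100 users for 5 minutes, random URLs from the file, up to 3 s think time
siege -c 100 -t 5M -i -f urls.txt -d 3

# Benchmark mode: 25 users, 10 repetitions each, no delay
siege -b -c 25 -r 10 -f urls.txt
```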
Key flags:
| Flag | Meaning | Example |
|---|---|---|
| `-c N` | Concurrent users | `-c 100` |
| `-t N[S/M/H]` | Duration | `-t 5M` (5 minutes) |
| `-r N` | Repetitions per user | `-r 10` |
| `-d N` | Random delay between requests (seconds) | `-d 2` |
| `-i` | Internet simulation (random URLs from the file) | `-i -f urls.txt` |
| `-b` | Benchmark mode (no delay between requests) | `-b -c 100` |
| `-q` | Quiet mode | `-q` |
Understanding Siege Output
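When a run ends, Siege prints a summary shaped like the following (the numbers here are illustrative, not measured results):

```
Transactions:               18432 hits
Availability:              100.00 %
Elapsed time:              119.43 secs
Data transferred:           42.16 MB
Response time:               0.31 secs
Transaction rate:          154.33 trans/sec
Throughput:                  0.35 MB/sec
Concurrency:                47.82
Successful transactions:    18432
Failed transactions:            0
Longest transaction:         2.04
Shortest transaction:        0.02
```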
The most important metrics:
- Transaction rate: Sustained requests per second — your primary throughput number
- Response time: Average time per request — latency indicator
- Availability: Percentage of successful requests — anything below 99.5% indicates a problem
- Concurrency: Average number of simultaneous connections maintained
Sieging with POST Data
For testing write endpoints, create a POST body file and reference it:
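A sketch using Siege's URL-file POST syntax; the endpoints and the `/tmp/order.json` body file are placeholders:

```shell
cat > post_urls.txt <<'EOF'
http://localhost:8080/api/orders POST </tmp/order.json
http://localhost:8080/api/login POST user=demo&pass=secret
EOF
```

Run it with `siege -c 20 -r 50 --content-type "application/json" -f post_urls.txt`. The `<` prefix reads the request body from a file; the second entry sends an inline form body.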
wrk: High-Performance HTTP Benchmarking
wrk is a modern HTTP benchmarking tool written in C with LuaJIT scripting support. It can generate significant load on a single multi-core CPU — routinely achieving 100,000+ requests per second on modest hardware. Its Lua scripting engine lets you customize requests, parse responses, and generate dynamic data.
Installation
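wrk is packaged on the major platforms, or builds quickly from source:

```shell
sudo apt install wrk   # Debian/Ubuntu
brew install wrk       # macOS

# Or build from source for the latest version
git clone https://github.com/wg/wrk.git
cd wrk && make && sudo cp wrk /usr/local/bin/
```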
Basic Usage
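A typical invocation (URL is a placeholder):

```shell
# 4 threads, 100 open connections, 30 seconds, with the latency histogram
wrk -t4 -c100 -d30s --latency http://localhost:8080/
```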
Understanding wrk Output
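wrk's output has this shape (numbers are illustrative):

```
Running 30s test @ http://localhost:8080/
  4 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.31ms    4.80ms  187.22ms   97.31%
    Req/Sec    12.47k     1.21k   15.30k     84.50%
  Latency Distribution
     50%    1.84ms
     75%    2.56ms
     90%    3.42ms
     99%   21.08ms
  1489560 requests in 30.02s, 412.34MB read
Requests/sec:  49618.92
Transfer/sec:     13.73MB
```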
The critical insight from wrk that other tools don’t show clearly: latency distribution. The Avg, Stdev, Max, and +/- Stdev columns tell you not just the average latency but how consistent it is. A low average with high standard deviation means your service is usually fast but occasionally stalls — a pattern that kills user experience.
Lua Scripting in wrk
wrk’s real power comes from Lua scripts. Create a file called api_test.lua:
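A sketch of such a script; the endpoint paths are placeholders, and the `request`/`done` hooks are wrk's standard Lua callbacks:

```lua
-- api_test.lua: spread load across several endpoints
local paths = { "/api/users", "/api/products", "/api/orders" }

-- seed the per-thread RNG so threads pick different sequences
math.randomseed(os.time())

-- called for every request; wrk.format builds the raw HTTP request
request = function()
  return wrk.format("GET", paths[math.random(#paths)])
end

-- summary hook: report how many non-2xx/3xx responses wrk saw
done = function(summary, latency, requests)
  io.write(string.format("HTTP errors: %d\n", summary.errors.status))
end
```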
Run it with `wrk -t4 -c100 -d30s -s api_test.lua http://localhost:8080`.
For POST requests with dynamic JSON bodies:
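A sketch of a body-generating script (field names are placeholders); note that wrk's counters are per-thread, so ids repeat across threads:

```lua
-- post_test.lua: POST a JSON body with a per-thread counter
wrk.method = "POST"
wrk.headers["Content-Type"] = "application/json"

local counter = 0

request = function()
  counter = counter + 1
  wrk.body = string.format('{"user_id": %d, "action": "load-test"}', counter)
  -- no arguments: wrk.format uses the wrk table's method/headers/body
  return wrk.format()
end
```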
Run with `wrk -t4 -c100 -d30s -s post_test.lua http://localhost:8080/api/users`.
Vegeta: Constant-Rate Load Generation
Vegeta, written in Go, takes a fundamentally different approach: instead of generating as many requests as possible, it maintains a constant request rate. You tell it “100 requests per second” and it will sustain exactly that rate, regardless of how fast or slow the server responds. This makes Vegeta ideal for finding the exact breaking point of a system.
Installation
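Vegeta installs via the Go toolchain or Homebrew:

```shell
go install github.com/tsenart/vegeta@latest   # with a Go toolchain
brew install vegeta                           # macOS
# Prebuilt binaries are also published on the GitHub releases page
```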
Basic Usage
Vegeta uses stdin/stdout for its attack pipeline. You pipe targets into the attack command, then pipe results into report commands:
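A typical pipeline (URL, rate, and duration are placeholders):

```shell
# 100 req/s for 30 s, then a human-readable report
echo "GET http://localhost:8080/" | \
  vegeta attack -rate=100 -duration=30s | \
  vegeta report

# Keep the raw results and emit JSON for dashboards
echo "GET http://localhost:8080/" | \
  vegeta attack -rate=100 -duration=30s > results.bin
vegeta report -type=json < results.bin > report.json
```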
Multi-Endpoint Attacks
Create a targets.txt file:
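For example, with placeholder endpoints:

```shell
cat > targets.txt <<'EOF'
GET http://localhost:8080/
GET http://localhost:8080/api/products
GET http://localhost:8080/api/search?q=widget
EOF
```

Attack it with `vegeta attack -targets=targets.txt -rate=100 -duration=30s | vegeta report`.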
The file format supports blank-line-separated request groups. Each group is a complete HTTP request with optional headers and body:
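A sketch of the grouped format; the `@` prefix reads a request body from a file, and the endpoints, token, and `payloads/order.json` path are placeholders:

```shell
cat > targets.txt <<'EOF'
GET http://localhost:8080/api/products

POST http://localhost:8080/api/orders
Content-Type: application/json
@payloads/order.json

DELETE http://localhost:8080/api/cache/entries
Authorization: Bearer test-token
EOF
```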
The rate format supports flexible syntax: 100/1s (100 per second), 1000/1m (1000 per minute), or just 100 (defaults to per second).
Ramp-Up and Spike Testing
Each vegeta attack runs at a single constant rate, and the CLI has no built-in ramp option. Ramp-up and spike tests are therefore composed from a sequence of short attacks at increasing rates, or written against the Go library, whose pacers (for example LinearPacer and SinePacer) vary the rate smoothly over time.
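A step-wise ramp can be scripted in the shell; the rates, URL, and step duration here are illustrative:

```shell
# Step from 100 to 500 req/s in 100 req/s increments,
# holding each step for 60 s and keeping a per-step report
for rate in 100 200 300 400 500; do
  echo "GET http://localhost:8080/" | \
    vegeta attack -rate="${rate}" -duration=60s > "results_${rate}.bin"
  vegeta report < "results_${rate}.bin" > "report_${rate}.txt"
done
```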
Vegeta Report Output
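A vegeta report has this shape (numbers are illustrative):

```
Requests      [total, rate, throughput]         3000, 100.03, 99.90
Duration      [total, attack, wait]             30.028s, 29.99s, 38.2ms
Latencies     [min, mean, 50, 90, 95, 99, max]  2.1ms, 12.4ms, 8.9ms, 24.6ms, 38.1ms, 97.4ms, 212.3ms
Bytes In      [total, mean]                     1254000, 418.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:3000
Error Set:
```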
The percentile breakdown (50, 90, 95, 99) is Vegeta’s most valuable output. It tells you exactly how many users experience slow responses. If your p99 latency exceeds your SLA but your average is fine, you have a tail latency problem — and Vegeta is the tool that reveals it.
k6: Developer-Friendly Load Testing
k6 stands out for its JavaScript API, which makes writing load tests feel like writing unit tests. You define scenarios, set thresholds, and write assertions in familiar JavaScript syntax. It’s the best choice when you want load tests in your CI/CD pipeline.
Installation
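k6 is available via Homebrew, Docker, and Grafana's apt repository:

```shell
brew install k6            # macOS
docker pull grafana/k6     # container image

# Debian/Ubuntu: add Grafana's k6 apt repository first (see the
# k6 installation docs), then:
sudo apt install k6
```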
Writing Your First Test
Create a file called load_test.js:
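A sketch of a staged test with thresholds; the target URL and the stage/threshold values are placeholders to tune for your SLA:

```javascript
// load_test.js: ramp to 50 virtual users, assert on latency and errors
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '30s', target: 20 },  // ramp up to 20 VUs
    { duration: '1m',  target: 50 },  // ramp to 50 VUs and hold
    { duration: '30s', target: 0 },   // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95% of requests under 500 ms
    http_req_failed:   ['rate<0.01'], // under 1% errors
  },
};

export default function () {
  const res = http.get('http://localhost:8080/api/products');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'body is not empty': (r) => r.body.length > 0,
  });
  sleep(1); // think time between iterations
}
```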
Run it with `k6 run load_test.js`.
k6 Output and Thresholds
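An abbreviated, illustrative run summary looks like this:

```
  ✓ status is 200
  ✓ body is not empty

  checks.........................: 100.00% ✓ 4182     ✗ 0
  http_req_duration..............: avg=211ms min=98ms med=187ms max=812ms p(90)=342ms p(95)=390ms
  http_req_failed................: 0.00%   ✓ 0        ✗ 2091
  http_reqs......................: 2091    17.4/s
  iterations.....................: 2091    17.4/s

✓ http_req_duration: p(95)<500
✓ http_req_failed: rate<0.01
```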
The threshold system is k6’s killer feature for CI/CD integration. If any threshold fails, k6 exits with a non-zero code — your pipeline automatically catches performance regressions.
Running k6 with Docker and Docker Compose
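A minimal Compose sketch; the `./scripts` directory and script name are placeholders:

```yaml
services:
  k6:
    image: grafana/k6
    volumes:
      - ./scripts:/scripts
    command: run /scripts/load_test.js
```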
Run the test with `docker compose run --rm k6`.
k6 with Environment Variables for Flexibility
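k6 exposes environment variables through the `__ENV` object, which lets one script target different environments; `BASE_URL` here is a name of our choosing, not a built-in:

```javascript
// Read the target host from an environment variable, with a default
import http from 'k6/http';

const BASE_URL = __ENV.BASE_URL || 'http://localhost:8080';

export default function () {
  http.get(`${BASE_URL}/api/products`);
}
```

Run it against staging with `k6 run -e BASE_URL=https://staging.example.com load_test.js`.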
Tool Comparison: Which One Should You Use?
| Feature | GoReplay | Siege | wrk | Vegeta | k6 |
|---|---|---|---|---|---|
| Traffic type | Captured real traffic | Synthetic URLs | Synthetic URLs | Synthetic URLs | Scripted scenarios |
| Request rate control | Playback speed multiplier | Concurrent users | Max throughput | Constant rate | Stages + scenarios |
| Protocol support | HTTP, HTTPS, WebSocket, TCP | HTTP, HTTPS | HTTP, HTTPS | HTTP, HTTPS | HTTP, HTTPS, gRPC, WebSocket |
| Dynamic data | Real production data | Static URL file | Lua scripting | Target file | JavaScript scripting |
| Latency percentiles | Basic | Basic | Full distribution | Full distribution | Full distribution |
| CI/CD integration | Moderate | Basic | Basic | Moderate | Excellent |
| Learning curve | Medium | Low | Low-Medium | Low | Low-Medium |
| Best for | Realistic staging tests | Quick baselines | Raw benchmarking | Breaking point analysis | Automated pipeline tests |
Decision Guide
- You want to test with real user patterns: Use GoReplay. Nothing matches the authenticity of actual production traffic, including the messy edge cases and unexpected request combinations.
- You need a quick “how fast is this” number: Use wrk. It gives you the highest throughput measurement with minimal setup time.
- You want to find the exact breaking point: Use Vegeta. Its constant-rate model lets you increase load incrementally until the system fails, giving you a precise capacity number.
- You want load tests in your CI/CD pipeline: Use k6. Its threshold system, JavaScript API, and native JSON output integrate cleanly into automated workflows.
- You need to simulate N concurrent users hitting a known URL set: Use Siege. It’s the simplest tool for this specific job.
Real-World Testing Strategy
A production-ready load testing strategy combines multiple tools at different stages:
- Development: Run wrk benchmarks against local changes to catch obvious performance regressions before merging.
- Pre-release: Run GoReplay traffic replay against staging to validate that new code handles real production traffic patterns correctly.
- Capacity planning: Use Vegeta to determine the maximum sustainable request rate for each endpoint, establishing baseline capacity numbers.
- CI/CD: Run k6 tests with defined thresholds on every pull request to prevent performance regressions from reaching production.
- Soak testing: Run GoReplay or Siege at moderate concurrency for 24-48 hours to catch memory leaks, connection pool exhaustion, and database connection issues that only appear under sustained load.
Conclusion
The best load testing tool is the one that matches your specific question. GoReplay answers “does this code handle real traffic correctly?” Siege answers “how many concurrent users can this handle?” wrk answers “what’s the maximum throughput?” Vegeta answers “at what request rate does this break?” And k6 answers “does this change meet our performance requirements?”
All five tools are open source, free to run at any scale, and integrate with standard monitoring stacks. Running them on your own infrastructure keeps your API details private, eliminates cloud testing costs, and gives you the freedom to run tests as often and as aggressively as your staging environment can handle.
Start with wrk for quick benchmarks, add GoReplay for realistic traffic validation, and build k6 tests into your CI pipeline. That three-tool combination covers 95% of what most teams need from a load testing infrastructure.
Frequently Asked Questions (FAQ)
Which one should I choose in 2026?
The best choice depends on your specific requirements:
- For quick baselines: Siege and wrk give a useful number with a single command
- For CI/CD pipelines: k6's thresholds and non-zero exit codes make it the natural fit
- For realism: GoReplay replays actual production traffic, edge cases included
- For capacity planning: Vegeta's constant-rate model pinpoints the breaking point
- For privacy: all five are fully open source and self-hosted, so no traffic leaves your network
Refer to the comparison table above for detailed feature breakdowns.
Can I migrate between these tools?
These tools don't share a test format, so there is nothing to migrate automatically. Porting a scenario by hand is usually quick:
- Siege URL files and Vegeta target files are both plain lists of requests
- wrk (Lua) and k6 (JavaScript) scenarios must be rewritten, but are typically short
- Re-establish your baseline numbers after switching, since each tool measures and reports latency slightly differently
Are there free versions available?
All tools in this guide offer free, open-source editions. Some also provide paid plans with additional features, priority support, or managed hosting.
How do I get started?
- Pick the tool from the decision guide that matches your immediate question
- Install it with your package manager (Siege, wrk, Vegeta, and k6 are all one-liners)
- Run a short test against a staging environment, never directly against production
- Feed results into your monitoring stack so you can correlate load with server-side metrics