Raw syslog messages are unstructured text streams that are difficult to search, alert on, and analyze at scale. Syslog analysis engines transform unstructured log data into structured, queryable formats by parsing fields, extracting patterns, and enriching messages with contextual metadata. This guide compares three powerful approaches to syslog analysis: rsyslog with mmjsonparse, syslog-ng with PatternDB, and Vector with Vector Remap Language (VRL).
Why Structured Syslog Analysis Matters
Traditional syslog servers collect raw text messages in flat files. While this approach preserves the original data, it makes operational analysis extremely difficult:
- Searching requires regex: Finding specific events across millions of log lines demands complex regular expressions
- Alerting is fragile: Threshold-based alerts on unstructured text produce false positives and miss critical events
- Correlation is impossible: Without parsed fields, correlating events across services, hosts, and time windows is manual and error-prone
- Compliance reporting is tedious: Audit reports require extracting specific fields (user, action, timestamp, source IP) from variable-format text
Structured syslog analysis solves these problems by parsing raw log messages into named fields at ingestion time. Instead of searching raw text, you query structured data: source_ip=10.0.1.5 AND severity=error AND facility=auth.
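As a concrete illustration, the following Python sketch (not part of any of the three engines) turns BSD-style (RFC 3164) syslog lines into named fields and then runs the example query as a plain filter. Facility and severity come from the PRI value at the start of each line, which encodes facility * 8 + severity, so `<35>` decodes to facility 4 (auth) and severity 3 (error). The simplified line regex, the subset of facility names, and the rule that the source IP follows the word "from" are assumptions made for the example.

```python
import re

# Facility codes from RFC 3164 (subset) and severity names 0-7.
# "error" is used for severity 3 to match the query in the text.
FACILITIES = {0: "kern", 1: "user", 2: "mail", 3: "daemon", 4: "auth",
              9: "cron", 10: "authpriv", 16: "local0"}
SEVERITIES = ["emerg", "alert", "crit", "error",
              "warning", "notice", "info", "debug"]

# <PRI>Mmm dd hh:mm:ss host program[pid]: message  (simplified BSD format)
LINE_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)
IP_RE = re.compile(r"\bfrom (\d{1,3}(?:\.\d{1,3}){3})\b")


def parse_syslog_line(line):
    """Return a dict of named fields for one syslog line, or None if it doesn't match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # PRI = facility * 8 + severity
    facility, severity = divmod(int(fields.pop("pri")), 8)
    fields["facility"] = FACILITIES.get(facility, str(facility))
    fields["severity"] = SEVERITIES[severity]
    # Assumption: the client address follows the word "from" (true for sshd).
    ip = IP_RE.search(fields["message"])
    fields["source_ip"] = ip.group(1) if ip else None
    return fields


lines = [
    "<35>Oct 11 22:14:15 web01 sshd[4721]: error: maximum authentication "
    "attempts exceeded for root from 10.0.1.5 port 52113 ssh2",
    "<38>Oct 11 22:14:20 web01 sshd[4721]: Accepted publickey for deploy "
    "from 10.0.1.5 port 52114 ssh2",
    "<11>Oct 11 22:15:02 db01 app: error: disk quota exceeded",
]
events = [e for e in map(parse_syslog_line, lines) if e is not None]

# source_ip=10.0.1.5 AND severity=error AND facility=auth
hits = [e for e in events
        if e["source_ip"] == "10.0.1.5"
        and e["severity"] == "error"
        and e["facility"] == "auth"]
print(len(hits), hits[0]["host"], hits[0]["program"])  # 1 web01 sshd
```

Once the fields exist, the query is a simple equality filter rather than a regex over raw text; the engines below do the parsing step at ingestion time so that downstream stores can index these fields.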
For foundational syslog server setup, see our rsyslog vs syslog-ng vs Vector comparison. If you need log forwarding to centralized systems, check our Fluent Bit vs Vector vs OTEL Collector guide. For log sampling strategies, our log sampling guide covers volume reduction techniques.
rsyslog mmjsonparse: JSON-Aware Syslog Processing
rsyslog is the default syslog daemon on most Linux distributions. Its mmjsonparse module provides JSON parsing capabilities, enabling rsyslog to extract structured data from JSON-formatted syslog messages and CEE/Lumberjack format logs.
How mmjsonparse Works
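The mmjsonparse module is a message-modification module that runs as an action inside an rsyslog ruleset. By default it checks whether the message payload begins with the CEE cookie (@cee:), which marks JSON content inside a syslog message. If it does, mmjsonparse parses the JSON and stores the resulting fields in the message's JSON property tree, where templates and output actions can reference them as $! properties (for example, $!user). It also sets the $parsesuccess variable to OK or FAIL, so later rules can branch on whether parsing worked.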
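To make the JSON-in-syslog flow concrete, here is a minimal Python sketch of the same logic. It is not rsyslog code: it assumes only that a message carries the @cee: cookie followed by a JSON object, returns the parsed fields (standing in for rsyslog's $! tree), and reports OK or FAIL the way $parsesuccess does. The function name `mmjsonparse_like` is hypothetical, and this sketch simply returns an empty dict on failure.

```python
import json

CEE_COOKIE = "@cee:"


def mmjsonparse_like(msg, cookie=CEE_COOKIE):
    """Sketch of mmjsonparse's idea: returns (fields, parsesuccess).

    fields plays the role of rsyslog's $! tree; parsesuccess mimics
    $parsesuccess ("OK" or "FAIL"). This is not rsyslog's actual code.
    """
    # Simplification: rsyslog's msg property often starts with a space.
    payload = msg.lstrip()
    if not payload.startswith(cookie):
        return {}, "FAIL"
    try:
        fields = json.loads(payload[len(cookie):])
    except json.JSONDecodeError:
        return {}, "FAIL"
    # Only JSON objects map onto named fields.
    if not isinstance(fields, dict):
        return {}, "FAIL"
    return fields, "OK"


fields, status = mmjsonparse_like(
    ' @cee: {"user": "alice", "action": "login", "src": "10.0.1.5"}')
# A template referencing $!user would see the equivalent of fields["user"].
print(status, fields["user"], fields["src"])  # OK alice 10.0.1.5
```

Messages without the cookie fall through with FAIL, which is why JSON-emitting applications are usually configured to prepend @cee: when they log through syslog.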
Installation and Configuration
Advanced: mmnormalize for Pattern-Based Parsing
When log messages are not JSON-formatted, the mmnormalize module uses liblognorm rulebases to parse arbitrary text formats into named fields.
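liblognorm rules take the form `rule=<tags>:<pattern>`, with fields written as `%name:type%`. The Python sketch below compiles one such rule into a regular expression to show conceptually what a rulebase does. It is only an approximation: liblognorm builds its own parse tree rather than regexes, the field types here (word, number, ipv4, rest) are an assumed subset, and the helper names `compile_rule` and `normalize` are hypothetical.

```python
import re

# Regex stand-ins for a few liblognorm v1 field types (assumed subset).
FIELD_TYPES = {
    "word": r"\S+",
    "number": r"\d+",
    "ipv4": r"\d{1,3}(?:\.\d{1,3}){3}",
    "rest": r".*",
}
FIELD_RE = re.compile(r"%(\w+):(\w+)%")


def compile_rule(line):
    """Compile 'rule=<tags>:<pattern>' into (tags, regex)."""
    if not line.startswith("rule="):
        raise ValueError("rule lines must start with 'rule='")
    tags, _, pattern = line[len("rule="):].partition(":")
    parts, pos = [], 0
    for m in FIELD_RE.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text
        name, ftype = m.groups()
        parts.append(f"(?P<{name}>{FIELD_TYPES[ftype]})")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    tag_list = [t for t in tags.split(",") if t]
    return tag_list, re.compile("^" + "".join(parts) + "$")


def normalize(message, rules):
    """Return the fields of the first matching rule, or None."""
    for tags, regex in rules:
        m = regex.match(message)
        if m:
            return {"tags": tags, **m.groupdict()}
    return None


RULES = [compile_rule(
    "rule=ssh,login:Accepted %method:word% for %user:word% "
    "from %src_ip:ipv4% port %src_port:number% ssh2")]

print(normalize("Accepted publickey for deploy from 10.0.1.5 port 52114 ssh2", RULES))
```

The key property carried over from liblognorm is that a rule either matches the whole message and yields named fields, or does not match at all, which suits log formats that do not vary.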
Docker Compose for rsyslog Processing Stack
Pros and Cons
| Feature | rsyslog mmjsonparse |
|---|---|
| JSON parsing | Native (@cee: format) |
| Pattern parsing | Via mmnormalize + liblognorm |
| Performance | Very high (C-based) |
| Installation | Pre-installed on most distros |
| Learning curve | Medium (rsyslog config syntax) |
| Output destinations | Dozens of output modules (file, ES, Kafka, etc.) |
| Field extraction | Template-based |
| Enrichment | Limited (property replacer) |
syslog-ng PatternDB: Database-Driven Log Classification
syslog-ng offers PatternDB, a powerful log classification and parsing engine that uses XML-defined pattern databases to parse, classify, and enrich syslog messages in real time.
How PatternDB Works
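PatternDB matches incoming log messages against a database of predefined patterns, grouped into rulesets by program name. When a match is found, it extracts named fields, assigns a classification (such as security, system, or application), and can emit new messages such as alerts through rule actions or correlate related messages using contexts. Unlike a list of regular expressions tried one after another, PatternDB stores its patterns in a radix tree, so it does not have to test every pattern in turn, which keeps matching efficient at scale.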
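The Python sketch below mimics the PatternDB workflow: rulesets keyed by program name, rules with an ID, a class, and a pattern using PatternDB-style parsers (`@ESTRING:name:delimiter@`, `@IPv4:name@`, `@NUMBER:name@`), plus embedded example messages that can be checked automatically. It is an assumption-laden stand-in, not syslog-ng's implementation: it compiles patterns to regular expressions instead of a radix tree, supports only a few parsers, and the rule IDs and class values are hypothetical. The output keys mirror syslog-ng's `.classifier.class` and `.classifier.rule_id` values, and unmatched messages get the class `unknown`.

```python
import re
from dataclasses import dataclass, field

# Regex stand-ins for a few PatternDB parsers; syslog-ng itself matches
# patterns with a radix tree, not with regular expressions.
PARSERS = {
    "IPv4": r"\d{1,3}(?:\.\d{1,3}){3}",
    "NUMBER": r"\d+",
    "ANYSTRING": r".*",
}
TOKEN_RE = re.compile(r"@(\w+):(\w+)(?::([^@]*))?@")


def compile_pattern(pattern):
    parts, pos = [], 0
    for m in TOKEN_RE.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text
        ptype, name, arg = m.groups()
        if ptype == "ESTRING":
            # @ESTRING:name:delim@ captures up to the delimiter and consumes it.
            d = re.escape(arg)
            parts.append(f"(?P<{name}>(?:(?!{d}).)*){d}")
        else:
            parts.append(f"(?P<{name}>{PARSERS[ptype]})")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("^" + "".join(parts) + "$")


@dataclass
class Rule:
    rule_id: str
    rule_class: str
    pattern: str
    examples: list = field(default_factory=list)  # (message, expected fields)

    def __post_init__(self):
        self.regex = compile_pattern(self.pattern)


# Rulesets are keyed by program name, as in a PatternDB <ruleset>.
RULESETS = {
    "sshd": [
        Rule("ssh-failed-password", "violation",
             "Failed password for @ESTRING:user: @from @IPv4:src_ip@ "
             "port @NUMBER:src_port@ ssh2",
             [("Failed password for root from 10.0.1.5 port 52113 ssh2",
               {"user": "root", "src_ip": "10.0.1.5"})]),
        Rule("ssh-accepted", "system",
             "Accepted @ESTRING:method: @for @ESTRING:user: @from "
             "@IPv4:src_ip@ port @NUMBER:src_port@ ssh2",
             [("Accepted publickey for deploy from 10.0.1.5 port 52114 ssh2",
               {"user": "deploy", "method": "publickey"})]),
    ],
}


def classify(program, message):
    for rule in RULESETS.get(program, []):
        m = rule.regex.match(message)
        if m:
            return {".classifier.class": rule.rule_class,
                    ".classifier.rule_id": rule.rule_id, **m.groupdict()}
    return {".classifier.class": "unknown"}


def run_examples():
    """Check every embedded example, similar in spirit to `pdbtool test`."""
    failures = []
    for program, rules in RULESETS.items():
        for rule in rules:
            for message, expected in rule.examples:
                got = classify(program, message)
                if got.get(".classifier.rule_id") != rule.rule_id or any(
                        got.get(k) != v for k, v in expected.items()):
                    failures.append((rule.rule_id, message))
    return failures


print(classify("sshd", "Failed password for root from 10.0.1.5 port 52113 ssh2"))
print(run_examples())  # []
```

Keeping example messages next to each rule is what makes PatternDB attractive for audited environments: a rule change that breaks an example is caught before deployment.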
PatternDB Configuration
PatternDB XML Definition
Docker Compose for syslog-ng Stack
Pros and Cons
| Feature | syslog-ng PatternDB |
|---|---|
| Pattern matching | Radix-tree based (fast) |
| Classification | Built-in class system |
| Field extraction | Named field extraction from patterns |
| Alerting | Rule actions emit alert messages; correlation via contexts |
| Pattern authoring | XML format with examples/testing |
| Performance | High (optimized C implementation) |
| Community patterns | Limited (custom authoring required) |
| Learning curve | Medium to high (XML pattern syntax) |
Vector VRL: Programmable Log Transformation
Vector by Datadog is a high-performance observability data pipeline that uses Vector Remap Language (VRL) for log parsing, transformation, and enrichment. VRL is a purpose-built language for data processing that combines the expressiveness of programming languages with the safety of static type checking.
How VRL Works
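VRL programs run inside a remap transform and are applied to each event as it flows through the Vector pipeline. The language provides built-in functions for parsing (regex, JSON, key-value, CSV), type conversion, conditional logic, and enrichment. VRL programs are compiled when Vector loads its configuration: the compiler checks types and rejects any program that leaves the error of a fallible function (such as parse_json) unhandled, either with the ! suffix or by capturing the error in a variable. As a result, many mistakes surface at configuration time instead of as runtime parse failures.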
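VRL's explicit error handling is the part that differs most from regex-based configurations. The Python sketch below imitates one remap step: a fallible parser returns a value and an error, the program must decide what to do with the error, a type conversion is attempted, and the event is enriched from a lookup table. Everything here is a hypothetical stand-in: `parse_key_value` is a simplified Python function (no quoting or custom delimiters), and `HOST_OWNERS` plays the role of a Vector enrichment table. In Vector itself, the same logic would be a VRL program using functions such as parse_key_value and to_int.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for a Vector enrichment table (host -> owning team).
HOST_OWNERS = {"web01": "payments", "db01": "storage"}


def parse_key_value(text: str) -> Tuple[Optional[dict], Optional[str]]:
    """Simplified, fallible key=value parser: (result, None) or (None, error).

    Unlike VRL's parse_key_value, it ignores quoting and custom delimiters.
    """
    result = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if not sep or not key:
            return None, f"invalid key/value pair: {token!r}"
        result[key] = value
    return result, None


def remap(event: dict) -> dict:
    """Apply one remap-style program to one event and return the new event."""
    out = dict(event)
    # Like `parsed, err = parse_key_value(.message)`: the error must be handled.
    parsed, err = parse_key_value(out.get("message", ""))
    if err is not None:
        out["parse_error"] = err  # keep the raw event and record why
        return out
    out.update(parsed)
    # Type conversion is fallible too (compare VRL's to_int).
    if "duration_ms" in out:
        try:
            out["duration_ms"] = int(out["duration_ms"])
        except ValueError:
            out["parse_error"] = "duration_ms is not an integer"
    # Enrichment: look up the host in the table, with a default.
    out["team"] = HOST_OWNERS.get(out.get("host"), "unknown")
    return out


print(remap({"host": "web01", "message": "user=alice action=login duration_ms=42"}))
print(remap({"host": "db01", "message": "disk almost full"}))
```

The design choice mirrored here is that a malformed message is never silently dropped or half-parsed: the program states explicitly whether to tag it, route it elsewhere, or abort.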
Vector Configuration with VRL
Advanced VRL: Custom Parsing Functions
Docker Compose for Vector Pipeline
Pros and Cons
| Feature | Vector VRL |
|---|---|
| Language | Purpose-built remap language |
| Type safety | Static type checking at config time |
| Parsing functions | JSON, regex, key-value, CSV, syslog, grok |
| Performance | Very high (Rust-based) |
| Enrichment | Enrichment tables (CSV file, GeoIP database) |
| Error handling | Compiler-enforced handling of fallible functions |
| Learning curve | Low to medium (VRL is designed for simplicity) |
| Community | Growing (Datadog-backed) |
Comparison Summary
| Feature | rsyslog mmjsonparse | syslog-ng PatternDB | Vector VRL |
|---|---|---|---|
| Parsing approach | JSON module + liblognorm | XML pattern database | VRL scripts |
| Performance | Very high | High | Very high |
| Pattern authoring | Rule files (.rb) | XML with test cases | VRL code |
| Type safety | None | Schema validation | Static type checking |
| Enrichment | Limited | Classifier values | Lookup tables |
| Installation | Pre-installed (Linux) | Package manager | Binary/Docker |
| Learning curve | Medium | Medium-high | Low-medium |
| Best for | Traditional Linux sysadmins | Enterprise log classification | Modern observability pipelines |
Choosing the Right Syslog Analysis Engine
rsyslog mmjsonparse is ideal when you need maximum compatibility with existing Linux infrastructure. It is pre-installed on most distributions and provides solid JSON and pattern-based parsing with minimal additional setup.
syslog-ng PatternDB excels in environments that require structured log classification with formal pattern definitions. The XML-based pattern authoring with embedded test cases makes it suitable for compliance-heavy environments where parsing rules must be documented and validated.
Vector VRL is the best choice for modern observability pipelines that need flexible, programmable log transformation. Its static type checking catches configuration errors before deployment, and its performance makes it suitable for high-volume log processing.
FAQ
Can rsyslog parse non-JSON log formats?
Yes. The mmnormalize module uses liblognorm rulebases to parse arbitrary text formats. You define patterns in .rb rule files that extract named fields from consistently formatted text messages (e.g., Apache access logs, sudo logs, SSH logs). This requires more manual rule writing than VRL or PatternDB but works well when log formats are stable.
Does syslog-ng PatternDB support dynamic pattern updates?
PatternDB patterns are loaded from XML files at startup. To update patterns, you must modify the XML file and reload syslog-ng (systemctl reload syslog-ng). Some deployments use a configuration management tool (Ansible, Puppet) to manage PatternDB files and trigger reloads automatically.
Is VRL a general-purpose programming language?
No. VRL is a domain-specific language designed specifically for log and event transformation. It has no general-purpose loop statements, no recursion, and no user-defined functions; iteration is limited to built-in functions such as for_each and map_values that walk an existing object or array. This intentional limitation guarantees that VRL programs terminate and keeps the per-event cost of a transform bounded and predictable.
Can I combine multiple syslog analysis engines?
Yes. A common pattern is to use rsyslog or syslog-ng as the syslog receiver (collecting from network devices and servers), then forward structured output to Vector for additional transformation and routing to multiple destinations (Elasticsearch, Loki, cloud storage).
How do I test syslog parsing rules before deploying?
For rsyslog, use rsyslogd -N1 to validate configuration syntax. For syslog-ng, use syslog-ng --syntax-only, and use pdbtool test to check a PatternDB file against the example messages embedded in its rules. For Vector, use vector validate --config-yaml vector.yaml to check VRL syntax and type safety. Vector also provides an online VRL playground at play.vrl.dev for testing scripts interactively.
Which engine handles the highest log volume?
All three engines are high-performance. rsyslog and syslog-ng can process hundreds of thousands of messages per second on modern hardware. Vector, being written in Rust, achieves similar throughput with lower memory usage. For most deployments, the bottleneck is the output destination (Elasticsearch, disk I/O) rather than the parsing engine.
Can these engines parse Windows Event Logs forwarded via syslog?
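Windows Event Logs forwarded as syslog by an agent such as NXLog can be parsed by all three engines. Vector has no Windows Event Log-specific parser in VRL, but JSON-formatted events can be handled with parse_json, and text events with parse_regex or parse_grok. rsyslog and syslog-ng require custom patterns, or JSON parsing (mmjsonparse in rsyslog, json-parser() in syslog-ng) if the forwarder sends JSON-formatted events.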