add rust server logs to the formats list #1426
Conversation
Walkthrough

Adds a new rust_server_logs format with three regex patterns to resources/formats.json and introduces a unit test validating extraction across sample logs in src/event/format/known_schema.rs.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant L as Log Line
    participant EP as EventProcessor
    participant F as formats.json (rust_server_logs)
    participant R as Regex Engine
    L->>EP: submit raw_log
    EP->>F: load rust_server_logs patterns
    loop Try patterns in order
        EP->>R: apply Pattern1 / Pattern2 / Pattern3
        alt match
            R-->>EP: named groups (timestamp, level, module, body, ...)
            EP-->>L: extracted fields
        else no match
            R-->>EP: no match
        end
    end
    note over EP: If matched, optional fields (logger_context, thread_id, line_number) present only when captured.
```
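A minimal Rust sketch of the loop the diagram describes (not code from the PR), assuming the `regex` and `serde_json` crates; `try_extract` and its exact handling of named groups are illustrative rather than Parseable's actual `EventProcessor` API:

```rust
use regex::Regex;
use serde_json::{Map, Value};

/// Try each configured pattern in order; on the first match, copy every
/// named capture group into the event object. Optional fields such as
/// logger_context, thread_id and line_number appear only when a pattern
/// actually defines and captures them.
fn try_extract(patterns: &[Regex], raw_log: &str) -> Option<Map<String, Value>> {
    for re in patterns {
        if let Some(caps) = re.captures(raw_log) {
            let mut fields = Map::new();
            for name in re.capture_names().flatten() {
                if let Some(m) = caps.name(name) {
                    fields.insert(name.to_string(), Value::String(m.as_str().to_string()));
                }
            }
            return Some(fields);
        }
    }
    None // no pattern matched; the raw log is left untouched
}
```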
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 1
🧹 Nitpick comments (4)
resources/formats.json (2)
1484-1491: Prefer stricter timestamp and level for env_logger-like bracketed lines

Keeps noise low when this schema is selected and avoids accidental matches.
- "pattern": "^\\[(?P<timestamp>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}[^ ]+) (?P<level>\\w+)\\s+(?P<module>[^\\]]+)\\]\\s+(?P<body>.*)", + "pattern": "^\\[(?P<timestamp>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(?:\\.\\d{3,9})?(?:Z|[+-]\\d{2}:\\d{2})) (?P<level>TRACE|DEBUG|INFO|WARN|ERROR)\\s+(?P<module>[^\\]]+)\\]\\s+(?P<body>.*)",
1493-1500: Broaden TZ support and narrow captures in the simple format

- Accept offsets, not just optional Z.
- Constrain level.
- Use `[^:]+` for module.

```diff
- "pattern": "^(?P<timestamp>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d+Z?)\\s+(?P<level>\\w+)\\s+(?P<module>\\S+?):\\s+(?P<body>.*)",
+ "pattern": "^(?P<timestamp>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(?:\\.\\d{3,9})?(?:Z|[+-]\\d{2}:\\d{2}))\\s+(?P<level>TRACE|DEBUG|INFO|WARN|ERROR)\\s+(?P<module>[^:]+):\\s+(?P<body>.*)",
```
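A small check of the timezone portion of this suggestion (illustrative, not from the PR): the broadened suffix accepts `Z` as well as `+HH:MM`/`-HH:MM` offsets, which the `Z?`-only form does not:

```rust
use regex::Regex;

fn main() {
    // Timestamp subpattern only, taken from the proposed replacement above.
    let ts = Regex::new(
        r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d{3,9})?(?:Z|[+-]\d{2}:\d{2})$",
    )
    .unwrap();

    assert!(ts.is_match("2025-09-06T10:43:01.628980875Z"));
    assert!(ts.is_match("2025-09-06T10:43:01.628980875+00:00"));
    assert!(ts.is_match("2025-09-06T10:43:01-05:30"));
    // Compact offsets without a colon are still rejected.
    assert!(!ts.is_match("2025-09-06T10:43:01.628980875+0000"));
}
```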
src/event/format/known_schema.rs (2)

517-593: Extend assertions to validate parsed values and verify FORMAT_VERIFY_KEY

Presence checks are fine; validating a couple of concrete captures will harden the test and catch regex drift. Also assert P_FORMAT_VERIFY_KEY is set to "true" on success.

```diff
@@
-        for (i, log_text) in test_logs.iter().enumerate() {
+        for (i, log_text) in test_logs.iter().enumerate() {
             let mut obj = Map::new();
             let log_field = "raw_log";
             obj.insert(log_field.to_string(), Value::String(log_text.to_string()));
             let result = schema.check_or_extract(&mut obj, Some(log_field));
             assert!(
                 result.is_some(),
                 "Failed to extract fields from rust server log {}: {}",
                 i + 1,
                 log_text
             );
+            // Verify format verification marker
+            assert_eq!(
+                obj.get(FORMAT_VERIFY_KEY).and_then(|v| v.as_str()),
+                Some("true"),
+                "FORMAT_VERIFY_KEY should be true for log {}",
+                i + 1
+            );
@@
-            // ThreadId and line_number are only present in the first format (logs 0-3)
+            // ThreadId and line_number are only present in the first format (logs 0-3)
             if i < 4 {
                 assert!(
                     obj.contains_key("logger_context"),
                     "Missing logger_context field for log {}",
                     i + 1
                 );
                 assert!(
                     obj.contains_key("thread_id"),
                     "Missing thread_id field for log {}",
                     i + 1
                 );
                 assert!(
                     obj.contains_key("line_number"),
                     "Missing line_number field for log {}",
                     i + 1
                 );
+                if i == 0 {
+                    assert_eq!(obj.get("module").and_then(|v| v.as_str()), Some("parseable::handlers::http::cluster"));
+                    assert_eq!(obj.get("line_number").and_then(|v| v.as_str()), Some("919"));
+                    assert_eq!(obj.get("logger_context").and_then(|v| v.as_str()), Some("main"));
+                }
             }
         }
```
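One possible way to take the same idea further (a sketch, not part of the suggested diff): drive the expected captures from a small table so that regex drift surfaces as a precise field-level failure; the log-0 values reuse the ones asserted above.

```rust
use serde_json::{Map, Value};

/// (log index, field name, expected value) triples for spot checks.
const EXPECTED: &[(usize, &str, &str)] = &[
    (0, "module", "parseable::handlers::http::cluster"),
    (0, "line_number", "919"),
    (0, "logger_context", "main"),
];

/// Assert every tabled expectation that applies to log `i`.
fn assert_expected(i: usize, obj: &Map<String, Value>) {
    for (idx, field, want) in EXPECTED {
        if *idx == i {
            assert_eq!(
                obj.get(*field).and_then(Value::as_str),
                Some(*want),
                "field {field} mismatch for log {}",
                i + 1
            );
        }
    }
}
```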
525-538: Consider adding a log with timezone offset to cover non-UTC cases

Helps ensure patterns accept `+/-HH:MM` in addition to `Z`.

```diff
     let test_logs = vec![
 @@
         "2025-09-06T10:43:01.628980875Z DEBUG parseable::query::engine: Query executed in 45ms",
+        // offset timezone
+        "2025-09-06T10:43:01.628980875+00:00 INFO parseable::storage::s3: Offset timestamp accepted",
     ];
```
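For context (an illustrative check, not part of the review): with the simple-format pattern as currently written, the proposed offset-timezone sample would not match at all, so this extra fixture is only useful together with the pattern change suggested earlier.

```rust
use regex::Regex;

fn main() {
    // Current simple-format pattern, transcribed from formats.json (JSON escaping removed).
    let current = Regex::new(
        r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z?)\s+(?P<level>\w+)\s+(?P<module>\S+?):\s+(?P<body>.*)",
    )
    .unwrap();

    // Existing UTC sample still matches.
    assert!(current.is_match(
        "2025-09-06T10:43:01.628980875Z DEBUG parseable::query::engine: Query executed in 45ms"
    ));
    // The proposed offset sample does not, because "Z?" never consumes "+00:00".
    assert!(!current.is_match(
        "2025-09-06T10:43:01.628980875+00:00 INFO parseable::storage::s3: Offset timestamp accepted"
    ));
}
```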
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- resources/formats.json (1 hunks)
- src/event/format/known_schema.rs (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
🔇 Additional comments (1)
resources/formats.json (1)
1469-1501: LGTM: adding rust_server_logs format is a solid extension

New format entry and field sets align with existing conventions and should interop cleanly with the extraction pipeline.
Summary by CodeRabbit

New Features
- Added a rust_server_logs format so Rust server log lines can be parsed into structured fields.

Tests
- Added a unit test validating field extraction across sample Rust server logs.