Skip to content

Conversation

nikhilsinhaparseable
Copy link
Contributor

@nikhilsinhaparseable nikhilsinhaparseable commented Sep 6, 2025

Summary by CodeRabbit

  • New Features

    • Added native support for Rust server logs across multiple common formats. Automatically extracts timestamp, level, module, and message, plus logger context, thread ID, and line number when present. Existing formats are unaffected.
  • Tests

    • Added comprehensive tests validating parsing across Rust log variants and the presence of core and optional fields, ensuring non-breaking behavior.

Copy link
Contributor

coderabbitai bot commented Sep 6, 2025

Walkthrough

Adds a new rust_server_logs format with three regex patterns to resources/formats.json and introduces a unit test validating extraction across sample logs in src/event/format/known_schema.rs.

Changes

Cohort / File(s) Summary
Log format definition
resources/formats.json
Appends rust_server_logs with three regex patterns capturing timestamp, level, module, body, and optionally logger_context, thread_id, and line_number. No existing formats modified.
Tests for schema extraction
src/event/format/known_schema.rs
Adds test_rust_server_logs validating parsing against the new format across multiple sample lines; asserts presence of required and conditional fields.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant L as Log Line
  participant EP as EventProcessor
  participant F as formats.json (rust_server_logs)
  participant R as Regex Engine

  L->>EP: submit raw_log
  EP->>F: load rust_server_logs patterns
  loop Try patterns in order
    EP->>R: apply Pattern1 / Pattern2 / Pattern3
    alt match
      R-->>EP: named groups (timestamp, level, module, body, ...)
      EP-->>L: extracted fields
    else no match
      R-->>EP: no match
    end
  end
  note over EP: If matched, optional fields (logger_context, thread_id, line_number) present only when captured.
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested labels

for next release

Suggested reviewers

  • nitisht

Poem

A rabbit taps logs with a curious ear,
Three patterns hop—now the fields are clear.
Timestamps twinkle, modules align,
Threads peek out with a tidy sign.
With regex whiskers twitching bright,
We parse the dusk and dawn just right. 🐇✨

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
resources/formats.json (2)

1484-1491: Prefer stricter timestamp and level for env_logger-like bracketed lines

Keeps noise low when this schema is selected and avoids accidental matches.

-        "pattern": "^\\[(?P<timestamp>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}[^ ]+) (?P<level>\\w+)\\s+(?P<module>[^\\]]+)\\]\\s+(?P<body>.*)",
+        "pattern": "^\\[(?P<timestamp>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(?:\\.\\d{3,9})?(?:Z|[+-]\\d{2}:\\d{2})) (?P<level>TRACE|DEBUG|INFO|WARN|ERROR)\\s+(?P<module>[^\\]]+)\\]\\s+(?P<body>.*)",

1493-1500: Broaden TZ support and narrow captures in the simple format

  • Accept offsets, not just optional Z.
  • Constrain level.
  • Use [^:]+ for module.
-        "pattern": "^(?P<timestamp>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d+Z?)\\s+(?P<level>\\w+)\\s+(?P<module>\\S+?):\\s+(?P<body>.*)",
+        "pattern": "^(?P<timestamp>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(?:\\.\\d{3,9})?(?:Z|[+-]\\d{2}:\\d{2}))\\s+(?P<level>TRACE|DEBUG|INFO|WARN|ERROR)\\s+(?P<module>[^:]+):\\s+(?P<body>.*)",
src/event/format/known_schema.rs (2)

517-593: Extend assertions to validate parsed values and verify FORMAT_VERIFY_KEY

Presence checks are fine; validating a couple of concrete captures will harden the test and catch regex drift. Also assert P_FORMAT_VERIFY_KEY is set to "true" on success.

@@
-        for (i, log_text) in test_logs.iter().enumerate() {
+        for (i, log_text) in test_logs.iter().enumerate() {
             let mut obj = Map::new();
             let log_field = "raw_log";
             obj.insert(log_field.to_string(), Value::String(log_text.to_string()));
 
             let result = schema.check_or_extract(&mut obj, Some(log_field));
             assert!(
                 result.is_some(),
                 "Failed to extract fields from rust server log {}: {}",
                 i + 1,
                 log_text
             );
 
+            // Verify format verification marker
+            assert_eq!(
+                obj.get(FORMAT_VERIFY_KEY).and_then(|v| v.as_str()),
+                Some("true"),
+                "FORMAT_VERIFY_KEY should be true for log {}",
+                i + 1
+            );
@@
-            // ThreadId and line_number are only present in the first format (logs 0-3)
+            // ThreadId and line_number are only present in the first format (logs 0-3)
             if i < 4 {
                 assert!(
                     obj.contains_key("logger_context"),
                     "Missing logger_context field for log {}",
                     i + 1
                 );
                 assert!(
                     obj.contains_key("thread_id"),
                     "Missing thread_id field for log {}",
                     i + 1
                 );
                 assert!(
                     obj.contains_key("line_number"),
                     "Missing line_number field for log {}",
                     i + 1
                 );
+                if i == 0 {
+                    assert_eq!(obj.get("module").and_then(|v| v.as_str()), Some("parseable::handlers::http::cluster"));
+                    assert_eq!(obj.get("line_number").and_then(|v| v.as_str()), Some("919"));
+                    assert_eq!(obj.get("logger_context").and_then(|v| v.as_str()), Some("main"));
+                }
             }
         }

525-538: Consider adding a log with timezone offset to cover non-UTC cases

Helps ensure patterns accept +/-HH:MM in addition to Z.

         let test_logs = vec![
@@
             "2025-09-06T10:43:01.628980875Z DEBUG parseable::query::engine: Query executed in 45ms",
+            // offset timezone
+            "2025-09-06T10:43:01.628980875+00:00 INFO parseable::storage::s3: Offset timestamp accepted",
         ];
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e980283 and 7722c2a.

📒 Files selected for processing (2)
  • resources/formats.json (1 hunks)
  • src/event/format/known_schema.rs (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: coverage
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
🔇 Additional comments (1)
resources/formats.json (1)

1469-1501: LGTM: adding rust_server_logs format is a solid extension

New format entry and field sets align with existing conventions and should interop cleanly with the extraction pipeline.

@nitisht nitisht merged commit e9e013d into parseablehq:main Sep 6, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants