Skip to content

Conversation

parmesant
Copy link
Contributor

@parmesant parmesant commented Sep 5, 2025

Fixes #XXXX.

Description


This PR has:

  • been tested to ensure log ingestion and log query works.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

Summary by CodeRabbit

  • Improvements

    • Adaptive, tiered time-binning for count queries (1 minute → 1 day) for more relevant aggregations.
    • Query results grouped and ordered consistently by bin end/start for clearer timelines.
  • API

    • Responses provide start_time and end_time; legacy bin start/end keys are accepted and mapped automatically.
  • Breaking Changes

    • num_bins is now optional — bin counts are auto-computed when omitted; supplying num_bins is still supported.

Copy link
Contributor

coderabbitai bot commented Sep 5, 2025

Walkthrough

CountsRequest.num_bins was made optional; bin sizing/bounds are computed dynamically from the time range (tiered 1m/5m/1h/1d). get_df_sql now emits table‑qualified DATE_BIN expressions aliased to _bin_start_time_/_bin_end_time_. CountsRecord fields gain serde aliases to map those aliases to start_time/end_time. Call sites updated to pass Some(...).

Changes

Cohort / File(s) Summary
Counts binning & serde
src/query/mod.rs
CountsRequest.num_bins changed from u64 to Option<u64>. CountsRecord.start_time and end_time annotated with #[serde(alias = "_bin_start_time_")] / #[serde(alias = "_bin_end_time_")]. Implemented dynamic binning in get_bin_density/bounds (tiered by total minutes → 1m/5m/1h/1d) and updated get_df_sql to emit table-qualified DATE_BIN expressions cast/aliased as _bin_start_time_ / _bin_end_time_, grouping/ordering by those aliases.
HTTP handler usage
src/handlers/http/query.rs
Updated construction of CountsRequest to supply num_bins: Some(...) where previously a plain integer was used (e.g., Some(1)).
Prism dataset integration
src/prism/logstream/mod.rs
Updated PrismDatasetRequest::get_counts to pass num_bins: Some(10) (was 10), aligning with CountsRequest becoming optional.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant H as HTTP Handler
  participant CR as CountsRequest
  participant QB as Query Builder (get_df_sql)
  participant DB as Database
  participant S as Serde
  participant R as CountsRecord[]

  H->>CR: build request (stream,start,end,conditions,num_bins: Some?/None)
  CR->>QB: request SQL (compute bins if num_bins is None)
  rect rgba(200,230,255,0.18)
    note right of QB: compute num_bins from total_minutes\n(<=5h:1m, <=1d:5m, <=10d:1h, else:1d)\nemit DATE_BIN(...) AS `_bin_start_time_`, ... AS `_bin_end_time_`
  end
  QB->>DB: execute generated SQL
  DB-->>CR: rows with `_bin_start_time_`, `_bin_end_time_`, count
  rect rgba(220,255,220,0.18)
    CR->>S: deserialize rows (serde aliases map `_bin_*_time_` → start_time/end_time)
    S-->>R: CountsRecord[]
  end
  CR-->>H: return CountsRecord[]
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

for next release

Suggested reviewers

  • nikhilsinhaparseable
  • nitisht
  • praveen5959

Poem

I nibble at minutes, five or one,
Stack hours and days till the bins are done.
_bin_start_time_ hops, _bin_end_time_ takes a bow,
Serde maps names — the rows come through now. 🐇✨

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
src/query/mod.rs (1)

457-479: Clarify binning boundaries (inclusive/exclusive) to avoid off-by-one surprises

Current checks mix <=, > and <. Consider “≤ upper bound” tiers for readability and predictability (e.g., 60m → 1m, ≤1d → 5m, ≤3d → 1h, ≤10d → 4h, else 1d). If the existing behavior is intentional, document it in-line.

-        } else if dur.num_minutes() > 60 * 1 && dur.num_minutes() < 60 * 24 {
+        } else if dur.num_minutes() <= 60 * 24 {
@@
-        } else if dur.num_minutes() > 60 * 24 && dur.num_minutes() < 60 * 24 * 3 {
+        } else if dur.num_minutes() <= 60 * 24 * 3 {
@@
-        } else if dur.num_minutes() > 60 * 24 * 3 && dur.num_minutes() < 60 * 24 * 10 {
+        } else if dur.num_minutes() <= 60 * 24 * 10 {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a273e4f and 8634799.

📒 Files selected for processing (1)
  • src/query/mod.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: parmesant
PR: parseablehq/parseable#1347
File: src/handlers/http/query.rs:0-0
Timestamp: 2025-06-18T12:44:31.983Z
Learning: The counts API in src/handlers/http/query.rs does not currently support group_by functionality in COUNT queries, so the hard-coded fields array ["start_time", "end_time", "count"] is appropriate for the current scope.
📚 Learning: 2025-06-18T12:44:31.983Z
Learnt from: parmesant
PR: parseablehq/parseable#1347
File: src/handlers/http/query.rs:0-0
Timestamp: 2025-06-18T12:44:31.983Z
Learning: The counts API in src/handlers/http/query.rs does not currently support group_by functionality in COUNT queries, so the hard-coded fields array ["start_time", "end_time", "count"] is appropriate for the current scope.

Applied to files:

  • src/query/mod.rs
🧬 Code graph analysis (1)
src/query/mod.rs (1)
src/alerts/alerts_utils.rs (1)
  • get_filter_string (361-453)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: coverage
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu

coderabbitai[bot]
coderabbitai bot previously approved these changes Sep 6, 2025
coderabbitai[bot]
coderabbitai bot previously approved these changes Sep 7, 2025
- removes parameter `numBins` (bins are being calculated by the backend)
- Changes to binning logic
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/query/mod.rs (1)

373-376: Avoid double counting on bin boundaries; remove unwrap on timestamp conversion

Use half-open interval (start <= t < end). Also avoid unwrap() on from_timestamp_millis to prevent panics on malformed stats.

Apply this diff:

-                                TypedStatistics::Int(Int64Type { min, .. }) => {
-                                    let min = DateTime::from_timestamp_millis(*min).unwrap();
-                                    bin.start <= min && bin.end >= min // Determines if a column matches the bin's time range.
-                                }
+                                TypedStatistics::Int(Int64Type { min, .. }) => {
+                                    if let Some(min) = DateTime::from_timestamp_millis(*min) {
+                                        // half-open interval to avoid boundary double counts
+                                        bin.start <= min && min < bin.end
+                                    } else {
+                                        false
+                                    }
+                                }
♻️ Duplicate comments (2)
src/query/mod.rs (2)

467-490: Fix DATE_BIN SQL: use INTERVAL literals, stable aliases, and consistent casting

  • '1m'/'5m'/'1h'/'1d' are not standard; DataFusion expects INTERVAL 'N minute/hour/day'.
  • Cast both start and end to the same textual type (VARCHAR) for consistent schemas.
  • Emit stable aliases start_time/end_time to avoid leaking legacy bin* names.

Apply this diff:

-        let table_name = &self.stream;
-        let start_time_col_name = "_bin_start_time_";
-        let end_time_col_name = "_bin_end_time_";
+        let table_name = &self.stream;
+        let start_time_col_name = "start_time";
+        let end_time_col_name = "end_time";
+        let origin = "1970-01-01 00:00:00+00";
         let date_bin = if dur.num_minutes() <= 60 * 5 {
             // less than 5 hour = 1 min bin
             format!(
-                "CAST(DATE_BIN('1m', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS TEXT) as {start_time_col_name}, DATE_BIN('1m', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '1m' as {end_time_col_name}"
+                "CAST(DATE_BIN(INTERVAL '1 minute', \"{table_name}\".\"{time_column}\", TIMESTAMP '{origin}') AS VARCHAR) AS {start_time_col_name}, \
+                 CAST(DATE_BIN(INTERVAL '1 minute', \"{table_name}\".\"{time_column}\", TIMESTAMP '{origin}') + INTERVAL '1 minute' AS VARCHAR) AS {end_time_col_name}"
             )
         } else if dur.num_minutes() <= 60 * 24 {
             // 1 day = 5 min bin
             format!(
-                "CAST(DATE_BIN('5m', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS TEXT) as {start_time_col_name}, DATE_BIN('5m', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '5m' as {end_time_col_name}"
+                "CAST(DATE_BIN(INTERVAL '5 minutes', \"{table_name}\".\"{time_column}\", TIMESTAMP '{origin}') AS VARCHAR) AS {start_time_col_name}, \
+                 CAST(DATE_BIN(INTERVAL '5 minutes', \"{table_name}\".\"{time_column}\", TIMESTAMP '{origin}') + INTERVAL '5 minutes' AS VARCHAR) AS {end_time_col_name}"
             )
         } else if dur.num_minutes() < 60 * 24 * 10 {
             // 10 days = 1 hour bin
             format!(
-                "CAST(DATE_BIN('1h', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS TEXT) as {start_time_col_name}, DATE_BIN('1h', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '1h' as {end_time_col_name}"
+                "CAST(DATE_BIN(INTERVAL '1 hour', \"{table_name}\".\"{time_column}\", TIMESTAMP '{origin}') AS VARCHAR) AS {start_time_col_name}, \
+                 CAST(DATE_BIN(INTERVAL '1 hour', \"{table_name}\".\"{time_column}\", TIMESTAMP '{origin}') + INTERVAL '1 hour' AS VARCHAR) AS {end_time_col_name}"
             )
         } else {
             // 1 day
             format!(
-                "CAST(DATE_BIN('1d', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS TEXT) as {start_time_col_name}, DATE_BIN('1d', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '1d' as {end_time_col_name}"
+                "CAST(DATE_BIN(INTERVAL '1 day', \"{table_name}\".\"{time_column}\", TIMESTAMP '{origin}') AS VARCHAR) AS {start_time_col_name}, \
+                 CAST(DATE_BIN(INTERVAL '1 day', \"{table_name}\".\"{time_column}\", TIMESTAMP '{origin}') + INTERVAL '1 day' AS VARCHAR) AS {end_time_col_name}"
             )
         };

492-501: Apply time-range predicate and avoid alias names in GROUP BY/ORDER BY

  • Current query ignores the requested [start_time, end_time) window.
  • GROUP BY/ORDER BY by alias may fail on DF; use ordinals.

Apply this diff:

-        let query = if let Some(conditions) = &count_conditions.conditions {
+        let start_str = time_range.start.to_rfc3339();
+        let end_str = time_range.end.to_rfc3339();
+        let time_predicate = format!(
+            "\"{table_name}\".\"{time_column}\" >= TIMESTAMP '{start_str}' AND \"{table_name}\".\"{time_column}\" < TIMESTAMP '{end_str}'"
+        );
+        let query = if let Some(conditions) = &count_conditions.conditions {
             let f = get_filter_string(conditions).map_err(QueryError::CustomError)?;
             format!(
-                "SELECT {date_bin}, COUNT(*) as count FROM \"{table_name}\" WHERE {} GROUP BY {end_time_col_name},{start_time_col_name} ORDER BY {end_time_col_name}",
-                f
+                "SELECT {date_bin}, COUNT(*) as count FROM \"{table_name}\" WHERE ({f}) AND {time_predicate} GROUP BY 1,2 ORDER BY 2"
             )
         } else {
             format!(
-                "SELECT {date_bin}, COUNT(*) as count FROM \"{table_name}\" GROUP BY {end_time_col_name},{start_time_col_name} ORDER BY {end_time_col_name}",
+                "SELECT {date_bin}, COUNT(*) as count FROM \"{table_name}\" WHERE {time_predicate} GROUP BY 1,2 ORDER BY 2"
             )
         };
🧹 Nitpick comments (1)
src/query/mod.rs (1)

339-395: Optional: Align bin boundaries with DATE_BIN origin for parity

If you need exact parity between get_bin_density() and get_df_sql(), align get_bounds() to the same origin (1970-01-01 00:00:00+00) when computing bin edges; otherwise counts may shift relative to SQL bins.

Would you like a follow-up patch that snaps get_bounds() to the DATE_BIN grid?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f04aa25 and 312778c.

📒 Files selected for processing (3)
  • src/handlers/http/query.rs (0 hunks)
  • src/prism/logstream/mod.rs (0 hunks)
  • src/query/mod.rs (4 hunks)
💤 Files with no reviewable changes (2)
  • src/prism/logstream/mod.rs
  • src/handlers/http/query.rs
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: parmesant
PR: parseablehq/parseable#1347
File: src/handlers/http/query.rs:0-0
Timestamp: 2025-06-18T12:44:31.983Z
Learning: The counts API in src/handlers/http/query.rs does not currently support group_by functionality in COUNT queries, so the hard-coded fields array ["start_time", "end_time", "count"] is appropriate for the current scope.
📚 Learning: 2025-06-18T12:44:31.983Z
Learnt from: parmesant
PR: parseablehq/parseable#1347
File: src/handlers/http/query.rs:0-0
Timestamp: 2025-06-18T12:44:31.983Z
Learning: The counts API in src/handlers/http/query.rs does not currently support group_by functionality in COUNT queries, so the hard-coded fields array ["start_time", "end_time", "count"] is appropriate for the current scope.

Applied to files:

  • src/query/mod.rs
📚 Learning: 2025-09-05T09:18:44.813Z
Learnt from: parmesant
PR: parseablehq/parseable#1425
File: src/query/mod.rs:484-495
Timestamp: 2025-09-05T09:18:44.813Z
Learning: In the Parseable system, stream names and column names cannot contain quotes, which eliminates SQL injection concerns when interpolating these identifiers directly into SQL queries in src/query/mod.rs.

Applied to files:

  • src/query/mod.rs
🧬 Code graph analysis (1)
src/query/mod.rs (2)
src/handlers/http/query.rs (1)
  • query (110-144)
src/alerts/alerts_utils.rs (1)
  • get_filter_string (361-453)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: coverage
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (1)
src/query/mod.rs (1)

301-306: Serde aliasing looks correct; keeps canonical field names stable

Keeping the field names as start_time/end_time and accepting legacy bin* via alias is the right trade-off.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/query/mod.rs (1)

435-459: Emit exact [start,end) bins with a simple while-loop

Current logic can over-emit bins and has ambiguous last-bin handling.

-        let loop_end = if have_remainder {
-            num_bins
-        } else {
-            num_bins - 1
-        };
-
-        // Create bins for all but the last date
-        for _ in 0..loop_end {
-            let end = start + Duration::minutes(quotient as i64);
-            bounds.push(TimeBounds { start, end });
-            start = end;
-        }
-
-        // Add the last bin, accounting for any remainder, should we include it?
-        if have_remainder {
-            bounds.push(TimeBounds {
-                start,
-                end: start + Duration::minutes(remainder as i64),
-            });
-        } else {
-            bounds.push(TimeBounds {
-                start,
-                end: start + Duration::minutes(quotient as i64),
-            });
-        }
+        while start < time_range.end {
+            let end = (start + step).min(time_range.end);
+            bounds.push(TimeBounds { start, end });
+            start = end;
+        }
♻️ Duplicate comments (2)
src/query/mod.rs (2)

473-508: Fix DATE_BIN SQL: standard intervals, stable aliases, consistent casting, and safer GROUP BY/ORDER BY

Use ANSI/Postgres interval literals, emit start_time/end_time, cast both to the same type, and avoid alias names in GROUP BY/ORDER BY.

-        let table_name = &self.stream;
-        let start_time_col_name = "_bin_start_time_";
-        let end_time_col_name = "_bin_end_time_";
-        let date_bin = if dur.num_minutes() <= 60 * 5 {
+        let table_name = &self.stream;
+        let start_time_col_name = "start_time";
+        let end_time_col_name = "end_time";
+        let date_bin = if dur.num_minutes() <= 60 * 5 {
             // less than 5 hour = 1 min bin
             format!(
-                "CAST(DATE_BIN('1m', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS TEXT) as {start_time_col_name}, DATE_BIN('1m', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '1m' as {end_time_col_name}"
+                "CAST(DATE_BIN(INTERVAL '1 minute', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS VARCHAR) AS {start_time_col_name}, \
+                 CAST(DATE_BIN(INTERVAL '1 minute', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '1 minute' AS VARCHAR) AS {end_time_col_name}"
             )
         } else if dur.num_minutes() <= 60 * 24 {
             // 1 day = 5 min bin
             format!(
-                "CAST(DATE_BIN('5m', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS TEXT) as {start_time_col_name}, DATE_BIN('5m', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '5m' as {end_time_col_name}"
+                "CAST(DATE_BIN(INTERVAL '5 minutes', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS VARCHAR) AS {start_time_col_name}, \
+                 CAST(DATE_BIN(INTERVAL '5 minutes', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '5 minutes' AS VARCHAR) AS {end_time_col_name}"
             )
         } else if dur.num_minutes() < 60 * 24 * 10 {
             // 10 days = 1 hour bin
             format!(
-                "CAST(DATE_BIN('1h', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS TEXT) as {start_time_col_name}, DATE_BIN('1h', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '1h' as {end_time_col_name}"
+                "CAST(DATE_BIN(INTERVAL '1 hour', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS VARCHAR) AS {start_time_col_name}, \
+                 CAST(DATE_BIN(INTERVAL '1 hour', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '1 hour' AS VARCHAR) AS {end_time_col_name}"
             )
         } else {
             // 1 day
             format!(
-                "CAST(DATE_BIN('1d', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS TEXT) as {start_time_col_name}, DATE_BIN('1d', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '1d' as {end_time_col_name}"
+                "CAST(DATE_BIN(INTERVAL '1 day', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS VARCHAR) AS {start_time_col_name}, \
+                 CAST(DATE_BIN(INTERVAL '1 day', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '1 day' AS VARCHAR) AS {end_time_col_name}"
             )
         };
 
         let query = if let Some(conditions) = &count_conditions.conditions {
             let f = get_filter_string(conditions).map_err(QueryError::CustomError)?;
             format!(
-                "SELECT {date_bin}, COUNT(*) as count FROM \"{table_name}\" WHERE {} GROUP BY {end_time_col_name},{start_time_col_name} ORDER BY {end_time_col_name}",
+                "SELECT {date_bin}, COUNT(*) as count FROM \"{table_name}\" WHERE ({f}) GROUP BY 1,2 ORDER BY 2",
                 f
             )
         } else {
             format!(
-                "SELECT {date_bin}, COUNT(*) as count FROM \"{table_name}\" GROUP BY {end_time_col_name},{start_time_col_name} ORDER BY {end_time_col_name}",
+                "SELECT {date_bin}, COUNT(*) as count FROM \"{table_name}\" GROUP BY 1,2 ORDER BY 2",
             )
         };

This also aligns emitted keys with the handler’s fields: ["start_time","end_time","count"].


405-427: Zero-division and incorrect bin count when remainder > 0

  • total_minutes == 0 can set num_bins to 0 (dynamic path) and panic on divide-by-zero.
  • The current quotient/remainder approach later emits num_bins + 1 bins when there is a remainder.

Use a fixed step and generate half‑open [start, end) bins:

-        let num_bins = if let Some(num_bins) = self.num_bins {
-            num_bins
-        } else {
-            // create number of bins based on total minutes
-            if total_minutes <= 60 * 5 {
-                // till 5 hours, 1 bin = 1 min
-                total_minutes
-            } else if total_minutes <= 60 * 24 {
-                // till 1 day, 1 bin = 5 min
-                total_minutes.div_ceil(5)
-            } else if total_minutes <= 60 * 24 * 10 {
-                // till 10 days, 1 bin = 1 hour
-                total_minutes.div_ceil(60)
-            } else {
-                // > 10 days, 1 bin = 1 day
-                total_minutes.div_ceil(1440)
-            }
-        };
-
-        // divide minutes by num bins to get minutes per bin
-        let quotient = total_minutes / num_bins;
-        let remainder = total_minutes % num_bins;
+        if total_minutes == 0 {
+            return vec![];
+        }
+        // choose step minutes
+        let minutes_per_bin: u64 = if let Some(n) = self.num_bins {
+            ((total_minutes + n - 1) / n).max(1)
+        } else if total_minutes <= 60 * 5 {
+            1
+        } else if total_minutes <= 60 * 24 {
+            5
+        } else if total_minutes <= 60 * 24 * 10 {
+            60
+        } else {
+            1440
+        };
+        let step = Duration::minutes(minutes_per_bin as i64);
🧹 Nitpick comments (1)
src/prism/logstream/mod.rs (1)

354-356: Prefer dynamic binning here (pass None) to use the new tiered logic

Hard‑coding 10 bins defeats the “dynamic 1m/5m/1h/1d” behavior introduced by this PR.

-            num_bins: Some(10),
+            num_bins: None,
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 312778c and d473745.

📒 Files selected for processing (3)
  • src/handlers/http/query.rs (1 hunks)
  • src/prism/logstream/mod.rs (1 hunks)
  • src/query/mod.rs (5 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-06-18T12:44:31.983Z
Learnt from: parmesant
PR: parseablehq/parseable#1347
File: src/handlers/http/query.rs:0-0
Timestamp: 2025-06-18T12:44:31.983Z
Learning: The counts API in src/handlers/http/query.rs does not currently support group_by functionality in COUNT queries, so the hard-coded fields array ["start_time", "end_time", "count"] is appropriate for the current scope.

Applied to files:

  • src/handlers/http/query.rs
  • src/query/mod.rs
📚 Learning: 2025-09-05T09:18:44.813Z
Learnt from: parmesant
PR: parseablehq/parseable#1425
File: src/query/mod.rs:484-495
Timestamp: 2025-09-05T09:18:44.813Z
Learning: In the Parseable system, stream names and column names cannot contain quotes, which eliminates SQL injection concerns when interpolating these identifiers directly into SQL queries in src/query/mod.rs.

Applied to files:

  • src/query/mod.rs
🧬 Code graph analysis (1)
src/query/mod.rs (2)
src/handlers/http/query.rs (1)
  • query (110-144)
src/alerts/alerts_utils.rs (1)
  • get_filter_string (361-453)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: coverage
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
🔇 Additional comments (3)
src/handlers/http/query.rs (2)

166-172: LGTM: Option-wrapped num_bins

Wrapping with Some(1) is correct for a single-bin fast COUNT.


166-172: Fix all endTime usages to snake_case and verify other occurrences

  • In src/handlers/http/query.rs, update both fields vectors from ["start_time","endTime","count"] to ["start_time","end_time","count"].
  • Note additional endTime references in src/handlers/airplane.rs (JSON payload keys) and in comments in src/query/mod.rs; please confirm whether those should also be renamed to end_time for consistency.
src/query/mod.rs (1)

301-309: LGTM: serde aliases preserve backward compatibility

CountsRecord keeps canonical start_time/end_time while accepting _bin_* during deserialization.

@nikhilsinhaparseable nikhilsinhaparseable merged commit 7d5600b into parseablehq:main Sep 9, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants