Refactor date binning logic and SQL query formatting #1425

parmesant · 2025-09-05T08:30:20Z

Fixes #XXXX.

Description

This PR has:

been tested to ensure log ingestion and log query works.
added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
added documentation for new or modified features or behaviors.

Summary by CodeRabbit

Improvements
- Adaptive, tiered time-binning for count queries (1 minute → 1 day) for more relevant aggregations.
- Query results grouped and ordered consistently by bin end/start for clearer timelines.
API
- Responses provide start_time and end_time; legacy bin start/end keys are accepted and mapped automatically.
Breaking Changes
- num_bins is now optional — bin counts are auto-computed when omitted; supplying num_bins is still supported.

Signed-off-by: parmesant <[email protected]>

coderabbitai · 2025-09-05T08:30:27Z

Walkthrough

CountsRequest.num_bins was made optional; bin sizing/bounds are computed dynamically from the time range (tiered 1m/5m/1h/1d). get_df_sql now emits table‑qualified DATE_BIN expressions aliased to _bin_start_time_/_bin_end_time_. CountsRecord fields gain serde aliases to map those aliases to start_time/end_time. Call sites updated to pass Some(...).

Changes

Cohort / File(s)	Summary
Counts binning & serde `src/query/mod.rs`	`CountsRequest.num_bins` changed from `u64` to `Option<u64>`. `CountsRecord.start_time` and `end_time` annotated with `#[serde(alias = "_bin_start_time_")]` / `#[serde(alias = "_bin_end_time_")]`. Implemented dynamic binning in `get_bin_density`/bounds (tiered by total minutes → 1m/5m/1h/1d) and updated `get_df_sql` to emit table-qualified DATE_BIN expressions cast/aliased as `_bin_start_time_` / `_bin_end_time_`, grouping/ordering by those aliases.
HTTP handler usage `src/handlers/http/query.rs`	Updated construction of `CountsRequest` to supply `num_bins: Some(...)` where previously a plain integer was used (e.g., `Some(1)`).
Prism dataset integration `src/prism/logstream/mod.rs`	Updated `PrismDatasetRequest::get_counts` to pass `num_bins: Some(10)` (was `10`), aligning with `CountsRequest` becoming optional.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant H as HTTP Handler
  participant CR as CountsRequest
  participant QB as Query Builder (get_df_sql)
  participant DB as Database
  participant S as Serde
  participant R as CountsRecord[]

  H->>CR: build request (stream,start,end,conditions,num_bins: Some?/None)
  CR->>QB: request SQL (compute bins if num_bins is None)
  rect rgba(200,230,255,0.18)
    note right of QB: compute num_bins from total_minutes\n(<=5h:1m, <=1d:5m, <=10d:1h, else:1d)\nemit DATE_BIN(...) AS `_bin_start_time_`, ... AS `_bin_end_time_`
  end
  QB->>DB: execute generated SQL
  DB-->>CR: rows with `_bin_start_time_`, `_bin_end_time_`, count
  rect rgba(220,255,220,0.18)
    CR->>S: deserialize rows (serde aliases map `_bin_*_time_` → start_time/end_time)
    S-->>R: CountsRecord[]
  end
  CR-->>H: return CountsRecord[]

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Updates #1408 — modifies the same query binning code and DATE_BIN/time-partition logic in src/query/mod.rs.
feat: streaming response #1317 — updates CountsRequest usage in handlers similar to this change (Option vs plain integer).
Updates for counts API #1347 — touches counts query generation and aliasing for bin start/end times.

Suggested labels

for next release

Suggested reviewers

nikhilsinhaparseable
nitisht
praveen5959

Poem

I nibble at minutes, five or one,
Stack hours and days till the bins are done.
_bin_start_time_ hops, _bin_end_time_ takes a bow,
Serde maps names — the rows come through now. 🐇✨

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

src/query/mod.rs (1)
457-479: Clarify binning boundaries (inclusive/exclusive) to avoid off-by-one surprises

Current checks mix <=, > and <. Consider “≤ upper bound” tiers for readability and predictability (e.g., 60m → 1m, ≤1d → 5m, ≤3d → 1h, ≤10d → 4h, else 1d). If the existing behavior is intentional, document it in-line.
-        } else if dur.num_minutes() > 60 * 1 && dur.num_minutes() < 60 * 24 {
+        } else if dur.num_minutes() <= 60 * 24 {
@@
-        } else if dur.num_minutes() > 60 * 24 && dur.num_minutes() < 60 * 24 * 3 {
+        } else if dur.num_minutes() <= 60 * 24 * 3 {
@@
-        } else if dur.num_minutes() > 60 * 24 * 3 && dur.num_minutes() < 60 * 24 * 10 {
+        } else if dur.num_minutes() <= 60 * 24 * 10 {

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a273e4f and 8634799.

📒 Files selected for processing (1)

src/query/mod.rs (2 hunks)

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: parmesant
PR: parseablehq/parseable#1347
File: src/handlers/http/query.rs:0-0
Timestamp: 2025-06-18T12:44:31.983Z
Learning: The counts API in src/handlers/http/query.rs does not currently support group_by functionality in COUNT queries, so the hard-coded fields array ["start_time", "end_time", "count"] is appropriate for the current scope.

📚 Learning: 2025-06-18T12:44:31.983Z

Learnt from: parmesant
PR: parseablehq/parseable#1347
File: src/handlers/http/query.rs:0-0
Timestamp: 2025-06-18T12:44:31.983Z
Learning: The counts API in src/handlers/http/query.rs does not currently support group_by functionality in COUNT queries, so the hard-coded fields array ["start_time", "end_time", "count"] is appropriate for the current scope.

Applied to files:

src/query/mod.rs

🧬 Code graph analysis (1)

src/query/mod.rs (1)

src/alerts/alerts_utils.rs (1)

get_filter_string (361-453)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: coverage
GitHub Check: Build Default x86_64-pc-windows-msvc
GitHub Check: Build Default aarch64-apple-darwin
GitHub Check: Build Default x86_64-apple-darwin
GitHub Check: Build Default aarch64-unknown-linux-gnu
GitHub Check: Build Default x86_64-unknown-linux-gnu
GitHub Check: Quest Smoke and Load Tests for Standalone deployments
GitHub Check: Quest Smoke and Load Tests for Distributed deployments
GitHub Check: Build Kafka aarch64-apple-darwin
GitHub Check: Build Kafka x86_64-unknown-linux-gnu

src/query/mod.rs

- removes parameter `numBins` (bins are being calculated by the backend) - Changes to binning logic

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/query/mod.rs (1)

373-376: Avoid double counting on bin boundaries; remove unwrap on timestamp conversion

Use half-open interval (start <= t < end). Also avoid unwrap() on from_timestamp_millis to prevent panics on malformed stats.

Apply this diff:

-                                TypedStatistics::Int(Int64Type { min, .. }) => {
-                                    let min = DateTime::from_timestamp_millis(*min).unwrap();
-                                    bin.start <= min && bin.end >= min // Determines if a column matches the bin's time range.
-                                }
+                                TypedStatistics::Int(Int64Type { min, .. }) => {
+                                    if let Some(min) = DateTime::from_timestamp_millis(*min) {
+                                        // half-open interval to avoid boundary double counts
+                                        bin.start <= min && min < bin.end
+                                    } else {
+                                        false
+                                    }
+                                }

♻️ Duplicate comments (2)

src/query/mod.rs (2)

467-490: Fix DATE_BIN SQL: use INTERVAL literals, stable aliases, and consistent casting

'1m'/'5m'/'1h'/'1d' are not standard; DataFusion expects INTERVAL 'N minute/hour/day'.
Cast both start and end to the same textual type (VARCHAR) for consistent schemas.
Emit stable aliases start_time/end_time to avoid leaking legacy bin* names.

Apply this diff:

-        let table_name = &self.stream;
-        let start_time_col_name = "_bin_start_time_";
-        let end_time_col_name = "_bin_end_time_";
+        let table_name = &self.stream;
+        let start_time_col_name = "start_time";
+        let end_time_col_name = "end_time";
+        let origin = "1970-01-01 00:00:00+00";
         let date_bin = if dur.num_minutes() <= 60 * 5 {
             // less than 5 hour = 1 min bin
             format!(
-                "CAST(DATE_BIN('1m', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS TEXT) as {start_time_col_name}, DATE_BIN('1m', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '1m' as {end_time_col_name}"
+                "CAST(DATE_BIN(INTERVAL '1 minute', \"{table_name}\".\"{time_column}\", TIMESTAMP '{origin}') AS VARCHAR) AS {start_time_col_name}, \
+                 CAST(DATE_BIN(INTERVAL '1 minute', \"{table_name}\".\"{time_column}\", TIMESTAMP '{origin}') + INTERVAL '1 minute' AS VARCHAR) AS {end_time_col_name}"
             )
         } else if dur.num_minutes() <= 60 * 24 {
             // 1 day = 5 min bin
             format!(
-                "CAST(DATE_BIN('5m', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS TEXT) as {start_time_col_name}, DATE_BIN('5m', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '5m' as {end_time_col_name}"
+                "CAST(DATE_BIN(INTERVAL '5 minutes', \"{table_name}\".\"{time_column}\", TIMESTAMP '{origin}') AS VARCHAR) AS {start_time_col_name}, \
+                 CAST(DATE_BIN(INTERVAL '5 minutes', \"{table_name}\".\"{time_column}\", TIMESTAMP '{origin}') + INTERVAL '5 minutes' AS VARCHAR) AS {end_time_col_name}"
             )
         } else if dur.num_minutes() < 60 * 24 * 10 {
             // 10 days = 1 hour bin
             format!(
-                "CAST(DATE_BIN('1h', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS TEXT) as {start_time_col_name}, DATE_BIN('1h', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '1h' as {end_time_col_name}"
+                "CAST(DATE_BIN(INTERVAL '1 hour', \"{table_name}\".\"{time_column}\", TIMESTAMP '{origin}') AS VARCHAR) AS {start_time_col_name}, \
+                 CAST(DATE_BIN(INTERVAL '1 hour', \"{table_name}\".\"{time_column}\", TIMESTAMP '{origin}') + INTERVAL '1 hour' AS VARCHAR) AS {end_time_col_name}"
             )
         } else {
             // 1 day
             format!(
-                "CAST(DATE_BIN('1d', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS TEXT) as {start_time_col_name}, DATE_BIN('1d', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '1d' as {end_time_col_name}"
+                "CAST(DATE_BIN(INTERVAL '1 day', \"{table_name}\".\"{time_column}\", TIMESTAMP '{origin}') AS VARCHAR) AS {start_time_col_name}, \
+                 CAST(DATE_BIN(INTERVAL '1 day', \"{table_name}\".\"{time_column}\", TIMESTAMP '{origin}') + INTERVAL '1 day' AS VARCHAR) AS {end_time_col_name}"
             )
         };

492-501: Apply time-range predicate and avoid alias names in GROUP BY/ORDER BY

Current query ignores the requested [start_time, end_time) window.
GROUP BY/ORDER BY by alias may fail on DF; use ordinals.

Apply this diff:

-        let query = if let Some(conditions) = &count_conditions.conditions {
+        let start_str = time_range.start.to_rfc3339();
+        let end_str = time_range.end.to_rfc3339();
+        let time_predicate = format!(
+            "\"{table_name}\".\"{time_column}\" >= TIMESTAMP '{start_str}' AND \"{table_name}\".\"{time_column}\" < TIMESTAMP '{end_str}'"
+        );
+        let query = if let Some(conditions) = &count_conditions.conditions {
             let f = get_filter_string(conditions).map_err(QueryError::CustomError)?;
             format!(
-                "SELECT {date_bin}, COUNT(*) as count FROM \"{table_name}\" WHERE {} GROUP BY {end_time_col_name},{start_time_col_name} ORDER BY {end_time_col_name}",
-                f
+                "SELECT {date_bin}, COUNT(*) as count FROM \"{table_name}\" WHERE ({f}) AND {time_predicate} GROUP BY 1,2 ORDER BY 2"
             )
         } else {
             format!(
-                "SELECT {date_bin}, COUNT(*) as count FROM \"{table_name}\" GROUP BY {end_time_col_name},{start_time_col_name} ORDER BY {end_time_col_name}",
+                "SELECT {date_bin}, COUNT(*) as count FROM \"{table_name}\" WHERE {time_predicate} GROUP BY 1,2 ORDER BY 2"
             )
         };

🧹 Nitpick comments (1)

src/query/mod.rs (1)

339-395: Optional: Align bin boundaries with DATE_BIN origin for parity

If you need exact parity between get_bin_density() and get_df_sql(), align get_bounds() to the same origin (1970-01-01 00:00:00+00) when computing bin edges; otherwise counts may shift relative to SQL bins.

Would you like a follow-up patch that snaps get_bounds() to the DATE_BIN grid?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f04aa25 and 312778c.

📒 Files selected for processing (3)

src/handlers/http/query.rs (0 hunks)
src/prism/logstream/mod.rs (0 hunks)
src/query/mod.rs (4 hunks)

💤 Files with no reviewable changes (2)

src/prism/logstream/mod.rs
src/handlers/http/query.rs

🧰 Additional context used

🧠 Learnings (3)

📓 Common learnings

Learnt from: parmesant
PR: parseablehq/parseable#1347
File: src/handlers/http/query.rs:0-0
Timestamp: 2025-06-18T12:44:31.983Z
Learning: The counts API in src/handlers/http/query.rs does not currently support group_by functionality in COUNT queries, so the hard-coded fields array ["start_time", "end_time", "count"] is appropriate for the current scope.

📚 Learning: 2025-06-18T12:44:31.983Z

Learnt from: parmesant
PR: parseablehq/parseable#1347
File: src/handlers/http/query.rs:0-0
Timestamp: 2025-06-18T12:44:31.983Z
Learning: The counts API in src/handlers/http/query.rs does not currently support group_by functionality in COUNT queries, so the hard-coded fields array ["start_time", "end_time", "count"] is appropriate for the current scope.

Applied to files:

src/query/mod.rs

📚 Learning: 2025-09-05T09:18:44.813Z

Learnt from: parmesant
PR: parseablehq/parseable#1425
File: src/query/mod.rs:484-495
Timestamp: 2025-09-05T09:18:44.813Z
Learning: In the Parseable system, stream names and column names cannot contain quotes, which eliminates SQL injection concerns when interpolating these identifiers directly into SQL queries in src/query/mod.rs.

Applied to files:

src/query/mod.rs

🧬 Code graph analysis (1)

src/query/mod.rs (2)

src/handlers/http/query.rs (1)

query (110-144)

src/alerts/alerts_utils.rs (1)

get_filter_string (361-453)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: coverage
GitHub Check: Build Default aarch64-apple-darwin
GitHub Check: Build Kafka aarch64-apple-darwin
GitHub Check: Build Default aarch64-unknown-linux-gnu
GitHub Check: Build Kafka x86_64-unknown-linux-gnu
GitHub Check: Build Default x86_64-pc-windows-msvc
GitHub Check: Build Default x86_64-apple-darwin
GitHub Check: Build Default x86_64-unknown-linux-gnu
GitHub Check: Quest Smoke and Load Tests for Distributed deployments
GitHub Check: Quest Smoke and Load Tests for Standalone deployments

🔇 Additional comments (1)

src/query/mod.rs (1)

301-306: Serde aliasing looks correct; keeps canonical field names stable

Keeping the field names as start_time/end_time and accepting legacy bin* via alias is the right trade-off.

src/query/mod.rs

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/query/mod.rs (1)

435-459: Emit exact [start,end) bins with a simple while-loop

Current logic can over-emit bins and has ambiguous last-bin handling.

-        let loop_end = if have_remainder {
-            num_bins
-        } else {
-            num_bins - 1
-        };
-
-        // Create bins for all but the last date
-        for _ in 0..loop_end {
-            let end = start + Duration::minutes(quotient as i64);
-            bounds.push(TimeBounds { start, end });
-            start = end;
-        }
-
-        // Add the last bin, accounting for any remainder, should we include it?
-        if have_remainder {
-            bounds.push(TimeBounds {
-                start,
-                end: start + Duration::minutes(remainder as i64),
-            });
-        } else {
-            bounds.push(TimeBounds {
-                start,
-                end: start + Duration::minutes(quotient as i64),
-            });
-        }
+        while start < time_range.end {
+            let end = (start + step).min(time_range.end);
+            bounds.push(TimeBounds { start, end });
+            start = end;
+        }

♻️ Duplicate comments (2)

src/query/mod.rs (2)

473-508: Fix DATE_BIN SQL: standard intervals, stable aliases, consistent casting, and safer GROUP BY/ORDER BY

Use ANSI/Postgres interval literals, emit start_time/end_time, cast both to the same type, and avoid alias names in GROUP BY/ORDER BY.

-        let table_name = &self.stream;
-        let start_time_col_name = "_bin_start_time_";
-        let end_time_col_name = "_bin_end_time_";
-        let date_bin = if dur.num_minutes() <= 60 * 5 {
+        let table_name = &self.stream;
+        let start_time_col_name = "start_time";
+        let end_time_col_name = "end_time";
+        let date_bin = if dur.num_minutes() <= 60 * 5 {
             // less than 5 hour = 1 min bin
             format!(
-                "CAST(DATE_BIN('1m', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS TEXT) as {start_time_col_name}, DATE_BIN('1m', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '1m' as {end_time_col_name}"
+                "CAST(DATE_BIN(INTERVAL '1 minute', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS VARCHAR) AS {start_time_col_name}, \
+                 CAST(DATE_BIN(INTERVAL '1 minute', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '1 minute' AS VARCHAR) AS {end_time_col_name}"
             )
         } else if dur.num_minutes() <= 60 * 24 {
             // 1 day = 5 min bin
             format!(
-                "CAST(DATE_BIN('5m', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS TEXT) as {start_time_col_name}, DATE_BIN('5m', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '5m' as {end_time_col_name}"
+                "CAST(DATE_BIN(INTERVAL '5 minutes', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS VARCHAR) AS {start_time_col_name}, \
+                 CAST(DATE_BIN(INTERVAL '5 minutes', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '5 minutes' AS VARCHAR) AS {end_time_col_name}"
             )
         } else if dur.num_minutes() < 60 * 24 * 10 {
             // 10 days = 1 hour bin
             format!(
-                "CAST(DATE_BIN('1h', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS TEXT) as {start_time_col_name}, DATE_BIN('1h', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '1h' as {end_time_col_name}"
+                "CAST(DATE_BIN(INTERVAL '1 hour', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS VARCHAR) AS {start_time_col_name}, \
+                 CAST(DATE_BIN(INTERVAL '1 hour', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '1 hour' AS VARCHAR) AS {end_time_col_name}"
             )
         } else {
             // 1 day
             format!(
-                "CAST(DATE_BIN('1d', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS TEXT) as {start_time_col_name}, DATE_BIN('1d', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '1d' as {end_time_col_name}"
+                "CAST(DATE_BIN(INTERVAL '1 day', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') AS VARCHAR) AS {start_time_col_name}, \
+                 CAST(DATE_BIN(INTERVAL '1 day', \"{table_name}\".\"{time_column}\", TIMESTAMP '1970-01-01 00:00:00+00') + INTERVAL '1 day' AS VARCHAR) AS {end_time_col_name}"
             )
         };
 
         let query = if let Some(conditions) = &count_conditions.conditions {
             let f = get_filter_string(conditions).map_err(QueryError::CustomError)?;
             format!(
-                "SELECT {date_bin}, COUNT(*) as count FROM \"{table_name}\" WHERE {} GROUP BY {end_time_col_name},{start_time_col_name} ORDER BY {end_time_col_name}",
+                "SELECT {date_bin}, COUNT(*) as count FROM \"{table_name}\" WHERE ({f}) GROUP BY 1,2 ORDER BY 2",
                 f
             )
         } else {
             format!(
-                "SELECT {date_bin}, COUNT(*) as count FROM \"{table_name}\" GROUP BY {end_time_col_name},{start_time_col_name} ORDER BY {end_time_col_name}",
+                "SELECT {date_bin}, COUNT(*) as count FROM \"{table_name}\" GROUP BY 1,2 ORDER BY 2",
             )
         };

This also aligns emitted keys with the handler’s fields: ["start_time","end_time","count"].

405-427: Zero-division and incorrect bin count when remainder > 0

total_minutes == 0 can set num_bins to 0 (dynamic path) and panic on divide-by-zero.
The current quotient/remainder approach later emits num_bins + 1 bins when there is a remainder.

Use a fixed step and generate half‑open [start, end) bins:

-        let num_bins = if let Some(num_bins) = self.num_bins {
-            num_bins
-        } else {
-            // create number of bins based on total minutes
-            if total_minutes <= 60 * 5 {
-                // till 5 hours, 1 bin = 1 min
-                total_minutes
-            } else if total_minutes <= 60 * 24 {
-                // till 1 day, 1 bin = 5 min
-                total_minutes.div_ceil(5)
-            } else if total_minutes <= 60 * 24 * 10 {
-                // till 10 days, 1 bin = 1 hour
-                total_minutes.div_ceil(60)
-            } else {
-                // > 10 days, 1 bin = 1 day
-                total_minutes.div_ceil(1440)
-            }
-        };
-
-        // divide minutes by num bins to get minutes per bin
-        let quotient = total_minutes / num_bins;
-        let remainder = total_minutes % num_bins;
+        if total_minutes == 0 {
+            return vec![];
+        }
+        // choose step minutes
+        let minutes_per_bin: u64 = if let Some(n) = self.num_bins {
+            ((total_minutes + n - 1) / n).max(1)
+        } else if total_minutes <= 60 * 5 {
+            1
+        } else if total_minutes <= 60 * 24 {
+            5
+        } else if total_minutes <= 60 * 24 * 10 {
+            60
+        } else {
+            1440
+        };
+        let step = Duration::minutes(minutes_per_bin as i64);

🧹 Nitpick comments (1)

src/prism/logstream/mod.rs (1)
354-356: Prefer dynamic binning here (pass None) to use the new tiered logic

Hard‑coding 10 bins defeats the “dynamic 1m/5m/1h/1d” behavior introduced by this PR.
-            num_bins: Some(10),
+            num_bins: None,

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 312778c and d473745.

📒 Files selected for processing (3)

src/handlers/http/query.rs (1 hunks)
src/prism/logstream/mod.rs (1 hunks)
src/query/mod.rs (5 hunks)

🧰 Additional context used

🧠 Learnings (2)

📚 Learning: 2025-06-18T12:44:31.983Z

Learnt from: parmesant
PR: parseablehq/parseable#1347
File: src/handlers/http/query.rs:0-0
Timestamp: 2025-06-18T12:44:31.983Z
Learning: The counts API in src/handlers/http/query.rs does not currently support group_by functionality in COUNT queries, so the hard-coded fields array ["start_time", "end_time", "count"] is appropriate for the current scope.

Applied to files:

src/handlers/http/query.rs
src/query/mod.rs

📚 Learning: 2025-09-05T09:18:44.813Z

Learnt from: parmesant
PR: parseablehq/parseable#1425
File: src/query/mod.rs:484-495
Timestamp: 2025-09-05T09:18:44.813Z
Learning: In the Parseable system, stream names and column names cannot contain quotes, which eliminates SQL injection concerns when interpolating these identifiers directly into SQL queries in src/query/mod.rs.

Applied to files:

src/query/mod.rs

🧬 Code graph analysis (1)

src/query/mod.rs (2)

src/handlers/http/query.rs (1)

query (110-144)

src/alerts/alerts_utils.rs (1)

get_filter_string (361-453)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: Quest Smoke and Load Tests for Distributed deployments
GitHub Check: Quest Smoke and Load Tests for Standalone deployments
GitHub Check: Build Default x86_64-pc-windows-msvc
GitHub Check: Build Default x86_64-unknown-linux-gnu
GitHub Check: Build Default x86_64-apple-darwin
GitHub Check: Build Default aarch64-apple-darwin
GitHub Check: Build Default aarch64-unknown-linux-gnu
GitHub Check: coverage
GitHub Check: Build Kafka x86_64-unknown-linux-gnu
GitHub Check: Build Kafka aarch64-apple-darwin

🔇 Additional comments (3)

src/handlers/http/query.rs (2)

166-172: LGTM: Option-wrapped num_bins

Wrapping with Some(1) is correct for a single-bin fast COUNT.

166-172: Fix all endTime usages to snake_case and verify other occurrences

In src/handlers/http/query.rs, update both fields vectors from ["start_time","endTime","count"] to ["start_time","end_time","count"].

Note additional endTime references in src/handlers/airplane.rs (JSON payload keys) and in comments in src/query/mod.rs; please confirm whether those should also be renamed to end_time for consistency.

src/query/mod.rs (1)

301-309: LGTM: serde aliases preserve backward compatibility

CountsRecord keeps canonical start_time/end_time while accepting _bin_* during deserialization.

Refactor date binning logic and SQL query formatting

8634799

Signed-off-by: parmesant <[email protected]>

coderabbitai bot requested changes Sep 5, 2025

View reviewed changes

src/query/mod.rs Outdated Show resolved Hide resolved

src/query/mod.rs Show resolved Hide resolved

src/query/mod.rs Show resolved Hide resolved

parmesant force-pushed the patch-2 branch from 2043243 to 5bf5938 Compare September 6, 2025 11:38

coderabbitai bot previously approved these changes Sep 6, 2025

View reviewed changes

Merge branch 'main' into patch-2

f04aa25

parmesant dismissed coderabbitai[bot]’s stale review via f04aa25 September 7, 2025 09:53

parmesant force-pushed the patch-2 branch from 5bf5938 to f04aa25 Compare September 7, 2025 09:53

coderabbitai bot previously approved these changes Sep 7, 2025

View reviewed changes

parmesant added 2 commits September 7, 2025 19:58

Merge branch 'main' into patch-2

52b1306

Remove numBins from counts request

312778c

- removes parameter `numBins` (bins are being calculated by the backend) - Changes to binning logic

parmesant dismissed coderabbitai[bot]’s stale review via 312778c September 8, 2025 09:57

coderabbitai bot requested changes Sep 8, 2025

View reviewed changes

src/query/mod.rs Outdated Show resolved Hide resolved

src/query/mod.rs Show resolved Hide resolved

src/query/mod.rs Show resolved Hide resolved

make numBins optional instead

d473745

coderabbitai bot reviewed Sep 8, 2025

View reviewed changes

coderabbitai bot approved these changes Sep 9, 2025

View reviewed changes

Merge branch 'main' into patch-2

6ad7b85

nikhilsinhaparseable merged commit 7d5600b into parseablehq:main Sep 9, 2025
13 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Refactor date binning logic and SQL query formatting #1425

Refactor date binning logic and SQL query formatting #1425

Uh oh!

parmesant commented Sep 5, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Sep 5, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Refactor date binning logic and SQL query formatting #1425

Refactor date binning logic and SQL query formatting #1425

Uh oh!

Conversation

parmesant commented Sep 5, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

parmesant commented Sep 5, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 5, 2025 •

edited

Loading