Skip to content

Conversation

jtuglu1
Copy link
Contributor

@jtuglu1 jtuglu1 commented Mar 30, 2025

Description

Realtime ingest message gap metric additions

  • The current definition of ingest/events/messageGap remains as-is.
  • Adds ingest/events/minMessageGap, the minimum message gap seen in the currently-running task within the emission period.
  • Adds ingest/events/maxMessageGap, the maximum message gap seen in the currently-running task within the emission period.
  • Adds ingest/events/avgMessageGap, the avg message gap seen in the currently-running task across the entire duration thus far.

Appenderator Benchmarks

  • Adds Stream Appenderator benchmarks for the per-row Appenderator::add() method.

SegmentGenerationMetrics Benchmarks

  • Adds SegmentGenerationMetrics benchmarks for the expensive reporting methods.

Small optimizations to StreamAppenderator::add()

  • Avoid repeated System.currentTimeMillis() by caching locally in a volatile updated by a background thread.
  • Switch to fixed-length array for persist reason list (~20-40ns per row savings).
  • Switch to fixed-size hydrant map allocation in persistAll() (note: this has been omitted since it seems to be slower on smaller sink #s).
  • The above result in what appears to be a 100-300us speedup per 10k iterations

Benchmarks

  • OS: Linux
  • Arch: x86
  • Memory: 240GB ram
  • CPU 32 physical core, 2.6 Mhz base frequency

Appenderator Benchmarks

NB: 5 warmup/5 measurement on 10k calls
Old:

Benchmark                                                                       Mode  Cnt      Score    Error  Units
SegmentGenerationMetricsBenchmark.benchmarkMultipleReportMaxSegmentHandoffTime  avgt    5  22755.902 ± 80.865  ns/op
SegmentGenerationMetricsBenchmark.benchmarkMultipleReportMessageMaxTimestamp    avgt    5  22768.625 ± 63.275  ns/op

New:

Benchmark                                                                       Mode  Cnt      Score     Error  Units
SegmentGenerationMetricsBenchmark.benchmarkMultipleReportMaxSegmentHandoffTime  avgt    5   6254.388 ± 319.197  ns/op
SegmentGenerationMetricsBenchmark.benchmarkMultipleReportMessageGap             avgt    5  32911.301 ±  21.004  ns/op
SegmentGenerationMetricsBenchmark.benchmarkMultipleReportMessageMaxTimestamp    avgt    5   6252.912 ± 222.810  ns/op

Release note

Add min/max/avg message gap reporting metrics to realtime indexing jobs.


Key changed/added classes in this PR
  • benchmarks/src/test/java/org/apache/druid/benchmark/indexing/AppenderatorBenchmark.java
  • benchmarks/src/test/java/org/apache/druid/benchmark/indexing/StreamAppenderatorBenchmark.java
  • benchmarks/src/test/java/org/apache/druid/benchmark/indexing/metrics/SegmentGenerationMetricsBenchmark.java
  • indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java
  • indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java
  • indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskTuningConfig.java
  • indexing-service/src/test/java/org/apache/druid/indexing/common/TaskRealtimeMetricsMonitorTest.java
  • server/src/main/java/org/apache/druid/segment/realtime/SegmentGenerationMetrics.java
  • server/src/main/java/org/apache/druid/segment/realtime/appenderator/AppenderatorConfig.java
  • server/src/main/java/org/apache/druid/segment/realtime/appenderator/StreamAppenderator.java

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@jtuglu1 jtuglu1 force-pushed the upgrade-message-gap-metric branch 5 times, most recently from 9964029 to 8722e82 Compare March 31, 2025 06:06
@jtuglu1 jtuglu1 force-pushed the upgrade-message-gap-metric branch 4 times, most recently from bc90805 to 4e2633a Compare March 31, 2025 06:55
@jtuglu1 jtuglu1 force-pushed the upgrade-message-gap-metric branch 12 times, most recently from 584a2a6 to 6c80ada Compare April 1, 2025 16:28
@jtuglu1 jtuglu1 force-pushed the upgrade-message-gap-metric branch from 6c80ada to dcd0623 Compare April 3, 2025 20:51
@jtuglu1 jtuglu1 force-pushed the upgrade-message-gap-metric branch from dcd0623 to f88b828 Compare April 3, 2025 21:42
@jtuglu1 jtuglu1 force-pushed the upgrade-message-gap-metric branch from d1794c4 to 1efc571 Compare May 7, 2025 04:54
@jtuglu1 jtuglu1 force-pushed the upgrade-message-gap-metric branch from 1efc571 to c4396e2 Compare May 9, 2025 21:22
@kfaraz
Copy link
Contributor

kfaraz commented May 12, 2025

@jtuglu-netflix , is this PR ready for review now?

@jtuglu1 jtuglu1 force-pushed the upgrade-message-gap-metric branch from b4d3fcb to 9370fc2 Compare May 14, 2025 05:40
@jtuglu1
Copy link
Contributor Author

jtuglu1 commented May 14, 2025

@jtuglu-netflix , is this PR ready for review now?

Yes @kfaraz, I'll remove the *Appenderator benchmarks once approved.

@maytasm
Copy link
Contributor

maytasm commented May 14, 2025

@kfaraz The min message gap (in addition to the max message gap) is useful in telling us the spread of the message gap during the emission period. It's not much more work than what we are already doing so I think it is useful to keep it.
Regarding your comment on not adding a new config to emit the metrics, would that be fine as this metric would add some (small) overhead to the processing of events (although honestly other metrics have their own overhead too so is not unique to this metric)

@jtuglu1 jtuglu1 marked this pull request as ready for review May 14, 2025 07:06
@kfaraz
Copy link
Contributor

kfaraz commented May 15, 2025

Thanks for the details, @maytasm ! Will do a final review of this PR tomorrow.

@jtuglu1
Copy link
Contributor Author

jtuglu1 commented May 17, 2025

Thanks for the details, @maytasm ! Will do a final review of this PR tomorrow.

@kfaraz did you get a chance to look here? I've now removed the Appenderator benchmarks.

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for simplifying the PR, @jtuglu-netflix .
Left some final suggestions.
Once these are addressed, we can merge the PR.

@jtuglu1 jtuglu1 force-pushed the upgrade-message-gap-metric branch from 3a3322e to 68bc990 Compare May 18, 2025 20:40
@jtuglu1 jtuglu1 force-pushed the upgrade-message-gap-metric branch from 68bc990 to f055447 Compare May 18, 2025 21:56
@jtuglu1
Copy link
Contributor Author

jtuglu1 commented May 19, 2025

@kfaraz thanks for the comments. I've left some final responses/reasoning for keeping certain things as-is and touched up other changes – all good here?

@jtuglu1 jtuglu1 requested a review from kfaraz May 19, 2025 00:15
@kfaraz kfaraz merged commit 388869f into apache:master May 19, 2025
74 checks passed
@kfaraz
Copy link
Contributor

kfaraz commented May 19, 2025

Thanks for your patience on this PR, @jtuglu-netflix !
Please update the PR description with the current set of the changes in the patch
so that it doesn't cause confusion to future readers.

jtuglu1 added a commit to jtuglu1/druid that referenced this pull request May 19, 2025
…e#17847)

Changes:
- Add metrics ingest/events/minMessageGap, ingest/events/maxMessageGap, ingest/events/avgMessageGap
- The current definition of ingest/events/messageGap remains as-is.
@capistrant capistrant added this to the 34.0.0 milestone Jul 22, 2025
@jtuglu1 jtuglu1 deleted the upgrade-message-gap-metric branch August 22, 2025 05:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants